
Rope on all Llama models for arbitrarily long inputs #2036

Closed
surak opened this issue Jul 20, 2023 · 3 comments
Labels
good first issue Good for newcomers

Comments

@surak
Collaborator

surak commented Jul 20, 2023

Since the latest transformers library is already a requirement and it includes the pull request below, we can use the new RoPE scaling setting:

huggingface/transformers#24653

Here is the documentation for it: https://huggingface.co/docs/transformers/main/en/model_doc/llama#transformers.LlamaConfig.rope_scaling
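To illustrate what the linear ("condense") RoPE scaling referenced above does, here is a minimal sketch. `rope_angles` is an illustrative helper, not a transformers API: it computes rotary-embedding angles for one position, and linear scaling simply divides the position index by the factor so that longer inputs map back into the trained position range.

```python
def rope_angles(pos, dim, base=10000.0, scaling_factor=1.0):
    # Linear ("condense") RoPE scaling divides the position index by the
    # factor, so positions up to factor * original_max_len fall inside the
    # range of angles the model saw during training.
    pos = pos / scaling_factor
    return [pos / (base ** (2 * i / dim)) for i in range(dim // 2)]

# With factor 4, position 8 produces the same angles as unscaled position 2.
assert rope_angles(8, 128, scaling_factor=4.0) == rope_angles(2, 128)
```

This is the mechanism behind `rope_scaling = {"type": "linear", "factor": ...}` in `LlamaConfig`; the "dynamic" type adjusts the factor with sequence length instead of fixing it.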

@merrymercy
Member

Transformers was updated by #2011.
Please help us by contributing a PR. You can start by replacing the monkey patch in longchat with the official support:

def load_model(self, model_path: str, from_pretrained_kwargs: dict):
    revision = from_pretrained_kwargs.get("revision", "main")
    config = AutoConfig.from_pretrained(model_path, revision=revision)
    # Apply monkey patch, TODO(Dacheng): Add flash attention support
    from fastchat.model.llama_condense_monkey_patch import (
        replace_llama_with_condense,
    )

    replace_llama_with_condense(config.rope_condense_ratio)
    tokenizer = AutoTokenizer.from_pretrained(
        model_path, use_fast=self.use_fast_tokenizer, revision=revision
    )
    model = AutoModelForCausalLM.from_pretrained(
        model_path,
        low_cpu_mem_usage=True,
        **from_pretrained_kwargs,
    )
    return model, tokenizer
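A hedged sketch of what the replacement could look like: the key change is swapping the monkey patch for the `rope_scaling` config option from transformers' `LlamaConfig` (available since PR #24653). It assumes a transformers version with that support and reuses the FastChat-specific `rope_condense_ratio` attribute from the checkpoint config; whether `"linear"` scaling exactly reproduces the condense patch should be verified before merging.

```python
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

def load_model(self, model_path: str, from_pretrained_kwargs: dict):
    revision = from_pretrained_kwargs.get("revision", "main")
    config = AutoConfig.from_pretrained(model_path, revision=revision)
    # Use the built-in linear RoPE scaling instead of the condense monkey patch.
    # rope_condense_ratio is FastChat-specific; rope_scaling is the official API.
    config.rope_scaling = {"type": "linear", "factor": config.rope_condense_ratio}
    tokenizer = AutoTokenizer.from_pretrained(
        model_path, use_fast=self.use_fast_tokenizer, revision=revision
    )
    model = AutoModelForCausalLM.from_pretrained(
        model_path,
        config=config,
        low_cpu_mem_usage=True,
        **from_pretrained_kwargs,
    )
    return model, tokenizer
```

Passing the modified `config` back into `from_pretrained` is what makes the scaling take effect; without it, the model would be loaded with the checkpoint's original config. (Omitting a runnable test here: the block is a config-change sketch that requires downloading model weights.)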

@merrymercy merrymercy added the good first issue Good for newcomers label Jul 21, 2023
@DachengLi1
Collaborator

@surak feel free to loop me in.

@surak
Collaborator Author

surak commented Jul 24, 2023

@DachengLi1 I have no clue how to do so :-)

@surak surak closed this as completed Jul 24, 2023