LoRA is incompatible with DeepSpeed ZeRO3 #24445

Closed
4 tasks
Weiyun1025 opened this issue Jun 23, 2023 · 11 comments · Fixed by huggingface/peft#1450

Comments

@Weiyun1025

Weiyun1025 commented Jun 23, 2023

System Info

pytorch==2.0.0, transformers==4.28.0, peft==0.2.0

When I use LoRA to wrap the model inside __init__ with DeepSpeed ZeRO-3 enabled, I get the following errors:

╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /mnt/petrelfs/wangweiyun/miniconda3/envs/recognize_anything/lib/python3.9/si │
│ te-packages/peft/peft_model.py:287 in __getattr__                            │
│                                                                              │
│   284 │   def __getattr__(self, name: str):                                  │
│   285 │   │   """Forward missing attributes to the wrapped module."""        │
│   286 │   │   try:                                                           │
│ ❱ 287 │   │   │   return super().__getattr__(name)  # defer to nn.Module's l │
│   288 │   │   except AttributeError:                                         │
│   289 │   │   │   return getattr(self.base_model, name)                      │
│   290                                                                        │
│                                                                              │
│ /mnt/petrelfs/wangweiyun/miniconda3/envs/recognize_anything/lib/python3.9/si │
│ te-packages/torch/nn/modules/module.py:1614 in __getattr__                   │
│                                                                              │
│   1611 │   │   │   modules = self.__dict__['_modules']                       │
│   1612 │   │   │   if name in modules:                                       │
│   1613 │   │   │   │   return modules[name]                                  │
│ ❱ 1614 │   │   raise AttributeError("'{}' object has no attribute '{}'".form │
│   1615 │   │   │   type(self).__name__, name))                               │
│   1616 │                                                                     │
│   1617 │   def __setattr__(self, name: str, value: Union[Tensor, 'Module'])  │
╰──────────────────────────────────────────────────────────────────────────────╯
AttributeError: 'PeftModelForCausalLM' object has no attribute 
'_ds_child_entered'

During handling of the above exception, another exception occurred:

╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /mnt/petrelfs/wangweiyun/miniconda3/envs/recognize_anything/lib/python3.9/si │
│ te-packages/peft/peft_model.py:287 in __getattr__                            │
│                                                                              │
│   284 │   def __getattr__(self, name: str):                                  │
│   285 │   │   """Forward missing attributes to the wrapped module."""        │
│   286 │   │   try:                                                           │
│ ❱ 287 │   │   │   return super().__getattr__(name)  # defer to nn.Module's l │
│   288 │   │   except AttributeError:                                         │
│   289 │   │   │   return getattr(self.base_model, name)                      │
│   290                                                                        │
│                                                                              │
│ /mnt/petrelfs/wangweiyun/miniconda3/envs/recognize_anything/lib/python3.9/si │
│ te-packages/torch/nn/modules/module.py:1614 in __getattr__                   │
│                                                                              │
│   1611 │   │   │   modules = self.__dict__['_modules']                       │
│   1612 │   │   │   if name in modules:                                       │
│   1613 │   │   │   │   return modules[name]                                  │
│ ❱ 1614 │   │   raise AttributeError("'{}' object has no attribute '{}'".form │
│   1615 │   │   │   type(self).__name__, name))                               │
│   1616 │                                                                     │
│   1617 │   def __setattr__(self, name: str, value: Union[Tensor, 'Module'])  │
╰──────────────────────────────────────────────────────────────────────────────╯
AttributeError: 'PeftModelForCausalLM' object has no attribute 'base_model'

It seems that DeepSpeed's ZeRO-3 wrapper around __init__ probes the module with hasattr(module, "_ds_child_entered") before PeftModelForCausalLM has finished its own __init__, so the attribute base_model does not exist yet.

It is also notable that this error leads to infinite recursion: PeftModel.__getattr__ catches the AttributeError and falls back to getattr(self.base_model, name), but the lookup of base_model fails as well and re-enters __getattr__, so the AttributeError is raised again and again.

╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /mnt/petrelfs/wangweiyun/projects/region_wise_model/main_clip_v6.py:120 in   │
│ <module>                                                                     │
│                                                                              │
│   117                                                                        │
│   118                                                                        │
│   119 if __name__ == '__main__':                                             │
│ ❱ 120 │   main()                                                             │
│   121                                                                        │
│                                                                              │
│ /mnt/petrelfs/wangweiyun/projects/region_wise_model/main_clip_v6.py:42 in    │
│ main                                                                         │
│                                                                              │
│    39 │                                                                      │
│    40 │   if config.use_window_attn:                                         │
│    41 │   │   state_dict = preprocess_state_dict(model_args.model_name_or_pa │
│ ❱  42 │   │   model = HuskyForCLIP.from_pretrained(model_args.model_name_or_ │
│    43 │   else:                                                              │
│    44 │   │   model = HuskyForCLIP.from_pretrained(model_args.model_name_or_ │
│    45                                                                        │
│                                                                              │
│ /mnt/petrelfs/wangweiyun/miniconda3/envs/recognize_anything/lib/python3.9/si │
│ te-packages/transformers/modeling_utils.py:2629 in from_pretrained           │
│                                                                              │
│   2626 │   │   │   init_contexts.append(init_empty_weights())                │
│   2627 │   │                                                                 │
│   2628 │   │   with ContextManagers(init_contexts):                          │
│ ❱ 2629 │   │   │   model = cls(config, *model_args, **model_kwargs)          │
│   2630 │   │                                                                 │
│   2631 │   │   # Check first if we are `from_pt`                             │
│   2632 │   │   if use_keep_in_fp32_modules:                                  │
│                                                                              │
│ /mnt/petrelfs/wangweiyun/miniconda3/envs/recognize_anything/lib/python3.9/si │
│ te-packages/deepspeed/runtime/zero/partition_parameters.py:382 in wrapper    │
│                                                                              │
│    379 │   │   │   │   │   is_child_module = True                            │
│    380 │   │   │   │   │   setattr(module, "_ds_child_entered", True)        │
│    381 │   │   │   │                                                         │
│ ❱  382 │   │   │   │   f(module, *args, **kwargs)                            │
│    383 │   │   │   │                                                         │
│    384 │   │   │   │   if is_child_module:                                   │
│    385 │   │   │   │   │   # child's __init__ is done, now we can run a sing │
│                                                                              │
│ /mnt/petrelfs/wangweiyun/projects/region_wise_model/custom_models/husky_clip │
│ _ablate.py:1472 in __init__                                                  │
│                                                                              │
│   1469 # shared align token + Both flatten + soft prompt (best)              │
│   1470 class HuskyForCLIPV6(WindowRegionHusky):                              │
│   1471 │   def __init__(self, config: WindowRegionHuskyConfig):              │
│ ❱ 1472 │   │   super().__init__(config)                                      │
│   1473 │   │                                                                 │
│   1474 │   │   self.logit_scale = nn.Parameter(torch.ones([]) * np.log(1 / 0 │
│   1475 │   │   self.text_projection = nn.Parameter(torch.empty(self.language │
│                                                                              │
│ /mnt/petrelfs/wangweiyun/miniconda3/envs/recognize_anything/lib/python3.9/si │
│ te-packages/deepspeed/runtime/zero/partition_parameters.py:382 in wrapper    │
│                                                                              │
│    379 │   │   │   │   │   is_child_module = True                            │
│    380 │   │   │   │   │   setattr(module, "_ds_child_entered", True)        │
│    381 │   │   │   │                                                         │
│ ❱  382 │   │   │   │   f(module, *args, **kwargs)                            │
│    383 │   │   │   │                                                         │
│    384 │   │   │   │   if is_child_module:                                   │
│    385 │   │   │   │   │   # child's __init__ is done, now we can run a sing │
│                                                                              │
│ /mnt/petrelfs/wangweiyun/projects/region_wise_model/custom_models/husky_wind │
│ ow.py:47 in __init__                                                         │
│                                                                              │
│    44 │   │   │   │   │   self.vision_model.encoder.layers[idx] = WindowBLIP │
│    45 │   │   │                                                              │
│    46 │   │   │   if self.config.lora:                                       │
│ ❱  47 │   │   │   │   self.wrap_lora()                                       │
│    48 │   │   │   if self.config.lora_vision:                                │
│    49 │   │   │   │   self.wrap_lora_vision()                                │
│    50 │   │   self.post_init()                                               │
│                                                                              │
│ /mnt/petrelfs/wangweiyun/projects/region_wise_model/custom_models/husky_src/ │
│ husky_chat.py:436 in wrap_lora                                               │
│                                                                              │
│   433 │   │   │   lora_dropout=lora_dropout,                                 │
│   434 │   │   │   target_modules=target_modules                              │
│   435 │   │   )                                                              │
│ ❱ 436 │   │   self.language_model = get_peft_model(self.language_model, peft │
│   437 │   │   self.config.lora = True                                        │
│   438 │   │   self.language_model.print_trainable_parameters()               │
│   439                                                                        │
│                                                                              │
│ /mnt/petrelfs/wangweiyun/miniconda3/envs/recognize_anything/lib/python3.9/si │
│ te-packages/peft/mapping.py:145 in get_peft_model                            │
│                                                                              │
│   142 │   │   peft_config = _prepare_lora_config(peft_config, model_config)  │
│   143 │   else:                                                              │
│   144 │   │   peft_config = _prepare_prompt_learning_config(peft_config, mod │
│ ❱ 145 │   return MODEL_TYPE_TO_PEFT_MODEL_MAPPING[peft_config.task_type](mod │
│   146                                                                        │
│                                                                              │
│ /mnt/petrelfs/wangweiyun/miniconda3/envs/recognize_anything/lib/python3.9/si │
│ te-packages/deepspeed/runtime/zero/partition_parameters.py:377 in wrapper    │
│                                                                              │
│    374 │   │   │   │   print_rank_0(f'Before initializing {module.__class__. │
│    375 │   │   │   │                                                         │
│    376 │   │   │   │   is_child_module = False                               │
│ ❱  377 │   │   │   │   if not hasattr(module, "_ds_child_entered"):          │
│    378 │   │   │   │   │   # child's __init__ was called, since parents all  │
│    379 │   │   │   │   │   is_child_module = True                            │
│    380 │   │   │   │   │   setattr(module, "_ds_child_entered", True)        │
│                                                                              │
│ /mnt/petrelfs/wangweiyun/miniconda3/envs/recognize_anything/lib/python3.9/si │
│ te-packages/peft/peft_model.py:289 in __getattr__                            │
│                                                                              │
│   286 │   │   try:                                                           │
│   287 │   │   │   return super().__getattr__(name)  # defer to nn.Module's l │
│   288 │   │   except AttributeError:                                         │
│ ❱ 289 │   │   │   return getattr(self.base_model, name)                      │
│   290 │                                                                      │
│   291 │   def forward(self, *args, **kwargs):                                │
│   292 │   │   """                                                            │
│                                                                              │
│ /mnt/petrelfs/wangweiyun/miniconda3/envs/recognize_anything/lib/python3.9/si │
│ te-packages/peft/peft_model.py:289 in __getattr__                            │
│                                                                              │
│   286 │   │   try:                                                           │
│   287 │   │   │   return super().__getattr__(name)  # defer to nn.Module's l │
│   288 │   │   except AttributeError:                                         │
│ ❱ 289 │   │   │   return getattr(self.base_model, name)                      │
│   290 │                                                                      │
│   291 │   def forward(self, *args, **kwargs):                                │
│   292 │   │   """                                                            │
│                                                                              │
│ /mnt/petrelfs/wangweiyun/miniconda3/envs/recognize_anything/lib/python3.9/si │
│ te-packages/peft/peft_model.py:289 in __getattr__                            │
│                                                                              │
│   286 │   │   try:                                                           │
│   287 │   │   │   return super().__getattr__(name)  # defer to nn.Module's l │
│   288 │   │   except AttributeError:                                         │
│ ❱ 289 │   │   │   return getattr(self.base_model, name)                      │
│   290 │                                                                      │
│   291 │   def forward(self, *args, **kwargs):                                │
│   292 │   │   """                                                            │
│                                                                              │
│ /mnt/petrelfs/wangweiyun/miniconda3/envs/recognize_anything/lib/python3.9/si │
│ te-packages/peft/peft_model.py:289 in __getattr__                            │
│                                                                              │
│   286 │   │   try:                                                           │
│   287 │   │   │   return super().__getattr__(name)  # defer to nn.Module's l │
│   288 │   │   except AttributeError:                                         │
│ ❱ 289 │   │   │   return getattr(self.base_model, name)                      │
│   290 │                                                                      │
│   291 │   def forward(self, *args, **kwargs):                                │
│   292 │   │   """                                                            │
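
The recursion in the traceback above can be reproduced without DeepSpeed or PEFT. The following is a minimal sketch in plain Python, assuming only the forwarding pattern visible in the traceback; ForwardingWrapper, GuardedWrapper, Base and probe are hypothetical names for illustration, and the guard shown is one possible way to break the cycle, not necessarily the change made in the linked PEFT fix.

```python
class Base:
    """Hypothetical stand-in for the wrapped model."""
    hidden_size = 8


class ForwardingWrapper:
    """Mimics a wrapper whose __getattr__ forwards to self.base_model."""

    def __init__(self, base_model):
        self.base_model = base_model

    def __getattr__(self, name):
        # Called only when normal lookup fails. If base_model is not set yet,
        # evaluating self.base_model re-enters __getattr__ and never terminates.
        return getattr(self.base_model, name)


class GuardedWrapper(ForwardingWrapper):
    """Same forwarding, but refuses to forward the lookup of base_model itself."""

    def __getattr__(self, name):
        if name == "base_model":
            raise AttributeError(name)
        return getattr(self.base_model, name)


def probe(cls):
    # Create the object without running __init__, then probe it the way the
    # ZeRO-3 __init__ wrapper does in the traceback.
    obj = cls.__new__(cls)
    try:
        return hasattr(obj, "_ds_child_entered")
    except RecursionError:
        return "RecursionError"


print(probe(ForwardingWrapper))            # RecursionError
print(probe(GuardedWrapper))               # False
print(GuardedWrapper(Base()).hidden_size)  # 8
```

hasattr only swallows AttributeError, so the RecursionError escapes it, which matches the repeated peft_model.py:289 frames in the traceback.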

Who can help?

@pacman100

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

environments: pytorch==2.0.0, transformers==4.28.0, peft==0.2.0

slurm launch command: srun --gres=gpu:8 --ntasks=8 --ntasks-per-node=8 --cpus-per-task=8 python -u bug_unit_test.py --output_dir ./outputs/debug --deepspeed ./configs/default_offload_opt_param_zero3.json

deepspeed config to reproduce:

{
    "bf16": {
        "enabled": "auto"
    },
    "optimizer": {
        "type": "AdamW",
        "params": {
            "lr": "auto",
            "betas": "auto",
            "eps": "auto",
            "weight_decay": "auto"
        }
    },
    "zero_optimization": {
        "stage": 3,
        "offload_optimizer": {
            "device": "cpu",
            "pin_memory": true
        },
        "offload_param": {
            "device": "cpu",
            "pin_memory": true
        },
        "overlap_comm": true,
        "contiguous_gradients": true,
        "sub_group_size": 1e9,
        "reduce_bucket_size": "auto",
        "stage3_prefetch_bucket_size": "auto",
        "stage3_param_persistence_threshold": "auto",
        "stage3_max_live_parameters": 1e9,
        "stage3_max_reuse_distance": 1e9,
        "stage3_gather_16bit_weights_on_model_save": true
    },
    "gradient_accumulation_steps": "auto",
    "gradient_clipping": "auto",
    "train_batch_size": "auto",
    "train_micro_batch_size_per_gpu": "auto",
    "wall_clock_breakdown": false
}

code to reproduce:

import os
import subprocess
import torch
from transformers import (
    HfArgumentParser, TrainingArguments,
    PreTrainedModel, LlamaModel, LlamaConfig
)
from peft import LoraConfig, TaskType, get_peft_model


class BugModel(PreTrainedModel):
    config_class = LlamaConfig

    def __init__(self, config):
        super().__init__(config)
        self.model = LlamaModel(config)
        self.wrap_lora()
        # init code for other modules, which is not important to reproduce this bug
        pass

    def wrap_lora(
        self,
        r=16,
        lora_alpha=32,
        lora_dropout=0.05,
        target_modules=("q_proj", "k_proj", "v_proj", "o_proj"),
    ):
        peft_config = LoraConfig(
            task_type=TaskType.CAUSAL_LM,
            inference_mode=False,
            r=r,
            lora_alpha=lora_alpha,
            lora_dropout=lora_dropout,
            target_modules=target_modules
        )
        self.model = get_peft_model(self.model, peft_config)
        self.model.print_trainable_parameters()


def init_distributed_mode():
    if 'SLURM_PROCID' in os.environ:
        rank = int(os.environ['SLURM_PROCID'])
        local_rank = rank % torch.cuda.device_count()

        world_size = int(os.environ["SLURM_NTASKS"])
        local_size = int(os.environ["SLURM_NTASKS_PER_NODE"])

        if "MASTER_PORT" not in os.environ:
            port = 22110
            print(f'MASTER_PORT = {port}')
            os.environ["MASTER_PORT"] = str(port)

        node_list = os.environ["SLURM_NODELIST"]
        addr = subprocess.getoutput(f"scontrol show hostname {node_list} | head -n1")
        if "MASTER_ADDR" not in os.environ:
            os.environ["MASTER_ADDR"] = addr

        os.environ['RANK'] = str(rank)
        os.environ['LOCAL_RANK'] = str(local_rank)
        os.environ['LOCAL_WORLD_SIZE'] = str(local_size)
        os.environ['WORLD_SIZE'] = str(world_size)


parser = HfArgumentParser(TrainingArguments)
init_distributed_mode()
# parse_args_into_dataclasses() returns a tuple with one entry per dataclass
(training_args,) = parser.parse_args_into_dataclasses()


model_name_or_path = '/mnt/petrelfs/share_data/wangweiyun/share_hf/vicuna-7b'
model = BugModel.from_pretrained(model_name_or_path)  # Error!


print('finish')

Expected behavior

I expect wrapping the model with LoRA during __init__ to succeed when ZeRO-3 is enabled.

@pacman100
Contributor

Hello, please refer to this doc for the correct way of using PEFT + DeepSpeed: https://huggingface.co/docs/peft/accelerate/deepspeed-zero3-offload

@Weiyun1025
Author

Hello, please refer to this doc for the correct way of using PEFT + DeepSpeed: https://huggingface.co/docs/peft/accelerate/deepspeed-zero3-offload

Thank you for your response!

I note that this doc is based on accelerate, whereas my code is based on transformers.Trainer. Could you provide an example of using PEFT + DeepSpeed correctly with transformers.Trainer?

@1ytic
Contributor

1ytic commented Jul 19, 2023

The following steps work for me:

  1. Create TrainingArguments(..., deepspeed="ds_config_zero3.json")
  2. Load model with from_pretrained()
  3. Wrap it with get_peft_model()
  4. Run Trainer.train()

A few important notes:

  1. You have to create TrainingArguments before initialising the model with ZeRO-3 partitioning.
  2. If you use the TaskType.SEQ_CLS task, get_peft_model will break the forward path. A quick workaround is to recreate an unpartitioned classification head after the model has been initialised with deepspeed.zero.Init(), i.e. after from_pretrained().
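
The four steps above can be sketched as follows. This is only an illustration of the ordering, not a tested training script: build_lora_trainer, its arguments and the output directory are hypothetical, the LoRA hyperparameters are copied from the reproduction script in this issue, and a causal LM task is assumed.

```python
def build_lora_trainer(model_name_or_path, train_dataset,
                       ds_config="ds_config_zero3.json"):
    # Imports are deferred so the sketch can be loaded without the GPU stack.
    from peft import LoraConfig, TaskType, get_peft_model
    from transformers import AutoModelForCausalLM, Trainer, TrainingArguments

    # 1. Create TrainingArguments first, so the ZeRO-3 config is known
    #    before the model is instantiated.
    training_args = TrainingArguments(
        output_dir="./outputs/debug",  # hypothetical path
        deepspeed=ds_config,
    )

    # 2. Load the plain base model with from_pretrained().
    model = AutoModelForCausalLM.from_pretrained(model_name_or_path)

    # 3. Wrap the already-loaded model with LoRA, outside of any __init__.
    peft_config = LoraConfig(
        task_type=TaskType.CAUSAL_LM,
        r=16,
        lora_alpha=32,
        lora_dropout=0.05,
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    )
    model = get_peft_model(model, peft_config)

    # 4. Hand the wrapped model to Trainer; the caller runs trainer.train().
    return Trainer(model=model, args=training_args, train_dataset=train_dataset)
```

The point of the sketch is the order of operations: get_peft_model() is called on a fully constructed model rather than from inside the model's __init__, which is where the original reproduction fails.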

@don-tpanic

Thanks! I imagine you launch with deepspeed? Do you still have to specify ds_config_zero3.json in the CLI command now that it is provided in TrainingArguments?

@1ytic
Contributor

1ytic commented Aug 9, 2023

Thanks! I imagine you launch with deepspeed? Do you still have to specify ds_config_zero3.json in the CLI command now that it is provided in TrainingArguments?

Yes, I launch it with deepspeed and I do not specify the config in the command, only in the TrainingArguments.

@liuyu666-thu

@1ytic Very useful explanation! Could you offer an example of how to implement this quick workaround? Thanks.

@dbanka

dbanka commented Sep 14, 2023

@1ytic I am getting the error below while running LoRA with DeepSpeed ZeRO-3; something seems to have broken.

Can you please explain this more clearly:
"If you use TaskType.SEQ_CLS task, get_peft_model will break the forward path. A quick workaround is recreate unpartitioned classification head after the model initialised with deepspeed.zero.Init(), i.e. after from_pretrained()."

Traceback (most recent call last):
File "/home/ec2-user/SageMaker/final_training/lora_scripts/run_clm.py", line 263, in <module>
main()
File "/home/ec2-user/SageMaker/final_training/lora_scripts/run_clm.py", line 259, in main
training_function(args)
File "/home/ec2-user/SageMaker/final_training/lora_scripts/run_clm.py", line 220, in training_function
trainer.train()
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/transformers/trainer.py", line 1539, in train
return inner_training_loop(
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/transformers/trainer.py", line 1809, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/transformers/trainer.py", line 2665, in training_step
self.accelerator.backward(loss)
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/accelerate/accelerator.py", line 1847, in backward
self.deepspeed_engine_wrapped.backward(loss, **kwargs)
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/accelerate/utils/deepspeed.py", line 167, in backward
self.engine.backward(loss, **kwargs)
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
ret_val = func(*args, **kwargs)
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1923, in backward
self.optimizer.backward(loss, retain_graph=retain_graph)
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
ret_val = func(*args, **kwargs)
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/deepspeed/runtime/zero/stage3.py", line 2080, in backward
self.loss_scaler.backward(loss.float(), retain_graph=retain_graph)
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/deepspeed/runtime/fp16/loss_scaler.py", line 63, in backward
scaled_loss.backward(retain_graph=retain_graph)
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/torch/_tensor.py", line 487, in backward
torch.autograd.backward(
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/torch/autograd/__init__.py", line 200, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/torch/autograd/function.py", line 274, in apply
return user_fn(self, *args)
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 141, in backward
outputs = ctx.run_function(*detached_inputs)
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 681, in custom_forward
return module(*inputs, output_attentions, None)
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1538, in _call_impl
outputs = ctx.run_function(*detached_inputs)
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 681, in custom_forward
result = forward_call(*args, **kwargs)
self.loss_scaler.backward(loss.float(), retain_graph=retain_graph) File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 408, in forward

File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/deepspeed/runtime/fp16/loss_scaler.py", line 63, in backward
scaled_loss.backward(retain_graph=retain_graph)
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/torch/_tensor.py", line 487, in backward
return module(*inputs, output_attentions, None)self.optimizer.backward(loss, retain_graph=retain_graph)hidden_states, self_attn_weights, present_key_value = self.self_attn(

File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1538, in _call_impl
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1538, in _call_impl
torch.autograd.backward(
ret_val = func(*args, **kwargs) File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/torch/autograd/init.py", line 200, in backward

File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/deepspeed/runtime/zero/stage3.py", line 2080, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/torch/autograd/function.py", line 274,in apply
return user_fn(self, *args)
result = forward_call(*args, **kwargs) File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 141, in backward

result = forward_call(*args, **kwargs) File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 305, in forward

File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 408, in forward
outputs = ctx.run_function(*detached_inputs)
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 681, in custom_forward
query_states = self.q_proj(hidden_states)
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
self.loss_scaler.backward(loss.float(), retain_graph=retain_graph)hidden_states, self_attn_weights, present_key_value = self.self_attn(

File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/deepspeed/runtime/fp16/loss_scaler.py", line 63, in backward
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1538, in _call_impl
return module(*inputs, output_attentions, None)
scaled_loss.backward(retain_graph=retain_graph) File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1538, in _call_impl

File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/torch/_tensor.py", line 487, in backward
torch.autograd.backward(
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/torch/autograd/init.py", line 200,in backward
result = hook(self, args)
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
result = forward_call(*args, **kwargs)Variable._execution_engine.run_backward( # Calls into the C++ engine torun the backward passret_val = func(*args, **kwargs)

File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 305, in forward
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/torch/autograd/function.py", line 274,in apply
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/deepspeed/runtime/zero/parameter_offload.py", line 392, in _pre_forward_module_hook
result = forward_call(*args, **kwargs)
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 408, in forward
query_states = self.q_proj(hidden_states)return user_fn(self, *args)

  File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 141, in backward

File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
self.pre_sub_module_forward_function(module)
hidden_states, self_attn_weights, present_key_value = self.self_attn( File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/deepspeed/runtime/zero/parameter_offload.py", line 505, in pre_sub_module_forward_function

File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1538, in _call_impl
outputs = ctx.run_function(*detached_inputs)
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 681, in custom_forward
param_coordinator.fetch_sub_module(sub_module, forward=prev_grad_state)
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
ret_val = func(*args, **kwargs)
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115,in decorate_context
return module(*inputs, output_attentions, None)result = hook(self, args)
return func(*args, **kwargs)
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1538, in _call_impl

File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/deepspeed/runtime/zero/partitioned_param_coordinator.py", line 284, in fetch_sub_module
result = forward_call(*args, **kwargs)
ret_val = func(*args, **kwargs) File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 305, in forward

File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/deepspeed/runtime/zero/parameter_offload.py", line 392, in _pre_forward_module_hook
self.__all_gather_params(params_to_fetch, forward)
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
query_states = self.q_proj(hidden_states)ret_val = func(*args, **kwargs)

  File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl

self.pre_sub_module_forward_function(module) File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/deepspeed/runtime/zero/partitioned_param_coordinator.py", line 428, in __all_gather_params

File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/deepspeed/runtime/zero/parameter_offload.py", line 505, in pre_sub_module_forward_function
result = forward_call(*args, **kwargs)
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 408, in forward
self._all_gather_params(nonquantized_params, forward, quantize=self.zero_quantized_weights)
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/deepspeed/runtime/zero/partitioned_param_coordinator.py", line 446, in _all_gather_params
param_coordinator.fetch_sub_module(sub_module, forward=prev_grad_state)
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
hidden_states, self_attn_weights, present_key_value = self.self_attn(
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1538, in _call_impl
ret_val = func(*args, **kwargs)
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115,in decorate_context
handle = partitioned_params[0].all_gather_coalesced(partitioned_params,result = hook(self, args)

  File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 15,in wrapped_fn

File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
return func(*args, **kwargs)
ret_val = func(*args, **kwargs) File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/deepspeed/runtime/zero/partitioned_param_coordinator.py", line 284, in fetch_sub_module

ret_val = func(*args, **kwargs) File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/deepspeed/runtime/zero/partition_parameters.py", line 1155, in all_gather_coalesced

File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/deepspeed/runtime/zero/parameter_offload.py", line 392, in _pre_forward_module_hook
result = forward_call(*args, **kwargs)
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 305, in forward
self.__all_gather_params(params_to_fetch, forward)
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
ret_val = func(*args, **kwargs)
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/deepspeed/runtime/zero/partitioned_param_coordinator.py", line 428, in __all_gather_params
self.pre_sub_module_forward_function(module)
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/deepspeed/runtime/zero/parameter_offload.py", line 505, in pre_sub_module_forward_function
query_states = self.q_proj(hidden_states)
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
self._all_gather_params(nonquantized_params, forward, quantize=self.zero_quantized_weights)
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/deepspeed/runtime/zero/partitioned_param_coordinator.py", line 446, in _all_gather_params
dtype=get_only_unique_item(p.ds_tensor.dtypeparam_coordinator.fetch_sub_module(sub_module, forward=prev_grad_state)

File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/deepspeed/runtime/utils.py", line 842,in get_only_unique_item
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
ret_val = func(*args, **kwargs)
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115,in decorate_context
handle = partitioned_params[0].all_gather_coalesced(partitioned_params,
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
return func(*args, **kwargs)
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/deepspeed/runtime/zero/partitioned_param_coordinator.py", line 284, in fetch_sub_module
ret_val = func(*args, **kwargs)
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/deepspeed/runtime/zero/partition_parameters.py", line 1155, in all_gather_coalesced
result = hook(self, args)
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
raise RuntimeError(f"expected there to be only one unique element in {items}")
RuntimeError: expected there to be only one unique element in <generator object Init._convert_to_deepspeed_param.<locals>.all_gather_coalesced.<locals>.<genexpr> at 0x7f8eb7f2c510>
ret_val = func(*args, **kwargs)self.__all_gather_params(params_to_fetch, forward)

File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/deepspeed/runtime/zero/parameter_offload.py", line 392, in _pre_forward_module_hook
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
ret_val = func(*args, **kwargs)
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/deepspeed/runtime/zero/partitioned_param_coordinator.py", line 428, in __all_gather_params
self.pre_sub_module_forward_function(module)
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/deepspeed/runtime/zero/parameter_offload.py", line 505, in pre_sub_module_forward_function
dtype=get_only_unique_item(p.ds_tensor.dtypeself._all_gather_params(nonquantized_params, forward, quantize=self.zero_quantized_weights)

File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/deepspeed/runtime/utils.py", line 842,in get_only_unique_item
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/deepspeed/runtime/zero/partitioned_param_coordinator.py", line 446, in _all_gather_params
param_coordinator.fetch_sub_module(sub_module, forward=prev_grad_state)
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
handle = partitioned_params[0].all_gather_coalesced(partitioned_params,ret_val = func(*args, **kwargs)

File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115,in decorate_context
raise RuntimeError(f"expected there to be only one unique element in {items}")
ret_val = func(*args, **kwargs)
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/deepspeed/runtime/zero/partition_parameters.py", line 1155, in all_gather_coalesced
RuntimeError : return func(*args, **kwargs)expected there to be only one unique element in <generator object Init._convert_to_deepspeed_param..all_gather_coalesced.. at 0x7f3bd8933ca0>

File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/deepspeed/runtime/zero/partitioned_param_coordinator.py", line 284, in fetch_sub_module
self.__all_gather_params(params_to_fetch, forward)
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
ret_val = func(*args, **kwargs)
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/deepspeed/runtime/zero/partitioned_param_coordinator.py", line 428, in __all_gather_params
dtype=get_only_unique_item(p.ds_tensor.dtype
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/deepspeed/runtime/utils.py", line 842,in get_only_unique_item
self._all_gather_params(nonquantized_params, forward, quantize=self.zero_quantized_weights)
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/deepspeed/runtime/zero/partitioned_param_coordinator.py", line 446, in _all_gather_params
handle = partitioned_params[0].all_gather_coalesced(partitioned_params,
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
raise RuntimeError(f"expected there to be only one unique element in {items}")
ret_val = func(*args, **kwargs)RuntimeError
: expected there to be only one unique element in <generator object Init._convert_to_deepspeed_param..all_gather_coalesced.. at 0x7fd9e6c0b840> File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/deepspeed/runtime/zero/partition_parameters.py", line 1155, in all_gather_coalesced

dtype=get_only_unique_item(p.ds_tensor.dtype

File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/deepspeed/runtime/utils.py", line 842,in get_only_unique_item
raise RuntimeError(f"expected there to be only one unique element in {items}")
RuntimeError: expected there to be only one unique element in <generator object Init._convert_to_deepspeed_param.<locals>.all_gather_coalesced.<locals>.<genexpr> at 0x7fad61111000>
0%|
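
The final frames of the traceback above pass `p.ds_tensor.dtype` for each partitioned parameter into `get_only_unique_item`, which raises when the values are not all identical. The sketch below is a simplified stand-in for that check, not DeepSpeed's actual code, and it uses plain strings in place of torch dtypes. The mixed-dtype list is a hypothetical scenario (for instance an fp32 adapter weight gathered together with fp16 base weights); the logs alone do not confirm that this is the cause here.

```python
def get_only_unique_item(items):
    """Return the single distinct value in `items`; raise if there is not exactly one."""
    unique = set(items)
    if len(unique) != 1:
        raise RuntimeError(f"expected there to be only one unique element in {items}")
    (item,) = unique
    return item


# Stand-ins for p.ds_tensor.dtype; strings keep the sketch free of torch.
uniform = ["float16", "float16", "float16"]
# Hypothetical: one parameter in the gathered group has a different dtype.
mixed = ["float16", "float16", "float32"]

print(get_only_unique_item(uniform))

try:
    get_only_unique_item(mixed)
except RuntimeError as err:
    print("raised:", err)
```

If this is what happens, printing the dtype of every trainable and frozen parameter before training starts would show which modules disagree.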

@don-tpanic

The following steps work for me:

  1. Create TrainingArguments(..., deepspeed="ds_config_zero3.json")
  2. Load model with from_pretrained()
  3. Wrap it with get_peft_model()
  4. Run Trainer.train()

A few important notes:

  1. You have to create TrainingArguments before initialising the model with ZeRO-3 partitioning.
  2. If you use the TaskType.SEQ_CLS task, get_peft_model will break the forward path. A quick workaround is to recreate the unpartitioned classification head after the model has been initialised with deepspeed.zero.Init(), i.e. after from_pretrained().
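
The four steps above can be sketched roughly as follows. This is a minimal illustration, not a tested recipe: the config contents, the helper names `write_zero3_config` and `train_lora_zero3`, the `CAUSAL_LM` task type, and the LoRA hyperparameters are all assumptions. The ordering is the point: `TrainingArguments` is created before `from_pretrained()`, and the object stays alive for the rest of the function.

```python
import json
from pathlib import Path


def write_zero3_config(path="ds_config_zero3.json"):
    """Write a minimal ZeRO stage 3 config and return its path.

    The "auto" values are meant to be filled in by the HF Trainer.
    """
    config = {
        "zero_optimization": {"stage": 3},
        "train_micro_batch_size_per_gpu": "auto",
        "gradient_accumulation_steps": "auto",
    }
    Path(path).write_text(json.dumps(config, indent=2))
    return str(path)


def train_lora_zero3(model_name, train_dataset, output_dir="out"):
    # Heavy imports live here so the config helper can be used on its own.
    from peft import LoraConfig, TaskType, get_peft_model
    from transformers import AutoModelForCausalLM, Trainer, TrainingArguments

    # 1. Create TrainingArguments first so the ZeRO-3 config is registered
    #    before any weights are loaded.
    args = TrainingArguments(output_dir=output_dir, deepspeed=write_zero3_config())

    # 2. Load the model; with the config registered, it is partitioned on load.
    model = AutoModelForCausalLM.from_pretrained(model_name)

    # 3. Wrap it with LoRA adapters (hyperparameters are placeholders).
    lora = LoraConfig(task_type=TaskType.CAUSAL_LM, r=8, lora_alpha=16)
    model = get_peft_model(model, lora)

    # 4. Train.
    trainer = Trainer(model=model, args=args, train_dataset=train_dataset)
    trainer.train()
    return trainer
```

The script would normally be started through a distributed launcher such as `deepspeed` or `accelerate launch` rather than plain `python`.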

Could you explain a bit more about how get_peft_model breaks the forward path under SEQ_CLS? Thank you!


github-actions bot commented Nov 4, 2023

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@olegsinavski

olegsinavski commented Jan 5, 2024

Hello! I'm facing the same issue with deepspeed==0.12.4, stage3, no cpu offloading, transformers==4.36.2 and peft==0.7.1:

AttributeError: 'PeftModelForCausalLM' object has no attribute '_ds_child_entered'
....
....
  File ".../site-packages/peft/peft_model.py", line 528, in __getattr__
    return super().__getattr__(name)  # defer to nn.Module's logic
  File ".../torch/nn/modules/module.py", line 1695, in __getattr__
    raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
AttributeError: 'PeftModelForCausalLM' object has no attribute 'base_model'

and eventually

RecursionError: maximum recursion depth exceeded while calling a Python object

I'm using pytorch lightning:

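# Imports assumed for this snippet; the Lightning classes may instead come from pytorch_lightning.
from lightning.pytorch import LightningModule, Trainer
from lightning.pytorch.strategies import DeepSpeedStrategy
from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForCausalLM
from transformers.integrations import HfDeepSpeedConfig

trainer = Trainer(
    ...
    strategy=DeepSpeedStrategy(stage=3),
)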

class Module(LightningModule):
    def configure_model(self) -> None:
        deepspeed_config = self.trainer.strategy.config
        self.dschf = HfDeepSpeedConfig(deepspeed_config)
        model = AutoModelForCausalLM.from_pretrained(...)
        model = get_peft_model(
            model,
            LoraConfig(
                task_type=TaskType.CAUSAL_LM,
                inference_mode=False,
                target_modules=target_modules,
                r=48,
                lora_alpha=16,
                lora_dropout=0.0,
            ),
        )
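
The error sequence reported in this comment (a missing attribute, then a missing 'base_model', then RecursionError) matches what a forwarding `__getattr__` does when the attribute it forwards to has not been set yet. The sketch below reproduces that pattern in plain Python. `Wrapper` and `GuardedWrapper` are hypothetical classes written for illustration, not PEFT code, and the guard is only one possible way to break the loop.

```python
class Wrapper:
    """Forwards unknown attributes to self.base_model, like a model wrapper."""

    def __init__(self, base_model=None):
        if base_model is not None:
            self.base_model = base_model

    def __getattr__(self, name):
        # Called only when normal lookup fails. If base_model itself is
        # missing, self.base_model re-enters __getattr__ without end.
        return getattr(self.base_model, name)


class GuardedWrapper(Wrapper):
    def __getattr__(self, name):
        # Stop the loop: a missing base_model is reported as a plain AttributeError.
        if name == "base_model":
            raise AttributeError(name)
        return getattr(self.base_model, name)


try:
    Wrapper()._ds_child_entered
except RecursionError:
    print("unguarded wrapper: RecursionError")

try:
    GuardedWrapper()._ds_child_entered
except AttributeError:
    print("guarded wrapper: AttributeError")
```

In the reported setup, this would mean something looks up an attribute on the PEFT wrapper before its `base_model` has been assigned, for example a hook that runs during construction.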

@ANYMS-A

ANYMS-A commented Jun 21, 2024

The following steps work for me:

  1. Create TrainingArguments(..., deepspeed="ds_config_zero3.json")
  2. Load model with from_pretrained()
  3. Wrap it with get_peft_model()
  4. Run Trainer.train()

A few important notes:

  1. You have to create TrainingArguments before initialising the model with ZeRO-3 partitioning.
  2. If you use the TaskType.SEQ_CLS task, get_peft_model will break the forward path. A quick workaround is to recreate the unpartitioned classification head after the model has been initialised with deepspeed.zero.Init(), i.e. after from_pretrained().

@1ytic very useful explanation! Could you offer an example of how to implement this quick workaround? Thanks.

Have you found a decent solution for this? I ran into the same situation using transformers.Trainer + LoRA + deepspeed to fine-tune a CausalLM. Since the model is partitioned before get_peft_model(), calling get_peft_model() raises an error. I don't quite understand how to re-create an unpartitioned model, get the PEFT version of it, and finally partition the PEFT model again.
