LoRA is incompatible with DeepSpeed ZeRO3 #24445

Weiyun1025 · 2023-06-23T10:09:17Z

System Info

pytorch==2.0.0, transformers==4.28.0, peft==0.2.0

When I use LoRA to wrap the model inside __init__ and enable DeepSpeed ZeRO3, I get the following errors:

╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /mnt/petrelfs/wangweiyun/miniconda3/envs/recognize_anything/lib/python3.9/si │
│ te-packages/peft/peft_model.py:287 in __getattr__                            │
│                                                                              │
│   284 │   def __getattr__(self, name: str):                                  │
│   285 │   │   """Forward missing attributes to the wrapped module."""        │
│   286 │   │   try:                                                           │
│ ❱ 287 │   │   │   return super().__getattr__(name)  # defer to nn.Module's l │
│   288 │   │   except AttributeError:                                         │
│   289 │   │   │   return getattr(self.base_model, name)                      │
│   290                                                                        │
│                                                                              │
│ /mnt/petrelfs/wangweiyun/miniconda3/envs/recognize_anything/lib/python3.9/si │
│ te-packages/torch/nn/modules/module.py:1614 in __getattr__                   │
│                                                                              │
│   1611 │   │   │   modules = self.__dict__['_modules']                       │
│   1612 │   │   │   if name in modules:                                       │
│   1613 │   │   │   │   return modules[name]                                  │
│ ❱ 1614 │   │   raise AttributeError("'{}' object has no attribute '{}'".form │
│   1615 │   │   │   type(self).__name__, name))                               │
│   1616 │                                                                     │
│   1617 │   def __setattr__(self, name: str, value: Union[Tensor, 'Module'])  │
╰──────────────────────────────────────────────────────────────────────────────╯
AttributeError: 'PeftModelForCausalLM' object has no attribute 
'_ds_child_entered'

During handling of the above exception, another exception occurred:

╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /mnt/petrelfs/wangweiyun/miniconda3/envs/recognize_anything/lib/python3.9/si │
│ te-packages/peft/peft_model.py:287 in __getattr__                            │
│                                                                              │
│   284 │   def __getattr__(self, name: str):                                  │
│   285 │   │   """Forward missing attributes to the wrapped module."""        │
│   286 │   │   try:                                                           │
│ ❱ 287 │   │   │   return super().__getattr__(name)  # defer to nn.Module's l │
│   288 │   │   except AttributeError:                                         │
│   289 │   │   │   return getattr(self.base_model, name)                      │
│   290                                                                        │
│                                                                              │
│ /mnt/petrelfs/wangweiyun/miniconda3/envs/recognize_anything/lib/python3.9/si │
│ te-packages/torch/nn/modules/module.py:1614 in __getattr__                   │
│                                                                              │
│   1611 │   │   │   modules = self.__dict__['_modules']                       │
│   1612 │   │   │   if name in modules:                                       │
│   1613 │   │   │   │   return modules[name]                                  │
│ ❱ 1614 │   │   raise AttributeError("'{}' object has no attribute '{}'".form │
│   1615 │   │   │   type(self).__name__, name))                               │
│   1616 │                                                                     │
│   1617 │   def __setattr__(self, name: str, value: Union[Tensor, 'Module'])  │
╰──────────────────────────────────────────────────────────────────────────────╯
AttributeError: 'PeftModelForCausalLM' object has no attribute 'base_model'

It seems that DeepSpeed begins to partition parameters before PeftModelForCausalLM finishes its __init__, since the attribute base_model cannot be found yet.

It is also notable that this error leads to infinite recursion: PeftModel.__getattr__ catches the AttributeError and then tries to read self.base_model, but base_model does not exist yet, so __getattr__ is called again and the AttributeError is raised over and over.

╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /mnt/petrelfs/wangweiyun/projects/region_wise_model/main_clip_v6.py:120 in   │
│ <module>                                                                     │
│                                                                              │
│   117                                                                        │
│   118                                                                        │
│   119 if __name__ == '__main__':                                             │
│ ❱ 120 │   main()                                                             │
│   121                                                                        │
│                                                                              │
│ /mnt/petrelfs/wangweiyun/projects/region_wise_model/main_clip_v6.py:42 in    │
│ main                                                                         │
│                                                                              │
│    39 │                                                                      │
│    40 │   if config.use_window_attn:                                         │
│    41 │   │   state_dict = preprocess_state_dict(model_args.model_name_or_pa │
│ ❱  42 │   │   model = HuskyForCLIP.from_pretrained(model_args.model_name_or_ │
│    43 │   else:                                                              │
│    44 │   │   model = HuskyForCLIP.from_pretrained(model_args.model_name_or_ │
│    45                                                                        │
│                                                                              │
│ /mnt/petrelfs/wangweiyun/miniconda3/envs/recognize_anything/lib/python3.9/si │
│ te-packages/transformers/modeling_utils.py:2629 in from_pretrained           │
│                                                                              │
│   2626 │   │   │   init_contexts.append(init_empty_weights())                │
│   2627 │   │                                                                 │
│   2628 │   │   with ContextManagers(init_contexts):                          │
│ ❱ 2629 │   │   │   model = cls(config, *model_args, **model_kwargs)          │
│   2630 │   │                                                                 │
│   2631 │   │   # Check first if we are `from_pt`                             │
│   2632 │   │   if use_keep_in_fp32_modules:                                  │
│                                                                              │
│ /mnt/petrelfs/wangweiyun/miniconda3/envs/recognize_anything/lib/python3.9/si │
│ te-packages/deepspeed/runtime/zero/partition_parameters.py:382 in wrapper    │
│                                                                              │
│    379 │   │   │   │   │   is_child_module = True                            │
│    380 │   │   │   │   │   setattr(module, "_ds_child_entered", True)        │
│    381 │   │   │   │                                                         │
│ ❱  382 │   │   │   │   f(module, *args, **kwargs)                            │
│    383 │   │   │   │                                                         │
│    384 │   │   │   │   if is_child_module:                                   │
│    385 │   │   │   │   │   # child's __init__ is done, now we can run a sing │
│                                                                              │
│ /mnt/petrelfs/wangweiyun/projects/region_wise_model/custom_models/husky_clip │
│ _ablate.py:1472 in __init__                                                  │
│                                                                              │
│   1469 # shared align token + Both flatten + soft prompt (best)              │
│   1470 class HuskyForCLIPV6(WindowRegionHusky):                              │
│   1471 │   def __init__(self, config: WindowRegionHuskyConfig):              │
│ ❱ 1472 │   │   super().__init__(config)                                      │
│   1473 │   │                                                                 │
│   1474 │   │   self.logit_scale = nn.Parameter(torch.ones([]) * np.log(1 / 0 │
│   1475 │   │   self.text_projection = nn.Parameter(torch.empty(self.language │
│                                                                              │
│ /mnt/petrelfs/wangweiyun/miniconda3/envs/recognize_anything/lib/python3.9/si │
│ te-packages/deepspeed/runtime/zero/partition_parameters.py:382 in wrapper    │
│                                                                              │
│    379 │   │   │   │   │   is_child_module = True                            │
│    380 │   │   │   │   │   setattr(module, "_ds_child_entered", True)        │
│    381 │   │   │   │                                                         │
│ ❱  382 │   │   │   │   f(module, *args, **kwargs)                            │
│    383 │   │   │   │                                                         │
│    384 │   │   │   │   if is_child_module:                                   │
│    385 │   │   │   │   │   # child's __init__ is done, now we can run a sing │
│                                                                              │
│ /mnt/petrelfs/wangweiyun/projects/region_wise_model/custom_models/husky_wind │
│ ow.py:47 in __init__                                                         │
│                                                                              │
│    44 │   │   │   │   │   self.vision_model.encoder.layers[idx] = WindowBLIP │
│    45 │   │   │                                                              │
│    46 │   │   │   if self.config.lora:                                       │
│ ❱  47 │   │   │   │   self.wrap_lora()                                       │
│    48 │   │   │   if self.config.lora_vision:                                │
│    49 │   │   │   │   self.wrap_lora_vision()                                │
│    50 │   │   self.post_init()                                               │
│                                                                              │
│ /mnt/petrelfs/wangweiyun/projects/region_wise_model/custom_models/husky_src/ │
│ husky_chat.py:436 in wrap_lora                                               │
│                                                                              │
│   433 │   │   │   lora_dropout=lora_dropout,                                 │
│   434 │   │   │   target_modules=target_modules                              │
│   435 │   │   )                                                              │
│ ❱ 436 │   │   self.language_model = get_peft_model(self.language_model, peft │
│   437 │   │   self.config.lora = True                                        │
│   438 │   │   self.language_model.print_trainable_parameters()               │
│   439                                                                        │
│                                                                              │
│ /mnt/petrelfs/wangweiyun/miniconda3/envs/recognize_anything/lib/python3.9/si │
│ te-packages/peft/mapping.py:145 in get_peft_model                            │
│                                                                              │
│   142 │   │   peft_config = _prepare_lora_config(peft_config, model_config)  │
│   143 │   else:                                                              │
│   144 │   │   peft_config = _prepare_prompt_learning_config(peft_config, mod │
│ ❱ 145 │   return MODEL_TYPE_TO_PEFT_MODEL_MAPPING[peft_config.task_type](mod │
│   146                                                                        │
│                                                                              │
│ /mnt/petrelfs/wangweiyun/miniconda3/envs/recognize_anything/lib/python3.9/si │
│ te-packages/deepspeed/runtime/zero/partition_parameters.py:377 in wrapper    │
│                                                                              │
│    374 │   │   │   │   print_rank_0(f'Before initializing {module.__class__. │
│    375 │   │   │   │                                                         │
│    376 │   │   │   │   is_child_module = False                               │
│ ❱  377 │   │   │   │   if not hasattr(module, "_ds_child_entered"):          │
│    378 │   │   │   │   │   # child's __init__ was called, since parents all  │
│    379 │   │   │   │   │   is_child_module = True                            │
│    380 │   │   │   │   │   setattr(module, "_ds_child_entered", True)        │
│                                                                              │
│ /mnt/petrelfs/wangweiyun/miniconda3/envs/recognize_anything/lib/python3.9/si │
│ te-packages/peft/peft_model.py:289 in __getattr__                            │
│                                                                              │
│   286 │   │   try:                                                           │
│   287 │   │   │   return super().__getattr__(name)  # defer to nn.Module's l │
│   288 │   │   except AttributeError:                                         │
│ ❱ 289 │   │   │   return getattr(self.base_model, name)                      │
│   290 │                                                                      │
│   291 │   def forward(self, *args, **kwargs):                                │
│   292 │   │   """                                                            │
│                                                                              │
│ /mnt/petrelfs/wangweiyun/miniconda3/envs/recognize_anything/lib/python3.9/si │
│ te-packages/peft/peft_model.py:289 in __getattr__                            │
│                                                                              │
│   286 │   │   try:                                                           │
│   287 │   │   │   return super().__getattr__(name)  # defer to nn.Module's l │
│   288 │   │   except AttributeError:                                         │
│ ❱ 289 │   │   │   return getattr(self.base_model, name)                      │
│   290 │                                                                      │
│   291 │   def forward(self, *args, **kwargs):                                │
│   292 │   │   """                                                            │
│                                                                              │
│ /mnt/petrelfs/wangweiyun/miniconda3/envs/recognize_anything/lib/python3.9/si │
│ te-packages/peft/peft_model.py:289 in __getattr__                            │
│                                                                              │
│   286 │   │   try:                                                           │
│   287 │   │   │   return super().__getattr__(name)  # defer to nn.Module's l │
│   288 │   │   except AttributeError:                                         │
│ ❱ 289 │   │   │   return getattr(self.base_model, name)                      │
│   290 │                                                                      │
│   291 │   def forward(self, *args, **kwargs):                                │
│   292 │   │   """                                                            │
│                                                                              │
│ /mnt/petrelfs/wangweiyun/miniconda3/envs/recognize_anything/lib/python3.9/si │
│ te-packages/peft/peft_model.py:289 in __getattr__                            │
│                                                                              │
│   286 │   │   try:                                                           │
│   287 │   │   │   return super().__getattr__(name)  # defer to nn.Module's l │
│   288 │   │   except AttributeError:                                         │
│ ❱ 289 │   │   │   return getattr(self.base_model, name)                      │
│   290 │                                                                      │
│   291 │   def forward(self, *args, **kwargs):                                │
│   292 │   │   """                                                            │
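
The recursion shown in the traceback above can be reproduced without PEFT or DeepSpeed. The following is a minimal sketch, not PEFT's actual code: `Wrapper`, `GuardedWrapper`, and the `probe_before_init` flag are hypothetical names used only for illustration, and the `hasattr` call stands in for the check that DeepSpeed's `__init__` wrapper performs before the wrapped `__init__` has run. The guard in `GuardedWrapper` is one possible way to break the cycle, under the assumption that the forwarding attribute is called `base_model`.

```python
class Wrapper:
    """Hypothetical stand-in for a wrapper that forwards missing attributes."""

    def __init__(self, inner, probe_before_init=False):
        if probe_before_init:
            # Mimics an external hook probing the object before __init__
            # has had a chance to set base_model.
            hasattr(self, "_ds_child_entered")
        self.base_model = inner

    def __getattr__(self, name):
        # Only called when normal lookup fails. If base_model itself is
        # missing, reading self.base_model re-enters __getattr__ forever.
        return getattr(self.base_model, name)


class GuardedWrapper(Wrapper):
    def __getattr__(self, name):
        # Break the cycle: a missing base_model is a plain AttributeError.
        if name == "base_model":
            raise AttributeError(name)
        return getattr(self.base_model, name)


if __name__ == "__main__":
    try:
        Wrapper(object(), probe_before_init=True)
    except RecursionError:
        print("unguarded wrapper recursed")

    wrapped = GuardedWrapper("abc", probe_before_init=True)
    print(wrapped.upper())  # attribute lookup is forwarded to the inner str
```

Note that `hasattr` only swallows `AttributeError`, so the `RecursionError` from the unguarded version propagates to the caller instead of being reported as a missing attribute.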

Who can help?

@pacman100

Information

The official example scripts
My own modified scripts

Tasks

An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
My own task or dataset (give details below)

Reproduction

environments: pytorch==2.0.0, transformers==4.28.0, peft==0.2.0

slurm launch command: srun --gres=gpu:8 --ntasks=8 --ntasks-per-node=8 --cpus-per-task=8 python -u bug_unit_test.py --output_dir ./outputs/debug --deepspeed ./configs/default_offload_opt_param_zero3.json

deepspeed config to reproduce:

{
    "bf16": {
        "enabled": "auto"
    },
    "optimizer": {
        "type": "AdamW",
        "params": {
            "lr": "auto",
            "betas": "auto",
            "eps": "auto",
            "weight_decay": "auto"
        }
    },
    "zero_optimization": {
        "stage": 3,
        "offload_optimizer": {
            "device": "cpu",
            "pin_memory": true
        },
        "offload_param": {
            "device": "cpu",
            "pin_memory": true
        },
        "overlap_comm": true,
        "contiguous_gradients": true,
        "sub_group_size": 1e9,
        "reduce_bucket_size": "auto",
        "stage3_prefetch_bucket_size": "auto",
        "stage3_param_persistence_threshold": "auto",
        "stage3_max_live_parameters": 1e9,
        "stage3_max_reuse_distance": 1e9,
        "stage3_gather_16bit_weights_on_model_save": true
    },
    "gradient_accumulation_steps": "auto",
    "gradient_clipping": "auto",
    "train_batch_size": "auto",
    "train_micro_batch_size_per_gpu": "auto",
    "wall_clock_breakdown": false
}

code to reproduce:

import os
import subprocess
import torch
from transformers import (
    HfArgumentParser, TrainingArguments,
    PreTrainedModel, LlamaModel, LlamaConfig
)
from peft import LoraConfig, TaskType, get_peft_model


class BugModel(PreTrainedModel):
    config_class = LlamaConfig

    def __init__(self, config):
        super().__init__(config)
        self.model = LlamaModel(config)
        self.wrap_lora()
        # init code for other modules, which is not important to reproduce this bug
        pass

    def wrap_lora(
        self,
        r=16,
        lora_alpha=32,
        lora_dropout=0.05,
        target_modules=("q_proj", "k_proj", "v_proj", "o_proj"),
    ):
        peft_config = LoraConfig(
            task_type=TaskType.CAUSAL_LM,
            inference_mode=False,
            r=r,
            lora_alpha=lora_alpha,
            lora_dropout=lora_dropout,
            target_modules=target_modules
        )
        self.model = get_peft_model(self.model, peft_config)
        self.model.print_trainable_parameters()


def init_distributed_mode():
    if 'SLURM_PROCID' in os.environ:
        rank = int(os.environ['SLURM_PROCID'])
        local_rank = rank % torch.cuda.device_count()

        world_size = int(os.environ["SLURM_NTASKS"])
        local_size = int(os.environ["SLURM_NTASKS_PER_NODE"])

        if "MASTER_PORT" not in os.environ:
            port = 22110
            print(f'MASTER_PORT = {port}')
            os.environ["MASTER_PORT"] = str(port)

        node_list = os.environ["SLURM_NODELIST"]
        addr = subprocess.getoutput(f"scontrol show hostname {node_list} | head -n1")
        if "MASTER_ADDR" not in os.environ:
            os.environ["MASTER_ADDR"] = addr

        os.environ['RANK'] = str(rank)
        os.environ['LOCAL_RANK'] = str(local_rank)
        os.environ['LOCAL_WORLD_SIZE'] = str(local_size)
        os.environ['WORLD_SIZE'] = str(world_size)


parser = HfArgumentParser(TrainingArguments)
init_distributed_mode()
(training_args,) = parser.parse_args_into_dataclasses()  # returns a tuple of dataclass instances


model_name_or_path = '/mnt/petrelfs/share_data/wangweiyun/share_hf/vicuna-7b'
model = BugModel.from_pretrained(model_name_or_path)  # Error!


print('finish')

Expected behavior

I expect to be able to wrap the model with LoRA during __init__ when ZeRO3 is enabled.

pacman100 · 2023-06-23T10:28:18Z

Hello, please refer to this doc for the correct way of using PEFT + DeepSpeed: https://huggingface.co/docs/peft/accelerate/deepspeed-zero3-offload

Weiyun1025 · 2023-06-23T10:40:05Z

Thank you for your response!

I note that this doc is based on accelerate, whereas my code is based on transformers.Trainer. Could you provide an example of using PEFT + DeepSpeed correctly with transformers.Trainer?

1ytic · 2023-07-19T23:19:15Z

The following steps work for me:

1. Create TrainingArguments(..., deepspeed="ds_config_zero3.json").
2. Load the model with from_pretrained().
3. Wrap it with get_peft_model().
4. Run Trainer.train().

A few important notes:

- You have to create TrainingArguments before initialising the model; otherwise the model is not loaded with ZeRO3 partitioning.
- If you use the TaskType.SEQ_CLS task, get_peft_model will break the forward path. A quick workaround is to recreate an unpartitioned classification head after the model has been initialised under deepspeed.zero.Init(), i.e. after from_pretrained().
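
The steps above can be sketched as a single helper. This is only an illustration under stated assumptions: `build_lora_trainer` and its parameter names are hypothetical, the LoRA hyperparameters are copied from the reproduction script in this issue, and `AutoModelForCausalLM` is assumed to be the right loader for the checkpoint. The calls themselves (`TrainingArguments`, `from_pretrained`, `LoraConfig`, `get_peft_model`, `Trainer`) are the ones named in the steps.

```python
def build_lora_trainer(model_name, ds_config_path, train_dataset, output_dir="./outputs"):
    """Build a Trainer for LoRA fine-tuning under ZeRO3, following the order above."""
    # Imported lazily so this module can be imported without the training stack.
    from peft import LoraConfig, TaskType, get_peft_model
    from transformers import AutoModelForCausalLM, Trainer, TrainingArguments

    # 1. Create TrainingArguments first, so the DeepSpeed config is registered
    #    before the model is loaded.
    training_args = TrainingArguments(output_dir=output_dir, deepspeed=ds_config_path)

    # 2. Load the base model.
    model = AutoModelForCausalLM.from_pretrained(model_name)

    # 3. Wrap with LoRA only after loading, not inside the model's __init__.
    peft_config = LoraConfig(
        task_type=TaskType.CAUSAL_LM,
        r=16,
        lora_alpha=32,
        lora_dropout=0.05,
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    )
    model = get_peft_model(model, peft_config)

    # 4. Hand everything to Trainer; the caller runs trainer.train().
    return Trainer(model=model, args=training_args, train_dataset=train_dataset)
```

A caller would then run something like `build_lora_trainer(name, "ds_config_zero3.json", dataset).train()`, where `name` and `dataset` are placeholders for your own checkpoint and data.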

don-tpanic · 2023-08-09T16:55:42Z

Thanks! I imagine you launch with deepspeed? Do you still have to specify ds_config_zero3.json in the CLI command now that it is provided in TrainingArguments?

1ytic · 2023-08-09T20:16:28Z

Yes, I launch it with deepspeed and I do not specify the config in the command, only in the TrainingArguments.

liuyu666-thu · 2023-08-24T06:49:32Z

@1ytic Very useful explanation! Could you offer an example of how to implement this quick workaround? Thanks.
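
One way to read the workaround described by @1ytic is sketched below. It is an assumption-laden illustration, not a confirmed fix: `recreate_classification_head` is a hypothetical helper, `head_name="score"` assumes the head attribute name used by models such as LlamaForSequenceClassification, and the head is assumed to be a single `nn.Linear`. The replacement head is randomly initialised, so any weights the old head carried are discarded, and you may still need to match the model's dtype and device.

```python
from torch import nn


def recreate_classification_head(model: nn.Module, head_name: str = "score") -> nn.Linear:
    """Replace model.<head_name> with a fresh, unpartitioned nn.Linear.

    Intended to be called after from_pretrained() has returned, i.e. outside
    the deepspeed.zero.Init() context, so the new layer is a regular module.
    """
    old_head = getattr(model, head_name)
    # in_features / out_features are plain ints stored on the module, so they
    # can be read even if the weight tensor itself has been partitioned.
    new_head = nn.Linear(
        old_head.in_features,
        old_head.out_features,
        bias=old_head.bias is not None,
    )
    setattr(model, head_name, new_head)
    return new_head
```

Presumably this would be called between from_pretrained() and get_peft_model(), so that PEFT wraps the recreated head rather than the partitioned one.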

dbanka · 2023-09-14T14:39:12Z

@1ytic I am getting the error below while running LoRA with DeepSpeed ZeRO3. Something seems to have broken.

Can you please explain this more clearly:
"If you use TaskType.SEQ_CLS task, get_peft_model will break the forward path. A quick workaround is recreate unpartitioned classification head after the model initialised with deepspeed.zero.Init(), i.e. after from_pretrained()."

Traceback (most recent call last):
  File "/home/ec2-user/SageMaker/final_training/lora_scripts/run_clm.py", line 263, in <module>
    main()
  File "/home/ec2-user/SageMaker/final_training/lora_scripts/run_clm.py", line 259, in main
    training_function(args)
  File "/home/ec2-user/SageMaker/final_training/lora_scripts/run_clm.py", line 220, in training_function
    trainer.train()
  File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/transformers/trainer.py", line 1539, in train
    return inner_training_loop(
  File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/transformers/trainer.py", line 1809, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/transformers/trainer.py", line 2665, in training_step
    self.accelerator.backward(loss)
  File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/accelerate/accelerator.py", line 1847, in backward
    self.deepspeed_engine_wrapped.backward(loss, **kwargs)
  File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/accelerate/utils/deepspeed.py", line 167, in backward
    self.engine.backward(loss, **kwargs)
  File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
    ret_val = func(*args, **kwargs)
  File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1923, in backward
    self.optimizer.backward(loss, retain_graph=retain_graph)
  File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
    ret_val = func(*args, **kwargs)
  File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/deepspeed/runtime/zero/stage3.py", line 2080, in backward
    self.loss_scaler.backward(loss.float(), retain_graph=retain_graph)
  File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/deepspeed/runtime/fp16/loss_scaler.py", line 63, in backward
    scaled_loss.backward(retain_graph=retain_graph)
  File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/torch/_tensor.py", line 487, in backward
    torch.autograd.backward(
  File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/torch/autograd/__init__.py", line 200, in backward
    Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
  File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/torch/autograd/function.py", line 274, in apply
    return user_fn(self, *args)
  File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 141, in backward
    outputs = ctx.run_function(*detached_inputs)
  File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 681, in custom_forward
    return module(*inputs, output_attentions, None)
  File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1538, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 408, in forward
scaled_loss.backward(retain_graph=retain_graph)
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/torch/_tensor.py", line 487, in backward
return module(*inputs, output_attentions, None)self.optimizer.backward(loss, retain_graph=retain_graph)hidden_states, self_attn_weights, present_key_value = self.self_attn(

File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1538, in _call_impl
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1538, in _call_impl
torch.autograd.backward(
ret_val = func(*args, **kwargs) File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/torch/autograd/init.py", line 200, in backward

File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/deepspeed/runtime/zero/stage3.py", line 2080, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/torch/autograd/function.py", line 274,in apply
return user_fn(self, *args)
result = forward_call(*args, **kwargs) File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 141, in backward

result = forward_call(*args, **kwargs) File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 305, in forward

File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 408, in forward
outputs = ctx.run_function(*detached_inputs)
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 681, in custom_forward
query_states = self.q_proj(hidden_states)
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
self.loss_scaler.backward(loss.float(), retain_graph=retain_graph)hidden_states, self_attn_weights, present_key_value = self.self_attn(

File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/deepspeed/runtime/fp16/loss_scaler.py", line 63, in backward
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1538, in _call_impl
return module(*inputs, output_attentions, None)
scaled_loss.backward(retain_graph=retain_graph) File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1538, in _call_impl

File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/torch/_tensor.py", line 487, in backward
torch.autograd.backward(
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/torch/autograd/init.py", line 200,in backward
result = hook(self, args)
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
result = forward_call(*args, **kwargs)Variable._execution_engine.run_backward( # Calls into the C++ engine torun the backward passret_val = func(*args, **kwargs)

File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 305, in forward
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/torch/autograd/function.py", line 274,in apply
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/deepspeed/runtime/zero/parameter_offload.py", line 392, in _pre_forward_module_hook
result = forward_call(*args, **kwargs)
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 408, in forward
query_states = self.q_proj(hidden_states)return user_fn(self, *args)

  File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 141, in backward

File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
self.pre_sub_module_forward_function(module)
hidden_states, self_attn_weights, present_key_value = self.self_attn( File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/deepspeed/runtime/zero/parameter_offload.py", line 505, in pre_sub_module_forward_function

File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1538, in _call_impl
outputs = ctx.run_function(*detached_inputs)
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 681, in custom_forward
param_coordinator.fetch_sub_module(sub_module, forward=prev_grad_state)
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
ret_val = func(*args, **kwargs)
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115,in decorate_context
return module(*inputs, output_attentions, None)result = hook(self, args)
return func(*args, **kwargs)
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1538, in _call_impl

File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/deepspeed/runtime/zero/partitioned_param_coordinator.py", line 284, in fetch_sub_module
result = forward_call(*args, **kwargs)
ret_val = func(*args, **kwargs) File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 305, in forward

File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/deepspeed/runtime/zero/parameter_offload.py", line 392, in _pre_forward_module_hook
self.__all_gather_params(params_to_fetch, forward)
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
query_states = self.q_proj(hidden_states)ret_val = func(*args, **kwargs)

  File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl

self.pre_sub_module_forward_function(module) File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/deepspeed/runtime/zero/partitioned_param_coordinator.py", line 428, in __all_gather_params

File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/deepspeed/runtime/zero/parameter_offload.py", line 505, in pre_sub_module_forward_function
result = forward_call(*args, **kwargs)
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 408, in forward
self._all_gather_params(nonquantized_params, forward, quantize=self.zero_quantized_weights)
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/deepspeed/runtime/zero/partitioned_param_coordinator.py", line 446, in _all_gather_params
param_coordinator.fetch_sub_module(sub_module, forward=prev_grad_state)
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
hidden_states, self_attn_weights, present_key_value = self.self_attn(
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1538, in _call_impl
ret_val = func(*args, **kwargs)
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115,in decorate_context
handle = partitioned_params[0].all_gather_coalesced(partitioned_params,result = hook(self, args)

  File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 15,in wrapped_fn

File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
return func(*args, **kwargs)
ret_val = func(*args, **kwargs) File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/deepspeed/runtime/zero/partitioned_param_coordinator.py", line 284, in fetch_sub_module

ret_val = func(*args, **kwargs) File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/deepspeed/runtime/zero/partition_parameters.py", line 1155, in all_gather_coalesced

File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/deepspeed/runtime/zero/parameter_offload.py", line 392, in _pre_forward_module_hook
result = forward_call(*args, **kwargs)
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 305, in forward
self.__all_gather_params(params_to_fetch, forward)
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
ret_val = func(*args, **kwargs)
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/deepspeed/runtime/zero/partitioned_param_coordinator.py", line 428, in __all_gather_params
self.pre_sub_module_forward_function(module)
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/deepspeed/runtime/zero/parameter_offload.py", line 505, in pre_sub_module_forward_function
query_states = self.q_proj(hidden_states)
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
self._all_gather_params(nonquantized_params, forward, quantize=self.zero_quantized_weights)
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/deepspeed/runtime/zero/partitioned_param_coordinator.py", line 446, in _all_gather_params
dtype=get_only_unique_item(p.ds_tensor.dtypeparam_coordinator.fetch_sub_module(sub_module, forward=prev_grad_state)

File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/deepspeed/runtime/utils.py", line 842,in get_only_unique_item
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
ret_val = func(*args, **kwargs)
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115,in decorate_context
handle = partitioned_params[0].all_gather_coalesced(partitioned_params,
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
return func(*args, **kwargs)
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/deepspeed/runtime/zero/partitioned_param_coordinator.py", line 284, in fetch_sub_module
ret_val = func(*args, **kwargs)
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/deepspeed/runtime/zero/partition_parameters.py", line 1155, in all_gather_coalesced
result = hook(self, args)
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
raise RuntimeError(f"expected there to be only one unique element in {items}")
RuntimeError: expected there to be only one unique element in <generator object Init._convert_to_deepspeed_param..all_gather_coalesced.. at 0x7f8eb7f2c510>
ret_val = func(*args, **kwargs)self.__all_gather_params(params_to_fetch, forward)

File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/deepspeed/runtime/zero/parameter_offload.py", line 392, in _pre_forward_module_hook
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
ret_val = func(*args, **kwargs)
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/deepspeed/runtime/zero/partitioned_param_coordinator.py", line 428, in __all_gather_params
self.pre_sub_module_forward_function(module)
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/deepspeed/runtime/zero/parameter_offload.py", line 505, in pre_sub_module_forward_function
dtype=get_only_unique_item(p.ds_tensor.dtypeself._all_gather_params(nonquantized_params, forward, quantize=self.zero_quantized_weights)

File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/deepspeed/runtime/utils.py", line 842,in get_only_unique_item
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/deepspeed/runtime/zero/partitioned_param_coordinator.py", line 446, in _all_gather_params
param_coordinator.fetch_sub_module(sub_module, forward=prev_grad_state)
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
handle = partitioned_params[0].all_gather_coalesced(partitioned_params,ret_val = func(*args, **kwargs)

File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115,in decorate_context
raise RuntimeError(f"expected there to be only one unique element in {items}")
ret_val = func(*args, **kwargs)
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/deepspeed/runtime/zero/partition_parameters.py", line 1155, in all_gather_coalesced
RuntimeError : return func(*args, **kwargs)expected there to be only one unique element in <generator object Init._convert_to_deepspeed_param..all_gather_coalesced.. at 0x7f3bd8933ca0>

File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/deepspeed/runtime/zero/partitioned_param_coordinator.py", line 284, in fetch_sub_module
self.__all_gather_params(params_to_fetch, forward)
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
ret_val = func(*args, **kwargs)
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/deepspeed/runtime/zero/partitioned_param_coordinator.py", line 428, in __all_gather_params
dtype=get_only_unique_item(p.ds_tensor.dtype
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/deepspeed/runtime/utils.py", line 842,in get_only_unique_item
self._all_gather_params(nonquantized_params, forward, quantize=self.zero_quantized_weights)
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/deepspeed/runtime/zero/partitioned_param_coordinator.py", line 446, in _all_gather_params
handle = partitioned_params[0].all_gather_coalesced(partitioned_params,
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
raise RuntimeError(f"expected there to be only one unique element in {items}")
ret_val = func(*args, **kwargs)RuntimeError
: expected there to be only one unique element in <generator object Init._convert_to_deepspeed_param..all_gather_coalesced.. at 0x7fd9e6c0b840> File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/deepspeed/runtime/zero/partition_parameters.py", line 1155, in all_gather_coalesced

dtype=get_only_unique_item(p.ds_tensor.dtype

File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/deepspeed/runtime/utils.py", line 842,in get_only_unique_item
raise RuntimeError(f"expected there to be only one unique element in {items}")
RuntimeError: expected there to be only one unique element in <generator object Init._convert_to_deepspeed_param..all_gather_coalesced.. at 0x7fad61111000>
0%|
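For anyone puzzling over the last frame: `get_only_unique_item` is applied to `p.ds_tensor.dtype` for the parameters being gathered together, so the error suggests that group held more than one dtype. One possible trigger would be half-precision base weights sitting next to float32 adapter weights, but that is a guess and nothing in this thread confirms it. Below is a minimal sketch of such a uniqueness check. It is a paraphrase written from the traceback, not DeepSpeed's source, and `dtype_histogram` plus the dtype strings are hypothetical stand-ins for illustration.

```python
from collections import Counter


def get_only_unique_item(items):
    """Return the single distinct value in `items`; raise if there is not exactly one.

    Paraphrased from the error message in the traceback (not DeepSpeed's code).
    """
    unique = set(items)
    if len(unique) != 1:
        raise RuntimeError(f"expected there to be only one unique element in {items}")
    (item,) = unique
    return item


def dtype_histogram(named_dtypes):
    """Count parameters per dtype so the odd ones out are easy to spot.

    `named_dtypes` is an iterable of (parameter_name, dtype) pairs.
    Plain strings stand in for torch dtypes here (an assumption for illustration).
    """
    return Counter(dtype for _, dtype in named_dtypes)
```

When the argument is a generator, as in the traceback line, the f-string prints the generator's repr rather than the offending dtypes, which matches the `<generator object ...>` text in the log. With a real model, `dtype_histogram` could be fed `(name, p.dtype) for name, p in model.named_parameters()` to see which dtypes are present.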

don-tpanic · 2023-10-10T10:32:38Z

The following steps work for me:

Create TrainingArguments(..., deepspeed="ds_config_zero3.json")

Load model with from_pretrained()

Wrap it with get_peft_model()

Run Trainer.train()

A few important notes:

You have to create TrainingArguments before initialising the model with Zero3 partitioning.

If you use the TaskType.SEQ_CLS task, get_peft_model will break the forward path. A quick workaround is to recreate an unpartitioned classification head after the model is initialised with deepspeed.zero.Init(), i.e. after from_pretrained().

Could you explain a bit more about how get_peft_model breaks the forward path under SEQ_CLS? Thank you!
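The step order quoted above can be sketched roughly as follows. This is a hedged outline rather than a verified recipe: `zero3_config`, `write_config`, and `build_trainer` are hypothetical helper names, the config is the smallest ZeRO-3 config I would expect the Trainer integration to accept, and the LoRA settings are left at PEFT defaults. The one point taken from the thread is that TrainingArguments is created before from_pretrained().

```python
import json
from pathlib import Path


def zero3_config() -> dict:
    """Minimal ZeRO-3 config; the "auto" values are filled in by the HF Trainer."""
    return {
        "zero_optimization": {"stage": 3},
        "train_micro_batch_size_per_gpu": "auto",
        "gradient_accumulation_steps": "auto",
    }


def write_config(path: str = "ds_config_zero3.json") -> str:
    """Write the config to disk and return the path for TrainingArguments."""
    Path(path).write_text(json.dumps(zero3_config(), indent=2))
    return path


def build_trainer(model_name: str, train_dataset, output_dir: str = "out"):
    """Follow the quoted order: arguments, model, PEFT wrapper, Trainer."""
    # Heavy imports stay local so the helpers above work without these packages.
    from peft import LoraConfig, TaskType, get_peft_model
    from transformers import AutoModelForCausalLM, Trainer, TrainingArguments

    # 1. Create TrainingArguments first so ZeRO-3 is known before the model loads.
    args = TrainingArguments(output_dir=output_dir, deepspeed=write_config())
    # 2. Load the base model.
    model = AutoModelForCausalLM.from_pretrained(model_name)
    # 3. Wrap it with LoRA adapters (default LoRA hyperparameters assumed).
    model = get_peft_model(model, LoraConfig(task_type=TaskType.CAUSAL_LM))
    # 4. The caller runs .train() on the returned Trainer.
    return Trainer(model=model, args=args, train_dataset=train_dataset)
```

The script would still need to be started with a distributed launcher, and `build_trainer(...).train()` corresponds to the last quoted step.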

github-actions · 2023-11-04T08:06:01Z

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

olegsinavski · 2024-01-05T21:50:12Z

Hello! I'm facing the same issue with deepspeed==0.12.4, stage3, no cpu offloading, transformers==4.36.2 and peft==0.7.1:

AttributeError: 'PeftModelForCausalLM' object has no attribute '_ds_child_entered'
....
....
  File ".../site-packages/peft/peft_model.py", line 528, in __getattr__
    return super().__getattr__(name)  # defer to nn.Module's logic
  File ".../torch/nn/modules/module.py", line 1695, in __getattr__
    raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
AttributeError: 'PeftModelForCausalLM' object has no attribute 'base_model'

and eventually

RecursionError: maximum recursion depth exceeded while calling a Python object

I'm using PyTorch Lightning:

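# Assumed imports; `pytorch_lightning` provides the same Lightning names.
from lightning.pytorch import LightningModule, Trainer
from lightning.pytorch.strategies import DeepSpeedStrategy
from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForCausalLM
from transformers.integrations import HfDeepSpeedConfig

trainer = Trainer(
    ...
    strategy=DeepSpeedStrategy(stage=3)
)

class Module(LightningModule):
    def configure_model(self) -> None:
        deepspeed_config = self.trainer.strategy.config
        # Keep a reference to the config object while the model is loaded.
        self.dschf = HfDeepSpeedConfig(deepspeed_config)
        model = AutoModelForCausalLM.from_pretrained(...)
        # Wrap the loaded model with LoRA adapters.
        model = get_peft_model(
            model,
            LoraConfig(
                task_type=TaskType.CAUSAL_LM,
                inference_mode=False,
                target_modules=target_modules,
                r=48,
                lora_alpha=16,
                lora_dropout=0.0,
            ),
        )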
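The AttributeError followed by RecursionError reported above is consistent with the `__getattr__` fallback shown in the PEFT traceback at the top of this issue: if `base_model` itself cannot be found, the fallback `getattr(self.base_model, name)` looks up `base_model` through the same `__getattr__` and never terminates. Here is a minimal pure-Python sketch of that mechanism. `ModuleLike`, `WrapperLike`, and `probe` are hypothetical stand-ins, not torch or PEFT classes, and why `base_model` is missing in the real run (for example a lookup before it is assigned) is an assumption the thread does not confirm.

```python
class ModuleLike:
    """Stand-in for nn.Module: __getattr__ only consults a `_modules` dict."""

    def __init__(self):
        self.__dict__["_modules"] = {}

    def __getattr__(self, name):
        modules = self.__dict__.get("_modules", {})
        if name in modules:
            return modules[name]
        raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")


class WrapperLike(ModuleLike):
    """Stand-in for the PEFT wrapper: forwards missing attributes to base_model."""

    def __getattr__(self, name):
        try:
            return super().__getattr__(name)
        except AttributeError:
            # If `base_model` is not registered, this lookup re-enters __getattr__.
            return getattr(self.base_model, name)


def probe(obj, name):
    """Report which outcome an attribute lookup produces."""
    try:
        getattr(obj, name)
    except RecursionError:
        return "RecursionError"
    except AttributeError:
        return "AttributeError"
    return "ok"
```

In this toy version, probing a missing attribute on a `WrapperLike` with no registered `base_model` recurses until the interpreter limit, while registering one via `w._modules["base_model"] = ModuleLike()` turns the same lookup into an ordinary AttributeError.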

ANYMS-A · 2024-06-21T08:53:55Z

The following steps work for me:

Create TrainingArguments(..., deepspeed="ds_config_zero3.json")

Load model with from_pretrained()

Wrap it with get_peft_model()

Run Trainer.train()

A few important notes:

You have to create TrainingArguments before initialising the model with Zero3 partitioning.

If you use the TaskType.SEQ_CLS task, get_peft_model will break the forward path. A quick workaround is to recreate an unpartitioned classification head after the model is initialised with deepspeed.zero.Init(), i.e. after from_pretrained().

@1ytic Very useful explanation! Could you offer an example of how to implement this quick workaround? Thanks.

Have you found a decent solution for this? I ran into the same situation using transformers.Trainer + LoRA + DeepSpeed to fine-tune a CausalLM. Since the model is partitioned before get_peft_model(), calling get_peft_model() raises an error. I don't quite understand how to re-create an unpartitioned model, get the PEFT version of it, and finally partition the PEFT model again.

ChrisChros123 mentioned this issue Sep 19, 2023: Error with Multi-GPU peft Reward Training huggingface/trl#480 (Closed)

github-actions bot closed this as completed Nov 12, 2023

hrushikesh198 mentioned this issue Jan 30, 2024: Lora + DeepSpeed non-trainer integration does not work #28770 (Closed)

pacman100 mentioned this issue Feb 9, 2024: Support modules_to_save config option when using DeepSpeed ZeRO-3 with ZeRO init enabled. huggingface/peft#1450 (Merged)
