
You can't train a model that has been loaded in 8-bit precision on a different device than the one you're training on. #722

Closed
bosmart opened this issue Aug 31, 2023 · 8 comments


bosmart commented Aug 31, 2023

Getting the above error (You can't train a model that has been loaded in 8-bit precision on a different device than the one you're training on.) when trying to run the Llama2 SFT example:

accelerate launch sft_llama2.py --output_dir="sft"

My accelerate config file:

compute_environment: LOCAL_MACHINE
debug: false
distributed_type: MULTI_GPU
downcast_bf16: 'no'
gpu_ids: all
machine_rank: 0
main_training_function: main
mixed_precision: 'no'
num_machines: 1
num_processes: 2
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false

Library versions:

accelerate: 0.23.0.dev0
peft: 0.6.0.dev0
transformers: 4.33.0.dev0
trl: 0.7.2.dev0

I have a dual 3090 machine.

younesbelkada (Contributor) commented:

Hi @bosmart
Thanks a lot for the issue!
Can you please have a look at my comment here: huggingface/accelerate#1840 (comment) to understand how to fix the issue and let me know how it goes?
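
For context, a minimal sketch of the fix described in that linked comment: load the quantized model onto each process's own GPU instead of hard-coding device 0. The checkpoint name and 8-bit settings below are illustrative, not the exact arguments of the example script.

from accelerate import Accelerator
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Each process loads the model onto its own GPU rather than all onto cuda:0.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",  # illustrative checkpoint
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map={"": Accelerator().local_process_index},
)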


bosmart commented Sep 1, 2023

Thanks @younesbelkada, makes a lot of sense now - device_map={"": 0} did make me a bit uneasy 😃

I am getting a new error now however:

File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 1555, in train
    return inner_training_loop(
  File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 1837, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 2693, in training_step
    self.accelerator.backward(loss)
  File "/usr/local/lib/python3.10/dist-packages/accelerate/accelerator.py", line 1923, in backward
    loss.backward(**kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/_tensor.py", line 487, in backward
    torch.autograd.backward(
  File "/usr/local/lib/python3.10/dist-packages/torch/autograd/__init__.py", line 200, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
  File "/usr/local/lib/python3.10/dist-packages/torch/autograd/function.py", line 274, in apply
    return user_fn(self, *args)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/checkpoint.py", line 157, in backward
    torch.autograd.backward(outputs_with_grad, args_with_grad)
  File "/usr/local/lib/python3.10/dist-packages/torch/autograd/__init__.py", line 200, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: Expected to mark a variable ready only once. This error is caused by one of the following reasons: 1) Use of a module parameter outside the `forward` function. Please make sure model parameters are not shared across multiple concurrent forward-backward passes. or try to use _set_static_graph() as a workaround if this module graph does not change during training loop.2) Reused parameters in multiple reentrant backward passes. For example, if you use multiple `checkpoint` functions to wrap the same part of your model, it would result in the same set of parameters been used by different reentrant backward passes multiple times, and hence marking a variable ready multiple times. DDP does not support such use cases in default. You can try to use _set_static_graph() as a workaround if your module graph does not change over iterations.
Parameter at index 127 has been marked as ready twice. This means that multiple autograd engine  hooks have fired for this particular parameter during this iteration. You can set the environment variable TORCH_DISTRIBUTED_DEBUG to either INFO or DETAIL to print parameter names for further debugging.


bosmart commented Sep 2, 2023

@lewtun I'm not even using RewardTrainer, getting the error with SFTTrainer. Disabling checkpointing helps to an extent - now getting CUDA out of memory instead 🤦‍♂️

Is disabling checkpointing just a workaround or is there a reason why peft+ddp+4bit can't work with checkpointing enabled?


github-actions bot commented Nov 1, 2023

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

younesbelkada (Contributor) commented:

Hi @bosmart
This should now be fixed on TRL + PEFT + transformers main; please refer to my comment here: #891 (comment).
The trick is to pass use_reentrant=False when enabling gradient checkpointing.
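
As an illustration, a sketch of how that flag can be passed, assuming a transformers version recent enough to accept gradient_checkpointing_kwargs in TrainingArguments:

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="sft",
    gradient_checkpointing=True,
    # Non-reentrant checkpointing avoids the DDP "marked a variable ready
    # only once" error when combining PEFT + DDP + gradient checkpointing.
    gradient_checkpointing_kwargs={"use_reentrant": False},
)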


bbouldin commented Nov 2, 2023

For anyone getting the "You can't train" error in dpo_llama2.py, you can fix it by adding the following to the configs for the model and model-ref:
device_map={"": Accelerator().local_process_index},
and also adding the import:
from accelerate import Accelerator
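
In context, this presumably means passing the per-process device map to both from_pretrained calls in the script; a rough sketch (the model name and other arguments are illustrative, not the script's exact configuration):

from accelerate import Accelerator
from transformers import AutoModelForCausalLM

device_map = {"": Accelerator().local_process_index}

# The policy model and the reference model each get the per-process device map.
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf", device_map=device_map)
model_ref = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf", device_map=device_map)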

github-actions bot commented:

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

joann-alvarez commented:

@bbouldin

For anyone getting the "You can't train" error in dpo_llama2.py, you can fix it by adding the following to the configs for the model and model-ref: device_map={"": Accelerator().local_process_index}, and also adding the import: from accelerate import Accelerator

Sorry, what do you mean by configs for the model and model-ref?

I know I can include device_map as an argument to AutoModelForCausalLM.from_pretrained(), but I'm not sure where else it needs to be specified. Thanks!
