AttributeError: 'FP16_DeepSpeedZeroOptimizer' object has no attribute 'ipg_index' #1218
Hi @TianhaoFu, can you share …? Also, if this issue is quick to reproduce, can you also try with …?
config:
get error:
Env:
Hi @chrjxj, can you try setting …?
Actually @chrjxj, can you set these both to false in your config? I suspect this will fix your issues:
"contiguous_gradients": false,
"overlap_comm": false
@jeffra thanks. It still doesn't work and throws a new error message.
@chrjxj, can you provide the stack trace for the new error message?
Hi @chrjxj, did you find a solution?
@antoiloui, are you also seeing this error? Can you share the deepspeed version you are using and the stack trace? Did you also try turning off …?
Hi @jeffra, yes I'm experiencing the same issue. Here is the error I get:
And here is my config file:
Gotcha, I see. Thank you @antoiloui. What version of deepspeed are you running? Is it possible to provide a repro for this error that you're seeing?
no... switched to other tasks... |
Isn't this problem solved? I'm currently facing a similar error. I'm using FusedAdam as an optimizer so I'm not using the FP16 option, but it's similar. Here is the error I get:
This is my deepspeed_config file:
"stage": 2 > "stage":1 |
Well, let me join this thread too. I have the same issue as described above. The code I run can be found here: … The configuration I use:
Traceback:
Try changing 'stage' from 2 to 1 in the configuration. Do you still have the same problem? I understand that stage 2 improves training efficiency by partitioning optimizer states and gradients when training a large model, but in my case this change solved the problem. The official documentation describes the stages as follows:
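Concretely, the workaround suggested here is a one-field change in the DeepSpeed config (a minimal fragment; any other keys in your `zero_optimization` block stay as they are):

```json
{
  "zero_optimization": {
    "stage": 1
  }
}
```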
I have solved my problem by choosing the right combination of Python version and package versions.
You can see (in my traceback) that I was running deepspeed through the pytorch-lightning interface. I was also experimenting with some configurations, trying Lightning's predefined ones like "deepspeed_strategy_2" and "deepspeed_strategy_3", and I got the same error every time, so I guess I just had a version-compatibility problem.
This method can't solve my problem. I am also studying RWKV. Can you help me?
@maomao279 Have you tried v4neo? Also, are you sure you used the same versions during the run, and which CUDA version do you use? (Not sure the last one is important, just want to know.)
I got the same issue, but fixed it by removing a redundant backward call.
And that code was from ChatGPT, so it is excusable.
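For reference, the redundant-backward pattern described above typically looks like this (a pseudocode sketch, not from the thread; `model_engine` stands for the engine returned by `deepspeed.initialize()`, and `data_loader` is a hypothetical placeholder):

```python
for batch in data_loader:
    loss = model_engine(batch)

    # Bug: an extra manual backward pass before DeepSpeed's own.
    # loss.backward()            # redundant -- remove this line

    model_engine.backward(loss)  # DeepSpeed handles loss scaling and
                                 # gradient bookkeeping internally
    model_engine.step()
```

Calling `loss.backward()` yourself alongside `model_engine.backward(loss)` interferes with DeepSpeed's internal gradient bookkeeping, which can surface as errors like the `ipg_index` AttributeError in this thread.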
Has anybody found a solution other than using different package versions or changing to stage 1? Unfortunately, I need stage 2 to work and cannot downgrade the package versions due to dependencies.
Any idea? I got a similar bug: AttributeError: 'DeepSpeedZeroOptimizer' object has no attribute 'ipg_index'
I solved it by using …
Thanks for sharing this update. Can you confirm that you were seeing the same error as in the original post? Also, was your code following this guide for model porting: https://www.deepspeed.ai/getting-started/#writing-deepspeed-models
@jeffra Getting similar error:
However, the same code was working a few weeks ago but throws an error now. I have checked previous versions of deepspeed as well but keep getting this error. The first time I got this error was when I tried to manually pass …
@sneha4948 any success? |
@sneha4948 and @ryuzakace thanks for reporting this problem. |
Hey, yes, it was probably due to a version incompatibility. On 25th September, newer versions of transformers were released. Pinning transformers==4.44.2 resolved the deepspeed issue.
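For anyone hitting the same mismatch, the pin described above can be expressed as a requirements fragment (the version number comes from the comment above; the right pin may differ for your setup):

```text
# requirements.txt fragment -- pin transformers to a version known to
# work with deepspeed at the time of this thread
transformers==4.44.2
```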
@sneha4948, thanks for the response. I am closing this issue now. Please open a new ticket if needed. |
This solved my problem also |
Hi,
I want to use DeepSpeed to speed up my transformer, and I came across the following problem:
My config.json is as follows: