Describe the bug
CodeLlama with DeepSpeed shows incorrect results. During my investigation, I found that DeepSpeed has hardcoded `rope_theta == 10000.0` in its rotary embedding kernel, while CodeLlama uses `rope_theta == 1000000.0`.
Line with bug:
DeepSpeed/csrc/transformer/inference/csrc/apply_rotary_pos_emb.cu
Line 64 in 0636c74
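For reference, `rope_theta` enters the rotary embedding through the inverse-frequency table. A minimal NumPy sketch of that math (the function name and head size here are my choices, not DeepSpeed's):

```python
import numpy as np

def rotary_inv_freq(head_dim: int, rope_theta: float) -> np.ndarray:
    # RoPE inverse frequencies: rope_theta^(-2i/head_dim) for i = 0 .. head_dim/2 - 1.
    return 1.0 / rope_theta ** (np.arange(0, head_dim, 2) / head_dim)

# Hardcoding theta = 10000.0 changes every frequency (except the first)
# relative to what CodeLlama was trained with at theta = 1000000.0:
print(rotary_inv_freq(128, 10000.0)[:4])
print(rotary_inv_freq(128, 1000000.0)[:4])
```

With the wrong theta, the position encoding applied by the kernel no longer matches the one the model was trained with, which is why the outputs diverge.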
`rope_theta` in the CodeLlama config is 1000000.0. I think `rope_theta` must be a parameter of the rotary embedding rather than a hardcoded constant.

To Reproduce
Steps to reproduce the behavior:
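(The original reproduction script was not captured in this extract. Below is a minimal sketch of the kind of comparison involved; the checkpoint name, prompt, and generation settings are my assumptions.)

```python
# Hypothetical sketch: compare plain transformers generation against
# DeepSpeed's kernel-injected path, which runs apply_rotary_pos_emb.cu.
import deepspeed
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "codellama/CodeLlama-7b-hf"  # assumption: any CodeLlama checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)

inputs = tokenizer("def fibonacci(n):", return_tensors="pt").to("cuda")

# Baseline: plain transformers generation.
baseline = model.to("cuda").generate(**inputs, max_new_tokens=64)
print("transformers:", tokenizer.decode(baseline[0]))

# DeepSpeed with kernel injection: this path uses the fused CUDA kernels,
# including the rotary embedding with the hardcoded theta.
engine = deepspeed.init_inference(model, dtype=torch.float16,
                                  replace_with_kernel_inject=True)
injected = engine.module.generate(**inputs, max_new_tokens=64)
print("deepspeed:", tokenizer.decode(injected[0]))
```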
My output:
Expected behavior
I expected the same result from both engines.
ds_report output
System info (please complete the following information):
transformers==4.33.2
Additional context
If you change `10000.0` to `1000000.0` in this line:

DeepSpeed/csrc/transformer/inference/csrc/apply_rotary_pos_emb.cu
Line 64 in 0636c74

you will get correct results:
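Separately, a quick way to confirm the value the model actually expects is to read it off the Hugging Face config; `rope_theta` is exposed on the Llama config in transformers>=4.33 (the checkpoint name below is an assumption, but any CodeLlama variant shows the same value):

```python
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("codellama/CodeLlama-7b-hf")
print(cfg.rope_theta)  # 1000000.0 -- not the 10000.0 hardcoded in the kernel
```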
@cupertank thank you for reporting and finding the cause of this bug! I can work on getting a PR that will correct this (unless you planned to create a PR yourself).