
[BUG] Deepspeed shows incorrect results for CodeLlama #4442

Closed
cupertank opened this issue Oct 3, 2023 · 2 comments
Labels: bug (Something isn't working), inference

Comments

cupertank commented Oct 3, 2023

Describe the bug
CodeLlama produces incorrect results when run with DeepSpeed. During my investigation, I found that DeepSpeed hardcodes rope_theta == 10000.0 in its rotary embedding kernel, while CodeLlama uses rope_theta == 1000000.0.
The offending line:

inv_freq = 1.0 / powf(10000.0, inv_freq) * (float)seq_idx;

See rope_theta in the CodeLlama config.

I think rope_theta should be a parameter of the rotary embedding rather than a hardcoded constant.
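
For reference, the value is exposed on the Hugging Face config object, so it is easy to confirm what CodeLlama expects (a quick check, assuming a transformers version recent enough to carry rope_theta, such as the 4.33.2 used below):

from transformers import AutoConfig

# CodeLlama ships rope_theta in its config.json; Llama-2 checkpoints use 10000.0.
config = AutoConfig.from_pretrained("codellama/CodeLlama-7b-hf")
print(config.rope_theta)  # 1000000.0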

To Reproduce
Steps to reproduce the behavior:

  1. Run this script:
from transformers import AutoModelForCausalLM, AutoTokenizer
from deepspeed import init_inference
import torch

test_input = """import abc
import gzip
import logging
import multiprocessing
import os
import sys
from multiprocessing import Pool
from typing import Iterable, Sequence

from tqdm.auto import tqdm

logger = logging."""

tokenizer = AutoTokenizer.from_pretrained("codellama/CodeLlama-7b-hf")
inputs = tokenizer(test_input, return_tensors="pt")["input_ids"].to("cuda")

# torch_dtype belongs on the model load (passing it to the tokenizer is a no-op).
model = AutoModelForCausalLM.from_pretrained("codellama/CodeLlama-7b-hf", torch_dtype=torch.float16).to("cuda")

transformers_result = model.generate(
    inputs,
    max_new_tokens=10
).to("cpu")

transformers_text = tokenizer.decode(transformers_result[0])

print("==============Transformers result==============")
print(transformers_text)
print("===============================================")


# Wrap the same model with DeepSpeed kernel-injected inference.
model = init_inference(
    model=model,
    dtype=torch.float16,
    replace_with_kernel_inject=True
)

deepspeed_outputs = model.generate(
    inputs,
    max_new_tokens=10
).to("cpu")

deepspeed_text = tokenizer.decode(deepspeed_outputs[0])

print("===============DeepSpeed result================")
print(deepspeed_text)
print("===============================================")

My output:

==============Transformers result==============
<s> import abc
import gzip
import logging
import multiprocessing
import os
import sys
from multiprocessing import Pool
from typing import Iterable, Sequence

from tqdm.auto import tqdm

logger = logging.getLogger(__name__)


class
===============================================
...
===============DeepSpeed result================
<s> import abc
import gzip
import logging
import multiprocessing
import os
import sys
from multiprocessing import Pool
from typing import Iterable, Sequence

from tqdm.auto import tqdm

logger = logging.getLogger(__name__))




===============================================

Expected behavior
I expected both engines to produce the same result.

ds_report output

[2023-10-03 13:20:28,866] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
      runtime if needed. Op compatibility means that your system
      meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
async_io ............... [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
cpu_adam ............... [NO] ....... [OKAY]
cpu_adagrad ............ [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
quantizer .............. [NO] ....... [OKAY]
random_ltd ............. [NO] ....... [OKAY]
 [WARNING]  please install triton==1.0.0 if you want to use sparse attention
sparse_attn ............ [NO] ....... [NO]
spatial_inference ...... [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
transformer_inference .. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/home/ilya_vologin/giga_pizda/venv/lib/python3.8/site-packages/torch']
torch version .................... 1.13.1+cu117
deepspeed install path ........... ['/home/ilya_vologin/giga_pizda/venv/lib/python3.8/site-packages/deepspeed']
deepspeed info ................... 0.10.3, unknown, unknown
torch cuda version ............... 11.7
torch hip version ................ None
nvcc version ..................... 11.7
deepspeed wheel compiled w. ...... torch 1.13, cuda 11.7
shared memory (/dev/shm) size .... 83.53 GB

System info (please complete the following information):

  • OS: Debian 5.10.191-1 (2023-08-16) x86_64 GNU/Linux
  • GPU count and types: One machine with one A100 80GB
  • Hugging Face Transformers versions: transformers==4.33.2
  • Python version: 3.8.18

Additional context
If you change 10000.0 to 1000000.0 in this line:

inv_freq = 1.0 / powf(10000.0, inv_freq) * (float)seq_idx;

You will get correct results:

==============Transformers result==============
<s> import abc
import gzip
import logging
import multiprocessing
import os
import sys
from multiprocessing import Pool
from typing import Iterable, Sequence

from tqdm.auto import tqdm

logger = logging.getLogger(__name__)


class
===============================================
...
===============DeepSpeed result================
<s> import abc
import gzip
import logging
import multiprocessing
import os
import sys
from multiprocessing import Pool
from typing import Iterable, Sequence

from tqdm.auto import tqdm

logger = logging.getLogger(__name__)


class
===============================================
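
For intuition on why the hardcoded base corrupts generation, one can compare the inverse-frequency tables the two theta values produce. This is a standalone sketch of the standard RoPE formula, not DeepSpeed's kernel; dim=128 is assumed to match Llama-7B's head size:

import torch

def rope_inv_freq(theta: float, dim: int = 128) -> torch.Tensor:
    # Standard RoPE inverse frequencies: theta^(-2i/dim) for i in [0, dim/2).
    return 1.0 / (theta ** (torch.arange(0, dim, 2).float() / dim))

f_hardcoded = rope_inv_freq(10000.0)    # what the kernel bakes in
f_codellama = rope_inv_freq(1000000.0)  # what CodeLlama expects
print((f_hardcoded / f_codellama).max())  # ~93: lowest-frequency bands off by almost 100x

Each position's rotation angle is scaled by these factors, so with the wrong base the model effectively sees position encodings it was never trained on, which matches the degenerate continuation in the unpatched output above.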
@cupertank cupertank added bug Something isn't working inference labels Oct 3, 2023
@mrwyattii mrwyattii self-assigned this Oct 4, 2023
mrwyattii (Contributor) commented:
@cupertank thank you for reporting and finding the cause of this bug! I can work on getting a PR that will correct this (unless you planned to create a PR yourself).

cupertank (Author) commented:
@mrwyattii I made a pull request with the bugfix; could you please take a look at #4480?
