
Pythia (GPTNeoXForCausalLM) Regression (inference time) in transformers 4.35.0 #28360

Closed
JonasGeiping opened this issue Jan 5, 2024 · 1 comment · Fixed by #28602
Comments

@JonasGeiping

JonasGeiping commented Jan 5, 2024

System Info

  • transformers version: 4.35.0
  • Platform: Linux-5.16.19-76051619-generic-x86_64-with-glibc2.35
  • Python version: 3.10.11
  • Huggingface_hub version: 0.17.3
  • Safetensors version: 0.3.1
  • Accelerate version: 0.25.0
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.3.0.dev20240104 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?: Yes
  • Using distributed or parallel set-up in script?: No

The Problem

The Pythia models, and by extension the GPTNeoXForCausalLM implementation, don't appear to be working correctly in 4.35. I've attached a simple reproduction snippet below. This code works on 4.34 but produces NaNs on 4.35 during the forward pass. The token ids are not particularly anomalous.

The problem is likely related to the report in #28316, but this issue suggests that any effects on reward modeling are second-order effects, and that the changes between 4.34 and 4.35 are the underlying problem.

@ArthurZucker @younesbelkada

Reproduction

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name_or_path = "EleutherAI/pythia-70m"
model = AutoModelForCausalLM.from_pretrained(model_name_or_path, torch_dtype=torch.float16)

# token ids for '<|im_start|>system\nA chat between'
# alternative: tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
#              input_ids = tokenizer('<|im_start|>system\nA chat between', return_tensors="pt").input_ids
input_ids = [29, 93, 303, 64, 5478, 49651, 10394, 187, 34, 12939, 875]
input_ids = torch.as_tensor(input_ids)[None]

model.cuda()
input_ids = input_ids.cuda()
model(input_ids)["logits"]  # contains NaNs on 4.35

Expected behavior

Normal forward pass, without NaNs.
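
A minimal sanity check, assuming the reproduction snippet above has already been run (so model and input_ids are on the GPU), would be:

with torch.no_grad():
    logits = model(input_ids)["logits"]
# Expected on 4.34: all logits are finite. On 4.35 with float16 the assertion fails.
assert torch.isfinite(logits).all(), "forward pass produced NaN/Inf logits"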

@ArthurZucker
Collaborator

ArthurZucker commented Jan 8, 2024

Thanks for reporting, this should be a lot easier to debug! 🤗 Having a look (can definitely reproduce).

  • quick fix: using bfloat16 gets rid of the NaNs (see the sketch after this list).
  • definitely a regression as I can confirm this was passing before.
  • seems to come from 253f9a3
  • we don't have failing tests for this, either because we load in a different dtype in the tests or because we just didn't test it.
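
A minimal sketch of the bfloat16 workaround mentioned above, assuming the same EleutherAI/pythia-70m checkpoint and input ids as in the reproduction snippet:

import torch
from transformers import AutoModelForCausalLM

# load in bfloat16 instead of float16 to avoid the NaNs on 4.35
model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/pythia-70m", torch_dtype=torch.bfloat16
).cuda()

input_ids = torch.as_tensor(
    [29, 93, 303, 64, 5478, 49651, 10394, 187, 34, 12939, 875]
)[None].cuda()

logits = model(input_ids)["logits"]
print(torch.isfinite(logits).all())  # expected: tensor(True, device='cuda:0')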
