
Pythia (GPTNeoXForCausalLM) Regression (inference time) in transformers 4.35.0 #28360

Closed
JonasGeiping opened this issue Jan 5, 2024 · 1 comment · Fixed by #28602
Comments

@JonasGeiping

JonasGeiping commented Jan 5, 2024

System Info

  • transformers version: 4.35.0
  • Platform: Linux-5.16.19-76051619-generic-x86_64-with-glibc2.35
  • Python version: 3.10.11
  • Huggingface_hub version: 0.17.3
  • Safetensors version: 0.3.1
  • Accelerate version: 0.25.0
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.3.0.dev20240104 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?: Yes
  • Using distributed or parallel set-up in script?: No

The Problem

The Pythia models, and by extension the GPTNeoXForCausalLM implementation, don't appear to be working correctly in 4.35. I've attached a simple reproduction snippet below. This code works on 4.34 but produces NaNs on 4.35 during the forward pass. The token ids are not particularly anomalous.

The problem is likely related to the report in #28316, but this issue suggests that any effects on reward modeling are second-order effects, and that the changes between 4.34 and 4.35 are the underlying problem.

@ArthurZucker @younesbelkada

Reproduction

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name_or_path = "EleutherAI/pythia-70m"
model = AutoModelForCausalLM.from_pretrained(model_name_or_path, torch_dtype=torch.float16)

# token ids for '<|im_start|>system\nA chat between'
# alternative: tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
#              input_ids = tokenizer('<|im_start|>system\nA chat between', return_tensors="pt").input_ids
input_ids = [29, 93, 303, 64, 5478, 49651, 10394, 187, 34, 12939, 875]
input_ids = torch.as_tensor(input_ids)[None]

model.cuda()
input_ids = input_ids.cuda()
model(input_ids)["logits"]  # contains NaNs on 4.35

Expected behavior

Normal forward pass, without NaNs.
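
A minimal sanity check, assuming the reproduction snippet above has already been run (so model and input_ids are on the GPU), would be:

with torch.no_grad():
    logits = model(input_ids)["logits"]
# Expected on 4.34: all logits are finite. On 4.35 with float16 the assertion fails.
assert torch.isfinite(logits).all(), "forward pass produced NaN/Inf logits"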

@ArthurZucker
Collaborator

ArthurZucker commented Jan 8, 2024

Thanks for reporting, this should be a lot easier to debug! 🤗 Having a look (can definitely reproduce).

  • quick fix: using bfloat16 gets rid of the NaNs (see the sketch after this list).
  • definitely a regression as I can confirm this was passing before.
  • seems to come from 253f9a3
  • we don't have failing tests for this, either because we load in a different dtype in the tests or because we just didn't test it.
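
A minimal sketch of the bfloat16 workaround mentioned above, assuming the same EleutherAI/pythia-70m checkpoint and input ids as in the reproduction snippet:

import torch
from transformers import AutoModelForCausalLM

# load in bfloat16 instead of float16 to avoid the NaNs on 4.35
model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/pythia-70m", torch_dtype=torch.bfloat16
).cuda()

input_ids = torch.as_tensor(
    [29, 93, 303, 64, 5478, 49651, 10394, 187, 34, 12939, 875]
)[None].cuda()

logits = model(input_ids)["logits"]
print(torch.isfinite(logits).all())  # expected: tensor(True, device='cuda:0')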
