@stas00, I wrote this script to get the conditional NLL for the labels given the context.
I tried different batches in which only the first example changes and the rest of the examples stay fixed. However, after a certain point, changing the first example affects the NLL of the other examples.
This is not supposed to happen.
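For clarity, the quantity the script below is meant to compute for each example is the mean conditional NLL of the label tokens given that example's context (the context positions are masked out with -100):

$$
\mathrm{NLL}(y \mid x) = -\frac{1}{|y|} \sum_{t=1}^{|y|} \log p_\theta\!\left(y_t \mid x,\, y_{<t}\right)
$$

so the value for one row should depend only on that row's own $(x, y)$, not on the other rows of the batch.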
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "bigscience/bloom"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    max_memory={0: '0GIB', 1: '51GIB', 2: '51GIB', 3: '51GIB',
                4: '51GIB', 5: '51GIB', 6: '51GIB', 7: '51GIB'},
    torch_dtype=torch.bfloat16,
)
model.eval()


def compute_gen_loss(lm_logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    batch_size = labels.shape[0]

    # shift so that tokens < n predict token n
    shift_logits = lm_logits[..., :-1, :].contiguous()
    shift_labels = labels[..., 1:].contiguous()

    loss_fct = torch.nn.CrossEntropyLoss(reduction="none")
    loss = loss_fct(
        shift_logits.view(-1, shift_logits.size(-1)),
        shift_labels.view(-1)
    )
    loss = loss.reshape(batch_size, -1)
    # mean NLL over the non-masked (label) positions of each example
    loss = loss.sum(dim=-1) / (shift_labels != -100).sum(dim=-1)
    return loss


def pad_ids(arrays, padding, max_length=-1):
    # left-pad every sequence to max_length (defaults to the longest in the batch)
    if (max_length < 0):
        max_length = max(list(map(len, arrays)))

    arrays = [[padding] * (max_length - len(array)) + array for array in arrays]
    return arrays


def forward(text: list, labels: list, conditional: bool = True):
    input_tokens = tokenizer(text).input_ids
    label_tokens = tokenizer(labels).input_ids

    input_ids = [x + y for (x, y) in zip(input_tokens, label_tokens)]
    attention_mask = [(len(x) + len(y)) * [1]
                      for (x, y) in zip(input_tokens, label_tokens)]
    if (conditional):
        # mask out the context so only the label tokens contribute to the loss
        labels = [[-100] * len(x) + y for (x, y)
                  in zip(input_tokens, label_tokens)]
    else:
        labels = input_ids

    pad = 3
    input_ids = pad_ids(input_ids, pad)
    attention_mask = pad_ids(attention_mask, 0)
    # labels need to be on output device
    labels = pad_ids(labels, -100)

    input_ids = torch.tensor(input_ids)
    attention_mask = torch.tensor(attention_mask)
    labels = torch.tensor(labels)

    lm_logits = model(
        input_ids=input_ids,
        attention_mask=attention_mask
    ).logits

    print(compute_gen_loss(lm_logits, labels).cpu().tolist())


text = [
    "DeepSpeed",
    "DeepSpeed is a",
    "DeepSpeed is a machine",
    "DeepSpeed is a machine learning framework",
]
labels = [
    " is awesome.",
    " good person.",
    " that can wipe out the planet.",
    " for generating memes.",
]
forward(text, labels)

labels[0] = " is awesome. really awesome"
forward(text, labels)

labels[0] = " is awesome. really awesome. Try it."
forward(text, labels)

labels[0] = " is awesome. really awesome. Try it. You'll be surprised"
forward(text, labels)

labels[0] = " is awesome. really awesome. Try it. You'll be surprised. BLOOM was trained using DeepSpeed."
forward(text, labels)

labels[0] = " is awesome. really awesome. Try it. You'll be surprised. BLOOM was trained using DeepSpeed. Oh no the values are bugging out now."
forward(text, labels)
```
The value in column 2 drops from 3.29 to 3.28 even though only the example in column 0 was changed, and in the last run column 3 changes as well. Only column 0 is supposed to change here.
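One way to narrow this down (just a sketch, reusing `tokenizer`, `model`, `pad_ids`, and `compute_gen_loss` from the script above; `forward_fixed` and `fixed_length` are hypothetical names I'm introducing for this check): since `pad_ids` pads to the longest sequence in the batch when `max_length` is -1, lengthening `labels[0]` also changes how much left padding examples 1-3 receive. Pinning the padded length across all runs should show whether the drift follows the padding amount rather than the contents of example 0.

```python
# Sketch: same as `forward` above, but every batch is padded to the same
# fixed length, so changing labels[0] cannot change how much padding the
# other examples get. `forward_fixed` / `fixed_length` are hypothetical.
def forward_fixed(text: list, labels: list, fixed_length: int = 128):
    input_tokens = tokenizer(text).input_ids
    label_tokens = tokenizer(labels).input_ids

    input_ids = [x + y for (x, y) in zip(input_tokens, label_tokens)]
    attention_mask = [(len(x) + len(y)) * [1]
                      for (x, y) in zip(input_tokens, label_tokens)]
    label_ids = [[-100] * len(x) + y for (x, y) in zip(input_tokens, label_tokens)]

    input_ids = torch.tensor(pad_ids(input_ids, 3, fixed_length))
    attention_mask = torch.tensor(pad_ids(attention_mask, 0, fixed_length))
    label_ids = torch.tensor(pad_ids(label_ids, -100, fixed_length))

    lm_logits = model(input_ids=input_ids, attention_mask=attention_mask).logits
    print(compute_gen_loss(lm_logits, label_ids).cpu().tolist())


# Also worth comparing against batches of size 1: if batching is truly
# independent, each value here should match the corresponding column above.
for t, l in zip(text, labels):
    forward([t], [l])
```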