Is there a reason why GPT2 does not trigger <eos> token at all? #196

timothylimyl · 2023-03-06T08:52:30Z

I have been trying to use GPT2-1.5b for some Q/A, but the model keeps generating (repeating itself over and over again) until the maximum number of tokens is reached.

Inside model.generate(), I have already added a check that returns as soon as the <eos> token (ID 50256, <|endoftext|> in GPT-2's vocabulary) is sampled:

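            # sample from the distribution
            idx_next = torch.multinomial(probs, num_samples=1)
            # append sampled index to the running sequence and continue
            idx = torch.cat((idx, idx_next), dim=1)

            # stop early if GPT-2's <|endoftext|> token (ID 50256) was sampled
            # (.item() assumes batch size 1, i.e. idx_next has one element)
            if idx_next.item() == 50256:
                return idx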

However, the check never seems to trigger. Here is an example of the generated output:

Q: What is the second law of thermodynamics?
A: It is the law of entropy.
Q: What is the third law of thermodynamics?
A: It is the law of thermodynamics.
Q: What is the fourth law of thermodynamics?
A: It is the law of thermodynamics.
Q: What is the fifth law of thermodynamics?
A: It is the law of thermodynamics.
Q: What is the sixth law of thermodynamics?
A: It is the law of thermodynamics.

It just keeps continuing the Q/A pattern, and sometimes it repeats the same answer exactly.
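For reference, here is a minimal self-contained sketch of the same early-stopping idea, written in plain Python so it runs without a model. The `generate` function and its `next_token_probs` callable are hypothetical: `next_token_probs` stands in for a model forward pass plus softmax. The only fact assumed about GPT-2 is that 50256 is the ID of `<|endoftext|>`.

```python
import random
from typing import Callable, List, Optional, Sequence

EOT_ID = 50256  # <|endoftext|> in the GPT-2 BPE vocabulary


def generate(
    prompt_ids: List[int],
    next_token_probs: Callable[[List[int]], Sequence[float]],
    max_new_tokens: int,
    eot_id: int = EOT_ID,
    rng: Optional[random.Random] = None,
) -> List[int]:
    """Sample one token at a time, stopping early once eot_id is sampled.

    next_token_probs is a hypothetical stand-in for a model forward pass plus
    softmax: it maps the running sequence to one probability per vocab ID.
    """
    if rng is None:
        rng = random.Random()
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        probs = next_token_probs(ids)
        # sample one token ID with probability proportional to its weight
        next_id = rng.choices(range(len(probs)), weights=probs, k=1)[0]
        ids.append(next_id)
        if next_id == eot_id:
            break  # stop as soon as the end-of-text token is sampled
    return ids
```

The loop can only stop early if the model actually samples 50256. If the model rarely assigns that token probability in this context, generation runs all the way to `max_new_tokens`, which is the behaviour described in the question.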


chrisociepa · 2023-03-28T16:54:45Z

Have you fine-tuned the model with such a token? A pretrained model has not been trained to emit such a token at the end of an answer, because it is trained on large general corpora rather than on a specific task.
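To illustrate the suggestion, here is a minimal sketch of preparing fine-tuning data so that every answer is followed by the end-of-text token, giving the model examples of when to emit it. The function name `build_finetune_ids` and the "Q: ... / A: ..." layout are assumptions. The `encode` argument stands in for a GPT-2 BPE encoder. For example, tiktoken's `get_encoding("gpt2").encode_ordinary` would fit, and its `eot_token` is 50256.

```python
from typing import Callable, Iterable, List, Tuple

EOT_ID = 50256  # <|endoftext|> in the GPT-2 BPE vocabulary


def build_finetune_ids(
    pairs: Iterable[Tuple[str, str]],
    encode: Callable[[str], List[int]],
    eot_id: int = EOT_ID,
) -> List[int]:
    """Concatenate Q/A pairs into one token stream, ending each with eot_id.

    The "Q: ...\nA: ...\n" layout is an assumed prompt format; use whatever
    format you will prompt with at inference time.
    """
    ids: List[int] = []
    for question, answer in pairs:
        ids.extend(encode(f"Q: {question}\nA: {answer}\n"))
        ids.append(eot_id)  # teach the model to stop after each answer
    return ids
```

After fine-tuning on such a stream, prompting with `Q: ...\nA:` gives the model a learned reason to emit 50256, so a check like the one in the question has something to catch.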

timothylimyl · 2023-03-31T09:14:45Z

@chrisociepa yes, I realised my misconception about pre-trained language models.


timothylimyl closed this as completed Mar 31, 2023

LamOne1 mentioned this issue May 29, 2023 in "EOS token in the prepare_redpajama script" Lightning-AI/lit-llama#329 (closed)

gkielian added a commit to gkielian/ReaLLMASIC_nanogpt that referenced this issue Jul 27, 2024: 2bdaa38 "Merge pull request karpathy#196 from klei22/add_newswire" (Add scripts compatible with the Newswire Dataset)

gkielian added a commit to gkielian/ReaLLMASIC_nanogpt that referenced this issue Sep 5, 2024: 3b140a2 "Merge pull request karpathy#196 from klei22/add_newswire" (Add scripts compatible with the Newswire Dataset)
