Is there a reason why GPT2 does not trigger <eos> token at all? #196

Closed
timothylimyl opened this issue Mar 6, 2023 · 2 comments

Comments

@timothylimyl

timothylimyl commented Mar 6, 2023

I have been trying to use GPT2-1.5b for Q/A, but the model keeps generating (repeating itself over and over) until the max token limit is reached.

Inside model.generate(), I have already added a check that returns as soon as the <eos> token (id 50256) is sampled:

            # sample from the distribution
            idx_next = torch.multinomial(probs, num_samples=1)
            # append sampled index to the running sequence and continue
            idx = torch.cat((idx, idx_next), dim=1)

            # stop early once the end-of-text token (id 50256) is sampled;
            # .item() assumes a batch size of 1
            if idx_next.item() == 50256:
                return idx

However, it seems that it never gets triggered. An example of generated output:

Q: What is the second law of thermodynamics?
A: It is the law of entropy.
Q: What is the third law of thermodynamics?
A: It is the law of thermodynamics.
Q: What is the fourth law of thermodynamics?
A: It is the law of thermodynamics.
Q: What is the fifth law of thermodynamics?
A: It is the law of thermodynamics.
Q: What is the sixth law of thermodynamics?
A: It is the law of thermodynamics.

It just keeps repeating the same Q/A pattern, sometimes word for word.
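Since the output above always starts a new "Q:" line after each answer, one possible workaround is to cut the completion at a stop sequence instead of waiting for the end-of-text token. This is only a minimal sketch: `truncate_at_stop` is a hypothetical helper (not part of the repository), and it assumes the generated token ids have already been decoded to a string.

```python
def truncate_at_stop(text, stop_sequences=("\nQ:",)):
    """Cut `text` at the earliest occurrence of any stop sequence.

    Hypothetical helper: returns `text` unchanged if no stop sequence occurs.
    """
    cut = len(text)
    for stop in stop_sequences:
        pos = text.find(stop)
        if pos != -1:
            cut = min(cut, pos)
    return text[:cut]


# Example completion, i.e. the text the model produced after the prompt "...\nA:"
completion = (
    " It is the law of entropy.\n"
    "Q: What is the third law of thermodynamics?\n"
    "A: It is the law of thermodynamics.\n"
)
answer = truncate_at_stop(completion).strip()
print(answer)  # It is the law of entropy.
```

The same idea can be applied inside the sampling loop by decoding the running sequence and returning once a stop sequence appears, which also saves the wasted tokens.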

@chrisociepa

Have you fine-tuned the model with such a token? Pretrained models do not operate with such tokens, because they are trained on large corpora, not on specific tasks.
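To illustrate what fine-tuning "with such a token" could look like, here is a rough sketch that appends the end-of-text id after every Q/A example when building the training token stream, so the model sees where an answer should end. The names `build_training_stream` and `toy_encode` are hypothetical, and `toy_encode` is only a stand-in for a real GPT-2 BPE tokenizer; the id 50256 is the one used in the check from the original post.

```python
EOS_ID = 50256  # end-of-text id used in the check from the original post


def build_training_stream(examples, encode, eos_id=EOS_ID):
    """Concatenate tokenized examples, ending each one with `eos_id`."""
    ids = []
    for text in examples:
        ids.extend(encode(text))
        ids.append(eos_id)  # marks the end of this Q/A pair
    return ids


def toy_encode(text):
    # Assumption: placeholder tokenizer (one id per character),
    # used here only so the sketch runs without external libraries.
    return [ord(ch) for ch in text]


examples = [
    "Q: What is the second law of thermodynamics?\nA: It is the law of entropy.",
    "Q: What is 2 + 2?\nA: 4.",
]
stream = build_training_stream(examples, toy_encode)
print(stream.count(EOS_ID))  # 2
```

After fine-tuning on a stream like this, a check for id 50256 in the sampling loop has a realistic chance of firing at the end of an answer.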

@timothylimyl
Author

@chrisociepa yes, I realised my misconception about pre-trained language models.

gkielian added a commit to gkielian/ReaLLMASIC_nanogpt that referenced this issue Jul 27, 2024
Add scripts compatible with the Newswire Dataset
gkielian added a commit to gkielian/ReaLLMASIC_nanogpt that referenced this issue Sep 5, 2024
Add scripts compatible with the Newswire Dataset