Reward model keeps failing to converge: how many epochs did you train it for? #13

Open
qilele00 opened this issue Dec 31, 2024 · 1 comment

Comments

@qilele00

No description provided.

@qilele00
Author

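Although training reports loss=0.12 and acc=0.92, the trained model still fails to score answers correctly at test time. In the eval example below, the prompt asks the assistant to compose a poem (作诗) with a given title. good_ans is a coherent eight-line poem, while bad_ans is the four-character idiom 一朝一夕 repeated over and over. The reward model nonetheless gives bad_ans the higher score: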
=============eval========================
prompt: <|im_start|>system
你是一个由喵阿姨开发的喵喵小助手<|im_end|>
<|im_start|>user
作诗:
题目:离天长寄周重实<|im_end|>
<|im_start|>assistant

good_ans: 相逢握手便忘形,射策天门尽弟兄。浮世有谁尊道义,青衫自笑为功名。高谈未尽胸中意,作别犹如梦后惊。半夜叩门投野寺,天寒孤月更分明

bad_ans: 一朝一夕,一朝一夕,一朝一夕,一朝一夕,一朝一夕,一朝一夕,一朝一夕,一朝一夕,… [一朝一夕 repeats for the rest of the output, which is cut off mid-phrase at "一朝"]

=============Scores (higher, better)========================
good_ans score: 0.29213905334472656
bad_ans score: 0.7531133890151978
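
For context, reward models like this are commonly trained with a pairwise (Bradley-Terry) loss, -log σ(r_good - r_bad). In that setup, "acc" is usually the fraction of pairs where r_good > r_bad. The issue does not say that this project uses exactly that objective, so treat this as an assumption. The minimal Python sketch below also assumes the two printed scores are raw rewards. Under those assumptions, it shows how the eval pair above would be scored. The helper names (`softplus`, `pairwise_loss`, `pairwise_accuracy`) are hypothetical and are not taken from the project's code.

```python
import math
from typing import Sequence


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    if x > 0:
        return x + math.log1p(math.exp(-x))
    return math.log1p(math.exp(x))


def pairwise_loss(r_good: float, r_bad: float) -> float:
    """Assumed Bradley-Terry pairwise loss: -log(sigmoid(r_good - r_bad)).

    Uses the identity -log(sigmoid(m)) == log(1 + exp(-m)) == softplus(-m).
    """
    return softplus(-(r_good - r_bad))


def pairwise_accuracy(good: Sequence[float], bad: Sequence[float]) -> float:
    """Fraction of pairs in which the good answer gets the strictly higher reward."""
    if len(good) != len(bad) or len(good) == 0:
        raise ValueError("expected two non-empty sequences of equal length")
    correct = sum(g > b for g, b in zip(good, bad))
    return correct / len(good)


# Scores copied from the eval log above, treated as raw rewards (an assumption:
# if the project applies a sigmoid before printing, the underlying logits differ).
good_score = 0.29213905334472656
bad_score = 0.7531133890151978

margin = good_score - bad_score
print(f"margin:   {margin:.3f}")
print(f"loss:     {pairwise_loss(good_score, bad_score):.3f}")
print(f"accuracy: {pairwise_accuracy([good_score], [bad_score])}")
```

Under these assumptions, the pair is misranked. The margin is about -0.46 and the loss about 0.95, far above the reported training average of 0.12. A single example cannot show whether the model is overfitting the training pairs. Running the same accuracy computation over a held-out set of preference pairs, rather than one prompt, could help tell whether this is an isolated miss or a systematic gap between training and test behavior.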
