RM data construction #55
The ten models differ only in their random seed. The randomness is used to obtain an averaged, more stable reward-model score.
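Here is a minimal sketch of the averaging described above. It assumes each reward model can be treated as a plain function from a (prompt, response) pair to a scalar score. The `RewardModel` type and the `ensemble_score` helper are hypothetical and not taken from the repository. The standard deviation is returned only as an illustrative measure of how much the seeds disagree.

```python
from statistics import mean, pstdev
from typing import Callable, Sequence, Tuple

# Hypothetical interface: a reward model maps (prompt, response) to a scalar score.
RewardModel = Callable[[str, str], float]


def ensemble_score(
    models: Sequence[RewardModel], prompt: str, response: str
) -> Tuple[float, float]:
    """Score one sample with every seed-varied reward model.

    Returns the mean score (the averaged reward) and the population
    standard deviation (how much the ensemble members disagree).
    """
    scores = [rm(prompt, response) for rm in models]
    return mean(scores), pstdev(scores)


if __name__ == "__main__":
    # Toy stand-ins for three seed-varied models: same scoring rule, small offsets.
    toy_models = [lambda p, r, b=b: len(r) + b for b in (-1, 0, 1)]
    avg, spread = ensemble_score(toy_models, "question", "abcd")
    print(avg, spread)
```

With ten real models, the list would simply hold ten scoring functions, and the mean would serve as the sample's score.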
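Hello, I'd like to ask: which base model were the reward models used to score the data in the paper trained from? And which random seed does "random seed" refer to?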
They are based on Llama 2 (HF). The random seed is just the ordinary random seed; it shuffles the order in which the dataset is fed in.
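The sketch below shows what the seed changes in this setup. It assumes the seed only controls the order in which training pairs are presented, so each seed gives a different but reproducible permutation of the same data. `epoch_order` is a hypothetical helper and is not the repository's training code.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def epoch_order(dataset: Sequence[T], seed: int) -> List[T]:
    """Return a shuffled copy of `dataset` whose order depends only on `seed`."""
    rng = random.Random(seed)  # private generator; leaves the global random state untouched
    order = list(dataset)      # copy so the caller's data is not modified
    rng.shuffle(order)
    return order


if __name__ == "__main__":
    pairs = [f"pair-{i}" for i in range(6)]  # placeholder preference pairs
    for seed in range(3):
        print(seed, epoch_order(pairs, seed))
```

Under this assumption, training the same Llama 2 model once for each seed in `range(10)` produces ten runs that see identical data in different orders. In a real training stack the seed may also affect other random components, such as dropout or initialization of a newly added reward head.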
(Reply sent by email to the question of 2024-12-18 14:00, Re: [OpenLMLab/MOSS-RLHF] RM data construction (Issue #55).)
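Thanks for the reply. Is the training code for these 10 reward models the part that has been released publicly? Can I obtain the 10 reward models by running the reproduction code? Thanks.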
Hello, the paper says that 10 different RMs are used to score the same data. What criteria were used to select these 10 RMs?