What data does the instruction fine-tuning dataset consist of? #11

Closed
Macvh opened this issue Feb 27, 2024 · 1 comment

Comments

Macvh commented Feb 27, 2024

Have the size, composition, and preprocessing method of the instruction fine-tuning dataset been made public?

Collaborator

jubgjf commented Mar 1, 2024

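The fine-tuning dataset includes some private datasets, and there is currently no plan to release all of it.

The public datasets we used include:

The private datasets include:

  • Instruction data for various traditional NLP tasks
  • Safety data

The dataset contains 300,000 samples in total. We filtered out model identity information contained in the data.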
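
The thread does not say how the identity filtering was done. As a rough illustration only, a keyword-based filter could look like the sketch below. The patterns, the `instruction`/`output` field names, and the function names are all assumptions made for this example, not the maintainers' actual rules or data schema.

```python
import re

# Hypothetical patterns: the real filter rules are not described in this thread.
IDENTITY_PATTERNS = [
    r"\bas an ai (language )?model\b",
    r"\bi am (chatgpt|gpt-4|claude)\b",
    r"\b(developed|trained|created) by openai\b",
    r"我是.{0,10}(开发|训练)的",  # e.g. "我是由...开发的"
]
_IDENTITY_RE = re.compile("|".join(IDENTITY_PATTERNS), re.IGNORECASE)


def mentions_model_identity(text: str) -> bool:
    """Return True if the text matches any of the identity patterns."""
    return _IDENTITY_RE.search(text) is not None


def filter_identity(samples):
    """Keep only samples whose instruction and output match no identity pattern.

    Assumes each sample is a dict with optional "instruction" and "output"
    string fields (an assumed schema, not the project's).
    """
    return [
        s for s in samples
        if not any(
            mentions_model_identity(s.get(key, ""))
            for key in ("instruction", "output")
        )
    ]


if __name__ == "__main__":
    demo = [
        {"instruction": "Who are you?",
         "output": "I am ChatGPT, a model trained by OpenAI."},
        {"instruction": "Translate 'hello' to French.", "output": "bonjour"},
    ]
    print(filter_identity(demo))
```

A pattern list like this is easy to audit but will miss paraphrased self-descriptions, so in practice it would likely need manual review of a sample of what gets dropped and kept.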

jubgjf mentioned this issue Mar 1, 2024
jubgjf closed this as completed Mar 10, 2024