________________________________
From: Willard Sheen ***@***.***>
Sent: Tuesday, August 29, 2023 10:45:28 AM
To: ydli-ai/CSL ***@***.***>
Cc: Subscribed ***@***.***>
Subject: [ydli-ai/CSL] About the source of the pretraining data (Issue #11)
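The pretraining dataset appears to be far larger than the released paper metadata dataset.
To deduplicate before training, I ran a quick check of the two datasets, and they do not seem to overlap?
Could you briefly describe the source and content of the pretraining data?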
* Pretraining dataset
  * csl.jsonl
  * 2310165 lines
* Paper metadata
  * csl_camera_readly.tsv
  * 396209 lines
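
For reference, a quick overlap check of this kind could look roughly like the sketch below. It only catches exact matches after whitespace and case normalization, so near-duplicates would be missed. The helper names (`normalize`, `fingerprint`, `count_overlap`) are made up for illustration, and loading the text out of csl.jsonl and csl_camera_readly.tsv is left out because the field layout of those files is not described here.

```python
import hashlib
import re


def normalize(text: str) -> str:
    """Lowercase and drop all whitespace so trivial formatting differences do not hide a match."""
    return re.sub(r"\s+", "", text).lower()


def fingerprint(text: str) -> str:
    """Hash the normalized text so the lookup set stores short digests instead of full strings."""
    return hashlib.md5(normalize(text).encode("utf-8")).hexdigest()


def count_overlap(pretrain_texts, metadata_texts) -> int:
    """Count the pretraining records whose normalized text also occurs in the metadata set."""
    seen = {fingerprint(t) for t in metadata_texts}
    return sum(1 for t in pretrain_texts if fingerprint(t) in seen)
```

Both arguments can be any iterable of strings (for example, abstracts read line by line), so neither file has to be held in memory as raw text; only the metadata fingerprints are kept.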