How do I train on a corpus I collected myself? #14

Open
ZNZHL opened this issue May 27, 2018 · 4 comments

@ZNZHL

ZNZHL commented May 27, 2018

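I have collected some corpus data in txt files, laid out as question a, answer b, question c, answer d, and so on, one per line. I don't know how to train on it. Could someone please explain?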

@yaleimeng

Training data format (file extension .conv):
E
M 你好/,/在/吗
M 请/向/我/提问/吧
E
M 好厉害/~
M 我/师父/教/得/好
E
The detailed steps are documented in the chatbot directory. Run extract, train, and test in that order.
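
To connect the txt layout from the question (alternating question and answer lines) with the .conv sample above, here is a minimal sketch of a converter. It rests on several assumptions not confirmed in this thread: that `E` opens each conversation and each `M` line is one utterance, that tokens are joined with `/`, and that every question is followed by exactly one answer. The function names and the character-level fallback segmenter are hypothetical; the sample above looks word-segmented, so a real word segmenter (for example `jieba.lcut`, if jieba is installed) could be passed as `segment` instead.

```python
from typing import Callable, Iterable, List


def split_chars(text: str) -> List[str]:
    """Fallback segmenter (assumption): one token per non-space character."""
    return [ch for ch in text if not ch.isspace()]


def to_conv_lines(
    lines: Iterable[str],
    segment: Callable[[str], List[str]] = split_chars,
) -> List[str]:
    """Turn alternating question/answer lines into E/M records."""
    # Drop blank lines and surrounding whitespace.
    cleaned = [ln.strip() for ln in lines if ln.strip()]
    out: List[str] = []
    # Pair line 0 with line 1, line 2 with line 3, and so on.
    # A trailing question without an answer is dropped.
    for i in range(0, len(cleaned) - 1, 2):
        out.append("E")  # assumed conversation separator
        out.append("M " + "/".join(segment(cleaned[i])))      # question
        out.append("M " + "/".join(segment(cleaned[i + 1])))  # answer
    return out


def convert_file(src_path: str, dst_path: str) -> None:
    """Read a UTF-8 txt corpus and write it out in the .conv layout."""
    with open(src_path, encoding="utf-8") as src:
        records = to_conv_lines(src)
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write("\n".join(records) + "\n")
```

Calling `convert_file("my_corpus.txt", "my_corpus.conv")` (both paths are placeholders) would produce a file that can then be checked against the format the extract step expects.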

@RaymondJSu

@yaleimeng
Can I still train if the corpus I collected has fewer than a million entries?
Which parameters would need to be changed?

@yaleimeng

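@axa000 It's fine if you don't have a million entries, but you still need at least tens of thousands. I worked with this project quite a while ago, but as long as the corpus is prepared properly, the example should run as is.
That said, this kind of seq2seq approach only suits chit-chat scenarios where the requirements on the correctness and plausibility of replies are low. It still has quite a few limitations in areas such as fluency.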
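
As a quick way to compare a prepared corpus against the "tens of thousands" guideline, here is a small sketch that counts conversations in a .conv file. It assumes the layout shown earlier in the thread (an `E` line separating conversations, `M ` lines holding utterances); `count_pairs` is a hypothetical helper, not part of the project.

```python
from typing import Iterable


def count_pairs(conv_lines: Iterable[str]) -> int:
    """Count conversations that contain at least two M lines."""
    pairs = 0
    m_in_block = 0
    for raw in conv_lines:
        line = raw.strip()
        if line == "E":
            # An E line closes the previous conversation, if any.
            if m_in_block >= 2:
                pairs += 1
            m_in_block = 0
        elif line.startswith("M "):
            m_in_block += 1
    # Count the last conversation when the file does not end with E.
    if m_in_block >= 2:
        pairs += 1
    return pairs
```

With the file open in UTF-8, `count_pairs(open_file)` gives the number of usable exchanges.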

@RaymondJSu

@yaleimeng Thanks for your reply!
Indeed, after training, the replies are often off-topic.
But I can't seem to find a better Chinese chatbot on GitHub?
