After training, the test output is nothing but punctuation... #6

Open
shizhediao opened this issue Apr 12, 2018 · 10 comments

Comments

@shizhediao

image

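As the screenshot shows, I entered '你好' ("hello"),
but the output is nothing but digits and symbols...
I don't know what the problem is.
Any help would be appreciated.
Thanks.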

@qhduan
Owner

qhduan commented Apr 13, 2018

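I'm not sure whether you are using chatbot or chatbot_cut.

I have tuned the cut version less, so it has more problems.

Of course, the results without cut are not much better; they are just somewhat more stable.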

@shizhediao
Author

shizhediao commented Apr 13, 2018

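I'm using chatbot...
I followed the README for every step, exactly as written,
but getting nothing but symbols is a bit confusing...
Can this model chat in Chinese, or only in English?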

@qhduan
Owner

qhduan commented Apr 13, 2018

It may be a problem with data preparation. On Windows, some of the steps can run into encoding problems.
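
One common way such encoding problems arise is reading or writing text files without naming an encoding, in which case Python falls back to the platform default. The sketch below is not taken from this repository's scripts; `write_lines_utf8`, `read_lines_utf8`, and `demo` are hypothetical helper names. It only illustrates passing `encoding='utf-8'` explicitly so the result does not depend on the OS or locale.

```python
import os
import tempfile


def write_lines_utf8(path, lines):
    """Write one sentence per line, always encoded as UTF-8."""
    with open(path, 'w', encoding='utf-8') as f:
        for line in lines:
            f.write(line + '\n')


def read_lines_utf8(path):
    """Read lines back as UTF-8; raises UnicodeDecodeError on invalid bytes."""
    with open(path, 'r', encoding='utf-8') as f:
        return [line.rstrip('\n') for line in f]


def demo():
    """Round-trip two Chinese sentences through a temporary file."""
    sentences = ['你好', '我想你会有事的']
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, 'sample.txt')
        write_lines_utf8(path, sentences)
        return read_lines_utf8(path)
```

If the data preparation scripts open files without an explicit `encoding=` argument (an assumption, not something confirmed in this thread), adding one in this way is a possible thing to try.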

@shizhediao
Author

Hmm, it's Linux, Ubuntu 14.04,
and I trained on a GPU.
What could be wrong with the data preparation?
I followed this procedure 😀

@shizhediao
Author

Also, could you take a look at this issue?
Sorry to bother you, haha.
qhduan/Seq2Seq_Chatbot_QA#21
Thanks!

@qhduan
Owner

qhduan commented Apr 13, 2018

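Look at the part roughly 25% of the way down your screenshot above: test.py prints a few samples of the training data (corpus sentences) there.

That part already looks wrong, so I suspect the training data itself (that is, chatbot.pkl) is broken.

You can open any python3 shell and run: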

import pickle
with open('chatbot.pkl', 'rb') as f:  # close the file once loading is done
    t = pickle.load(f)

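Something like this, then check whether the contents of t are wrong.

If they are, I suggest re-cloning the project and redoing the data preparation from scratch.

My output from running test.py looks roughly like this: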
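
The check on `t` suggested above could be automated along the following lines. This is only a sketch: the internal layout of chatbot.pkl is not described in this thread, so the code assumes nothing beyond nested lists, tuples, sets, and dicts containing strings, and `is_cjk`, `iter_strings`, `cjk_ratio`, and `inspect_pickle` are hypothetical names introduced for illustration.

```python
import pickle


def is_cjk(ch):
    """Return True if ch lies in the CJK Unified Ideographs block."""
    return '\u4e00' <= ch <= '\u9fff'


def iter_strings(obj):
    """Yield every str found in nested lists, tuples, sets, and dicts."""
    if isinstance(obj, str):
        yield obj
    elif isinstance(obj, dict):
        for key, value in obj.items():
            yield from iter_strings(key)
            yield from iter_strings(value)
    elif isinstance(obj, (list, tuple, set)):
        for item in obj:
            yield from iter_strings(item)


def cjk_ratio(obj):
    """Fraction of non-whitespace characters that are CJK ideographs."""
    total = 0
    cjk = 0
    for text in iter_strings(obj):
        for ch in text:
            if ch.isspace():
                continue
            total += 1
            cjk += is_cjk(ch)
    return cjk / total if total else 0.0


def inspect_pickle(path):
    """Load a pickle file and report how much of its text is Chinese."""
    with open(path, 'rb') as f:
        data = pickle.load(f)
    return cjk_ratio(data)
```

Calling `inspect_pickle('chatbot.pkl')` and getting a value near zero would suggest the corpus text was lost or corrupted during data preparation, while a healthy Chinese corpus should give a ratio well above zero.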

qhduan-station-gpu-01% python3 test.py 
/home/qhduan/.local/lib/python3.5/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
畹 华 吾 侄
你 接 到 这 封 信 的 时 候
畹 华 吾 侄 , 你 接 到 这 封 信 的 时 候
咱 们 梅 家 从 你 爷 爷 起
咱 们 梅 家 从 你 爷 爷 起
2018-04-13 10:06:39.736784: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-04-13 10:06:39.849721: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:898] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-04-13 10:06:39.850069: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1212] Found device 0 with properties: 
name: GeForce GTX 1070 major: 6 minor: 1 memoryClockRate(GHz): 1.7845
pciBusID: 0000:01:00.0
totalMemory: 7.92GiB freeMemory: 6.69GiB
2018-04-13 10:06:39.850103: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1312] Adding visible gpu devices: 0
2018-04-13 10:06:40.038533: I tensorflow/core/common_runtime/gpu/gpu_device.cc:993] Creating TensorFlow device (/device:GPU:0 with 6459 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1070, pci bus id: 0000:01:00.0, compute capability: 6.1)
2018-04-13 10:06:40.413912: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1312] Adding visible gpu devices: 0
2018-04-13 10:06:40.414090: I tensorflow/core/common_runtime/gpu/gpu_device.cc:993] Creating TensorFlow device (/device:GPU:0 with 114 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1070, pci bus id: 0000:01:00.0, compute capability: 6.1)
2018-04-13 10:06:40.583457: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1312] Adding visible gpu devices: 0
try load model from ./s2ss_chatbot.ckpt
Input Chat Sentence:你好
[[   3 1456  562]] [3]
[[2067 2017  562  530 2483  456 3425    3]]
['</s>', '好', '你']
['我', '想', '你', '会', '有', '事', '的', '</s>']
Input Chat Sentence:

@shizhediao
Author

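You're right!
I looked at chat.pkl and it is all commas, with no meaningful characters.
Could it be a system language problem? When I open extract_conv.py in vim, the code itself shows up garbled...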
image

@qhduan
Owner

qhduan commented Apr 13, 2018

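Encoding problems can happen on Linux too; reading and writing the intermediate files may go wrong.

I have basically only tested on systems running under UTF-8, where it works.

This is a one-person project and my time is limited.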
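
Whether a given system is "running under UTF-8" can be checked from Python itself. The following is a minimal sketch, not part of the project; `encoding_report` and `is_utf8` are hypothetical helper names. It prints the encodings Python uses by default for files, file names, and standard output.

```python
import locale
import sys


def encoding_report():
    """Collect the encodings Python uses by default in this environment."""
    return {
        'preferred': locale.getpreferredencoding(False),
        'filesystem': sys.getfilesystemencoding(),
        'stdout': getattr(sys.stdout, 'encoding', None),
    }


def is_utf8(name):
    """Return True if an encoding name refers to UTF-8."""
    if name is None:
        return False
    return name.lower().replace('-', '').replace('_', '') == 'utf8'


if __name__ == '__main__':
    for key, value in encoding_report().items():
        flag = 'ok' if is_utf8(value) else 'NOT utf-8'
        print('{}: {} ({})'.format(key, value, flag))
```

If `preferred` is not UTF-8, calls to `open()` without an `encoding=` argument will decode and encode text with that other encoding. Switching the shell to a UTF-8 locale (for example via the `LANG` environment variable) before redoing the data preparation is one thing worth trying, though this thread does not confirm that it resolves the issue.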

@shizhediao
Author

OK, thanks.
I'll try again.
Sorry for the trouble!

@sadxiaohu

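> OK, thanks.
> I'll try again.
> Sorry for the trouble!

Hi, did you manage to solve this? I'm running into the same problem, and I also get encoding problems when preprocessing the training data on Linux.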


3 participants