Bug in the entrySet method of the mutable DAT used by the perceptron #1038

Closed
1 task
jimichan opened this issue Nov 28, 2018 · 2 comments

@jimichan
Contributor

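Notes

Please confirm the following:

  • I have carefully read the following documents and did not find an answer:
  • I have searched for my problem with Google and the issue search, and did not find an answer.
  • I understand that the open-source community is a free community formed out of shared interest and bears no responsibility or obligation. I will be polite and thank everyone who helps me.
  • I have entered an x in these brackets to tick the box, confirming the items above.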

Version

The current latest version is:
The version I am using is: latest

My problem

The perceptron uses a mutable DAT to load the feature set. While debugging with large/cws.bin loaded, I found that

 System.out.println(featureMap.size());

prints 8021239.

However, with

 Set<String> set = new HashSet<String>();
 Set<Map.Entry<String, Integer>> entries = featureMap.entrySet();
 for (Map.Entry<String, Integer> m : entries) {
     set.add(m.getKey());
 }

after deduplication set.size() == 8021206, so 33 keys are missing.
Take the key "\u0001/\u00014" as an example:
featureMap.idOf("\u0001/\u00014") returns id 8,
but featureMap.entrySet() does not contain the entry with id=8.
In other words, id=8 is never recovered.
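
The check described above can be sketched as a small helper that lists every id never produced during iteration. This is only a sketch under stated assumptions: it works on any iterable of `Map.Entry<String, Integer>` and assumes ids are dense in `[0, size)`. The names `EntrySetCheck` and `missingIds` are hypothetical and not part of HanLP.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.Map;

public class EntrySetCheck {
    /**
     * Returns the ids in [0, expectedSize) that never appear as a value
     * while iterating over the given entries.
     * Assumption: every id in [0, expectedSize) is supposed to be in use.
     */
    public static List<Integer> missingIds(Iterable<Map.Entry<String, Integer>> entries,
                                           int expectedSize) {
        BitSet seen = new BitSet(expectedSize);
        for (Map.Entry<String, Integer> e : entries) {
            int id = e.getValue();
            if (id >= 0 && id < expectedSize) {
                seen.set(id);
            }
        }
        List<Integer> missing = new ArrayList<>();
        // Walk the clear bits, i.e. the ids that were never yielded.
        for (int id = seen.nextClearBit(0); id < expectedSize; id = seen.nextClearBit(id + 1)) {
            missing.add(id);
        }
        return missing;
    }

    public static void main(String[] args) {
        // Toy data standing in for a faulty entrySet():
        // id 1 is never yielded, id 2 is yielded twice.
        List<Map.Entry<String, Integer>> entries = new ArrayList<>();
        entries.add(new AbstractMap.SimpleEntry<>("a", 0));
        entries.add(new AbstractMap.SimpleEntry<>("c", 2));
        entries.add(new AbstractMap.SimpleEntry<>("c", 2));
        System.out.println(missingIds(entries, 3)); // prints [1]
    }
}
```

With the real feature map, one would presumably pass `featureMap.entrySet()` and `featureMap.size()`. Given the report above, id 8 should then show up among the missing ids.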

@jimichan
Contributor Author

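If you print every Map.Entry<String, Integer>, you will see toward the end that 7054133 is repeated many times, and some keys are garbled.

My guess is that �/�4 should have been the key-value pair "\u0001/\u00014" ==> 8.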

¥/聊7	5257068
¥1	5249614
¥2	5249610
¥3	5249605
�/特4	7054146
�/特5	7054145
�/特6	7054143
�/特7	7054139
�/�4	7054144
�/�5	7054142
�/�6	7054138
�/�7	7054135
�1	7054140
�2	7054136
�3	7054133
�3	7054133
�3	7054133
�3	7054133
�3	7054133
�3	7054133
�3	7054133
�3	7054133
�3	7054133
�3	7054133
�3	7054133
�3	7054133
�3	7054133
�3	7054133
�3	7054133
�3	7054133
�3	7054133
�3	7054133
�3	7054133
�3	7054133
�3	7054133
�3	7054133
�3	7054133
�3	7054133
�3	7054133
�3	7054133
�3	7054133
�3	7054133
�3	7054133
�3	7054133
�3	7054133
�3	7054133
�3	7054133
�3	7054133
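
The repetition in the listing above can be quantified by counting how often each id is yielded during iteration. The following is a minimal sketch, again on a plain iterable of entries. The names `DuplicateEntryCount` and `repeatedIds` are hypothetical and not part of HanLP.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DuplicateEntryCount {
    /** Counts how often each id is yielded and keeps only ids yielded more than once. */
    public static Map<Integer, Integer> repeatedIds(Iterable<Map.Entry<String, Integer>> entries) {
        Map<Integer, Integer> counts = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : entries) {
            counts.merge(e.getValue(), 1, Integer::sum);
        }
        // Drop ids that were yielded exactly once.
        counts.values().removeIf(n -> n < 2);
        return counts;
    }

    public static void main(String[] args) {
        // Toy data: id 7 is yielded three times, id 5 once.
        List<Map.Entry<String, Integer>> entries = new ArrayList<>();
        entries.add(new AbstractMap.SimpleEntry<>("x", 5));
        entries.add(new AbstractMap.SimpleEntry<>("y", 7));
        entries.add(new AbstractMap.SimpleEntry<>("y", 7));
        entries.add(new AbstractMap.SimpleEntry<>("y", 7));
        System.out.println(repeatedIds(entries)); // prints {7=3}
    }
}
```

In the listing above, id 7054133 appears 34 times, that is 33 times more than expected. This matches the 33 keys reported missing, so the iterator may be emitting the same last entry in place of the keys it fails to recover.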

@hankcs hankcs closed this as completed in 43f0ea8 Dec 8, 2018
@hankcs hankcs added the bug label Dec 8, 2018
@hankcs
Owner

hankcs commented Dec 8, 2018

Thanks for the feedback. This has been fixed; please refer to the commit above.
If there are still problems, feel free to reopen the issue.

hankcs added a commit that referenced this issue Jan 10, 2020