19th century music
20 century British history
21st Century Music
21st century science & technology
2D Materials
3 Biotech
3D Printing and Additive Manufacturing
3D Printing in Medicine
3D Research
"3L: Language, Linguistics, Literature"
Describe the current behavior
Exception in thread "main" java.lang.NumberFormatException: For input string: " Linguistics"
at java.base/java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.base/java.lang.Integer.parseInt(Integer.java:638)
at java.base/java.lang.Integer.parseInt(Integer.java:770)
at com.hankcs.hanlp.corpus.io.IOUtil.loadDictionary(IOUtil.java:794)
at com.hankcs.hanlp.corpus.io.IOUtil.loadDictionary(IOUtil.java:752)
at com.hankcs.hanlp.seg.Other.DoubleArrayTrieSegment.<init>(DoubleArrayTrieSegment.java:68)
at org.grobid.core.lexicon.DictSegmenterKt.main(DictSegmenter.kt:6)
at org.grobid.core.lexicon.DictSegmenterKt.main(DictSegmenter.kt)
Expected behavior
The dictionary loads normally.
System information
OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 22.04.1 LTS
HanLP version: com.hankcs:hanlp:portable-1.8.3
I've completed this form and searched the web for solutions.
Describe the bug
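When a CSV file is used as a dictionary, loading fails because some entries contain commas.
Judging from the code, HanLP simply splits each line on commas and does not handle CSV escaping.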
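To illustrate the difference, here is a minimal sketch in Java that contrasts a plain comma split with a quote-aware split. It follows the common CSV convention: a field containing a comma or a double quote is enclosed in double quotes, and an embedded quote is doubled. The class `CsvLineSplitter` and its methods are hypothetical names used only for this illustration. They are not part of HanLP, and this is not a proposed patch to `IOUtil.loadDictionary`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of HanLP.
public class CsvLineSplitter {

    // Plain comma split: ignores quoting, so a quoted field is torn apart.
    public static String[] naiveSplit(String line) {
        return line.split(",");
    }

    // Quote-aware split: commas inside double quotes stay in the field,
    // and a doubled quote ("") inside a quoted field yields one literal quote.
    public static List<String> split(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        field.append('"'); // escaped quote
                        i++;               // skip the second quote
                    } else {
                        inQuotes = false;  // closing quote
                    }
                } else {
                    field.append(c);
                }
            } else if (c == '"') {
                inQuotes = true;           // opening quote
            } else if (c == ',') {
                fields.add(field.toString());
                field.setLength(0);
            } else {
                field.append(c);
            }
        }
        fields.add(field.toString());      // last field
        return fields;
    }

    public static void main(String[] args) {
        String line = "\"3L: Language, Linguistics, Literature\"";
        System.out.println(naiveSplit(line).length); // 3 pieces
        System.out.println(split(line).size());      // 1 field
    }
}
```

With the plain split, the second piece of that line is ` Linguistics`, which is the same string reported in the `NumberFormatException` for this issue. The quote-aware split keeps the whole title as a single field.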
When a column value contains a " or , character, CSV escapes that column by enclosing it in double quotes.
Code to reproduce the issue
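Save the sample entries listed at the top of this issue (from "19th century music" through "3L: Language, Linguistics, Literature") directly as a CSV file and load it as a dictionary.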