Add N best results #151
Comments
Vibrato does not currently support N-best results, because there are few use cases for them.
I see. I was considering using vibrato instead of mecab-rs for tokenization in a Japanese dictionary lookup program I'm planning to build. It would be something similar to https://github.com/themoeway/yomitan, which is useful for learning Japanese, but my version would be faster, being written in Rust, and would support 古文 by using the UniDic 中古 dictionary. Yomichan/yomitan does not support 古文 deconjugation, and it would be simpler and more accurate to use mecab/vibrato here than to write custom 活用 rules.
Sometimes a word may have multiple readings, such as 昨日 (きのう・さくじつ), and it would be nice to get the N best results to consider all likely possibilities. Additionally, there might be a sentence that is not tokenized properly in the first result but is correct in the second. I don't have an example on hand at the moment, but I have seen this before.
Here's an example:
vibrato/mecab's first tokenization result for 暮らしてられる is
where we have
and labels て as 接続助詞, but it should be 助動詞 for てる. Using mecab with
In the 2nd best result, it correctly labels て as 助動詞 for てる. In other words, 暮らしていられる -> 暮らしてられる.
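To make the idea of a "2nd best result" concrete, here is a minimal Rust sketch of N-best selection over a toy lattice. It is not Vibrato's or MeCab's API: the `Edge` type, the `n_best` and `toy_lattice` functions, the segmentation, and every cost are hypothetical and made up for illustration. It also enumerates all paths by brute force, which is only reasonable for a tiny lattice; a real analyzer would presumably use a more efficient search.

```rust
// Toy N-best search over a hand-built lattice.
// Everything here (types, segmentation, costs) is hypothetical.

#[derive(Debug)]
struct Edge {
    start: usize,          // start position, in characters
    end: usize,            // end position (exclusive); must be > start
    surface: &'static str, // token text
    pos: &'static str,     // part-of-speech label
    cost: i32,             // lower is better (made-up values)
}

// Depth-first enumeration of every path from `at` to `goal`.
// Each finished path is stored as (total cost, edge indices).
fn collect_paths(
    edges: &[Edge],
    at: usize,
    goal: usize,
    path: &mut Vec<usize>,
    cost: i32,
    out: &mut Vec<(i32, Vec<usize>)>,
) {
    if at == goal {
        out.push((cost, path.clone()));
        return;
    }
    for (i, e) in edges.iter().enumerate() {
        if e.start == at {
            path.push(i);
            collect_paths(edges, e.end, goal, path, cost + e.cost, out);
            path.pop();
        }
    }
}

// Return up to `n` lowest-cost paths as (total cost, "surface/pos ...").
fn n_best(edges: &[Edge], goal: usize, n: usize) -> Vec<(i32, String)> {
    let mut all = Vec::new();
    collect_paths(edges, 0, goal, &mut Vec::new(), 0, &mut all);
    all.sort_by_key(|(c, _)| *c); // cheapest path first
    all.into_iter()
        .take(n)
        .map(|(c, p)| {
            let tokens: Vec<String> = p
                .iter()
                .map(|&i| format!("{}/{}", edges[i].surface, edges[i].pos))
                .collect();
            (c, tokens.join(" "))
        })
        .collect()
}

// A made-up lattice for 暮らしてられる with two competing labels for て.
fn toy_lattice() -> Vec<Edge> {
    vec![
        Edge { start: 0, end: 3, surface: "暮らし", pos: "動詞", cost: 100 },
        Edge { start: 3, end: 4, surface: "て", pos: "接続助詞", cost: 50 },
        Edge { start: 3, end: 4, surface: "て", pos: "助動詞", cost: 80 },
        Edge { start: 4, end: 7, surface: "られる", pos: "助動詞", cost: 60 },
    ]
}
```

Calling `n_best(&toy_lattice(), 7, 2)` returns both candidate analyses ordered by cost, so a dictionary lookup tool could show the 助動詞 reading of て as an alternative even though the 接続助詞 reading is cheaper in this toy lattice.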
Thank you for the examples! They're interesting. I agree that the N-best option is useful in such applications. We will consider supporting it. (I apologize, but it might be difficult for me to support it soon since I have been busy these days. For the time being, I'd recommend using mecab as an alternative.)
Thank you! No rush, take your time! I'm happy that adding support is being considered.
Is your feature request related to a problem? Please describe.
MeCab has the flag -N, which provides the N best results. However, I couldn't find anything in the docs or the source code on how to do this with vibrato.
Describe the solution you'd like
Add support for returning the N best results.
Describe alternatives you've considered
N/A
Additional context
Vibrato 0.5.1