Best fit matching for commodity names #15
Comments
It does do Levenshtein similarity tests between commodity names. Maybe that list needs expanding or the algorithm needs to be more tolerant of differences on longer names. Do you have any specific examples?
Weird. These are all in the hard-coded list of commodity names, and I've just run them through MRmP's tweaked Levenshtein algorithm: they all got autocorrected properly. You can either: It seems to be working fine for me, but I regularly delete my autosave.csv so that could be relevant :/
Uploaded them to OneDrive:
Awesome. This has pointed me to a HUGE HOLE in the Levenshtein algorithm, namely: it doesn't work. I have swapped in one that does and the problem is fixed. V1.73 includes the fix. Note that for two of the commodity names, "FRUIT AND VEGETABLES" and one other, it will still ask you to correct it. This is because your AutoSave.csv contains (something like) "FRUTT AND VEGETABLES", so the matcher can't decide between that and "FRUIT AND VEGETABLES": both look like equally good candidates. The fix is to remove the bad commodity names from your AutoSave.csv, and then this problem will go away. A commodity rename feature, like the station rename feature, is issue #17...
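For anyone following along, here is a minimal Python sketch of the idea described above: score an OCR'd name against every known name with Levenshtein distance, and decline to auto-correct when two candidates tie. This is not the project's actual code; `levenshtein`, `best_match`, and the `max_distance` threshold are hypothetical names and values chosen for illustration.

```python
from typing import Iterable, Optional


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def best_match(ocr_text: str, known_names: Iterable[str],
               max_distance: int = 3) -> Optional[str]:
    """Return the closest known name, or None if nothing is close enough
    or if two different names are equally close (the caller should then
    ask the user). max_distance is an assumed threshold, not a project value."""
    scored = sorted((levenshtein(ocr_text.upper(), name.upper()), name)
                    for name in known_names)
    if not scored or scored[0][0] > max_distance:
        return None
    if len(scored) > 1 and scored[1][0] == scored[0][0]:
        return None  # ambiguous: two candidates at the same distance
    return scored[0][1]
```

With a clean list, `best_match("FRUFT AND VEGETABLES", ["FRUIT AND VEGETABLES", "GOLD"])` picks "FRUIT AND VEGETABLES". If a misread entry such as "FRUTT AND VEGETABLES" is also in the list, both are one edit away and the function returns `None`, which mirrors the "it will still ask you to correct it" behaviour.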
Awesome. Seems to be working much better now.
A best-fit match to commodity names when OCRing the markets would make it much more accurate. For instance, I've seen numbers show up in commodity names. Restricting names to letters, and prices etc. to numbers, would reduce the number of errors.
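A rough Python sketch of what that restriction could look like, purely as an illustration: the confusion tables below (for example `1` read for `I`, or `O` read for `0`) are assumptions about typical OCR mistakes, and `clean_name` and `clean_price` are hypothetical helpers, not part of the tool.

```python
from typing import Optional

# Assumed look-alike pairs; a real table would come from observed OCR errors.
DIGIT_TO_LETTER = str.maketrans({"0": "O", "1": "I", "5": "S", "8": "B"})
LETTER_TO_DIGIT = str.maketrans({"O": "0", "I": "1", "L": "1", "S": "5", "B": "8"})


def clean_name(raw: str) -> str:
    """Force a commodity name to letters: swap look-alike digits for letters,
    then drop anything that is not a letter, space, dot, or hyphen."""
    text = raw.upper().translate(DIGIT_TO_LETTER)
    kept = "".join(ch for ch in text if ch.isalpha() or ch in " .-")
    return " ".join(kept.split())  # collapse repeated whitespace


def clean_price(raw: str) -> Optional[int]:
    """Force a price field to digits: swap look-alike letters for digits,
    then keep only 0-9. Returns None when no digits survive."""
    text = raw.upper().translate(LETTER_TO_DIGIT)
    digits = "".join(ch for ch in text if ch in "0123456789")
    return int(digits) if digits else None
```

For example, `clean_name("FRU1T AND VEGETABLE5")` gives `"FRUIT AND VEGETABLES"` and `clean_price("1,2O5")` gives `1205`. The cleaned name could then be passed to a best-fit match against the known commodity list.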