Best fit matching for commodity names #15

Closed
Lknechtli opened this issue Jan 11, 2015 · 6 comments

Comments

@Lknechtli
Contributor

Best-fit matching of commodity names when OCRing the markets would make it much more accurate. For instance, I've seen digits show up in commodity names. Restricting names to letters, and prices etc. to digits, would reduce the number of errors.
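To illustrate the character-class idea, here is a minimal Python sketch. It is not code from this project: the function names `clean_name` and `clean_price` and the look-alike tables are assumptions made up for the example, and a real OCR pipeline would need its own confusion table.

```python
import re

# Hypothetical look-alike tables (assumptions, not taken from the project).
DIGIT_TO_LETTER = str.maketrans({"0": "O", "1": "I", "5": "S", "8": "B"})
LETTER_TO_DIGIT = str.maketrans({"O": "0", "I": "1", "L": "1", "S": "5", "B": "8"})


def clean_name(raw: str) -> str:
    """Force a commodity-name field towards letters only."""
    # Map digits that look like letters, then drop any remaining
    # character that is not a letter, space, period or hyphen.
    text = raw.upper().translate(DIGIT_TO_LETTER)
    text = re.sub(r"[^A-Z .\-]", "", text)
    # Collapse runs of whitespace left behind by the removal.
    return " ".join(text.split())


def clean_price(raw: str) -> str:
    """Force a price field towards digits only."""
    # Map letters that look like digits, then drop everything else
    # (including thousands separators).
    text = raw.upper().translate(LETTER_TO_DIGIT)
    return re.sub(r"[^0-9]", "", text)


print(clean_name("HYDR0GEN FUEL"))
print(clean_price("1,2O5"))
```

Cleaning like this only narrows the character set; it would still need a best-fit match against the known commodity names to catch letter-for-letter misreads.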

@stringandstickytape
Owner

It does do Levenshtein similarity tests between commodity names. Maybe that list needs expanding or the algorithm needs to be more tolerant of differences on longer names. Do you have any specific examples?
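For readers unfamiliar with the technique, a Levenshtein match against a fixed name list can be sketched in Python as below. This is an illustration only, not the project's implementation: `KNOWN_NAMES`, `best_match` and the `tolerance` parameter are hypothetical. Scaling the allowed distance with the name length is one assumed way of being more tolerant on longer names.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic edit distance: insertions, deletions and substitutions cost 1."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


# Illustrative subset only; the real list is hard-coded in the application.
KNOWN_NAMES = ["HYDROGEN FUEL", "FRUIT AND VEGETABLES", "GOLD"]


def best_match(ocr_text, known=KNOWN_NAMES, tolerance=0.25):
    """Return the closest known name, or None if nothing is close enough."""
    best = min(known, key=lambda name: levenshtein(ocr_text, name))
    # Allow more edits on longer names (hypothetical rule for this sketch).
    allowed = max(1, int(len(best) * tolerance))
    return best if levenshtein(ocr_text, best) <= allowed else None


print(best_match("HYDR0GEN FUEL"))
```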

@Lknechtli
Contributor Author

(Attached screenshots: typo2, typo3, typo4, typo5, typo6, typo1)

@stringandstickytape
Owner

Weird. These are all in the hard-coded list of commodity names, and I've just run them through MRmP's tweaked Levenshtein algorithm; they all got autocorrected properly.

You can either:
a) try deleting your autosave.csv and retrying the screenshots - I wonder if it's related to existing commodity names in your autosave.csv, or
b) post a link to a zip file containing your autosave.csv, calibration.txt and (say) the HYDROGEN FUEL Curie Gateway screenshot, so I can try on this side.

It seems to be working fine for me - but I regularly delete my autosave.csv so that could be relevant :/

@Lknechtli
Contributor Author

Uploaded them to OneDrive:
http://1drv.ms/1FILRi9

@stringandstickytape
Owner

Awesome. This has pointed me to a HUGE HOLE in the Levenshtein algorithm, namely: it doesn't work. I have swapped in one that does and the problem is fixed. V1.73 includes the fix.

Note that for two of the commodity names, "FRUIT AND VEGETABLES" and one other, it will still ask you to correct them. This is because you have (something like) "FRUTT AND VEGETABLES" saved as well, so it can't decide whether the correct answer is that or "FRUIT AND VEGETABLES": both look like good candidates.

The fix is to remove the bad commodity names from your AutoSave.csv - then this problem will go away. A commodity rename feature, like the station rename feature, is issue #17...
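The ambiguity described above can be reproduced with a small Python sketch. It is an assumption-laden illustration, not the application's code: `candidates`, `max_distance` and the two name lists are hypothetical, and the OCR reading "FRU1T AND VEGETABLES" is invented for the example.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insert, delete and substitute."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,
                current[j - 1] + 1,
                previous[j - 1] + (ca != cb),
            ))
        previous = current
    return previous[-1]


def candidates(ocr_text, known, max_distance=2):
    """Every known name within max_distance edits of the OCR text."""
    return sorted(
        name for name in known
        if levenshtein(ocr_text, name) <= max_distance
    )


hard_coded = ["FRUIT AND VEGETABLES", "HYDROGEN FUEL"]  # illustrative subset
saved = ["FRUTT AND VEGETABLES"]  # a bad name previously written to the CSV

# With the bad saved name present there are two equally good candidates,
# so a program in this position would have to ask the user.
print(candidates("FRU1T AND VEGETABLES", hard_coded + saved))

# Once the bad name is removed, only one candidate remains.
print(candidates("FRU1T AND VEGETABLES", hard_coded))
```

In this sketch both names are one edit away from the OCR text, which is why neither can be preferred automatically until the misspelled entry is gone.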

@Lknechtli
Contributor Author

Awesome. Seems to be working much better now.
