"to and fro" is correct #410
Sure, feel free to add it.
Thanks @larsoner. I couldn't find an existing mechanism to specify multi-word exceptions like this. Does it exist and I've missed it, or would it need to be added to handle this case?
Usually people just add the word to the list like:
Right @luzpaz?
(I don't think there is a multi-word mechanism as such; I think it's okay to just use this single-word one in the meantime.)
So I've just hit another one of these: preform->perform. In this case it actually was a typo, but "preform" is a valid word too. There are others, like "cristal".
Essentially, we currently have some entries where the misspelling is also listed as a valid correction. I guess there are two trains of thought here. The first is that this is nonsense and we should remove the entry. The second, more considered one is that "cristal" is most likely a misspelling, but in some (rare) circumstances you really do mean it (see also "fro", until we support multi-word corrections #255). What's the general feeling? I haven't used interactive mode, but a "did you really mean?" prompt there, and an option in manual/automatic mode to skip/ignore all such potential typos, seem like they might be sensible. Essentially, treat likely typos differently from definite typos.
That is more or less what these entries seem to do. I agree we could add a parameter to be more suggestive (the default) or less suggestive (if the "error" is itself in the list of corrections, don't prompt or report it).
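A minimal Python sketch of that parameter, assuming dictionary lines of the form `misspelling->correction1, correction2,` (the optional reason field is ignored here). The names `Entry`, `parse_line`, `should_report` and the `suggestive` flag are hypothetical, not codespell's actual code. An entry counts as a "likely" typo when its misspelling also appears among its own corrections, and such entries are only reported in suggestive mode.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Entry:
    """One dictionary line: a misspelling and its suggested corrections."""
    misspelling: str
    corrections: Tuple[str, ...]

    @property
    def is_likely(self) -> bool:
        # The "misspelling" is itself listed as a valid correction, so it
        # may be intentional (e.g. "fro" in "to and fro").
        return self.misspelling in self.corrections


def parse_line(line: str) -> Entry:
    """Parse a 'word->fix1, fix2,' line (format assumed, no reason field)."""
    key, data = line.strip().split("->", 1)
    fixes = tuple(f.strip().lower() for f in data.split(",") if f.strip())
    return Entry(key.strip().lower(), fixes)


def should_report(entry: Entry, suggestive: bool = True) -> bool:
    # suggestive=True reports every entry (the current behaviour);
    # suggestive=False stays quiet on "likely" typos only.
    return suggestive or not entry.is_likely
```

With a hypothetical entry `fro->for, from, fro,`, `should_report(parse_line(...), suggestive=False)` returns `False`, while a definite typo such as `teh->the` is still reported in both modes.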
I guess one of the things I've always liked about Codespell is that the dictionary is curated, rather than being a list of valid words, and hence it doesn't normally trip up on valid but obscure/technical words. It feels to me like it goes against that ethos when valid (although admittedly mostly rarely used) words are added to the dictionary as misspellings. That holds even if the "typo" is also listed as a valid correction, and even more so when it's not.
We probably need a new argument for this, which (I agree) should disable these by default.
If so, any volunteers to implement this?
Would a bitmask make more sense, in case someone doesn't want one of the future strictness levels? I'll have to pass on implementing it for now. It probably also needs a dictionary test that we don't just have
Yes, the idea is to use binary values, so it would be a bitmask (it's just a trivial, future-compatible one for now).
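For the bitmask idea, a sketch using Python's standard `enum.IntFlag`; the `Strictness` class and its level names are hypothetical. Each level occupies its own bit, so a user could pass e.g. `3` to enable both current levels, or clear a single bit to opt out of one level while still getting any others.

```python
from enum import IntFlag


class Strictness(IntFlag):
    # Hypothetical levels; each one is its own bit so new levels
    # (4, 8, ...) can be added later without renumbering.
    NONE = 0
    LIKELY = 1  # misspelling is also listed as a valid correction
    RARE = 2    # entries that are only rarely typos


def enabled(mask: Strictness, level: Strictness) -> bool:
    """True if the given level's bit is set in the mask."""
    return bool(mask & level)


def disable(mask: Strictness, level: Strictness) -> Strictness:
    """Clear one level's bit, leaving all other levels untouched."""
    return Strictness(int(mask) & ~int(level))
```

Plain integers on the command line map directly onto this: `Strictness(3)` is `LIKELY | RARE`, and `disable(Strictness(3), Strictness.LIKELY)` leaves only `RARE`.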
Not that this covers multi-word examples, but the other rare stuff can go in https://github.com/codespell-project/codespell/blob/master/codespell_lib/data/dictionary_rare.txt
I had to overcome this limitation, as observed in https://framagit.org/medoc92/recoll/-/merge_requests/23#note_1999939, via
I think it would be valuable to collect and support such phrases (I can't recall any others at the moment, but I remember hitting them), which should be whitelisted although the individual words (
A good idea indeed, however:
Item 1 looks like the most complex to address, but then I haven't done my homework. Nowadays, I am not certain it is worth starting to support n-grams without using deep learning to process them.
Couldn't it pretty much just be "pre-feed ignore-regex with all the phrases surrounded with
Do you mean you would add an invisible backspace to your text, just to please codespell?
No, I mean that
So codespell would apply a limited set of regexes for very common expressions like these before splitting the text into words, removing the matched words from further checks. I suspect this would have a perceptible impact on performance, but I can't tell how the maintainers would react to it.
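A rough sketch of that pre-filtering step, assuming a hypothetical phrase allow-list (`PHRASES`) and a toy misspelling map (`MISSPELLINGS`); this is not codespell's current code path. Matched phrases are overwritten with spaces before tokenizing, so "fro" inside "to and fro" is never looked up, while the column offsets of the remaining words stay unchanged.

```python
import re

# Hypothetical allow-list of phrases whose words are valid only together.
PHRASES = ["to and fro"]

# Toy stand-in for codespell's dictionary: misspelling -> suggestion.
MISSPELLINGS = {"fro": "for", "teh": "the"}


def build_phrase_regex(phrases):
    """One case-insensitive regex matching any phrase as whole words."""
    # Longest phrases first; escape each word and allow any whitespace between.
    alts = [r"\s+".join(map(re.escape, p.split()))
            for p in sorted(phrases, key=len, reverse=True)]
    return re.compile(r"\b(?:" + "|".join(alts) + r")\b", re.IGNORECASE)


PHRASE_RE = build_phrase_regex(PHRASES)
WORD_RE = re.compile(r"[A-Za-z']+")


def check_line(line):
    """Return (column, word, suggestion) for each misspelling outside phrases."""
    # Blank out allowed phrases with spaces of the same length.
    masked = PHRASE_RE.sub(lambda m: " " * len(m.group(0)), line)
    hits = []
    for m in WORD_RE.finditer(masked):
        word = m.group(0)
        fix = MISSPELLINGS.get(word.lower())
        if fix:
            hits.append((m.start(), word, fix))
    return hits
```

Here `check_line("The robot moved to and fro.")` reports nothing, while a stray "fro" elsewhere is still flagged. The extra cost is one regex pass per line, which is the performance concern raised above.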
codespell suggests replacing "fro" with "for". Can we have an exception for the phrase "to and fro"?
https://en.wiktionary.org/wiki/to_and_fro