plural inflections #2079
Comments
See rationale here:
Here is a proof of concept:

    #!/usr/bin/python3
    import os

    import inflect

    DICTIONARY_DIR = "/my/path/codespell/codespell_lib/data"

    # Map each suggestion to the misspellings that point to it.
    backwards = {}
    for dictionary in os.listdir(DICTIONARY_DIR):
        root, ext = os.path.splitext(dictionary)
        if root.startswith("dictionary") and ext == ".txt":
            dictionary = os.path.join(DICTIONARY_DIR, dictionary)
            with open(dictionary, encoding="utf-8") as f:
                for line in f:
                    # Split on the first "->" only; skip lines without one.
                    key, sep, data = line.partition("->")
                    if not sep:
                        continue
                    key = key.strip().lower()
                    for word in data.lower().split(","):
                        word = word.strip()
                        if word:
                            backwards.setdefault(word, []).append(key)

    # Report suggestions whose plural is not itself a suggestion.
    p = inflect.engine()
    for word in backwards:
        plural = p.plural(word)
        if plural not in backwards:
            print(word, backwards[word])

Most reported words are not nouns, and the difficult part is singling out the nouns. Anyway, approximately 1 in 10 of the reported words is a noun. For example, the following are missing from the dictionaries:
To find nouns, see "Library to grammatically classify English words (nouns, verbs, adverbs, etc)". Use the Python module nltk.
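A minimal sketch of how the noun filter could look, assuming a part-of-speech tagger with the same call shape as `nltk.pos_tag` (a list of tokens in, a list of `(token, tag)` pairs out, with Penn Treebank noun tags starting with `NN`). The helper name `keep_nouns` is hypothetical, and tagging isolated words without sentence context may be unreliable, so the result would still need a manual review.

```python
from typing import Callable, Iterable, List, Sequence, Tuple

# Any callable shaped like nltk.pos_tag: tokens in, (token, tag) pairs out.
Tagger = Callable[[Sequence[str]], List[Tuple[str, str]]]


def keep_nouns(words: Iterable[str], tagger: Tagger) -> List[str]:
    """Return the words that the tagger marks as nouns (hypothetical helper)."""
    nouns = []
    for word in words:
        # Tag each word on its own: dictionary entries have no sentence context.
        tag = tagger([word])[0][1]
        if tag.startswith("NN"):  # Penn Treebank noun tags: NN, NNS, NNP, NNPS
            nouns.append(word)
    return nouns
```

With nltk installed and its tagger data downloaded, `keep_nouns(backwards, nltk.pos_tag)` would reduce the proof-of-concept report to likely nouns. Passing the tagger as a parameter keeps the filter easy to try with a different classifier.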
The inflect module generates the plural inflection of English words.
This could be used to detect whether the plural inflection of a suggestion is present in the dictionary. If it is not, chances are that the plural inflection of the associated misspelling is missing too.
This is convoluted, but using inflect directly on misspellings might not work well (this remains to be tested) and would mean more words to look into.
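The idea can be sketched as a generator of candidate dictionary lines. This is an illustration under assumptions, not a tested approach: `candidate_plural_entries` is a hypothetical helper, `backwards` is assumed to map each suggestion to the misspellings that point to it, and pluralizing the misspelling itself is exactly the step that remains to be tested. The pluralizer is passed in as a plain function so that `inflect.engine().plural` or any alternative can be tried.

```python
from typing import Callable, Dict, Iterator, List


def candidate_plural_entries(
    backwards: Dict[str, List[str]],
    pluralize: Callable[[str], str],
) -> Iterator[str]:
    """Yield candidate "misspelling->correction" lines for missing plurals.

    backwards maps each suggestion to the misspellings that point to it.
    pluralize is any str -> str function, e.g. inflect.engine().plural.
    """
    for word, misspellings in sorted(backwards.items()):
        plural = pluralize(word)
        if plural in backwards:
            continue  # the plural is already a suggestion somewhere
        for misspelling in misspellings:
            # Untested assumption: pluralizing the misspelling itself
            # gives a plausible plural misspelling.
            yield f"{pluralize(misspelling)}->{plural}"
```

Each yielded line is only a candidate. Non-nouns and suggestions that are already plural would still produce noise, so the output is meant for manual review before anything is added to a dictionary.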