plural inflections #2079

Open
DimitriPapadopoulos opened this issue Sep 23, 2021 · 3 comments
Labels
dictionary Changes to the dictionary

Comments

DimitriPapadopoulos commented Sep 23, 2021

The Python module inflect generates the plural inflections of English words.

This could be used to detect whether the plural inflection of a suggestion is available in the dictionary. If it is not, chances are the plural inflection of the associated misspelling is missing too.

This is convoluted, but applying inflect directly to the misspellings might not work well (to be tested) and would mean more words to look into.
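
For comparison, the direct approach mentioned above could be sketched as follows. This is only an illustration: `naive_plural` is a hypothetical stand-in for `inflect.engine().plural` so that the snippet runs without third-party modules, and the sample entries are made up for the example rather than read from the real dictionary files.

```python
def naive_plural(word):
    """Hypothetical stand-in for inflect.engine().plural; regular cases only."""
    if word.endswith(("s", "x", "z", "ch", "sh")):
        return word + "es"
    if word.endswith("y") and len(word) > 1 and word[-2] not in "aeiou":
        return word[:-1] + "ies"
    return word + "s"


def misspellings_without_plural(entries, pluralize=naive_plural):
    """Return the misspellings whose pluralized form is not itself an entry.

    entries maps each misspelling to its list of suggestions.
    """
    return sorted(key for key in entries if pluralize(key) not in entries)


# Made-up sample entries, for illustration only.
sample = {
    "infarction": ["infraction"],
    "infarctions": ["infractions"],
    "webage": ["webpage"],
}
print(misspellings_without_plural(sample))
```

Note that entries that are already plural are reported as well, which is one reason this variant may leave more words to look into.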

@DimitriPapadopoulos
Collaborator Author

See rationale here:
#2077 (comment)

DimitriPapadopoulos commented Sep 23, 2021

Here is a proof of concept:

#!/usr/bin/python3

import os

import inflect  # third-party module

DICTIONARY_DIR = "/my/path/codespell/codespell_lib/data"

# Map each suggestion to the list of misspellings that point to it.
backwards = {}

for dictionary in os.listdir(DICTIONARY_DIR):
    root, ext = os.path.splitext(dictionary)
    if root.startswith("dictionary") and ext == ".txt":
        path = os.path.join(DICTIONARY_DIR, dictionary)
        with open(path, encoding="utf-8") as f:
            for line in f:
                # Split on the first "->" only; skip lines that have none.
                key, sep, data = line.partition("->")
                if not sep:
                    continue

                key = key.strip().lower()

                for word in data.lower().split(","):
                    word = word.strip()
                    if word:
                        backwards.setdefault(word, []).append(key)


p = inflect.engine()

# Report each suggestion whose plural is not a suggestion in any dictionary.
for word in backwards:
    plural = p.plural(word)
    if plural not in backwards:
        print(word, backwards[word])

Most words are not nouns, and the difficult part is singling out the nouns. Anyway, approximately 1 in 10 of the reported words is a noun. For example, the following entries are missing from the dictionaries:
infarctions->infractions
aminators->animators, laminators,
reorganisations->reorganizations
webaservers->webservers, web servers, (I cannot find any occurrence in Google, except in a "valid" webaservers.net domain)
webages->webpages
weaponaries->weaponries

DimitriPapadopoulos commented Sep 23, 2021

To find nouns, see "Library to grammatically classify English words (nouns, verbs, adverbs, etc.)". The Python module nltk can be used for this.
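
One possible way to do this with nltk is a WordNet lookup; this is an assumption, since the comment does not say which part of nltk to use. In the sketch below, `wordnet_is_noun` and `keep_nouns` are hypothetical helper names. The WordNet predicate assumes nltk is installed and the corpus has been fetched with `nltk.download("wordnet")`, so the usage lines rely on a stub predicate instead.

```python
def wordnet_is_noun(word):
    """Return True if WordNet lists at least one noun sense for word.

    Assumes nltk is installed and the WordNet corpus has been downloaded.
    """
    from nltk.corpus import wordnet  # third-party, imported lazily

    return bool(wordnet.synsets(word, pos=wordnet.NOUN))


def keep_nouns(words, is_noun=wordnet_is_noun):
    """Keep only the words that the is_noun predicate accepts."""
    return [word for word in words if is_noun(word)]


# Usage with a stub predicate, so the sketch runs without nltk.
known_nouns = {"infraction", "animator", "webpage"}
print(keep_nouns(["infraction", "quickly", "webpage"], is_noun=known_nouns.__contains__))
```

Technical terms and multi-word suggestions may well be missing from WordNet, so such a filter would probably miss some nouns.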
