Add rare typo hep->heap, help,
#3461
85cfdef into codespell-project:master
Ahh! HEP stands for High Energy Physics, and is extremely common, such as in https://scikit-hep.org, https://iris-hep.org, etc. The Scientific-Python developer guidelines were originally the Scikit-HEP developer pages, etc. Just noticed this in yesterday's release.
Actually, hep is also an entry in the OED. I'm not sure how to handle this. Perhaps the rare dictionary should not be selected by default.
How about removing this, and just leaving all the other ones (like heping, which is not valid for either HEP or "hep" the word)?
It could lead to false negatives in other contexts.
Many correctly spelled words could be a misspelling of another word, but those aren't listed. Forgetting the "n" in "many" gives "may", but "may" isn't listed as a misspelled word, for example. I thought the idea was to list only words that are not valid. "hep" is a major branch of physics that has had a lot of impact in programming (at least in C++ and Python), and an English word as well. HEP is used in 40+ repos across the Scikit-HEP project and in other HEP-related projects, as well as in lots of Python packaging repos, etc.; all told, I'm likely to have to ignore this now in 60+ repos. That's why I was wondering if it could have just been left out, as older versions didn't have this. (I'd also be fine if "hep" was invalid but "HEP" was valid, by the way, which doesn't help the English usage but does fix the HEP case.)
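The "HEP valid, hep invalid" idea at the end of this comment can be illustrated with a toy checker. This is only a sketch of the suggestion, not how codespell is implemented: the `RARE_TYPOS` table, `check_word`, `check_text`, and the whitespace tokenization are all hypothetical names and simplifications made up for the example.

```python
# Hypothetical toy dictionary: lowercase typo -> suggested corrections.
# This is NOT codespell's data structure, just an illustration.
RARE_TYPOS = {"hep": ["heap", "help"]}


def check_word(word, typos=RARE_TYPOS):
    """Return suggestions for `word`, or an empty list if it is accepted.

    Assumption for this sketch: a token written entirely in capitals
    (such as "HEP") is treated as an acronym and never flagged.
    """
    if len(word) > 1 and word.isupper():
        return []
    return typos.get(word.lower(), [])


def check_text(text):
    """Map each flagged word in `text` to its suggested corrections."""
    flagged = {}
    for token in text.split():
        # Strip surrounding punctuation; real tools tokenize differently.
        word = token.strip(".,;:!?()\"'")
        suggestions = check_word(word)
        if suggestions:
            flagged[word] = suggestions
    return flagged
```

Under these assumptions, `check_text("HEP tools need hep")` flags only the lowercase `hep`. As the comment notes, this would spare the acronym but would still flag "hep" used as an English word.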
I'd like to emphasize how common "HEP" is as a term. From my physics background, I can say that it's one of the major subdivisions of the field: 4 of the 13 physics categories in the preprint archive (https://arxiv.org) are "High Energy Physics," but the project started at Los Alamos, so that's a bias. Are there some quantitative ways to say how common this is? Well, in Google Books, "HEP" closely tracks its meaning, "High Energy Physics", and although the peak use of that term was in the late 20th century, it's on the same scale as phrases like "quantum mechanics." (The term is giving way to "particle physics," since that's more inclusive of experiments that don't use colliders, not literally "high energy.") In Google searches, "HEP" is much more common than "quantum mechanics" (a lot to type!), but within a factor of 3 of just the word "quantum" by itself. Maybe more important for typos, specifically "hep" when "help" was intended, is that "hep" is often included in filenames in GitHub. A typo in a filename is much less likely (to survive scrutiny) than a typo in the text itself. In GitHub code searches,
which is 4%. There are also a lot of repositories with "hep" in their name (1% compared to "help"). For a comparison between a subdivision of a scientific discipline versus a common English word, that's a lot! I bet there are ways of determining how often people type "hep" when they mean "help" (spellchecker logs?). How rare is this "rare typo"? What's the ROC curve threshold for our tolerance of type 1 and type 2 errors on this? Because it would be quite annoying to have to patch all HEP-related software with a rule that counters this rule.
Note that I am not the maintainer. @larsoner Do not hesitate to jump in. I am fully aware that HEP is a major field of physics. Yet, for someone outside of physics it's yet another acronym and I wouldn't be surprised if any (short) typo was an acronym. The rule of thumb I've been using so far is to:
That's perhaps an excessive simplification, but words that are not part of these dictionaries are not taken into account. In this specific case:
Anyway, I am not hell-bent on refusing this change - and again, I am not the maintainer. I just don't feel like making the decision. Why not submit a PR? Someone else, perhaps @larsoner, will have a look.
I'm okay with reverting the addition to At some point maybe we don't want @jpivarski, want to open a PR? I could look, merge, and cut a quick release.
Random slightly drive-by thoughts:
I appreciate it's a balance, but personally my worry would be that if you disable that dictionary by default, most people will just run with the defaults and won't get the benefit, whereas it's pretty trivial to remove the dictionary if you find you're getting too many false positives (though does it need better signposting in the log messages?). FWIW, for my domain-specific typos, I've just added them to an ignore list (I think thead and wronly have subsequently been moved to code, given they impact anyone there):
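Separately from the commenter's own ignore list (which is not reproduced here), the two workarounds mentioned in this thread, ignoring a word and dropping the `rare` dictionary, can be sketched as a small helper that builds a codespell command line. The helper name `codespell_command` and its parameters are hypothetical; it relies only on codespell's `--ignore-words-list` and `--builtin` options, and how the ignored word's case is matched may depend on the codespell version.

```python
def codespell_command(paths, ignore_words=(), builtin=None):
    """Build an argument list for invoking codespell (hypothetical helper).

    ignore_words: words codespell should not report, e.g. ["hep"].
    builtin: built-in dictionaries to enable, e.g. ["clear"] to leave
             out the "rare" dictionary discussed in this thread.
    """
    cmd = ["codespell"]
    if ignore_words:
        # --ignore-words-list takes a comma-separated list of words.
        cmd += ["--ignore-words-list", ",".join(ignore_words)]
    if builtin:
        # --builtin takes a comma-separated list of dictionary names.
        cmd += ["--builtin", ",".join(builtin)]
    cmd += list(paths)
    return cmd


# Ignore only the contested word, keeping every default dictionary.
print(codespell_command(["src"], ignore_words=["hep"]))

# Alternatively, skip the "rare" dictionary altogether.
print(codespell_command(["."], builtin=["clear"]))
```

The resulting list can be passed to `subprocess.run` where codespell is installed. Ignoring the single word keeps the rest of the `rare` dictionary active, which is closer to the trade-off argued for in the comment above.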
It's also in Collins and Merriam-Webster. (I don't have access to Macquarie.)
Note that SCOWL (and friends) include a scale of rarity for words, extracted from Google Ngram viewer: By the way, this screen capture shows that
Found in Emacs.