Add rare typo hep->heap, help, #3461

Merged
DimitriPapadopoulos merged 1 commit into codespell-project:master
Jun 17, 2024

Conversation

skangas
Collaborator

@skangas skangas commented Jun 16, 2024

Found in Emacs.
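For readers unfamiliar with the dictionary syntax in the PR title, here is a minimal Python sketch of how an entry such as `hep->heap, help,` can be read. It assumes that a comma on the right-hand side (including a trailing one) marks the entry as suggestion-only, so it is never applied automatically. The `Entry` type and `parse_entry` helper are hypothetical names for illustration, not codespell's own API.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    typo: str
    candidates: list[str]
    auto_fix: bool


def parse_entry(line: str) -> Entry:
    """Parse one 'typo->fix' dictionary line (illustrative helper, not codespell's parser)."""
    typo, _, rhs = line.strip().partition("->")
    # Split the right-hand side into candidate corrections, dropping empty pieces
    # left behind by a trailing comma.
    candidates = [c.strip() for c in rhs.split(",") if c.strip()]
    # Assumption: any comma on the right-hand side means "suggest only, never auto-fix".
    auto_fix = "," not in rhs
    return Entry(typo, candidates, auto_fix)


hep = parse_entry("hep->heap, help,")
lien = parse_entry("lien->line")
```

Under these assumptions `hep` carries two candidates and is suggestion-only, while a single-candidate entry such as `lien->line` would be eligible for automatic correction.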

@skangas skangas requested a review from peternewman as a code owner June 16, 2024 15:31
@skangas skangas added the dictionary Changes to the dictionary label Jun 16, 2024
@DimitriPapadopoulos DimitriPapadopoulos merged commit 85cfdef into codespell-project:master Jun 17, 2024
14 checks passed
@skangas skangas deleted the hep branch June 17, 2024 18:46
@henryiii

henryiii commented Jan 22, 2025

Ahh! HEP stands for High Energy Physics, and is extremely common, such as in https://scikit-hep.org, https://iris-hep.org, etc. The Scientific-Python developer guidelines were originally the Scikit-HEP developer pages, etc. Just noticed this in yesterday's release.

@DimitriPapadopoulos
Collaborator

DimitriPapadopoulos commented Jan 23, 2025

Actually, hep is also an entry in the OED.

I'm not sure how to handle this. Perhaps the rare dictionary should not be selected by default.

@henryiii

How about removing this, and just leaving all the other ones (like heping, which is not valid for either HEP or "hep" the word)?

@DimitriPapadopoulos
Collaborator

It could lead to false negatives in other contexts.

humaton pushed a commit to fedora-infra/forgejo-deployment-images that referenced this pull request Jan 24, 2025
This PR contains the following updates:

| Package | Type | Update | Change |
|---|---|---|---|
| [codespell](https://github.com/codespell-project/codespell) | dev | minor | `2.3.0` -> `2.4.0` |

---

### Release Notes

<details>
<summary>codespell-project/codespell (codespell)</summary>

### [`v2.4.0`](https://github.com/codespell-project/codespell/releases/tag/v2.4.0)

[Compare Source](codespell-project/codespell@v2.3.0...v2.4.0)

<!-- Release notes generated using configuration in .github/release.yml at main -->

#### What's Changed

-   Exclude bots from generated release notes by [@&#8203;hugovk](https://github.com/hugovk) in codespell-project/codespell#3432
-   Refactor: Move some code to new files for reuse by [@&#8203;nthykier](https://github.com/nthykier) in codespell-project/codespell#3434
-   Add `equipmnet->equipment` by [@&#8203;korverdev](https://github.com/korverdev) in codespell-project/codespell#3438
-   Set better project description by [@&#8203;mtelka](https://github.com/mtelka) in codespell-project/codespell#3435
-   Additional en-GB → en-US entries by [@&#8203;DimitriPapadopoulos](https://github.com/DimitriPapadopoulos) in codespell-project/codespell#3058
-   Consistent error messages by [@&#8203;DimitriPapadopoulos](https://github.com/DimitriPapadopoulos) in codespell-project/codespell#3440
-   Add 'driven' as 'drivin' variant by [@&#8203;korverdev](https://github.com/korverdev) in codespell-project/codespell#3441
-   More typos by [@&#8203;DimitriPapadopoulos](https://github.com/DimitriPapadopoulos) in codespell-project/codespell#3439
-   Add reusing misspelling and variants by [@&#8203;korverdev](https://github.com/korverdev) in codespell-project/codespell#3445
-   Add typos found in Emacs and elsewhere by [@&#8203;skangas](https://github.com/skangas) in codespell-project/codespell#3447
-   MAINT: Fix codecov by [@&#8203;larsoner](https://github.com/larsoner) in codespell-project/codespell#3451
-   Add typos found in GNU Guile by [@&#8203;skangas](https://github.com/skangas) in codespell-project/codespell#3448
-   Add corrections from Aspell (fix [#&#8203;3356](codespell-project/codespell#3356)) by [@&#8203;skangas](https://github.com/skangas) in codespell-project/codespell#3453
-   Add entries to dictionary_informal.txt by [@&#8203;skangas](https://github.com/skangas) in codespell-project/codespell#3456
-   Add rare typo `lien->line` by [@&#8203;skangas](https://github.com/skangas) in codespell-project/codespell#3460
-   Add rare typo `firs->first` by [@&#8203;skangas](https://github.com/skangas) in codespell-project/codespell#3459
-   Add rare typo `hep->heap, help,` by [@&#8203;skangas](https://github.com/skangas) in codespell-project/codespell#3461
-   Add rare typo `brunch->branch` by [@&#8203;skangas](https://github.com/skangas) in codespell-project/codespell#3458
-   Add corrections from `typos` dictionary (A1) by [@&#8203;skangas](https://github.com/skangas) in codespell-project/codespell#3450
-   Add timestmp->timestamp and its variations by [@&#8203;fkmy](https://github.com/fkmy) in codespell-project/codespell#3464
-   Add .venv to .gitignore by [@&#8203;skangas](https://github.com/skangas) in codespell-project/codespell#3466
-   Only accept documented choices after `-i` and `-q` by [@&#8203;DimitriPapadopoulos](https://github.com/DimitriPapadopoulos) in codespell-project/codespell#3344
-   Move assertIn to the code dictionary as it's a Python test function by [@&#8203;peternewman](https://github.com/peternewman) in codespell-project/codespell#3469
-   Add some more typos by [@&#8203;skangas](https://github.com/skangas) in codespell-project/codespell#3468
-   Add some typos from Emacs by [@&#8203;skangas](https://github.com/skangas) in codespell-project/codespell#3471
-   Add corrections from `typos` dictionary (A2) by [@&#8203;skangas](https://github.com/skangas) in codespell-project/codespell#3454
-   Add variations for words starting with `non-` by [@&#8203;skangas](https://github.com/skangas) in codespell-project/codespell#3467
-   Update "Using a config file" README entry  by [@&#8203;oddhack](https://github.com/oddhack) in codespell-project/codespell#3478
-   Add two choices for verision typo fix by [@&#8203;yarikoptic](https://github.com/yarikoptic) in codespell-project/codespell#3252
-   fix typo by [@&#8203;spaette](https://github.com/spaette) in codespell-project/codespell#3479
-   \[pre-commit.ci] pre-commit manual update (ruff 0.5.0) by [@&#8203;DimitriPapadopoulos](https://github.com/DimitriPapadopoulos) in codespell-project/codespell#3481
-   Add trusthworth(y|iness)->trustworth(y|iness) correction. by [@&#8203;cfi-gb](https://github.com/cfi-gb) in codespell-project/codespell#3482
-   Add thrustworth(y|iness)->trustworth(y|iness). by [@&#8203;cfi-gb](https://github.com/cfi-gb) in codespell-project/codespell#3483
-   New typos by [@&#8203;gforcada](https://github.com/gforcada) in codespell-project/codespell#3484
-   add enrol->enroll to en-GB to en-US dictionary by [@&#8203;slitvackwinkler](https://github.com/slitvackwinkler) in codespell-project/codespell#3485
-   Add --ignore-multiline-regex option. by [@&#8203;julian-smith-artifex-com](https://github.com/julian-smith-artifex-com) in codespell-project/codespell#3476
-   Add spelling correction for separately. by [@&#8203;cfi-gb](https://github.com/cfi-gb) in codespell-project/codespell#3487
-   Start testing with Python 3.13 by [@&#8203;DimitriPapadopoulos](https://github.com/DimitriPapadopoulos) in codespell-project/codespell#3488
-   Missing typo in dictionary by [@&#8203;matlupi](https://github.com/matlupi) in codespell-project/codespell#3497
-   Add enterpris->enterprise spelling correction. by [@&#8203;cfi-gb](https://github.com/cfi-gb) in codespell-project/codespell#3498
-   Add spelling correction for proir and variant. by [@&#8203;cfi-gb](https://github.com/cfi-gb) in codespell-project/codespell#3499
-   fix issue [#&#8203;3220](codespell-project/codespell#3220): interactive model & write-changes by [@&#8203;MercuryDemo](https://github.com/MercuryDemo) in codespell-project/codespell#3340
-   infastructure typo (15.6k hits on github) by [@&#8203;yarikoptic](https://github.com/yarikoptic) in codespell-project/codespell#3501
-   Add several spelling corrections by [@&#8203;luzpaz](https://github.com/luzpaz) in codespell-project/codespell#3500
-   Add "releaseds->released, releases," spelling correction by [@&#8203;cfi-gb](https://github.com/cfi-gb) in codespell-project/codespell#3503
-   Several spelling suggestions by [@&#8203;mdeweerd](https://github.com/mdeweerd) in codespell-project/codespell#3504
-   Add favilitate->facilitate and its variations by [@&#8203;luzpaz](https://github.com/luzpaz) in codespell-project/codespell#3505
-   Add seemd -> seemed by [@&#8203;fishilico](https://github.com/fishilico) in codespell-project/codespell#3508
-   feat: add typo spelling for capabilities by [@&#8203;IndexSeek](https://github.com/IndexSeek) in codespell-project/codespell#3507
-   entirerly -> entirely by [@&#8203;matlupi](https://github.com/matlupi) in codespell-project/codespell#3512
-   Add stuty -> study and variations by [@&#8203;fishilico](https://github.com/fishilico) in codespell-project/codespell#3514
-   readibly->readably by [@&#8203;claydugo](https://github.com/claydugo) in codespell-project/codespell#3518
-   Add clapse->collapse to dictionary.txt by [@&#8203;Runtemund](https://github.com/Runtemund) in codespell-project/codespell#3513
-   fix(rare): remove loath->loathe, as loath is as common as loathe by [@&#8203;corneliusroemer](https://github.com/corneliusroemer) in codespell-project/codespell#3523
-   Add variations of 'symetriy' typo by [@&#8203;luzpaz](https://github.com/luzpaz) in codespell-project/codespell#3528
-   Add distriute->distribute (and variations) to dictionary.txt by [@&#8203;corneliusroemer](https://github.com/corneliusroemer) in codespell-project/codespell#3517
-   Some extra spelling suggestions for the dictionnary (aumatically, ...) by [@&#8203;mdeweerd](https://github.com/mdeweerd) in codespell-project/codespell#3516
-   More typos by [@&#8203;DimitriPapadopoulos](https://github.com/DimitriPapadopoulos) in codespell-project/codespell#3465
-   Add a spelling correction by [@&#8203;fxlb](https://github.com/fxlb) in codespell-project/codespell#3533
-   Move `hom` to code dictionary by [@&#8203;skangas](https://github.com/skangas) in codespell-project/codespell#3490
-   Add `realtd->related`, `prediced->predicted` by [@&#8203;janosh](https://github.com/janosh) in codespell-project/codespell#3536
-   Handle CTRL+C by showing a better message by [@&#8203;mwtoews](https://github.com/mwtoews) in codespell-project/codespell#3511
-   Move crate->create to code dictionary by [@&#8203;luzpaz](https://github.com/luzpaz) in codespell-project/codespell#3537
-   More typos by [@&#8203;DimitriPapadopoulos](https://github.com/DimitriPapadopoulos) in codespell-project/codespell#3535
-   Add cirumvent -> circumvent suggestion by [@&#8203;algonell](https://github.com/algonell) in codespell-project/codespell#3540
-   More typos found in numpy by [@&#8203;DimitriPapadopoulos](https://github.com/DimitriPapadopoulos) in codespell-project/codespell#3541
-   Add spelling correction for appliance and variants. by [@&#8203;cfi-gb](https://github.com/cfi-gb) in codespell-project/codespell#3544
-   Workaround for Python issue by [@&#8203;DimitriPapadopoulos](https://github.com/DimitriPapadopoulos) in codespell-project/codespell#3546
-   Partially undo [`293bec1`](codespell-project/codespell@293bec1) / [#&#8203;3465](codespell-project/codespell#3465) by [@&#8203;DimitriPapadopoulos](https://github.com/DimitriPapadopoulos) in codespell-project/codespell#3548
-   master → main by [@&#8203;DimitriPapadopoulos](https://github.com/DimitriPapadopoulos) in codespell-project/codespell#3555
-   Add a spelling correction by [@&#8203;fxlb](https://github.com/fxlb) in codespell-project/codespell#3552
-   Add spelling corrections for remote and variants. by [@&#8203;cfi-gb](https://github.com/cfi-gb) in codespell-project/codespell#3551
-   Add spelling correction for revert and variants. by [@&#8203;cfi-gb](https://github.com/cfi-gb) in codespell-project/codespell#3553
-   workdlow->workflow by [@&#8203;peterjc](https://github.com/peterjc) in codespell-project/codespell#3556
-   More typos found in Scipy by [@&#8203;DimitriPapadopoulos](https://github.com/DimitriPapadopoulos) in codespell-project/codespell#3547
-   Update ruff settings by [@&#8203;DimitriPapadopoulos](https://github.com/DimitriPapadopoulos) in codespell-project/codespell#3558
-   Improve config file documentation in README by [@&#8203;DimitriPapadopoulos](https://github.com/DimitriPapadopoulos) in codespell-project/codespell#3495
-   Support Python 3.13 by [@&#8203;DimitriPapadopoulos](https://github.com/DimitriPapadopoulos) in codespell-project/codespell#3560
-   feat: add typo for override and overridden by [@&#8203;IndexSeek](https://github.com/IndexSeek) in codespell-project/codespell#3564
-   feat: add strring entry for string and stirring by [@&#8203;IndexSeek](https://github.com/IndexSeek) in codespell-project/codespell#3565
-   Add spelling correction for credential and variant. by [@&#8203;cfi-gb](https://github.com/cfi-gb) in codespell-project/codespell#3567
-   Typo from filesystem_spec by [@&#8203;DimitriPapadopoulos](https://github.com/DimitriPapadopoulos) in codespell-project/codespell#3559
-   Add zarr as a fix for zar. by [@&#8203;yarikoptic](https://github.com/yarikoptic) in codespell-project/codespell#3568
-   Add multiple spellings by [@&#8203;mdeweerd](https://github.com/mdeweerd) in codespell-project/codespell#3569
-   acceleratored->accelerated by [@&#8203;SpookyYomo](https://github.com/SpookyYomo) in codespell-project/codespell#3571
-   Add correction for seens->seems, seen, scenes, by [@&#8203;cfi-gb](https://github.com/cfi-gb) in codespell-project/codespell#3572
-   Add generaml->general spelling correction. by [@&#8203;cfi-gb](https://github.com/cfi-gb) in codespell-project/codespell#3573
-   Add forach->foreach, orach, spelling correction by [@&#8203;cfi-gb](https://github.com/cfi-gb) in codespell-project/codespell#3576
-   Add spelling correction for leadin. by [@&#8203;cfi-gb](https://github.com/cfi-gb) in codespell-project/codespell#3578
-   Minor typo fix in README  by [@&#8203;DanielYang59](https://github.com/DanielYang59) in codespell-project/codespell#3580
-   Add pauload->payload and friend by [@&#8203;peternewman](https://github.com/peternewman) in codespell-project/codespell#3581
-   These typos do not belong to code, do they? by [@&#8203;DimitriPapadopoulos](https://github.com/DimitriPapadopoulos) in codespell-project/codespell#3470
-   Add spelling correction for "agos". by [@&#8203;cfi-gb](https://github.com/cfi-gb) in codespell-project/codespell#3586
-   If `writeable` is OK, so is `overwriteable` by [@&#8203;DimitriPapadopoulos](https://github.com/DimitriPapadopoulos) in codespell-project/codespell#3593
-   Add `atfer`->`after` and variations by [@&#8203;fishilico](https://github.com/fishilico) in codespell-project/codespell#3598
-   Add poduce->produce and friends by [@&#8203;peternewman](https://github.com/peternewman) in codespell-project/codespell#3599
-   Add variations for correction: reurn->return by [@&#8203;jdufresne](https://github.com/jdufresne) in codespell-project/codespell#3600
-   Add spelling correction for various variants of everything. by [@&#8203;cfi-gb](https://github.com/cfi-gb) in codespell-project/codespell#3601
-   Add "sems->seems, stems, semis, sens, seams," correction by [@&#8203;cfi-gb](https://github.com/cfi-gb) in codespell-project/codespell#3603
-   Add replacements for complasance and complisance by [@&#8203;TheGiraffe3](https://github.com/TheGiraffe3) in codespell-project/codespell#3597
-   Add typos found in software projects by [@&#8203;mwtoews](https://github.com/mwtoews) in codespell-project/codespell#3595
-   Add distinghish->distinguish and variations by [@&#8203;fishilico](https://github.com/fishilico) in codespell-project/codespell#3604
-   docs: typo in an example by [@&#8203;12rambau](https://github.com/12rambau) in codespell-project/codespell#3610
-   Add typos found in various software projects by [@&#8203;luzpaz](https://github.com/luzpaz) in codespell-project/codespell#3612
-   Add spelling correction for denila and variant. by [@&#8203;cfi-gb](https://github.com/cfi-gb) in codespell-project/codespell#3616
-   Remove socioeconomic entries by [@&#8203;isaak654](https://github.com/isaak654) in codespell-project/codespell#3353
-   Run pytest GitHub Action on an ARM processor by [@&#8203;cclauss](https://github.com/cclauss) in codespell-project/codespell#3619

#### New Contributors

-   [@&#8203;nthykier](https://github.com/nthykier) made their first contribution in codespell-project/codespell#3434
-   [@&#8203;mtelka](https://github.com/mtelka) made their first contribution in codespell-project/codespell#3435
-   [@&#8203;fkmy](https://github.com/fkmy) made their first contribution in codespell-project/codespell#3464
-   [@&#8203;oddhack](https://github.com/oddhack) made their first contribution in codespell-project/codespell#3478
-   [@&#8203;spaette](https://github.com/spaette) made their first contribution in codespell-project/codespell#3479
-   [@&#8203;slitvackwinkler](https://github.com/slitvackwinkler) made their first contribution in codespell-project/codespell#3485
-   [@&#8203;julian-smith-artifex-com](https://github.com/julian-smith-artifex-com) made their first contribution in codespell-project/codespell#3476
-   [@&#8203;Runtemund](https://github.com/Runtemund) made their first contribution in codespell-project/codespell#3513
-   [@&#8203;corneliusroemer](https://github.com/corneliusroemer) made their first contribution in codespell-project/codespell#3523
-   [@&#8203;mwtoews](https://github.com/mwtoews) made their first contribution in codespell-project/codespell#3511
-   [@&#8203;algonell](https://github.com/algonell) made their first contribution in codespell-project/codespell#3540
-   [@&#8203;peterjc](https://github.com/peterjc) made their first contribution in codespell-project/codespell#3556
-   [@&#8203;SpookyYomo](https://github.com/SpookyYomo) made their first contribution in codespell-project/codespell#3571
-   [@&#8203;DanielYang59](https://github.com/DanielYang59) made their first contribution in codespell-project/codespell#3580
-   [@&#8203;TheGiraffe3](https://github.com/TheGiraffe3) made their first contribution in codespell-project/codespell#3597

**Full Changelog**: codespell-project/codespell@v2.3.0...v2.4.0

</details>

---

### Configuration

📅 **Schedule**: Branch creation - "* 0-3 * * *" (UTC), Automerge - "* 0-3 * * *" (UTC).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update again.

---


This PR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate).

Reviewed-on: https://codeberg.org/forgejo/forgejo/pulls/6652
Reviewed-by: Gusted <[email protected]>
Co-authored-by: Renovate Bot <[email protected]>
Co-committed-by: Renovate Bot <[email protected]>
@henryiii

henryiii commented Jan 27, 2025

Many correctly spelled words could be a misspelling for another word, but those aren't listed. Forgetting the "n" in "many" gives "may", but "may" isn't listed as a misspelled word, for example. I thought the idea was to just list ones that are not valid words. "hep" is a major branch of physics that has had a lot of impact in programming (at least in C++ and Python), and an English word as well. HEP is used in 40+ repos across the Scikit-HEP project and in other HEP-related projects, as well as in lots of Python packaging repos, etc.; all told I'm likely to have to ignore this now in 60+ repos. That's why I was wondering if it could have just been left out, as older versions didn't have this.

(I'd also be fine if "hep" was invalid but "HEP" was valid, by the way, which doesn't help the English usage but does fix the HEP case. At least some of the time: not scikit-hep.org links or links to scikit-hep repos, but at least the normal usages. Never mind, it ignores scikit-hep and only triggers on standalone hep. That helps, especially in the lower-case case.)
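The behavior described above (standalone "hep" and "HEP" are flagged, "scikit-hep" is not) can be reproduced with a small Python sketch. It assumes a word pattern that keeps hyphenated strings together as one token and a case-insensitive dictionary lookup. The `WORD_RE` pattern, the `RARE` mapping, and the `flagged` helper are illustrative assumptions, not codespell's actual implementation.

```python
import re

# Assumption: hyphens and apostrophes are part of a word, so "scikit-hep" is one token.
WORD_RE = re.compile(r"[\w\-'’]+")

# Hypothetical one-entry stand-in for the rare dictionary.
RARE = {"hep": "heap, help"}


def flagged(text: str) -> list[str]:
    """Return tokens whose lowercase form appears in the dictionary."""
    return [word for word in WORD_RE.findall(text) if word.lower() in RARE]


hits = flagged("See https://scikit-hep.org for HEP tools; hep is also a word.")
```

Here `hits` contains only the standalone `HEP` and `hep`. The hyphenated `scikit-hep` never matches because the lookup sees the whole token. A case-sensitive lookup would be one way to accept `HEP` while still flagging `hep`, at the cost of missing capitalized typos.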

@jpivarski
Contributor

jpivarski commented Jan 28, 2025

I'd like to emphasize how common "HEP" is as a term. From my physics background, I can say that it's one of the major subdivisions of the field: 4 of the 13 physics categories in the preprint archive (https://arxiv.org) are "High Energy Physics," but the project started at Los Alamos, so that's a bias.

Are there some quantitative ways to say how common this is? Well, in Google Books, "HEP" closely tracks its meaning, "High Energy Physics", and although the peak use of that term was in the late 20th century, it's on the same scale as phrases like "quantum mechanics." (The term is giving way to "particle physics," since that's more inclusive of experiments that don't use colliders, not literally "high energy.")

[image: Google Books usage chart]

In Google searches, "HEP" is much more common than "quantum mechanics" (a lot to type!), but within a factor of 3 of just the word "quantum" by itself.

[image: Google search comparison chart]

Maybe more important for typos, specifically "hep" when "help" was intended, is that "hep" is often included in filenames in GitHub. A typo in a filename is much less likely (to survive scrutiny) than a typo in the text itself.

In GitHub code searches, file paths containing "hep" come to 4% of those containing "help". There are also a lot of repositories with "hep" in their name (1% compared to "help"). For a comparison between a subdivision of a scientific discipline versus a common English word, that's a lot!

I bet there are ways of determining how often people type "hep" when they mean "help" (spellchecker logs?). How rare is this "rare typo"? What's the ROC curve threshold for our tolerance of type 1 and type 2 errors on this? Because it would be quite annoying to have to patch all HEP-related software with a rule that counters this rule.

@DimitriPapadopoulos
Collaborator

DimitriPapadopoulos commented Jan 28, 2025

Note that I am not the maintainer. @larsoner Do not hesitate to jump in.

I am fully aware that HEP is a major field of physics. Yet, for someone outside of physics it's yet another acronym and I wouldn't be surprised if any (short) typo was an acronym. The rule of thumb I've been using so far is to:

That's perhaps an excessive simplification, but words that are not part of these dictionaries are not taken into account. In this specific case:

  • Unless I am mistaken, except for the OED, hep is not a word.
  • I would argue that you need to add HEP to SCOWL (And Friends), the open source reference dictionary.
  • Nevertheless, since hep is in the OED, it has been added to the rare dictionary.
  • Again, I think the best solution would be to remove the rare dictionary from the default dictionaries.

Anyway, I am not hell-bent on refusing this change, and again I am not the maintainer. I just don't feel like making the decision. Why not submit a PR? Someone else, perhaps @larsoner, will have a look.

@larsoner
Member

I'm okay with reverting the addition to rare. Seems like it'll be easier for people to add it to their own custom dict in the rare cases it's helpful.

At some point maybe we don't want rare in the defaults, but that's a bigger change.

@jpivarski want to open a PR? I could look, merge, and cut a quick release.
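For anyone who still wants `hep` flagged after a revert, the custom-dictionary route mentioned above could look roughly like this Python sketch. It writes a one-line dictionary file and builds (but does not run) a command line. It assumes codespell's `-D`/`--dictionary` option, where `-` stands for the default dictionary. The file name and the `write_custom_dictionary` helper are hypothetical.

```python
import tempfile
from pathlib import Path


def write_custom_dictionary(directory: Path, entries: dict[str, str]) -> Path:
    """Write 'typo->corrections' lines to a file (illustrative helper)."""
    path = directory / "custom_dictionary.txt"  # hypothetical file name
    lines = [f"{typo}->{fixes}" for typo, fixes in entries.items()]
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path


with tempfile.TemporaryDirectory() as tmp:
    custom = write_custom_dictionary(Path(tmp), {"hep": "heap, help,"})
    content = custom.read_text(encoding="utf-8")
    # Assumption: "-D -" keeps the default dictionary and the second -D adds ours.
    command = ["codespell", "-D", "-", "-D", str(custom), "docs/"]
```

The `docs/` path is only a placeholder for whatever the project wants to check.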

@peternewman
Collaborator

Random slightly drive-by thoughts:

  • Do we need a science dictionary, to mirror the code one ( https://github.com/codespell-project/codespell/blob/main/codespell_lib/data/dictionary_code.txt )? It would hold science-specific conflicting typos like hep, so you could skip the whole dictionary in a science context (I feel like there were a few others back in the day).
  • Should non-core dictionaries (rare etc) never auto-correct, even if they've only got one entry?
  • Back in the day I suggested splitting rare in half based on some criteria (e.g. which SCOWL list they were in) so you've got rare and really rare, where the latter may be enabled by default, but not the former.

I appreciate it's a balance, but personally my worry would be that if you disable that dictionary by default, most people will just run with the defaults and not get the benefit, whereas it's pretty trivial to remove the dictionary if you find you're getting too many false positives (does it need better signposting in the log messages, though?).

FWIW on my domain specific typos, I've just added them to an ignore list (I think thead and wronly have subsequently been moved to code given they impact anyone there):
https://github.com/OpenLightingProject/ola/blob/master/.codespellignorewords
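The two workarounds discussed in this thread, ignoring individual words or dropping the rare dictionary entirely, can be sketched as a small Python helper that assembles the command line without running it. It assumes codespell's `--builtin` and `--ignore-words-list` options. The `codespell_args` function and the example paths are hypothetical.

```python
from collections.abc import Iterable, Sequence


def codespell_args(
    paths: Sequence[str],
    ignore_words: Iterable[str] = (),
    builtin: Iterable[str] = (),
) -> list[str]:
    """Build a codespell command line (illustrative helper, nothing is executed)."""
    args = ["codespell"]
    builtin = list(builtin)
    if builtin:
        # Select built-in dictionaries explicitly, e.g. only "clear" to skip "rare".
        args += ["--builtin", ",".join(builtin)]
    ignore_words = list(ignore_words)
    if ignore_words:
        # Words listed here are never reported as typos.
        args += ["--ignore-words-list", ",".join(ignore_words)]
    return args + list(paths)


ignore_one_word = codespell_args(["src/"], ignore_words=["hep"])
skip_rare = codespell_args(["src/"], builtin=["clear"])
```

Ignoring a single word keeps the rest of the rare dictionary active, which is the narrower change; selecting only `clear` trades those extra checks for fewer false positives.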

@hugovk
Contributor

hugovk commented Jan 28, 2025

  • Unless I am mistaken, except for the OED, hep is not a word.

It's also in Collins and Merriam-Webster. (I don't have access to Macquarie.)

@DimitriPapadopoulos
Collaborator

Note that SCOWL (and friends) include a scale of rarity for words, extracted from Google Ngram viewer:

[image: screen capture of the SCOWL entry for hep]

By the way, this screen capture shows that hep is in the en_GB-large dictionary.
