-
-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add configuration for running codespell
locally
#2151
Add configuration for running codespell
locally
#2151
Conversation
@hugovk This is still a draft, it adds configuration files but no Makefile target - yet. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of adding it as a hard requirement which gets installed on all platforms, CIs and locally, (regardless if the user actually wants to perform spell checking), and in the same environment as the build (potential for dep conflicts), I strongly suggest instead just adding it as Pre-Commit check with hook-stage: manual
, so it only runs on-demand, and run it via pre-commit run codespell
(perhaps with an alias in the Makefile
, for completeness)?
Pre-commit will then take care of installing it if/when it is needed in its own isolated environment with a consistent pinned version for all devs (with easy updates via pre-commit autoupdate
), and it works cross-platform unlike relying solely on the Makefile, while still being entirely optional (and not installed for all users and CIs by default).
Then the command wouldn't be lint:
pre-commit --version > /dev/null || python3 -m pip install pre-commit
pre-commit run --all-files
+spellcheck:
+ pre-commit --version > /dev/null || python3 -m pip install pre-commit
+ pre-commit run --hook-stage manual codespell |
Add codespell configuration files: * `.codespellrc` is the root configuration file * `.codespell/ignore-words.txt` list of words that are false positives * `.codespell/exclude-file.txt` list of lines that contain false positives Add a pre-commit hook, confined to the manual stage. Also add a Makefile target that will call the pre-commit hook.
Oops, forgot that Pre-Commit will only check the default |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a few minor comments; otherwise LGTM if the maintainers don't mind adding this.
Co-authored-by: CAM Gerlach <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM now, for what its worth. Thanks @DimitriPapadopoulos !
Seems reasonable to add this, to make it much easier and less false-positive-prone for PEP authors to check their PEPs locally if they choose to do so, using our existing infra (and allow potentially adding it to the CI in the future for new PEPs if we decide to do so, which I'm willing to take responsibility for maintaining if so). @python/pep-editors Any objections? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy with adding this as a non-blocking, optional step.
I see the value to an extent, my concern is having to additionally maintain a list of false positives -- and (at least to me) clarity of expression is much more important than correct spelling. 1 Also there are a number of different valid dialects for English - American and British, but also Australian, Candadian, South African, various other creoles, etc. I suppose I'm -0. A Footnotes
|
(footnote to previous comment) An example: In 676 "focussed" was changed by someone doing a mass spellcheck PR to "focused" -- the double "s" variant is a valid spelling in British English, and was the author's choice of wording. I worry a little that having this in the repo "blesses" that type of mass spellcheck PR, which create quite a bit of noise but, in my opinion, don't add much. I won't block anything, but just want to be the grumpy old man here, I guess! A |
I'm okay with adding as an optional make command, but against adding to the CI: #2075 (comment) |
Yes, though I don't see why we can't have both, at least if an author chooses to run it or a PEP editor uses it as a tool for editorial suggestions. I'm not condoning mass typo correction PRs; they've already caused enough trouble to my own PEPs.
To note,
To note, Oxford, the canonical BrE dictionary, lists the "focussed" spelling as "irregular" and as I gather from various sources, "focused" is still more common and seen as more "correct" in BrE usage than "focussed", which is flatly incorrect in AmE, so that change isn't necessarily wrong per say. I would certainly agree that it wasn't worth changing, though, and do not condone mass spelling correction PRs in general—though I hope use of this tool by writers and/or editors can help avoid them.
I don't think so; for better or for worse those PRs were pretty much already done and accepted, so aside from new PEPs (which we or the authors can manually check) there isn't going to be much to correct (thus less motivation to do so), and just having the infra and config file to make that easy and silence false positives, I don't think, encourages that more—and we can always simply accept or reject such PRs on their merits if and when they are submitted, as we always have. I don't think we should not offer authors and editors the opportunity to use a tool if they choose to, just because of the possibility that it may increase the chance that a few individuals may submit PRs making trivial changes that we can simply decline to accept.
I agree; in addition to the things you mention (particularly usability for authors), there's also the fact that at least currently, some authors still commit directly to the repo without letting our builds and checks run first to ensure they don't break everything, so typos could easily slip through anyway without any review, human or machine, and cause later check runs to fail. I think it would be a good idea to change this, especially if and when we move to continuous deployment via the new build system, but that's a separate discussion. As much as the perfectionist in me would love to have it, it is incompatible with the practical reality of this repo. |
Note that codespell does not (or is not supposed to) enforce American English over British English by default – unless you explicitly instruct it to use Some codespell suggestions might be debatable, for example enforcing
I understand there is “usually” no doubling when the preceding vowel is unstressed, which means that That said, I fully understand that enforcing more standard and readable English from the CI might be a bad idea, at least in some contexts, or as a first step, or as long as spelling tools are not perfect. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"focussed" was perhaps a poor example, and just something I noticed.
Bigger picture, I think I've ameliorated to Hugo & Jelle's view.
Could you add some text to https://github.com/python/peps/blob/main/CONTRIBUTING.rst, briefly explaining how to run codespell
(with and without make
), but more importantly saying something along the lines of Guido's post here that we generally don't want mass spellcheck PRs:
I am against fixing typos unless they reduce readability. Most typos don't hamper readability much less than many other "defects" in our documentation, like poor wording, unclear examples, and outright ambiguities. We receive way fewer fixes for those quality issues. To me this is proof that a lot of contributors prefer the quick win of a typo fix over something that requires thinking, and I don't want to encourage that sort of thing.
If you ping me when you've pushed that text I'll review & happy to merge after that.
A
Thank you, i'll look into Otherwise, yes, typos are only the tip of the iceberg. Yes, they are a quick win because fixing typos can be partially automated (current spell checking tools do require a human checks all suggestions), but they're still a win and a first step. Fixing typos is by no way a complete solution, but you have to start somewhere. That said, when fixing typos in code, I usually also fix missing words or poor wording that I notice when reviewing spellchecker suggestions, and I encourage anyone to do the same. Also, the kind of “negative” feedback on readability issues I see here does not encourage to spend time on other qualities issues like poor wording, unclear examples, and outright ambiguities. Don't misunderstand me, I value and appreciate all feedback. By “negative” I just mean “opposed to fixing typos” and I understand and value all the arguments. However, spending time on improving documentation suddenly looks like a risky business, not worth investing time in it because the risk of rejected changes is too high. Finally, typos do hamper readability as far as I am concerned – we are all different, including in the way we read. |
I do think if the typo significantly impairs understanding of the word, including for a non-native speaker, then it motivates a change, but that's on a case by case basis. Typically, the types of typos detected by In any case, as I understand you've already corrected most of these typos already, so the work is more or less done there (and the cost paid), and for my part, with your improvements here, I can try to run codespell on newly submitted PRs to catch typos as part of my submission workflow and add suggestions to fix any valid ones, to avoid new ones creeping in; and authors and other PEP editors can be encouraged to do the same—and of course, you can help us with that too on any PRs you see. One potential problem is our current workflow that allows direct commits to I don't want to discourage you from contributing; we welcome the more substantive fixes that Guido mentioned, as well as your help reviewing newly submitted and updated PEPs for all manner of issues, large and small. There are other more mechanical things you could help with that would (at least in my view) have much more value—for instance, one of my biggest frustrations as a PEP reader is the PEP's header fields often missing important information, most especially a link to the current/latest mailing list or Discourse discussion of the PEP, which means that I need to manually go off searching for it, if I find the right discussion at all. We'd want to open an issue first and discuss it, but adding the right links to PEPs missing such would be (IMO) a valuable endeavor. Likewise, if you find examples that are unclear, missing, buggy, etc., open an issue, we can all discuss it and assuming it sounds good, you can go ahead with a PR—for such substantive changes, opening an issue helps ensure your time is not wasted if we do happen to take a different view of it. In any case, I think we all agree that merging this PR with the basic infra for authors and editors to check specific PEPs themselves is a reasonable step. Right now we don't actually formally document any of the linting workflow at all, as far as I'm aware, which is something I'm responsible for. As such, I think it would be best if I write up a concise but sufficiently detailed section of the contributing guide mentioning how to do so (I already have a basic template for such in mind, based on the many related such sections I've added to other contributing guides in the past) as well as a short section clarifying when and what changes generally are and aren't appropriate to propose to Final PEPs along those lines. I can either do this as a commit on your PR here, or (my preference) a separate followup PR. |
OK, given documentation for this will be added in a followup PR (taking advantage of @CAM-Gerlach's good nature), I'll merge now. A |
codespell
locally
Bitten by only adding Sphinx runs to CI recently. I'll push a fix. A |
Merely add codespell configuration files, following #2075 (comment):
.codespellrc
is the root configuration file.codespell/ignore-words.txt
list of words that are false positives.codespell/exclude-file.txt
list of lines that contain false positivesUnlike the initial PR #2075, this PR focuses on configuration and does not attempt to add a spelling check to the CI.
A few notes:
Codespell checks the current directory for a file named
setup.cfg
or.codespellrc
. I have opened Addtox.ini
to the list of default config files codespell-project/codespell#2087, but the request hasn't been accepted yet. In the meantime, in the absence of asetup.cfg
file, I have added a new hidden configuration file.codespellrc
One of the caveats is the necessity to maintain a list of false positives. There are not too many of them, but contributors might have to add a new false positive to the list from time to time.
The location of the list of words that are false positives is:
.codespell/ignore_words.txt
The location of the list of entire lines that are false positives is:
.codespell/exclude-file.txt
Some false positives that appear only in URIs are listed after option
uri-ignore-words-list
in the main configuration file:.codespellrc
The default dictionaries used by codespell are
clear
andrare
. Whilerare
might catch a few additional typos, it raises a few additional false positives, such ascomplies
,therefor
andtheses
.