Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Validate 'Created:' dates on CI #1809

Closed
wants to merge 2 commits into from

Conversation

hugovk
Copy link
Member

@hugovk hugovk commented Feb 9, 2021

Fixes #1803.

Also added to the make lint.

As we can see, 87 currently don't validate:

$ make lint
pre-commit --version > /dev/null || python3 -m pip install pre-commit
pre-commit run --all-files
rst ``code`` is two backticks............................................Passed
rst ``inline code`` next to normal text..................................Passed
Lint Created: dates......................................................Failed
- hook id: validate-created
- exit code: 25

pep-0542.txt:9:8: time data '10-February-2017' does not match format '%d-%b-%Y'
pep-0013.rst:7:8: time data '2018-12-16' does not match format '%d-%b-%Y'
pep-3155.txt:9:8: time data '2011-10-29' does not match format '%d-%b-%Y'
pep-0511.txt:9:8: time data '4-January-2016' does not match format '%d-%b-%Y'
pep-8011.rst:7:8: time data '2018-08-24' does not match format '%d-%b-%Y'
pep-0628.txt:9:8: time data '2011-06-28' does not match format '%d-%b-%Y'
pep-0254.txt:9:8: time data '18-June-2001' does not match format '%d-%b-%Y'
pep-8013.rst:7:8: time data '2018-09-14' does not match format '%d-%b-%Y'
pep-0510.txt:9:8: time data '4-January-2016' does not match format '%d-%b-%Y'
pep-0641.rst:11:8: time data '2020-10-20' does not match format '%d-%b-%Y'
pep-0554.rst:8:8: time data '2017-09-05' does not match format '%d-%b-%Y'
pep-0469.txt:9:8: time data '2014-04-18' does not match format '%d-%b-%Y'
pep-0426.txt:14:8: time data '30 Aug 2012' does not match format '%d-%b-%Y'
pep-8012.rst:7:8: time data '2018-10-03' does not match format '%d-%b-%Y'
pep-0564.rst:9:8: time data '16-October-2017' does not match format '%d-%b-%Y'
pep-0248.txt:10:8: time data '' does not match format '%d-%b-%Y'
pep-0615.rst:8:8: time data '2020-02-22' does not match format '%d-%b-%Y'
pep-0381.txt:9:8: time data '21-March-2009' does not match format '%d-%b-%Y'
pep-0421.txt:10:8: time data '26-April-2012' does not match format '%d-%b-%Y'
pep-0433.txt:9:8: time data '10-January-2013' does not match format '%d-%b-%Y'
pep-0424.txt:9:8: time data '14-July-2012' does not match format '%d-%b-%Y'
pep-0435.txt:11:8: time data '2013-02-23' does not match format '%d-%b-%Y'
pep-0440.txt:12:8: time data '18 Mar 2013' does not match format '%d-%b-%Y'
pep-0390.txt:11:8: time data '10-October-2009' does not match format '%d-%b-%Y'
pep-0207.txt:9:8: time data '' does not match format '%d-%b-%Y'
pep-0633.rst:10:8: time data '2020-09-02' does not match format '%d-%b-%Y'
pep-8014.rst:7:8: time data '2018-09-16' does not match format '%d-%b-%Y'
pep-8001.rst:19:8: time data '2018-08-24' does not match format '%d-%b-%Y'
pep-0411.txt:10:8: time data '2012-02-10' does not match format '%d-%b-%Y'
pep-0206.txt:9:8: time data '' does not match format '%d-%b-%Y'
pep-3150.txt:9:8: time data '2010-07-09' does not match format '%d-%b-%Y'
pep-0620.rst:7:8: time data '19-June-2020' does not match format '%d-%b-%Y'
pep-0418.txt:9:8: time data '26-March-2012' does not match format '%d-%b-%Y'
pep-0491.txt:10:8: time data '16 April 2015' does not match format '%d-%b-%Y'
pep-3147.txt:9:8: time data '2009-12-16' does not match format '%d-%b-%Y'
pep-0454.txt:10:8: time data '3-September-2013' does not match format '%d-%b-%Y'
pep-0416.txt:9:8: time data '29-February-2012' does not match format '%d-%b-%Y'
pep-0403.txt:9:8: time data '2011-10-13' does not match format '%d-%b-%Y'
pep-8000.rst:7:8: time data '2018-08-24' does not match format '%d-%b-%Y'
pep-0361.txt:9:8: time data '29-June-2006' does not match format '%d-%b-%Y'
pep-0404.txt:9:8: time data '2011-11-09' does not match format '%d-%b-%Y'
pep-8010.rst:7:8: time data '2018-08-24' does not match format '%d-%b-%Y'
pep-0617.rst:12:8: time data '24-March-2020' does not match format '%d-%b-%Y'
pep-3139.txt:9:8: time data '4-April-2008' does not match format '%d-%b-%Y'
pep-0490.txt:9:8: time data '25-March-2015' does not match format '%d-%b-%Y'
pep-0509.txt:9:8: time data '4-January-2016' does not match format '%d-%b-%Y'
pep-0540.txt:10:8: time data '5-January-2016' does not match format '%d-%b-%Y'
pep-0541.txt:11:8: time data '12-January-2017' does not match format '%d-%b-%Y'
pep-0524.txt:9:8: time data '20-June-2016' does not match format '%d-%b-%Y'
pep-8002.rst:9:8: time data '2018-08-24' does not match format '%d-%b-%Y'
pep-0249.txt:10:8: time data '' does not match format '%d-%b-%Y'
pep-0446.txt:9:8: time data '5-August-2013' does not match format '%d-%b-%Y'
pep-0205.txt:9:8: time data '' does not match format '%d-%b-%Y'
pep-0408.txt:10:8: time data '2012-01-07' does not match format '%d-%b-%Y'
pep-0571.rst:13:8: time data '' does not match format '%d-%b-%Y'
pep-0599.rst:12:8: time data '29-April-2019' does not match format '%d-%b-%Y'
pep-0102.txt:11:8: unconverted data remains:  (edited down on 9-Jan-2002 to become PEP 102)
pep-0235.txt:9:8: time data '' does not match format '%d-%b-%Y'
pep-0556.rst:7:8: time data '2017-09-08' does not match format '%d-%b-%Y'
pep-8016.rst:7:8: time data '2018-11-01' does not match format '%d-%b-%Y'
pep-0410.txt:9:8: time data '01-February-2012' does not match format '%d-%b-%Y'
pep-0552.rst:9:8: time data '2017-09-04' does not match format '%d-%b-%Y'
pep-0461.txt:9:8: time data '2014-01-13' does not match format '%d-%b-%Y'
pep-0396.txt:9:8: time data '2011-03-16' does not match format '%d-%b-%Y'
pep-0507.txt:9:8: time data '2015-09-30' does not match format '%d-%b-%Y'
pep-0437.txt:9:8: time data '2013-03-11' does not match format '%d-%b-%Y'
pep-0230.txt:9:8: time data '' does not match format '%d-%b-%Y'
pep-3151.txt:10:8: time data '2010-07-21' does not match format '%d-%b-%Y'
pep-0428.txt:9:8: time data '30-July-2012' does not match format '%d-%b-%Y'
pep-3154.txt:9:8: time data '2011-08-11' does not match format '%d-%b-%Y'
pep-0442.txt:10:8: time data '2013-05-18' does not match format '%d-%b-%Y'
pep-0476.txt:9:8: time data '28-August-2014' does not match format '%d-%b-%Y'
pep-0441.txt:11:8: time data '30 March 2013' does not match format '%d-%b-%Y'
pep-3149.txt:9:8: time data '2010-07-09' does not match format '%d-%b-%Y'
pep-0559.rst:7:8: time data '2017-09-08' does not match format '%d-%b-%Y'
pep-0553.rst:7:8: time data '2017-09-05' does not match format '%d-%b-%Y'
pep-0386.txt:9:8: time data '4-June-2009' does not match format '%d-%b-%Y'
pep-3143.txt:9:8: time data '2009-01-26' does not match format '%d-%b-%Y'
pep-0407.txt:11:8: time data '2012-01-12' does not match format '%d-%b-%Y'
pep-0522.txt:10:8: time data '16 June 2016' does not match format '%d-%b-%Y'
pep-0475.txt:10:8: time data '29-July-2014' does not match format '%d-%b-%Y'
pep-0459.txt:12:8: time data '11 Nov 2013' does not match format '%d-%b-%Y'
pep-0467.txt:9:8: time data '2014-03-30' does not match format '%d-%b-%Y'
pep-8015.rst:7:8: time data '2018-10-04' does not match format '%d-%b-%Y'
pep-0413.txt:9:8: time data '2012-02-24' does not match format '%d-%b-%Y'
pep-0200.txt:9:8: time data '' does not match format '%d-%b-%Y'
pep-0558.rst:9:8: time data '2017-09-08' does not match format '%d-%b-%Y'
pep-0593.rst:9:8: time data '26-April-2019' does not match format '%d-%b-%Y'

make: *** [lint] Error 1

I'll fix them and update the PR.

@encukou
Copy link
Member

encukou commented Feb 9, 2021

See also #1805

date = line.removeprefix("Created:").strip()
try:
# Validate
dt.strptime(date, "%d-%b-%Y")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "%d-%b-%Y" format depends on the current locale; it might be different across machines.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah right, your suggestion at #1805 (comment) for something like %Y-%m-%d (2021-02-08) would help with this.

@hugovk
Copy link
Member Author

hugovk commented Feb 9, 2021

See also #1805

Great! That saves me a lot of work :)

Rerunning on the top of that branch, everything validates except for:

$ make lint
pre-commit --version > /dev/null || python3 -m pip install pre-commit
pre-commit run --all-files
rst ``code`` is two backticks............................................Passed
rst ``inline code`` next to normal text..................................Passed
Lint Created: dates......................................................Failed
- hook id: validate-created
- exit code: 1

pep-0102.txt:11:8: unconverted data remains:  (edited down on 9-Jan-2002 to become PEP 102)

make: *** [lint] Error 1

From:

Created: 22-Aug-2001 (edited down on 9-Jan-2002 to become PEP 102)

Should this line be changed to pass the current validation, or the validation changed to accept this?

@hugovk hugovk mentioned this pull request Feb 9, 2021
@hugovk hugovk force-pushed the validate-created-date branch from 75bec84 to 1c70eed Compare February 9, 2021 21:12
Copy link
Member

@CAM-Gerlach CAM-Gerlach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, happened to see this PR and wanted to make sure it wasn't affected by pre-commit/pygrep-hooks#51 . Its not, but I did notice something else:

  • Because stages isn't set (i.e. to ["manual"]), the validation won't just run on CIs as indicated by the title and description, but rather whenever the user runs pre-commit, either as a pre-commit hook (if they install them), pre-commit run --all-files, or the non-standard make lint.
  • If it isn't desired (and there are several problems if it is), the easiest way to fix this is just add stages: ["manual"] to the hook config and add the following here in your Github workflow, which will run both the default hooks and the manual one as well:
        - uses: pre-commit/[email protected]
        with:
          extra_args: --all-files --hook-stage manual
  • If this is the intended behavior, the PR title should presumably be updated. However, be aware that running this hook by default will likely cause the pre-commit run to fail the run if users are on Windows (no python3, python is just called python there) or don't have Python 3.9+ as the default on their system/current env (quite likely), so I would recommend the former.

Also, as you might already be aware, language: script, entry: ./scripts/validate-created.py and a shebang and executable mode on the script is the nominal way to call scripts like this with pre-commit, but AFAIK there's isn't much of a practical difference doing it like this instead.

@hugovk hugovk force-pushed the validate-created-date branch from 1c70eed to c66d81c Compare March 12, 2021 10:14
@hugovk
Copy link
Member Author

hugovk commented Mar 12, 2021

Thanks for checking!

Python 3.9: I've pushed an update which allows #1809 (comment) to validate and also drops removeprefix so we have Python 3.6+ compatibility.

Windows: good point, although we have a possibly related issue to decide on about locale: #1809 (comment).

CI + local: my intention was to run both on CI and locally, and it currently does so. But if we decide only to run this check on Linux then your suggestions about running on only CI may help. (Possibly hoisting the check out of pre-commit.)

@CAM-Gerlach
Copy link
Member

Sure! I did notice that the older PEPs with the txt extension won't be linted by the rst checkers, but I assume that they are unlikely enough to be modified that adding a regex to files to match them wasn't seen as worthwhile (and neither this PR nor mine affects that).

Python 3.9: Sounds good!

Locale: IMO, I would strongly support a standard, locale-independent format that's unambiguous and easy to parse by both programs and programmers, i.e. ISO 8601/RFC 3339 (`%Y-%m-%d/YYYY-MM-DD). Particularly in this case, it seems a bit unnecessary to require anyone running the script to set their locale to English or get incorrect results. But that's just my personal opinion, its ultimately up to you all and I'm happy to help with whatever you decide.

CI: Yeah, ideal to run them on both; I have some ideas to resolve the cross-platform compat issue without having to make it CI-only. If you do that, I'd suggest just confining it to the manual hook-stage; less work and less bespoke code, plus nicer UX.

Windows: Yup, that comes back to the perennial issue that there's no reliable cross-platform way to invoke Python that works across Windows, *nix, venvs and conda; I recall reading yet another thread on that on Python-dev lately. In this case, we know a Python environment must be accessible (because Pre-Commit itself is running in it), but we don't know whether it is exposed at python, python3 or both, and a shell wrapper script with a fallback won't work since bash might not be available either (on Windows without Git bash, cygwin, etc). language: python takes care of everything but requires the target hook to be a Python package, which doesn't really make sense for a bespoke script (which is the purpose of local hooks).

Maybe we can ask Anthony if he would accept a PR adding a python-script language that behaved like script, but executed the script using the same interpreter path as Pre-Commit itself, i,e, its own sys.executable? Or maybe there is some better way that projects shipping such scripts as local hooks handle this (I'm not immediately aware of any)?

However, in this case you should be able to get away with a pygrep regex to do the validation almost as well as a bespoke script; that would run a lot faster and be a lot simpler, and would use pre-commit's own interpreter, avoiding the above issue. The following regex will match any Created: DD-mmm-YYYY that begins a line and is within the first 20 lines of the file, in the appropriate format, that is a valid date between 1990 and 2100; it will match either dates along or dates with an arbitrary parenthetical after them:

^(.*\n){1,20}Created: ([0-2][0-9]|(3[01]))-(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)-(199[0-9]|20[0-9][0-9])( \(|\n)

Here's the equivalent in YYYY-MM-DD:

^(.*\n){1,20}Created: ((199[0-9]|20[0-9][0-9])-(0[0-9]|1[0-2])-[0-2][0-9]|(3[01]))( \(|\n)

@gvanrossum
Copy link
Member

gvanrossum commented Mar 12, 2021 via email

@CAM-Gerlach
Copy link
Member

Yeah, it is unambiguous and easy enough read since it includes %b, so no reason to change it on that grounds.

The practical issue is that %b is locale-dependent, so any tools reading (or writing) it will break if they don't happen to be run in an English locale. Specifically, if this PR is commited as-is, make lint, pre-commit run --all-files and the pre-commit hook itself will all start failing locally for users unless they happen to have their locale set to English. However, either of my suggestions above (making it a pygrep hook or only running it on CI) will fix that issue for this PR without having to change the current format, as was previously proposed.

@hugovk
Copy link
Member Author

hugovk commented Mar 12, 2021

Right, so we keep the existing format.

@CAM-Gerlach I agree, makes sense to unhitch this at least the local pre-commit, maybe even move it outside pre-commit to run on CI. Sounds like you have some solid ideas and I'm a little swamped at the moment, would you like to come up with something? Feel free to take anything from this PR or start fresh, and either update my fork branch or just submit a new PR. Thank you!

@gvanrossum
Copy link
Member

gvanrossum commented Mar 13, 2021 via email

@CAM-Gerlach
Copy link
Member

Sorry for the delay; it was a busy week with other things.

@hugovk Sure, I ended up just starting fresh with creating a simple pygrep hook as proposed above, as well as addressing a few related issues. I'll open a PR momentarily, thanks.

@gvanrossum Right; as mentioned the solutions above and in the followup PR avoids locale dependence completely and doesn't require changing the existing format.

@hugovk
Copy link
Member Author

hugovk commented Mar 22, 2021

Please see #1886 instead.

@hugovk hugovk closed this Mar 22, 2021
@hugovk hugovk deleted the validate-created-date branch March 22, 2021 15:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Validate Created dates
5 participants