Improve non ascii checker #5643

CarliJoy · 2022-01-06T17:59:49Z

Type of Changes

	Type
✓	✨ New feature

Description

Add non-ascii-identifier as a replacement for non-ascii-name to ensure that all Python names really are ASCII. The checker now also properly checks the names of imports (non-ascii-module-import) and file names (non-ascii-file-name). Non-ASCII characters can be homoglyphs (look-alike characters) and are hard to enter on a non-specialized keyboard.

-> See Confusable Characters in PEP 672

- non-ascii-identifier (replaces non-ascii-name)
- non-ascii-file-name (a warning)
- non-ascii-module-import (only considering the namespace the import is imported in)
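
To illustrate the homoglyph concern, here is a minimal sketch using only the standard library. It is not the checker's implementation; the helper `non_ascii_chars` and the sample names are assumptions made up for this example.

```python
import unicodedata
from typing import List


def non_ascii_chars(name: str) -> List[str]:
    """Return the non-ASCII characters found in a name, in order."""
    return [ch for ch in name if not ch.isascii()]


latin = "scope"
# Hypothetical look-alike: the third character is CYRILLIC SMALL LETTER O (U+043E).
lookalike = "sc\u043epe"

# Both strings are valid Python identifiers and render almost identically,
# yet they are different names.
print(latin.isidentifier(), lookalike.isidentifier())
print(latin == lookalike)

# Report which characters make the second name non-ASCII.
for ch in non_ascii_chars(lookalike):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```

A plain `str.isascii()` test is enough to flag such a name; the Unicode name lookup is only there to make the report readable.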

coveralls · 2022-01-06T18:06:20Z

Pull Request Test Coverage Report for Build 1678974913

90 of 90 (100.0%) changed or added relevant lines in 3 files are covered.
1 unchanged line in 1 file lost coverage.
Overall coverage increased (+0.03%) to 93.73%

Files with Coverage Reduction	New Missed Lines	%
pylint/testutils/lint_module_test.py	1	85.29%

Totals
Change from base Build 1663696938:	0.03%
Covered Lines:	14441
Relevant Lines:	15407

💛 - Coveralls

CarliJoy · 2022-01-06T18:06:39Z

@DanielNoord and @Pierre-Sassoulas, here is the split version of the ASCII name checker.
I made some requested (and some unrequested) improvements.
These improvements are in the second commit to make reviewing easier.
The first commit is a simple copy/paste from #5311.

About the _is_ascii_only attribute on the nodes: I know it is a bit of a hack, but I don't know a better way.

The only good solution would be to spend a lot more time figuring out why some nodes are hit multiple times. But sorry, I have already spent too much time on this, so I won't do that.

If something should be added to Astroid, I would recommend something like a was_checked_by: Set[str] attribute on NodeNG, just to keep it simple.

I would kindly ask you to do this for me.

The alternative would be to keep a list of all checked nodes within the parser and just check "if node in self._checked".
But that feels much more wrong and less efficient than the current hack.
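
For comparison, the "keep track of checked nodes" alternative can be sketched with a set instead of a list, which makes the membership test cheap. This is only an illustration: `Node` and `OnceOnlyChecker` are hypothetical stand-ins, not astroid or pylint classes.

```python
from typing import List, Set


class Node:
    """Hypothetical stand-in for an AST node; not astroid's NodeNG."""

    def __init__(self, name: str) -> None:
        self.name = name


class OnceOnlyChecker:
    """Check each node at most once, even if it is visited several times."""

    def __init__(self) -> None:
        # Object ids of nodes already checked. Ids are only unique while the
        # objects are alive, which holds here because the tree outlives the run.
        self._checked: Set[int] = set()
        self.messages: List[str] = []

    def check(self, node: Node) -> None:
        key = id(node)
        if key in self._checked:
            return  # already handled on an earlier visit
        self._checked.add(key)
        if not node.name.isascii():
            self.messages.append(node.name)


checker = OnceOnlyChecker()
node = Node("sc\u043epe")
checker.check(node)
checker.check(node)  # second visit of the same node is ignored
print(checker.messages)
```

The trade-off against an attribute on the node is that the bookkeeping lives in the checker, at the cost of one extra set per run.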

CarliJoy · 2022-01-06T18:10:14Z

BTW: Some tests always fail locally for me, so I have no way to check the coverage.
So I don't know which line lost coverage in [https://coveralls.io/builds/45430446/source?filename=pylint%2Ftestutils%2Flint_module_test.py#L9]

pylint/checkers/non_ascii_names.py

Pierre-Sassoulas · 2022-01-06T21:08:04Z

So I have no way to check the coverage. So I don't know which line lost coverage

You need to connect with your github account on coveralls.io, they need your token to display the file or they would get rate limited.

DanielNoord

Is there a reason for adding the tests/functional/n/non_ascii_name_class/__init__.py and tests/functional/n/non_ascii_name/__init__.py files? I think they can be removed, right?

Could you add a docstring to tests/functional/n/non_ascii_import/foobar.py to say what it is testing? I can't make a suggestion because it is empty 😢

I haven't looked at pylint/checkers/non_ascii_names.py and the unittest this time (hopefully I'll get time this evening), but this should help this along again. I'm at 47/69 files viewed and approved right now so we're certainly getting there 😄

tests/functional/n/non_ascii_name/non_ascii_name_pos_and_kwonly_function.rc

tests/functional/n/non_ascii_name/non_ascii_name_kwargs_py39plus.py

tests/functional/n/non_ascii_name/non_ascii_name_kwargs_py38.py

tests/functional/n/non_ascii_name/non_ascii_name_function_argument_py38.py

tests/functional/n/non_ascii_name/non_ascii_name_function_argument_py39plus.py

pylint/checkers/base.py

DanielNoord

Got to the other files as well!

@CarliJoy Tests look very good and robust. 🎉 I just wonder if we can move some of the remaining unittests to the functional tests as well. I'm asking this not only because I find them more readable, but also since I know they are actually better "tests". The way those are invoked better resembles the way pylint is invoked by normal users than the way the unittests are, which has recently caught a bug that the unittests simply could not in their current setup. This is something we as maintainers/contributors should fix, but in the meantime functional tests are often the way to go.

pylint/checkers/non_ascii_names.py

tests/checkers/unittest_non_ascii_name.py

CarliJoy · 2022-01-07T11:23:41Z

@DanielNoord which unit tests are you referring to?
I ask because I added functional tests for most use cases. I mainly use the unit tests for two things:
a) To check that I actually got the correct node and to make sure that I understood the logic correctly (internal logic checks).
b) To use parametrize to check a function for all possible outcomes. This would be very cumbersome to do manually in functional tests if there is no equivalent of parametrize. Some of the cases I use there are also tested within the functional tests. So testing all possibilities with unit tests, plus a bunch of real-world examples, should actually cover everything IMO.
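
Point b) can be sketched as a table-driven test. The unit tests under discussion use pytest's `parametrize`; the sketch below uses the standard library's `unittest.subTest` as a stand-in so that it runs without extra dependencies. The function `has_non_ascii` and the case table are assumptions made up for this illustration.

```python
import unittest


def has_non_ascii(name: str) -> bool:
    """Hypothetical function under test: True if the name is not pure ASCII."""
    return not name.isascii()


# One row per input/expected pair, similar to a parametrize table.
CASES = [
    ("plain", False),
    ("snake_case_1", False),
    ("", False),
    ("sc\u043epe", True),   # Cyrillic "o" in the middle
    ("caf\u00e9", True),    # Latin small letter e with acute
]


class HasNonAsciiTest(unittest.TestCase):
    def test_all_cases(self) -> None:
        for name, expected in CASES:
            # Each row is reported separately if it fails.
            with self.subTest(name=name):
                self.assertEqual(has_non_ascii(name), expected)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(HasNonAsciiTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Covering the same five rows in functional tests would need a separate source line and expected message per row, which is the overhead the comment refers to.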

DanielNoord · 2022-01-07T11:35:49Z

I was thinking of test_check_import specifically, but the others might fit as well. I agree that testing some examples for nodes makes sense, but can't we just put all those import statements in one big functional test file and see if they test correctly?
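
To make concrete which names an import statement introduces, here is a rough sketch using the standard `ast` module. It reflects one reading of "only considering the namespace the import is imported in" and is not the checker's code; `bound_names` and the sample imports are hypothetical.

```python
import ast
from typing import List


def bound_names(source: str) -> List[str]:
    """Return the names that import statements bind in the importing namespace."""
    names: List[str] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # "import a.b" binds "a"; "import a.b as c" binds "c".
                names.append(alias.asname or alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            for alias in node.names:
                # "from m import x" binds "x"; "... as y" binds "y".
                names.append(alias.asname or alias.name)
    return names


# Hypothetical imports; the modules do not need to exist, since the source is
# only parsed and never executed.
SOURCE = (
    "import os.path\n"
    "import numpy as n\u00fcmpy\n"
    "from foo import b\u00e4r\n"
    "from foo import b\u00e4r as bar\n"
)

names = bound_names(SOURCE)
print(names)
print([name for name in names if not name.isascii()])
```

Under this reading, the last import would pass because the alias `bar` is ASCII, even though the imported object itself has a non-ASCII name.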

CarliJoy · 2022-01-07T14:23:10Z

Then they are integration tests and I have to consider too many other side effects, e.g. nothing may be marked as duplicated, the modules must really exist, you name it.
That just gets more complicated and means additional work for no real benefit IMO.

I was once again glad about my unit tests, because I am in the process of figuring out how to remove the ._is_ascii attribute. For this, unit tests are just a charm, because they run fast, are easy to debug, and don't have side effects.

(And it is not as if I didn't add functional tests: I have already hit the limit on the maximum number of files in a folder, BTW.)

CarliJoy · 2022-01-07T15:44:02Z

Using unit tests I was able to check and debug a number of things, and so could simplify the checker a good amount.

I also noticed that I had used the plural for the checker name and changed that.

The checker is now almost dead simple 🎉

Hope you like it :-) @DanielNoord @Pierre-Sassoulas

DanielNoord

@CarliJoy Thanks again! See my comments about unittests and functional tests about why I was so insistent on them. You're right I shouldn't have complained about too many tests 😄

I think we're really getting there with this PR. The final checker is becoming really quite elegant and concise.

One question that couldn't really be commented anywhere:
Do we need tests/functional/n/non_ascii_name_class/__init__.py? And the other __init__ files in the functional test directories?

tests/functional/n/non_ascii_name/non_ascii_name_pos_and_kwonly_function.py

pylint/testutils/checker_test_case.py

tests/checkers/unittest_non_ascii_name.py

pylint/checkers/non_ascii_names.py

Pierre-Sassoulas

Thank you for splitting the unicode MR, this is easier to review. The tests are also getting exhaustive and this is very nice. I think we're converging and will be able to merge this first part soon.

pylint/checkers/base_checker.py

pylint/checkers/non_ascii_names.py

tests/checkers/unittest_non_ascii_name.py

pylint/checkers/base_checker.py

CarliJoy · 2022-01-08T12:49:02Z

@DanielNoord wrote:

@CarliJoy Thanks again! See my comments about unittests and functional tests about why I was so insistent on them. You're right I shouldn't have complained about too many tests 😄

I think we're really getting there with this PR. The final checker is becoming really quite elegant and concise.

One question that couldn't really be commented anywhere: Do we need tests/functional/n/non_ascii_name_class/__init__.py? And the other __init__ files in the functional test directories?

I followed the existing functional tests: all main folders (a-z) and some of the subfolders contain an __init__.py.
I am not sure whether that is required. I guess it actually is required for testing the local import functions.
Anyway, as this seems to be an issue with the existing codebase, I kindly ask that it be addressed outside of this PR.

CarliJoy · 2022-01-10T10:54:33Z

@DanielNoord and @Pierre-Sassoulas

I reverted the changes to the base checkers and now use a simpler args check.
I don't know if this really catches all cases, but I don't want to invest more time in this or in the contradicting statements on how the new checker should use functions of the old one, especially as this is only 4 lines of code.

I also removed two unit tests but kept the others. See #5643 (comment) for my view on them.
If you still want the rest removed, submit a change.
For all of them (besides the imports) a functional test exists as well.
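
As a rough idea of what a simple argument check has to cover, the sketch below collects every kind of parameter name from a function definition using the standard `ast` module. It is an assumption about the shape of such a check, not the code from this PR; `argument_names` and the sample function are hypothetical.

```python
import ast
from typing import List


def argument_names(func: ast.FunctionDef) -> List[str]:
    """Collect positional-only, regular, keyword-only, *args and **kwargs names."""
    args = func.args
    names = [arg.arg for arg in args.posonlyargs + args.args + args.kwonlyargs]
    if args.vararg is not None:
        names.append(args.vararg.arg)
    if args.kwarg is not None:
        names.append(args.kwarg.arg)
    return names


# Hypothetical function with one non-ASCII parameter name (the **kwargs one).
SOURCE = "def f(pos, /, normal, *rest, kw_only, **kw\u00e4rgs): pass"
func = ast.parse(SOURCE).body[0]

all_names = argument_names(func)
print(all_names)
print([name for name in all_names if not name.isascii()])
```

Positional-only parameters (`/`) require Python 3.8 or later, which matches the split between the py38 and py39plus functional test files listed above.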

DanielNoord

Approving with my last recommendations.

I think we could keep going back and forth about pylint's testing framework, but it doesn't feel too productive. I just want to say that I do see and agree with some of the points you're making or made previously, but that within the context of the current code we can't easily fix those. That's the result of working with such an old codebase I guess.

With regard to conflicting suggestions, I think that is to be expected in open source projects without strict hierarchy and review procedures on large PRs such as these. Of course we try to minimise them, but I don't think it is strange that reviewers can have conflicting opinions about certain decisions. With PRs with 100+ comments it's easy to miss comments from previous reviewers (or even your own) about the same topic. I know that that is frustrating and it was something I ran into when I first started contributing to pylint but I think that's one of the downsides of these open source projects. That doesn't really make this less annoying but I hope you can see how such things can occur.

tests/checkers/unittest_non_ascii_name.py

pylint/checkers/non_ascii_names.py

Pierre-Sassoulas

Thank you for the new checker @CarliJoy !

To add to what @DanielNoord said about conflicting reviews: the smaller and more atomic the MR is, the less likely this is to happen. Making our lives easier by submitting small, easily reviewable MRs will help tremendously. If we need to review 1000 lines repeatedly with hundreds of comments, then we might not have the time to read all comments from everyone. Especially if something that should be discussed in detail in another MR is lost among the hundreds of comments and makes us feel that the code has changed a lot and that we need to reread everything every time as a result.

Also remember that there are other interactions happening at the same time on the repository (repositories!) we're maintaining; we don't have a single MR / issue to take care of, and we have jobs and lives on top of it.

CarliJoy added 2 commits January 6, 2022 18:57

improve non-ascii check

df77a71

split non-ascii-name into 3 different msgs

7c0411a

- non-ascii-identifier (replaces non-ascii-name)
- non-ascii-file-name (a warning)
- non-ascii-module-import (only considering the namespace the import is imported in)

CarliJoy mentioned this pull request Jan 6, 2022

add a checker for misleading unicode #5311

Merged

DanielNoord requested review from DanielNoord and Pierre-Sassoulas January 6, 2022 18:14

Pierre-Sassoulas requested changes Jan 6, 2022

DanielNoord requested changes Jan 6, 2022

DanielNoord requested changes Jan 7, 2022

CarliJoy force-pushed the improve_non_ascii_checker branch from be1749c to 415a13b Compare January 7, 2022 13:52

Apply suggestions from code review

54709b8

Co-authored-by: Daniël van Noord <[email protected]>

CarliJoy force-pushed the improve_non_ascii_checker branch from 9184357 to 54709b8 Compare January 7, 2022 14:48

Keep the original message name

c36c1eb

CarliJoy force-pushed the improve_non_ascii_checker branch from 57ef516 to 58827c3 Compare January 7, 2022 15:45

DanielNoord self-requested a review January 7, 2022 15:45

simplify checker

1a0fed3

CarliJoy force-pushed the improve_non_ascii_checker branch from 58827c3 to 1a0fed3 Compare January 7, 2022 15:46

DanielNoord requested changes Jan 7, 2022

Pierre-Sassoulas requested changes Jan 7, 2022

CarliJoy and others added 3 commits January 10, 2022 10:32

Apply suggestions from code review

737d51c

Co-authored-by: Daniël van Noord <[email protected]> Co-authored-by: Pierre Sassoulas <[email protected]>

[pre-commit.ci] auto fixes from pre-commit.com hooks

58524ce

for more information, see https://pre-commit.ci

Do not change original files, use simple args check

bcc8a39

CarliJoy added 6 commits January 10, 2022 10:53

Merge branch 'improve_non_ascii_checker' of https://github.com/CarliJ…

8ae2ab9

…oy/pylint into improve_non_ascii_checker

fix functional test

85f7529

Remove not needed unit tests

06761c8

update help

c89a5c8

remove confidence as parameter

4575edc

be more explicit in error message

b22c2da

fix name

e2c9a88

DanielNoord approved these changes Jan 10, 2022

CarliJoy and others added 2 commits January 10, 2022 19:54

Apply suggestions from code review

f6b540c

Co-authored-by: Daniël van Noord <[email protected]>

add missing human-readable type

5ed79c7

Pierre-Sassoulas approved these changes Jan 10, 2022

Pierre-Sassoulas merged commit d2475b4 into pylint-dev:main Jan 10, 2022

Pierre-Sassoulas added this to the 2.13.0 milestone Jan 10, 2022

Pierre-Sassoulas added the Enhancement ✨ Improvement to a component label Jan 10, 2022

CarliJoy deleted the improve_non_ascii_checker branch January 10, 2022 23:47
