Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve non ascii checker #5643

Merged
merged 17 commits into from
Jan 10, 2022

Conversation

CarliJoy
Copy link
Contributor

@CarliJoy CarliJoy commented Jan 6, 2022

  • Add yourself to CONTRIBUTORS if you are a new contributor.
  • Add a ChangeLog entry describing what your PR does.
  • If it's a new feature, or an important bug fix, add a What's New entry in
    doc/whatsnew/<current release.rst>.
  • Write a good description on what the PR does.

Type of Changes

Type
✨ New feature

Description

Add non-ascii-identifier as replacement non-ascii-name to ensure really all Python names are ASCII. Checker now checks properly the names of imports (non-ascii-module-import) as well for as of file names (non-ascii-file-name). Non ASCII characters could be homoglyphs (look alike characters) and hard to enter on a non specialized keyboard.

-> See Confusable Characters in PEP 672

- non-ascii-identifier (replaces non-ascii-name)
- non-ascii-file-name (a warning)
- non-ascii-module-import (only considering the namespace the import is imported in)
@coveralls
Copy link

coveralls commented Jan 6, 2022

Pull Request Test Coverage Report for Build 1678974913

  • 90 of 90 (100.0%) changed or added relevant lines in 3 files are covered.
  • 1 unchanged line in 1 file lost coverage.
  • Overall coverage increased (+0.03%) to 93.73%

Files with Coverage Reduction New Missed Lines %
pylint/testutils/lint_module_test.py 1 85.29%
Totals Coverage Status
Change from base Build 1663696938: 0.03%
Covered Lines: 14441
Relevant Lines: 15407

💛 - Coveralls

@CarliJoy
Copy link
Contributor Author

CarliJoy commented Jan 6, 2022

@DanielNoord and @Pierre-Sassoulas here the splitted version of the ascii name checker.
I did some requested (and some non requested) improvements.
These improvements are within the second commit to allow easier reviewing.
The first commit is simple copy/paste from #5311

About the _is_ascii_only attribute for the nodes. I know it is a bit of a hack but I don't know a better way.

Only good solution would be to spend a lot more time into figuring out why some nodes are hit multiple times. But sorry I spent to much time already sorry, I won't do that.

If something should be added within Astroid, I would recommend something like at was_checked_by: Set[str] to NodesNG, just to make it simple.

I would kindly ask you to this for me.

Alternative would be that I keep a list of all checked nodes within the parser and just check "if node in self._checked".
But that feels much more wronge and inefficient as the current hack.

@CarliJoy
Copy link
Contributor Author

CarliJoy commented Jan 6, 2022

BTW: Some test always fail locally for me. So I have no way to check the coverage.
So I don't know which line lost coverage in [https://coveralls.io/builds/45430446/source?filename=pylint%2Ftestutils%2Flint_module_test.py#L9]

@Pierre-Sassoulas
Copy link
Member

So I have no way to check the coverage. So I don't know which line lost coverage

You need to connect with your github account on coveralls.io, they need your token to display the file or they would get rate limited.

Copy link
Collaborator

@DanielNoord DanielNoord left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason for adding the tests/functional/n/non_ascii_name_class/__init__.py and tests/functional/n/non_ascii_name/__init__.py files? I think they can be removed tight?

Could you add a docstring to tests/functional/n/non_ascii_import/foobar.py to say what it is testing. I can't make a suggestion because it is empty 😢

I haven't looked at pylint/checkers/non_ascii_names.py and the unittest this time (hopefully I'll get time this evening), but this should help this along again. I'm at 47/69 files viewed and approved right now so we're certainly getting there 😄

Copy link
Collaborator

@DanielNoord DanielNoord left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got to the other files as well!

@CarliJoy Tests look very good and robust. 🎉 I just wonder if we can move some of the remaining unittests to the functional tests as well. I'm asking this not only because I find them more readable, but also since I know they are actually better "tests". They way those are invoked better resembles the way pylint is invoked by normal users than the unittests are, which has recently caught a bug that the unittests simply could not in their current setup. This is something we as maintainers/contributors should fix, but in the meantime functional tests are often the way to go.

@CarliJoy
Copy link
Contributor Author

CarliJoy commented Jan 7, 2022

Got to the other files as well!

@CarliJoy Tests look very good and robust. tada I just wonder if we can move some of the remaining unittests to the functional tests as well. I'm asking this not only because I find them more readable, but also since I know they are actually better "tests". They way those are invoked better resembles the way pylint is invoked by normal users than the unittests are, which has recently caught a bug that the unittests simply could not in their current setup. This is something we as maintainers/contributors should fix, but in the meantime functional tests are often the way to go.

@DanielNoord which unit test are you referring to?
Because I added functional test for the most use cases. I escpecially use the unit tests for two things:
a) To check that actually got the correct node and make sure that I understood the logic correctly (internal logic checks)
b) To use parameterize to check a function for all possible outcomes. This would be very cumbersome to this manually in funcrtional tests, if there is not equivalent of parameterize. Some cases I use there are still tested within the functional tests. So testing all possiblites with unit test and a bunch of real world examples should actually cover everything IMO.

@DanielNoord
Copy link
Collaborator

@DanielNoord which unit test are you referring to? Because I added functional test for the most use cases. I escpecially use the unit tests for two things: a) To check that actually got the correct node and make sure that I understood the logic correctly (internal logic checks) b) To use parameterize to check a function for all possible outcomes. This would be very cumbersome to this manually in funcrtional tests, if there is not equivalent of parameterize. Some cases I use there are still tested within the functional tests. So testing all possiblites with unit test and a bunch of real world examples should actually cover everything IMO.

I was thinking of test_check_import specifically, but the others might fit as well. I agree that testing some examples for nodes makes sense, but can't we just put all those import statements in one big functional test file and see if they test correctly?

@CarliJoy CarliJoy force-pushed the improve_non_ascii_checker branch from be1749c to 415a13b Compare January 7, 2022 13:52
@CarliJoy
Copy link
Contributor Author

CarliJoy commented Jan 7, 2022

@DanielNoord which unit test are you referring to? Because I added functional test for the most use cases. I escpecially use the unit tests for two things: a) To check that actually got the correct node and make sure that I understood the logic correctly (internal logic checks) b) To use parameterize to check a function for all possible outcomes. This would be very cumbersome to this manually in funcrtional tests, if there is not equivalent of parameterize. Some cases I use there are still tested within the functional tests. So testing all possiblites with unit test and a bunch of real world examples should actually cover everything IMO.

I was thinking of test_check_import specifically, but the others might fit as well. I agree that testing some examples for nodes makes sense, but can't we just put all those import statements in one big functional test file and see if they test correctly?

Then they are integration tests and I have to consider too many other side effects: i.e. nothing marked duplicated, modules should really exist, you name it.
Just gets more complicated and additional work for no real benefit IMO.

I again was just glad about my unit tests, because I am in the process of figuring out how to remove ._is_ascii attribute. For this unit test are just a charm, because they run fast and are easy to debug and don't have side effects.

(And I mean like it is not, that I didn't add function tests -> I hit the limit of maximum files in a folder already.... btw...)

Co-authored-by: Daniël van Noord <[email protected]>
@CarliJoy CarliJoy force-pushed the improve_non_ascii_checker branch from 9184357 to 54709b8 Compare January 7, 2022 14:48
@CarliJoy
Copy link
Contributor Author

CarliJoy commented Jan 7, 2022

Using Unit Tests I was able to checkout and debug a number of things and so could simplify the Checker a good amount.

Also recognized, that I used Plural for the checker name and changed that.

The Checker is now almost dead simple 🎉

Hope you like it :-) @DanielNoord @Pierre-Sassoulas

@CarliJoy CarliJoy force-pushed the improve_non_ascii_checker branch from 57ef516 to 58827c3 Compare January 7, 2022 15:45
@DanielNoord DanielNoord self-requested a review January 7, 2022 15:45
@CarliJoy CarliJoy force-pushed the improve_non_ascii_checker branch from 58827c3 to 1a0fed3 Compare January 7, 2022 15:46
Copy link
Collaborator

@DanielNoord DanielNoord left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@CarliJoy Thanks again! See my comments about unittests and functional tests about why I was so insistent on them. You're right I shouldn't have complained about too many tests 😄

I think we're really getting there with this PR. The final checker is becoming really quite elegant and concise.

One questions that couldn't really be commented anywhere:
Do we need tests/functional/n/non_ascii_name_class/__init__.py? And the other __init__ files in the functional test directories?

Copy link
Member

@Pierre-Sassoulas Pierre-Sassoulas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for splitting the unicode MR, this is easier to review. The tests are also getting exhaustive and this is very nice. I think we're converging and will be able to merge this first part soon.

@CarliJoy
Copy link
Contributor Author

CarliJoy commented Jan 8, 2022

@DanielNoord wrote:

@CarliJoy Thanks again! See my comments about unittests and functional tests about why I was so insistent on them. You're right I shouldn't have complained about too many tests smile

I think we're really getting there with this PR. The final checker is becoming really quite elegant and concise.

One questions that couldn't really be commented anywhere: Do we need tests/functional/n/non_ascii_name_class/__init__.py? And the other __init__ files in the functional test directories?

I oriented myself on the existing functional tests and all main folders (a-z) and some of the subfolders contain a __init__.py.
I am not sure if that is required. I guess for testing the local import functions this is actually required.
Anyway, as this seems to be an issues of the existing codebase I kindly ask to address this issues outside of this PR.

@CarliJoy
Copy link
Contributor Author

@DanielNoord and @Pierre-Sassoulas

I reverted the changes to the Base Checkers and now use a simpler args checks.
I don't know if this really catches all cases but I don't want to invest more time in this and contradicting statements on how to have the new checker use functions of the old one, especially as this are only 4 lines of code.

I also removed two unittests but kept the other. See #5643 (comment) for my view on them.
If you still want the rest removed, submit a change.
For all of them (besides the imports) a functional test exists as well.

Copy link
Collaborator

@DanielNoord DanielNoord left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving with my last recommendations.

I think we could keep going back and forth about pylint's testing framework, but it doesn't feel too productive. I just want to say that I do see and agree with some of the points you're making or made previously, but that within the context of the current code we can't easily fix those. That's the result of working with such an old codebase I guess.

With regard to conflicting suggestions, I think that is to be expected in open source projects without strict hierarchy and review procedures on large PRs such as these. Of course we try to minimise them, but I don't think it is strange that reviewers can have conflicting opinions about certain decisions. With PRs with 100+ comments it's easy to miss comments from previous reviewers (or even your own) about the same topic. I know that that is frustrating and it was something I ran into when I first started contributing to pylint but I think that's one of the downsides of these open source projects. That doesn't really make this less annoying but I hope you can see how such things can occur.

Copy link
Member

@Pierre-Sassoulas Pierre-Sassoulas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the new checker @CarliJoy !

To add to what @DanielNoord said about conflicting reviews: the smaller and atomic the MR is, the less likely it's going to happen. Making our lives easier by making small easy reviewable MR will help tremendously. If we need to review 1000 lines repeatedly with hundred of comments then we might not have the time to read all comments from everyone. Especially if something that should be discussed in another MR in detail is lost in the hundred of comments and makes us feel that the code change a lot and that we need to reread everything every-time as a result.

Also remember there's other interactions at the same time on the repository (repositories!) we're maintaining, we don't have a single MR / issue to take care of and we have jobs and lifes on top of it.

@Pierre-Sassoulas Pierre-Sassoulas merged commit d2475b4 into pylint-dev:main Jan 10, 2022
@Pierre-Sassoulas Pierre-Sassoulas added this to the 2.13.0 milestone Jan 10, 2022
@Pierre-Sassoulas Pierre-Sassoulas added the Enhancement ✨ Improvement to a component label Jan 10, 2022
@CarliJoy CarliJoy deleted the improve_non_ascii_checker branch January 10, 2022 23:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement ✨ Improvement to a component
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants