-
-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add a checker for misleading unicode #5311
Conversation
@Pierre-Sassoulas here my suggestion. Sure some work / decision have to put into them. |
Pull Request Test Coverage Report for Build 1687416478
💛 - Coveralls |
235a273
to
af44181
Compare
@Pierre-Sassoulas I am a bit undecided on what to do with the "old" non ascii checker. I wrote a new checker for two reasons:
Anyhow having similar checks makes not a lot of sense, and also leads to this last failing tests. A suggestion would be to to remove the old non-ascii checker from Or we could try to build in the new checks into Did you have time to look in the MR yet? I will now try to move out the test files to the correct places and write more proper unit tests (documentation about writing an Plugin is quite different from the requested code contribution style btw...). |
@Pierre-Sassoulas could it be that detecting column and line ending works only with Python 3.8? Or did I do something wrong regarding returning the correct node? confused Also I am not sure what to make with the failing Windows tests: I don't have Windows to test stuff atm. but I am a bit confused why it should fail due to unicode errors (as written in the other ASCII only test) -> I remember that unicode worked fine in Windows, when I was using it in my old company: Is this a CI issues? |
Yes, sorry about that we added an indication in #5418 but it's not merged yet. |
Sorry I did not have the time to review yet, finally releasing 2.12 made up for a busy week 😄 |
LoL the Windows functional test is really strange. |
The windows tests are rather hard to debug because I don't think we have a Windows developers 😄
I think the problem is due to the column being calculated differently for Windows, we check the line, column, end_line, end_column of the messages in functional tests which is not the case in your unit test. The peculiar unicode character might prevent python's ast (and thus astroid) to return the proper value on Windows ? |
I figured it out, was my fault, I didn't check for byte encoding, when handing the "\r\n" line endings. One open questions is still how to proceed with the two different "non-ascii-" checks.
Easiest solution would be the replacement. So I would remove the What do you think? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding the old checker for non ascii character, I think they should both stay. if your new check does the same thing than the old one (or more), you can rename it using old_names
(see the checker for missing-docstring
for example).
It actually does the same very same thing but covers much more use cases (i.e. imports, except as variable names)... If I rename it, the functionality should still stay within Names Checker? Because then we would check for the same thing twice?? |
Seeing it's doing the same thing + more, you could register the new check in your new checker by registering the old name in the new checker so we handle retro-compatibility with the old name, and remove the old check and old code from NameChecker so we do not waste time checking the same thing here. It would be nice to have a functional test using the old name so we can check that it still works like before. |
@Pierre-Sassoulas I replaced the old name checker as agreed and merge the current master version. Still I am getting problems with the end-col stuff in Python 3.8 but I really have no clue how I could fix this. Could you support me fixing it? |
It look like the proper column is not available with python 3.8 AST, There's not a lot we can do, parsing better than the AST in astroid, is not something that takes a reasonable time and we almost never do it (We're not testing the line/column before python 3.8 because there is a lot better AST since python 3.8). Maybe it would be enough to test for python 3.9 and superior (we could add a .rc configuration file with min_version). Or we could have two functional files with the same content, one for < 3.9 and one for > 3.9 (one with min_version, one with max_version). |
To be clearer : python 3.6 and 3.7 do not have line/column information in this case but it's expected, that's why tests for 3.6 and 3.7 are green. |
This problem (sadly) goes beyond the end line and end column. I'm not sure what the best way to solve this is: I don't think we have any test options that would allow creating one expected output that satisfies 3.6, 3.7 and 3.8 and one for 3.9 and 3.10. We could create tests for 1) 3.6, 3.7, 2) 3.8 and 3) 3.9, 3.10 but that seems like a lot of duplication.. Not sure what to do here.. |
Why does it not occur in the other functional tests? Is it due to an special AST object I am using? My suggestion for a solution would be the following: This way we don't have a lot of duplication in the function test and are also able to special AST behavior of special versions. What do you think? Should I include this behavior in the PR? |
I think it might be due to the special unicode character that the ast parsers doesn't recognise.. Perhaps using another glyph fixes it?
This would still requires us to make three different files right? For
If we go ahead with this this should be done in a different PR. |
I could try this. But even if it is like this, it would be better if we are aware of this issue and have a test for it or not? Maybe I also look into the Astroid to see if there is a fix for this. Or do you think the problem originates already from the Python 3.8 AST parser?
Nope this would require only one extra .py38 file at the moment. For all other tests it is running fine.
Should I do this? |
I agree with @DanielNoord that it's something that need another MR. I like the idea but it's not a small change. I think this is necessary in order to handle small inconsistencies in the AST for python interpreter > 3.8 and still being able to check the that the column is correct most of the time. This is the first time the line number is different between 3.8 and 3.9 or 3.10 by the way :) ! I don't think we should start testing the value from python 3.6 and 3.7. Especially not 3.6 as it will not be required anymore in 9 days. The issue happens all the time for those interpreters, and we already test the column number on other versions. Also the wrong information come directly from the AST so it's not linked to pylint and we don't control it. The way we would create the functional tests output would be with the current output, it shows that it's not really useful. Also some "testname" contain a Regarding inferior or superior to a python version what about:
|
What about making a test for Although I think we could implement it, the proposed change for test filenames is quite large (also because we already use a similar pattern) and I don't want to give you a lot of extra work @CarliJoy. Writing all of these additional |
Sounds good, I just thought about the update script that will need to raise an error when your python interpreter cannot upgrade the generic output. Say you have python 3.8 and |
Ref #5311 Co-authored-by: Carli* Freudenberg <[email protected]>
Ref pylint-dev#5311 Co-authored-by: Carli* Freudenberg <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another round. I'm having a very busy week so I don't have enough time to do a full review probably.
I've looked at NonAsciiNamesChecker
and requested some changes. @CarliJoy Would you perhaps reconsider putting some of these changes in a separate PR? Especially the NonAsciiNamesChecker
should be fairly trivial to extract out of this PR and it would make the reviewing a lot easier. (We'll also get a more tangible result quicker, as we'll likely be able to merge a part of this sooner).
@DanielNoord I extract the ASCII checker from this PR as on your request. |
dcf244a
to
734ca46
Compare
I combined the changes. So there are not this crazy amount of commits any more. Content is the same. |
Ref pylint-dev#5311 Co-authored-by: Carli* Freudenberg <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Took a first look at the actual checker, which already seems almost good to go.
I'll take another look at the tests as well tomorrow.
github says, that I was mentioned here 2h ago but I don't see where... 😿 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor changes, but rest looks good! Thanks @CarliJoy 🎉
Co-authored-by: Daniël van Noord <[email protected]>
@Pierre-Sassoulas anything left to do from my side? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for all the work @CarliJoy, without doubt one of the major addition to pylint 2.13.0 !
suffix[0], | ||
id=f"{codec_and_msg[0]}_{line_ending[1]}_{suffix[1]}", | ||
) | ||
for codec_and_msg, line_ending, suffix in itertools.product( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice parametrize !
Type of Changes
Description
Unicode and some other ASCII characters can be used to create programs that run
much different as a human reader would expect it.
PEP 672 lists some examples.
The added checkers are intended to make users aware of these issues.
Some of the test might be too strict for users programming in other languages as
english. As we assume this is minority than can disable the given tests, this should
be not an issue.
Note that the codes try to be prepared for UTF-16 and UTF-32 even so Python files
can't be encoded in them yet.
There is a checker that would detect UTF-16/UTF-32 files and fail, so even the usage
of an old pylint version would not cause a "security hole" when used with new Python
version that allow these encodings.
Closes #5281