Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-39337: Add a test case for normalizing of codec names #19069

Merged
merged 17 commits into from
Oct 8, 2020

Conversation

shihai1991
Copy link
Member

@shihai1991 shihai1991 commented Mar 19, 2020

@shihai1991 shihai1991 changed the title bpo-1635741: Add a test case to against nomralizeing() of codecs bpo-39337: Add a test case to against nomralizeing() of codecs Mar 19, 2020
Lib/test/test_codecs.py Outdated Show resolved Hide resolved
@shihai1991 shihai1991 changed the title bpo-39337: Add a test case to against nomralizeing() of codecs [WIP] bpo-39337: Add a test case to against nomralizeing() of codecs Mar 20, 2020
@shihai1991 shihai1991 changed the title [WIP] bpo-39337: Add a test case to against nomralizeing() of codecs bpo-39337: Add a test case to against nomralizeing() of codecs Mar 21, 2020
@shihai1991
Copy link
Member Author

cc @vstinner Hi, victor. I added a test case of normalizing().
Without the help of codecs_unregister(), this test case have been passed .

@shihai1991 shihai1991 closed this Mar 29, 2020
@shihai1991 shihai1991 reopened this Mar 29, 2020
@vstinner
Copy link
Member

vstinner commented Sep 8, 2020

Is this PR still relevant? Without unregistering the codec, it is likely to fail on Refleak buildbots. Try to run "./python -m test test_codecs -R 3:3 -m test_normalized_encoding".

@shihai1991
Copy link
Member Author

shihai1991 commented Sep 8, 2020

Is this PR still relevant? Without unregistering the codec, it is likely to fail on Refleak buildbots. Try to run "./python -m test test_codecs -R 3:3 -m test_normalized_encoding".

ok, I will try it again in this weekend. I almost forget this PR's detail, Lol~

* origin/master: (1373 commits)
  bpo-1635741: Port mashal module to multi-phase init (python#22149)
  bpo-1635741: Port _string module to multi-phase init (pythonGH-22148)
  bpo-1635741: Convert _sha256 types to heap types (pythonGH-22134)
  bpo-1635741: Port the termios to multi-phase init (PEP 489) (pythonGH-22139)
  bpo-41732: add iterator to memoryview (pythonGH-22119)
  bpo-40744: Drop support for SQLite pre 3.7.3 (pythonGH-20909)
  bpo-41316: Make tarfile follow specs for FNAME (pythonGH-21511)
  bpo-41720: Add "return NotImplemented" in turtle.Vec2D.__rmul__(). (pythonGH-22092)
  bpo-1635741 port _curses_panel to multi-phase init (PEP 489) (pythonGH-21986)
  bpo-1635741: Port _overlapped module to multi-phase init (pythonGH-22051)
  bpo-1635741: Port _opcode module to multi-phase init (PEP 489) (pythonGH-22050)
  bpo-1635741 port zlib module to multi-phase init (pythonGH-21995)
  [doc] Add link to Generic in typing (pythonGH-22125)
  bpo-41513: Expand comments and add references for a better understanding (pythonGH-22123)
  bpo-1635741: Port _sha1, _sha512, _md5 to multiphase init (pythonGH-21818)
  closes bpo-41723: Fix an error in the py_compile documentation. (pythonGH-22110)
  [doc] Fix padding in some typing definitions (pythonGH-22114)
  Fix documented Python version for venv --upgrade-deps (pythonGH-22113)
  bpo-40318: Migrate to SQLite3 trace v2 API (pythonGH-19581)
  bpo-41687: Fix sendfile implementation to work with Solaris (python#22040)
  ...
@shihai1991
Copy link
Member Author

shihai1991 commented Sep 11, 2020

@vstinner Hi, victor. I have updated this PR. Pls take a look again when you have free time.

@shihai1991 shihai1991 closed this Sep 11, 2020
@shihai1991 shihai1991 reopened this Sep 11, 2020
@taleinat taleinat added skip news tests Tests in the Lib/test dir labels Sep 20, 2020
@taleinat
Copy link
Contributor

@vstinner, this PR now unregisters the codec search function, going to some significant effort to do so. Would you like to review it again?

Modules/_testinternalcapi.c Outdated Show resolved Hide resolved
Modules/_testinternalcapi.c Outdated Show resolved Hide resolved
Modules/_testinternalcapi.c Outdated Show resolved Hide resolved
Modules/_testinternalcapi.c Outdated Show resolved Hide resolved
@vstinner
Copy link
Member

Can you try to write a first PR just to add _codecs.unregister()?

Do you think that it's worth it to expose it in the public codecs module as well? Near:
https://docs.python.org/dev/library/codecs.html#codecs.register

If yes, it should be documented.

cc @serhiy-storchaka @malemburg

@shihai1991
Copy link
Member Author

shihai1991 commented Sep 21, 2020

Can you try to write a first PR just to add _codecs.unregister()?

copy that.

Do you think that it's worth it to expose it in the public codecs module as well? Near:
https://docs.python.org/dev/library/codecs.html#codecs.register

If yes, it should be documented.

Should be added definitely, I will update it soon.

* origin/master: (113 commits)
  bpo-41773: Raise exception for non-finite weights in random.choices().  (pythonGH-22441)
  bpo-41873: Add vectorcall for float() (pythonGH-22432)
  bpo-41861: Convert _sqlite3 PrepareProtocolType to heap type (pythonGH-22428)
  bpo-41842: Add codecs.unregister() function (pythonGH-22360)
  bpo-41875: Use __builtin_unreachable when possible (pythonGH-22433)
  bpo-40105: ZipFile truncate in append mode with shorter comment (pythonGH-19337)
  bpo-41870: Use PEP 590 vectorcall to speed up bool()  (pythonGH-22427)
  [doc] Leverage the fact that the actual types can now be indexed for typing (pythonGH-22340)
  bpo-41861: Convert _sqlite3 cache and node static types to heap types (pythonGH-22417)
  bpo-41858: Clarify line in optparse doc (pythonGH-22407)
  Revert "Fix all Python Cookbook links (python#22205)" (pythonGH-22424)
  bpo-1635741: Port _bisect module to multi-phase init (pythonGH-22415)
  bpo-41428: Fix compiler warning in unionobject.c (pythonGH-22416)
  Fix logging error message (pythonGH-22410)
  bpo-39934: Account for control blocks in 'except' in compiler. (pythonGH-22395)
  bpo-41775: Make 'IDLE Shell' the shell title  (python#22399)
  bpo-41428: Fix compiler warnings in unionobject.c (pythonGH-22388)
  bpo-41654: Fix compiler warning in MemoryError_dealloc() (pythonGH-22387)
  bpo-41833: threading.Thread now uses the target name (pythonGH-22357)
  bpo-30155: Add macros to get tzinfo from datetime instances (pythonGH-21633)
  ...
@taleinat taleinat changed the title bpo-39337: Add a test case to against nomralizeing() of codecs bpo-39337: Add a test case for normalizing of codec names Sep 29, 2020
Lib/test/test_codecs.py Outdated Show resolved Hide resolved
@taleinat
Copy link
Contributor

On the b.p.o. issue, the main issue was the handling of non-ASCII characters. It seems to me that this test should also test such cases.

@shihai1991
Copy link
Member Author

shihai1991 commented Sep 29, 2020

On the b.p.o. issue, the main issue was the handling of non-ASCII characters. It seems to me that this test should also test such cases.

Tal, you are right. positive and negative testcases are both needed. I create another PR #22219.
But there have no clear conclusion about the right behavior of normalize_encoding().

Lib/test/test_codecs.py Show resolved Hide resolved
Lib/test/test_codecs.py Outdated Show resolved Hide resolved
@@ -3415,5 +3415,28 @@ def test_rot13_func(self):
'To be, or not to be, that is the question')


class CodecNameNormalizationTest(unittest.TestCase):
"""Test the normalizestring function via codecs module"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fact that the codec names are normalized via the normalizestring function is an implementation detail, so the test should be written ignoring that detail, to the extent possible. This includes its documentation, so...

Suggested change
"""Test the normalizestring function via codecs module"""
"""Test codec name normalization"""

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, I considered this details before. Why I am still leave this detail?
IMHO, leave enough details can help developers to maintain.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're welcome to add a comment after the doc-string. I wouldn't want the doc-string to mention normalizestring, though, unless we change the test to call normalizestring() directly instead of codecs.lookup(), which I don't recommend.

From my experience, however, such comments often become outdated when the implementation changes. And outdated comments causing confusion are usually worse than no comments.

Also, if someone needs to see the details, they'll likely need to go through the code starting at codecs.lookup() anyways. It's relatively straightforward to follow: codecs imports lookup from _codecs, implemented in Modules/_codecsmodule.c. lookup() is a simple wrapper for _PyCodec_Lookup from Python/codecs.c. With the imports and wrappers out of the way, _PyCodec_Lookup calls normalizestring() near the top of the function, immediately after a bit of initialization.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From my experience, however, such comments often become outdated when the implementation changes. And outdated comments causing confusion are usually worse than no comments.

Thanks for your share, Tal. It's make sense :)

Lib/test/test_codecs.py Show resolved Hide resolved
@bedevere-bot
Copy link

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

self.assertEqual(NOT_FOUND, codecs.lookup('AAA...8'))
self.assertEqual(NOT_FOUND, codecs.lookup('BBB-8'))
self.assertEqual(NOT_FOUND, codecs.lookup('BBB.8'))
self.assertEqual(NOT_FOUND, codecs.lookup('a\xe9\u20ac-8'))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The whole point of https://bugs.python.org/issue39337 is to test that non-ASCII characters are ignored. Your test doesn't check that.

You should also test that codecs.lookup('aaa\xe9\u20ac-8') returns FOUND, no?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, tal have mentioned this detail too.
My doubt: positive and negative testcases are both needed. I create another PR #22219. But there have no clear conclusion about the right behavior of normalize_encoding().

Lib/test/test_codecs.py Outdated Show resolved Hide resolved
Lib/test/test_codecs.py Outdated Show resolved Hide resolved
@vstinner vstinner merged commit 3f34237 into python:master Oct 8, 2020
@vstinner
Copy link
Member

vstinner commented Oct 8, 2020

Thanks @shihai1991 for your tenacity ;-) With codecs.unregister() merged as a separated PR, this PR became straightforward and so simpler to review.

@shihai1991
Copy link
Member Author

shihai1991 commented Oct 9, 2020

Thanks for everybody's review :)

shihai1991 added a commit to shihai1991/cpython that referenced this pull request Oct 9, 2020
* origin/master: (147 commits)
  Fix the attribute names in the docstring of GenericAlias (pythonGH-22594)
  bpo-39337: Add a test case for normalizing of codec names (pythonGH-19069)
  bpo-41557: Update Windows installer to use SQLite 3.33.0 (pythonGH-21960)
  bpo-41976: Fix the fallback to gcc of ctypes.util.find_library when using gcc>9 (pythonGH-22598)
  bpo-41306: Allow scale value to not be rounded (pythonGH-21715)
  bpo-41970: Avoid test failure in test_lib2to3 if the module is already imported (pythonGH-22595)
  bpo-41376: Fix the documentation of `site.getusersitepackages()` (pythonGH-21602)
  Revert "bpo-26680: Incorporate is_integer in all built-in and standard library numeric types (pythonGH-6121)" (pythonGH-22584)
  bpo-41923: PEP 613: Add TypeAlias to typing module (python#22532)
  Fix comment about PyObject_IsTrue. (pythonGH-22343)
  bpo-38605: Make 'from __future__ import annotations' the default (pythonGH-20434)
  bpo-41905: Add abc.update_abstractmethods() (pythonGH-22485)
  bpo-41944: No longer call eval() on content received via HTTP in the UnicodeNames tests (pythonGH-22575)
  bpo-41944: No longer call eval() on content received via HTTP in the CJK codec tests (pythonGH-22566)
  Post 3.10.0a1
  Python 3.10.0a1
  bpo-41584: clarify when the reflected method of a binary arithemtic operator is called (python#22505)
  bpo-41939: Fix test_site.test_license_exists_at_url() (python#22559)
  bpo-41774: Tweak new programming FAQ entry (pythonGH-22562)
  bpo-41936. Remove macros Py_ALLOW_RECURSION/Py_END_ALLOW_RECURSION (pythonGH-22552)
  ...
xzy3 pushed a commit to xzy3/cpython that referenced this pull request Oct 18, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
skip news tests Tests in the Lib/test dir
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants