-
-
Notifications
You must be signed in to change notification settings - Fork 31k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bpo-39337: Add a test case for normalizing of codec names #19069
Conversation
cc @vstinner Hi, victor. I added a test case of |
Is this PR still relevant? Without unregistering the codec, it is likely to fail on Refleak buildbots. Try to run "./python -m test test_codecs -R 3:3 -m test_normalized_encoding". |
ok, I will try it again in this weekend. I almost forget this PR's detail, Lol~ |
* origin/master: (1373 commits) bpo-1635741: Port mashal module to multi-phase init (python#22149) bpo-1635741: Port _string module to multi-phase init (pythonGH-22148) bpo-1635741: Convert _sha256 types to heap types (pythonGH-22134) bpo-1635741: Port the termios to multi-phase init (PEP 489) (pythonGH-22139) bpo-41732: add iterator to memoryview (pythonGH-22119) bpo-40744: Drop support for SQLite pre 3.7.3 (pythonGH-20909) bpo-41316: Make tarfile follow specs for FNAME (pythonGH-21511) bpo-41720: Add "return NotImplemented" in turtle.Vec2D.__rmul__(). (pythonGH-22092) bpo-1635741 port _curses_panel to multi-phase init (PEP 489) (pythonGH-21986) bpo-1635741: Port _overlapped module to multi-phase init (pythonGH-22051) bpo-1635741: Port _opcode module to multi-phase init (PEP 489) (pythonGH-22050) bpo-1635741 port zlib module to multi-phase init (pythonGH-21995) [doc] Add link to Generic in typing (pythonGH-22125) bpo-41513: Expand comments and add references for a better understanding (pythonGH-22123) bpo-1635741: Port _sha1, _sha512, _md5 to multiphase init (pythonGH-21818) closes bpo-41723: Fix an error in the py_compile documentation. (pythonGH-22110) [doc] Fix padding in some typing definitions (pythonGH-22114) Fix documented Python version for venv --upgrade-deps (pythonGH-22113) bpo-40318: Migrate to SQLite3 trace v2 API (pythonGH-19581) bpo-41687: Fix sendfile implementation to work with Solaris (python#22040) ...
@vstinner Hi, victor. I have updated this PR. Pls take a look again when you have free time. |
@vstinner, this PR now unregisters the codec search function, going to some significant effort to do so. Would you like to review it again? |
Can you try to write a first PR just to add _codecs.unregister()? Do you think that it's worth it to expose it in the public codecs module as well? Near: If yes, it should be documented. |
copy that.
Should be added definitely, I will update it soon. |
* origin/master: (113 commits) bpo-41773: Raise exception for non-finite weights in random.choices(). (pythonGH-22441) bpo-41873: Add vectorcall for float() (pythonGH-22432) bpo-41861: Convert _sqlite3 PrepareProtocolType to heap type (pythonGH-22428) bpo-41842: Add codecs.unregister() function (pythonGH-22360) bpo-41875: Use __builtin_unreachable when possible (pythonGH-22433) bpo-40105: ZipFile truncate in append mode with shorter comment (pythonGH-19337) bpo-41870: Use PEP 590 vectorcall to speed up bool() (pythonGH-22427) [doc] Leverage the fact that the actual types can now be indexed for typing (pythonGH-22340) bpo-41861: Convert _sqlite3 cache and node static types to heap types (pythonGH-22417) bpo-41858: Clarify line in optparse doc (pythonGH-22407) Revert "Fix all Python Cookbook links (python#22205)" (pythonGH-22424) bpo-1635741: Port _bisect module to multi-phase init (pythonGH-22415) bpo-41428: Fix compiler warning in unionobject.c (pythonGH-22416) Fix logging error message (pythonGH-22410) bpo-39934: Account for control blocks in 'except' in compiler. (pythonGH-22395) bpo-41775: Make 'IDLE Shell' the shell title (python#22399) bpo-41428: Fix compiler warnings in unionobject.c (pythonGH-22388) bpo-41654: Fix compiler warning in MemoryError_dealloc() (pythonGH-22387) bpo-41833: threading.Thread now uses the target name (pythonGH-22357) bpo-30155: Add macros to get tzinfo from datetime instances (pythonGH-21633) ...
On the b.p.o. issue, the main issue was the handling of non-ASCII characters. It seems to me that this test should also test such cases. |
Tal, you are right. positive and negative testcases are both needed. I create another PR #22219. |
Lib/test/test_codecs.py
Outdated
@@ -3415,5 +3415,28 @@ def test_rot13_func(self): | |||
'To be, or not to be, that is the question') | |||
|
|||
|
|||
class CodecNameNormalizationTest(unittest.TestCase): | |||
"""Test the normalizestring function via codecs module""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The fact that the codec names are normalized via the normalizestring
function is an implementation detail, so the test should be written ignoring that detail, to the extent possible. This includes its documentation, so...
"""Test the normalizestring function via codecs module""" | |
"""Test codec name normalization""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, I considered this details before. Why I am still leave this detail?
IMHO, leave enough details can help developers to maintain.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're welcome to add a comment after the doc-string. I wouldn't want the doc-string to mention normalizestring, though, unless we change the test to call normalizestring()
directly instead of codecs.lookup()
, which I don't recommend.
From my experience, however, such comments often become outdated when the implementation changes. And outdated comments causing confusion are usually worse than no comments.
Also, if someone needs to see the details, they'll likely need to go through the code starting at codecs.lookup()
anyways. It's relatively straightforward to follow: codecs
imports lookup
from _codecs
, implemented in Modules/_codecsmodule.c
. lookup()
is a simple wrapper for _PyCodec_Lookup
from Python/codecs.c
. With the imports and wrappers out of the way, _PyCodec_Lookup
calls normalizestring()
near the top of the function, immediately after a bit of initialization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From my experience, however, such comments often become outdated when the implementation changes. And outdated comments causing confusion are usually worse than no comments.
Thanks for your share, Tal. It's make sense :)
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase |
self.assertEqual(NOT_FOUND, codecs.lookup('AAA...8')) | ||
self.assertEqual(NOT_FOUND, codecs.lookup('BBB-8')) | ||
self.assertEqual(NOT_FOUND, codecs.lookup('BBB.8')) | ||
self.assertEqual(NOT_FOUND, codecs.lookup('a\xe9\u20ac-8')) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The whole point of https://bugs.python.org/issue39337 is to test that non-ASCII characters are ignored. Your test doesn't check that.
You should also test that codecs.lookup('aaa\xe9\u20ac-8') returns FOUND, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, tal have mentioned this detail too.
My doubt: positive and negative testcases are both needed. I create another PR #22219. But there have no clear conclusion about the right behavior of normalize_encoding().
Thanks @shihai1991 for your tenacity ;-) With codecs.unregister() merged as a separated PR, this PR became straightforward and so simpler to review. |
Thanks for everybody's review :) |
* origin/master: (147 commits) Fix the attribute names in the docstring of GenericAlias (pythonGH-22594) bpo-39337: Add a test case for normalizing of codec names (pythonGH-19069) bpo-41557: Update Windows installer to use SQLite 3.33.0 (pythonGH-21960) bpo-41976: Fix the fallback to gcc of ctypes.util.find_library when using gcc>9 (pythonGH-22598) bpo-41306: Allow scale value to not be rounded (pythonGH-21715) bpo-41970: Avoid test failure in test_lib2to3 if the module is already imported (pythonGH-22595) bpo-41376: Fix the documentation of `site.getusersitepackages()` (pythonGH-21602) Revert "bpo-26680: Incorporate is_integer in all built-in and standard library numeric types (pythonGH-6121)" (pythonGH-22584) bpo-41923: PEP 613: Add TypeAlias to typing module (python#22532) Fix comment about PyObject_IsTrue. (pythonGH-22343) bpo-38605: Make 'from __future__ import annotations' the default (pythonGH-20434) bpo-41905: Add abc.update_abstractmethods() (pythonGH-22485) bpo-41944: No longer call eval() on content received via HTTP in the UnicodeNames tests (pythonGH-22575) bpo-41944: No longer call eval() on content received via HTTP in the CJK codec tests (pythonGH-22566) Post 3.10.0a1 Python 3.10.0a1 bpo-41584: clarify when the reflected method of a binary arithemtic operator is called (python#22505) bpo-41939: Fix test_site.test_license_exists_at_url() (python#22559) bpo-41774: Tweak new programming FAQ entry (pythonGH-22562) bpo-41936. Remove macros Py_ALLOW_RECURSION/Py_END_ALLOW_RECURSION (pythonGH-22552) ...
https://bugs.python.org/issue39337