-
-
Notifications
You must be signed in to change notification settings - Fork 30.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gh-99612: Fix PyUnicode_DecodeUTF8Stateful() for ASCII-only data #99613
gh-99612: Fix PyUnicode_DecodeUTF8Stateful() for ASCII-only data #99613
Conversation
Previously *consumed was not set in this case.
Lib/test/test_capi/test_unicode.py
Outdated
|
||
self.assertRaises(LookupError, decodeutf8, b'a\x80', 'foo') | ||
# TODO: Test PyUnicode_DecodeUTF8() with NULL as data and | ||
# negative size. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dislike TODO which stays forever. Please either address it or remove it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In #99594 you requested to split that PR on smaller parts. If address these TODOes, it can add more complexity to this PR, which is not directly related to the bug.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If remove them, some cases will never be covered by tests.
Lib/test/test_capi/test_unicode.py
Outdated
@@ -11,6 +11,53 @@ | |||
|
|||
class CAPITest(unittest.TestCase): | |||
|
|||
@support.cpython_only |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think of moving these tests to test_codecs? For me, these APIs are more related to codecs than "Unicode". test_codecs already has multiple tests about the UTF-8 encoding.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. But would not be more convenient to have tests for all PyUnicode_*
functions in a single file?
It would be nice to move the C code from the giant unicodeobject.c into a separed file, since codecs are not directly related to Unicode. They only "use" the Unicode API. I dislike the fact these functions share the PyUnicode_ prefix, but that cannot be changed now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lib/test/test_capi/test_codecs.py
Oh. I was thinking at moving the tests in the existing Lib/test/test_codecs.py
file.
See #93649 (comment) discussion in issue #93649.
Also, it is convenient to run all C API test if they are grouped together in a package. |
For the organization of C API tests, I commented there: #93649 (comment) |
Lib/test/test_capi/test_codecs.py
Outdated
|
||
class CAPITest(unittest.TestCase): | ||
|
||
@support.cpython_only |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For me it's redundant with testing _testcapi. For example, PyPy doesn't have _testcapi. But you can keep it if you prefer to be explicit.
Lib/test/test_capi/test_codecs.py
Outdated
class CAPITest(unittest.TestCase): | ||
|
||
@support.cpython_only | ||
@unittest.skipIf(_testcapi is None, 'need _testcapi module') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO you can move it on the whole class.
Lib/test/test_capi/test_codecs.py
Outdated
try: | ||
import _testcapi | ||
except ImportError: | ||
_testcapi = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not skipping this test if the module is missing?
Lib/test/test_capi/test_codecs.py
Outdated
@unittest.skipIf(_testcapi is None, 'need _testcapi module') | ||
def test_decodeutf8(self): | ||
"""Test PyUnicode_DecodeUTF8()""" | ||
from _testcapi import unicode_decodeutf8 as decodeutf8 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You may use decodeutf8 = _testcapi.unicode_decodeutf8
.
Modules/_testcapi/unicode.c
Outdated
const char *data; | ||
Py_ssize_t size; | ||
const char *errors = NULL; | ||
Py_ssize_t consumed; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would feel safer if you initialize the value. Maybe to a marker value like 42?
Otherwise, the test may miss the bug by luck, if local variables allocated on the stack are initialize to 0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks for updating the PR. I'm more confident with the consumed variable initialized to a value in the test ;-)
I completely agree with you. |
Thanks @serhiy-storchaka for the PR 🌮🎉.. I'm working now to backport this PR to: 3.10, 3.11. |
Sorry, @serhiy-storchaka, I could not cleanly backport this to |
Sorry @serhiy-storchaka, I had trouble checking out the |
* main: (112 commits) pythongh-99894: Ensure the local names don't collide with the test file in traceback suggestion error checking (python#99895) pythongh-99612: Fix PyUnicode_DecodeUTF8Stateful() for ASCII-only data (pythonGH-99613) Doc: Add summary line to isolation_level & autocommit sqlite3.connect params (python#99917) pythonGH-98906 ```re``` module: ```search() vs. match()``` section should mention ```fullmatch()``` (pythonGH-98916) pythongh-89189: More compact range iterator (pythonGH-27986) bpo-47220: Document the optional callback parameter of weakref.WeakMethod (pythonGH-25491) pythonGH-99905: Fix output of misses in summarize_stats.py execution counts (pythonGH-99906) pythongh-99845: PEP 670: Convert PyObject macros to functions (python#99850) pythongh-99845: Use size_t type in __sizeof__() methods (python#99846) pythonGH-99877) Fix typo in exception message in `multiprocessing.pool` (python#99900) pythongh-87092: move all localsplus preparation into separate function called from assembler stage (pythonGH-99869) pythongh-99891: Fix infinite recursion in the tokenizer when showing warnings (pythonGH-99893) pythongh-99824: Document that sqlite3.connect implicitly open a transaction if autocommit=False (python#99825) pythonGH-81057: remove static state from suggestions.c (python#99411) Improve zip64 limit error message (python#95892) pythongh-98253: Break potential reference cycles in external code worsened by typing.py lru_cache (python#98591) pythongh-99127: Allow some features of syslog to the main interpreter only (pythongh-99128) pythongh-82836: fix private network check (python#97733) Docs: improve accuracy of socketserver reference (python#24767) ...
…nly data (pythonGH-99613) Previously *consumed was not set in this case.. (cherry picked from commit f08e52c) Co-authored-by: Serhiy Storchaka <[email protected]>
GH-107224 is a backport of this pull request to the 3.11 branch. |
…nly data (pythonGH-99613) (pythonGH-107224) (cherry picked from commit b8b3e6a) Co-authored-by: Serhiy Storchaka <[email protected]> Previously *consumed was not set in this case. (cherry picked from commit f08e52c)
…SCII-only data (pythonGH-99613) (pythonGH-107224) Previously *consumed was not set in this case. (cherry picked from commit f08e52c). (cherry picked from commit b8b3e6a) Co-authored-by: Serhiy Storchaka <[email protected]>
…ly data (pythonGH-99613) (pythonGH-107224) (python#107231) Previously *consumed was not set in this case. (cherry picked from commit f08e52c). (cherry picked from commit b8b3e6a)
Previously *consumed was not set in this case.
This is an extraction from #99594.