Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-99612: Fix PyUnicode_DecodeUTF8Stateful() for ASCII-only data #99613

Conversation

serhiy-storchaka
Copy link
Member

@serhiy-storchaka serhiy-storchaka commented Nov 20, 2022

Previously *consumed was not set in this case.

This is an extraction from #99594.

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
Previously *consumed was not set in this case.

self.assertRaises(LookupError, decodeutf8, b'a\x80', 'foo')
# TODO: Test PyUnicode_DecodeUTF8() with NULL as data and
# negative size.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dislike TODO which stays forever. Please either address it or remove it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In #99594 you requested to split that PR on smaller parts. If address these TODOes, it can add more complexity to this PR, which is not directly related to the bug.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If remove them, some cases will never be covered by tests.

@@ -11,6 +11,53 @@

class CAPITest(unittest.TestCase):

@support.cpython_only
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think of moving these tests to test_codecs? For me, these APIs are more related to codecs than "Unicode". test_codecs already has multiple tests about the UTF-8 encoding.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. But would not be more convenient to have tests for all PyUnicode_* functions in a single file?

@vstinner
Copy link
Member

But would not be more convenient to have tests for all PyUnicode_* functions in a single file?

It would be nice to move the C code from the giant unicodeobject.c into a separed file, since codecs are not directly related to Unicode. They only "use" the Unicode API. I dislike the fact these functions share the PyUnicode_ prefix, but that cannot be changed now.

Copy link
Member

@vstinner vstinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lib/test/test_capi/test_codecs.py

Oh. I was thinking at moving the tests in the existing Lib/test/test_codecs.py file.

See #93649 (comment) discussion in issue #93649.

@serhiy-storchaka
Copy link
Member Author

Lib/test/test_codecs.py is already too large, even without C API tests. It is better to split C API tests in a separate file.

Also, it is convenient to run all C API test if they are grouped together in a package.

@vstinner
Copy link
Member

For the organization of C API tests, I commented there: #93649 (comment)


class CAPITest(unittest.TestCase):

@support.cpython_only
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For me it's redundant with testing _testcapi. For example, PyPy doesn't have _testcapi. But you can keep it if you prefer to be explicit.

class CAPITest(unittest.TestCase):

@support.cpython_only
@unittest.skipIf(_testcapi is None, 'need _testcapi module')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO you can move it on the whole class.

try:
import _testcapi
except ImportError:
_testcapi = None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not skipping this test if the module is missing?

@unittest.skipIf(_testcapi is None, 'need _testcapi module')
def test_decodeutf8(self):
"""Test PyUnicode_DecodeUTF8()"""
from _testcapi import unicode_decodeutf8 as decodeutf8
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may use decodeutf8 = _testcapi.unicode_decodeutf8.

const char *data;
Py_ssize_t size;
const char *errors = NULL;
Py_ssize_t consumed;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would feel safer if you initialize the value. Maybe to a marker value like 42?

Otherwise, the test may miss the bug by luck, if local variables allocated on the stack are initialize to 0.

Copy link
Member

@vstinner vstinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for updating the PR. I'm more confident with the consumed variable initialized to a value in the test ;-)

@serhiy-storchaka
Copy link
Member Author

serhiy-storchaka commented Dec 1, 2022

I completely agree with you.

@serhiy-storchaka serhiy-storchaka merged commit f08e52c into python:main Dec 1, 2022
@miss-islington
Copy link
Contributor

Thanks @serhiy-storchaka for the PR 🌮🎉.. I'm working now to backport this PR to: 3.10, 3.11.
🐍🍒⛏🤖

@miss-islington
Copy link
Contributor

Sorry, @serhiy-storchaka, I could not cleanly backport this to 3.11 due to a conflict.
Please backport using cherry_picker on command line.
cherry_picker f08e52ccb027f6f703302b8c1a82db9fd3934270 3.11

@miss-islington
Copy link
Contributor

Sorry @serhiy-storchaka, I had trouble checking out the 3.10 backport branch.
Please retry by removing and re-adding the "needs backport to 3.10" label.
Alternatively, you can backport using cherry_picker on the command line.
cherry_picker f08e52ccb027f6f703302b8c1a82db9fd3934270 3.10

carljm added a commit to carljm/cpython that referenced this pull request Dec 1, 2022
* main: (112 commits)
  pythongh-99894: Ensure the local names don't collide with the test file in traceback suggestion error checking (python#99895)
  pythongh-99612: Fix PyUnicode_DecodeUTF8Stateful() for ASCII-only data (pythonGH-99613)
  Doc: Add summary line to isolation_level & autocommit sqlite3.connect params (python#99917)
  pythonGH-98906 ```re``` module: ```search() vs. match()``` section should mention ```fullmatch()``` (pythonGH-98916)
  pythongh-89189: More compact range iterator (pythonGH-27986)
  bpo-47220: Document the optional callback parameter of weakref.WeakMethod (pythonGH-25491)
  pythonGH-99905: Fix output of misses in summarize_stats.py execution counts (pythonGH-99906)
  pythongh-99845: PEP 670: Convert PyObject macros to functions (python#99850)
  pythongh-99845: Use size_t type in __sizeof__() methods (python#99846)
  pythonGH-99877)
  Fix typo in exception message in `multiprocessing.pool` (python#99900)
  pythongh-87092: move all localsplus preparation into separate function called from assembler stage (pythonGH-99869)
  pythongh-99891: Fix infinite recursion in the tokenizer when showing warnings (pythonGH-99893)
  pythongh-99824: Document that sqlite3.connect implicitly open a transaction if autocommit=False (python#99825)
  pythonGH-81057: remove static state from suggestions.c (python#99411)
  Improve zip64 limit error message (python#95892)
  pythongh-98253: Break potential reference cycles in external code worsened by typing.py lru_cache (python#98591)
  pythongh-99127: Allow some features of syslog to the main interpreter only (pythongh-99128)
  pythongh-82836: fix private network check (python#97733)
  Docs: improve accuracy of socketserver reference (python#24767)
  ...
@serhiy-storchaka serhiy-storchaka deleted the capi-PyUnicode_DecodeUTF8Stateful branch July 25, 2023 09:50
serhiy-storchaka added a commit to serhiy-storchaka/cpython that referenced this pull request Jul 25, 2023
…nly data (pythonGH-99613)

Previously *consumed was not set in this case..
(cherry picked from commit f08e52c)

Co-authored-by: Serhiy Storchaka <[email protected]>
@bedevere-bot
Copy link

GH-107224 is a backport of this pull request to the 3.11 branch.

@bedevere-bot bedevere-bot removed the needs backport to 3.11 only security fixes label Jul 25, 2023
@serhiy-storchaka serhiy-storchaka removed their assignment Jul 25, 2023
@serhiy-storchaka serhiy-storchaka removed the needs backport to 3.10 only security fixes label Jul 25, 2023
serhiy-storchaka added a commit that referenced this pull request Jul 25, 2023
…ta (GH-99613) (GH-107224)

Previously *consumed was not set in this case.
(cherry picked from commit f08e52c)
miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Jul 25, 2023
…nly data (pythonGH-99613) (pythonGH-107224)

(cherry picked from commit b8b3e6a)

Co-authored-by: Serhiy Storchaka <[email protected]>
Previously *consumed was not set in this case.
(cherry picked from commit f08e52c)
serhiy-storchaka added a commit to serhiy-storchaka/cpython that referenced this pull request Jul 25, 2023
…SCII-only data (pythonGH-99613) (pythonGH-107224)

Previously *consumed was not set in this case.
(cherry picked from commit f08e52c).
(cherry picked from commit b8b3e6a)

Co-authored-by: Serhiy Storchaka <[email protected]>
ambv pushed a commit that referenced this pull request Aug 22, 2023
…ta (GH-99613) (GH-107224) (#107230)

Previously *consumed was not set in this case.

(cherry picked from commit b8b3e6a)
(cherry picked from commit f08e52c)

Co-authored-by: Serhiy Storchaka <[email protected]>
ambv pushed a commit that referenced this pull request Aug 22, 2023
GH-99613) (GH-107224) (#107231)

Previously *consumed was not set in this case.
(cherry picked from commit f08e52c).
(cherry picked from commit b8b3e6a)
carlosroman pushed a commit to DataDog/cpython that referenced this pull request Oct 11, 2023
…ly data (pythonGH-99613) (pythonGH-107224) (python#107231)

Previously *consumed was not set in this case.
(cherry picked from commit f08e52c).
(cherry picked from commit b8b3e6a)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants