gh-99593: Add tests for Unicode C API (part 1) #99651

serhiy-storchaka · 2022-11-21T14:59:59Z

Add tests for functions corresponding to the str class methods.

Issue: Add tests for Unicode C API #99593


vstinner

Very nice! Here is my first review :-)


vstinner · 2022-11-24T14:12:06Z

Lib/test/test_capi/test_unicode.py

+        self.assertRaises(ValueError, split, 'a|b|c|d', '')
+        self.assertRaises(TypeError, split, 'a|b|c|d', ord('|'))
+        self.assertRaises(TypeError, split, [], '|')
+        # split(NULL, '|')


What does this comment mean? Does the function crash with NULL? Same question for the similar rsplit() comment below.

serhiy-storchaka · 2022-11-27T07:34:36Z

It crashes. This was the first test I wrote, 4 years ago, before I got the hang of it, so I forgot to add the word CRASHES here.
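As a rough illustration of the cases in the diff above, the CPython-only sketch below calls the same C function from plain Python by binding `PyUnicode_Split()` through `ctypes.pythonapi`, which keeps the GIL held and re-raises any exception the C call leaves set. The `split` wrapper is a hypothetical stand-in for the test helper, not the PR's code. It only covers the non-NULL cases; `split(NULL, '|')` stays a comment because, as noted above, it crashes.

```python
import ctypes

# CPython-only: bind the real C function through ctypes.pythonapi, which
# keeps the GIL held and raises any exception the C call leaves set.
_PyUnicode_Split = ctypes.pythonapi.PyUnicode_Split
_PyUnicode_Split.argtypes = [ctypes.py_object, ctypes.py_object, ctypes.c_ssize_t]
_PyUnicode_Split.restype = ctypes.py_object


def split(s, sep, maxsplit=-1):
    """Hypothetical wrapper for PyUnicode_Split(s, sep, maxsplit)."""
    return _PyUnicode_Split(s, sep, maxsplit)


print(split('a|b|c|d', '|'))     # ['a', 'b', 'c', 'd']
print(split('a|b|c|d', '|', 2))  # ['a', 'b', 'c|d']

# The error cases from the diff: an empty separator, a non-str separator,
# and a non-str string argument.
for args in [('a|b|c|d', ''), ('a|b|c|d', ord('|')), ([], '|')]:
    try:
        split(*args)
    except (ValueError, TypeError) as exc:
        print(type(exc).__name__)

# split(NULL, '|') is not exercised: a NULL string argument crashes.
```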

vstinner · 2022-11-24T14:17:29Z

Lib/test/test_capi/test_unicode.py

+        self.assertEqual(translate('abcd', {ord('a'): 'A', ord('b'): ord('B'), ord('c'): '<>'}), 'AB<>d')
+        self.assertEqual(translate('абвг', {ord('а'): 'А', ord('б'): ord('Б'), ord('в'): '<>'}), 'АБ<>г')
+        self.assertEqual(translate('abc', []), 'abc')
+        self.assertRaises(UnicodeTranslateError, translate, 'abc', {ord('b'): None})


I don't understand. None is supposed to delete the "b" character: https://docs.python.org/dev/library/stdtypes.html#text-sequence-type-str

The mapping table must map Unicode ordinal integers to Unicode ordinal integers or None (causing deletion of the character).

Is the doc wrong?

serhiy-storchaka · 2022-11-27T07:38:07Z

The doc is wrong.

vstinner · 2022-11-28T09:24:03Z

Ah. The surprising part is that str.translate() treats None as "delete":

>>> "abc".translate(str.maketrans({'b': None}))
'ac'

Well, it would be nice to update the doc (maybe in a separate PR).

serhiy-storchaka

That is because str.translate() runs the same translation code as PyUnicode_Translate(), but with the "ignore" error handler.
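To make the difference concrete, the sketch below (CPython only, an assumption about the environment) calls `PyUnicode_Translate()` through `ctypes.pythonapi`. The hypothetical `translate` wrapper passes NULL for `errors` when none is given, which selects "strict": a None mapping then raises UnicodeTranslateError, as the test in the diff expects, while the "ignore" handler deletes the character just like `str.translate()`.

```python
import ctypes

# CPython-only: bind PyUnicode_Translate(str, table, errors) via ctypes.pythonapi.
_PyUnicode_Translate = ctypes.pythonapi.PyUnicode_Translate
_PyUnicode_Translate.argtypes = [ctypes.py_object, ctypes.py_object, ctypes.c_char_p]
_PyUnicode_Translate.restype = ctypes.py_object


def translate(s, table, errors=None):
    """Hypothetical wrapper; errors=None is passed as NULL ("strict")."""
    return _PyUnicode_Translate(s, table, None if errors is None else errors.encode())


table = {ord('b'): None}

print('abc'.translate(table))             # ac  (str.translate deletes 'b')
print(translate('abc', table, 'ignore'))  # ac  (same result with "ignore")
try:
    translate('abc', table)               # NULL errors means "strict"
except UnicodeTranslateError as exc:
    print('strict:', exc.reason)
```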


vstinner · 2022-11-24T14:19:15Z

Lib/test/test_capi/test_unicode.py

+        #for str in "\xa1", "\u8000\u8080", "\ud800\udc02", "\U0001f100\U0001f1f1":
+            #for i, ch in enumerate(str):
+                #self.assertEqual(tailmatch(str, ch, 0, len(str), 1), i)
+                #self.assertEqual(tailmatch(str, ch, 0, len(str), -1), i)


Why is this code commented out? If it is meaningless for tailmatch, why not just remove it?

serhiy-storchaka · 2022-11-27T07:43:48Z

I copied it from other tests (for find/index/count), but have not adapted it to tailmatch yet. I think it is easier to remove it for now.
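If someone later wants to revive the commented-out loop, one possible adaptation is sketched below; it is hypothetical and not what the PR ended up with. `PyUnicode_Tailmatch()` returns 1 when the substring matches at the start (direction -1) or the end (direction +1) of `str[start:end]`, and 0 otherwise, so with the distinct characters in these sample strings each character matches only at its own end. The function is bound via `ctypes.pythonapi` (CPython only), and the loop variable is renamed from `str` to `s` so it no longer shadows the builtin.

```python
import ctypes

# CPython-only: bind PyUnicode_Tailmatch via ctypes.pythonapi.
# It returns 1 on a match, 0 on no match, and -1 with an exception set on error.
_PyUnicode_Tailmatch = ctypes.pythonapi.PyUnicode_Tailmatch
_PyUnicode_Tailmatch.argtypes = [ctypes.py_object, ctypes.py_object,
                                 ctypes.c_ssize_t, ctypes.c_ssize_t, ctypes.c_int]
_PyUnicode_Tailmatch.restype = ctypes.c_ssize_t


def tailmatch(s, substr, start, end, direction):
    """Hypothetical wrapper: direction -1 checks the head, +1 the tail."""
    return _PyUnicode_Tailmatch(s, substr, start, end, direction)


# Every sample string has distinct characters, so a one-character
# substring matches the head only at index 0 and the tail only at the last index.
for s in "\xa1", "\u8000\u8080", "\ud800\udc02", "\U0001f100\U0001f1f1":
    for i, ch in enumerate(s):
        assert tailmatch(s, ch, 0, len(s), -1) == (i == 0)
        assert tailmatch(s, ch, 0, len(s), 1) == (i == len(s) - 1)
print("ok")
```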


serhiy-storchaka

Thank you for your review, Victor. I have trouble reviewing such a large volume of code, especially when many lines look similar, so I can easily miss some kinds of errors. Without your help I would not have found them.



vstinner

LGTM.


miss-islington · 2022-11-29T07:59:59Z

Thanks @serhiy-storchaka for the PR 🌮🎉.. I'm working now to backport this PR to: 3.10, 3.11.
🐍🍒⛏🤖 I'm not a witch! I'm not a witch!

miss-islington · 2022-11-29T08:00:03Z

Sorry, @serhiy-storchaka, I could not cleanly backport this to 3.11 due to a conflict.
Please backport using cherry_picker on command line.
cherry_picker deaa8dee48beeae9928a418736da0608f2f18361 3.11

miss-islington · 2022-11-29T08:00:06Z

Sorry @serhiy-storchaka, I had trouble checking out the 3.10 backport branch.
Please retry by removing and re-adding the "needs backport to 3.10" label.
Alternatively, you can backport using cherry_picker on the command line.
cherry_picker deaa8dee48beeae9928a418736da0608f2f18361 3.10

vstinner · 2022-11-29T10:51:01Z

Oh, I didn't notice that you wanted to backport these tests to Python 3.10 and 3.11. You're motivated :-) If it's too complicated, maybe just add them to Python 3.12? _testcapi has changed a lot since Python 3.11 (it was split into multiple files).

serhiy-storchaka · 2022-11-29T14:29:04Z

I think that we should backport as many tests as possible; otherwise we risk missing a regression introduced before the particular test was added, especially since we are making so many changes to the C API.

miss-islington · 2023-07-10T13:08:30Z

Thanks @serhiy-storchaka for the PR 🌮🎉.. I'm working now to backport this PR to: 3.11.
🐍🍒⛏🤖

miss-islington · 2023-07-10T13:08:32Z

Sorry @serhiy-storchaka, I had trouble checking out the 3.11 backport branch.
Please retry by removing and re-adding the "needs backport to 3.11" label.
Alternatively, you can backport using cherry_picker on the command line.
cherry_picker deaa8dee48beeae9928a418736da0608f2f18361 3.11

pythongh-99593: Add tests for Unicode C API (part 1)

7f5362f

Add tests for functions corresponding to the str class methods.

serhiy-storchaka added the "needs backport to 3.10" and "needs backport to 3.11" labels Nov 21, 2022

serhiy-storchaka requested a review from vstinner November 21, 2022 14:59

bedevere-bot mentioned this pull request Nov 21, 2022

Add tests for Unicode C API #99593

Closed

bedevere-bot added the awaiting core review label Nov 21, 2022

serhiy-storchaka mentioned this pull request Nov 23, 2022

gh-99593: Add tests for Unicode C API #99594

Closed

vstinner reviewed Nov 24, 2022


serhiy-storchaka commented Nov 27, 2022


Address review comments.

545400a

vstinner approved these changes Nov 28, 2022


bedevere-bot added awaiting merge and removed awaiting core review labels Nov 28, 2022

serhiy-storchaka merged commit deaa8de into python:main Nov 29, 2022

bedevere-bot removed the awaiting merge label Nov 29, 2022

miss-islington assigned serhiy-storchaka Nov 29, 2022

serhiy-storchaka mentioned this pull request Nov 29, 2022

gh-93649: Split unicode tests from _testcapimodule.c & add some more #95819

Merged

serhiy-storchaka added the "needs backport to 3.11" label and removed the "needs backport to 3.10" and "needs backport to 3.11" labels Jul 10, 2023

serhiy-storchaka deleted the test-unicode-capi5 branch July 10, 2023 13:08

serhiy-storchaka removed their assignment Jul 10, 2023

serhiy-storchaka removed the "needs backport to 3.11" label Jul 10, 2023
