Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-99593: Add tests for Unicode C API (part 1) #99651

Merged
merged 2 commits into from
Nov 29, 2022

Conversation

serhiy-storchaka
Copy link
Member

@serhiy-storchaka serhiy-storchaka commented Nov 21, 2022

Add tests for functions corresponding to the str class methods.

Add tests for functions corresponding to the str class methods.
Copy link
Member

@vstinner vstinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice! Here is my first review :-)

Lib/test/test_capi/test_unicode.py Show resolved Hide resolved
self.assertRaises(ValueError, split, 'a|b|c|d', '')
self.assertRaises(TypeError, split, 'a|b|c|d', ord('|'))
self.assertRaises(TypeError, split, [], '|')
# split(NULL, '|')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does this comment stand for? Does the function crash with NULL? Same question for similar rsplit() comment below.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It crashes. It was the first test written by me 4 years ago, before I lost my sign, so I missed to add word CRASHES here.

self.assertEqual(translate('abcd', {ord('a'): 'A', ord('b'): ord('B'), ord('c'): '<>'}), 'AB<>d')
self.assertEqual(translate('абвг', {ord('а'): 'А', ord('б'): ord('Б'), ord('в'): '<>'}), 'АБ<>г')
self.assertEqual(translate('abc', []), 'abc')
self.assertRaises(UnicodeTranslateError, translate, 'abc', {ord('b'): None})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand. None is supposed to delete the "b" character: https://docs.python.org/dev/library/stdtypes.html#text-sequence-type-str

The mapping table must map Unicode ordinal integers to Unicode ordinal integers or None (causing deletion of the character).

Is the doc wrong?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The doc is wrong.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah. The surprising part is that str.translate() treats None as "delete:

>>> "abc".translate(str.maketrans({'b': None}))
'ac'

Well, it would be nice to update the doc (maybe in a separated PR).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because str.translate calls PyUnicode_Translate() with the error handler "ignore".

Lib/test/test_capi/test_unicode.py Show resolved Hide resolved
#for str in "\xa1", "\u8000\u8080", "\ud800\udc02", "\U0001f100\U0001f1f1":
#for i, ch in enumerate(str):
#self.assertEqual(tailmatch(str, ch, 0, len(str), 1), i)
#self.assertEqual(tailmatch(str, ch, 0, len(str), -1), i)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this code commented? if it is meaningless for tailmatch, just remove it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I copied it from other tests (for find/index/count), but did not adapted it to tailmatch yet. I think it is easier to remove it now.

Lib/test/test_capi/test_unicode.py Outdated Show resolved Hide resolved
Lib/test/test_capi/test_unicode.py Show resolved Hide resolved
Copy link
Member Author

@serhiy-storchaka serhiy-storchaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your review Victor. I have a problem with reviewing such large volume of code, especially if many lines looks similar, so I can easily miss some types of errors. Without your help I would not find them.

Lib/test/test_capi/test_unicode.py Show resolved Hide resolved
self.assertRaises(ValueError, split, 'a|b|c|d', '')
self.assertRaises(TypeError, split, 'a|b|c|d', ord('|'))
self.assertRaises(TypeError, split, [], '|')
# split(NULL, '|')
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It crashes. It was the first test written by me 4 years ago, before I lost my sign, so I missed to add word CRASHES here.

self.assertEqual(translate('abcd', {ord('a'): 'A', ord('b'): ord('B'), ord('c'): '<>'}), 'AB<>d')
self.assertEqual(translate('абвг', {ord('а'): 'А', ord('б'): ord('Б'), ord('в'): '<>'}), 'АБ<>г')
self.assertEqual(translate('abc', []), 'abc')
self.assertRaises(UnicodeTranslateError, translate, 'abc', {ord('b'): None})
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The doc is wrong.

#for str in "\xa1", "\u8000\u8080", "\ud800\udc02", "\U0001f100\U0001f1f1":
#for i, ch in enumerate(str):
#self.assertEqual(tailmatch(str, ch, 0, len(str), 1), i)
#self.assertEqual(tailmatch(str, ch, 0, len(str), -1), i)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I copied it from other tests (for find/index/count), but did not adapted it to tailmatch yet. I think it is easier to remove it now.

Lib/test/test_capi/test_unicode.py Outdated Show resolved Hide resolved
Lib/test/test_capi/test_unicode.py Show resolved Hide resolved
Lib/test/test_capi/test_unicode.py Show resolved Hide resolved
Copy link
Member

@vstinner vstinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

self.assertEqual(translate('abcd', {ord('a'): 'A', ord('b'): ord('B'), ord('c'): '<>'}), 'AB<>d')
self.assertEqual(translate('абвг', {ord('а'): 'А', ord('б'): ord('Б'), ord('в'): '<>'}), 'АБ<>г')
self.assertEqual(translate('abc', []), 'abc')
self.assertRaises(UnicodeTranslateError, translate, 'abc', {ord('b'): None})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah. The surprising part is that str.translate() treats None as "delete:

>>> "abc".translate(str.maketrans({'b': None}))
'ac'

Well, it would be nice to update the doc (maybe in a separated PR).

@miss-islington
Copy link
Contributor

Thanks @serhiy-storchaka for the PR 🌮🎉.. I'm working now to backport this PR to: 3.10, 3.11.
🐍🍒⛏🤖 I'm not a witch! I'm not a witch!

@miss-islington
Copy link
Contributor

Sorry, @serhiy-storchaka, I could not cleanly backport this to 3.11 due to a conflict.
Please backport using cherry_picker on command line.
cherry_picker deaa8dee48beeae9928a418736da0608f2f18361 3.11

@miss-islington
Copy link
Contributor

Sorry @serhiy-storchaka, I had trouble checking out the 3.10 backport branch.
Please retry by removing and re-adding the "needs backport to 3.10" label.
Alternatively, you can backport using cherry_picker on the command line.
cherry_picker deaa8dee48beeae9928a418736da0608f2f18361 3.10

@vstinner
Copy link
Member

Oh, I didn't notice that you want to backport these tests to Python 3.10 and 3.11. You're motivated :-) If it's too complicated, maybe just add them to Python 3.12, no? _testcapi changed a lot since Python 3.11 (splited into multiple files).

@serhiy-storchaka
Copy link
Member Author

I think that we should backport as many tests as possible, otherwise we risk to miss a regression introduced before the particular test was added. Especially if we do so many changes in C API.

@serhiy-storchaka serhiy-storchaka added needs backport to 3.11 only security fixes and removed needs backport to 3.10 only security fixes needs backport to 3.11 only security fixes labels Jul 10, 2023
@miss-islington
Copy link
Contributor

Thanks @serhiy-storchaka for the PR 🌮🎉.. I'm working now to backport this PR to: 3.11.
🐍🍒⛏🤖

@miss-islington
Copy link
Contributor

Sorry @serhiy-storchaka, I had trouble checking out the 3.11 backport branch.
Please retry by removing and re-adding the "needs backport to 3.11" label.
Alternatively, you can backport using cherry_picker on the command line.
cherry_picker deaa8dee48beeae9928a418736da0608f2f18361 3.11

@serhiy-storchaka serhiy-storchaka deleted the test-unicode-capi5 branch July 10, 2023 13:08
@serhiy-storchaka serhiy-storchaka removed their assignment Jul 10, 2023
@serhiy-storchaka serhiy-storchaka removed the needs backport to 3.11 only security fixes label Jul 10, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants