-
-
Notifications
You must be signed in to change notification settings - Fork 31k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gh-99593: Add tests for Unicode C API (part 1) #99651
gh-99593: Add tests for Unicode C API (part 1) #99651
Conversation
Add tests for functions corresponding to the str class methods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice! Here is my first review :-)
Lib/test/test_capi/test_unicode.py
Outdated
self.assertRaises(ValueError, split, 'a|b|c|d', '') | ||
self.assertRaises(TypeError, split, 'a|b|c|d', ord('|')) | ||
self.assertRaises(TypeError, split, [], '|') | ||
# split(NULL, '|') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what does this comment stand for? Does the function crash with NULL? Same question for similar rsplit() comment below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It crashes. It was the first test written by me 4 years ago, before I lost my sign, so I missed to add word CRASHES here.
self.assertEqual(translate('abcd', {ord('a'): 'A', ord('b'): ord('B'), ord('c'): '<>'}), 'AB<>d') | ||
self.assertEqual(translate('абвг', {ord('а'): 'А', ord('б'): ord('Б'), ord('в'): '<>'}), 'АБ<>г') | ||
self.assertEqual(translate('abc', []), 'abc') | ||
self.assertRaises(UnicodeTranslateError, translate, 'abc', {ord('b'): None}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand. None is supposed to delete the "b" character: https://docs.python.org/dev/library/stdtypes.html#text-sequence-type-str
The mapping table must map Unicode ordinal integers to Unicode ordinal integers or None (causing deletion of the character).
Is the doc wrong?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The doc is wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah. The surprising part is that str.translate() treats None as "delete:
>>> "abc".translate(str.maketrans({'b': None}))
'ac'
Well, it would be nice to update the doc (maybe in a separated PR).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because str.translate
calls PyUnicode_Translate()
with the error handler "ignore"
.
Lib/test/test_capi/test_unicode.py
Outdated
#for str in "\xa1", "\u8000\u8080", "\ud800\udc02", "\U0001f100\U0001f1f1": | ||
#for i, ch in enumerate(str): | ||
#self.assertEqual(tailmatch(str, ch, 0, len(str), 1), i) | ||
#self.assertEqual(tailmatch(str, ch, 0, len(str), -1), i) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this code commented? if it is meaningless for tailmatch, just remove it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I copied it from other tests (for find/index/count), but did not adapted it to tailmatch yet. I think it is easier to remove it now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your review Victor. I have a problem with reviewing such large volume of code, especially if many lines looks similar, so I can easily miss some types of errors. Without your help I would not find them.
Lib/test/test_capi/test_unicode.py
Outdated
self.assertRaises(ValueError, split, 'a|b|c|d', '') | ||
self.assertRaises(TypeError, split, 'a|b|c|d', ord('|')) | ||
self.assertRaises(TypeError, split, [], '|') | ||
# split(NULL, '|') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It crashes. It was the first test written by me 4 years ago, before I lost my sign, so I missed to add word CRASHES here.
self.assertEqual(translate('abcd', {ord('a'): 'A', ord('b'): ord('B'), ord('c'): '<>'}), 'AB<>d') | ||
self.assertEqual(translate('абвг', {ord('а'): 'А', ord('б'): ord('Б'), ord('в'): '<>'}), 'АБ<>г') | ||
self.assertEqual(translate('abc', []), 'abc') | ||
self.assertRaises(UnicodeTranslateError, translate, 'abc', {ord('b'): None}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The doc is wrong.
Lib/test/test_capi/test_unicode.py
Outdated
#for str in "\xa1", "\u8000\u8080", "\ud800\udc02", "\U0001f100\U0001f1f1": | ||
#for i, ch in enumerate(str): | ||
#self.assertEqual(tailmatch(str, ch, 0, len(str), 1), i) | ||
#self.assertEqual(tailmatch(str, ch, 0, len(str), -1), i) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I copied it from other tests (for find/index/count), but did not adapted it to tailmatch yet. I think it is easier to remove it now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
self.assertEqual(translate('abcd', {ord('a'): 'A', ord('b'): ord('B'), ord('c'): '<>'}), 'AB<>d') | ||
self.assertEqual(translate('абвг', {ord('а'): 'А', ord('б'): ord('Б'), ord('в'): '<>'}), 'АБ<>г') | ||
self.assertEqual(translate('abc', []), 'abc') | ||
self.assertRaises(UnicodeTranslateError, translate, 'abc', {ord('b'): None}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah. The surprising part is that str.translate() treats None as "delete:
>>> "abc".translate(str.maketrans({'b': None}))
'ac'
Well, it would be nice to update the doc (maybe in a separated PR).
Thanks @serhiy-storchaka for the PR 🌮🎉.. I'm working now to backport this PR to: 3.10, 3.11. |
Sorry, @serhiy-storchaka, I could not cleanly backport this to |
Sorry @serhiy-storchaka, I had trouble checking out the |
Oh, I didn't notice that you want to backport these tests to Python 3.10 and 3.11. You're motivated :-) If it's too complicated, maybe just add them to Python 3.12, no? _testcapi changed a lot since Python 3.11 (splited into multiple files). |
I think that we should backport as many tests as possible, otherwise we risk to miss a regression introduced before the particular test was added. Especially if we do so many changes in C API. |
Thanks @serhiy-storchaka for the PR 🌮🎉.. I'm working now to backport this PR to: 3.11. |
Sorry @serhiy-storchaka, I had trouble checking out the |
Add tests for functions corresponding to the str class methods.