-
Notifications
You must be signed in to change notification settings - Fork 106
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix tests with Python 3.11.4+ #213
Conversation
Codecov Report
@@ Coverage Diff @@
## master #213 +/- ##
=======================================
Coverage 96.18% 96.18%
=======================================
Files 9 9
Lines 498 498
Branches 94 94
=======================================
Hits 479 479
Misses 9 9
Partials 10 10 |
Can you please run |
Done |
tests/test_url.py
Outdated
else: | ||
KNOWN_SAFE_URL_STRING_URL_ISSUES.add( | ||
f"https://{USERNAME_TO_ENCODE}:{PASSWORD_TO_ENCODE}@example.com" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, to make sure I understand this correctly:
- Both
safe_url_string("http://[2a01:5cc0:1:2:3:4]")
andsafe_url_string('https://"%;<=>@[]^`{|}\x7f:"%;<=>@[]^`{|}\x7f:@example.com')
work on older Python versions but raise a ValueError on 3.11.4. test_safe_url_string_url
is a test that takes input and output ofsafe_url_string()
fromSAFE_URL_URL_CASES
and expects that output unless the input is inKNOWN_SAFE_URL_STRING_URL_ISSUES
.- Hence,
KNOWN_SAFE_URL_STRING_URL_ISSUES
is basically a list of cases that should xfail. SAFE_URL_URL_CASES
says that"http://[2a01:5cc0:1:2:3:4]"
should give ValueError and this is true only on the new Python, so this change removes xfail for it on the new Python.SAFE_URL_URL_CASES
says thatsafe_url_string('https://"%;<=>@[]^`{|}\x7f:"%;<=>@[]^`{|}\x7f:@example.com')
should be encoded correctly and that works only on older Python, so this change adds xfail for it on the new Python.
With this in mind I feel like we should instead change the USERNAME_TO_ENCODE and/or PASSWORD_TO_ENCODE so that it works on all Python versions instead. It's probably enough to remove []
from there?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, if remove []
from USERNAME_TO_ENCODE
, then test pass
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool, can you please make this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
I have been delaying reviewing this change until I could put some time aside to properly look into it, as it felt to me like something was not right. After a proper review, I think this change is actually perfect as it is, so thank you very much @shadchin. But I feel the need to clarify, for future reference, what’s happening here since Python 3.11.4, which is 2 things:
To elaborate on 2: On Python 3.11.3 and earlier, >>> safe_url_string("https://user[]:password[]@example.com")
'https://user%5B%5D:password%5B%[email protected]' Since 3.11.4, that’s not possible, the URL is unparsable altogether by the Python standard library: >>> safe_url_string("https://user[]:password[]@example.com")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/adrian/temporal/w3lib-213/venv/lib/python3.11/site-packages/w3lib/url.py", line 142, in safe_url_string
parts = urlsplit(_strip(decoded))
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/adrian/.pyenv/versions/3.11.4/lib/python3.11/urllib/parse.py", line 500, in urlsplit
_check_bracketed_host(bracketed_host)
File "/home/adrian/.pyenv/versions/3.11.4/lib/python3.11/urllib/parse.py", line 446, in _check_bracketed_host
ip = ipaddress.ip_address(hostname) # Throws Value Error if not IPv6 or IPv4
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/adrian/.pyenv/versions/3.11.4/lib/python3.11/ipaddress.py", line 54, in ip_address
raise ValueError(f'{address!r} does not appear to be an IPv4 or IPv6 address')
ValueError: '' does not appear to be an IPv4 or IPv6 address This is OK for Python, but not for us, that strive to fix bad URLs where possible, and here it should be (and in fact used to be) possible. I dream to some day have our own URL parsing implementation, with a good performance but a more browser-like behavior, so that we have control over things like this 😓 . |
The CVE fix got backported to Python 3.10.16 as well, so the test suite is also failing for 3.10. I'll send a PR extending the check to everything <= this version too. |
PR at #233. |
Fix #212