Decoding text from malformed pages such as https://www.societe.com/societe/ankaboot-832320170.html raises the following error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe0 in position 2: invalid continuation byte
Exception ignored in: 'selectolax.lexbor.text_callback'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe0 in position 2: invalid continuation byte
This could be fixed by changing the decode error policy from "strict" (the default) to "replace":
selectolax/selectolax/lexbor/node.pxi
Line 863 in 19ee5e0
py_str = text.decode(_ENCODING, "replace")
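For reference, here is a minimal sketch of how the standard error policies of Python's bytes.decode behave on this kind of input. The sample bytes and the helper name decode_with_policy are made up for illustration (they are not taken from the page above or from selectolax); the sample assumes a Latin-1 "à" (byte 0xE0) embedded in data that is decoded as UTF-8.

```python
# Hypothetical sample: "voila !" with a Latin-1 "a-grave" (byte 0xE0),
# which is not a valid UTF-8 sequence when followed by a space.
raw = b"voil\xe0 !"


def decode_with_policy(data: bytes, errors: str) -> str:
    """Decode bytes as UTF-8 using the given error policy."""
    return data.decode("utf-8", errors)


# "strict" raises UnicodeDecodeError on the invalid byte.
try:
    decode_with_policy(raw, "strict")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# "replace" substitutes U+FFFD (the replacement character) for the bad byte.
replaced = decode_with_policy(raw, "replace")

# "ignore" silently drops the bad byte.
ignored = decode_with_policy(raw, "ignore")

print(strict_failed, repr(replaced), repr(ignored))
```

With "replace" the damaged character stays visible in the output as U+FFFD, while "ignore" removes it without a trace, so which policy is preferable depends on whether callers need to notice the data loss.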
Hi, I just had a look, and it turns out that the Modest HTMLParser already handles this by accepting a decode_errors keyword argument:
HTMLParser(html, decode_errors="ignore")
For Modest, the default is also "ignore".
I updated the code to give Lexbor the same behavior, but I still need to add tests and documentation, so I'll finish that over the weekend :)
Hi @JuroOravec, thanks for the awesome package! Is there any ETA on a fix for this issue?