[BUG] Wrong way to get list of supported encodings #323
Comments
Hello,
Yes.
Already is the case.
IT IS TRUE WE MISSED 9 ENCODINGS! 💯 (Please avoid caps lock in the future.) Otherwise, it is a good catch. They are still fairly rare encodings (depending on various factors), but still valuable. I will add a patch to include them. Regards,
Sorry, it was copied from Stack Overflow :) I didn't realize there was already information about set(aliases.values()), and the caps lock is from the original post. Actually, I'm porting this library to another language and going through all the sources.
Good! I am interested in any progress or proof of concept. Is it Rust? Go? C?
It's Rust. So far I have ported cd/md/utils/models, so api and cli are still waiting to be implemented (ETA 1-3 weeks).
Sorry for going off-topic; I couldn't find any other way to contact you. https://github.com/nickspring/charset-normalizer-rs is a Rust version of the library. I tried to port it with maximum compatibility. I believe I handled the copyrights and licences correctly :) I would be grateful if you added a link to the Rust version.
Good work. I briefly tested it and it seems fine. I can mention it, of course.
Whole command:
Describe the bug
It looks like you get the list of encodings from the encodings.aliases module.
aliases (as one would/should expect) contains several cases where different keys map to the same value, e.g. 1252 and windows_1252 are both mapped to cp1252. You could save time by using set(aliases.values()) instead of aliases.keys().
BUT THERE'S A WORSE PROBLEM: the values of aliases don't include codecs that have no alias at all (like cp856, cp874, cp875, cp737, and koi8_u).
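A minimal sketch of the two approaches, assuming a stock CPython 3.9+ install (it is an illustration, not charset-normalizer's actual patch). `codecs_from_aliases` deduplicates the alias targets as suggested above; `codecs_from_package` is a hypothetical alternative that scans every module in the stdlib `encodings` package with `pkgutil.iter_modules` and keeps the names `codecs.lookup` accepts, so codecs without any alias are found too.

```python
import codecs
import encodings
import pkgutil
from encodings.aliases import aliases


def codecs_from_aliases():
    """Unique codec names reachable through encodings.aliases.

    Several alias keys point at the same codec ("1252" and "windows_1252"
    both map to "cp1252"), so the values are deduplicated. A codec with no
    alias at all can never appear in this set.
    """
    return set(aliases.values())


def codecs_from_package():
    """Codec names found by scanning the stdlib `encodings` package.

    codecs.lookup() raises LookupError for helper modules such as `aliases`
    and for platform-specific codecs that fail to import (e.g. `mbcs`
    outside Windows), so those are skipped. Note that bytes-to-bytes codecs
    such as base64_codec are also kept; a real list of text encodings would
    need extra filtering.
    """
    found = set()
    for module in pkgutil.iter_modules(encodings.__path__):
        try:
            codecs.lookup(module.name)
        except LookupError:
            continue
        found.add(module.name)
    return found


# Codecs that a list built from encodings.aliases would miss.
missing = sorted(codecs_from_package() - codecs_from_aliases())
print(len(aliases), "alias keys ->", len(codecs_from_aliases()), "unique codecs")
print("codecs without any alias:", missing)
```

The package scan trusts `codecs.lookup` rather than a hard-coded list, so it follows whatever codecs the running interpreter actually ships.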
To Reproduce
List of encodings, supported by this library:
https://charset-normalizer.readthedocs.io/en/latest/user/support.html#supported-encodings
The documentation states that the library supports all encodings supported by Python.
But KOI8-U, for example, is missing from that list.
Expected behavior
KOI8-U should be listed, because Python supports it; it just doesn't have an alias:
https://docs.python.org/3.11/library/codecs.html#standard-encodings
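A small reproduction sketch, assuming a stock CPython 3 interpreter (it is not part of the original report): the `koi8_u` codec is available and round-trips Ukrainian text that KOI8-R cannot encode, yet it never appears among the values of `encodings.aliases`.

```python
import codecs
from encodings.aliases import aliases

# The codec exists in the standard library...
info = codecs.lookup("koi8_u")
print(info.name)  # koi8-u

# ...and handles Ukrainian letters such as "і" (U+0456), which KOI8-R lacks.
text = "Привіт"
round_trip = text.encode("koi8_u").decode("koi8_u")
print(round_trip == text)  # True
try:
    text.encode("koi8_r")
except UnicodeEncodeError:
    print("koi8_r cannot encode", repr(text))

# ...but no alias points at it, so a list built from encodings.aliases omits it.
in_alias_values = "koi8_u" in set(aliases.values())
print(in_alias_values)  # False
```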