-
Notifications
You must be signed in to change notification settings - Fork 200
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Encoding value casing (eg. ASCII) #391
Comments
I vote for matching against lower case strings, but using |
Totally agree, runtimes should treat these strings in case-insensitive way (actually, I thought all of them behave that way already — and they almost do so), and we should declare some canonical way to spell them in the style guide. I'd go with lowercasing everything, as it's simple and Also, I'd declare a canonical way to spell widespread encoding names, i.e.:
And, while we're there, some encoding names should be definitely banned, for example, |
I'm against it. |
On Python _ and - are interchangable, like in:
If the issue is that this encoding has no endianness, then I should also point out the same python docs. UTF32 and UTF16 I think as well, have BOM markers within the stream that define the endianness. So the encodings are valid and endianness is known at runtime (well, at parsing time).
It would be better not to couple those, not without some guarantees that is the case. |
Okay, I will fix the JS runtime then and check other runtimes if they have the same issue (I don't expect them). Quick search gave me the following "canonical" names for encodings (if I am not mistaken): ((About modifying the existing formats in the repo: I think it won't be necessary now to do it, but it will be enough later when we will(?) enforce the style guide for formats in the repo.)) |
There is actually #116 where I've started doing massive encodings table for it a long time ago: https://docs.google.com/spreadsheets/d/1l87kGi9_U4Xrgaw2CGaTc9-_f5UEf1nf-68Dk_e1_iA/edit?usp=sharing So, yeah, I believe we should stick to "canonical" spellings as per IANA "preferred MIME name", or just IANA's "name", if it's there. |
|
There's no universal agreement in the world for all these names, so the very point of that table is to find some common ground and the foundation. While it's probably a very long term goal, what I propose now is to:
|
Fixed the issue in the JS runtime (kaitai-io/kaitai_struct_javascript_runtime@dd658ee) and released new npm+Github version ( Checked the other runtimes and I did not find similar issue in them (usually they just pass the encoding value to the platforms native text decoder; in Lua it already lowercases the encoding before it compares to "ascii"). From my side this issue can be closed, but we can continue to use it for what @GreyCat just described in the previous comment. |
Let's close this one ;) I'll add relevant issues separately. |
Currently the JS runtime only allows the
encoding
value to beASCII
, but notascii
:https://github.com/kaitai-io/kaitai_struct_javascript_runtime/blob/8fedb4d2f278f55d0d8fcf24a9bbf1c1ea2b150f/KaitaiStream.js#L511-L513
I checked the formats repo and 9 file formats used lowercase
ascii
at 23 occurrences.Should we fix the formats or should the runtimes accept lowercase
ascii
too?(I presume we should fix the runtimes, but maybe add a KSY styling rule which casing should be used generally?)
The text was updated successfully, but these errors were encountered: