Reconsider hex and/or octal integer formats #409
Also, I'll repeat the comment I made on issue #53 last month: if octal values are included, PLEASE don't repeat C's mistake, as so many programming languages have done. A leading 0 in an integer should not change its meaning. The format for octal should parallel the format for hex: … Additionally, I would recommend that the only integer format markers allowed be lowercase: … As for hex digits, either they should be lowercase-only (to match how … Finally, the question of negative numbers comes up. What should …
Also, #54 got accepted.
I think this is exactly what @BurntSushi was worried about when he hesitated to agree to #54: allowing any one of these features is unfair to all the others, but allowing all of them would make this language no longer "minimal". I would suggest that this feature be included in the standard after most of the available parsers have implemented it, not the other way around; at that point it would be easier for @BurntSushi to decide to merge this feature into the standard.
Letting the parser implementations drive the spec might be a good idea, but OTOH, that's how we got the mess that is JavaScript. So while I agree that it would be good for parsers to implement this proposal (and it should be easy to implement, since most languages have a built-in ability to parse ints in bases other than 10), I think we should also have a discussion about how the spec should specify it. In particular, I think it's VERY important to hash out how octal should be represented: should a leading 0 signify octal, as it does in C? Or should the …
Also, as of this writing, a total of 9 unique people have reacted with a thumbs-up emoji on either this proposal or my March 30th comment on #53. So far I have not seen a thumbs-down emoji or a "We shouldn't do this" response. Both @BurntSushi and @mojombo said "Not sure we'll need this, and it would complicate parsers" to the original proposal, but haven't yet responded to this one. And since their original "Not sure we'll need this" response was a good one, and there does need to be a use case to justify the extra work for parser implementors, here's a summary of the use cases:
Hex - colors (…)
Octal - Unix file permissions (…)
Binary - No obvious use case. MAYBE some utility for bitmasks: …
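To make the hex-color use case concrete, here is a minimal Python sketch, assuming a color is stored as one 24-bit `0xRRGGBB` integer. The `unpack_rgb` helper is hypothetical, not part of any TOML library:

```python
from typing import Tuple


def unpack_rgb(color: int) -> Tuple[int, int, int]:
    """Split a 24-bit 0xRRGGBB integer into its (red, green, blue) bytes."""
    if not 0 <= color <= 0xFFFFFF:
        raise ValueError(f"not a 24-bit color: {color:#x}")
    red = (color >> 16) & 0xFF
    green = (color >> 8) & 0xFF
    blue = color & 0xFF
    return red, green, blue


# A hypothetical config value: magenta, readable at a glance as 0xff00ff.
magenta = 0xFF00FF
print(unpack_rgb(magenta))  # (255, 0, 255)
print(magenta)              # 16711935, the decimal spelling needed without hex literals
```

Each byte lines up with two hex digits, which is exactly why the hex spelling is readable and the decimal one is not.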
Hex could be a poor man's surrogate for MAC addresses, IPv6 addresses, public key / certificate fingerprints, and RFC 4122 UUIDs. Underscores could be put in place of colons and dashes. Note that IPv6 addresses and UUIDs are 128 bits long.
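A quick Python check of that last point, assuming the signed 64-bit range that the TOML spec asks implementations to support for integers: a 48-bit MAC address fits, but a full UUID generally does not, so 128-bit values would still need to be strings. The MAC and UUID values below are made up for illustration, and `fits_int64` is a hypothetical helper:

```python
import uuid

# Signed 64-bit range, assumed here to be the range a TOML integer must hold.
INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1


def fits_int64(value: int) -> bool:
    """True if `value` could be stored as a 64-bit TOML integer."""
    return INT64_MIN <= value <= INT64_MAX


# Made-up example values; underscores stand in for the usual separators.
mac = 0x00_1A_2B_3C_4D_5E
u = uuid.UUID("12345678-1234-5678-1234-567812345678")

print(mac.bit_length(), fits_int64(mac))      # 37 True
print(u.int.bit_length(), fits_int64(u.int))  # 125 False
```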
How about having a "lowest standard" without any advanced features, while keeping a "suggested standard" for all the advanced features? Something like "we do not impose this requirement on your implementation, but if you do want this feature, then implement it as below..."? Most of the advanced features, including hex/octal literals and date/datetime literals, serve as extensions to the standard: anything satisfying the "lowest standard" will still be parsed as expected even by a parser supporting such advanced features. It might be a bad example, but it reminds me of C vs C++ (a bad example because not all C code actually compiles as C++ code). I am wondering how @BurntSushi and @mojombo like this idea.
Update: A better example is the Scheme specification branching into two specifications: R7RS (small) to keep minimalism and R7RS (large) for more functionality.
So, to summarise, add these 3 ways to represent numbers: …
Have these rules: …
Am I missing anything?
That's all I wrote. I just noticed that I didn't mention underscores between digits, the way the spec allows for decimal integers. For consistency's sake, I think underscores between digits should be allowed in hex, octal and binary as well, especially since that is what languages like Java and F# allow. If underscores are allowed in decimal numbers but not in hex/octal/binary, that will violate the principle of least surprise.
Having just come across TOML, I was delighted by everything until I noticed the very odd omission of hex literals (and of octal and binary by extension). In cases where such values are natural, trying to use anything else goes directly against TOML's "easy to read" objective. Well, to be sure, …
One further refinement of my design suggestion: underscores should be allowed between digits, but NOT inside the 0x / 0o / 0b prefix of a hex, octal or binary number. I.e., … I have not yet decided whether underscores should be allowed between the prefix and the first digit of the number; technically, the …
I've looked further at two existing languages that allow underscores in number literals (Java and F#). In both of these languages, as in TOML, the underscore may appear ONLY between digits, and a base prefix (…) does not count as a digit. To follow the principle of least surprise, I have therefore decided that my TOML spec proposal will use the same rule as Java and F#. So underscores MUST NOT appear immediately after the base prefix. The …
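A minimal Python sketch of the rule as stated above: a lowercase prefix, then digits, with an underscore allowed only between two digits, so never directly after the prefix and never at the end. Accepting both upper- and lowercase hex digits is an assumption, since that question was still open earlier in this thread; `is_valid_prefixed_int` is a hypothetical helper, not any parser's actual API:

```python
import re

# Digit class per base letter. Accepting both cases of hex digits is an
# assumption; the thread had not settled that question.
_DIGITS = {"x": "[0-9A-Fa-f]", "o": "[0-7]", "b": "[01]"}

# Strict rule: lowercase prefix, one digit, then any number of
# (optional single underscore + digit). An underscore can therefore only
# sit between two digits, never right after the prefix or at the end.
_STRICT = {
    base: re.compile(rf"0{base}{d}(?:_?{d})*")
    for base, d in _DIGITS.items()
}


def is_valid_prefixed_int(literal: str) -> bool:
    """Return True if `literal` obeys the proposed Java/F#-style rule."""
    if len(literal) < 3 or literal[0] != "0":
        return False
    pattern = _STRICT.get(literal[1])  # uppercase 0X/0O/0B gets no pattern
    return pattern is not None and pattern.fullmatch(literal) is not None


for lit in ["0xdead_beef", "0o755", "0b1101_0011", "0x_ff", "0xff_", "0XFF", "0o8"]:
    print(lit, is_valid_prefixed_int(lit))
```

Encoding "underscore only between digits" as `digit (_? digit)*` keeps the rule to a single regex per base, with no special cases for the prefix or a trailing underscore.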
Octals are useless. I see no use for them except one single case: Unix file permissions. But even Unix has a more user-friendly option with …
Octals are almost useless for everything except Unix file permissions, yes, but that's a major use case, and sufficient justification all by itself for including them. The letter-based permissions can be easier to read in some cases (especially for people who don't use Unix very much), but experienced Unix admins find …
(There's actually another decent reason to use octal: it makes UTF-8 multi-byte sequences easier to spot in Unicode data. But that's not a use case for TOML; I'm just mentioning it for curiosity's sake.)
@rmunn, how about expressing this single case with just strings like "0644"?
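That works today. Here is a minimal Python sketch of what each application would then have to do with such a string (`mode_from_string` is a hypothetical helper). The trade-off is that the knowledge that "this string is octal", and the error for a typo such as "0648", move out of the TOML parser and into every application:

```python
import stat


def mode_from_string(text: str) -> int:
    """Parse a permission string such as "0644" or "755" as octal."""
    # int(..., 8) accepts plain octal digits and raises ValueError for
    # anything else, e.g. the 8 in "0648".
    return int(text, 8)


mode = mode_from_string("0644")
print(mode)                                # 420
print(stat.filemode(stat.S_IFREG | mode))  # -rw-r--r--
```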
Any progress on this? @rmunn, I'd like to comment on the use cases from a hardware developer's perspective. I'd like to use TOML as a configuration language for a test rig. Hex and binary are very useful when you deal with hardware: for example, hex is used to refer to memory or register addresses, and binary is very useful when you deal with register values.
On octals: here are some relatively new languages that decided not to have them: …
@guai Octals shouldn't be ambiguous if you prefix them with …
As requested by #330 (comment), I'll weigh in. First and foremost, this is a backwards compatible addition, since all conforming parsers today will return an error if a user types a hex/octal literal as proposed here. Therefore, there is no particular reason to render a verdict now. Secondly, I'd personally be in favor of adding at least hex. Octal seems useful for file permissions. @mojombo what do you think?
Many other new languages have decided to allow octal, but to settle on the …
I agree. If we do octal, we should use a …
I definitely agree that …
@guai, what are mnemonics?
I'd like to see hex and octal literals. The former is common for representing multibit values and the latter is used for single-bit values (like Unix permissions). 0xDead_Beef and 0o644_000 look good to me too. :)
@tshepang, it would be something like …
I find mnemonics clumsier, and I don't feel that supporting them is justified (use strings). OTOH, octals are more general, and there are probably other uses for them beyond Unix file permissions.
@guai - For the Unix access permissions use case, octal numbers are more widely used than the mnemonics, especially in config files. I can't point you to any evidence for this assertion, since AFAIK nobody has done a statistical analysis. But in my experience, you'll see a lot more …
And I'd be against a special-case …
@rmunn, it's just the sort of craziness everyone got used to.
And who would the average TOML user be? In a neighboring thread I was told that the concept of an empty path is not obvious enough for TOML, but that is something well known to every user familiar with any filesystem, too.
@guai - The fact that you said on March 30 that "octals are useless" when they're used in just one use case (Unix file permissions) makes me think that you do most of your development on Windows. Is that correct? If so, you have relatively little experience with Unix, so you wouldn't know just how much more often the octal-number format is used in Unix permissions than the text format. But here's one data point to help convince you: I've been using Linux for about 20 years now, and I can look at permissions like `775` or `644` and tell you exactly what they mean. But every time I try to write the mnemonic permissions, I have to stop and say "Does the o in `o=rx` mean 'owner', or 'other'?" And then I have to look it up.
Anyway, I've made my point, so it's time to move on to a different topic: should binary numbers (`0b1101`) be included as well?
Pro: Consistency, a.k.a. the "why not?" argument. If you've already written code to handle hex and octal numbers in your parser, handling binary numbers is trivial to add.
Con: Not often needed, a.k.a. the "why?" argument.
I was thinking that binary should be dropped from the proposal, but then @wbober mentioned an actual use case: config files for driving a hardware test rig. When you're writing a file to send a specific set of binary digits to a connector, and the connector's pins are numbered from (say) 0 to 15, it's easier to use `pinout = 0b1101_0011_0111_0010` than `pinout = 0xd372`. The binary version of that number will let you see at a glance whether pin 12 has a high signal (1) or a low signal (0), whereas the hex version requires you to do a conversion in your head.
So since there's a real user with a real use case (and because the cost of implementing binary format is trivial once you've added hex and octal formats), I'm now in favor of saying "Yes, let's add binary as well".
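A small Python check of the pinout example, assuming pin n maps to bit n with pin 0 as the least significant bit. The `pin_level` helper and the 16-pin limit are hypothetical. Python, like the proposal, allows underscores between binary digits:

```python
def pin_level(pinout: int, pin: int) -> int:
    """Return 1 if `pin` (pin 0 = least significant bit) is high, else 0."""
    if not 0 <= pin < 16:
        raise ValueError("this hypothetical connector has pins 0..15")
    return (pinout >> pin) & 1


# The same 16-bit value written both ways.
pinout_bin = 0b1101_0011_0111_0010
pinout_hex = 0xd372

print(pinout_bin == pinout_hex)   # True
print(pin_level(pinout_bin, 12))  # 1: pin 12 is high
print(f"{pinout_hex:016b}")       # 1101001101110010, the pattern the hex spelling hides
```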
Folks, I think everything that is going to be said has been said. Let's
just sit tight until @mojombo makes a decision.
@rmunn, I have quite a lot of Unix experience, but I still hate converting those meaningless digits in my head all the time. It's just bad design, and it's still there for legacy reasons. I think binary is more useful than octal. But there is still a question left: will our users be experienced enough? If they are at least experienced Unix users, then the point of this topic is OK, but many other decisions were made with less experienced users in mind, I think.
I agree with @BurntSushi. Let's just wait. :)
Thank you all for your patience and the excellent arguments presented here! It's been a year and a half since this was opened, and as I hoped, time would bear out which features would turn out to be important to real TOML users. I think I've seen enough evidence now that hex, octal, and binary all have reasonable use cases and should be included in TOML as first class citizens. I'll draw up a PR for their inclusion with …
This issue can be closed. :)
@pradyunsg why?
Ah. My bad. I thought this was some other issue. :/
See #507 for the proposal.
One comment about underscores in numeric literals: my proposal so far has been that an underscore is not allowed between a hex/octal/binary prefix and the first digit of the number. That is, … I have just learned that C# 7.2 will allow underscores between a prefix and the first digit, so that …
But it's better to start out strict and then loosen restrictions later, because that keeps backward compatibility. I.e., if the original rule is that …
So I recommend keeping the proposal as-is with regard to the underscore rules, but if C# 7.2's slightly looser underscore rules make their way into Java and F# (and other languages that I haven't looked at yet), then we can loosen TOML's underscore restrictions as well, in whatever future version of the TOML spec would be appropriate.
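A minimal Python sketch of why starting strict is backward compatible. The loose pattern allows one underscore right after `0x`, as C# 7.2 does (Python's own PEP 515 literals also accept `0x_ff`). It accepts everything the strict pattern accepts, so loosening the rule later cannot invalidate an existing file. Only hex is shown, and the pattern names are hypothetical:

```python
import re

HEX = "[0-9A-Fa-f]"

# Strict (the current proposal): the first character after "0x" must be a digit.
STRICT_HEX = re.compile(rf"0x{HEX}(?:_?{HEX})*")
# Loose (C# 7.2 style): a single underscore may also follow the prefix.
LOOSE_HEX = re.compile(rf"0x_?{HEX}(?:_?{HEX})*")

samples = ["0xff", "0xdead_beef", "0x_ff", "0xff_", "0x__ff", "0x"]

for s in samples:
    strict = STRICT_HEX.fullmatch(s) is not None
    loose = LOOSE_HEX.fullmatch(s) is not None
    # Backward compatibility: anything strict accepts, loose must accept too.
    assert loose or not strict
    print(f"{s:12} strict={strict!s:5} loose={loose}")
```

The reverse direction does not hold: a file containing `0x_ff` that is valid under the loose rule would break under a later, stricter rule, which is the argument for starting strict.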
Issue #53 was closed in June 2014, because the decision at the time was to prefer simplicity of implementation. So because `0xff00ff` or `0o755` were slightly harder to write parsers for than `16711935` or `493`, the choice at the time was not to allow hex or octal numbers in TOML.
However, since that time, issue #263 has been decided the other way. Datetime values are non-trivial to parse, but are highly useful in some scenarios. So the decision was made to keep them in, because they are useful to some real users.
These two decisions are inconsistent. If datetimes are going to be in TOML, the same arguments can be (and have been) made for hex and octal representations of numbers, which are a lot easier to write a parser for than datetimes. Most languages already have a hex parser implementation that TOML parsers could take advantage of. And in any language that doesn't, parsing hex values is not complex. It's a problem with "Coding 101 homework" levels of difficulty, not "doctoral thesis" levels of difficulty.
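To back up the "Coding 101" claim, here is a minimal hand-written Python sketch that converts prefixed literals digit by digit, without calling a built-in such as `int(text, 16)`. It assumes the literal has already been validated for underscore placement, and `parse_prefixed_int` is a hypothetical name, not any existing parser's function:

```python
def parse_prefixed_int(literal: str) -> int:
    """Convert a 0x/0o/0b literal to an int, digit by digit.

    Underscores are skipped; checking where they may appear is assumed
    to have happened earlier.
    """
    bases = {"x": 16, "o": 8, "b": 2}
    if len(literal) < 3 or literal[0] != "0" or literal[1] not in bases:
        raise ValueError(f"not a prefixed integer: {literal!r}")
    base = bases[literal[1]]
    value = 0
    digits_seen = 0
    for ch in literal[2:]:
        if ch == "_":
            continue
        digit = "0123456789abcdef".find(ch.lower())
        if digit < 0 or digit >= base:
            raise ValueError(f"bad digit {ch!r} for base {base}")
        value = value * base + digit
        digits_seen += 1
    if digits_seen == 0:
        raise ValueError(f"no digits in {literal!r}")
    return value


print(parse_prefixed_int("0xff00ff"))  # 16711935
print(parse_prefixed_int("0o755"))     # 493
print(parse_prefixed_int("0b1101"))    # 13
```

In practice a parser would likely delegate to its language's built-in conversion; the loop is spelled out only to show how little logic is involved.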
And hex and octal values are useful in many scenarios that TOML is intended for, such as config files. Unix permissions use octal values: `0o755` is much easier to mentally translate to `u+rwx, g+rx, o+rx` than `491`. Or was it `493` or `495`? Quick, can you tell which of those three decimal values is the correct conversion of `0o755`? I can't without a calculator, and I'd much rather see `0o755` in config files. Hex, of course, is highly useful when dealing with colors or bit flags. Neither is as common in config files as octal, but if we allow octal there's no good reason not to allow hex.
Therefore, I would ask that #53 be revisited, either by reopening that issue and having the discussion there, or by starting a new discussion here. The reason for closing #53, to keep things simple for TOML implementations, has been abandoned by now, and there's no longer any reason not to allow hex and octal values.
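For the record, the permissions quiz above can be checked with a few lines of standard-library Python. `stat.filemode` renders a mode in `ls -l` style, and `stat.S_IFREG` marks it as a regular file:

```python
import stat

perm = 0o755
print(perm)                                # 493
print(stat.filemode(stat.S_IFREG | perm))  # -rwxr-xr-x
# The two wrong guesses, converted back to octal:
print(oct(491), oct(495))                  # 0o753 0o757
```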