Define appropriate implementation behaviors for extreme values #538
Well, TOML does set a few expectations. Here's what we have for integers. So we cannot say that integers can have any length. Unless you want this requirement rescinded.
Same thing for floats. Any valid value, including a very small epsilon value, is expected to work within this limit. Even
For any numerical value in a TOML file that doesn't translate to these 64-bit standards, I take it that you want parsers to raise errors, rather than leave the parsers' actual behaviors undefined. I'm inclined to agree, but not sure how verbose the standard needs to be about such cases. |
@eksortso I think it would be useful to say that an integer outside the supported ranges should give an error, rather than return a different integer, which may happen for languages that have error-free wrapping, or return a truncated integer calculated up until a parse error occurs. Instead of producing an error, I'd support the parser returning the correct integer where the language can support it (e.g. Python). If that latter behavior is allowed, and it is clear that an error must be produced otherwise, then the 64-bit signed value could be defined as the guaranteed minimum that a parser should support. Defining these things specifically encourages parser developers to test for these cases and also ensures parser implementations are more interchangeable |
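The behavior proposed in this comment can be sketched roughly as follows in Python. The helper `parse_toml_int`, the `allow_big` flag, and the `TomlRangeError` class are hypothetical names made up for illustration and are not part of any existing TOML library. The sketch also does not validate full TOML integer syntax; it only shows the range policy: error by default, exact value where the language can hold it.

```python
INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1


class TomlRangeError(ValueError):
    """Hypothetical error for values the parser cannot represent exactly."""


def parse_toml_int(text: str, allow_big: bool = False) -> int:
    """Parse an integer literal (illustrative helper, not a real library API)."""
    # Base 0 accepts the 0x / 0o / 0b prefixes and digit-group underscores.
    value = int(text, 0)
    if INT64_MIN <= value <= INT64_MAX:
        return value
    if allow_big:
        # Python integers are arbitrary precision, so the exact value is safe to return.
        return value
    # Never wrap or truncate silently: report the problem to the calling code.
    raise TomlRangeError(f"integer {text!r} does not fit in a signed 64-bit value")
```

A parser written in a language with fixed-width integers would simply have no `allow_big` path and would always raise.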
Both the Python TOML parsers listed on the Wiki ("specs-conforming and strict" This also emphasises that the word "expected" has almost no meaning in a standard. For floats, both Python libraries convert |
Are there existing real world problems/issues caused by the lack of strict language on the behavior at these extreme values? |
Some notes:
|
Consider a parser for a typed language that allows the caller to provide a fixed-length buffer for a string. Similarly, for integers a typed parser may support multiple integral types (8, 16, 32, 64, and possibly 128-bit). Ensuring that the default behavior is to produce an error if the value cannot be represented exactly is simple and conservative. Parsers can provide configuration options for other behaviors: truncation, wrapping, producing the max value. If we end up with TOML parsers in different languages undetectably producing different values for the same TOML, then TOML is not going to be an appropriate choice for systems that use multiple languages.
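As a rough illustration of the default-to-error policy with opt-in alternatives, here is a Python sketch. The function `fit_int` and its `on_overflow` policy names are assumptions invented for this example; a real typed-language parser would expose something equivalent in its own idiom.

```python
def fit_int(value: int, bits: int, signed: bool = True, on_overflow: str = "error") -> int:
    """Fit an already-parsed integer into a fixed-width type (hypothetical helper)."""
    lo = -(1 << (bits - 1)) if signed else 0
    hi = (1 << (bits - 1)) - 1 if signed else (1 << bits) - 1
    if lo <= value <= hi:
        return value
    if on_overflow == "error":
        # Conservative default: the caller learns that the value is unrepresentable.
        raise OverflowError(f"{value} does not fit in {bits}-bit {'signed' if signed else 'unsigned'}")
    if on_overflow == "saturate":
        # Clamp to the nearest representable bound.
        return hi if value > hi else lo
    if on_overflow == "wrap":
        # Two's-complement wrapping: keep only the low `bits` bits.
        wrapped = value & ((1 << bits) - 1)
        if signed and wrapped > hi:
            wrapped -= 1 << bits
        return wrapped
    raise ValueError(f"unknown overflow policy: {on_overflow!r}")
```

With this shape, `fit_int(200, 8)` raises, while wrapping and saturation only happen when the caller asks for them explicitly.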
Please do not encourage parser libraries to print warnings. The failure to parse correctly needs to go to the calling code, not to the user. |
TOML parsers do not have sufficient domain knowledge to take responsibility for changing values. I believe that the only sane option for parsers is to produce a parsing error, so that the user (or some kind of domain-aware AI) has an opportunity to fix unrepresentable values consciously. See also https://en.wikipedia.org/wiki/Fail-fast
That would be a disaster. To prevent this, a parser should either produce an exact representation of the TOML file or produce a parsing error.
+1 |
Should the spec recommend distinct error codes for specific representation problems? Could that be useful for parser-neutral test suites?
|
Given the lack of response to this question, I don't think I want to look into this prior to 1.0 though since we won't miss out on much. |
Yeah, i.e. I disagree that it is |
Hello, everyone. I can't get TOML out of my head, and this morning I realised a possible solution for this border-case issue.

Statement 1. Integer, Float and String values can have any precision while they remain in a TOML text document. It really doesn't matter whether an Integer stays within the 64-bit limit or not. We can easily describe the general syntax with regular expressions or ABNF notation.

Statement 2. The issue arises when you parse TOML for a particular machine or language (Python can also be considered a virtual machine). The TOML specification (and its authors) cannot solve this problem by design. It is the responsibility of the programmer of the particular system/program. This problem can be split into two issues:
2.1 Downcasting the precision of Float/Integer/Datetime because the machine does not support that precision

So, can we help the programmer somewhat?

Solution 1. We can define hard or recommended limits (as is written right now). But if the limits are not appropriate for a particular case, the programmer has to break the rules consciously.

Solution 2. We can specify optional prefix types where it really matters:
S2.0 Without prefix types, we can set recommended limits as in Solution 1.
S2.1 Integer. As an example we can take cstdint or the Go basic types, also add something like
S2.2 Float. We can easily take it from IEEE 754. Also I would add something like
S2.3 String. We can encode the length/encoding in the prefix. Encodings other than UTF-8 make things complicated, but this is really helpful in non-Unicode environments.

Example:
a_int = uint16_0xFFEA
b_float = binary16_0.2
s_string = utf8@14"Hello, world!"

Options:
S2.4 We can add an optional exclamation mark to warn the end recipient that the precision is important and that the value should be parsed as written or rejected with an error. Without the exclamation mark, the type can be treated as recommended/non-critical. But this would need to be justified; it seems a little over-engineered.
S2.5 This does not solve the problem of a large TOML file and a small recipient buffer. Of course, that is really the recipient's problem. With HTTP headers or filesystem metadata the recipient can learn the real TOML size, but that is not true for an arbitrary byte stream. So we can add optional
S2.6 These suggestions seem to be backward compatible with TOML v0.5.0. |
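To make the prefix-type idea for integers more concrete, here is a small Python sketch of how a parser might handle a value such as `uint16_0xFFEA`. Everything in it is an assumption for illustration: the proposal above is not part of TOML, the set of accepted prefixes (`int8` to `int64`, `uint8` to `uint64`, loosely following cstdint) is a guess, and `parse_prefixed_int` is a hypothetical name.

```python
import re

# Assumed prefix grammar: optional "u", then "int", then a bit width, then "_".
_PREFIXED_INT = re.compile(r"^(u?)int(8|16|32|64)_(.+)$")


def parse_prefixed_int(text: str) -> int:
    """Parse a prefix-typed integer such as 'uint16_0xFFEA' (hypothetical syntax)."""
    match = _PREFIXED_INT.match(text)
    if match is None:
        raise ValueError(f"not a prefix-typed integer: {text!r}")
    unsigned, bits_text, literal = match.groups()
    bits = int(bits_text)
    # Base 0 accepts decimal as well as 0x / 0o / 0b literals.
    value = int(literal, 0)
    lo = 0 if unsigned else -(1 << (bits - 1))
    hi = (1 << bits) - 1 if unsigned else (1 << (bits - 1)) - 1
    if not lo <= value <= hi:
        # The declared type cannot hold the literal: reject instead of downcasting.
        raise OverflowError(f"{literal} does not fit in {'u' if unsigned else ''}int{bits}")
    return value
```

Under these assumptions `uint16_0xFFEA` parses to 65514, while the same literal declared as `int16` is rejected.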
I'm an outsider (just comparing serialization formats out there, and TOML is perhaps the format I'm going to choose, mainly because of how simple its C library is), but one of the things I was checking just now is precisely numeric ranges. Then I found this issue, so I thought I could add my 2 cents. If I'm reading the spec correctly, an implementation that chooses to support, for example, 128-bit integers would be fully compliant, because the spec requires support for 64-bit signed ints but doesn't say anything about integers with more bits, so I guess they would be legal. Even parsing an integer as unsigned 64-bit when it's positive and doesn't fit in a signed 64-bit value would be legal too (I mean, the spec doesn't say that's forbidden). I find all of this very fortunate, because I work with unsigned 64-bit values a lot, and sometimes with more than 64 bits, so the fact that the TOML spec doesn't impose limits on integers is, in my case, a very good reason for choosing it. Regarding floating point, I think it would be nice to relax the word "should" in the spec where it says that implementations should implement floating-point numbers as doubles. In some of my apps I use 80-bit Intel floating point, for example. Yes, it's not portable to different CPUs, but I use it when I really need it, and it would be nonsense to be able to use TOML in my apps that work with 64-bit doubles but not when some variable needs to be 80-bit floating point. As I said, just my 2 cents, as I'm an outsider... |
You did read it correctly.
Given the lack of concrete answers here, I'm leaning toward maintaining the status quo, which defers to implementation authors to make the design choices about the best approach to take. Each implementation can do as its authors deem appropriate for their language/ecosystem/domain, and they're welcome to support higher limits, have a strict failure mode, or adopt other behaviours on extreme values. |
Hi, everybody, I just want to add one thing regarding the exact representation of floats. If TOML required exact representation, it could not work with the usual floating-point types because of their rounding errors. It would fail even for simple values such as 0.1 or 0.3, which do not have an exact representation in double or float but are quite likely to appear in human-written configuration files. The only solution would be to use exact decimal types, such as "decimal" in C#. That may complicate the design of parsing libraries in languages that have no native type for that. In practice, I believe that special treatment of types and entries is an optional, not required, responsibility of the parsing library. Libraries can provide either "callbacks" or access to the raw textual value of the entry, if the user of the library really cares about special treatment of the values. |
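The rounding issue and the "callback" idea can be illustrated with a short Python sketch. The function `convert_float` is a hypothetical stand-in for the point in a parser where the raw float text is converted; only `float` and `decimal.Decimal` are real standard-library names here. Python's standard `tomllib` module exposes a comparable hook through its `parse_float` argument.

```python
from decimal import Decimal
from typing import Any, Callable


def convert_float(raw_text: str, parse_float: Callable[[str], Any] = float) -> Any:
    """Hand the raw text of a float entry to a caller-chosen converter (hypothetical hook)."""
    return parse_float(raw_text)


as_double = convert_float("0.1")                         # nearest binary64 value
as_decimal = convert_float("0.1", parse_float=Decimal)   # exact decimal value

# Decimal(float) exposes the exact value the double really holds, which is not 0.1.
print(Decimal(as_double) == as_decimal)                       # False
# Arithmetic on doubles accumulates the rounding error; Decimal does not here.
print(as_double + as_double + as_double == 0.3)               # False
print(as_decimal + as_decimal + as_decimal == Decimal("0.3")) # True
```

The default path stays cheap and uses native doubles, while callers who care about exact decimal values opt in explicitly.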
@Timie You said:
Well, the specification does say the following:
Which means that decimal values cannot be expected to be represented with exact precision, barring additional input to the parser. Instances where TOML floats need to be converted to exact Decimal types are exceptional cases, and indeed there are parsers that can make that conversion. Python's This is not a problem in most cases. The only thing that parsers must do with TOML floats is turn them into numbers. There's no immediate check for exact equivalence to perceived precision. It's usually not necessary. Especially since TOML does not perform arithmetic. The consumer program that uses the parser's output is supposed to handle any rounding issues that may arise. And the spec doesn't state how floats ought to be represented if they're not converted to IEEE 754 binary64 value. The parsers already carry a lot of that weight. So really, nothing more needs to be said. However, if a distinct TOML decimal type would ever need to be created, then we'd have to provide a syntactic marker to indicate a decimal, and we'd have to put acceptable lower limits on the precision of such decimals. (No idea if we could agree upon a common prefix or suffix to indicate decimals.) And then, nearly all modern programming languages have common implementations of exact decimal values; there's no shortage of those. Parser writers will choose what they deem fits their needs most accurately. If you think that's something that TOML should consider, for v1.1.0 or for the future, then feel free to make a case for such a data type. Suggest a syntax. (I'm partial to a prefix like a But we would still need to gauge whether such a type really belongs in TOML. And for that, we'll need feedback for such a proposed new feature. |
I'm gonna close this out, based on #538 (comment) |
But this made unparsable error for 10000+ years May relate to toml-lang/toml#538 (comment)
One thing that I would like to see defined before 1.0 is the behavior when implementation limits are reached. This would consist of an explicit statement that TOML itself does not have a limit but that parsers may.
For example:
And perhaps most debatable, what happens to very large or very small floats (e.g. 1e500 and 1e-500). Programming languages often convert these to inf or 0. For configuration, I'd consider it more appropriate to produce an error. For example, a very small float may represent an epsilon value or error allowance, and should produce an error rather than be rounded down to zero.
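A minimal Python sketch of the stricter float behavior suggested here, assuming a hypothetical helper `parse_float_strict` and error class `FloatRangeError` (neither exists in any TOML library). It rejects finite literals that overflow to infinity or underflow to zero, and it does not validate the rest of TOML's float syntax.

```python
import math
from decimal import Decimal

# TOML's explicit special values are passed through unchanged.
_SPECIAL = {"inf", "+inf", "-inf", "nan", "+nan", "-nan"}


class FloatRangeError(ValueError):
    """Hypothetical error for float literals outside the binary64 range."""


def parse_float_strict(text: str) -> float:
    """Convert a float literal, refusing silent overflow or underflow (illustrative)."""
    if text in _SPECIAL:
        return float(text)
    value = float(text)
    if math.isinf(value):
        # A finite literal such as 1e500 became infinity.
        raise FloatRangeError(f"{text!r} overflows a 64-bit float")
    if value == 0.0 and not Decimal(text).is_zero():
        # A non-zero literal such as 1e-500 was rounded down to zero.
        raise FloatRangeError(f"{text!r} underflows to zero in a 64-bit float")
    return value
```

Ordinary rounding, such as 0.1 becoming the nearest double, is still accepted; only the two cases where the magnitude is lost entirely produce an error.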