Error with reading large integer? #369
It's been reported many times, actually :) The problem is that jq uses C doubles to represent numbers, and on pretty […]
Oh I see. Thanks for the clarification, and thanks for the great work! Cheers,

On May 21, 2014, at 7:31 PM, Nico Williams <[email protected]> wrote: […]
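To make the limitation concrete, here is a minimal illustration (not from the original thread). 2^53 + 1 = 9007199254740993 is the smallest positive integer a C double cannot represent, so it comes back rounded from a plain identity filter; this is the behavior of jq 1.6 and earlier (1.7 began preserving unmodified literals):

```
$ echo '9007199254740993' | jq '.'
9007199254740992
```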
jq's handling of numbers is simply wrong, at least with respect to its own documentation. The description of "." is:
Since the documentation is in accordance with the philosophy of JSON, it seems to me that jq should do what the documentation says. Of course it would be more than acceptable if there were an option that would determine how numbers are to be handled.
jq programs consume parsed JSON values. It is the jq processor that […] The jq parser (i.e., the jv_parse*() C functions) only supports IEEE 754 […] As for options... jq will almost certainly never have any run-time […]
In response to nicowilliams -- If the problem is one of design, so be it, but it is a problem. Infinite-precision arithmetic would obviate the problem for integers; perhaps decimals should be parsed as strings, and only converted to floats when jq is given an instruction to perform an arithmetic operation. To highlight the fact that some JSON tools get it right, consider:
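(The example posted here did not survive the copy. As a stand-in illustrating the same point, Python's json module parses JSON integers as arbitrary-precision ints and so round-trips a value that a doubles-based parser would mangle; the id below is invented:)

```
$ echo '{"id": 123456789012345678901}' | python3 -c 'import json, sys; print(json.dumps(json.load(sys.stdin)))'
{"id": 123456789012345678901}
```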
Indeed, some JSON tools do what you like. However, many don't. The only […] The problem here isn't just jq. It's JSON. JSON didn't (RFC4627) […] and JavaScript implementations, for example, only handle IEEE 754 doubles.
This is a response to nicowilliams's post. There is a distinction to be drawn between JSON as defined at json.org and specified by ECMA (http://www.ecma-international.org/publications/files/ECMA-ST/ECMA-404.pdf), and JSON as envisioned by http://tools.ietf.org/html/rfc7159. JSON as defined at json.org and by ECMA just has a number type. There is no concept of "precision" or any limit specifically for numbers. As far as the definition at json.org is concerned, JSON numbers are essentially strings for representing decimals with a finite representation. As the ECMA specification says:
On the other hand, the specification at http://tools.ietf.org/html/rfc7159 does allow "implementations to set limits on the range and precision of numbers accepted", so perhaps the intent of jq is to be such an implementation. If that is the case, then I believe the documentation should be more upfront about the issue. I certainly was misled by the prominence given to this statement very early in the jq Manual:
A case could also be made that "implementations" that "set limits" should emit error or warning messages if the limits are breached. It seems to me, however, that jq would be much more useful if it retained precision AT LEAST in the absence of arithmetic operations. That is, it would be very nice if "jq -M ." really could be used to format JSON without ever altering any values. Thanks.
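For illustration, this is what that identity round-trip does on a doubles-based jq (1.6 and earlier); the 17-digit id below is invented:

```
$ echo '{"id": 12345678901234567}' | jq -M '.'
{
  "id": 12345678901234568
}
```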
On 05/24/14 08:19, pkoppstein wrote:
+1 There was quite a bit of discussion about this in the past; you can find […] I think there was a prototypical implementation already from someone […] BTW, awk does it that way too: `echo '111111111111111111' | awk '{print $1, 1*$1}'`

Regards,
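(For reference: awk keeps each field as its literal string until arithmetic forces a numeric conversion, so only the second column goes through a double. With gawk on a 64-bit machine, the command above typically prints:)

```
$ echo '111111111111111111' | awk '{print $1, 1*$1}'
111111111111111111 111111111111111104
```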
Thanks, Tilo, especially for mentioning "awk" -- for two reasons. First, the following text from jq's "home page" advertises jq as being like awk and sed:
Contrast:
Second, and more importantly, regular expressions are intrinsic to both sed and awk. I've mentioned that elsewhere (#164), so by way of summary I'll just say that although I really do like PEGs, jq, being modern and unconstrained by backwards-compatibility issues, really ought to support regular expressions with named captures, hopefully in the manner of Ruby, and hopefully sooner rather than later :-)
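(As it happens, later jq releases did add this: builds linked against Oniguruma ship test/match/capture with named capture groups starting in jq 1.5. A quick illustration:)

```
$ jq -n '"jq-1.5" | capture("(?<name>[a-z]+)-(?<version>[0-9.]+)")'
{
  "name": "jq",
  "version": "1.5"
}
```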
RFC 7159 also doesn't say anything about IEEE 754 being the standard for […] No one says that jq shouldn't have bignums. The point is that you can't […]
A possible candidate library would be https://github.com/libtom/libtomfloat, but it seems to be abandoned, and the WARNING seems scary (but we can always write tests for it and fix it). Another option is to take any of a number of bignum integer libraries, like libtommath or bsdnt, and build our own bignum real library on top.
I don't have a lot to add to this discussion other than to note that I was shocked that […]
Submitted #1246, which solves this issue in a conservative way: no bigint libraries, only ints up to 64 bits, and only if no operations are done on them, much like awk's behavior described in previous comments:
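(The example that accompanied this comment was lost in the copy. With a release that includes this change, jq 1.7 and later, the behavior looks roughly like the following: the literal passes through untouched, while arithmetic still goes through doubles.)

```
$ echo '12345678901234567' | jq '.'
12345678901234567
$ echo '12345678901234567' | jq '. + 0'
12345678901234568
```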
`jq` has a precision bug while loading probabilities with big vids:

```
$ printf '1 200 300\n' | jq -R -r --argjson SHARD_BASE $((37 << 48)) ' split(" ") | [ (.[0] | tonumber + $SHARD_BASE | tostring) , .[1:][] ] | join("\t")'
10414574138294272 200 300
$ printf '2 200 300\n' | jq -R -r --argjson SHARD_BASE $((37 << 48)) ' split(" ") | [ (.[0] | tonumber + $SHARD_BASE | tostring) , .[1:][] ] | join("\t")'
10414574138294274 200 300
$ printf '3 200 300\n' | jq -R -r --argjson SHARD_BASE $((37 << 48)) ' split(" ") | [ (.[0] | tonumber + $SHARD_BASE | tostring) , .[1:][] ] | join("\t")'
10414574138294276 200 300
$ printf '4 200 300\n' | jq -R -r --argjson SHARD_BASE $((37 << 48)) ' split(" ") | [ (.[0] | tonumber + $SHARD_BASE | tostring) , .[1:][] ] | join("\t")'
10414574138294276 200 300
$ printf '5 200 300\n' | jq -R -r --argjson SHARD_BASE $((37 << 48)) ' split(" ") | [ (.[0] | tonumber + $SHARD_BASE | tostring) , .[1:][] ] | join("\t")'
10414574138294276 200 300
$ printf '6 200 300\n' | jq -R -r --argjson SHARD_BASE $((37 << 48)) ' split(" ") | [ (.[0] | tonumber + $SHARD_BASE | tostring) , .[1:][] ] | join("\t")'
10414574138294278 200 300
$ printf '7 200 300\n' | jq -R -r --argjson SHARD_BASE $((37 << 48)) ' split(" ") | [ (.[0] | tonumber + $SHARD_BASE | tostring) , .[1:][] ] | join("\t")'
10414574138294280 200 300
```

This is a known issue, and shockingly there seems to be no good way in jq to support better precision: jqlang/jq#369. We've tried using awk, but versioning seems to be a problem. Here we just try to use python3 to do the job. Just used a poor […]
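A minimal sketch of the python3 approach alluded to above (the exact script is not from the original; only the pipeline shape is). Python ints are arbitrary precision, so the shard ids come out exact, including the +1 case that jq rounded away:

```
$ printf '1 200 300\n' | python3 -c '
import sys
SHARD_BASE = 37 << 48  # same constant as the jq invocations above
for line in sys.stdin:
    fields = line.split()
    # int() keeps full precision; nothing is ever converted to float
    print(int(fields[0]) + SHARD_BASE, *fields[1:], sep="\t")
'
10414574138294273	200	300
```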
I had this issue today with jq-1.5, where for large integers it changes the trailing digits to 0s; even converting to strings doesn't help (as jq reads the value as a number first):
Should be:
@sp-james-mcmurray Internally jq uses IEEE 754 doubles for number representation. Integers whose absolute values are larger than 2^53 cannot all be faithfully represented.
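The boundary is easy to demonstrate; in stock jq builds both literals collapse to the same double:

```
$ jq -n '9007199254740992 == 9007199254740993'
true
```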
I just use Python and its […]
@mitar please don't leave this comment on every issue that deals with jq and IEEE754. Thanks. |
One workaround for this issue is to double-quote integers, turning them into strings, before sending the JSON to jq.
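For example (the id is invented): a quoted id passes through jq byte for byte, since strings are never coerced to numbers:

```
$ echo '{"id": "123456789012345678901"}' | jq '.id'
"123456789012345678901"
```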
Hello,
I am running jq (version 1.3), and find that it seems to be reading large integers (>= 17 digits) incorrectly. For example, if I run
jq '.id'
on the following input:
the output is:
Results on the first and second lines (18 and 17 digits) are incorrect, but the third line (a 16-digit integer) and the fourth (a floating-point number) are correct.
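(The original sample input and output did not survive the copy. A made-up input of the same shape reproduces the report on jq builds prior to 1.7, shown here with jq 1.6's output formatting; older versions may render the rounded values in scientific notation:)

```
$ jq '.id' <<'EOF'
{"id": 123456789012345678}
{"id": 12345678901234567}
{"id": 1234567890123456}
{"id": 1234.5678}
EOF
123456789012345680
12345678901234568
1234567890123456
1234.5678
```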
Seems like this hasn't been reported before, so I just wanted to bring it up and ask whether anyone else has run into this as well. Thanks!