Error with reading large integer? #369
It's been reported many times, actually :) The problem is that jq uses C doubles to represent numbers, and on pretty […]
Oh I see. Thanks for the clarification, and thanks for the great work! Cheers,

On May 21, 2014, at 7:31 PM, Nico Williams <[email protected]> wrote: […]
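To make the limitation concrete, here is a minimal illustration (not from the original thread). 2^53 + 1 = 9007199254740993 is the smallest positive integer a C double cannot represent, so it comes back rounded from a plain identity filter; this is the behavior of jq 1.6 and earlier (1.7 began preserving unmodified literals):

```
$ echo '9007199254740993' | jq '.'
9007199254740992
```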
jq's handling of numbers is simply wrong, at least with respect to its own documentation. The description of "." is:
Since the documentation is in accordance with the philosophy of JSON, it seems to me that jq should do what the documentation says. Of course it would be more than acceptable if there were an option that would determine how numbers are to be handled.
jq programs consume parsed JSON values. It is the jq processor that […] The jq parser (i.e., the jv_parse*() C functions) only supports IEEE 754 […] As for options... jq will almost certainly never have any run-time […]
In response to nicowilliams -- If the problem is one of design, so be it, but it is a problem. Infinite-precision arithmetic would obviate the problem for integers; perhaps decimals should be parsed as strings, and only converted to floats when jq is given an instruction to perform an arithmetic operation. To highlight the fact that some JSON tools get it right, consider:
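(The example posted here did not survive the copy. As a stand-in illustrating the same point, Python's json module parses JSON integers as arbitrary-precision ints and so round-trips a value that a doubles-based parser would mangle; the id below is invented:)

```
$ echo '{"id": 123456789012345678901}' | python3 -c 'import json, sys; print(json.dumps(json.load(sys.stdin)))'
{"id": 123456789012345678901}
```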
Indeed, some JSON tools do what you like. However, many don't. The only […] The problem here isn't just jq. It's JSON. JSON didn't (RFC4627) […] and JavaScript implementations, for example, only handle IEEE 754 doubles.
This is a response to nicowilliams's post. There is a distinction to be drawn between JSON as defined at json.org and specified by ECMA (http://www.ecma-international.org/publications/files/ECMA-ST/ECMA-404.pdf), and JSON as envisioned by http://tools.ietf.org/html/rfc7159. JSON as defined at json.org and by ECMA just has a number type. There is no concept of "precision" or any limit specifically for numbers. As far as the definition at json.org is concerned, JSON numbers are essentially strings for representing decimals with a finite representation. As the ECMA specification says:
On the other hand, the specification at http://tools.ietf.org/html/rfc7159 does allow "implementations to set limits on the range and precision of numbers accepted", so perhaps the intent of jq is to be such an implementation. If that is the case, then I believe the documentation should be more upfront about the issue. I certainly was misled by the prominence given to this statement very early in the jq Manual:
A case could also be made that "implementations" that "set limits" should emit error or warning messages if the limits are breached. It seems to me, however, that jq would be much more useful if it retained precision AT LEAST in the absence of arithmetic operations. That is, it would be very nice if "jq -M ." really could be used to format JSON without ever altering any values. Thanks.
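For illustration, this is what that identity round-trip does on a doubles-based jq (1.6 and earlier); the 17-digit id below is invented:

```
$ echo '{"id": 12345678901234567}' | jq -M '.'
{
  "id": 12345678901234568
}
```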
On 05/24/14 08:19, pkoppstein wrote:
+1 There was quite a bit of discussion about this in the past; you can find […] I think there was a prototypical implementation already from someone […] BTW, awk does it that way too: `echo '111111111111111111' | awk '{print $1, 1*$1}'`

Regards,
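(For reference: awk keeps each field as its literal string until arithmetic forces a numeric conversion, so only the second column goes through a double. With gawk on a 64-bit machine, the command above typically prints:)

```
$ echo '111111111111111111' | awk '{print $1, 1*$1}'
111111111111111111 111111111111111104
```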
Thanks, Tilo, especially for mentioning "awk" -- for two reasons. First, the following text from jq's "home page" advertises jq as being like awk and sed:
Contrast:
Second, and more importantly, regular expressions are intrinsic to both sed and awk. I've mentioned that elsewhere (#164), so by way of summary I'll just say that although I really do like PEGs, jq, being modern and unconstrained by backwards-compatibility issues, really ought to support regular expressions with named captures, hopefully in the manner of Ruby, and hopefully sooner rather than later :-)
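(As it happens, later jq releases did add this: builds linked against Oniguruma ship test/match/capture with named capture groups starting in jq 1.5. A quick illustration:)

```
$ jq -n '"jq-1.5" | capture("(?<name>[a-z]+)-(?<version>[0-9.]+)")'
{
  "name": "jq",
  "version": "1.5"
}
```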
RFC 7159 also doesn't say anything about IEEE 754 being the standard for […] No one says that jq shouldn't have bignums. The point is that you can't […]
A possible candidate library would be https://github.com/libtom/libtomfloat, but it seems to be abandoned, and the WARNING seems scary (but we can always write tests for it and fix it). Another option is to take any of a number of bignum integer libraries, like libtommath or bsdnt, and build our own bignum real library on top.
I don't have a lot to add to this discussion other than to note that I was shocked that […]
Submitted #1246, which solves this issue in a conservative way: no bigint libraries, only ints up to 64 bits, and only if no operations are done on them, much like awk's behavior described in previous comments:
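(The example that accompanied this comment was lost in the copy. With a release that includes this change, jq 1.7 and later, the behavior looks roughly like the following: the literal passes through untouched, while arithmetic still goes through doubles.)

```
$ echo '12345678901234567' | jq '.'
12345678901234567
$ echo '12345678901234567' | jq '. + 0'
12345678901234568
```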
`jq` has a precision bug while loading probabilities with big vids:

```
$ printf '1 200 300\n' | jq -R -r --argjson SHARD_BASE $((37 << 48)) ' split(" ") | [ (.[0] | tonumber + $SHARD_BASE | tostring) , .[1:][] ] | join("\t")'
10414574138294272 200 300
$ printf '2 200 300\n' | jq -R -r --argjson SHARD_BASE $((37 << 48)) ' split(" ") | [ (.[0] | tonumber + $SHARD_BASE | tostring) , .[1:][] ] | join("\t")'
10414574138294274 200 300
$ printf '3 200 300\n' | jq -R -r --argjson SHARD_BASE $((37 << 48)) ' split(" ") | [ (.[0] | tonumber + $SHARD_BASE | tostring) , .[1:][] ] | join("\t")'
10414574138294276 200 300
$ printf '4 200 300\n' | jq -R -r --argjson SHARD_BASE $((37 << 48)) ' split(" ") | [ (.[0] | tonumber + $SHARD_BASE | tostring) , .[1:][] ] | join("\t")'
10414574138294276 200 300
$ printf '5 200 300\n' | jq -R -r --argjson SHARD_BASE $((37 << 48)) ' split(" ") | [ (.[0] | tonumber + $SHARD_BASE | tostring) , .[1:][] ] | join("\t")'
10414574138294276 200 300
$ printf '6 200 300\n' | jq -R -r --argjson SHARD_BASE $((37 << 48)) ' split(" ") | [ (.[0] | tonumber + $SHARD_BASE | tostring) , .[1:][] ] | join("\t")'
10414574138294278 200 300
$ printf '7 200 300\n' | jq -R -r --argjson SHARD_BASE $((37 << 48)) ' split(" ") | [ (.[0] | tonumber + $SHARD_BASE | tostring) , .[1:][] ] | join("\t")'
10414574138294280 200 300
```

This is a known issue, and shockingly there seems to be no good way in jq to support better precision: jqlang/jq#369. We've tried using awk, but versioning seems to be a problem. Here we just try to use python3 to do the job. Just used a poor […]
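A minimal sketch of the python3 approach alluded to above (the exact script is not from the original; only the pipeline shape is). Python ints are arbitrary precision, so the shard ids come out exact, including the +1 case that jq rounded away:

```
$ printf '1 200 300\n' | python3 -c '
import sys
SHARD_BASE = 37 << 48  # same constant as the jq invocations above
for line in sys.stdin:
    fields = line.split()
    # int() keeps full precision; nothing is ever converted to float
    print(int(fields[0]) + SHARD_BASE, *fields[1:], sep="\t")
'
10414574138294273	200	300
```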
I had this issue today with jq-1.5, where for large integers it changes the trailing digits to 0s; even converting to strings doesn't help (as jq reads the value as a number first):
Should be:
@sp-james-mcmurray Internally jq uses IEEE 754 doubles for number representation. Integers whose absolute values are larger than 2^53 cannot all be faithfully represented.
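The boundary is easy to demonstrate; in stock jq builds both literals collapse to the same double:

```
$ jq -n '9007199254740992 == 9007199254740993'
true
```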
I just use Python and its […]
@mitar please don't leave this comment on every issue that deals with jq and IEEE754. Thanks. |
One workaround for this issue is to double-quote integers, turning them into strings, before sending the JSON to jq.
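For example (the id is invented): a quoted id passes through jq byte for byte, since strings are never coerced to numbers:

```
$ echo '{"id": "123456789012345678901"}' | jq '.id'
"123456789012345678901"
```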
Hello,
I am running jq (version 1.3), and find that it seems to be reading large integers (>= 17 digits) incorrectly. For example, if I run
jq '.id'
on the following input:
the output is:
Results on the first and second lines (18 and 17 digits) are incorrect, but the third line (a 16-digit integer) and the fourth (a floating-point number) are correct.
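(The original sample input and output did not survive the copy. A made-up input of the same shape reproduces the report on jq builds prior to 1.7, shown here with jq 1.6's output formatting; older versions may render the rounded values in scientific notation:)

```
$ jq '.id' <<'EOF'
{"id": 123456789012345678}
{"id": 12345678901234567}
{"id": 1234567890123456}
{"id": 1234.5678}
EOF
123456789012345680
12345678901234568
1234567890123456
1234.5678
```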
Seems like this hasn't been reported before, so I just wanted to bring it up and ask whether anyone else has run into this as well. Thanks!