Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inconsistent Behaviour of NaN & Null Values #3799

Closed
2 tasks done
phil-zxx opened this issue Oct 19, 2022 · 15 comments
Closed
2 tasks done

Inconsistent Behaviour of NaN & Null Values #3799

phil-zxx opened this issue Oct 19, 2022 · 15 comments
Labels
kind: bug solution: wontfix the issue will not be fixed (either it is impossible or deemed out of scope) state: please discuss please discuss the issue or vote for your favorite option

Comments

@phil-zxx
Copy link

phil-zxx commented Oct 19, 2022

Description

Let us assume we have nan = std::numeric_limits<double>::quiet_NaN() and null = nlohmann::json() in our json object. Then the first is a number and NOT null. Whereas the second is null and NOT a number. However, in string form, both appear as null.

So when dumping everything to a json string and then parsing it back into a json object, then both values have now become null. And neither value is a number. So when we try to convert any of them back to a double we get type must be number, but is null.

Reproduction steps

See Minimal code example or https://godbolt.org/z/b9n7Eq9qv

Expected vs. actual results

Both std::numeric_limits<double>::quiet_NaN() and nlohmann::json() are displayed as null in a json file. So a nlohmann::json object should treat both the same at any given point in time. Two solutions:

  1. Either immediately convert std::numeric_limits<double>::quiet_NaN() to a true null object (which is not a number). So that json in memory is consistent with its json string representation.
  2. Or allow null to get cast to NaN when converting to double.

Since NaN serializes to null, I expect that when I convert a json null object to a double to get NaN. That is, solution 2.

Minimal code example

#include <nlohmann/json.hpp>
#include <iostream>

int main()
{
    using json = nlohmann::json;

    const double NaN     = std::numeric_limits<double>::quiet_NaN();
    json jsonData1       = {1.72, NaN, json()};
    const json jsonData2 = json::parse(jsonData1.dump());
    std::cout << "DATA1: " << jsonData1.dump() << "\n";
    std::cout << "DATA2: " << jsonData2.dump() << "\n\n";

    for (const auto& v : jsonData1)
        std::cout << ">> is_null=" << v.is_null() << ", is_number=" << v.is_number() << ", value='" << v << "'\n";
    std::cout << "\n";
    for (const auto& v : jsonData2)
        std::cout << ">> is_null=" << v.is_null() << ", is_number=" << v.is_number() << ", value='" << v << "'\n";

    const double v1 = jsonData1.at(1);
    // const double v2 = jsonData2.at(1);  // ERROR: type must be number, but is null

    return 0;
}

Error messages

ERROR: type must be number, but is null

Compiler and operating system

Linux (gcc 12.2), Windows (msvc v19.33)

Library version

3.11.1 or 3.11.2

Validation

@nlohmann
Copy link
Owner

@phil-zxx
Copy link
Author

phil-zxx commented Oct 19, 2022

Understood.

However, if we have nlohmann::json obj = {1.72, NaN, null}, then the round-trip

obj == nlohmann::json::parse(obj.dump())

fails because nlohmann::json::parse(obj.dump()) gives {1.72, null, null} instead.

The json strings are equal, but the json objects are different, which seems like an inconsistency.

@gregmarr
Copy link
Contributor

It is behaving exactly as in the linked document:

That is, there is no way to parse a NaN value. However, NaN values can be stored in a JSON value by assignment.

This library serializes NaN values as null. This corresponds to the behavior of JavaScript's JSON.stringify function.

@phil-zxx
Copy link
Author

Hi. I am not trying to debate whether this is documented or not. I am raising the point that the json string & json object are not consistent.

On our side this leads to problems, as null & nan are treated differently in memory, while being the same after we output the data to a json string. So when we load the json data back from a string (or a file), the data in memory is now distinctly different, which breaks some of our workflows.

@nlohmann
Copy link
Owner

What do you propose?

@phil-zxx
Copy link
Author

phil-zxx commented Oct 20, 2022

Two solutions:

  1. Whenever a C++ double value is passed into the json object, which is NaN, convert it to a true null object. That way, the json string & in-memory json are the same.
    and/or
  2. Allow null to get cast to NaN when it is explicitly cast to double, e.g. via obj.get<double>().

@gregmarr
Copy link
Contributor

1 is not technically possible in all cases due to the get_ref function which can return a double & and allow the value to be changed without involving the library at all. Also, it would be very surprising (and a breaking change) if assigning NaN silently changed the type of the value.
2 is incorrect as it's not necessarily an NaN that triggers the value to be null. It could just have been a null from the beginning.

Are you not able to keep your workflows from generating NaN values and storing them into the json objects? Could you create your own wrapper function around the double assignment that does the NaN check and converts it to a null assignment?

@trzeciak
Copy link

Hey, I came across the same case, in this example:

json dict = json::object();

{
    double nan = NAN;
    double val = 123;
    cout << "A nan: " << nan << ", val:" << val << endl;
    dict["nan"] = nan;
    dict["val"] = val;
}

cout << "B dict: " << dict.dump() << endl;
cout << "\tisnan: " << isnan(dict["nan"].get<double>()) << " " << isnan(dict["val"].get<double>()) << endl;
cout << "\t'nan'" << ", is_null: " << dict["nan"].is_null() << ", is_number: " << dict["nan"].is_number() << endl;
cout << "\t'val'" << ", is_null: " << dict["val"].is_null() << ", is_number: " << dict["val"].is_number() << endl;

{
    double nan = dict["nan"].get<double>();
    double val = dict["val"].get<double>();
    cout << "C nan: " << nan << ", val:" << val << endl;
}

dict = json::parse(dict.dump());

cout << "D dict: " << dict.dump() << endl;
cout << "\t'nan'" << ", is_null: " << dict["nan"].is_null() << ", is_number: " << dict["nan"].is_number() << endl;
cout << "\t'val'" << ", is_null: " << dict["val"].is_null() << ", is_number: " << dict["val"].is_number() << endl;

{
    double nan = 0;
    nan = dict["nan"].get<double>(); // <------ json throw exception
    cout << "E nan: " << nan << endl;
}
A nan: nan, val:123
B dict: {"nan":null,"val":123.0}
	isnan: 1 0
	'nan', is_null: 0, is_number: 1
	'val', is_null: 0, is_number: 1
C nan: nan, val:123
D dict: {"nan":null,"val":123.0}
	'nan', is_null: 1, is_number: 0
	'val', is_null: 0, is_number: 1
(lldb) 

And I send this json via internet, parse on server side, next dumps to file.
At the end server can't open json again.

Third solution (maybe optional by set define?):
3. Throw an exception when setting NaN to json: dict["nan"] = NAN;

@nlohmann
Copy link
Owner

nlohmann commented Dec 3, 2022

Adding an exception when assigning a NaN to a JSON value has two problems:

  • It is a breaking change, and we don't want to add those lightheartedly.
  • It would be wrong to forbid storing NaN, because it is perfectly fine to serialize them in formats like CBOR which do support NaN.

@nlohmann
Copy link
Owner

Any idea to proceed here without adding a breaking change?

@nlohmann nlohmann added the state: please discuss please discuss the issue or vote for your favorite option label Dec 11, 2022
@phil-zxx
Copy link
Author

The crux of the problem is that the round-trip const json data2 = json::parse(data1.dump()); creates inconsistent behaviour between data1 and data2. Similarly to trzeciak, I send numerical json data over the wire. So if before & after serialisation, the objects behave differently, it's a big problem. And it's impossible for me to catch or handle every nan.

To recap, so far 3 solutions have been proposed:

  1. Auto-convert NaN to null inside a json object
  2. Allow conversion of null to NaN via get<double>()
  3. Throw an error when NaN is set to a json object

I personally still think solution 2 has merit. Because irrespective of whether a null was originally a true null or a NaN (or something else), converting null to NaN just feels more natural than throwing an error.

Nonetheless, another solution:

  1. On https://json.nlohmann.me/api/basic_json/dump it says "The function tries to mimic Python's json.dumps() function". And on https://docs.python.org/3.10/library/json.html#infinite-and-nan-number-values it says that

    [...] By default, this module accepts and outputs Infinity, -Infinity, and NaN as if they were valid JSON number literal values.

So Python supports NaN and it's even enabled by default. We can control this behaviour via the allow_nan parameter. This parameter could be added to nlohmann json's dump & parse function:

string_t dump(int indent, char indent_char, bool ensure_ascii, error_handler_t, bool allow_nan);
basic_json parse(InputType&& i, parser_callback_t cb, bool allow_exceptions, bool ignore_comments, bool allow_nan);

Then we can write const json data2 = json::parse(data1.dump(-1, ' ', false, strict, true), nullptr, true, false, true); and we truly have data1 == data2.

Of course, listing all the parameters is a bit cumbersome. So maybe new overloads or new functions could help:

// New dump overload like string_t dump(nan_policy_t) const
// where nan_policy_t = enum class { allow, forbid };
const json data2 = json::parse(data1.dump(nan_policy_t::allow), nan_policy_t::allow);

// New "..._with_nan" functions
const json data2 = json::parse_with_nan(data1.dump_with_nan());

However, if nlohmann::json's default for allow_nan is true, then this may be a breaking change (again, for some users). But it would follow Python's example (as promised in the docs), and it may remove the need for the additional overloads/functions (as I suppose most people would be happy with nan's being supported by default).

Hope this helps.

@pavel-kirienko
Copy link

pavel-kirienko commented Feb 24, 2023

I also ran into the same issue as OP, as my data contains NaNs that are not easily reducible to nulls. The optional syntax extension proposed by @phil-zxx in the previous post seems like a sensible solution that solves at least my specific problem and is in line with the prior art, even if it is not enabled by default.

EDIT: Enabling NaN support in the parser by default is probably preferred as this change is unlikely to be breaking.

@bredelings
Copy link

I'm not sure if I care if its enabled by default, but could we at least add an optional flag to read/write NaN, Infinity, and -Infinity?

@nlohmann nlohmann added the solution: wontfix the issue will not be fixed (either it is impossible or deemed out of scope) label Sep 23, 2023
@nlohmann
Copy link
Owner

As for now, the library will not support NaN values.

@bredelings
Copy link

bredelings commented Sep 23, 2023

Note that Boost.JSON has decided to write Inf and -Inf as 1e99999 and -1e99999, which seems to handle Inf without violating the JSON spec.

boostorg/json#397

They still write NaN as null, but have a flag for selecting the desired behavior.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind: bug solution: wontfix the issue will not be fixed (either it is impossible or deemed out of scope) state: please discuss please discuss the issue or vote for your favorite option
Projects
None yet
Development

No branches or pull requests

6 participants