Simdjson as a Replacement for the Semi-Abandoned RapidJSON Library #1116

Open
3 tasks
dkierner-dh opened this issue Feb 7, 2023 · 5 comments


@dkierner-dh

dkierner-dh commented Feb 7, 2023

Unreliable Release Schedule of RapidJSON and the Debianization Problem

RapidJSON has not published a new release in six years, so continuing to depend on it is becoming a productivity risk and a possible hindrance to this project's development.

This becomes especially obvious when considering that Pistache is aiming for debianization, and the Debian version of RapidJSON is six years behind its master branch, so the packaged version is effectively abandoned.

RapidJSON's Replacement With Simdjson

I therefore suggest that Pistache switch to simdjson, a library that can outperform even RapidJSON and yyjson by using SIMD instructions where available. It is also available as componentized Debian packages, so the switch shouldn't be too hard.

The Looming Feature Freeze of Debian 12 "Bookworm"

As the feature freeze for Debian Bookworm is already underway (Soft Freeze: 2023-02-12, Hard Freeze: 2023-03-12), I consider this a high-priority request if you want to get an updated release of Pistache into Debian 12 "Bookworm". That is also why I'm tagging you here (@kiplingw, @Tachi107).
The next opportunity for an updated Debian release will not be until Debian 13 "Trixie" in around two years, or as a backported release.

Tasklist

  • Update Pistache to use simdjson instead of RapidJSON
  • Release an updated version of Pistache
  • Release and upload an updated Debian package
@dkierner-dh dkierner-dh changed the title Simdjson as a Replacement for the Semi-Abandoned RapidJSON library Simdjson as a Replacement for the Semi-Abandoned RapidJSON Library Feb 8, 2023
@Tachi107
Member

Hi @dkierner-dh, thank you very much for your detailed analysis. Unfortunately I've been quite busy in the last week, and I haven't been able to properly reply before.

Pistache barely made it into Debian 12, which I see as a great success! But, as you said, the freeze is now begun, and we cannot swap the JSON dependency any more there. Still, Debian 12 does provide RapidJSON, so is this that big of an issue?

Not only that, but Pistache's reliance on RapidJSON is fairly minimal. It's only used in a Swagger thing I've never personally touched.

That being said, I too dislike having to depend on RapidJSON, and I agree that simdjson would be a nice alternative, having used it before.

@dkierner-dh
Author

dkierner-dh commented Feb 16, 2023

Hi @Tachi107.

> Pistache barely made it into Debian 12, which I see as a great success! But, as you said, the freeze is now begun, and we cannot swap the JSON dependency any more there. Still, Debian 12 does provide RapidJSON, so is this that big of an issue?

The availability in the Debian repositories does alleviate some of the issues that RapidJSON is facing, like broken builds on the master branch.

I was thinking more of potentially obscure and/or minor bugs that could linger in that old version and are fixed in newer versions. When evaluating a framework to use, you also evaluate its dependencies, and a semi-abandoned dependency doesn't leave a good impression.
The biggest issue is the fairly long release cycle (~two years) between Debian versions, paired with the fact that the latest Debian version of RapidJSON, v1.1.0, is currently seven years old and will be nine years old when Debian 13 "Trixie" is released.

Migrating away from RapidJSON would be best in the long run, as there are also issues requesting support for a newer JSON Schema version, and the lack of it could become a hindrance with new Swagger/OpenAPI versions:

I didn't find similar issues in simdjson.

Will there be a backported version in the future, should Pistache migrate to simdjson?

@kiplingw
Member

Hey @dkierner-dh. Thank you for the suggestion. I wasn't aware of the simdjson library and find it interesting that they found a way to leverage SIMD to optimize what is effectively a string parsing library. I think that's a great idea.

For the reasons already discussed, I also think it would be a good idea to migrate Pistache's RapidJSON dependency to simdjson. Thankfully, as @Tachi107 pointed out, there isn't much code that's dependent on RapidJSON IIRC.

Would you be interested in submitting a PR for this?

@dkierner-dh
Author

Hey @kiplingw, @Tachi107,
I've had a look at it. The library does exactly what it advertises, "Parsing gigabytes of JSON per second", and nothing more: it offers no way to build JSON.
There are multiple open issues for this:

I'm sorry, when I suggested this enhancement I wasn't aware that simdjson only provides parsing for now. I had assumed it would provide both, given that reading and serializing JSON are often done together.
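
To make the gap concrete: a parse-only library leaves serialization to the caller. Below is a minimal sketch, using only the C++ standard library, of what hand-rolled JSON building could look like if only flat objects with string values were needed. The names `escape_json` and `to_json_object` are hypothetical and exist only for this illustration; they are not part of simdjson, RapidJSON, or Pistache, and whether such a small helper would cover what the Swagger code needs is an assumption I have not verified.

```cpp
#include <cstdio>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not a simdjson, RapidJSON, or Pistache API):
// escapes a string so it can be placed inside a JSON string literal.
// Bytes >= 0x80 are passed through unchanged, which assumes UTF-8 input.
std::string escape_json(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    for (std::string::size_type i = 0; i < in.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(in[i]);
        switch (c) {
            case '"':  out += "\\\""; break;
            case '\\': out += "\\\\"; break;
            case '\n': out += "\\n";  break;
            case '\r': out += "\\r";  break;
            case '\t': out += "\\t";  break;
            default:
                if (c < 0x20) {
                    // Remaining control characters become \u00XX escapes.
                    char buf[7];
                    std::snprintf(buf, sizeof buf, "\\u%04x",
                                  static_cast<unsigned>(c));
                    out += buf;
                } else {
                    out += static_cast<char>(c);
                }
        }
    }
    return out;
}

// Hypothetical helper: serializes string key/value pairs as one flat
// JSON object, in the order given. No nesting, numbers, or arrays.
std::string to_json_object(
    const std::vector<std::pair<std::string, std::string> >& fields) {
    std::string out = "{";
    for (std::size_t i = 0; i < fields.size(); ++i) {
        if (i != 0) out += ',';
        out += '"';
        out += escape_json(fields[i].first);
        out += "\":\"";
        out += escape_json(fields[i].second);
        out += '"';
    }
    out += '}';
    return out;
}
```

Anything beyond this (nested objects, arrays, numbers, pretty-printing) is where a real serializer earns its keep, which is why a parse-only library is not a drop-in replacement here.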

Should we rename this issue or put it on hold until simdjson gains that functionality?

@kiplingw
Member

I think putting it on hold for the time being is reasonable.
