Skip to content
This repository has been archived by the owner on Feb 4, 2025. It is now read-only.

Performant and convenient way of putting data in a JsonBuffer #48

Closed
davidmorgan opened this issue Aug 16, 2024 · 3 comments
Closed

Performant and convenient way of putting data in a JsonBuffer #48

davidmorgan opened this issue Aug 16, 2024 · 3 comments

Comments

@davidmorgan
Copy link
Contributor

We plan to use a binary wire format that's equivalent to JSON, via the JsonBuffer class in dart_model.

The idea is to write data directly to the buffer instead of copying; but the most obvious ways of using our JSON-overlay extension types leads to creating standard maps with all the data then copying it into the buffer.

Jake kicked off some exploration with this PR which I have built on in this PR, adding a simple benchmark and a third way. I didn't change the codegen in my PR, instead I copied codegen output from Jake's PR and built on it to have three benchmarks.

Here is the raw benchmark output for checking in case I screw up the rest of the write-up ;)

JsonBufferSerialize0(RunTime): 1470143.0 us.
JsonBufferSerialize0(RunTime): 938088.5 us.
JsonBufferSerialize0(RunTime): 946229.5 us.
Size: 4509017
JsonBufferSerialize(RunTime): 358227.28571428574 us.
JsonBufferSerialize(RunTime): 341955.5 us.
JsonBufferSerialize(RunTime): 345791.5 us.
Size: 4509017
JsonBufferSerialize2(RunTime): 270204.75 us.
JsonBufferSerialize2(RunTime): 270856.5 us.
JsonBufferSerialize2(RunTime): 272467.5 us.
Size: 4509017

Here is a reformatted version:

Method 0: build in standard maps, copy out (old)
Time taken: ignoring first run (warmup), about 0.94 seconds, or 4.79MB/s

Method 1: build in lazy maps with closures (directly from Jake's PR)
Time taken: about 0.35 seconds, or 12.86MB/s, a speedup of x2.7

Method 2: build directly into buffer with builders (new)
Time taken: about 0.27 seconds, or 16.66MB/s, a further speedup of x1.3 for a total of x3.5

The new "method 2" idea is to create further extension types over JsonBuffer that are builders: they write the data directly into the buffer.

      return InterfaceBuilder()
          .members(
              LazyMap(const ['a', 'aa', 'aaa', 'aaaa'], _makeMember).cast())
          .properties(PropertiesBuilder()
              .isAbstract((intKey & 1) == 1)
              .isClass((intKey & 2) == 2)
              .isGetter((intKey & 4) == 4)
              .isField((intKey & 8) == 8)
              .isMethod((intKey & 16) == 16)
              .isStatic((intKey & 32) == 32)
              .build())
          .build();

The speedup from method 0 to method 1 is convincing that we should do better than standard maps, which is good, that was the plan :)

From 1 to 2 is just about enough further speedup that it looks worth digging further ... there is a risk of the extension types builders approach falling off a usability cliff, they are relying on quite a bit of magic to work correctly; right now I think there is a reasonable chance it can be made to work without major drawbacks but I can't be sure. I am also interested in possible further usability boosts: in particular I think it should be possible to get rid of explicit use of LazyMap by having the builders provide convenience methods for maps.

This looks quite fun to hack on so the problem will not be doing the investigation but avoiding spending too much time on it ;)

@jakemac53
Copy link
Contributor

This looks quite fun to hack on so the problem will not be doing the investigation but avoiding spending too much time on it ;)

So true 🤣

@jakemac53
Copy link
Contributor

jakemac53 commented Aug 16, 2024

Another thing to consider here is that we want to be able to filter efficiently what data is actually serialized based on queries. So we should think about what that might look like with these approaches.

We might end up needing to step away from the extension type approach to efficiently handle that, by having the analyzer/CFE implement an interface as opposed to calling an extension type constructor. Probably, they would actually extend an abstract class which would have a Query as a field, and it would implement keys in terms of that query (or even read it directly from the Query object?). And then the interface would also have a (possibly static) method to read values by key, by calling getters on the interface, etc etc.

Basically the end result being that the analyzer/CFE don't worry about queries directly, and they also don't compute anything eagerly.

@davidmorgan
Copy link
Contributor Author

This is now addressed in

https://github.com/dart-lang/macros/tree/main/pkgs/dart_model/lib/src/json_buffer

which has some buffer-backed types that can accumulate new values.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Development

No branches or pull requests

2 participants