Clarify semantics around Jackson's ByteBufferFeeder #2425
Conversation
Hey @daschl, thank you for the contribution! I will have a closer look soon. From the build logs:
We have some checkstyle rules for the imports. Those rules will be installed automatically for your project in IntelliJ IDEA if you use ...
Thanks for sharing - I'll do the checkstyle cleanup alongside other feedback once you've had a chance to take a closer look, if that's alright.
Absolutely, I sent an earlier message just to prevent you from spending time figuring out the correct order 😄
* JacksonStreamingSerializerBenchmark.deserializeMidBackedByArray    thrpt  5   527170,664 ± 107330,448 ops/s
* JacksonStreamingSerializerBenchmark.deserializeMidBackedByDirect   thrpt  5   420306,213 ±  78607,263 ops/s
* JacksonStreamingSerializerBenchmark.deserializeSmallBackedByArray  thrpt  5  1307308,103 ± 128131,629 ops/s
* JacksonStreamingSerializerBenchmark.deserializeSmallBackedByDirect thrpt  5   674688,550 ± 108705,165 ops/s
Thanks for adding a benchmark! In the PR description or comments, can you please add numbers from before/after the change for comparison?
"est.\\r\\n\",\"registered\":\"2022-01-01T03:44:39 -01:00\",\"latitude\":61.061004,\"longitude\":38.544479," + | ||
"\"tags\":[\"qui\",\"nulla\",\"cillum\",\"dolor\",\"deserunt\",\"amet\",\"magna\"],\"friends\":[{\"id\":0," + | ||
"\"name\":\"Gibson Benson\"},{\"id\":1,\"name\":\"Marshall Atkins\"},{\"id\":2,\"name\":\"Gabriel Clay\"}," + | ||
"{\"id\":3,\"name\":\"Aguilar\"},{\"id\":4,\"name\":\"Bauer\"}]}"; |
Are these data generated? Trying to make sure we do not hold anyone's info :)
"\"picture\":\"https://placehold.it/32x32\"}"; | ||
|
||
/** | ||
* Represents a 1000-byte JSON object (encoded). |
Don't hesitate to add really large files. That will help to see the difference better. Consider 256B, 2KB, and 16KB JSON files.
"add really large files" -> if we go this route, consider generating the content so we don't have to check it into git :)
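A rough sketch of what generating the payload could look like (the field names and document shape here are made up and this is not the PR's actual generateJson helper); the only point is growing an ASCII JSON string until it reaches a target encoded size so nothing has to be checked into git:

```java
import java.util.concurrent.ThreadLocalRandom;

final class BenchmarkJsonGenerator {
    private BenchmarkJsonGenerator() {
    }

    /**
     * Builds an ASCII-only JSON document of at least {@code targetBytes} encoded bytes.
     */
    static String generateJson(final int targetBytes) {
        final StringBuilder sb = new StringBuilder(targetBytes + 64);
        sb.append("{\"items\":[");
        int i = 0;
        // Append small objects until the document reaches the requested size.
        // ASCII content means String length == UTF-8 byte length.
        while (sb.length() < targetBytes) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append("{\"id\":").append(i++)
                    .append(",\"value\":").append(ThreadLocalRandom.current().nextInt(1_000_000))
                    .append('}');
        }
        return sb.append("]}").toString();
    }
}
```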
private static Map<String, Object> deserialize(final Buffer fromBuffer) throws Exception {
    return JACKSON
            .streamingSerializerDeserializer(MAP_TYPE_REFERENCE)
            .deserialize(from(fromBuffer), BufferAllocators.DEFAULT_ALLOCATOR)
While it uses the streaming API, effectively the entire payload is aggregated. To simulate a streaming use-case, consider splitting the JSON into multiple chunks (at least 10, but it could be up to 100). It will help us see how Jackson manages its internal state during a streaming data flow.
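For example, a minimal way to pre-split the payload for the benchmark (plain Java; how each byte[] chunk is then wrapped in a Buffer and published is left out, and the chunk count is only illustrative):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class JsonChunker {
    private JsonChunker() {
    }

    /**
     * Splits the encoded JSON into {@code numChunks} roughly equal pieces so the
     * streaming deserializer receives many onNext signals instead of one big buffer.
     */
    static List<byte[]> split(final byte[] json, final int numChunks) {
        final List<byte[]> chunks = new ArrayList<>(numChunks);
        final int chunkSize = (json.length + numChunks - 1) / numChunks; // ceiling division
        for (int offset = 0; offset < json.length; offset += chunkSize) {
            chunks.add(Arrays.copyOfRange(json, offset, Math.min(offset + chunkSize, json.length)));
        }
        return chunks;
    }
}
```

Each chunk would then be wrapped into a Buffer and emitted individually, which forces Jackson to retain parser state between feeds.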
}

private static final class ByteArrayDeserializeSubscriber<T> extends DeserializeSubscriber<T> {
    private final ByteArrayFeeder feeder;

private static class DeserializeSubscriber<T> implements Subscriber<Buffer> {
Consider making it final
    isByteArrayParser = true;
} else {
    parser = reader.getFactory().createNonBlockingByteBufferParser();
    isByteArrayParser = false;
If ByteBufferFeeder is available now, I wonder if we can simplify the control flow to always use it and forget about ByteArrayParser. The ByteBuffer can be backed by direct memory or heap memory. I would expect it to work with any type of ByteBuffer. WDYT?
For backwards compatibility (e.g. if users link an older version of Jackson) we may still have to fall back to ByteArray. This is assuming perf is roughly equivalent (I expect some additional object allocation overhead if byte[] needs to be wrapped in ByteBuffer, and I'm not sure how efficient our toNioBuffer() is ...).
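One possible way to keep that fallback cheap (hypothetical, not what the PR actually does) is a one-time capability check against the linked jackson-core version:

```java
import com.fasterxml.jackson.core.JsonFactory;

final class ByteBufferFeederSupport {
    private ByteBufferFeederSupport() {
    }

    /** true only if the linked jackson-core (2.14+) exposes createNonBlockingByteBufferParser(). */
    static final boolean AVAILABLE = detect();

    private static boolean detect() {
        try {
            JsonFactory.class.getMethod("createNonBlockingByteBufferParser");
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }
}
```

With such a flag the subscriber could pick the ByteBuffer-based parser when available and silently fall back to the ByteArrayFeeder path otherwise.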
    subscriber.onError(t);
}

@Override
public final void onComplete() {
    if (feeder != null) {
        feeder.endOfInput();
Good catch!
Force-pushed from ce13c87 to 7e2bbbe
@idelpivnitskiy changeset polished up per review - here are the before and after benchmarks:
PR:
stable (0.42.22):
They are in the same ballpark, only ...
After you adjust the memory type, it would be nice to run 3 options: the pre-existing behavior with Jackson 2.14, this PR, and a version that always uses the new ByteBufferFeeder and toNioBuffer().
private static final Buffer LARGE_JSON_HEAP =
        BufferAllocators.PREFER_HEAP_ALLOCATOR.fromUtf8(generateJson(1_000_000));
private static final Buffer LARGE_JSON_DIRECT =
        BufferAllocators.PREFER_DIRECT_ALLOCATOR.wrap(LARGE_JSON_HEAP.toNioBuffer());
The numbers are not much different between heap/direct because you always use heap buffers :) Have a look inside the wrap method. In order to get direct memory, you need to use either the PREFER_DIRECT_ALLOCATOR.fromUtf8 method or PREFER_DIRECT_ALLOCATOR.newBuffer(...) and then copy content from LARGE_JSON_HEAP. When you adjust, consider using the same content (cache the generateJson result or copy from heap to direct).
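A small sketch of that suggestion (package names for Buffer/BufferAllocators are assumed from the servicetalk-buffer modules): cache the generated JSON once and let each allocator copy it, instead of wrapping the heap buffer's NIO view:

```java
import io.servicetalk.buffer.api.Buffer;
import io.servicetalk.buffer.netty.BufferAllocators;

final class BenchmarkBuffers {
    private BenchmarkBuffers() {
    }

    /** Heap-backed buffer with the benchmark payload. */
    static Buffer heapJson(final String json) {
        return BufferAllocators.PREFER_HEAP_ALLOCATOR.fromUtf8(json);
    }

    /** Direct-preferring buffer with the exact same payload; fromUtf8 copies into freshly allocated storage. */
    static Buffer directJson(final String json) {
        return BufferAllocators.PREFER_DIRECT_ALLOCATOR.fromUtf8(json);
    }
}
```

Passing the same cached generateJson result to both keeps the two variants byte-for-byte identical while actually exercising different memory types.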
Thanks for looking into that, it was a nice attempt. Hope Jackson improves in the future 🤞
Motivation:

In 2.14.0, Jackson added support for feeding ByteBuffers directly
into the streaming parser engine (where previously only byte arrays
were supported). This changeset adds a benchmark and provides the
rationale for why the decision was made to stick with the current
code.

Modifications:

The benchmark numbers show that there is no substantial benefit in
either completely switching over to the ByteBufferFeeder or deciding
which one to use when the first buffer arrives. In some scenarios it
is even slower.

As a drive-by fix, the feeder is now properly closed when the
operator terminates, either successfully or because of an error.