Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clarify semantics around Jackson's ByteBufferFeeder #2425

Merged
merged 4 commits into from
Dec 3, 2022

Conversation

daschl
Copy link
Contributor

@daschl daschl commented Nov 17, 2022

Motivation:

In 2.14.0, Jackson added support for feeding ByteBuffers directly
into the streaming parser engine (where previously only byte arrays
would be supported). This changeset adds a benchmark and provides
rationale why the decision has been made to stick with the current
code.

Modifications:

The benchmark numbers show that there is no substantial benefit
in either completely switching over to the ByteBufferFeeder or
decide which one to use when the first buffer arrives. In some
scenarios it is even slower.

As a drive-by fix, the feeder is now properly closed when the
operator terminates successfully or because of an error.

@idelpivnitskiy idelpivnitskiy self-requested a review November 18, 2022 16:40
@idelpivnitskiy
Copy link
Member

idelpivnitskiy commented Nov 18, 2022

Hey @daschl, thank you for the contribution! I will have a closer look soon.

From the build logs:

Error: eckstyle] [ERROR] /home/runner/work/servicetalk/servicetalk/servicetalk-benchmarks/src/jmh/java/io/servicetalk/data/jackson/JacksonStreamingSerializerBenchmark.java:19:1: Import statement for 'io.servicetalk.buffer.api.Buffer' is in the wrong order. Should be in the 'SPECIAL_IMPORTS' group, expecting group 'STANDARD_JAVA_PACKAGE' on this line. [CustomImportOrder]

Error: eckstyle] [ERROR] /home/runner/work/servicetalk/servicetalk/servicetalk-benchmarks/src/jmh/java/io/servicetalk/data/jackson/JacksonStreamingSerializerBenchmark.java:20:1: Import statement for 'io.servicetalk.buffer.netty.BufferAllocators' is in the wrong order. Should be in the 'SPECIAL_IMPORTS' group, expecting group 'STANDARD_JAVA_PACKAGE' on this line. [CustomImportOrder]
> Task :servicetalk-benchmarks:checkstyleJmh
Error: eckstyle] [ERROR] /home/runner/work/servicetalk/servicetalk/servicetalk-benchmarks/src/jmh/java/io/servicetalk/data/jackson/JacksonStreamingSerializerBenchmark.java:61:46: '{' is not followed by whitespace. [WhitespaceAround]
Error: eckstyle] [ERROR] /home/runner/work/servicetalk/servicetalk/servicetalk-benchmarks/src/jmh/java/io/servicetalk/data/jackson/JacksonStreamingSerializerBenchmark.java:61:47: '}' is not preceded with whitespace. [WhitespaceAround]
Error: eckstyle] [ERROR] /home/runner/work/servicetalk/servicetalk/servicetalk-benchmarks/src/jmh/java/io/servicetalk/data/jackson/JacksonStreamingSerializerBenchmark.java:127: An empty line at the end of the code block [EmptyLineBeforeBlockClose]

> Task :servicetalk-benchmarks:checkstyleJmh FAILED

We have some checkstyle rules for the imports. Those rules will be installed automatically for your project in Intellij IDEA if you use ./gradlew idea to generate the project structure instead of opening the source code by Intellij IDEA directly. For more information, see Contributor setup.

@daschl
Copy link
Contributor Author

daschl commented Nov 18, 2022

Hey @daschl, thank you for the contribution! I will have a closer look soon.

From the build logs:

Error: eckstyle] [ERROR] /home/runner/work/servicetalk/servicetalk/servicetalk-benchmarks/src/jmh/java/io/servicetalk/data/jackson/JacksonStreamingSerializerBenchmark.java:19:1: Import statement for 'io.servicetalk.buffer.api.Buffer' is in the wrong order. Should be in the 'SPECIAL_IMPORTS' group, expecting group 'STANDARD_JAVA_PACKAGE' on this line. [CustomImportOrder]

Error: eckstyle] [ERROR] /home/runner/work/servicetalk/servicetalk/servicetalk-benchmarks/src/jmh/java/io/servicetalk/data/jackson/JacksonStreamingSerializerBenchmark.java:20:1: Import statement for 'io.servicetalk.buffer.netty.BufferAllocators' is in the wrong order. Should be in the 'SPECIAL_IMPORTS' group, expecting group 'STANDARD_JAVA_PACKAGE' on this line. [CustomImportOrder]
> Task :servicetalk-benchmarks:checkstyleJmh
Error: eckstyle] [ERROR] /home/runner/work/servicetalk/servicetalk/servicetalk-benchmarks/src/jmh/java/io/servicetalk/data/jackson/JacksonStreamingSerializerBenchmark.java:61:46: '{' is not followed by whitespace. [WhitespaceAround]
Error: eckstyle] [ERROR] /home/runner/work/servicetalk/servicetalk/servicetalk-benchmarks/src/jmh/java/io/servicetalk/data/jackson/JacksonStreamingSerializerBenchmark.java:61:47: '}' is not preceded with whitespace. [WhitespaceAround]
Error: eckstyle] [ERROR] /home/runner/work/servicetalk/servicetalk/servicetalk-benchmarks/src/jmh/java/io/servicetalk/data/jackson/JacksonStreamingSerializerBenchmark.java:127: An empty line at the end of the code block [EmptyLineBeforeBlockClose]

> Task :servicetalk-benchmarks:checkstyleJmh FAILED

We have some checkstyle rules for the imports. Those rules will be installed automatically for your project in Intellij IDEA if you use ./gradlew idea to generate the project structure instead of opening a project by Intellij IDEA. For more information, see Contributor setup.

Thanks for sharing - I'll do the checkstyle cleanup alongside other feedback once you've had a chance to take a closer look if that's alright.

@idelpivnitskiy
Copy link
Member

Absolutely, sent an earlier message just to prevent you from spending time to figure out what is the correct order 😄

* JacksonStreamingSerializerBenchmark.deserializeMidBackedByArray thrpt 5 527170,664 ± 107330,448 ops/s
* JacksonStreamingSerializerBenchmark.deserializeMidBackedByDirect thrpt 5 420306,213 ± 78607,263 ops/s
* JacksonStreamingSerializerBenchmark.deserializeSmallBackedByArray thrpt 5 1307308,103 ± 128131,629 ops/s
* JacksonStreamingSerializerBenchmark.deserializeSmallBackedByDirect thrpt 5 674688,550 ± 108705,165 ops/s
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding a benchmark! In the PR description or comments can you please add numbers before/after the change for comparison?

"est.\\r\\n\",\"registered\":\"2022-01-01T03:44:39 -01:00\",\"latitude\":61.061004,\"longitude\":38.544479," +
"\"tags\":[\"qui\",\"nulla\",\"cillum\",\"dolor\",\"deserunt\",\"amet\",\"magna\"],\"friends\":[{\"id\":0," +
"\"name\":\"Gibson Benson\"},{\"id\":1,\"name\":\"Marshall Atkins\"},{\"id\":2,\"name\":\"Gabriel Clay\"}," +
"{\"id\":3,\"name\":\"Aguilar\"},{\"id\":4,\"name\":\"Bauer\"}]}";
Copy link
Member

@idelpivnitskiy idelpivnitskiy Nov 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these data generated? trying to make sure we do not hold anyone's info :)

"\"picture\":\"https://placehold.it/32x32\"}";

/**
* Represents a 1000-byte JSON object (encoded).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't hesitate to add really large files. That will help to see the difference better. Consider 256B, 2Kb, 16Kb JSON files.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add really large files -> if we go this route consider generating the content so we don't have to check it in to git :)

private static Map<String, Object> deserialize(final Buffer fromBuffer) throws Exception {
return JACKSON
.streamingSerializerDeserializer(MAP_TYPE_REFERENCE)
.deserialize(from(fromBuffer), BufferAllocators.DEFAULT_ALLOCATOR)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While it uses streaming API, effectively the entire payload is aggregated. To simulate a streaming use-case, consider splitting JSON into multiple chunks (at least 10 but could be up to 100). It will help us to see how Jackson manages its internal state during streaming data flow.

}

private static final class ByteArrayDeserializeSubscriber<T> extends DeserializeSubscriber<T> {
private final ByteArrayFeeder feeder;
private static class DeserializeSubscriber<T> implements Subscriber<Buffer> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider making it final

isByteArrayParser = true;
} else {
parser = reader.getFactory().createNonBlockingByteBufferParser();
isByteArrayParser = false;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If ByteBufferFeeder is available now, I wonder if we can simplify the control flow to always use it and forget about ByteArrayParser. The ByteBuffer can be backed by direct memory or heap memory. I would expect it to work with any type of ByteBuffer. WDYT?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For backwards compatibility (e.g. if users link an older version of Jackson) we may have to still fallback to ByteArray. This is assuming perf is roughly equivalent (I expect some additional object allocation overhead if byte[] needs to be wrapped in ByteBuffer, and I'm not sure how efficient our toNioBuffer() is ..).

subscriber.onError(t);
}

@Override
public final void onComplete() {
if (feeder != null) {
feeder.endOfInput();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch!

@daschl daschl force-pushed the issue-1711 branch 3 times, most recently from ce13c87 to 7e2bbbe Compare December 1, 2022 15:56
@daschl
Copy link
Contributor Author

daschl commented Dec 1, 2022

@idelpivnitskiy changeset polished up per review - here are the before and after benchmarks:

PR

Benchmark                                                            Mode  Cnt        Score      Error  Units
JacksonStreamingSerializerBenchmark.deserializeLargeBackedByArray   thrpt   10   102853,701 ±  636,206  ops/s
JacksonStreamingSerializerBenchmark.deserializeLargeBackedByDirect  thrpt   10   101887,661 ±  318,581  ops/s
JacksonStreamingSerializerBenchmark.deserializeMidBackedByArray     thrpt   10   256065,426 ±  457,192  ops/s
JacksonStreamingSerializerBenchmark.deserializeMidBackedByDirect    thrpt   10   237096,527 ±  716,018  ops/s
JacksonStreamingSerializerBenchmark.deserializeSmallBackedByArray   thrpt   10  1129949,785 ± 1387,879  ops/s
JacksonStreamingSerializerBenchmark.deserializeSmallBackedByDirect  thrpt   10  1026503,131 ± 2218,478  ops/s

stable (0.42.22)

Benchmark                                                            Mode  Cnt        Score      Error  Units
JacksonStreamingSerializerBenchmark.deserializeLargeBackedByArray   thrpt   10   107283,809 ±  435,752  ops/s
JacksonStreamingSerializerBenchmark.deserializeLargeBackedByDirect  thrpt   10   102547,028 ±  278,600  ops/s
JacksonStreamingSerializerBenchmark.deserializeMidBackedByArray     thrpt   10   293569,073 ± 1079,336  ops/s
JacksonStreamingSerializerBenchmark.deserializeMidBackedByDirect    thrpt   10   232366,394 ±  568,238  ops/s
JacksonStreamingSerializerBenchmark.deserializeSmallBackedByArray   thrpt   10  1108666,880 ± 3319,926  ops/s
JacksonStreamingSerializerBenchmark.deserializeSmallBackedByDirect  thrpt   10  1132111,074 ± 1764,877  ops/s

They are in the same ballpark, only deserializeMidBackedByArray on stable seems to perform a bit better but I'm not sure if that's a big enough difference to keep investigating given that smaller and larger seem to be identical.

Copy link
Member

@idelpivnitskiy idelpivnitskiy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After you adjust the memory type, would be nice to run 3 options: pre-existing behavior with Jackson 2.14, this PR, a version that always uses new ByteBufferFeeder and toNioBuffer().

private static final Buffer LARGE_JSON_HEAP =
BufferAllocators.PREFER_HEAP_ALLOCATOR.fromUtf8(generateJson(1_000_000));
private static final Buffer LARGE_JSON_DIRECT =
BufferAllocators.PREFER_DIRECT_ALLOCATOR.wrap(LARGE_JSON_HEAP.toNioBuffer());
Copy link
Member

@idelpivnitskiy idelpivnitskiy Dec 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The numbers are not different much between heap/direct because you always use heap buffers :)
Have a look inside wrap method.
In order to get a direct memory, you need to use either PREFER_DIRECT_ALLOCATOR.fromUtf8 method or PREFER_DIRECT_ALLOCATOR.newBuffer(...) and then copy content from LARGE_JSON_HEAP.
When you adjust, consider using the same content (cache generateJson result or copy from heap to direct).

Motivation:

In 2.14.0, Jackson added support for feeding ByteBuffers directly
into the streaming parser engine (where previously only byte arrays
would be supported). This changeset adds a benchmark and provides
rationale why the decision has been made to stick with the current
code.

Modifications:

The benchmark numbers show that there is no substantial benefit
in either completely switching over to the ByteBufferFeeder or
decide which one to use when the first buffer arrives. In some
scenarios it is even slower.

As a drive-by fix, the feeder is now properly closed when the
operator terminates successfully or because of an error.
@daschl daschl changed the title Add support for Jackson's new ByteBufferFeeder (#1711) Clarify semantics around Jackson's ByteBufferFeeder Dec 3, 2022
Copy link
Member

@idelpivnitskiy idelpivnitskiy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for looking into that, was a nice attempt. Hope Jackson improves in the future 🤞

@daschl daschl merged commit f3636fb into apple:main Dec 3, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants