
Batching buffer with minChunkSize #32858

Merged
merged 1 commit into quarkusio:main on May 29, 2023

Conversation

franz1981
Contributor

Fixes #32546

This is still in draft due to a few open questions:

  1. What happens to the batch buffer content in case of partial failures within flushBuffer? Should it clean up its whole content, losing it? What is the caller supposed to do?
  2. What happens to ResteasyReactiveOutputStream in case of a failure while using the buffer returned by flushBuffer? Should it release the buffer if it is not already released? (In the previous implementation this wasn't well defined in every state.)
  3. Why 128? Do we want to expose that configuration parameter to users? It's great for benchmarking tests, but maybe too complex for the average user @geoand?
  4. How does performance look? Is it any better? [WIP - after or during the Quarkus F2F]
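For context, the batching idea behind the PR can be sketched roughly like this. This is a simplified, allocator-free illustration of the concept only — the real AppendBuffer works on pooled Netty ByteBufs, and the class and method names here are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

// Simplified sketch of a batching buffer: small writes are coalesced into
// chunks of at least minChunkSize bytes instead of each write getting its
// own allocation; a new chunk is only started when the current one fills up.
// (Hypothetical names; the real AppendBuffer manages pooled Netty ByteBufs.)
final class BatchingBuffer {
    private final int minChunkSize;
    private final int capacity;
    private final List<byte[]> chunks = new ArrayList<>();
    private byte[] current;
    private int position;

    BatchingBuffer(int minChunkSize, int capacity) {
        this.minChunkSize = minChunkSize;
        this.capacity = capacity;
    }

    void append(byte[] data) {
        int offset = 0;
        while (offset < data.length) {
            if (current == null || position == current.length) {
                // allocate at least minChunkSize, at most the configured capacity
                int size = Math.min(capacity, Math.max(minChunkSize, data.length - offset));
                current = new byte[size];
                position = 0;
                chunks.add(current);
            }
            int toCopy = Math.min(data.length - offset, current.length - position);
            System.arraycopy(data, offset, current, position, toCopy);
            position += toCopy;
            offset += toCopy;
        }
    }

    int chunkCount() {
        return chunks.size();
    }
}
```

With minChunkSize = 128, five 10-byte appends land in a single 128-byte chunk rather than five separate allocations, which is the saving the PR is after.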

@franz1981
Contributor Author

@Sanne @geoand @stuartwdouglas the implementation can actually already be reviewed - apart from the questions at #32858 (comment)

@franz1981
Contributor Author

If this works out @Sanne, the change at https://github.com/TechEmpower/FrameworkBenchmarks/pull/7861/files#diff-ea8995e85f3d4bb82b2c8aa7feab84fb8d193519e3f919c7d43c3c2c25310344R31-R32 won't be impactful anymore, and real users would benefit as well from not allocating an 8K Netty (pooled) direct buffer for each response, as happens now

@geoand
Contributor

geoand commented Apr 24, 2023

Thanks @franz1981!

Let's move this ahead in a couple weeks time :)

@geoand
Contributor

geoand commented May 11, 2023

I'd like to see how this performs with small, medium and large payloads compared to what we have currently

@franz1981
Contributor Author

Large payloads should look similar (if they approach the configured buffer size, i.e. 8K); things get better and better the smaller the single payloads written are, but you will do worse if users perform many small writes to the output stream.
This last case, IIUC, should not be the common use case unless users manually manipulate the stream themselves.
The append buffer can be configured to behave like before (or in other ways too), hence we can decide whether to expose such configuration somehow, in case users require that level of control.
@geoand let me know if you are aware of other cases where we keep appending small byte arrays without flushing to the Netty connection each time.
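The "many small writes" pattern being discussed is what you'd get from user code like the following (a hypothetical example of manually manipulating the response OutputStream; the class and method names are invented for illustration):

```java
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical user code performing many tiny writes: each write(int) or
// small write(byte[]) used to be handled individually; with the batching
// buffer these appends are coalesced until minChunkSize/capacity is
// reached or flush() is called.
final class ManySmallWrites {
    static void writeCsvRow(OutputStream out, String[] cells) throws IOException {
        for (int i = 0; i < cells.length; i++) {
            if (i > 0) {
                out.write(',');             // 1-byte write
            }
            out.write(cells[i].getBytes()); // small array write
        }
        out.write('\n');
    }
}
```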

@geoand
Contributor

geoand commented May 11, 2023

Many small ones would only happen in SSE or other streaming scenarios, definitely not a major concern

@franz1981
Contributor Author

franz1981 commented May 26, 2023

I've spent a few weeks testing with our CI using TechEmpower, looking for evident regressions with both stacks (reactive/blocking), with and without Hibernate, and I didn't see any performance regression nor improvement, throughput-wise.

What this patch can deliver instead is a very tangible improvement in off-heap/direct memory usage in the case of blocking endpoints:

  • JSON encoding happens within the EQE threads (aka the blocking thread pool), causing the direct ByteBuf allocations to happen there
  • EQE threads, while writing to the wire, delegate the write to an async task on the Vert.x/Netty event loop (which surprised me a bit, given that it could use write/writeAndFlush instead of a generic Runnable on ConnectionBase::queueForWrite), see below:

[screenshot omitted]

  • that means a buffer allocated on an EQE thread can stay alive for some time until it is flushed to the I/O, while the EQE thread is freed to pick up other in-flight requests

This last behavior makes it very important NOT to over-size such allocation buffers, because their size determines how many overall in-flight bytes the Netty allocator may have to allocate, and the allocator will grow to accommodate the incoming load.
The same doesn't happen for buffers allocated on the I/O threads, which interact directly with the I/O and can be reused right after for the same or another connection.
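The delegation described above can be illustrated schematically. This is not actual Quarkus/Vert.x code — a single-threaded executor stands in for the Netty event loop, and the names are invented for the sketch:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

// Schematic stand-in for the EQE -> event-loop hand-off: the blocking
// (worker) thread submits the encoded bytes as a task to the event-loop
// thread. The byte array is captured by the task and stays alive until
// the event loop runs it, which is why over-sized per-response buffers
// inflate the allocator's footprint under load.
final class WriteDelegation {
    static Future<Integer> queueForWrite(ExecutorService eventLoop, byte[] encoded) {
        // 'encoded' was allocated on the worker thread; it cannot be
        // released/reused until this task has been executed and flushed
        return eventLoop.submit(() -> encoded.length);
    }
}
```

The worker thread is free to serve other requests as soon as the task is queued, so under load many such captured buffers can be in flight at once.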

In a JSON hello world load test using a blocking endpoint, RSS usage went from ~70MB down to ~2MB.

@geoand
Contributor

geoand commented May 26, 2023

In a JSON hello world load test using a blocking endpoint, RSS usage went from ~70MB down to ~2MB.

This is a very significant improvement! Definitely makes the PR worthy of inclusion :).

Thanks a lot for this!

@franz1981
Contributor Author

franz1981 commented May 26, 2023

I just have to decide what the default minChunkSize should be: Netty's allocation capacities are

16, 32, 48, 64, 80, 96, 112, 128, ...

The logic is explained in https://github.com/netty/netty/blob/eb3feb479826949090a9a55c782722ece9b42e50/buffer/src/main/java/io/netty/buffer/SizeClasses.java, and https://github.com/netty/netty/blob/eb3feb479826949090a9a55c782722ece9b42e50/buffer/src/main/java/io/netty/buffer/SizeClasses.java#L83 is the initial log2 step, i.e. 4 (2^4 = 16 - that's the step between the different size classes).
Meaning that, in theory, it should be 16 to be friendly with Netty.
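Assuming the 16-byte step described above, rounding a requested capacity to the smallest size classes could look like this (illustrative only; Netty's actual SizeClasses switches to coarser, power-of-two based steps beyond the small sizes):

```java
// Illustrative only: for small allocations Netty's size classes advance
// in 16-byte steps (16, 32, 48, ...). This rounds a requested capacity
// up to the next step; real Netty (SizeClasses) uses coarser steps for
// larger sizes.
final class SmallSizeClass {
    static final int LOG2_QUANTUM = 4; // 2^4 = 16, the initial step

    static int roundUp(int requested) {
        int quantum = 1 << LOG2_QUANTUM; // 16
        // round up to the next multiple of 16, with a 16-byte floor
        return Math.max(quantum, (requested + quantum - 1) & -quantum);
    }
}
```

Under this rounding, a minChunkSize of 16 wastes nothing on the smallest class, which is the "friendly with Netty" argument above.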

@Sanne
Member

Sanne commented May 26, 2023

What is holding this up? Any concerns?

@franz1981
Contributor Author

What is holding this up? Any concerns?

I've run all possible tests to make sure there were no evident regressions, and now I'm addressing the leaks in case of exceptions being thrown

@geoand
Contributor

geoand commented May 26, 2023

Yup, I've already talked with @franz1981, and once he is ready we can rebase onto main, take it out of draft and merge when CI goes green

@franz1981 franz1981 marked this pull request as ready for review May 28, 2023 13:34
@franz1981
Contributor Author

franz1981 commented May 28, 2023

The original code, given that it was using a single buffer, was correctly handling the release of resources, so it has taken me a while to ensure similar behavior now that we can have a few chunks instead...
I didn't add any unit tests for AppendBuffer; we can add them if we plan to use it elsewhere.

@geoand
Some thoughts that I would like to resolve before merging this:

  • as said, any append happening outside the I/O threads can lead to unbounded growth of direct memory (impacting RSS): right now I'm capping the minimum allocation at 128 bytes (by default), but other strategies could be to always use 16 bytes or, more adaptively, 128 bytes if the append happens on an I/O thread (because on I/O threads the data is unlikely to accumulate, unless there is network congestion) and 16 bytes if it is not on an I/O thread (to avoid uncontrolled memory growth).
  • regardless of the strategy chosen, do we want such values to be configurable, or an option to go back to the original strategy? (the AppendBuffer can behave like the original code too; there's an eager method to make it do so)
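The adaptive strategy in the first bullet could be sketched as follows. This is a hypothetical policy sketch, not code from the PR; the real check for "am I on an I/O thread" would be something like Netty's EventExecutor#inEventLoop(), replaced here by a plain boolean to keep the sketch self-contained:

```java
// Hypothetical sketch of the adaptive minimum-chunk strategy described
// above: a larger minimum on I/O (event-loop) threads, where buffers are
// flushed promptly so batching pays off, and a small minimum on worker
// threads, where buffers may linger and direct memory must stay bounded.
// A boolean stands in for a real inEventLoop() check.
final class MinChunkPolicy {
    static final int IO_THREAD_MIN = 128; // flushed quickly, batching wins
    static final int WORKER_MIN = 16;     // may stay in flight, keep small

    static int minChunkSize(boolean onIoThread) {
        return onIoThread ? IO_THREAD_MIN : WORKER_MIN;
    }
}
```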

@geoand
Contributor

geoand commented May 29, 2023

I don't think we need this to be configurable - at least not for the time being.
It's a very low-level mechanism, and I don't really see us needing to tune it.

If in the future we do come up with specific scenarios (I mean code that users actually run) where configuration would be beneficial, we can add it (along with some good documentation).

@geoand geoand added the triage/waiting-for-ci Ready to merge when CI successfully finishes label May 29, 2023

@geoand
Contributor

geoand commented May 29, 2023

Force pushed an update that makes the minChunkSize configurable via: quarkus.resteasy-reactive.min-chunk-size
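For reference, that property can be set like any other Quarkus config value, e.g. in application.properties (the 128-byte value shown is just an example, not a recommendation):

```properties
# Example: set the minimum chunk size used by ResteasyReactiveOutputStream's
# batching buffer (the value shown is illustrative)
quarkus.resteasy-reactive.min-chunk-size=128
```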

@quarkus-bot

quarkus-bot bot commented May 29, 2023

✔️ The latest workflow run for the pull request has completed successfully.

It should be safe to merge provided you have a look at the other checks in the summary.

@geoand
Contributor

geoand commented May 29, 2023

Merging, thanks a ton for this @franz1981!

@geoand geoand merged commit 29d71ec into quarkusio:main May 29, 2023
@quarkus-bot quarkus-bot bot added kind/enhancement New feature or request and removed triage/waiting-for-ci Ready to merge when CI successfully finishes labels May 29, 2023
@quarkus-bot quarkus-bot bot added this to the 3.2 - main milestone May 29, 2023
Labels
area/rest kind/enhancement New feature or request
Development

Successfully merging this pull request may close these issues.

ResteasyReactiveOutputStream could make better use of the Netty (direct) pooled allocator
4 participants