Pool internal objects allocated per message #1385

Merged
merged 1 commit into from
Jun 4, 2019


shanson7
Contributor

This is a follow-up to #1373 to further reduce the number of objects allocated per message. It takes some inspiration from #1161 but starts with internal allocations, so that clients are not required to release any objects.

I tested allocations by consuming 2 million small (~32-byte) messages and profiling which functions allocated the most objects. Here is the base case:

Type: alloc_objects
Showing nodes accounting for 10886534, 99.52% of 10939241 total
      flat  flat%   sum%        cum   cum%
   4256506 38.91% 38.91%    7504220 68.60%  github.com/Shopify/sarama.(*MessageBlock).decode
   2419737 22.12% 61.03%    8980700 82.10%  github.com/Shopify/sarama.(*MessageSet).decode
   2031646 18.57% 79.60%    2031646 18.57%  github.com/Shopify/sarama.newCRC32Field
   1952968 17.85% 97.46%    1952968 17.85%  github.com/Shopify/sarama.(*partitionConsumer).parseMessages
    116416  1.06% 98.52%     117159  1.07%  github.com/eapache/go-xerial-snappy.DecodeInto
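
For anyone wanting to reproduce a profile like the one above, here is a minimal sketch of one way to capture allocation data with the standard `runtime/pprof` package. The `consume` function is a hypothetical stand-in for the real consumer loop, and this is not necessarily how the numbers in this PR were collected.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"runtime"
	"runtime/pprof"
)

// sink keeps allocated values reachable so the compiler cannot elide them.
var sink []*int

// consume is a hypothetical stand-in for the code being profiled
// (for example, a loop that consumes n messages).
func consume(n int) {
	for i := 0; i < n; i++ {
		v := new(int)
		*v = i
		sink = append(sink, v)
	}
}

// captureAllocProfile runs work and returns the "allocs" profile in the
// binary format that `go tool pprof` reads.
func captureAllocProfile(work func()) ([]byte, error) {
	work()
	runtime.GC() // make recent allocations visible in the profile
	var buf bytes.Buffer
	if err := pprof.Lookup("allocs").WriteTo(&buf, 0); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	// Record every allocation instead of sampling; set this before the
	// workload starts. This slows the program down considerably.
	runtime.MemProfileRate = 1

	prof, err := captureAllocProfile(func() { consume(1000) })
	if err != nil {
		fmt.Fprintln(os.Stderr, "profile failed:", err)
		os.Exit(1)
	}
	// allocs.pb.gz is an arbitrary file name chosen for this example.
	if err := os.WriteFile("allocs.pb.gz", prof, 0o644); err != nil {
		fmt.Fprintln(os.Stderr, "write failed:", err)
		os.Exit(1)
	}
}
```

The resulting file can then be inspected with `go tool pprof -sample_index=alloc_objects allocs.pb.gz` followed by `top` to get a listing in the same shape as the tables in this description.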

newCRC32Field is entirely internal, so pooling should be pretty easy here. Let's give that a try.
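
As a rough illustration of the idea (not the code in this PR), an internal helper object can be recycled through a `sync.Pool` so that the caller never sees it and never has to release anything. The `crcField` type and the `acquireCRCField`, `releaseCRCField`, `encodeMessage` and `verifyMessage` functions below are hypothetical names invented for this sketch, and the 4-byte-checksum-plus-payload layout is a simplification.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/crc32"
	"sync"
)

// crcField is a hypothetical stand-in for an internal checksum helper;
// it is not sarama's actual type.
type crcField struct {
	startOffset int
}

// crcFieldPool hands out reusable crcField values instead of allocating
// a fresh one for every decoded message.
var crcFieldPool = sync.Pool{
	New: func() interface{} { return new(crcField) },
}

// acquireCRCField takes a field from the pool and resets its state.
func acquireCRCField() *crcField {
	f := crcFieldPool.Get().(*crcField)
	f.startOffset = 0
	return f
}

// releaseCRCField returns the field to the pool for later reuse.
func releaseCRCField(f *crcField) {
	crcFieldPool.Put(f)
}

// encodeMessage prepends a big-endian CRC32 (IEEE) of payload.
func encodeMessage(payload []byte) []byte {
	buf := make([]byte, 4+len(payload))
	binary.BigEndian.PutUint32(buf[:4], crc32.ChecksumIEEE(payload))
	copy(buf[4:], payload)
	return buf
}

// verifyMessage checks the checksum of a buffer produced by encodeMessage.
// The pooled object is acquired and released inside this function, so
// callers never have to hand anything back.
func verifyMessage(buf []byte) bool {
	if len(buf) < 4 {
		return false
	}
	f := acquireCRCField()
	defer releaseCRCField(f)

	f.startOffset = 4 // payload starts right after the checksum
	want := binary.BigEndian.Uint32(buf[:4])
	return crc32.ChecksumIEEE(buf[f.startOffset:]) == want
}

func main() {
	msg := encodeMessage([]byte("hello"))
	fmt.Println("intact message valid:", verifyMessage(msg))

	msg[len(msg)-1] ^= 0xff // corrupt the last payload byte
	fmt.Println("corrupted message valid:", verifyMessage(msg))
}
```

Because acquisition and release both happen inside the decoding function, the pool stays an implementation detail, which is the property that makes internal objects easier to pool than objects handed to clients.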

Type: alloc_objects
Showing nodes accounting for 8279558, 100% of 8282212 total
      flat  flat%   sum%        cum   cum%
   3728561 45.02% 45.02%    4801419 57.97%  github.com/Shopify/sarama.(*MessageBlock).decode
   2345670 28.32% 73.34%    6251983 75.49%  github.com/Shopify/sarama.(*MessageSet).decode
   2029259 24.50% 97.84%    2029259 24.50%  github.com/Shopify/sarama.(*partitionConsumer).parseMessages
     81921  0.99% 98.83%      81921  0.99%  github.com/Shopify/sarama.(*realDecoder).push
     53008  0.64% 99.47%      54870  0.66%  github.com/eapache/go-xerial-snappy.DecodeInto

In this case, we see a ~24% reduction in allocations (from ~5 allocations per message to ~4), which is about what was expected. Digging further into the implementation, it seems that lengthField is equally easy to pool.

Type: alloc_objects
Showing nodes accounting for 6161988, 99.89% of 6168630 total
      flat  flat%   sum%        cum   cum%
   2046271 33.17% 33.17%    2046271 33.17%  github.com/Shopify/sarama.(*partitionConsumer).parseMessages
   1988197 32.23% 65.40%    2708081 43.90%  github.com/Shopify/sarama.(*MessageBlock).decode
   1982298 32.14% 97.54%    4119420 66.78%  github.com/Shopify/sarama.(*MessageSet).decode
     79470  1.29% 98.83%      81849  1.33%  github.com/eapache/go-xerial-snappy.DecodeInto
     32770  0.53% 99.36%    1252404 20.30%  github.com/Shopify/sarama.(*Message).decodeSet

This shows another ~25.5% reduction in allocated objects, down to ~3 allocations per consumed message. For the newer format (RecordBatch et al.), the numbers are similar:

base (~7 allocations per message):

Type: alloc_objects
Showing nodes accounting for 14493767, 99.74% of 14531173 total
      flat  flat%   sum%        cum   cum%
   3406593 23.44% 23.44%    3406593 23.44%  github.com/Shopify/sarama.(*partitionConsumer).parseRecords
   3376093 23.23% 46.68%    8302355 57.13%  github.com/Shopify/sarama.(*RecordBatch).decode
   2036514 14.01% 60.69%    3609402 24.84%  github.com/Shopify/sarama.recordsArray.decode
   1572888 10.82% 71.52%    1572888 10.82%  github.com/Shopify/sarama.(*realDecoder).push
   1493060 10.27% 81.79%   11109612 76.45%  github.com/Shopify/sarama.(*FetchResponseBlock).decode

pool Crc32 (~6 allocations per message):

Type: alloc_objects
Showing nodes accounting for 13447937, 99.69% of 13489340 total
      flat  flat%   sum%        cum   cum%
   3418870 25.34% 25.34%    3418870 25.34%  github.com/Shopify/sarama.(*partitionConsumer).parseRecords
   2863152 21.23% 46.57%    7480473 55.45%  github.com/Shopify/sarama.(*RecordBatch).decode
   2069286 15.34% 61.91%    3412794 25.30%  github.com/Shopify/sarama.recordsArray.decode
   1432180 10.62% 72.53%    8912653 66.07%  github.com/Shopify/sarama.(*Records).decode
   1343508  9.96% 82.49%    1343508  9.96%  github.com/Shopify/sarama.(*realDecoder).push
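
The percentages and per-message figures quoted above can be re-derived from the profile totals. The sketch below does that arithmetic; the totals are copied from the "Showing nodes accounting for ... of N total" lines, and it assumes the RecordBatch runs also consumed 2 million messages, which the description does not state explicitly.

```go
package main

import "fmt"

// reductionPercent returns how much smaller after is than before, in percent.
func reductionPercent(before, after int64) float64 {
	return 100 * float64(before-after) / float64(before)
}

// perMessage returns the average number of allocated objects per message.
func perMessage(total, messages int64) float64 {
	return float64(total) / float64(messages)
}

func main() {
	// Message count stated for the test run; assumed for RecordBatch too.
	const messages = 2000000

	// Totals from the legacy (MessageSet) profiles.
	const (
		legacyBase   = 10939241
		legacyCRC    = 8282212
		legacyLength = 6168630
	)
	// Totals from the RecordBatch profiles.
	const (
		batchBase = 14531173
		batchCRC  = 13489340
	)

	fmt.Printf("legacy base:        %.2f allocs/msg\n", perMessage(legacyBase, messages))
	fmt.Printf("legacy pooled CRC:  %.2f allocs/msg (%.1f%% fewer)\n",
		perMessage(legacyCRC, messages), reductionPercent(legacyBase, legacyCRC))
	fmt.Printf("legacy pooled both: %.2f allocs/msg (%.1f%% fewer)\n",
		perMessage(legacyLength, messages), reductionPercent(legacyCRC, legacyLength))
	fmt.Printf("batch base:         %.2f allocs/msg\n", perMessage(batchBase, messages))
	fmt.Printf("batch pooled CRC:   %.2f allocs/msg (%.1f%% fewer)\n",
		perMessage(batchCRC, messages), reductionPercent(batchBase, batchCRC))
}
```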

bai requested a review from varun06 on June 3, 2019, 03:06
Contributor

varun06 left a comment

LGTM

Contributor

bai commented Jun 4, 2019

Thanks!

Contributor

kjelle commented Aug 24, 2019

Did you profile the performance effect of using a defer for each message decode?

shanson7 deleted the pool_internals branch on June 11, 2021, 07:50

4 participants