Pool internal objects allocated per message #1385

shanson7 · 2019-05-30T19:33:41Z

This is a follow-up to #1373 to further reduce the number of objects allocated per message. It takes some inspiration from #1161 but starts with internal allocations, so that clients are not required to release any objects.

Tested allocations by consuming 2 million small (~32-byte) messages and profiling which functions allocated the most. Here is the base case:

Type: alloc_objects
Showing nodes accounting for 10886534, 99.52% of 10939241 total
      flat  flat%   sum%        cum   cum%
   4256506 38.91% 38.91%    7504220 68.60%  github.com/Shopify/sarama.(*MessageBlock).decode
   2419737 22.12% 61.03%    8980700 82.10%  github.com/Shopify/sarama.(*MessageSet).decode
   2031646 18.57% 79.60%    2031646 18.57%  github.com/Shopify/sarama.newCRC32Field
   1952968 17.85% 97.46%    1952968 17.85%  github.com/Shopify/sarama.(*partitionConsumer).parseMessages
    116416  1.06% 98.52%     117159  1.07%  github.com/eapache/go-xerial-snappy.DecodeInto

newCRC32Field is entirely internal, so pooling should be pretty easy here. Let's give that a try.

Type: alloc_objects
Showing nodes accounting for 8279558, 100% of 8282212 total
      flat  flat%   sum%        cum   cum%
   3728561 45.02% 45.02%    4801419 57.97%  github.com/Shopify/sarama.(*MessageBlock).decode
   2345670 28.32% 73.34%    6251983 75.49%  github.com/Shopify/sarama.(*MessageSet).decode
   2029259 24.50% 97.84%    2029259 24.50%  github.com/Shopify/sarama.(*partitionConsumer).parseMessages
     81921  0.99% 98.83%      81921  0.99%  github.com/Shopify/sarama.(*realDecoder).push
     53008  0.64% 99.47%      54870  0.66%  github.com/eapache/go-xerial-snappy.DecodeInto

In this case, we see a ~24% reduction in allocations (from ~5 allocations per message to ~4), which is about what was expected. Digging further into the implementation, lengthField looks equally easy to pool.
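The same acquire/release pattern carries over to a length-prefix field. Below is a hedged sketch, assuming a field that records a 4-byte big-endian length prefix and where the payload starts. `lengthField`, `acquireLengthField`, `releaseLengthField` and `decodeBlock` are illustrative names, not sarama's API. Releasing through `defer` returns the pooled object on every path, including the error returns.

```go
package main

import (
	"encoding/binary"
	"errors"
	"sync"
)

// lengthField is a hypothetical stand-in for an internal length-prefix
// field: it records where the payload starts and how long the prefix says it is.
type lengthField struct {
	startOffset int
	length      int32
}

// lengthFieldPool recycles lengthField values across messages.
var lengthFieldPool = sync.Pool{
	New: func() interface{} { return new(lengthField) },
}

// acquireLengthField returns a zeroed field from the pool.
func acquireLengthField() *lengthField {
	l := lengthFieldPool.Get().(*lengthField)
	*l = lengthField{} // clear state left over from a previous message
	return l
}

// releaseLengthField returns the field to the pool; the caller must not
// keep using it afterwards.
func releaseLengthField(l *lengthField) { lengthFieldPool.Put(l) }

var errInsufficientData = errors.New("insufficient data to decode block")

// decodeBlock reads a 4-byte big-endian length prefix from buf and returns
// the payload that follows. The pooled field is released on every return path.
func decodeBlock(buf []byte) ([]byte, error) {
	l := acquireLengthField()
	defer releaseLengthField(l)

	if len(buf) < 4 {
		return nil, errInsufficientData
	}
	n := int32(binary.BigEndian.Uint32(buf[:4]))
	if n < 0 || int(n) > len(buf)-4 {
		return nil, errInsufficientData
	}
	l.startOffset, l.length = 4, n
	return buf[l.startOffset : l.startOffset+int(l.length)], nil
}
```

The returned slice points into `buf`, not into the pooled field, so nothing the caller receives is invalidated when the field is released.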

Type: alloc_objects
Showing nodes accounting for 6161988, 99.89% of 6168630 total
      flat  flat%   sum%        cum   cum%
   2046271 33.17% 33.17%    2046271 33.17%  github.com/Shopify/sarama.(*partitionConsumer).parseMessages
   1988197 32.23% 65.40%    2708081 43.90%  github.com/Shopify/sarama.(*MessageBlock).decode
   1982298 32.14% 97.54%    4119420 66.78%  github.com/Shopify/sarama.(*MessageSet).decode
     79470  1.29% 98.83%      81849  1.33%  github.com/eapache/go-xerial-snappy.DecodeInto
     32770  0.53% 99.36%    1252404 20.30%  github.com/Shopify/sarama.(*Message).decodeSet

This shows another ~25.5% reduction in allocated objects, down to ~3 allocations per consumed message. For the newer format (RecordBatch et al.), the numbers are similar:

base (~7 allocations per message):

Type: alloc_objects
Showing nodes accounting for 14493767, 99.74% of 14531173 total
      flat  flat%   sum%        cum   cum%
   3406593 23.44% 23.44%    3406593 23.44%  github.com/Shopify/sarama.(*partitionConsumer).parseRecords
   3376093 23.23% 46.68%    8302355 57.13%  github.com/Shopify/sarama.(*RecordBatch).decode
   2036514 14.01% 60.69%    3609402 24.84%  github.com/Shopify/sarama.recordsArray.decode
   1572888 10.82% 71.52%    1572888 10.82%  github.com/Shopify/sarama.(*realDecoder).push
   1493060 10.27% 81.79%   11109612 76.45%  github.com/Shopify/sarama.(*FetchResponseBlock).decode

pool Crc32 (~6 allocations per message):

Type: alloc_objects
Showing nodes accounting for 13447937, 99.69% of 13489340 total
      flat  flat%   sum%        cum   cum%
   3418870 25.34% 25.34%    3418870 25.34%  github.com/Shopify/sarama.(*partitionConsumer).parseRecords
   2863152 21.23% 46.57%    7480473 55.45%  github.com/Shopify/sarama.(*RecordBatch).decode
   2069286 15.34% 61.91%    3412794 25.30%  github.com/Shopify/sarama.recordsArray.decode
   1432180 10.62% 72.53%    8912653 66.07%  github.com/Shopify/sarama.(*Records).decode
   1343508  9.96% 82.49%    1343508  9.96%  github.com/Shopify/sarama.(*realDecoder).push
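The per-message figures follow from dividing each profile's total by the 2 million consumed messages. A small helper to reproduce them from the totals above (the function names are illustrative):

```go
package main

// messages is the number of messages consumed in each profiling run.
const messages = 2000000

// allocsPerMessage divides a profile's total allocated objects by the
// number of consumed messages.
func allocsPerMessage(total int) float64 {
	return float64(total) / messages
}

// reduction returns the fractional drop in allocated objects between two runs.
func reduction(before, after int) float64 {
	return float64(before-after) / float64(before)
}
```

For the MessageSet runs this gives about 5.47, 4.14 and 3.08 allocations per message, and for the RecordBatch runs about 7.27 and 6.74; the two MessageSet steps work out to reductions of about 24.3% and 25.5%.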

varun06 · review comment on length_field.go

LGTM

bai · 2019-06-04T11:44:19Z

Thanks!

kjelle · 2019-08-24T11:04:26Z

Did you profile the performance effect of using a defer for each message decode?
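One way to answer this would be a micro-benchmark comparing a deferred release against an explicit one. The sketch below uses the standard library's `testing.Benchmark` so it can run outside `go test`; `field`, `withDefer`, `withoutDefer` and `compareDefer` are hypothetical names, and the timings it reports will vary with Go version and hardware.

```go
package main

import (
	"sync"
	"testing"
)

// field is a hypothetical stand-in for a pooled decoder field.
type field struct{ offset int }

var fieldPool = sync.Pool{New: func() interface{} { return new(field) }}

// sink is package-level so the compiler cannot drop the benchmarked calls.
var sink int

// withDefer returns the pooled field via a deferred Put.
func withDefer() int {
	f := fieldPool.Get().(*field)
	defer fieldPool.Put(f)
	f.offset = 1
	return f.offset
}

// withoutDefer returns the pooled field with an explicit Put before returning.
func withoutDefer() int {
	f := fieldPool.Get().(*field)
	f.offset = 1
	n := f.offset
	fieldPool.Put(f)
	return n
}

// compareDefer benchmarks both variants and reports nanoseconds per call.
func compareDefer() (deferNs, explicitNs int64) {
	testing.Init() // registers testing flags when running outside "go test"
	d := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			sink += withDefer()
		}
	})
	e := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			sink += withoutDefer()
		}
	})
	return d.NsPerOp(), e.NsPerOp()
}
```

Both variants do the same pool round trip, so any difference in ns/op between them is attributable to the `defer` itself.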

Reuse lengthFields (commit 47ae433)

bai requested a review from varun06 June 3, 2019 03:06

varun06 reviewed Jun 3, 2019


length_field.go (resolved)

varun06 approved these changes Jun 4, 2019


bai approved these changes Jun 4, 2019


bai merged commit cd910a6 into IBM:master Jun 4, 2019

shanson7 mentioned this pull request Jul 5, 2019

Update Shopify/sarama from v1.19.0 to v1.23.0 (grafana/metrictank#1383, merged)

shanson7 deleted the pool_internals branch June 11, 2021 07:50
