
dynamic chunk sizing for v4 raw forward index #12945

Merged · 13 commits into apache:master · May 3, 2024

Conversation

itschrispeck (Collaborator) commented Apr 16, 2024

Background

  • V4 format was introduced to better handle variable length data by reducing the potential for large allocations (implement size balanced V4 raw chunk format #7661).
  • V3 format allocated direct memory based on numDocsPerChunk * lengthOfLongestEntry, which was very efficient for near-constant-length/short data.
  • For example, in each format a null column’s chunk size would be:
    • V3: 1000 docs * 4 bytes (‘null’) = 4KB
    • V4: 1MB hardcoded target

Problem

  • Making V4 default (Create V4 raw index by default #11120) will result in a large direct memory increase for the values we typically see.
  • For the static 1000 docs/chunk used in V2/V3 (assuming deriveNumDocsPerChunk is not set), the breakeven point is a lengthOfLongestEntry of ~1KB, since 1000 docs × 1KB matches V4's 1MB target; columns with shorter entries allocate far more direct memory under V4.

We have seen this behavior firsthand after making V4 the default internally. We have many columns for which we do not know whether they will contain variable length data or 'short data', and it's desirable to handle both cases with a single format.

Change

This PR introduces dynamic chunk sizing for V4 format. Target chunk size is calculated based on the heuristic:

max(min(maxLength * targetDocsPerChunk, targetMaxChunkSize), TARGET_MIN_CHUNK_SIZE)

where new configs are introduced:

"forward": {
  "targetMaxChunkSize": "1M",
  "targetDocsPerChunk": 1000
}

and TARGET_MIN_CHUNK_SIZE = 4K
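
A minimal Java sketch of this heuristic (constant and parameter names follow the PR; the example values are illustrative, not from the source):

final class ChunkSizing {
  static final int TARGET_MIN_CHUNK_SIZE = 4 * 1024; // 4K floor

  static int targetChunkSize(int maxLength, int targetDocsPerChunk, int targetMaxChunkSizeBytes) {
    // long math so a large maxLength cannot overflow before the cap applies
    long derived = (long) maxLength * targetDocsPerChunk;
    long capped = Math.min(derived, targetMaxChunkSizeBytes);
    return (int) Math.max(capped, TARGET_MIN_CHUNK_SIZE);
  }
}

// targetChunkSize(16, 1000, 1 << 20)   -> 16_000    (derived from maxLength)
// targetChunkSize(4096, 1000, 1 << 20) -> 1_048_576 (capped at targetMaxChunkSize)
// targetChunkSize(0, 1000, 1 << 20)    -> 4_096     (floored at TARGET_MIN_CHUNK_SIZE)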

In testing I've found this results in reduced direct memory spikes, especially against wide tables/high QPS. The graph below shows the improvement in direct memory spikes for an env where the majority of tables use a 3-7 day TTL with ad hoc QPS. Some spikes are still present because not all segments with the old static chunk size have expired (some 30 day TTL tables exist).

[graph: direct memory usage over time, showing reduced spikes after the change]

I think dynamic chunk sizing should be the default implementation for V4 and have not put this behind a config. It bridges the gap between the variable length data behavior of V4 and the 'short data' behavior of V2/V3.

There are no backward compatibility concerns with this PR.

tags: performance

codecov-commenter commented Apr 17, 2024

Codecov Report

Attention: Patch coverage is 74.00000%, with 13 lines in your changes missing coverage. Please review.

Project coverage is 62.16%. Comparing base (59551e4) to head (b8a6173).
Report is 397 commits behind head on master.

Files                                                  Patch %   Missing
...he/pinot/segment/spi/index/ForwardIndexConfig.java   53.84%   10 Missing and 2 partials ⚠️
...or/impl/fwd/SingleValueVarByteRawIndexCreator.java   80.00%   0 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #12945      +/-   ##
============================================
+ Coverage     61.75%   62.16%   +0.41%     
+ Complexity      207      198       -9     
============================================
  Files          2436     2504      +68     
  Lines        133233   136710    +3477     
  Branches      20636    21187     +551     
============================================
+ Hits          82274    84991    +2717     
- Misses        44911    45416     +505     
- Partials       6048     6303     +255     
Flag Coverage Δ
custom-integration1 <0.01% <0.00%> (-0.01%) ⬇️
integration <0.01% <0.00%> (-0.01%) ⬇️
integration1 <0.01% <0.00%> (-0.01%) ⬇️
integration2 0.00% <0.00%> (ø)
java-11 62.14% <74.00%> (+0.43%) ⬆️
java-21 34.99% <28.00%> (-26.63%) ⬇️
skip-bytebuffers-false 62.14% <74.00%> (+0.40%) ⬆️
skip-bytebuffers-true 34.99% <28.00%> (+7.26%) ⬆️
temurin 62.16% <74.00%> (+0.41%) ⬆️
unittests 62.16% <74.00%> (+0.41%) ⬆️
unittests1 46.69% <28.00%> (-0.20%) ⬇️
unittests2 27.96% <46.00%> (+0.22%) ⬆️

Flags with carried forward coverage won't be shown.


@@ -38,6 +38,7 @@
*/
public class SingleValueVarByteRawIndexCreator implements ForwardIndexCreator {
private static final int DEFAULT_NUM_DOCS_PER_CHUNK = 1000;
private static final int TARGET_MIN_CHUNK_SIZE = 4 * 1024;
itschrispeck (Collaborator, Author) commented Apr 17, 2024:

This lower bound is debatable. 4KB is what we tested with and errs on the side of minimal memory usage, but since it's an uncompressed target size, the compressed chunk could fall below the disk read-ahead value on many systems.

Jackie-Jiang (Contributor) commented:

I don't think we need a lower bound for the chunk size. We can probably simply do min(maxLength * DEFAULT_NUM_DOCS_PER_CHUNK, TARGET_MAX_CHUNK_SIZE).
I feel it can also be useful to allow users to specify the chunk size.

itschrispeck (Collaborator, Author) replied:

> I don't think we need a lower bound for the chunk size. We can probably simply do min(maxLength * DEFAULT_NUM_DOCS_PER_CHUNK, TARGET_MAX_CHUNK_SIZE).

I had the same thought, but we ran into two issues that blocked segment build: maxLength can be 0, and int overflow for large maxLength. Setting a minimum size seemed like a good way to catch both cases.
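
For reference, a minimal sketch (hypothetical values, plain Java) of the two failure modes with min() alone, and how the 4K floor clamps them:

int docs = 1000;           // targetDocsPerChunk
int maxChunk = 1 << 20;    // targetMaxChunkSize (1M)

// All-null column: maxLength == 0 -> zero-sized chunk blocks segment build
int zeroCase = Math.min(0 * docs, maxChunk);              // 0

// Very large maxLength: int multiplication wraps negative
int overflowCase = Math.min(3_000_000 * docs, maxChunk);  // -1_294_967_296

// The 4K minimum clamps both degenerate results
// (the merged code additionally guards the overflow with long arithmetic,
// as shown in a later excerpt)
int fixedZero = Math.max(zeroCase, 4 * 1024);             // 4_096
int fixedOverflow = Math.max(overflowCase, 4 * 1024);     // 4_096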

> I feel it can also be useful to allow users to specify the chunk size

Makes a lot of sense. I added a config targetMaxChunkSize which sets the upper bound. The chunk size can still be dynamically reduced if maxLength is small, since I couldn't think of a strong case for a user increasing chunk size when values are always short. I will document the behavior in the docs. Reducing the max chunk size is very useful both for avoiding on-the-fly allocations/huge chunks with V4 and for reducing direct buffer usage.

This config can also apply to V2/V3 format with deriveNumDocsPerChunk.
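
For illustration, a sketch of how these settings might sit in a table config (the fieldConfigList wrapper and column name are assumptions for this example; the forward block matches the PR description):

"fieldConfigList": [{
  "name": "myVarLengthCol",
  "encodingType": "RAW",
  "indexes": {
    "forward": {
      "rawIndexWriterVersion": 4,
      "targetMaxChunkSize": "1M",
      "targetDocsPerChunk": 1000
    }
  }
}]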

Jackie-Jiang (Contributor) commented:

DEFAULT_NUM_DOCS_PER_CHUNK is the thing we want to avoid in V4, so using it to calculate the target chunk size seems weird to me. How about we add 2 configs here:

  • maxTargetChunkSize: upper bound of the target chunk size
  • targetDocsPerChunk: reduce the target chunk size when max length is small. We can make it 1000 by default to have the desired behavior. I can imagine people want to disable this and always go with the maxTargetChunkSize for scan-intensive cases to reduce decompression

richardstartin (Member) commented:

Nice improvement!

itschrispeck (Collaborator, Author) replied:

> How about we add 2 configs here

Done. Updated the description with the new configs. Lmk if there's any concern with the naming; I changed it slightly for consistency:

"forward": {
  "targetMaxChunkSize": "1M",
  "targetDocsPerChunk": 1000
}

Posting the below just for reference:

> I can imagine people want to disable this and always go with the maxTargetChunkSize for scan-intensive cases to reduce decompression

I understand why this should be the case, but could not reproduce it in our prod env. I ended up being able to reproduce it in a microbenchmark, and it turned out the scans I initially tested were against very low cardinality data. For higher cardinality data the microbenchmark showed up to 40% faster performance with a 1M vs 4K chunk size.

Jackie-Jiang added the documentation, release-notes, and Configuration labels on Apr 29, 2024
Jackie-Jiang (Contributor) left a comment:

LGTM. Can you help also update the Pinot documentation for these 2 configs? Some examples will help users understand how to use them.

}
_targetMaxChunkSize =
    targetMaxChunkSize == null ? DataSizeUtils.fromBytes(DEFAULT_TARGET_MAX_CHUNK_SIZE) : targetMaxChunkSize;
_targetDocsPerChunk =
Review comment (Contributor):

Suggest allowing a negative value for this config to turn it off and only honor the max chunk size.

*/
public static int getDynamicTargetChunkSize(int maxLength, int targetDocsPerChunk, int targetMaxChunkSizeBytes) {
  if (targetDocsPerChunk < 0 || (long) maxLength * targetDocsPerChunk > Integer.MAX_VALUE) {
    return targetMaxChunkSizeBytes;
Review comment (Contributor):

Maybe also put a lower bound on this?

"targetMaxChunkSize should only be used when deriveNumDocsPerChunk is true or rawIndexWriterVersion is 4");
}
_targetMaxChunkSize =
targetMaxChunkSize == null ? DataSizeUtils.fromBytes(DEFAULT_TARGET_MAX_CHUNK_SIZE) : targetMaxChunkSize;
Review comment (Contributor):

(minor) Make a constant for DataSizeUtils.fromBytes(DEFAULT_TARGET_MAX_CHUNK_SIZE) (maybe having both DEFAULT_TARGET_MAX_CHUNK_SIZE and DEFAULT_TARGET_MAX_CHUNK_SIZE_BYTES)

public static final ForwardIndexConfig DEFAULT = new Builder().build();

@Nullable
private final CompressionCodec _compressionCodec;
private final boolean _deriveNumDocsPerChunk;
private final int _rawIndexWriterVersion;
private final String _targetMaxChunkSize;
Review comment (Contributor):

Consider parsing _targetMaxChunkSizeBytes upfront to avoid an illegal size

Jackie-Jiang merged commit 31ae6a3 into apache:master on May 3, 2024
20 checks passed

// For columns with very small max value, target chunk size should also be capped to reduce memory during read
int dynamicTargetChunkSize =
    ForwardIndexUtils.getDynamicTargetChunkSize(maxLength, targetDocsPerChunk, targetMaxChunkSizeBytes);
klsince (Contributor) commented May 6, 2024:

Should this method take numDocsPerChunk instead of targetDocsPerChunk here?

Or we can check deriveNumDocsPerChunk: if it's true, we also derive dynamicTargetChunkSize; otherwise, use targetMaxChunkSizeBytes instead?

Reply (Contributor):

I think it is correct. If not configured, targetDocsPerChunk should be 1000 by default.
Made a small cleanup PR #13093 to clarify the logic a little bit.

itschrispeck (Collaborator, Author) replied:

That is clearer 🙂

Labels: Configuration, documentation, ingestion, performance, release-notes