
dynamic chunk sizing for v4 raw forward index #12945

Merged · 13 commits into apache:master · May 3, 2024

Conversation

itschrispeck (Collaborator) commented Apr 16, 2024

Background

  • V4 format was introduced to better handle variable length data by reducing the potential for large allocations (implement size balanced V4 raw chunk format #7661).
  • V3 format allocated direct memory based on numDocsPerChunk * lengthOfLongestEntry, which was very efficient for near-constant-length/short data.
  • For example, in each format a null column’s chunk size would be:
    • V3: 1000 docs * 4 bytes (‘null’) = 4KB
    • V4: 1MB hardcoded target

Problem

  • Making V4 default (Create V4 raw index by default #11120) will result in a large direct memory increase for the values we typically see.
  • For the static 1000 docs/chunk used in V2/V3 (assuming deriveNumDocsPerChunk is not set), the breakeven point is a lengthOfLongestEntry of ~1KB, since 1000 docs × 1KB matches V4's 1MB target; columns with shorter entries allocate far more direct memory under V4.

We have seen this behavior firsthand after making V4 the default internally. We have many columns for which we do not know whether they will contain variable length data or 'short data', and it's desirable to handle both cases with a single format.

Change

This PR introduces dynamic chunk sizing for V4 format. Target chunk size is calculated based on the heuristic:

max(min(maxLength * targetDocsPerChunk, targetMaxChunkSize), TARGET_MIN_CHUNK_SIZE)

where new configs are introduced:

"forward": {
  "targetMaxChunkSize": "1M",
  "targetDocsPerChunk": 1000
}

and TARGET_MIN_CHUNK_SIZE = 4K
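
A minimal Java sketch of this heuristic (constant and parameter names follow the PR; the example values are illustrative, not from the source):

final class ChunkSizing {
  static final int TARGET_MIN_CHUNK_SIZE = 4 * 1024; // 4K floor

  static int targetChunkSize(int maxLength, int targetDocsPerChunk, int targetMaxChunkSizeBytes) {
    // long math so a large maxLength cannot overflow before the cap applies
    long derived = (long) maxLength * targetDocsPerChunk;
    long capped = Math.min(derived, targetMaxChunkSizeBytes);
    return (int) Math.max(capped, TARGET_MIN_CHUNK_SIZE);
  }
}

// targetChunkSize(16, 1000, 1 << 20)   -> 16_000    (derived from maxLength)
// targetChunkSize(4096, 1000, 1 << 20) -> 1_048_576 (capped at targetMaxChunkSize)
// targetChunkSize(0, 1000, 1 << 20)    -> 4_096     (floored at TARGET_MIN_CHUNK_SIZE)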

In testing I've found this results in reduced direct memory spikes, especially against wide tables/high QPS. The graph below shows the improvement in direct memory spikes for an env where the majority of tables use a 3-7 day TTL with ad hoc QPS. Some spikes are still present because not all segments with the old static chunk size have expired (some 30 day TTL tables exist).

[graph: direct memory usage over time, showing reduced spikes after the change]

I think dynamic chunk sizing should be the default implementation for V4 and have not put this behind a config. It bridges the gap between the variable length data behavior of V4 and the 'short data' behavior of V2/V3.

There are no backward compatibility concerns with this PR.

tags: performance

codecov-commenter commented Apr 17, 2024

Codecov Report

Attention: Patch coverage is 74.00000%, with 13 lines in your changes missing coverage. Please review.

Project coverage is 62.16%. Comparing base (59551e4) to head (b8a6173).
Report is 397 commits behind head on master.

Files                                                  Patch %   Missing
...he/pinot/segment/spi/index/ForwardIndexConfig.java   53.84%   10 Missing and 2 partials ⚠️
...or/impl/fwd/SingleValueVarByteRawIndexCreator.java   80.00%   0 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #12945      +/-   ##
============================================
+ Coverage     61.75%   62.16%   +0.41%     
+ Complexity      207      198       -9     
============================================
  Files          2436     2504      +68     
  Lines        133233   136710    +3477     
  Branches      20636    21187     +551     
============================================
+ Hits          82274    84991    +2717     
- Misses        44911    45416     +505     
- Partials       6048     6303     +255     
Flag Coverage Δ
custom-integration1 <0.01% <0.00%> (-0.01%) ⬇️
integration <0.01% <0.00%> (-0.01%) ⬇️
integration1 <0.01% <0.00%> (-0.01%) ⬇️
integration2 0.00% <0.00%> (ø)
java-11 62.14% <74.00%> (+0.43%) ⬆️
java-21 34.99% <28.00%> (-26.63%) ⬇️
skip-bytebuffers-false 62.14% <74.00%> (+0.40%) ⬆️
skip-bytebuffers-true 34.99% <28.00%> (+7.26%) ⬆️
temurin 62.16% <74.00%> (+0.41%) ⬆️
unittests 62.16% <74.00%> (+0.41%) ⬆️
unittests1 46.69% <28.00%> (-0.20%) ⬇️
unittests2 27.96% <46.00%> (+0.22%) ⬆️

Flags with carried forward coverage won't be shown.


@@ -38,6 +38,7 @@
*/
public class SingleValueVarByteRawIndexCreator implements ForwardIndexCreator {
private static final int DEFAULT_NUM_DOCS_PER_CHUNK = 1000;
private static final int TARGET_MIN_CHUNK_SIZE = 4 * 1024;
itschrispeck (Collaborator, Author) commented Apr 17, 2024:

This lower bound is debatable. 4KB is what we tested with and errs on the side of minimal memory usage, but since it's an uncompressed target size, the compressed chunk could fall below the disk read-ahead value on many systems.

Jackie-Jiang (Contributor) commented:

I don't think we need a lower bound for the chunk size. We can probably simply do min(maxLength * DEFAULT_NUM_DOCS_PER_CHUNK, TARGET_MAX_CHUNK_SIZE).
I feel it can also be useful to allow users to specify the chunk size.

itschrispeck (Collaborator, Author) replied:

> I don't think we need a lower bound for the chunk size. We can probably simply do min(maxLength * DEFAULT_NUM_DOCS_PER_CHUNK, TARGET_MAX_CHUNK_SIZE).

I had the same thought, but we ran into two issues that blocked segment build: maxLength can be 0, and int overflow for large maxLength. Setting a minimum size seemed like a good way to catch both cases.
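
For reference, a minimal sketch (hypothetical values, plain Java) of the two failure modes with min() alone, and how the 4K floor clamps them:

int docs = 1000;           // targetDocsPerChunk
int maxChunk = 1 << 20;    // targetMaxChunkSize (1M)

// All-null column: maxLength == 0 -> zero-sized chunk blocks segment build
int zeroCase = Math.min(0 * docs, maxChunk);              // 0

// Very large maxLength: int multiplication wraps negative
int overflowCase = Math.min(3_000_000 * docs, maxChunk);  // -1_294_967_296

// The 4K minimum clamps both degenerate results
// (the merged code additionally guards the overflow with long arithmetic,
// as shown in a later excerpt)
int fixedZero = Math.max(zeroCase, 4 * 1024);             // 4_096
int fixedOverflow = Math.max(overflowCase, 4 * 1024);     // 4_096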

> I feel it can also be useful to allow users to specify the chunk size

Makes a lot of sense. I added a config targetMaxChunkSize which sets the upper bound. The chunk size can still be dynamically reduced if maxLength is small, since I couldn't think of a strong case for a user increasing chunk size when values are always short. I will document the behavior in the docs. Reducing the max chunk size is very useful both for avoiding on-the-fly allocations/huge chunks with V4 and for reducing direct buffer usage.

This config can also apply to V2/V3 format with deriveNumDocsPerChunk.
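
For illustration, a sketch of how these settings might sit in a table config (the fieldConfigList wrapper and column name are assumptions for this example; the forward block matches the PR description):

"fieldConfigList": [{
  "name": "myVarLengthCol",
  "encodingType": "RAW",
  "indexes": {
    "forward": {
      "rawIndexWriterVersion": 4,
      "targetMaxChunkSize": "1M",
      "targetDocsPerChunk": 1000
    }
  }
}]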

Jackie-Jiang (Contributor) commented:

DEFAULT_NUM_DOCS_PER_CHUNK is the thing we want to avoid in V4, so using it to calculate the target chunk size seems weird to me. How about we add 2 configs here:

  • maxTargetChunkSize: upper bound of the target chunk size
  • targetDocsPerChunk: reduce the target chunk size when max length is small. We can make it 1000 by default to have the desired behavior. I can imagine people want to disable this and always go with the maxTargetChunkSize for scan-intensive cases to reduce decompression

richardstartin (Member) commented:

Nice improvement!

itschrispeck (Collaborator, Author) replied:

> How about we add 2 configs here

Done. Updated the description with the new configs. Lmk if there's any concern with the naming; I changed it slightly for consistency:

"forward": {
  "targetMaxChunkSize": "1M",
  "targetDocsPerChunk": 1000
}

Posting the below just for reference:

> I can imagine people want to disable this and always go with the maxTargetChunkSize for scan-intensive cases to reduce decompression

I understand why this should be the case, but could not reproduce it in our prod env. I ended up being able to reproduce it in a microbenchmark, and it turned out the scans I initially tested were against very low cardinality data. For higher cardinality data the microbenchmark showed up to 40% faster performance with a 1M vs 4K chunk size.

Jackie-Jiang added the documentation, release-notes, and Configuration labels on Apr 29, 2024
Jackie-Jiang (Contributor) left a comment:

LGTM. Can you help also update the Pinot documentation for these 2 configs? Some examples will help users understand how to use them.

}
_targetMaxChunkSize =
    targetMaxChunkSize == null ? DataSizeUtils.fromBytes(DEFAULT_TARGET_MAX_CHUNK_SIZE) : targetMaxChunkSize;
_targetDocsPerChunk =
Review comment (Contributor):

Suggest allowing a negative value for this config to turn it off and only honor the max chunk size.

*/
public static int getDynamicTargetChunkSize(int maxLength, int targetDocsPerChunk, int targetMaxChunkSizeBytes) {
  if (targetDocsPerChunk < 0 || (long) maxLength * targetDocsPerChunk > Integer.MAX_VALUE) {
    return targetMaxChunkSizeBytes;
Review comment (Contributor):

Maybe also put a lower bound on this?

"targetMaxChunkSize should only be used when deriveNumDocsPerChunk is true or rawIndexWriterVersion is 4");
}
_targetMaxChunkSize =
targetMaxChunkSize == null ? DataSizeUtils.fromBytes(DEFAULT_TARGET_MAX_CHUNK_SIZE) : targetMaxChunkSize;
Review comment (Contributor):

(minor) Make a constant for DataSizeUtils.fromBytes(DEFAULT_TARGET_MAX_CHUNK_SIZE) (maybe having both DEFAULT_TARGET_MAX_CHUNK_SIZE and DEFAULT_TARGET_MAX_CHUNK_SIZE_BYTES)

public static final ForwardIndexConfig DEFAULT = new Builder().build();

@Nullable
private final CompressionCodec _compressionCodec;
private final boolean _deriveNumDocsPerChunk;
private final int _rawIndexWriterVersion;
private final String _targetMaxChunkSize;
Review comment (Contributor):

Consider parsing _targetMaxChunkSizeBytes upfront to avoid an illegal size

Jackie-Jiang merged commit 31ae6a3 into apache:master on May 3, 2024
20 checks passed

// For columns with very small max value, target chunk size should also be capped to reduce memory during read
int dynamicTargetChunkSize =
    ForwardIndexUtils.getDynamicTargetChunkSize(maxLength, targetDocsPerChunk, targetMaxChunkSizeBytes);
klsince (Contributor) commented May 6, 2024:

Should this method take numDocsPerChunk instead of targetDocsPerChunk here?

Or we can check deriveNumDocsPerChunk: if it's true, we also derive dynamicTargetChunkSize; otherwise, use targetMaxChunkSizeBytes instead?

Reply (Contributor):

I think it is correct. If not configured, targetDocsPerChunk should be 1000 by default.
Made a small cleanup PR #13093 to clarify the logic a little bit.

itschrispeck (Collaborator, Author) replied:

That is clearer 🙂

Labels: Configuration, documentation, ingestion, performance, release-notes