-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support maxRowsPerSegment for auto compaction #6780
Conversation
@@ -116,23 +116,31 @@ public void testIndexTaskTuningConfigTargetPartitionSizeOrNumShards() throws Exc | |||
IndexTask.IndexTuningConfig.class | |||
); | |||
|
|||
Assert.assertEquals(10, (int) tuningConfig.getTargetPartitionSize()); | |||
Assert.assertEquals(10, (int) tuningConfig.getMaxRowsPerSegment()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: please add a backwards compatibility test if not already present.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. I added a test when both maxRowsPerSegment and targetPartitionSize don't exist. We already have a test for when only targetPartitionSize is provided here.
skipOffsetFromLatest, | ||
tuningConfig, | ||
taskContext | ||
); | ||
} | ||
|
||
public static class UserCompactTuningConfig extends ClientCompactQueryTuningConfig |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we have two separate configs ? the only difference seems to be maxRowsPerSegment in these two.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question. So, I added maxRowsPerSegment as a constructor parameter because it's similar to targetPartitionSize
and they should be at the same level. But, this can make users confused because ClientCompactionQueryTuningConfig
also has maxRowsPerSegment
. So I added UserCompactTuningConfig
which doesn't have maxRowsPerSegment
.
@nishantmonu51 do you have more comments? |
this.maxRowsInMemory = maxRowsInMemory == null ? DEFAULT_MAX_ROWS_IN_MEMORY : maxRowsInMemory; | ||
this.maxTotalRows = maxTotalRows == null ? DEFAULT_MAX_TOTAL_ROWS : maxTotalRows; | ||
this.indexSpec = indexSpec == null ? DEFAULT_INDEX_SPEC : indexSpec; | ||
this.maxPendingPersists = maxPendingPersists == null ? DEFAULT_MAX_PENDING_PERSISTS : maxPendingPersists; | ||
this.publishTimeout = publishTimeout == null ? DEFAULT_PUBLISH_TIMEOUT : publishTimeout; | ||
this.pushTimeout = pushTimeout == null ? DEFAULT_PUBLISH_TIMEOUT : pushTimeout; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe change DEFAULT_PUBLISH_TIMEOUT
to DEFAULT_PUSH_TIMEOUT
?
docs/content/ingestion/compaction.md
Outdated
@@ -47,7 +47,8 @@ Compaction tasks merge all segments of the given interval. The syntax is: | |||
|`id`|Task id|No| | |||
|`dataSource`|DataSource name to be compacted|Yes| | |||
|`interval`|Interval of segments to be compacted|Yes| | |||
|`dimensions`|Custom dimensionsSpec. compaction task will use this dimensionsSpec if exist instead of generating one. See below for more details.|No| | |||
|`dimensionsSpec`|Custom dimensionsSpec. Compaction task will use this dimensionsSpec if exist instead of generating one. See below for more details.|No| | |||
|`metricsSpec`|Custom metricsSpec. Compaction task will use this metricsSpec if specified rather than generating one.|No| | |||
|`segmentGranularity`|If this is set, compactionTask will change the segment granularity for the given interval. See [segmentGranularity of Uniform Granularity Spec](./ingestion-spec.html#uniform-granularity-spec) for more details. See the below table for the behavior.|No| | |||
|`keepSegmentGranularity`|Deprecated. Please use `segmentGranularity` instead. See the below table for its behavior.|No| | |||
|`targetCompactionSizeBytes`|Target segment size after comapction. Cannot be used with `targetPartitionSize`, `maxTotalRows`, and `numShards` in tuningConfig.|No| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still is targetPartitionSize
? And after look into the code, it seems there is no maxRowsPerSegment
for compaction tuningConfig.
docs/content/ingestion/compaction.md
Outdated
@@ -77,7 +78,7 @@ An example of compaction task is | |||
|
|||
This compaction task reads _all segments_ of the interval `2017-01-01/2018-01-01` and results in new segments. | |||
Since both `segmentGranularity` and `keepSegmentGranularity` are null, the original segment granularity will be remained and not changed after compaction. | |||
To control the number of result segments per time chunk, you can set `targetPartitionSize` or `numShards`. See [indexTuningConfig](../ingestion/native_tasks.html#tuningconfig) for more details. | |||
To control the number of result segments per time chunk, you can set `maxRowsPerSegment` or `numShards`. See [indexTuningConfig](../ingestion/native_tasks.html#tuningconfig) for more details. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same, there is no maxRowsPerSegment
and numShards
for compaction tuningConfig. The maxRowsPerSegment
is at the same level with targetCompactionSizeBytes
, not in tuningConfig.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
@QiuMM thanks for the review! |
This PR mainly contains three changes:
targetPartitionSize
ofindexTuningConfig
tomaxRowsPerSegment
.maxRowsPerSegment
was added in Add support for 'maxTotalRows' to incremental publishing kafka indexing task and appenderator based realtime task #6129 and this is to use the same name for the same configuration.maxRowsPerSegment
.