
VersionedIntervalTimeline.isOvershadowed is very slow when processing many segments #11700

Closed
viongpanzi opened this issue Sep 13, 2021 · 8 comments

Comments

@viongpanzi
Contributor

Affected Version

0.18.1

Description

There are millions of segments in our Druid cluster, and the Coordinator is very slow when processing its duties. After generating a flame chart, we found that VersionedIntervalTimeline.isOvershadowed has poor performance.

[Flame chart screenshot showing time spent in VersionedIntervalTimeline.isOvershadowed]

But when we read the code following the call stack above, we were confused about why DataSegment.includeRootPartitions costs so much time (a large part of the top of the flame chart is blank).

  // DataSegment.java: returns true when this segment's root partition range
  // fully covers the other segment's root partition range.
  private boolean includeRootPartitions(DataSegment other)
  {
    return shardSpec.getStartRootPartitionId() <= other.shardSpec.getStartRootPartitionId()
           && shardSpec.getEndRootPartitionId() >= other.shardSpec.getEndRootPartitionId();
  }
@abhishekagarwal87
Contributor

Do you have a lot of segments in a single interval? The MarkAsUnusedOvershadowedSegments duty doesn't look very efficient when there are many segments in a single interval: it iterates over each segment, and in each iteration it may compare the segment in the outer loop against every segment in the interval. So it's not that includeRootPartitions itself is costly; it is just being called too many times.
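The pairwise comparison described above can be sketched as follows. This is a hypothetical illustration of the call count, not Druid's actual code; the class and method names are assumptions:

```java
// Hypothetical sketch: with n segments in one interval, a pairwise
// overshadow check calls the per-pair comparison roughly n * n times,
// so the cost grows quadratically with the segment count.
public class OvershadowCost
{
  public static long comparisonCount(int segmentsInInterval)
  {
    long calls = 0;
    for (int outer = 0; outer < segmentsInInterval; outer++) {
      for (int inner = 0; inner < segmentsInInterval; inner++) {
        calls++; // each pair triggers an includeRootPartitions-style check
      }
    }
    return calls;
  }
}
```

With 20,000 segments in a single interval, that is on the order of 4 × 10^8 comparisons per duty run, which explains why a cheap method dominates the flame chart.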

@viongpanzi
Contributor Author

@abhishekagarwal87 Yes. There are about 20,000 segments in a single interval.

@viongpanzi
Contributor Author

@abhishekagarwal87 Does it help to replace the for loop with Iterators.all?

@abhishekagarwal87
Contributor

How will that help? As long as the number of computations/ops is the same, the time taken won't change much.

@viongpanzi
Contributor Author

> How will that help? As long as the number of computations/ops is the same, the time taken won't change much.

@abhishekagarwal87 You're right! Thanks for your quick reply!

@jihoonson
Contributor

The current design assumes that there won't be too many segments in each time chunk. What counts as "too many" depends on your machine spec, but a rule of thumb is less than 2k. The workaround depends on why you have so many segments.

If you have a lot of small segments, you can use auto compaction to merge them (https://support.imply.io/hc/en-us/articles/360055221054-How-to-set-the-auto-compaction-config-in-the-Druid-console). If you genuinely do have 20k large segments, you can repartition your datasource with a smaller segment granularity. For example, if your datasource is currently daily-partitioned, you can repartition it to be hourly-partitioned. Auto compaction can help with switching segment granularity as well, but that support was added recently, so please check the Druid docs to see whether auto compaction supports segment granularity in your Druid version.
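The repartitioning arithmetic is simple: moving from daily to hourly granularity spreads the same segments over 24 time chunks. A minimal sketch, assuming the segments are spread roughly evenly across the hours (the class name is made up for illustration):

```java
// Hypothetical arithmetic: switching a daily-partitioned datasource to
// hourly spreads the same segments over 24 time chunks, bringing the
// per-chunk count back under the ~2k rule of thumb mentioned above.
public class RepartitionMath
{
  public static int segmentsPerChunkAfterHourly(int dailySegments)
  {
    // ceil so a partial hour still counts as one chunk's worth
    return (int) Math.ceil(dailySegments / 24.0);
  }
}
```

For the 20,000-segment interval reported in this issue, that works out to roughly 834 segments per hourly chunk, comfortably below the 2k rule of thumb.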

@viongpanzi
Contributor Author

@jihoonson Thank you for your explanation and advice! Our data volume is too large; we need to keep it daily-partitioned in order to save disk space. For now we have downgraded the Coordinator to 0.13 and the problem is gone (we are not using compaction yet).

@kfaraz
Contributor

kfaraz commented Nov 10, 2022

The main issue of the high runtime of MarkAsUnusedOvershadowedSegments has been fixed in #13287. As pointed out earlier by @abhishekagarwal87, the isOvershadowed method itself is not too costly, but because it was invoked so many times, it amounted to a very long runtime.

The fix looks only at the segments that the timeline has already identified as overshadowed, rather than iterating over all the used segments.
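The shape of the fix can be sketched as follows. This is a minimal illustration of the idea, not Druid's actual code; the class, method names, and the use of plain strings for segment IDs are all assumptions:

```java
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the fix described above. Before: one overshadow
// check per used segment, so work scales with the total used-segment count.
// After: iterate only the set of overshadowed segments the timeline already
// maintains, which is usually far smaller.
public class MarkUnusedSketch
{
  // Work before the fix: one isOvershadowed-style check per used segment.
  static int workBefore(List<String> usedSegments)
  {
    return usedSegments.size();
  }

  // Work after the fix: bounded by the overshadowed set from the timeline.
  static int workAfter(Set<String> overshadowedFromTimeline)
  {
    return overshadowedFromTimeline.size();
  }
}
```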

@kfaraz kfaraz closed this as completed Nov 10, 2022