-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
optimize index merge #1938
optimize index merge #1938
Conversation
I use druid-0.8.1 to test the performance: 2015-11-06T06:18:04,977 INFO [main] io.druid.segment.IndexMerger - outDir[/home/tianzhao.blj/log/merge/v8-tmp] completed index.drd in 39 millis. |
With the patch binlijin@35bb704 , the result is : 2015-11-06T07:22:20,902 INFO [main] io.druid.segment.IndexMerger - outDir[/home/tianzhao.blj/log/merge/v8-tmp] completed index.drd in 40 millis. |
With the patch binlijin@f2b736c , the result is : |
With the patch binlijin@2e31b45 , the result is : |
@@ -216,18 +217,12 @@ public File mergeQueryableIndex( | |||
ProgressIndicator progress | |||
) throws IOException | |||
{ | |||
List<IndexableAdapter> indexAdapteres = new ArrayList<IndexableAdapter>(); | |||
for (QueryableIndex index : indexes) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this really faster than Lists.transform?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not about the speed of this loop, it's about the speed of access of items in that list. Lists.transform only creates a "view" of the original list, meaning the function gets applied every time you access an element.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@drcrallen, just like @xvrl said.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, during the merge, there is a iteration like
for (dimValue : allDimValues) {
for (int j = 0; j < indexes.size(); ++j) {
}
}
The function will get applied "cardinality" times, and there is some disk IO in the constructor of QueryableIndexIndexableAdapter.
HI @binlijin thanks for the awesome PR. A few points and questions:
Cheers and thanks again for making Druid better! |
return merge( | ||
Lists.transform( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you could avoid the for loop by simply doing a Lists.newArrayList(Iterables.transform(..))
It would also be good to add a comment here that we are materializing the list for performance reasons.
Hi @drcrallen @himanshug , I want to explain what's purpose of the BitmapIndexSeeker which the main performance boost comes from. There are 2 properties we did not use during the merging:
There is still a potential optimization though, since we use the Indexed returned by "getDimValueLookup" to get the value of the id which may causes some binary search or disk io later. But these costs are small right now. When these costs get higher, we can move the "seeker" into the bitmap index, and travel the value sequentially to get the value and id tuple in O(1). By then the complexity will be O(N * M). |
This is pretty cool and should be a great addition. Totally makes sense that not doing a binary search for every value to get the id would speed things up, didn't realize how much though, thanks for the contribution! 👍 @drcrallen You added 0.9.0, is that just an indication that we want it in the next release after 0.8.2 and that'll be 0.9.0? Or are you intending that this isn't in 0.8.3 if we choose to do 0.8.3? This is a fully backwards compatible change (the interface that was adjusted is a purely internal interface) so it should be able to go in whatever release happens after 0.8.2. |
I'm ok with having this make into 0.8.3 |
@cheddar I meant next release, which at the time was going to be 0.9.0 |
Overall this is looking very good. I have to admit that following the change is a bit mind-bending. Would you mind squashing your commits? |
Wait, I'm crazy, we have a CLA for @KurtYoung but not for @binlijin , @binlijin would you mind filling out a CLA? http://druid.io/community/cla.html |
@Override | ||
public int get(int index) | ||
{ | ||
throw new UnsupportedOperationException("This is really slow, so it's just not supported."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
technically this is only slow for concise bitmaps, but is fast with roaring bitmaps, so we should update this comment.
Also, it looks like this whole anonymous class is copy-paste from IncrementaIndexAdapter. Is it possible to create a simple class to wrap a bitmapIndex and turn it into an IndexInts and use it both places?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@drcrallen, done, i do fill out a CLA.
@xvrl, i do refactor the code and reuse it.
@binlijin Had a few minor suggestions about code-reuse and adding comments, I'm 👍 otherwise once commits are squasheed |
I merge all commit into one and do not know how to reopen with this pull request , so open with the new PR #1960 |
@binlijin thanks, GitHub is weird like that and won't let you re-open a PR that was force pushed. |
@binlijin the trick is to reopen first, then force push. |
@drcrallen @gianm , Thank your very much. |
optimize index merge performance.