Improve Sorting / Merge performance #2427
Comments
In case anyone is interested, here are some flame graphs I gathered for the merge benchmarks. They were all gathered from:
alamb@MacBook-Pro-6 arrow-datafusion % git merge-base alamb/alamb/merge_bench apache/master
7b7edf9c43383c1d3310286b69d2d037db72c967
Conclusion, unshockingly, is that |
Here is a trace from IOx showing a lot of time being spent sorting batches...
|
I'm currently prototyping some stuff, hope to push a draft in the coming days for feedback. |
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
I plan to make sorting / merging faster. My reasons:
Describe the solution you'd like
Basically, the plan is to follow the advice given by Goetz Graefe in "Implementing Sorting in Database Systems", which has been successfully implemented in systems like DuckDB (see blog post).
It will likely involve some combination of a specialization of the row format and JIT comparisons
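To make the row-format idea concrete, here is a minimal sketch of one common approach: encoding each sort key as a byte string whose plain byte-wise order matches the logical order, so a multi-column comparison becomes a single slice comparison. This is only an illustration under stated assumptions, not DataFusion's actual row format. The function names (`encode_i32`, `encode_str`, `encode_row`, `sort_rows`) and the fixed `(i32, String)` schema are hypothetical, and it assumes ascending order with no nulls.

```rust
/// Hypothetical helper: encode an i32 so that unsigned byte-wise comparison
/// matches signed integer order (flip the sign bit, then write big-endian).
fn encode_i32(v: i32, out: &mut Vec<u8>) {
    out.extend_from_slice(&((v as u32) ^ 0x8000_0000).to_be_bytes());
}

/// Hypothetical helper: encode a string so that byte-wise comparison matches
/// lexicographic byte order. Interior 0x00 bytes are escaped as 0x00 0xFF and
/// the value is terminated with 0x00 0x00, so a prefix sorts before any
/// longer string that extends it.
fn encode_str(s: &str, out: &mut Vec<u8>) {
    for &b in s.as_bytes() {
        out.push(b);
        if b == 0 {
            out.push(0xFF);
        }
    }
    out.extend_from_slice(&[0, 0]);
}

/// Build a single comparable key for a row with the assumed schema (i32, String).
fn encode_row(id: i32, name: &str) -> Vec<u8> {
    let mut key = Vec::with_capacity(4 + name.len() + 2);
    encode_i32(id, &mut key);
    encode_str(name, &mut key);
    key
}

/// Sort rows by (id, name) using the encoded keys; each key is computed once
/// and rows are then ordered by plain byte comparison of those keys.
fn sort_rows(rows: &mut [(i32, String)]) {
    rows.sort_by_cached_key(|(id, name)| encode_row(*id, name));
}
```

The point of the design is that the comparison loop no longer needs to know column types: type-specific work happens once at encoding time, and every subsequent comparison during sorting or merging is a byte-slice compare. Nulls, descending order, and other types would each need their own encoding rules, which this sketch leaves out.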
Here is my rough plan and a sketch of the kinds of things I want to work on
Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.
Additional context
Add any other context or screenshots about the feature request here.