-
Notifications
You must be signed in to change notification settings - Fork 464
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[GLUTEN-4170][VL] Decouple partitions from plan to avoid driver stalled #4177
Conversation
Thanks for opening a pull request! Could you open an issue for this pull request on Github Issues? https://github.com/oap-project/gluten/issues Then could you also rename commit message and pull request title in the following format?
See also: |
Run Gluten Clickhouse CI |
Run Gluten Clickhouse CI |
@FelixYBW @rui-mo @PHILO-HE @ulysses-you |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thank you @Yohahaha for the idea, looks reasonable to me. One concern is about the name, can we avoid using streamXxx
? I think iteratorXxx
is easy to follow. We are using iterator in both java and native side.
gluten-core/src/main/resources/substrait/proto/substrait/algebra.proto
Outdated
Show resolved
Hide resolved
gluten-core/src/main/java/io/glutenproject/substrait/rel/LocalFilesNode.java
Outdated
Show resolved
Hide resolved
777373d
to
996bccc
Compare
Run Gluten Clickhouse CI |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the improvement.
/// File schema | ||
NamedStruct schema = 17; | ||
/// File schema | ||
NamedStruct schema = 17; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems these changes are not needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's needed, there should be 6 leading space.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it about the format? How do you meet these errors?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just found this format issue and fix it.
backends-velox/src/main/scala/io/glutenproject/backendsapi/velox/IteratorApiImpl.scala
Show resolved
Hide resolved
gluten-data/src/main/java/io/glutenproject/vectorized/NativePlanEvaluator.java
Show resolved
Hide resolved
backends-velox/src/main/scala/io/glutenproject/backendsapi/velox/IteratorApiImpl.scala
Outdated
Show resolved
Hide resolved
backends-velox/src/main/scala/io/glutenproject/backendsapi/velox/IteratorApiImpl.scala
Outdated
Show resolved
Hide resolved
case (splitInfos, index) => | ||
wsCtx.substraitContext.initSplitInfosIndex(0) | ||
wsCtx.substraitContext.setSplitInfos(splitInfos) | ||
val substraitPlan = wsCtx.root.toProtobuf |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm assuming the proposed optimization can also be applied for CH backend. If so, it will need some follow-up work from CH engineer. @baibaichen
I assume this solution looks well for all your guys. |
@lgbo-ustc hi, this patch tries to refactor on gen file partitions, please take a look if it will impact CK backend thanks, -yuan |
996bccc
to
422a220
Compare
Run Gluten Clickhouse CI |
3 similar comments
Run Gluten Clickhouse CI |
Run Gluten Clickhouse CI |
Run Gluten Clickhouse CI |
I see QueryBenchmark is similar with GenericBenchmark and QueryBenchmark does not covered by CI, could we remove it? @marin-ma @jinchengchenghh |
cc: @rui-mo |
@Yohahaha GenericBenchmark uses arrow to read files, while QueryBenchmark uses Velox. So QueryBenchmark is useful when we want to test Velox TableScan. I think the better option is to enable QueryBenchmark on CI. @marin-ma Please help to confirm, thanks. |
@rui-mo If input is from middle stage, GenericBenchmark will use arrow reader to load the input iterator. If input is from first stage, the whole pipeline is offloaded including table scan. Here's the doc https://github.com/oap-project/gluten/blob/main/docs/developers/MicroBenchmarks.md#generate-substrait-plan-and-input-for-any-query |
375baed
to
c9fd781
Compare
3d1e7d0
to
95c301b
Compare
Run Gluten Clickhouse CI |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this!
Run Gluten Clickhouse CI |
Run Gluten Clickhouse CI |
Run Gluten Clickhouse CI |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, thank you @Yohahaha
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for this work! Could you please also update the micro benchmark documentation? Noticed that we need to specify split files for first stages.
Run Gluten Clickhouse CI |
LGTM. Thanks! @zzcclp Do you have any further comments? |
We can create followups if there are some new finding issues |
===== Performance report for TPCH SF2000 with Velox backend, for reference only ====
|
What changes were proposed in this pull request?
There are two parts will lead driver stalled when scan contains lots of partitions,
This patch decouple scan splitInfo(LocalFileNodes) from ReadRel to avoid serialize substrait plan for each partition in Driver, when the plan is complex or the number of partitions is particularly large, the cost of this serialization cannot be ignored.
Stream splitInfo(inputIterator) still kept in ReadRel for now.
(Fixes: #4170)
How was this patch tested?