Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[GLUTEN-2051] Make ColumnarBatchSerializer supports relocation so that continuous shuffle block fetching can be enabled #2052

Merged
merged 2 commits into from
Jul 21, 2023

Conversation

WangGuangxin
Copy link
Contributor

What changes were proposed in this pull request?

Spark supports fetch the contiguous shuffle blocks in batch, which is enabled by default (by conf spark.sql.adaptive.fetchShuffleBlocksInBatch). This feature has a big performance improvement in our production.

However, currently, since ColumnarBatchSerializer's supportsRelocationOfSerializedObjects return false, so that this feature cann't take effect.
In fact, the arrow serialization does support reloation if we don't write schema (which is default to true) and don't write EOS (which is an optional in arrow rpc serialization format)

https://wesm.github.io/arrow-site-test/format/IPC.html#streaming-format
image

(Fixes: #2051)

How was this patch tested?

Manually test using our internal query

@github-actions
Copy link

Run Gluten Clickhouse CI

@github-actions
Copy link

#2051

@github-actions
Copy link

Run Gluten Clickhouse CI

@WangGuangxin
Copy link
Contributor Author

@zhztheplayer @zhouyuan Can you please help review this?

@zhouyuan
Copy link
Contributor

@WangGuangxin good finding! Curious how the performance gain look like? The marker is 4 bytes only if I i understand this correctly

-yuan

@WangGuangxin
Copy link
Contributor Author

WangGuangxin commented Jun 25, 2023

spark.sql.adaptive.fetchShuffleBlocksInBatch

@zhouyuan Hi, the benefits are not comes from the 4bytes savings, but make the Spark AQE feature "batch fetch shuffle blocks" take effect.
You can refer the feature by the initial commit apache/spark#26040

In short, when Spark AQE do coalesce partitions, it can fetch continuous blocks in batch, instead of fetch blocks one by one. But it has some preconditions, one is the serialization of shuffle must support the so called relocation https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/shuffle/BlockStoreShuffleReader.scala#L48C48-L48C85.

Currently, the ColumnarBatchSerializer doesn't support relocation. In fact, We can make some simple modifications to make it support relocation by not writing EOS after each shuffle blocks.

@zhouyuan
Copy link
Contributor

spark.sql.adaptive.fetchShuffleBlocksInBatch

@zhouyuan Hi, the benefits are not comes from the 4bytes savings, but make the Spark AQE feature "batch fetch shuffle blocks" take effect. You can refer the feature by the initial commit apache/spark#26040

In short, when Spark AQE do coalesce partitions, it can fetch continuous blocks in batch, instead of fetch blocks one by one. But it has some preconditions, one is the serialization of shuffle must support the so called relocation https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/shuffle/BlockStoreShuffleReader.scala#L48C48-L48C85.

Currently, the ColumnarBatchSerializer doesn't support relocation. In fact, We can make some simple modifications to make it support relocation by not writing EOS after each shuffle blocks.

@WangGuangxin thanks for detailed explanation, got the idea now

zhouyuan
zhouyuan previously approved these changes Jun 25, 2023
Copy link
Contributor

@zhouyuan zhouyuan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@CLTFOREVER
Copy link
Contributor

@WangGuangxin @zhouyuan we have meet the same problem,if the pr is ready,can we merge it?

@github-actions
Copy link

Run Gluten Clickhouse CI

2 similar comments
@github-actions
Copy link

Run Gluten Clickhouse CI

@github-actions
Copy link

Run Gluten Clickhouse CI

@github-actions
Copy link

Run Gluten Clickhouse CI

@zhouyuan zhouyuan force-pushed the support_relocation branch from fd0736b to 4ebd167 Compare July 21, 2023 01:54
@github-actions
Copy link

Run Gluten Clickhouse CI

@github-actions
Copy link

Run Gluten Clickhouse CI

Copy link
Contributor

@zhouyuan zhouyuan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 works for me

@PHILO-HE PHILO-HE merged commit 55103a3 into apache:main Jul 21, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[VL] Make ColumnarBatchSerializer supports relocation
4 participants