[SPARK-38124][SS][FOLLOWUP] Document the current challenge on fixing distribution of stateful operator #35512
+1 just some minor grammar thing
@@ -101,6 +101,14 @@ case class ClusteredDistribution(
* Since this distribution relies on [[HashPartitioning]] on the physical partitioning of the
* stateful operator, only [[HashPartitioning]] (and HashPartitioning in
* [[PartitioningCollection]]) can satisfy this distribution.
*
* NOTE: This is applied only stream-stream join as of now. For other stateful operators, we have
"applied only to stream-stream join"?
* partitionings can satisfy the requirement.) We need to construct the way to fix this with
* minimizing possibility to break the existing checkpoints.
*
* TODO: SPARK-38204 to address above note.
nit nit: I saw we usually use the pattern TODO(SPARK-38204)
We seem to use both, but I see more usages on () so I'll follow it. Thanks!
*
* NOTE: This is applied only stream-stream join as of now. For other stateful operators, we have
* been using ClusteredDistribution, which could construct the physical partitioning of the state
* in different way. (ClusteredDistribution requires relaxed condition and multiple
maybe "in different way (ClusteredDistribution requires) ...": no dot after "way".
Thanks @HeartSaVioR!
Thanks! Merging to master.
What changes were proposed in this pull request?
This PR proposes to document the current challenge in fixing the distribution of stateful operators, even though the distribution is somewhat "broken" now.
This PR addresses the review comment #35419 (comment)
Why are the changes needed?
In SPARK-38124 we identified a long-standing problem in stateful operators, but it is not easy to fix, since a fix that is not carefully designed may break existing queries. Anyone touching the required distribution should also be very careful. We want to document this explicitly so that others working in this part of the codebase take the same care.
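To make the difference between the two requirements concrete, here is a minimal toy sketch. It does not use Spark's classes: `HashPart`, `RangePart`, `satisfiesClustered` and `satisfiesStrict` are hypothetical names, and the exact matching rules are simplified assumptions rather than Spark's implementation. It only illustrates why a relaxed requirement can be met by several different physical partitionings of the same state, while a strict hash-based requirement pins down exactly one.

```scala
// Toy model only: these types mimic the idea and are NOT Spark's actual classes.
sealed trait Partitioning
final case class HashPart(keys: Seq[String], numPartitions: Int) extends Partitioning
final case class RangePart(keys: Seq[String], numPartitions: Int) extends Partitioning

object DistributionSketch {
  // Relaxed requirement (assumed rule): any partitioning whose keys are a
  // non-empty subset of the clustering keys co-locates rows that share
  // the full clustering key, so it is accepted.
  def satisfiesClustered(p: Partitioning, clusterKeys: Seq[String]): Boolean = p match {
    case HashPart(keys, _)  => keys.nonEmpty && keys.forall(clusterKeys.contains)
    case RangePart(keys, _) => keys.nonEmpty && keys.forall(clusterKeys.contains)
  }

  // Strict requirement (assumed rule): only hash partitioning on exactly the
  // clustering keys, in order, with the expected partition count is accepted.
  def satisfiesStrict(p: Partitioning, clusterKeys: Seq[String], numPartitions: Int): Boolean =
    p match {
      case HashPart(keys, n) => keys == clusterKeys && n == numPartitions
      case _                 => false
    }
}
```

Under these assumptions, `HashPart(Seq("a"), 200)` and `HashPart(Seq("a", "b"), 200)` both pass the relaxed check for clustering keys `a, b`, yet they would place the same state row in different partitions. That is the kind of mismatch that makes changing the required distribution risky for existing checkpoints.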
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Code comment only changes.