[SPARK-25342][CORE][SQL] Support rolling back a result stage and rerunning all result tasks when writing files #37359
What changes were proposed in this pull request?
From PR #22112 we learn that Spark currently cannot roll back and rerun a result stage; the job just fails.
This PR is designed to solve some scenarios of that problem. When the result computed by a job's result stage is output to a storage system, it can be written either to a file system or to a database system.
This PR adds isResultStageRetryAllowed to the RDD class to indicate whether the corresponding result stage supports retries. It is a Boolean variable whose default value is false, meaning that result stage rollback is not supported; this corresponds to the scenario of writing to a database.
When writing to a file system, the result stage supports retries, and isResultStageRetryAllowed is set to true.
Does this PR introduce any user-facing change?
No
How was this patch tested?
New tests, plus manual testing of the following write paths:
write to hive

write to iceberg

write to hdfs

write to mysql
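
The scenarios above presumably cover both values of the flag: file-based writes, where rollback is allowed, and the MySQL write, where it is not. The following is a minimal, self-contained sketch of that decision, not the code in this PR. `SketchRdd`, `ResultStageSketch`, `onRollbackNeeded`, and the `Decision` values are hypothetical names invented for illustration; only `isResultStageRetryAllowed` and its default of false come from the description.

```scala
// Hypothetical, simplified model. None of these types are Spark's real classes.

// Stand-in for an RDD carrying the flag described in this PR.
// The flag defaults to false, matching the description (database writes).
final case class SketchRdd(name: String, isResultStageRetryAllowed: Boolean = false)

// What the scheduler does when a result stage would have to be rolled back.
sealed trait Decision
case object RollbackAndRerunAllTasks extends Decision
case object AbortJob extends Decision

object ResultStageSketch {
  // If the final RDD allows it, rerun every result task; otherwise fail the job,
  // which is the only behaviour available before this PR.
  def onRollbackNeeded(finalRdd: SketchRdd): Decision =
    if (finalRdd.isResultStageRetryAllowed) RollbackAndRerunAllTasks
    else AbortJob

  def main(args: Array[String]): Unit = {
    // A file-system write path opts in explicitly.
    val fileWrite = SketchRdd("write to hdfs", isResultStageRetryAllowed = true)
    // A database write path keeps the default of false.
    val dbWrite = SketchRdd("write to mysql")
    println(s"${fileWrite.name}: ${onRollbackNeeded(fileWrite)}")
    println(s"${dbWrite.name}: ${onRollbackNeeded(dbWrite)}")
  }
}
```

Keeping the default at false makes the safe behaviour (failing the job) the one a writer gets unless it opts in, which matters for sinks such as databases where already-committed rows cannot simply be overwritten by a rerun.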
