
[SPARK-24552][core][sql] Use unique id instead of attempt number for writes [branch-2.3]. #21615

Closed
vanzin wants to merge 2 commits into apache:branch-2.3 from vanzin:SPARK-24552-2.3

Conversation

vanzin (Contributor) commented Jun 22, 2018

This passes a unique attempt id, instead of the attempt number, to v2 data sources and Hadoop APIs, because attempt numbers are reused when stages are retried. When attempt numbers are reused, sources that track data by partition id and attempt number may incorrectly clean up data, because the same attempt number can be both committed and aborted.
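
The failure mode described above can be demonstrated with a small, self-contained sketch (not Spark's actual writer code; `WriterId` and `TrackingSink` are hypothetical names, and the unique id stands in for something like `TaskContext.taskAttemptId`): a sink that keys cleanup on (partition id, attempt number) can abort the very output it already committed once a stage retry reuses attempt number 0, while a globally unique id keeps the two task attempts distinct.

```scala
import scala.collection.mutable

// Identity a sink uses to track a write task: partition id plus either the
// per-stage attempt number or a globally unique id.
case class WriterId(partitionId: Int, id: Long)

// Toy sink that records which writer identities were committed and aborted.
class TrackingSink {
  private val committed = mutable.Set.empty[WriterId]
  private val aborted   = mutable.Set.empty[WriterId]

  def commit(writer: WriterId): Unit = committed += writer
  def abort(writer: WriterId): Unit  = aborted += writer

  // An identity that was both committed and aborted is the danger case:
  // the abort's cleanup can delete output the committed task already produced.
  def conflicts: Set[WriterId] = committed.intersect(aborted).toSet
}

object AttemptIdDemo extends App {
  // Keyed by attempt number: stage attempt 0 commits partition 3 / attempt 0,
  // the stage is retried, and the retried task (also attempt number 0) aborts.
  val byAttemptNumber = new TrackingSink
  byAttemptNumber.commit(WriterId(partitionId = 3, id = 0L))
  byAttemptNumber.abort(WriterId(partitionId = 3, id = 0L))
  println(s"attempt-number conflicts: ${byAttemptNumber.conflicts}") // non-empty

  // Keyed by a globally unique id: the retried task gets a different id,
  // so its abort can never collide with the earlier commit.
  val byUniqueId = new TrackingSink
  byUniqueId.commit(WriterId(partitionId = 3, id = 17L))
  byUniqueId.abort(WriterId(partitionId = 3, id = 42L))
  println(s"unique-id conflicts: ${byUniqueId.conflicts}") // empty
}
```

This is exactly what the description argues: passing a unique id, rather than the stage-local attempt number, to v2 sources and Hadoop APIs ensures a commit and an abort can never refer to the same identity across stage retries.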

tgravescs (Contributor) commented

+1 pending tests. @rdblue

rdblue (Contributor) commented Jun 22, 2018

+1

SparkQA commented Jun 22, 2018

Test build #92226 has finished for PR 21615 at commit a80b57b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Jun 23, 2018

Test build #92234 has finished for PR 21615 at commit f9b134e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

asfgit pushed a commit that referenced this pull request Jun 25, 2018
… number for writes.

Author: Marcelo Vanzin <[email protected]>

Closes #21615 from vanzin/SPARK-24552-2.3.
vanzin closed this Jun 25, 2018
vanzin (Contributor, Author) commented Jun 25, 2018

Merged to 2.3.

jzhuge pushed a commit to jzhuge/spark that referenced this pull request Aug 20, 2018
… number for writes.

Author: Marcelo Vanzin <[email protected]>

Closes apache#21615 from vanzin/SPARK-24552-2.3.
vanzin deleted the SPARK-24552-2.3 branch on August 24, 2018 at 19:56