[SPARK-5363] fix bug: remove() inside iterator is not safe #4776

Closed
wants to merge 1 commit into from


davies
Contributor

@davies davies commented Feb 26, 2015

It is not safe to remove items from a collection while iterating over it.

@davies
Contributor Author

davies commented Feb 26, 2015

cc @JoshRosen

@SparkQA

SparkQA commented Feb 26, 2015

Test build #27969 has started for PR 4776 at commit a4384a5.

  • This patch merges cleanly.

@JoshRosen
Contributor

LGTM pending Jenkins.

@JoshRosen
Contributor

Context for other reviewers: removing elements from a mutable HashSet while iterating over it can cause the iteration to skip entries that weren't removed. In this case, PythonRDD would write fewer than cnt broadcasts, and the Python worker would hang while waiting to read cnt broadcasts in total.
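For readers who want to see the safe pattern in isolation, here is a minimal Scala sketch. It is not the code from this patch: the object `BroadcastCleanup`, the helper `removeStale`, and the parameter names are hypothetical and exist only for illustration. The idea is to compute the ids to drop from an immutable snapshot first, and to mutate the set only while iterating over that snapshot.

```scala
import scala.collection.mutable

object BroadcastCleanup {
  // Hypothetical helper, not the code from this patch.
  // Drops every id in `held` that is absent from `wanted`
  // and returns the ids that were dropped.
  def removeStale(held: mutable.Set[Long], wanted: Set[Long]): Set[Long] = {
    // Take an immutable snapshot and compute the ids to drop up front.
    val stale: Set[Long] = held.toSet.diff(wanted)
    // Iterate over the snapshot, never over `held`, while mutating `held`.
    for (id <- stale) {
      held.remove(id)
    }
    stale
  }
}
```

Because the set of stale ids is fixed before any removal happens, a count announced from `stale.size` always matches the number of removals actually performed, which is the property the reader on the other end of the stream depends on.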

@SparkQA

SparkQA commented Feb 26, 2015

Test build #27969 has finished for PR 4776 at commit a4384a5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27969/

@JoshRosen
Contributor

I'm going to merge this into master (1.4.0), branch-1.3 (1.3.0), and branch-1.2 (1.2.2). Thanks!

asfgit pushed a commit that referenced this pull request Feb 26, 2015
Removing elements from a mutable HashSet while iterating over it can cause the
iteration to incorrectly skip over entries that were not removed. If this
happened, PythonRDD would write fewer broadcast variables than the Python
worker was expecting to read, which would cause the Python worker to hang
indefinitely.

Author: Davies Liu <[email protected]>

Closes #4776 from davies/fix_hang and squashes the following commits:

a4384a5 [Davies Liu] fix bug: remove() inside iterator is not safe

(cherry picked from commit 7fa960e)
Signed-off-by: Josh Rosen <[email protected]>
@asfgit asfgit closed this in 7fa960e Feb 26, 2015