
[SPARK-3292][streaming] Fix: Shuffle tasks run incessantly even though there are no inputs #2192

Closed
wants to merge 2 commits into from


@guowei2 (Contributor) commented Aug 29, 2014

With this PR, no job is committed when there are no inputs.

@AmplabJenkins

Can one of the admins verify this patch?

@guowei2 (Contributor, Author) commented Aug 29, 2014

I recommitted this because the previous PR was complicated by mistake.

@SparkQA commented Sep 5, 2014

Can one of the admins verify this patch?

@rxin (Contributor) commented Sep 27, 2014

@tdas can you take a look at this?

@tdas (Contributor) commented Oct 1, 2014

This is not a good idea. Not returning an RDD can break a lot of the logic and semantics. For example, if there is a transform() followed by updateStateByKey(), the result will be unpredictable. updateStateByKey expects the previous batch to have a state RDD. If it does not find one, it assumes that this is the start of the streaming computation and effectively initializes again, forgetting the previous states from 2 batches ago. So this change is incorrect.
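To make the failure mode concrete, here is a minimal Python sketch of the behavior described above. It is a hypothetical toy model, not Spark's actual StateDStream code. Each batch's state is derived from the state stored for the previous batch time, and a missing previous state is treated as the start of the stream. The names `update_state`, `run`, `running_count`, and the `skip_empty` flag (which mimics this PR by producing nothing for an empty batch) are illustrative assumptions.

```python
# Hypothetical toy model of a stateful stream (NOT Spark's implementation).
# State for batch t is computed from the state stored for batch t - 1.


def update_state(prev_state, batch, update_fn):
    """Merge one batch of (key, value) pairs into the previous state."""
    grouped = {}
    for key, value in batch:
        grouped.setdefault(key, []).append(value)
    new_state = {}
    # Every key seen before or in this batch gets an update call.
    for key in set(prev_state) | set(grouped):
        result = update_fn(grouped.get(key, []), prev_state.get(key))
        if result is not None:
            new_state[key] = result
    return new_state


def run(batches, update_fn, skip_empty=False):
    """Return {batch_time: state}. With skip_empty=True, empty batches
    produce no state at all, mimicking the change proposed in this PR."""
    states = {}
    for t, batch in enumerate(batches):
        if skip_empty and not batch:
            continue  # no "RDD" is generated for this batch time
        prev = states.get(t - 1)
        if prev is None:
            prev = {}  # no previous state found: treated as start of stream
        states[t] = update_state(prev, batch, update_fn)
    return states


def running_count(new_values, old):
    """Update function: add this batch's values to the running total."""
    return sum(new_values) + (old or 0)


batches = [[("a", 1)], [("a", 1)], [], [("a", 1)]]
print(run(batches, running_count)[3])                   # {'a': 3}
print(run(batches, running_count, skip_empty=True)[3])  # {'a': 1}
```

In the normal run the empty batch at time 2 still carries the state forward, so the count for `a` reaches 3. When the empty batch is skipped, batch 3 finds no state at time 2, reinitializes, and loses the state from batch 1, two batches earlier.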

@tdas (Contributor) commented Nov 7, 2014

@guowei2 As I explained, this is not a good idea because it breaks the semantics of a state DStream. Would you mind closing this PR?

@asfgit closed this in f73b56f on Nov 10, 2014