[STREAMING] SPARK-2009 Key not found exception when slow receiver starts #961
Conversation
… that getReceivedBlockInfo will be called before compute has been called
@@ -74,7 +74,7 @@ abstract class ReceiverInputDStream[T: ClassTag](@transient ssc_ : StreamingCont
   /** Get information on received blocks. */
   private[streaming] def getReceivedBlockInfo(time: Time) = {
-    receivedBlockInfo(time)
+    receivedBlockInfo.get(time).getOrElse(Array.empty[ReceivedBlockInfo])
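For context, the behavioural difference between the two lookups can be reproduced with a plain mutable map. This is only a sketch: Long keys and String block identifiers stand in for the real Time and ReceivedBlockInfo types.

```scala
import scala.collection.mutable

object LookupDemo {
  def main(args: Array[String]): Unit = {
    // Stand-in for receivedBlockInfo: batch time -> info about blocks received so far.
    val receivedBlockInfo = new mutable.HashMap[Long, Array[String]]
    val batchTime = 1401756085000L

    // Old lookup: apply() throws NoSuchElementException("key not found: ...")
    // because nothing has been registered for this batch time yet.
    // receivedBlockInfo(batchTime)

    // New lookup: falls back to an empty array, so an early batch simply sees no blocks.
    val info = receivedBlockInfo.get(batchTime).getOrElse(Array.empty[String])
    println(s"blocks for $batchTime: ${info.length}")
  }
}
```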
Mind using the two-space indentation like before?
Fixed, thanks!
Perhaps my initial interpretation of what led to this exception was wrong. Next, after graph.generateJobs, JobGenerator calls getReceivedBlockInfo() on every receiver input stream (graph.getReceiverInputStreams). It does not check whether the time is valid or not. And by the way, here is the full stack trace of the exception:
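Roughly, the path described above looks like the sketch below. This is a simplified illustration, not the actual Spark source: the ReceiverStream class and the driver loop are stand-ins for ReceiverInputDStream and the JobGenerator, and only the lookup behaviour is meant to match.

```scala
import scala.collection.mutable

// Simplified stand-in for ReceiverInputDStream; not the actual Spark class.
class ReceiverStream(val id: Int) {
  // Populated by compute() for each batch time; empty until the receiver produces blocks.
  val receivedBlockInfo = new mutable.HashMap[Long, Array[String]]

  // Pre-patch lookup: throws NoSuchElementException if compute() never ran for `time`.
  def getReceivedBlockInfo(time: Long): Array[String] = receivedBlockInfo(time)
}

object JobGeneratorSketch {
  def main(args: Array[String]): Unit = {
    val streams = Seq(new ReceiverStream(0), new ReceiverStream(1))
    val batchTime = 1401756085000L

    // After generating jobs for a batch, every receiver stream is asked for its
    // block info at that batch time, with no check that the time has been computed.
    streams.foreach { s =>
      try println(s"stream ${s.id}: ${s.getReceivedBlockInfo(batchTime).length} blocks")
      catch { case e: NoSuchElementException => println(s"stream ${s.id}: ${e.getMessage}") }
    }
  }
}
```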
Jenkins, test this please.
Thanks for the patch. I will take a look shortly. In the meantime, can you merge with master? The Jenkins build is failing because your branch is a little outdated.
Hi Tathagata, I've tried to merge branch-1.0 with master, but there is a conflict in … BTW, I thought that branch-1.0 is a maintenance branch for version 1, so I … Thanks.
Hey Vadim - thanks for this fix! I'll defer to @tdas on the optimal architecture of the fix, but this one seems good to me. We usually ask that people submit fixes against master and then, when merging the fix, we'll backport it into one or more maintenance branches. The easiest way to do this might be for you to just open a new pull request against master (it's only a one-line change).
Jenkins, retest this please.
Okay, I think we can merge this purely on the grounds of more defensive coding (it avoids a potential exception).
I got "java.util.NoSuchElementException: key not found: 1401756085000 ms" exception when using kafka stream and 1 sec batchPeriod. Investigation showed that the reason is that ReceiverLauncher.startReceivers is asynchronous (started in a thread). https://github.com/vchekan/spark/blob/master/streaming/src/main/scala/org/apache/spark/streaming/scheduler/ReceiverTracker.scala#L206 In case of slow starting receiver, such as Kafka, it easily takes more than 2sec to start. In result, no single "compute" will be called on ReceiverInputDStream before first batch job is executed and receivedBlockInfo remains empty (obviously). Batch job will cause ReceiverInputDStream.getReceivedBlockInfo call and "key not found" exception. The patch makes getReceivedBlockInfo more robust by tolerating missing values. Author: Vadim Chekan <[email protected]> Closes #961 from vchekan/branch-1.0 and squashes the following commits: e86f82b [Vadim Chekan] Fixed indentation 4609563 [Vadim Chekan] Key not found exception: if receiver is slow to start, it is possible that getReceivedBlockInfo will be called before compute has been called
I got "java.util.NoSuchElementException: key not found: 1401756085000 ms" exception when using kafka stream and 1 sec batchPeriod. Investigation showed that the reason is that ReceiverLauncher.startReceivers is asynchronous (started in a thread). https://github.com/vchekan/spark/blob/master/streaming/src/main/scala/org/apache/spark/streaming/scheduler/ReceiverTracker.scala#L206 In case of slow starting receiver, such as Kafka, it easily takes more than 2sec to start. In result, no single "compute" will be called on ReceiverInputDStream before first batch job is executed and receivedBlockInfo remains empty (obviously). Batch job will cause ReceiverInputDStream.getReceivedBlockInfo call and "key not found" exception. The patch makes getReceivedBlockInfo more robust by tolerating missing values. Author: Vadim Chekan <[email protected]> Closes apache#961 from vchekan/branch-1.0 and squashes the following commits: e86f82b [Vadim Chekan] Fixed indentation 4609563 [Vadim Chekan] Key not found exception: if receiver is slow to start, it is possible that getReceivedBlockInfo will be called before compute has been called (cherry picked from commit 26f6b98) Signed-off-by: Patrick Wendell <[email protected]>
I got "java.util.NoSuchElementException: key not found: 1401756085000 ms" exception when using kafka stream and 1 sec batchPeriod. Investigation showed that the reason is that ReceiverLauncher.startReceivers is asynchronous (started in a thread). https://github.com/vchekan/spark/blob/master/streaming/src/main/scala/org/apache/spark/streaming/scheduler/ReceiverTracker.scala#L206 In case of slow starting receiver, such as Kafka, it easily takes more than 2sec to start. In result, no single "compute" will be called on ReceiverInputDStream before first batch job is executed and receivedBlockInfo remains empty (obviously). Batch job will cause ReceiverInputDStream.getReceivedBlockInfo call and "key not found" exception. The patch makes getReceivedBlockInfo more robust by tolerating missing values. Author: Vadim Chekan <[email protected]> Closes apache#961 from vchekan/branch-1.0 and squashes the following commits: e86f82b [Vadim Chekan] Fixed indentation 4609563 [Vadim Chekan] Key not found exception: if receiver is slow to start, it is possible that getReceivedBlockInfo will be called before compute has been called (cherry picked from commit 26f6b98) Signed-off-by: Patrick Wendell <[email protected]>
I use Spark 1.5.2 on YARN and am also seeing the same problem:
@ouyangshourui this is not at all the same issue. Please file a JIRA with your stack trace and mark the affected version as 1.5.2 |
I use version 1.6.2 and it is still not fixed. I found warnings like this: … and my code looks like this: … where the stream interval is 60s and stat_interval is 300s.
I got "java.util.NoSuchElementException: key not found: 1401756085000 ms" exception when using kafka stream and 1 sec batchPeriod.
Investigation showed that the reason is that ReceiverLauncher.startReceivers is asynchronous (started in a thread).
https://github.com/vchekan/spark/blob/master/streaming/src/main/scala/org/apache/spark/streaming/scheduler/ReceiverTracker.scala#L206
In case of slow starting receiver, such as Kafka, it easily takes more than 2sec to start. In result, no single "compute" will be called on ReceiverInputDStream before first batch job is executed and receivedBlockInfo remains empty (obviously). Batch job will cause ReceiverInputDStream.getReceivedBlockInfo call and "key not found" exception.
The patch makes getReceivedBlockInfo more robust by tolerating missing values.