[SPARK-2677][SPARK-2717]BasicBlockFetchIterator#next can wait forever #1619
QA tests have started for PR 1619. This patch merges cleanly.
QA results for PR 1619:
Can you create a test for this? I'm not sure what happens here if the timeout is encountered.
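A minimal sketch of what a bounded wait could look like, assuming the iterator's results live in a java.util.concurrent.LinkedBlockingQueue as described in this thread. `LinkedBlockingQueue#poll(timeout, unit)` returns null once the timeout elapses, whereas `take()` blocks indefinitely. The `FetchResult` class and the `TimedFetch.nextWithTimeout` helper are hypothetical illustrations, not Spark's actual code.

```scala
import java.util.concurrent.{LinkedBlockingQueue, TimeUnit, TimeoutException}

// Hypothetical stand-in for a fetched block; not Spark's real FetchResult.
final case class FetchResult(blockId: String, size: Long)

object TimedFetch {
  // Waits at most `timeoutMs` for the next result instead of blocking forever
  // the way LinkedBlockingQueue#take would.
  def nextWithTimeout(results: LinkedBlockingQueue[FetchResult], timeoutMs: Long): FetchResult = {
    val r = results.poll(timeoutMs, TimeUnit.MILLISECONDS) // null if the timeout elapses
    if (r == null) throw new TimeoutException(s"No fetch result within $timeoutMs ms")
    r
  }
}
```

With this shape, a test can enqueue nothing and assert that the call fails with a TimeoutException rather than hanging.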
@witgo @pwendell I had already noticed that there is no timeout configuration for ConnectionManager, but a ConnectionManager timeout would not resolve this issue: the channel used to receive acks is implemented with non-blocking I/O, and SO_TIMEOUT only affects reads after a connection has been established. So, if a remote executor hangs, it cannot establish connections with the fetching executors. Additionally, BasicBlockFetcherIterator waits on LinkedBlockingQueue#take (result.take), so we should put a FetchResult object whose size is -1 into the result queue of BasicBlockFetcherIterator. I think remote errors can be classified into the following 2 cases.
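The failure-signalling idea in the comment above can be sketched as follows. This assumes a failed remote fetch is reported by putting a FetchResult whose size is -1 into the same queue, so a consumer blocked in `take()` wakes up and raises an error instead of waiting forever. `FetchResult`, `BlockFetchFailed`, `FailureSignalIterator`, `onSuccess` and `onFailure` are hypothetical names for illustration, not Spark's classes.

```scala
import java.util.concurrent.LinkedBlockingQueue

// Hypothetical result type: size == -1 marks a failed fetch (the sentinel proposed above).
final case class FetchResult(blockId: String, size: Long) {
  def failed: Boolean = size == -1L
}

// Hypothetical exception raised to the caller instead of hanging.
final class BlockFetchFailed(blockId: String) extends RuntimeException(s"Failed to fetch $blockId")

// Simplified iterator over `expected` results delivered asynchronously into `results`.
final class FailureSignalIterator(results: LinkedBlockingQueue[FetchResult], expected: Int)
    extends Iterator[FetchResult] {
  private var consumed = 0

  override def hasNext: Boolean = consumed < expected

  override def next(): FetchResult = {
    if (!hasNext) throw new NoSuchElementException("no more fetch results")
    val r = results.take() // still blocks, but failures now enqueue an element too
    consumed += 1
    if (r.failed) throw new BlockFetchFailed(r.blockId)
    r
  }

  // Called by the network callback when a block arrives.
  def onSuccess(blockId: String, size: Long): Unit = results.put(FetchResult(blockId, size))

  // Called by the network callback on error: enqueue the -1 sentinel so take() returns.
  def onFailure(blockId: String): Unit = results.put(FetchResult(blockId, -1L))
}
```

The key design point is that every outstanding request eventually produces exactly one queue element, success or failure, so the number of `take()` calls always matches the number of enqueued results.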
@sarutak I think adding a heartbeat detection mechanism is a good solution.
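A rough sketch of the heartbeat idea, assuming peers send periodic heartbeats and any peer silent for longer than a timeout is treated as dead, so that pending fetches from it could be failed instead of waited on. `HeartbeatMonitor` and its methods are hypothetical. The clock is injectable only to make the behaviour testable.

```scala
import scala.collection.concurrent.TrieMap

// Hypothetical liveness tracker: a peer whose last heartbeat is older than
// `timeoutMs` is reported as dead.
final class HeartbeatMonitor(timeoutMs: Long, now: () => Long = () => System.currentTimeMillis()) {
  // Thread-safe map from peer id to the time of its most recent heartbeat.
  private val lastSeen = TrieMap.empty[String, Long]

  // Record a heartbeat from `peerId` at the current time.
  def heartbeat(peerId: String): Unit = lastSeen.update(peerId, now())

  // Peers whose most recent heartbeat is older than the timeout.
  def deadPeers(): Set[String] = {
    val t = now()
    lastSeen.collect { case (peer, seen) if t - seen > timeoutMs => peer }.toSet
  }
}
```

A fetcher could periodically check `deadPeers()` and enqueue a failure result for every outstanding request to those peers.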
@sarutak See ConnectionManager.scala#L259, which deals with the case where a connection cannot be established.
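As an illustration of bounding connection establishment on a non-blocking channel (where, as noted earlier in the thread, SO_TIMEOUT does not help), one option is to wait for OP_CONNECT on a java.nio.channels.Selector with a timeout. `BoundedConnect` is a hypothetical helper written for this sketch, not ConnectionManager's actual code.

```scala
import java.io.IOException
import java.net.InetSocketAddress
import java.nio.channels.{SelectionKey, Selector, SocketChannel}

object BoundedConnect {
  // Attempts a non-blocking connect and gives up after `timeoutMs`.
  // Returns the connected channel; throws IOException on refusal or timeout.
  def connect(addr: InetSocketAddress, timeoutMs: Long): SocketChannel = {
    require(timeoutMs > 0, "select(0) would block indefinitely")
    val ch = SocketChannel.open()
    val selector = Selector.open()
    try {
      ch.configureBlocking(false)
      if (!ch.connect(addr)) { // false means the connect is still in progress
        ch.register(selector, SelectionKey.OP_CONNECT)
        if (selector.select(timeoutMs) == 0)
          throw new IOException(s"connect to $addr timed out after $timeoutMs ms")
        // Throws (e.g. ConnectException) if the peer refused the connection.
        if (!ch.finishConnect()) throw new IOException(s"connect to $addr did not complete")
      }
      ch
    } catch {
      case e: Exception => ch.close(); throw e // don't leak the channel on failure
    } finally selector.close() // also cancels the registration key
  }
}
```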