Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-23785][LAUNCHER] LauncherBackend doesn't check state of connection before setting state #20893

Closed
wants to merge 1 commit into from

Conversation

sahilTakiar
Copy link

What changes were proposed in this pull request?

Changed LauncherBackend set method so that it checks if the connection is open or not before writing to it (uses isConnected).

How was this patch tested?

None

@@ -114,10 +114,10 @@ private[spark] abstract class LauncherBackend {

override def close(): Unit = {
try {
_isConnected = false
super.close()
} finally {
onDisconnected()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I searched the code and seems this is a no-op?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_isConnected is used in def isConnected(). Moving it from the finally block to before the call to super.close() avoids a race condition where a client tries to write to the connection after it has been closed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know your meaning, but I'm not referring to isConnected, I mean onDisconnected seems not doing anything on Spark side.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not quite directly related, but this should means it is safe to move _isConnected = false to before onDisconnected().

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, your right, it doesn't look like onDisconnected is being used anywhere, but its marked as protected so I guess its meant to be used by a sub-class (although no sub-class uses it).

So agree, this change should be safe, it doesn't change the semantics of any use of onDisconnected

@jiangxb1987
Copy link
Contributor

The change looks good, cc @cloud-fan

@felixcheung
Copy link
Member

Jenkins, ok to test

@SparkQA
Copy link

SparkQA commented Mar 26, 2018

Test build #88577 has finished for PR 20893 at commit 4ca8a32.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@vanzin
Copy link
Contributor

vanzin commented Mar 26, 2018

retest this please

@vanzin
Copy link
Contributor

vanzin commented Mar 26, 2018

The change looks ok but the test being added is totally unrelated to it.

If you want to write a test, it should be not too hard to add something to SparkLauncherSuite in core/. Start an in-process app that waits on a lock, disconnect the handle, then wake up the lock so the app finishes successfully. That should fail without your fix and work with it.

@sahilTakiar
Copy link
Author

sahilTakiar commented Mar 26, 2018

Ok, I'll work on writing a test for SparkLauncherSuite.

The test added here was meant to cover the race condition mentioned here

@SparkQA
Copy link

SparkQA commented Mar 27, 2018

Test build #88604 has finished for PR 20893 at commit 4ca8a32.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@sahilTakiar
Copy link
Author

Wrote a test in SparkLauncherSuite and was able to replicate the error from HIVE-18533, and then realized the exception is only logged and then swallowed. From SparkContext

    if (_dagScheduler != null) {
      Utils.tryLogNonFatalError {
        _dagScheduler.stop()
      }
      _dagScheduler = null
    }

Seems the failures I was seeing in HIVE-18533 are due to something else.

Regardless, this is still probably a good fix since you still don't want to write to the connection unless its open, but given that the exception is only logged and not thrown, don't see an easy way to write a test for this.

@vanzin
Copy link
Contributor

vanzin commented Mar 28, 2018

SGTM. Jenkins, retest this please.

@vanzin
Copy link
Contributor

vanzin commented Mar 28, 2018

retest this please

@SparkQA
Copy link

SparkQA commented Mar 29, 2018

Test build #88679 has finished for PR 20893 at commit 4ca8a32.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@vanzin
Copy link
Contributor

vanzin commented Mar 29, 2018

Merging to master / 2.3.

asfgit pushed a commit that referenced this pull request Mar 29, 2018
…tion before setting state

## What changes were proposed in this pull request?

Changed `LauncherBackend` `set` method so that it checks if the connection is open or not before writing to it (uses `isConnected`).

## How was this patch tested?

None

Author: Sahil Takiar <[email protected]>

Closes #20893 from sahilTakiar/master.

(cherry picked from commit 491ec11)
Signed-off-by: Marcelo Vanzin <[email protected]>
@asfgit asfgit closed this in 491ec11 Mar 29, 2018
@vanzin
Copy link
Contributor

vanzin commented Mar 30, 2018

Let me revert the change from 2.3 and open a bug for the flaky test.

@dongjoon-hyun
Copy link
Member

Thank you so much for swift action, @vanzin !

@gatorsmile
Copy link
Member

cc @vanzin @sahilTakiar @jiangxb1987

@gatorsmile
Copy link
Member

Let me revert this PR from the master. Please resubmit the PR after addressing the flaky tests. Thanks!

@vanzin
Copy link
Contributor

vanzin commented Apr 2, 2018

The test is fixed by #20950. Just push that one instead.

@gatorsmile
Copy link
Member

I see. Could you merge that PR?

@vanzin
Copy link
Contributor

vanzin commented Apr 2, 2018

Sure. Although I'm not sure why other committers who review PRs don't just merge them too...

mshtelma pushed a commit to mshtelma/spark that referenced this pull request Apr 5, 2018
…tion before setting state

## What changes were proposed in this pull request?

Changed `LauncherBackend` `set` method so that it checks if the connection is open or not before writing to it (uses `isConnected`).

## How was this patch tested?

None

Author: Sahil Takiar <[email protected]>

Closes apache#20893 from sahilTakiar/master.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants