
[SPARK-3193] Output error info when Process exit code is not zero in test suite #2108

Closed
wants to merge 2 commits

Conversation

scwf
Contributor

@scwf scwf commented Aug 24, 2014

https://issues.apache.org/jira/browse/SPARK-3193
I noticed that PR tests sometimes fail because the Process exit code != 0; for example:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18688/consoleFull
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19118/consoleFull

[info] SparkSubmitSuite:
[info] - prints usage on empty input
[info] - prints usage with only --help
[info] - prints error with unrecognized options
[info] - handle binary specified but not class
[info] - handles arguments with --key=val
[info] - handles arguments to user program
[info] - handles arguments to user program with name collision
[info] - handles YARN cluster mode
[info] - handles YARN client mode
[info] - handles standalone cluster mode
[info] - handles standalone client mode
[info] - handles mesos client mode
[info] - handles confs with flag equivalents
[info] - launch simple application with spark-submit *** FAILED ***
[info] org.apache.spark.SparkException: Process List(./bin/spark-submit, --class, org.apache.spark.deploy.SimpleApplicationTest, --name, testApp, --master, local, file:/tmp/1408854098404-0/testJar-1408854098404.jar) exited with code 1
[info] at org.apache.spark.util.Utils$.executeAndGetOutput(Utils.scala:872)
[info] at org.apache.spark.deploy.SparkSubmitSuite.runSparkSubmit(SparkSubmitSuite.scala:311)
[info] at org.apache.spark.deploy.SparkSubmitSuite$$anonfun$14.apply$mcV$sp(SparkSubmitSuite.scala:291)
[info] at org.apache.spark.deploy.SparkSubmitSuite$$anonfun$14.apply(SparkSubmitSuite.scala:284)
[info] at org.apacSpark assembly has been built with Hive, including Datanucleus jars on classpath

This PR outputs the process's error info when it fails, which can be helpful for diagnosis.
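The idea can be sketched outside Spark with `scala.sys.process` (a minimal illustration of the technique, not the actual `Utils.executeAndGetOutput` change; `runAndReport` is a hypothetical name):

```scala
import scala.sys.process.{Process, ProcessLogger}

// Minimal sketch: run a command, capture stdout and stderr separately,
// and surface the captured stderr when the exit code is nonzero, so a
// failing forked process leaves a diagnosable trace in the test log.
def runAndReport(cmd: Seq[String]): String = {
  val out = new StringBuilder
  val err = new StringBuilder
  val logger = ProcessLogger(
    line => out.append(line).append('\n'),  // child's stdout
    line => err.append(line).append('\n'))  // child's stderr
  val exitCode = Process(cmd).!(logger)
  if (exitCode != 0) {
    // Include the error output in the exception instead of swallowing it.
    sys.error(s"Process $cmd exited with code $exitCode:\n$err")
  }
  out.toString
}
```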

@AmplabJenkins

Can one of the admins verify this patch?

@ScrapCodes
Member

Hey! Thanks for raising this concern.

The convention in Spark is to look in [sub-project]/target/unit-tests.log, and this applies to all test suites. So when you see a particular test fail on Jenkins, you can rerun that test locally and then check that sub-project's unit-tests.log.

I hope this helps. You can close this PR if you are convinced.

P.S.: Maybe we can expand our wiki page with this information.

@scwf
Contributor Author

scwf commented Aug 25, 2014

Hi @ScrapCodes, I think unit-tests.log is very big, and it's hard to find the lines that belong to a given test suite. The key point is that Jenkins fails intermittently due to "Process exitcode != 0", so rerunning may not reproduce the error. This PR outputs the error info only on failure.

See our discussion on the dev list:
http://apache-spark-developers-list.1001551.n3.nabble.com/mvn-test-error-td7858.html

@ScrapCodes
Member

Ahh, I am still not sure about the changes. Maybe the PR has more changes than just the fix you described?

@scwf
Contributor Author

scwf commented Aug 25, 2014

@ScrapCodes, yeah, here I also fixed the log4j config of the forked process, because the old config was not valid: with it, the forked process's InputStream produced no output.

@andrewor14
Contributor

test this please

@SparkQA

SparkQA commented Aug 27, 2014

QA tests have started for PR 2108 at commit 563fde1.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Aug 27, 2014

Tests timed out after a configured wait of 120m.

@andrewor14
Contributor

retest this please

@SparkQA

SparkQA commented Aug 28, 2014

QA tests have started for PR 2108 at commit 563fde1.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Aug 28, 2014

QA tests have finished for PR 2108 at commit 563fde1.

  • This patch passes unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@scwf
Contributor Author

scwf commented Aug 29, 2014

@andrewor14, tests passed~

@andrewor14
Contributor

@scwf Looks like the code already redirects all stderr of the subprocess to the console. Also, if the process does fail, what we want is not the stdout but stderr, but your code logs the stdout to error. Is there a reason why you're configuring log4j at all? Not sure if I understand that part of your changes.

@scwf
Contributor Author

scwf commented Aug 30, 2014

@andrewor14, stderr may not be enough when the process exit code != 0 (see https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19505/consoleFull). The reason I configure log4j is that

Logger.getRootLogger().setLevel(Level.WARN)

in DriverSuite has no effect unless we have configured log4j properties.
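For context on why a properties file matters here: `Logger.getRootLogger().setLevel(...)` only adjusts an already-initialized log4j, so the forked JVM also needs log4j 1.x configured to write somewhere the parent process can capture. An illustrative `log4j.properties` (not the exact file from this PR) could look like:

```properties
# Illustrative log4j 1.x config for the forked test JVM: send everything at
# WARN and above to stderr so the parent process can capture it.
log4j.rootLogger=WARN, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.Target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
```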

@SparkQA

SparkQA commented Sep 5, 2014

Can one of the admins verify this patch?

@andrewor14
Contributor

I see. I would like to see the SparkSubmitSuite and DriverSuite tests fail with this PR so we know what the messages look like before we merge it. Unfortunately these are flaky themselves so we may have to test this a few times...

ok to test

@JoshRosen
Contributor

These tests are reliably failing in the Jenkins Maven build for Spark Master with YARN.

Maybe they'll fail here, too.

Jenkins, retest this please.

@JoshRosen
Contributor

I SSH'ed into one of the Jenkins boxes and ran the Maven build with this patch, which resulted in a very interesting error message when SparkSubmitSuite failed:

14/09/08 11:41:40 WARN Utils: Service 'SparkUI' could not bind on port 4043. Attempting port 4044.
14/09/08 11:41:40 WARN AbstractLifeCycle: FAILED SelectChannelConnector@0.0.0.0:4044: java.net.BindException: Address already in use
[... x 15 ..]
java.net.BindException: Address already in use: Service 'SparkUI' failed after 16 retries!
    at sun.nio.ch.Net.bind0(Native Method)
    at sun.nio.ch.Net.bind(Net.java:444)
    at sun.nio.ch.Net.bind(Net.java:436)
    at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214)
    at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
    at org.eclipse.jetty.server.nio.SelectChannelConnector.open(SelectChannelConnector.java:187)
    at org.eclipse.jetty.server.AbstractConnector.doStart(AbstractConnector.java:316)
    at org.eclipse.jetty.server.nio.SelectChannelConnector.doStart(SelectChannelConnector.java:265)
    at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:64)
    at org.eclipse.jetty.server.Server.doStart(Server.java:293)
    at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:64)
    at org.apache.spark.ui.JettyUtils$.org$apache$spark$ui$JettyUtils$$connect$1(JettyUtils.scala:192)
    at org.apache.spark.ui.JettyUtils$$anonfun$3.apply(JettyUtils.scala:202)
    at org.apache.spark.ui.JettyUtils$$anonfun$3.apply(JettyUtils.scala:202)
    at org.apache.spark.util.Utils$$anonfun$startServiceOnPort$1.apply$mcVI$sp(Utils.scala:1414)
    at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
    at org.apache.spark.util.Utils$.startServiceOnPort(Utils.scala:1410)
    at org.apache.spark.ui.JettyUtils$.startJettyServer(JettyUtils.scala:202)
    at org.apache.spark.ui.WebUI.bind(WebUI.scala:102)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:225)
    at org.apache.spark.deploy.JarCreationTest$.main(SparkSubmitSuite.scala:322)
    at org.apache.spark.deploy.JarCreationTest.main(SparkSubmitSuite.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:328)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
- spark submit includes jars passed in through --jar *** FAILED ***
  org.apache.spark.SparkException: Process List(./bin/spark-submit, --class, org.apache.spark.deploy.JarCreationTest, --name, testApp, --master, local-cluster[2,1,512], --jars, file:/tmp/1410205297744-1/testJar-1410205297797.jar,file:/tmp/1410205297797-0/testJar-1410205297849.jar, file:/tmp/1410205297744-0/testJar-1410205297744.jar) exited with code 1
  at org.apache.spark.util.Utils$.executeAndGetOutput(Utils.scala:840)
  at org.apache.spark.deploy.SparkSubmitSuite.runSparkSubmit(SparkSubmitSuite.scala:311)
  at org.apache.spark.deploy.SparkSubmitSuite$$anonfun$15.apply$mcV$sp(SparkSubmitSuite.scala:305)
  at org.apache.spark.deploy.SparkSubmitSuite$$anonfun$15.apply(SparkSubmitSuite.scala:294)
  at org.apache.spark.deploy.SparkSubmitSuite$$anonfun$15.apply(SparkSubmitSuite.scala:294)
  at org.scalatest.Transformer$$anonfun$apply$1.apply(Transformer.scala:22)
  at org.scalatest.Transformer$$anonfun$apply$1.apply(Transformer.scala:22)
  at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
  at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
  at org.scalatest.Transformer.apply(Transformer.scala:22)
  ...

It looks like the SparkSubmitSuite test is failing due to running out of Spark web UI ports. This explains why I couldn't reproduce this failure when running that test in isolation.
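The "failed after 16 retries!" behavior in `Utils.startServiceOnPort` amounts to a linear port scan; a hedged standalone sketch of that retry loop (using plain `ServerSocket` rather than Jetty, and a hypothetical `startOnPort` name):

```scala
import java.net.ServerSocket
import scala.util.Try

// Sketch of the retry strategy: try startPort and, on a bind failure, step
// to the next port, giving up after maxRetries attempts. If every port in
// the window is occupied, the caller sees the equivalent of the
// "failed after 16 retries!" error above.
def startOnPort(startPort: Int, maxRetries: Int = 16): Option[ServerSocket] =
  (0 to maxRetries).iterator
    .map(offset => Try(new ServerSocket(startPort + offset)).toOption)
    .collectFirst { case Some(socket) => socket }
```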

@JoshRosen
Contributor

Oh, and the test failure message helpfully included the actual spark-submit command:

- spark submit includes jars passed in through --jar *** FAILED ***
  org.apache.spark.SparkException: Process List(./bin/spark-submit, --class, org.apache.spark.deploy.JarCreationTest, --name, testApp, --master, local-cluster[2,1,512], --jars, file:/tmp/1410205297744-1/testJar-1410205297797.jar,file:/tmp/1410205297797-0/testJar-1410205297849.jar, file:/tmp/1410205297744-0/testJar-1410205297744.jar) exited with code 1
  ...

@JoshRosen
Contributor

I'm going to merge this patch, since this additional logging will be helpful in diagnosing the problem behind the failing DriverSuite and SparkSubmitSuite tests.

@JoshRosen
Contributor

Actually, let me make sure that it passes Jenkins first...

Jenkins, this is ok to test.

@srowen
Member

srowen commented Sep 8, 2014

I also was unable to reproduce it in Maven when only SparkSubmitSuite was run. This does seem consistent with the failure being caused by port conflict but I still can't guess why it doesn't happen in SBT.

I found that setting <forkMode>never</forkMode> in the scalatest plugin causes this stack trace to show up immediately in the test output too, which is at least handy for reproducing it.

Another interesting data point is that if I hack the code temporarily to choose a random UI port, it's all fine.

Choosing ports randomly might be a good idea in the longer-term anyway to facilitate parallel tests, and I believe is mentioned in another PR.

I suppose it's a potential solution here too, but would be good to understand why it doesn't happen with SBT. Neither is parallel. Maybe some difference in the test lifecycle allows the previous web servers to shut down more reliably before the next ones start? I'm only guessing at this stage.

@andrewor14
Contributor

@srowen Actually we already use random ports in SBT tests (by setting spark.ui.port to 0). This was introduced in #1777. This may be why the tests are failing in MVN much more frequently than in SBT. This also explains why we can't reproduce the test failure when we only run one or two tests, because we haven't had a chance to accumulate open ports.

@JoshRosen Even before this patch it printed the command it ran, but it seems that with this patch we also get the actual stack trace with the BindException, which is very nice.
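To make the `spark.ui.port=0` trick concrete: binding to port 0 asks the OS for any currently free ephemeral port, which is why randomized suites rarely collide. A tiny standalone illustration:

```scala
import java.net.ServerSocket

// Binding to port 0 delegates port selection to the OS, which hands back
// a currently-free ephemeral port; getLocalPort reveals the choice.
val socket = new ServerSocket(0)
val chosenPort = socket.getLocalPort
socket.close()
```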

@JoshRosen
Contributor

Jenkins, retest this please.

@SparkQA

SparkQA commented Sep 9, 2014

QA tests have started for PR 2108 at commit 563fde1.

  • This patch merges cleanly.


@SparkQA

SparkQA commented Sep 9, 2014

QA tests have finished for PR 2108 at commit 563fde1.

  • This patch passes unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -18,9 +18,9 @@
package org.apache.spark

import java.io.File
import java.util.Properties
Contributor

import not used I think


- import org.apache.log4j.Logger
- import org.apache.log4j.Level
+ import org.apache.log4j.{PropertyConfigurator, Logger, Level}
Contributor

same here?

Contributor Author

yes, thanks for this

@srowen
Member

srowen commented Sep 9, 2014

@andrewor14 @JoshRosen Bingo! Adding <spark.ui.port>0</spark.ui.port> to the Maven build makes SparkSubmitSuite pass for me where it failed before. This was already set in the SBT build. I'll submit a PR against https://issues.apache.org/jira/browse/SPARK-3404

@scwf
Contributor Author

scwf commented Sep 9, 2014

@srowen, that is cool, but it cannot explain the SBT test failures. We can use this PR to run the tests a few more times to diagnose the SBT problem.

@srowen
Member

srowen commented Sep 9, 2014

@scwf The SBT build has already set -Dspark.ui.port=0, so I suspect this is in fact the difference:
https://github.com/apache/spark/blob/master/project/SparkBuild.scala#L340

@scwf
Contributor Author

scwf commented Sep 9, 2014

@srowen, yeah, we should set spark.ui.port in Maven too, but the 'exited with code 1' error has also happened when testing with SBT, so there may be some other cause behind this.

@srowen
Member

srowen commented Sep 9, 2014

@scwf Yes there is still some underlying issue here, where tests hold open ports somehow for a long time. Randomizing the starting port usually avoids most collision, but not all the time. It's a step forward but it is not the ultimate solution.

@JoshRosen
Contributor

I wonder if this is a resource contention issue from having many parallel copies of the tests running on the same Jenkins worker. For example, we might be exhausting ephemeral ports.

I think that #2259 will help with this by dramatically reducing the number of ephemeral ports used by PySpark jobs.

@scwf
Contributor Author

scwf commented Sep 9, 2014

@JoshRosen, exhausting ephemeral ports seems unlikely; if that is the cause, we should reduce the number of ephemeral ports we use. Or we could verify that a port is free before using it, retrying several times to find one?

@andrewor14
Contributor

@scwf Actually we already try several times before getting a free one. My interpretation of this is we simply ran out of ports, such that no matter how many times we retry we can't get a free one. Using a random port for the UI certainly reduces probability of collision, but yes it is definitely not the final solution as it simply puts maven tests on par with SBT tests, the latter of which still fail sometimes but not always.

@JoshRosen #2259 may help for PySpark tests, but SparkSubmitSuite and DriverSuite are run before those, so regardless we need to find another way to reduce the number of ports used.

One solution @JoshRosen and I discussed is to simply not start the SparkUI during tests (maybe except for the UISuite). This alone will surely free up a significant number of ports we currently occupy.
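The "don't start the SparkUI during tests" idea could look roughly like this (a hypothetical sketch under assumed names — the `spark.ui.enabled` flag and `maybeStartUi` helper are illustrative, not Spark's actual API at the time):

```scala
import java.net.ServerSocket

// Gate UI startup on a config flag so suites that never look at the web UI
// consume no listening ports at all. Returns the bound socket when enabled.
def maybeStartUi(conf: Map[String, String]): Option[ServerSocket] = {
  val enabled = conf.getOrElse("spark.ui.enabled", "true").toBoolean
  if (enabled) Some(new ServerSocket(0)) else None
}
```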

@andrewor14
Contributor

For now, I will merge this (master + 1.1) because this helps us debug these test failures. Thanks.

@asfgit asfgit closed this in 2686233 Sep 9, 2014
asfgit pushed a commit that referenced this pull request Sep 9, 2014
…est suite


Author: scwf <[email protected]>

Closes #2108 from scwf/output-test-error-info and squashes the following commits:

0c48082 [scwf] minor fix according to comments
563fde1 [scwf] output errer info when Process exitcode not zero

(cherry picked from commit 2686233)
Signed-off-by: Andrew Or <[email protected]>

7 participants