[SPARK-3193] Output error info when Process exit code is not zero in test suite #2108
Conversation
Can one of the admins verify this patch?
Hey! Thanks for raising this concern. The convention in Spark is that we look in the

I hope this helps. You can close this PR if you are convinced.

P.S.: Maybe we can expand our wiki page with this information.
Hi @ScrapCodes, I think unit-tests.log is very big, and it's hard to find the matching log for a given test suite. The key point is that Jenkins failed intermittently due to "Process exit code != 0", and rerunning may not reproduce the error. This PR outputs the error info only on failure. Refer to our discussion on the dev list.
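The idea described above — surfacing a forked process's stderr only when it exits nonzero — can be sketched roughly as follows. The names `RunAndReport` and `runOrFail` are illustrative, not the actual `Utils.executeAndGetOutput` implementation:

```scala
import scala.sys.process.{Process, ProcessLogger}

object RunAndReport {
  // Run a command, capturing stdout and stderr separately. On a nonzero
  // exit code, include the captured stderr in the exception so a flaky
  // CI run leaves enough information to diagnose without rerunning.
  def runOrFail(command: Seq[String]): String = {
    val out = new StringBuilder
    val err = new StringBuilder
    val logger = ProcessLogger(
      line => out.append(line).append('\n'),
      line => err.append(line).append('\n'))
    val exitCode = Process(command).!(logger)
    if (exitCode != 0) {
      throw new RuntimeException(
        s"Process $command exited with code $exitCode:\n$err")
    }
    out.toString
  }
}
```

On success nothing extra is printed; the error details appear only in the failure message, which matches the "output the error only when failed" behavior described above.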
Ahh, I am still not sure about the changes. Maybe the PR has more changes than just the fix you described?
@ScrapCodes, yeah, here I also fixed the log4j config of the forked process, because the old version was not valid: with it, the forked process's InputStream produced no output.
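For reference, configuring log4j for a forked JVM programmatically (rather than relying on a properties file the child process may not pick up) looks roughly like this. This is a hedged sketch using the standard log4j 1.x property keys, not the exact configuration in this patch:

```scala
import java.util.Properties
import org.apache.log4j.{Level, Logger, PropertyConfigurator}

object ForkedLogSetup {
  def configure(): Unit = {
    val props = new Properties()
    // Send everything to the console so the parent process can read it
    // from the child's InputStream.
    props.setProperty("log4j.rootLogger", "WARN, console")
    props.setProperty("log4j.appender.console",
      "org.apache.log4j.ConsoleAppender")
    props.setProperty("log4j.appender.console.layout",
      "org.apache.log4j.PatternLayout")
    props.setProperty("log4j.appender.console.layout.ConversionPattern",
      "%d %p %c: %m%n")
    PropertyConfigurator.configure(props)
    Logger.getRootLogger.setLevel(Level.WARN)
  }
}
```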
test this please
QA tests have started for PR 2108 at commit
Tests timed out after a configured wait of
retest this please
QA tests have started for PR 2108 at commit
QA tests have finished for PR 2108 at commit
@andrewor14, tests passed~
@scwf Looks like the code already redirects all
@andrewor14, Logger.getRootLogger().setLevel(Level.WARN) in
Can one of the admins verify this patch?
I see. I would like to see the

ok to test
These tests are reliably failing in the Jenkins Maven build for Spark master with YARN. Maybe they'll fail here, too. Jenkins, retest this please.
I SSH'ed into one of the Jenkins boxes and ran the Maven build using this, which resulted in a very interesting error message when SparkSubmitSuite failed:
It looks like the SparkSubmitSuite test is failing due to running out of Spark web UI ports. This explains why I couldn't reproduce this failure when running that test in isolation.
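The bind-and-retry pattern at issue here can be sketched as below: try a starting port and advance on a bind failure, up to a bounded number of retries. If every port in that window is taken, as suspected above, the last BindException propagates. This is illustrative, not Spark's actual startup code:

```scala
import java.net.{BindException, ServerSocket}

object PortRetry {
  // Try startPort, startPort + 1, ... until a bind succeeds or the
  // retry budget is exhausted, in which case the BindException escapes.
  def startOnPort(startPort: Int, maxRetries: Int = 16): ServerSocket = {
    def attempt(offset: Int): ServerSocket =
      try new ServerSocket(startPort + offset)
      catch {
        case _: BindException if offset < maxRetries => attempt(offset + 1)
      }
    attempt(0)
  }
}
```

When the whole window `[startPort, startPort + maxRetries]` is occupied, the caller sees a bind failure no matter how many retries are configured — consistent with the port-exhaustion theory above.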
Oh, and the test failure message helpfully included the actual spark-submit command:

- spark submit includes jars passed in through --jar *** FAILED ***
org.apache.spark.SparkException: Process List(./bin/spark-submit, --class, org.apache.spark.deploy.JarCreationTest, --name, testApp, --master, local-cluster[2,1,512], --jars, file:/tmp/1410205297744-1/testJar-1410205297797.jar,file:/tmp/1410205297797-0/testJar-1410205297849.jar, file:/tmp/1410205297744-0/testJar-1410205297744.jar) exited with code 1
at org.apache.spark.util.Utils$.executeAndGetOutput(Utils.scala:840)
at org.apache.spark.deploy.SparkSubmitSuite.runSparkSubmit(SparkSubmitSuite.scala:311)
at org.apache.spark.deploy.SparkSubmitSuite$$anonfun$15.apply$mcV$sp(SparkSubmitSuite.scala:305)
at org.apache.spark.deploy.SparkSubmitSuite$$anonfun$15.apply(SparkSubmitSuite.scala:294)
at org.apache.spark.deploy.SparkSubmitSuite$$anonfun$15.apply(SparkSubmitSuite.scala:294)
at org.scalatest.Transformer$$anonfun$apply$1.apply(Transformer.scala:22)
at org.scalatest.Transformer$$anonfun$apply$1.apply(Transformer.scala:22)
at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
at org.scalatest.Transformer.apply(Transformer.scala:22)
I'm going to merge this patch, since this additional logging will be helpful in diagnosing the problem behind the failing DriverSuite and SparkSubmitSuite tests.
Actually, let me make sure that it passes Jenkins first... Jenkins, this is ok to test.
I also was unable to reproduce it in Maven when only

I found that setting

Another interesting data point is that if I hack the code temporarily to choose a random UI port, it's all fine. Choosing ports randomly might be a good idea in the longer term anyway to facilitate parallel tests, and I believe it is mentioned in another PR. I suppose it's a potential solution here too, but it would be good to understand why it doesn't happen with SBT. Neither build runs tests in parallel. Maybe some difference in the test lifecycle allows the previous web servers to shut down more reliably before the next ones start? I'm only guessing at this stage.
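One standard way to get the random-port behavior mentioned above is to bind to port 0 and let the OS choose a free ephemeral port. A minimal sketch (illustrative, not Spark's code; note the usual caveat that the port could in principle be taken again between close and reuse):

```scala
import java.net.ServerSocket

object RandomPort {
  // Binding to port 0 asks the kernel for any currently free ephemeral port.
  def pickFreePort(): Int = {
    val socket = new ServerSocket(0)
    try socket.getLocalPort finally socket.close()
  }
}
```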
@srowen Actually we already use random ports in SBT tests (by setting

@JoshRosen Even before this patch it printed the command it ran, but it seems that with this patch we also get the actual stack trace with the
Jenkins, retest this please.
QA tests have started for PR 2108 at commit
QA tests have started for PR 2108 at commit
QA tests have finished for PR 2108 at commit
@@ -18,9 +18,9 @@
package org.apache.spark

import java.io.File
import java.util.Properties

Review comment: import not used I think

import org.apache.log4j.Logger
import org.apache.log4j.Level
import org.apache.log4j.{PropertyConfigurator, Logger, Level}

Review comment: same here?

Reply: yes, thanks for this
@andrewor14 @JoshRosen Bingo! Adding
@srowen, it is cool, but it cannot explain the SBT test failure. We can use this PR to run the tests a few times to diagnose the SBT problem.
@scwf The SBT build has already set
@srowen, yeah, we should set
@scwf Yes, there is still some underlying issue here, where tests somehow hold ports open for a long time. Randomizing the starting port usually avoids most collisions, but not all the time. It's a step forward, but it is not the ultimate solution.
I wonder if this is a resource contention issue from having many parallel copies of the tests running on the same Jenkins worker. For example, we might be exhausting ephemeral ports. I think that #2259 will help with this by dramatically reducing the number of ephemeral ports used by PySpark jobs.
@JoshRosen, exhausting ephemeral ports has a small probability; if that is the case, we should reduce the number of ephemeral ports used. Or we can just verify that a port is available before using it, and retry several times to get a free one?
@scwf Actually, we already retry several times before settling on a port. My interpretation is that we simply ran out of ports, such that no matter how many times we retry we can't get a free one. Using a random port for the UI certainly reduces the probability of collision, but yes, it is definitely not the final solution, as it simply puts Maven tests on par with SBT tests, and the latter still fail sometimes.

@JoshRosen #2259 may help for PySpark tests, but

One solution @JoshRosen and I discussed is to simply not start the SparkUI during tests (maybe except for the
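The "don't start the SparkUI in tests" idea can be expressed with Spark's `spark.ui.enabled` configuration flag; whether all suites can tolerate a disabled UI is exactly the caveat raised above. A sketch, assuming a plain SparkConf-based test setup:

```scala
import org.apache.spark.SparkConf

// Disable the web UI so the test JVM never binds a UI port at all.
val conf = new SparkConf()
  .setMaster("local")
  .setAppName("test")
  .set("spark.ui.enabled", "false")
```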
For now, I will merge this (master + 1.1) because this helps us debug these test failures. Thanks.
Author: scwf &lt;[email protected]&gt;

Closes #2108 from scwf/output-test-error-info and squashes the following commits:

0c48082 [scwf] minor fix according to comments
563fde1 [scwf] output errer info when Process exitcode not zero

(cherry picked from commit 2686233)
Signed-off-by: Andrew Or &lt;[email protected]&gt;
https://issues.apache.org/jira/browse/SPARK-3193
I noticed that sometimes PR tests failed due to the process exit code != 0; refer to:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18688/consoleFull
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19118/consoleFull
[info] SparkSubmitSuite:
[info] - prints usage on empty input
[info] - prints usage with only --help
[info] - prints error with unrecognized options
[info] - handle binary specified but not class
[info] - handles arguments with --key=val
[info] - handles arguments to user program
[info] - handles arguments to user program with name collision
[info] - handles YARN cluster mode
[info] - handles YARN client mode
[info] - handles standalone cluster mode
[info] - handles standalone client mode
[info] - handles mesos client mode
[info] - handles confs with flag equivalents
[info] - launch simple application with spark-submit *** FAILED ***
[info] org.apache.spark.SparkException: Process List(./bin/spark-submit, --class, org.apache.spark.deploy.SimpleApplicationTest, --name, testApp, --master, local, file:/tmp/1408854098404-0/testJar-1408854098404.jar) exited with code 1
[info] at org.apache.spark.util.Utils$.executeAndGetOutput(Utils.scala:872)
[info] at org.apache.spark.deploy.SparkSubmitSuite.runSparkSubmit(SparkSubmitSuite.scala:311)
[info] at org.apache.spark.deploy.SparkSubmitSuite$$anonfun$14.apply$mcV$sp(SparkSubmitSuite.scala:291)
[info] at org.apache.spark.deploy.SparkSubmitSuite$$anonfun$14.apply(SparkSubmitSuite.scala:284)
[info] at org.apac...
Spark assembly has been built with Hive, including Datanucleus jars on classpath
This PR outputs the process error info when the process fails, which can be helpful for diagnosis.