-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-3809][SQL]fix HiveThriftServer2Suite to make it work correctly #2671
Conversation
Can one of the admins verify this patch? |
cc @liancheng |
ok to test |
QA tests have started for PR 2671 at commit
|
@scwf Thanks, this is a good catch. However, I should mention that HiveThriftServer2Suite is known to be flaky before the Thrift server is made a daemon. I had opened #2214 to try to fix this issue, but unfortunately Jenkins fails because of unknown reason that I couldn't reproduce locally. After numerous unsuccessful tries, I haven't got time to get it done yet. Sorry for the trouble... The essential issue fixed in #2214 is that the exception caught in HiveThriftServer2Suite is not re-thrown in the Back to this PR, I have several comments:
Since the Jenkins failure issue in #2214 is really tricky to fix and without fixing that, we couldn't make any change to |
Thread.sleep(5000) | ||
// logFile may have not finished, try every second | ||
while (!logFile.exists() || (!fileToString(logFile).contains( | ||
"ThriftBinaryCLIService listening on") && tryNum < maxTries)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tryNum
is never increased.
QA tests have finished for PR 2671 at commit
|
Hi, @liancheng, thanks for you comments that are very useful. |
BTW, Jenkins passes because the exception re-thrown issue is not fixed in your PR :) You may check the full console output for sure https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/271/consoleFull And mine still suffers the mysterious timeout. Keep digging... |
The key point to use |
Get it, i will check this |
It is very strange in my local maching it is ok. Hi @liancheng, can you get the log file in "stdout> starting org.apache.spark.sql.hive.thriftserver.HiveThriftServer2, logging to /home/jenkins/workspace/NewSparkPullRequestBuilder/sbin/../logs/spark-root-org.apache.spark.sql.hive.thriftserver.HiveThriftServer2-1-test02.amplab.out"? |
When trying #2214 several weeks ago, the Thrift server process simply couldn't start on Jenkins server, but everything's fine on my local machine. However, the pull request builder had been refactored a lot by Josh. It seems that #2675 fails simply because my timeout was too tight for Jenkins. I'm trying to relax the timeout a bit. |
Ok, since now the process output redirect to log file, we can also check with it to see where is the problem |
@liancheng, can you retest this, i add some print to diagnose this |
Actually #2214 fixes several issues, you need all of them to make HiveThriftServer2Suite healthy. Please refer to the PR description for details. |
QA tests have started for PR 2671 at commit
|
QA tests have finished for PR 2671 at commit
|
we will fix it in #2675 |
Currently HiveThriftServer2Suite is a fake one, actually HiveThriftServer not even started there. Issues here:
1 Thriftserver not started. Testing will get this error ---
ERROR HiveThriftServer2Suite: Failed to start Hive Thrift server within 30 seconds
java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
2 Thriftserver not stoped. After test finished the process of thriftserver did not exit.
This patch fix this problems as follows:
1 Since thriftserver started as a daemon in #2509, output of the
start-thriftserver.sh
has be redirect to a log file such asspark-kf-org.apache.spark.sql.hive.thriftserver.HiveThriftServer2-1-kf.out
, so to see whether this file contain "ThriftBinaryCLIService listening on" ` to assert server started successfully2 Start server in
beforeAll
3 stop server in
afterAll