Bazel server sometimes hangs, causing test timeout flakes #2985

Closed
kchodorow opened this issue May 10, 2017 · 7 comments
Assignees
Labels
P1 I'll work on this now. (Assignee required) type: bug

Comments

@kchodorow
Contributor

kchodorow commented May 10, 2017

This is an umbrella bug for "vanilla" bazel invocations hanging and timing out on the CI. This seems to happen to integration tests that run Bazel a bunch of times.

Timed out a couple of times today:

@kchodorow kchodorow added category: extensibility > skylark P1 I'll work on this now. (Assignee required) type: bug labels May 10, 2017
@brandjon
Member

I'm not able to reproduce locally. The test doesn't do much besides start up bazel, so it's hard to see how the test could be at fault as opposed to the infrastructure. I'll kick this back to you for now; let me know if there's a logical next step I should take to debug.

@brandjon brandjon assigned kchodorow and unassigned brandjon May 11, 2017
@kchodorow kchodorow changed the title //src/test/shell/integration:skylark_flag_test is flaky Bazel server sometimes hangs, causing test timeout flakes May 11, 2017
@kchodorow
Contributor Author

bazelbuild/continuous-integration#68 was created because of this.

@kchodorow
Contributor Author

@kchodorow kchodorow removed their assignment May 15, 2017
@haxorz
Contributor

haxorz commented May 20, 2017

this sounds like an issue that i diagnosed internally.

3e5edaf made it so bazel waits for all processes spawned by sandboxed actions (e.g. genrules, tests) to terminate before it considers the action to be complete. for a test that leaves behind long-running processes, this can result in test timeouts.
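
To illustrate the kind of test this affects, here is a minimal sketch of a shell test that starts a background process and then terminates and reaps it before finishing. The helper names (`start_helper`, `stop_helper`, `HELPER_PID`) are hypothetical and not taken from Bazel's test suite, and `sleep 300` is only a stand-in for whatever long-running process a real test might leave behind.

```shell
#!/bin/bash
# Hypothetical helpers; names are illustrative, not taken from Bazel's tests.
HELPER_PID=""

start_helper() {
  # Stand-in for a long-running process that a test starts in the background.
  sleep 300 &
  HELPER_PID=$!
}

stop_helper() {
  # Terminate and reap the helper so that nothing outlives the test.
  if [ -n "$HELPER_PID" ]; then
    kill "$HELPER_PID" 2>/dev/null || true
    wait "$HELPER_PID" 2>/dev/null || true
    HELPER_PID=""
  fi
}
```

A test could call `start_helper` at the beginning and register `stop_helper` with `trap stop_helper EXIT`, so the cleanup also runs when the test fails early.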


according to http://ci.bazel.io/job/bazel-tests/BAZEL_VERSION=latest,PLATFORM_NAME=linux-x86_64/753/console, the timeout is in //src/test/shell/bazel:workspace_test

https://github.com/bazelbuild/bazel/blob/8970b43c42197378e93339584d27063d082a512d/src/test/shell/bazel/workspace_test.sh doesn't shut down the blaze server at the end of each test case. bazel is hard-wired to have a default --max_idle_secs of 15 when it is running inside of a test:

max_idle_secs = testing ? 15 : (3 * 3600);

therefore, every shell integration test has an additional 15 second latency attached to it (because the outer blaze will have to wait 15 seconds extra). perhaps this is causing the ci.bazel timeouts you've been seeing.

the above is just a guess from the information in this issue. follow up with me and/or @philwo for assistance with debugging.
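
One way to avoid waiting out the idle timeout, sketched here under assumptions rather than taken from any actual fix: shut the server down explicitly after each test case. The sketch assumes the test framework invokes a `tear_down` hook after every test case and that `bazel` on the PATH is the binary under test; neither is confirmed by this thread for workspace_test.sh.

```shell
#!/bin/bash
# Sketch only: assumes the framework calls tear_down after each test case
# and that "bazel" on PATH is the binary under test.
tear_down() {
  # Stop the server started by this test case so the outer build does not
  # have to wait for the idle timeout to expire. Failures are ignored so
  # that a server which is already gone does not fail the test.
  bazel shutdown >/dev/null 2>&1 || true
}
```

With a hook like this, the server exits as soon as the test case ends instead of lingering for the 15 second idle period described above.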

@haxorz haxorz assigned haxorz and kchodorow and unassigned haxorz May 20, 2017
@philwo
Member

philwo commented May 20, 2017

This should no longer happen since May 15, 2017 (more precisely c4f271d), as the suspected culprit was rolled back. Are we still seeing this on CI?

@laszlocsomor
Contributor

@philwo:

"Are we still seeing this on CI?"

Maybe. #3072 could be a dupe, as @kchodorow noticed.

@kchodorow
Contributor Author

Please stop assigning this to me, I'm not working on it.

7 participants