CI (Buildkite, code coverage): increase the value of JULIA_WORKER_TIMEOUT on the code coverage job #42193

Merged 1 commit on Sep 10, 2021.

2 changes: 1 addition & 1 deletion .buildkite/pipelines/scheduled/0_webui.yml
@@ -21,4 +21,4 @@ steps:
 # verifies the treehash of the pipeline itself and the inputs listed in `inputs`
 signed_pipelines:
 - pipeline: .buildkite/pipelines/scheduled/coverage/coverage_linux64.yml
-signature: "U2FsdGVkX1+lpFo/nKzx3c6xCZPKYTAuunXpOsZG4+s4+iU5LfEpMvtNvpKQjDugRoxQxCItMqB6vr4KZN3KtKhjkLbr8ExAyaPil/N/uFhrLlpwNem9dxHbPrU2l7qo"
+signature: U2FsdGVkX1+FtqbbxyzoI/j0InDefRQ3OR06BAM2EWRhDG3SiwiPcOREudCTJ+1Z+AEVwVz5KTgw9lBVO1yjcWts3XePIy/W+arN4V+t97Dfuf4wsAr9ubpQ10GaoFnK
@@ -29,7 +29,10 @@ steps:
 git config --global init.defaultBranch master

 echo "--- Run Julia tests in parallel with code coverage enabled"
-./julia --code-coverage=all --sysimage-native-code=no .buildkite/pipelines/scheduled/coverage/run_tests_parallel.jl
+export JULIA_NUM_THREADS=1
Member

Suggested change:
-export JULIA_NUM_THREADS=1
+unset JULIA_NUM_THREADS

But why is this set at all? It is not a good base configuration.

Member Author

My memory is hazy, but if I recall correctly, we didn't use to set this. Then some job started 256 Julia threads (the underlying machine had 128 physical cores, i.e. 256 CPU threads) and clobbered the whole machine, so we set this variable.

Member Author

We also set JULIA_CPU_THREADS to 16. Maybe that is sufficient, and we can remove JULIA_NUM_THREADS from the default configuration?

Member Author

@staticfloat will probably remember why we set export JULIA_NUM_THREADS=16 on all the Buildkite agents.

Maybe we should set the following configuration on all of the Buildkite agents, just to be safe:

export JULIA_CPU_THREADS=16
export JULIA_NUM_THREADS=1
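
As a rough sketch of the configuration proposed above, an agent-level setup script could apply these caps only when a job has not already chosen its own values. The function name `apply_julia_thread_defaults` is hypothetical, and this thread does not say where such a script would be hooked into the Buildkite agents; treating the values as overridable defaults rather than unconditional exports is also an assumption.

```shell
# Hypothetical helper: apply conservative thread caps unless already set.
apply_julia_thread_defaults() {
    # Cap the CPU thread count that Julia reports as Sys.CPU_THREADS.
    : "${JULIA_CPU_THREADS:=16}"
    # Start Julia with a single thread unless a job opts in to more.
    : "${JULIA_NUM_THREADS:=1}"
    export JULIA_CPU_THREADS JULIA_NUM_THREADS
}

apply_julia_thread_defaults
echo "JULIA_CPU_THREADS=${JULIA_CPU_THREADS} JULIA_NUM_THREADS=${JULIA_NUM_THREADS}"
```

With this shape, a job that exports `JULIA_NUM_THREADS=4` before the helper runs keeps its value, while a job that sets nothing gets the single-thread default.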

+export JULIA_WORKER_TIMEOUT=1200 # 1200 seconds = 20 minutes
+./julia -e 'import Distributed; @info "" Distributed.worker_timeout()'
+./julia .buildkite/pipelines/scheduled/coverage/run_tests_parallel.jl

 echo "--- Process and upload coverage information"
 ./julia .buildkite/pipelines/scheduled/coverage/upload_coverage.jl
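
For illustration, the timeout setup in the step above can be sketched as a small standalone script. `minutes_to_seconds` is a hypothetical helper, not part of the pipeline; the only facts taken from the diff are that `JULIA_WORKER_TIMEOUT` is given in seconds and that the value is confirmed by printing `Distributed.worker_timeout()`. The check is skipped when no `./julia` binary is present.

```shell
# Hypothetical helper: convert minutes to the seconds JULIA_WORKER_TIMEOUT expects.
minutes_to_seconds() {
    echo $(( $1 * 60 ))
}

JULIA_WORKER_TIMEOUT="$(minutes_to_seconds 20)"
export JULIA_WORKER_TIMEOUT
echo "JULIA_WORKER_TIMEOUT=${JULIA_WORKER_TIMEOUT}"

# Same confirmation command as in the diff, guarded so the sketch
# also runs on machines without a locally built ./julia.
if [ -x ./julia ]; then
    ./julia -e 'import Distributed; @info "" Distributed.worker_timeout()'
fi
```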
34 changes: 19 additions & 15 deletions .buildkite/pipelines/scheduled/coverage/run_tests_parallel.jl
@@ -1,25 +1,29 @@
 # Important note: even if one or more tests fail, we will still exit with status code 0.
-
+#
 # The reason for this is that we always want to upload code coverage, even if some of the
 # tests fail. Therefore, even if the `coverage_linux64` builder passes, you should not
 # assume that all of the tests passed. If you want to know if all of the tests are passing,
 # please look at the status of the `tester_*` builders (e.g. `tester_linux64`).

-# When running this file, make sure to set all of the following command-line flags:
-# 1. `--code-coverage=all`
-# 2. `--sysimage-native-code=no`
+const ncores = Sys.CPU_THREADS
+@info "" Sys.CPU_THREADS
+@info "" ncores

-empty!(Base.DEPOT_PATH)
-push!(Base.DEPOT_PATH, mktempdir(; cleanup = true))
+script_native_yes = """
+Base.runtests(["cmdlineargs"]; ncores = $(ncores))
+"""
+script_native_no = """
+Base.runtests(["all", "--skip", "cmdlineargs"]; ncores = $(ncores))
+"""

-const tests = "all"
-const ncores = Sys.CPU_THREADS
+base_cmd = `$(Base.julia_cmd()) --code-coverage=all`
+cmd_native_yes = `$(base_cmd) --sysimage-native-code=yes -e $(script_native_yes)`
+cmd_native_no = `$(base_cmd) --sysimage-native-code=no -e $(script_native_no)`

-@info "" Sys.CPU_THREADS
-@info "" tests ncores
+@info "Running command" cmd_native_yes
+p1 = run(pipeline(cmd_native_yes; stdin, stdout, stderr); wait = false)
+wait(p1)

-try
-Base.runtests(tests; ncores)
-catch ex
-@error "" exception=(ex, catch_backtrace())
-end
+@info "Running command" cmd_native_no
+p2 = run(pipeline(cmd_native_no; stdin, stdout, stderr); wait = false)
+wait(p2)