BuildBot woes #27287

Closed
KristofferC opened this issue May 28, 2018 · 19 comments
Labels
ci (Continuous integration), priority (This should be addressed urgently)

Comments

@KristofferC
Member

KristofferC commented May 28, 2018

Build bots are kinda sad. This is an issue to consolidate the problems.

Mac

LibGit2

Error in testset LibGit2/libgit2:
Error During Test at /Users/osx/buildbot-tabularasa/worker/tester_osx64/build/share/julia/stdlib/v0.7/LibGit2/test/libgit2.jl:1949
  Got exception ErrorException("Process timed out possibly waiting for a response. Process output found:\n\"\"\"\n\r\nsignal (15): Terminated: 15\r\nin expression starting at no file:15\r\n\n\"\"\"") outside of a @test
  Process timed out possibly waiting for a response. Process output found:
  """
  
  signal (15): Terminated: 15
  in expression starting at no file:15
  
  """
  Stacktrace:
   [1] error(::String, ::String) at ./error.jl:42
   [2] (::getfield(Main.Test52Main_LibGit2_libgit2.LibGit2Tests, Symbol("##5#8")){Int64,Cmd,Array{Any,1},getfield(Main.Test52Main_LibGit2_libgit2.LibGit2Tests, Symbol("#format_output#7")){Bool},Base.GenericIOBuffer{Array{UInt8,1}}})(::RawFD, ::Base.TTY) at /Users/osx/buildbot-tabularasa/worker/tester_osx64/build/share/julia/stdlib/v0.7/LibGit2/test/libgit2.jl:93
   [3] with_fake_pty(::getfield(Main.Test52Main_LibGit2_libgit2.LibGit2Tests, Symbol("##5#8")){Int64,Cmd,Array{Any,1},getfield(Main.Test52Main_LibGit2_libgit2.LibGit2Tests, Symbol("#format_output#7")){Bool},Base.GenericIOBuffer{Array{UInt8,1}}}) at /Users/osx/buildbot-tabularasa/worker/tester_osx64/build/share/julia/test/TestHelpers.jl:37
   [4] #challenge_prompt#4 at /Users/osx/buildbot-tabularasa/worker/tester_osx64/build/share/julia/stdlib/v0.7/LibGit2/test/libgit2.jl:45 [inlined]
   [5] #challenge_prompt at ./<missing>:0 [inlined]
   [6] #challenge_prompt#1(::Int64, ::Bool, ::Function, ::Expr, ::Array{Any,1}) at /Users/osx/buildbot-tabularasa/worker/tester_osx64/build/share/julia/stdlib/v0.7/LibGit2/test/libgit2.jl:27
   [7] challenge_prompt at /Users/osx/buildbot-tabularasa/worker/tester_osx64/build/share/julia/stdlib/v0.7/LibGit2/test/libgit2.jl:14 [inlined]
   [8] (::getfield(Main.Test52Main_LibGit2_libgit2.LibGit2Tests, Symbol("##96#201")))() at /Users/osx/buildbot-tabularasa/worker/tester_osx64/build/share/julia/stdlib/v0.7/LibGit2/test/libgit2.jl:1981
   [9] withenv(::getfield(Main.Test52Main_LibGit2_libgit2.LibGit2Tests, Symbol("##96#201")), ::Pair{String,String}) at ./env.jl:148
   [10] macro expansion at /Users/osx/buildbot-tabularasa/worker/tester_osx64/build/share/julia/stdlib/v0.7/LibGit2/test/libgit2.jl:1980 [inlined]
   [11] macro expansion at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v0.7/Test/src/Test.jl:1079 [inlined]
   [12] (::getfield(Main.Test52Main_LibGit2_libgit2.LibGit2Tests, Symbol("##17#117")))(::String) at /Users/osx/buildbot-tabularasa/worker/tester_osx64/build/share/julia/stdlib/v0.7/LibGit2/test/libgit2.jl:1950
   [13] mktempdir(::getfield(Main.Test52Main_LibGit2_libgit2.LibGit2Tests, Symbol("##17#117")), ::String) at ./file.jl:449
   [14] mktempdir(::Function) at ./file.jl:447
   [15] top-level scope

Doesn't look the same as #27109. Seems to happen consistently.
Last Mac nightly was 2 weeks ago(!). Need to have Mac builds working to make a release.

For example https://build.julialang.org/#/builders/87/builds/259/steps/2/logs/stdio

Kw args to kwfunc

ERROR: function runtests does not accept keyword arguments
Stacktrace:
 [1] kwfunc(::Any) at ./boot.jl:237
program finished with exit code 1
elapsedTime=3.231927

https://build.julialang.org/#/builders/87/builds/266

Win32

Errors from not finishing certain FileWatching tests in time

Error in testset FileWatching:
Test Failed at C:\cygwin\home\Administrator\buildbot-tabularasa\worker\tester_win32\build\share\julia\stdlib\v0.7\FileWatching\test\runtests.jl:185
  Expression: 0.001 <= #= C:\cygwin\home\Administrator\buildbot-tabularasa\worker\tester_win32\build\share\julia\stdlib\v0.7\FileWatching\test\runtests.jl:185 =# @elapsed(#= C:\cygwin\home\Administrator\buildbot-tabularasa\worker\tester_win32\build\share\julia\stdlib\v0.7\FileWatching\test\runtests.jl:185 =# @test(watch_folder(dir, 0.004) == ("" => FileWatching.FileEvent()))) <= 0.3
   Evaluated: 0.001 <= 0.317806314 <= 0.3
Error in testset FileWatching:
Test Failed at C:\cygwin\home\Administrator\buildbot-tabularasa\worker\tester_win32\build\share\julia\stdlib\v0.7\FileWatching\test\runtests.jl:388
  Expression: #= C:\cygwin\home\Administrator\buildbot-tabularasa\worker\tester_win32\build\share\julia\stdlib\v0.7\FileWatching\test\runtests.jl:388 =# @elapsed(c = watch_folder(dir, timeout)) < 0.3
   Evaluated: 0.325074556 < 0.3

Example log https://build.julialang.org/#/builders/91/builds/246
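
For scale, here is a quick check of how far the two evaluated times in the log above land past the 0.3 s bound. This is only a sketch in Python (not the Julia test code); the numbers are copied from the "Evaluated:" lines, and the helper name overshoot_percent is made up for illustration.

```python
UPPER_BOUND = 0.3  # seconds, the upper limit in both failing test expressions

# Elapsed times copied from the "Evaluated:" lines of the log.
observed = {
    "runtests.jl:185": 0.317806314,
    "runtests.jl:388": 0.325074556,
}


def overshoot_percent(elapsed, bound=UPPER_BOUND):
    """Return how far `elapsed` exceeds `bound`, as a percentage of `bound`."""
    return 100.0 * (elapsed - bound) / bound


for where, elapsed in observed.items():
    print(f"{where}: {elapsed:.3f} s, {overshoot_percent(elapsed):.1f}% over the bound")
```

Both runs miss the bound by less than 10%, which fits a machine that is slightly too slow at that moment but does not rule out a real problem in the watcher.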

Hangs when testing.

Distributed                   (7) |   384.71 |   0.14 |  0.0 |      16.35 |  1102.57
LinearAlgebra/triangular      (2) |  1003.11 |  79.63 |  7.9 |   24813.37 |  1207.46
command timed out: 1200 seconds without output running ['bin/julia.exe', '-e', 'Base.runtests(["all"]; ncores=min(Sys.CPU_CORES, 8))'], attempting to kill
process killed by signal 9

Example log https://build.julialang.org/#/builders/91/builds/250
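
The "1200 seconds without output" message is an inactivity limit rather than a cap on total run time. A minimal Python sketch of that kind of watchdog follows, assuming a hypothetical helper named run_with_idle_timeout. It illustrates the idea only and is not Buildbot's actual implementation.

```python
import queue
import subprocess
import sys
import threading


def run_with_idle_timeout(cmd, idle_timeout):
    """Run cmd and kill it if it prints nothing for idle_timeout seconds.

    Returns (status, lines), where status is "finished" or "killed".
    """
    proc = subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    lines = queue.Queue()

    def pump():
        # Forward each output line to the main thread as it arrives.
        for line in proc.stdout:
            lines.put(line)
        lines.put(None)  # sentinel: the child closed its output

    threading.Thread(target=pump, daemon=True).start()

    seen = []
    while True:
        try:
            # The clock restarts every time a line arrives.
            line = lines.get(timeout=idle_timeout)
        except queue.Empty:
            proc.kill()
            proc.wait()
            return "killed", seen
        if line is None:
            proc.wait()
            return "finished", seen
        seen.append(line.rstrip("\n"))


if __name__ == "__main__":
    chatty = "for i in range(3): print(i, flush=True)"
    print(run_with_idle_timeout([sys.executable, "-c", chatty], idle_timeout=10))
```

With this scheme a long-running job survives as long as it keeps printing, while a job that goes silent is killed even if its total run time is short.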

Win 64

Distributed tests

Error in testset Distributed:
Error During Test at C:\cygwin\home\Administrator\buildbot-tabularasa\worker\tester_win64\build\share\julia\test\testdefs.jl:19
  Got exception LoadError("C:\\cygwin\\home\\Administrator\\buildbot-tabularasa\\worker\\tester_win64\\build\\share\\julia\\stdlib\\v0.7\\Distributed\\test\\runtests.jl", 10, ErrorException("Distributed test failed, cmd : `'C:\\cygwin\\home\\Administrator\\buildbot-tabularasa\\worker\\tester_win64\\build\\bin\\julia.exe' --check-bounds=yes --startup-file=no --depwarn=error 'C:\\cygwin\\home\\Administrator\\buildbot-tabularasa\\worker\\tester_win64\\build\\share\\julia\\stdlib\\v0.7\\Distributed\\test\\distributed_exec.jl'`")) outside of a @test
  LoadError: Distributed test failed, cmd : `'C:\cygwin\home\Administrator\buildbot-tabularasa\worker\tester_win64\build\bin\julia.exe' --check-bounds=yes --startup-file=no --depwarn=error 'C:\cygwin\home\Administrator\buildbot-tabularasa\worker\tester_win64\build\share\julia\stdlib\v0.7\Distributed\test\distributed_exec.jl'`
  Stacktrace:
   [1] error at .\error.jl:33 [inlined]
   [2] macro expansion at C:\cygwin\home\Administrator\buildbot-tabularasa\worker\tester_win64\build\share\julia\stdlib\v0.7\Distributed\test\runtests.jl:11 [inlined]
   [3] top-level scope at .\<missing>:0

Example log: https://build.julialang.org/#/builders/88/builds/218/steps/2/logs/stdio

Cleaning out step:

rm: cannot remove 'share/julia/test': Device or resource busy

Example log: https://build.julialang.org/#/builders/88/builds/215

AArch

Kw args to kwfunc

Same as the Mac error above.

KristofferC added the priority, system:mac, and ci labels on May 28, 2018
@ViralBShah
Member

Perhaps an optimizer/llvm issue? This line seems to suggest:

  Got exception ErrorException("Process timed out possibly waiting for a response. Process output found:\n\"\"\"\n\r\nsignal (15): Terminated: 15\r\nin expression starting at no file:17\r\n_ZN12_GLOBAL__N_115LiveDebugValues8transferERN4llvm12MachineInstrERNS0_13OpenRangesSetERNS1_13SmallDenseMapIPKNS1_17MachineBasicBlockENS1_15SparseBitVectorILj128EEELj4ENS1_12DenseMapInfoIS9_EENS1_6detail12DenseMapPairIS9_SB_EEEERNS1_12UniqueVectorINS0_6VarLocEEERNS1_11SmallVectorINS0_14SpillDebugPairELj4EEEb at /Users/osx/buildbot-tabularasa/worker/tester_osx64/build/lib/julia//libLLVM.dylib (unknown line)\r\n_ZN12_GLOBAL__N_115LiveDebugValues20runOnMachineFunctionERN4llvm15MachineFunctionE at /Users/osx/buildbot-tabularasa/worker/tester_osx64/build/lib/julia//libLLVM.dylib (unknown line)\r\n_ZN4llvm19MachineFunctionPass13runOnFunctionERNS_8FunctionE at /Users/osx/buildbot-tabularasa/worker/tester_osx64/build/lib/julia//libLLVM.dylib (unknown line)\r\n_ZN4llvm13FPPassManager13runOnFunctionERNS_8FunctionE at /Users/osx/buildbot-tabularasa/worker/tester_osx64/build/lib/julia//libLLVM.dylib (unknown line)\r\n_ZN4llvm13FPPassManager11runOnModuleERNS_6ModuleE at /Users/osx/buildbot-tabularasa/worker/tester_osx64/build/lib/julia//libLLVM.dylib (unknown line)\r\n_ZN4llvm6legacy15PassManagerImpl3runERNS_6ModuleE at /Users/osx/buildbot-tabularasa/worker/tester_osx64/build/lib/julia//libLLVM.dylib (unknown line)\r\noperator() at /Users/osx/buildbot/slave/package_osx64/build/src/jitlayers.cpp:485\r\naddModule at /Users/osx/buildbot/slave/package_osx64/build/usr/include/llvm/ExecutionEngine/Orc/IRCompileLayer.h:57\r\naddModule at /Users/osx/buildbot/slave/package_osx64/build/src/jitlayers.cpp:612\r\njl_add_to_ee at /Users/osx/buildbot/slave/package_osx64/build/src/jitlayers.cpp:850 [inlined]\r\njl_finalize_function at /Users/osx/buildbot/slave/package_osx64/build/src/jitlayers.cpp:858\r\ngetAddressForFunction at 
/Users/osx/buildbot/slave/package_osx64/build/src/codegen.cpp:1321\r\njl_generate_fptr at /Users/osx/buildbot/slave/package_osx64/build/src/codegen.cpp:1432\r\njl_fptr_trampoline at /Users/osx/buildbot/slave/package_osx64/build/src/gf.c:1822\r\n#3 at ./none:19\r\n#open#304 at ./iostream.jl:369\r\njl_fptr_trampoline at /Users/osx/buildbot/slave/package_osx64/build/src/gf.c:1823\r\nopen at ./iostream.jl:367\r\njl_fptr_trampoline at /Users/osx/buildbot/slave/package_osx64/build/src/gf.c:1823\r\ndo_call at /Users/osx/buildbot/slave/package_osx64/build/src/interpreter.c:324\r\neval_body at /Users/osx/buildbot/slave/package_osx64/build/src/interpreter.c:559\r\njl_interpret_toplevel_thunk_callback at /Users/osx/buildbot/slave/package_osx64/build/src/interpreter.c:798\r\nunknown function (ip: 0xfffffffffffffffe)\r\nunknown function (ip: 0x113b1fe4f)\r\nunknown function (ip: 0x1b)\r\njl_interpret_toplevel_thunk at /Users/osx/buildbot/slave/package_osx64/build/src/interpreter.c:807\r\njl_toplevel_eval_flex at /Users/osx/buildbot/slave/package_osx64/build/src/toplevel.c:856\r\njl_toplevel_eval_flex at /Users/osx/buildbot/slave/package_osx64/build/src/toplevel.c:802\r\njl_toplevel_eval_in at /Users/osx/buildbot/slave/package_osx64/build/src/builtins.c:631\r\neval at ./boot.jl:316\r\nexec_options at ./client.jl:241\r\n_start at ./client.jl:424\r\ntrue_main at /Users/osx/buildbot-tabularasa/worker/tester_osx64/build/bin/julia (unknown line)\r\nmain at /Users/osx/buildbot-tabularasa/worker/tester_osx64/build/bin/julia (unknown line)\r\nunknown function (ip: 0xffffffffffffffff)\r\nAllocations: 5037973 (Pool: 5036986; Big: 987); GC: 10\r\n\n\"\"\"") outside of a @test

@KristofferC
Member Author

The earliest occurrence I could find was https://build.julialang.org/#/builders/87/builds/152 which was 18 days ago.

@Keno
Member

Keno commented May 28, 2018

> Perhaps an optimizer/llvm issue? This line seems to suggest:

While it may easily be an optimizer issue, I don't think that line is necessarily suggestive since it'd just print the backtrace at whatever place it was killed, which for compilation heavy work like the test suite could easily be LLVM.

@ViralBShah
Member

Can we at least increase the timeouts on the buildbots for the time being in case it will help? Is it @staticfloat who has to do that?

@staticfloat
Member

> Can we at least increase the timeouts on the buildbots for the time being in case it will help?

I don't think that's the issue here; the log that Kristoffer posted shows julia worker processes hanging on LibGit2 tests, because we've triggered some kind of interactive prompt for credentials.

@KristofferC
Member Author

FWIW, the last 3 mac builds passed so the nightly is updated... Not sure if anything is really fixed but hey.

KristofferC changed the title from "Mac build bot fails so mac nightlies are old" to "BuildBot woes" on May 29, 2018
@KristofferC
Member Author

Updated this issue to cover build bot problems in general, because the other architectures also have issues.

KristofferC removed the system:mac label on May 29, 2018
@omus
Member

omus commented May 30, 2018

If the build bots are under heavy load, it's possible that the 10-second default used in challenge_prompt isn't enough time to spawn a new Julia process and execute the test code. We could bump up that default timeout.

@Keno
Member

Keno commented May 30, 2018

10 seconds is a little small. Let's bump it to 60. The purpose of the timeouts is to prevent the test suite from hanging. For that purpose, whether it's 10 or 60 doesn't matter much.
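
To illustrate the point about timeouts being a guard against hangs, here is a rough Python sketch (not the Julia challenge_prompt helper itself; run_with_deadline is a hypothetical name) that starts a fresh interpreter and gives up after a deadline. The deadline has to cover process startup as well as the work itself, which is why a loaded machine can trip a short one.

```python
import subprocess
import sys


def run_with_deadline(code, timeout):
    """Run `code` in a fresh Python process; give up after `timeout` seconds.

    Returns ("ok", stdout) on completion or ("timed out", None) otherwise.
    """
    try:
        done = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout,  # counts startup time plus execution time
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising, so nothing hangs.
        return "timed out", None
    return "ok", done.stdout


if __name__ == "__main__":
    print(run_with_deadline("print('hi')", timeout=30))
    print(run_with_deadline("import time; time.sleep(30)", timeout=0.5))
```

Either a 10-second or a 60-second deadline stops a hung child; the larger value only changes how long a real hang takes to surface.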

@staticfloat
Member

Yes, let’s bump it up to 30 seconds or so. The buildbots are often under heavy load. ;)

@KristofferC
Member Author

What about the FileWatching tests, which are also too slow? For example: https://build.julialang.org/#/builders/91/builds/246

@omus
Member

omus commented May 30, 2018

I'll make a PR to bump up the timeout.

@Keno
Member

Keno commented May 30, 2018

I've actually seen the FileWatching one locally on Windows, on a fairly beefy machine, so there may be other problems there.

@mbauman
Member

mbauman commented May 30, 2018

Is it possible that the "function runtests does not accept keyword arguments" error is a race condition between defining and executing the function? I'm repeatedly launching the tests in the manner the buildbots do, but haven't succeeded in reproducing the failure yet.
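
One way to hunt for an intermittent failure like this is to launch the same command many times and collect the runs that exit non-zero. A minimal Python sketch follows, with count_failures as a made-up helper name; the real command would be whatever the buildbot step runs.

```python
import subprocess
import sys


def count_failures(cmd, attempts):
    """Run cmd `attempts` times; return (attempt index, stderr) for each failure."""
    failures = []
    for attempt in range(attempts):
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            failures.append((attempt, result.stderr.strip()))
    return failures


if __name__ == "__main__":
    # Placeholder command that always succeeds; swap in the real test command.
    print(count_failures([sys.executable, "-c", "pass"], attempts=3))
```

Keeping the stderr of each failing run makes it easier to tell whether every failure is the same error or several different ones.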

@KristofferC
Member Author

KristofferC commented Jun 4, 2018

Update:

  • Windows builds consistently time out after the distributed tests.
  • Mac still fails quite often on the LibGit2 tests
  • FreeBSD fails with
    WARNING: Error during initialization of module PCRE:
    ErrorException("could not load library "libpcre2-8"
    Shared object "libpcre2-8.so" not found, required by "julia"")
    

@KristofferC
Member Author

Not sure this needs to stay open anymore.

@jekbradbury
Contributor

If I’m not mistaken the macOS nightly is still a little old (10 days)?

@KristofferC
Member Author

Yes, it's been compiling openblas for 9 days https://build.julialang.org/#/builders/1/builds/1540...

I've been trying to stop it but it doesn't listen.

@KristofferC
Member Author

macOS nightly should be updated now.
