Need a solution for flaky pypi fetches under travis-ci #8
Comments
Does it help to use This makes it sound like it might not work anymore, but points to other possible solutions:
Travis now uses caching.
kwlzn added a commit that referenced this issue Mar 31, 2017
…n test. (#4407)

### Problem

Currently, on Linux the first thin client call to the daemon can deadlock just after the pantsd->fork->pantsd-runner workflow. Connecting to the process with `gdb` reveals a deadlock in the following stack in the `post_fork` `drop` of the `CpuPool`:

```
#0  0x00007f63f04c31bd in __lll_lock_wait () from /lib64/libpthread.so.0
No symbol table info available.
#1  0x00007f63f04c0ded in pthread_cond_signal@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
No symbol table info available.
#2  0x00007f63d3cfa438 in notify_one () at /buildslave/rust-buildbot/slave/stable-dist-rustc-linux/build/src/libstd/sys/unix/condvar.rs:52
No locals.
#3  notify_one () at /buildslave/rust-buildbot/slave/stable-dist-rustc-linux/build/src/libstd/sys_common/condvar.rs:39
No locals.
#4  notify_one () at /buildslave/rust-buildbot/slave/stable-dist-rustc-linux/build/src/libstd/sync/condvar.rs:208
No locals.
#5  std::thread::{{impl}}::unpark () at /buildslave/rust-buildbot/slave/stable-dist-rustc-linux/build/src/libstd/thread/mod.rs:633
No locals.
#6  0x00007f63d3c583d1 in crossbeam::sync::ms_queue::{{impl}}::push<futures_cpupool::Message> (self=<optimized out>, t=...)
    at /home/kwilson/.cache/pants/rust-toolchain/registry/src/github.com-1ecc6299db9ec823/crossbeam-0.2.10/src/sync/ms_queue.rs:178
        guard = <optimized out>
        self = <optimized out>
#7  0x00007f63d3c588ed in futures_cpupool::{{impl}}::drop (self=<optimized out>) at /home/kwilson/.cache/pants/rust-toolchain/git/checkouts/futures-rs-a4f11d094efefb0a/f7e6bc8/futures-cpupool/src/lib.rs:236
        self = 0x37547a0
#8  0x00007f63d3be871c in engine::fs::{{impl}}::post_fork (self=0x3754778) at /home/kwilson/dev/pants/src/rust/engine/src/fs.rs:355
        self = 0x3754778
#9  0x00007f63d3be10e4 in engine::context::{{impl}}::post_fork (self=0x37545b0) at /home/kwilson/dev/pants/src/rust/engine/src/context.rs:93
        self = 0x37545b0
#10 0x00007f63d3c0de5a in {{closure}} (scheduler=<optimized out>) at /home/kwilson/dev/pants/src/rust/engine/src/lib.rs:275
        scheduler = 0x3740580
#11 with_scheduler<closure,()> (scheduler_ptr=<optimized out>, f=...) at /home/kwilson/dev/pants/src/rust/engine/src/lib.rs:584
        scheduler = 0x3740580
        scheduler_ptr = 0x3740580
#12 engine::scheduler_post_fork (scheduler_ptr=0x3740580) at /home/kwilson/dev/pants/src/rust/engine/src/lib.rs:274
        scheduler_ptr = 0x3740580
#13 0x00007f63d3c1be8c in _cffi_f_scheduler_post_fork (self=<optimized out>, arg0=0x35798f0) at src/cffi/native_engine.c:2234
        _save = 0x34a65a0
        x0 = 0x3740580
        datasize = <optimized out>
#14 0x00007f63f07b5a62 in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
```

This presents as a hang in the thin client, because the pailgun socket is left open in the pantsd-runner.

### Solution

Add pre-fork hooks and tear down the `CpuPool` instances prior to forking and rebuilding them.

### Result

Can no longer reproduce the hang.
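The pre-fork teardown described in the commit message can be illustrated with a rough Python analogue. This is only a sketch of the general pattern, not the Rust change made in the commit: `PoolOwner`, `pre_fork`, and `post_fork` are hypothetical names, and `ThreadPoolExecutor` stands in for `CpuPool`. The idea is to join all worker threads before forking, so that no lock or condition variable is held mid-operation when the process is copied, and to rebuild the pool afterwards.

```python
from concurrent.futures import ThreadPoolExecutor


class PoolOwner:
    """Hypothetical owner of a thread pool that must survive a fork."""

    def __init__(self, workers=2):
        self._workers = workers
        self._pool = ThreadPoolExecutor(max_workers=workers)

    def pre_fork(self):
        # Join every worker thread before forking so the child cannot
        # inherit a lock that a now-missing thread was holding.
        if self._pool is not None:
            self._pool.shutdown(wait=True)
            self._pool = None

    def post_fork(self):
        # Rebuild a fresh pool (in the parent and in the child) after the fork.
        if self._pool is None:
            self._pool = ThreadPoolExecutor(max_workers=self._workers)

    def submit(self, fn, *args):
        if self._pool is None:
            raise RuntimeError("pool is torn down; call post_fork() first")
        return self._pool.submit(fn, *args)
```

On Python 3.7+ the two hooks could be wired up with `os.register_at_fork(before=owner.pre_fork, after_in_parent=owner.post_fork, after_in_child=owner.post_fork)`; whether that fits a given daemon's forking code is an assumption to verify.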
lenucksi pushed a commit to lenucksi/pants that referenced this issue Apr 25, 2017
…n test. (pantsbuild#4407)
Sdist fetch failures on travis-ci are frequent enough to disrupt reasonable efforts to keep the build green. Twitter uses an internal PyPI mirror exclusively, and Foursquare prefers an internal mirror and falls back to PyPI. Both companies have had success with this strategy under CI. Consider doing something similar using the binary hosting options discussed here: https://rbcommons.com/s/twitter/r/22/
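The "prefer a mirror, fall back to PyPI" strategy might look roughly like the sketch below. It is an illustration under stated assumptions, not how Twitter, Foursquare, or pants actually implement it: `fetch_with_fallback` and `http_fetch` are hypothetical helpers, and the index URLs in the usage note are placeholders.

```python
import urllib.request
from typing import Callable, Sequence


def http_fetch(url: str, timeout: float = 10.0) -> bytes:
    """Plain HTTP GET; urllib's URLError is a subclass of OSError."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def fetch_with_fallback(
    path: str,
    indexes: Sequence[str],
    fetch: Callable[[str], bytes] = http_fetch,
    attempts_per_index: int = 2,
) -> bytes:
    """Try each index in order, retrying a flaky one before moving on."""
    last_error = None
    for base in indexes:
        url = base.rstrip("/") + "/" + path.lstrip("/")
        for _ in range(attempts_per_index):
            try:
                return fetch(url)
            except OSError as error:
                # Remember the failure and retry, then fall through
                # to the next index once the attempts are used up.
                last_error = error
    raise RuntimeError(f"all indexes failed for {path!r}") from last_error
```

A caller would list the internal mirror first and the public index last, for example `fetch_with_fallback("foo/foo-1.0.tar.gz", ["https://mirror.example/simple", "https://pypi.example/simple"])`, where both hosts and the path are placeholders. Passing `fetch` as a parameter keeps the retry logic testable without network access.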