-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix #12853: kill detached compilation process on an exception #12855
Conversation
Seems to have some Travis problems... restarting Travis to see if that helps. |
Weird, I'm getting |
f464b84
to
f6ee37f
Compare
I'm getting |
That's the typical output for OOM error. |
But why would I be having an OOM here? Grrr. |
Because we add too many tests recently? |
@yuyichao, but then how do I fix it? |
@stevengj Take away my commit access? |
@kshyatt, I think we'd have to do that retroactively... |
More seriously, it might be worth running the test suite through the allocation profiler and see where we're using a lot of memory. |
Hint, focusing on llvm as a start. In my original issue about Travis OOM @amitmurthy brought up the possibility to more aggressively restart worker processes. I'm not really familiar with the parallel part but it might worth doing. |
In the short term, how are people getting PRs through? Should I just keep force-pushing and hope that it randomly works? |
I sign in to Travis and restart the build. |
Restarting the build from Travis yields |
f6ee37f
to
168460b
Compare
Trying |
@tkelman Is the fact that travis doesn't assign a new build number if you restart a build conflict with some logic in the travis fail fast script? |
Well, the good news is that the OOM killer is predictable. The bad news is that it is predictably (3 times in a row) killing (only) the x86_64-Linux build of this branch. |
Just pushed a commit to this branch that reduces the number of worker processes in |
Oh, that won't help, the |
48e1449
to
6090ce5
Compare
6090ce5
to
d148990
Compare
Okay, reducing the number of workers fixed it, so it seems to definitely be the old OOM error. Can we merge this, and increase the number of Travis cores later when a better solution to the OOM problem is found? |
Going to merge this later today if there are no objections. |
fix #12853: kill detached compilation process on an exception
Thank you all for fixing this issue and working through the Travis failures! |
@ScottPJones I believe a rebase (probably just |
OK, thanks! |
It's not really fixed, just worked around (temporarily, until our tests get so memory-hungry that even two processes is not enough) at the cost of increased Travis latency. Please take further discussions to #11553. |
This fixes #12853 by the simple expedient of wrapping the compilation in a
try
block and making sure that things are cleaned up on exit (julia subprocess is killed and file is closed).