-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix ExecuteProcess return value and disable sigchld handler #2
Conversation
Codecov Report
@@ Coverage Diff @@
## mh/exec #2 +/- ##
==========================================
Coverage ? 78.04%
==========================================
Files ? 48
Lines ? 3690
Branches ? 0
==========================================
Hits ? 2880
Misses ? 810
Partials ? 0 |
That crash in nightly is new, I re-ran the tests in the oscar-system repo on my branch and now it also crashes like this, but the previous attempt from yesterday was better, as only the GAP tests crashed instead of the julia tests: https://github.com/oscar-system/GAP.jl/actions/runs/2162169485/attempts/1 |
The diffs here were difficult to read as they brought in unrelated changes. I've now rebased the The crashes in nightly may or may not be related to oscar-system#801 but in any case they may be gone (for now) in GAP.jl master, simply because we are right now not using GAP_pkg_juliainterface_jll but rather compile that code from scratch (this will be the case automatically until It might not be enough, though, in the end it could be necessary to rebuild |
mostly for testing for now
I rebased the branch, sorry about the unrelated changes, I needed some updates to make it work at all with nightly but wanted to avoid the latest changes which were incompatible with the Oscar master. At least on ubuntu the julia-nightly crash has disappeared (https://github.com/oscar-system/GAP.jl/runs/6023028755?check_suite_focus=true ) and all tests were successful, macos is queued. |
Tests here are fine, but the latest julia nightly did indeed break existing binaries, e.g. CxxWrap fails to precompile when re-running the Polymake.jl tests: https://github.com/oscar-system/Polymake.jl/runs/6023944340?check_suite_focus=true#step:6:298 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Esp. for my stupid res
vs. res.status
bug sigh.
But unfortunately it is more complicated than that: First off, we indeed perform no tests for ExecuteProcess
at all, some should be added to make sure it works.
But more seriously, ExecuteProcess
is only one of two (resp. three, depending on what you count) GAP mechanisms for loading subprocesses, and it is not even the one that needs the signal handler. The other approach are the so-called "iostreams": where ExecuteProcess
is about starting a subprocess and waiting until it terminates, "iostreams" are more advanced and allow communicating with a child process running in the background. These also need to be replaced, in the same way as ExecuteProcess
. And that then of course also needs to be tested... sigh
(The third way is via a GAP package called "io", which among other things has a ton of low-level wrappers for all kinds of POSIX APIs ... including fork
and execve
So a possible plan would be:
- add some tests for all this functionality
- once those are in place, start to replace it (the parts in GAP itself, at least) gradually
- finally rebuild GAP_jll with the builtin-features
- (and perhaps eventually also patch the GAP package
io
, though it is unclear to me how that should reasonably work; maybe more realistic is to adjust packages using it to use slightly higher level APIs instead... sigh)
@@ -96,6 +96,13 @@ function initialize(argv::Vector{String}) | |||
# TODO: turn this into a proper libgap API | |||
unsafe_store!(cglobal((:SyLoadSystemInitFile, libgap), Int64), 0) | |||
|
|||
# remember sigchld handler | |||
if Sys.isbsd() || Sys.islinux() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GAP.jl only runs on POSIX anyway, so this check shouldn't be necessary:-)
if Sys.isbsd() || Sys.islinux() | ||
sigchld = Sys.isbsd() ? 20 : 17 | ||
# nullptr == SIG_DFL | ||
prev_sigchld_hdl = ccall(:signal, Ptr{Cvoid}, (Cint, Ptr{Cvoid}), sigchld, Ptr{Cvoid}(0)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there actually a handler set? Or does it return NULL?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Usually it will be NULL
, but there might be other software as well that sets a sigchld handler so to be safe I wanted to reinstate whatever was set before GAP replaces it instead of the default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh sure, I agree this is the right way to do it. I was just wondering
Oh, there are a bunch of GAP packages relying on each of these mechanisms. So one way to "test" things is to try out stuff in those packages. Though I think a few hand written dedicated tests on the long run will be more helpful |
The package installation tests do use the |
A different approach would be to just set the sigchld handler directly before creating a subprocess and restoring it when it is complete, this is what julia does on linux and what it did on macos before the change to kevent. (And also what we do in polymake now.) But maybe we can first try if just removing the |
Ahhh right, I forgot that of course Note that the Getting rid of the waitpid(-1) loop sounds like a good idea anyway (also when GAP is not built with Julia), though I hope @ChrisJefferson will chime in just because he worked on that code extensively a couple years ago and might remember what the reason for it was, if any Finally replacing the GAP iostream code by a Julia implementation shouldn't be that hard in principle (I expect that the most annoying bit will be handling errors, converting between Julia and GAP errors... we'll see) |
This seems to work with julia master on macos.
Note: this PR targets the branch from oscar-system#726 and I merged some parts of master.