Q() segfaults on Windows & Linux with multiprocess scheduler with clustermq 0.9.0 #308

Closed
luwidmer opened this issue Sep 27, 2023 · 8 comments

Comments

@luwidmer

luwidmer commented Sep 27, 2023

I also get segfaults with the following even simpler Q() example (simpler than #306), both on Linux and Windows:

options(clustermq.scheduler = "multiprocess")
library(clustermq)
fun <- function(x) {x}

fun(1)
Q(fun = fun, x = 1:1000, n_jobs = 2)
Q(fun = fun, x = 1:1000, n_jobs = 2)

On Linux with R 4.1.0, this results in

Starting 2 processes ...
Running 1,000 calculations (5 objs/19.3 Kb common; 1 calls/chunk) ...
[===================================================>] 100% (2/2 wrk) eta:  0sAssertion failed: check () (src/msg.cpp:387)
Aborted

On Windows with R 4.3.0 this results in the same error as for @wlandau's example in #306:

Starting 2 processes ...
Running 1,000 calculations (5 objs/19.3 Kb common; 1 calls/chunk) ...
[===================================================>] 100% (2/2 wrk) eta:  0sAssertion failed: check () (../zeromq-4.3.4/src/msg.cpp:414)
@luwidmer
Author

luwidmer commented Sep 27, 2023

This seems to be related to worker shutdown, given that it always happens after all jobs have completed. I can also provoke it with the LSF scheduler, but it is harder to reproduce there and needs Q() in a loop.
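
A minimal sketch of what running Q() in a loop could look like for the multiprocess case, assuming the same identity function as in the example above. `stress_q` and its `run_once` argument are hypothetical names used for illustration; they are not part of clustermq. Because the failure is a native assertion that aborts the R process, nothing is caught here: the loop simply ends when the crash occurs.

```r
# Select the scheduler before clustermq is used, as in the original example.
options(clustermq.scheduler = "multiprocess")

# Hypothetical helper (not part of clustermq): repeat a Q() call many times
# so that an intermittent crash at worker shutdown has many chances to occur.
# `run_once` is an assumed hook that defaults to a small Q() call.
stress_q <- function(n_iter = 1000,
                     run_once = function() {
                       clustermq::Q(fun = function(x) x, x = 1:10, n_jobs = 2)
                     }) {
  for (i in seq_len(n_iter)) {
    res <- run_once()
    # Q() returns a list with one element per call; verify the results too.
    stopifnot(identical(unlist(res), 1:10))
  }
  invisible(n_iter)
}
```

Calling `stress_q(1000)` in a fresh session would issue the default Q() call a thousand times; passing a different `run_once` makes it possible to try another scheduler or job size.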

@luwidmer
Author

luwidmer commented Oct 4, 2023

@mschubert are you able to reproduce this as well?

@mschubert
Owner

Yes, I can (occasionally) reproduce it, and I'll try to track it down as soon as possible.

I'm also happy to report that I've got internet again at the place I moved to 😅

@mschubert
Owner

mschubert commented Oct 9, 2023

@luwidmer Can you check if it still occurs with the current git version?

remotes::install_github("mschubert/clustermq@master")

@luwidmer
Author

luwidmer commented Oct 9, 2023

Starting 2 processes ...
Running 1,000 calculations (5 objs/19.3 Kb common; 1 calls/chunk) ...
[===================================================>] 100% (2/2 wrk) eta:  0sAssertion failed: check () (../zeromq-4.3.4/src/msg.cpp:414)

Unfortunately yes (I modified the version number in DESCRIPTION to be 0.9.0.12345 and that version indeed got loaded)

@mschubert
Owner

I fixed another bug in 5612364, which may be the cause of this crash as well. Can you confirm if this now works? (same git install command as above)

@luwidmer
Author

luwidmer commented Oct 10, 2023

I really tried to provoke it with thousands of Q() calls and could not, so the fix seems to have done it. Superb, @mschubert! It might make sense to push this as 0.9.1 if no other big issues pop up?

@mschubert
Owner

mschubert commented Oct 10, 2023

Great, thanks!

Yes, the plan is to push 0.9.1 within the next few days; there are still some other issues to fix.
