Receiving data after shutting down workers results in a segfault #306
@mschubert, you had suggested in #303 (comment) that I post a new issue to follow up on specific problems using #303, so I hope this helps.
I just tried @wlandau's example (I added a message() before the send shutdown for clarity) on Windows, and it deterministically dies with an assertion failure:
This is on R 4.3.0, see sessionInfo()
You seem to have a bug in your example code, where you keep trying to receive data from workers after they are all shut down (loop goes to 100, tasks go to 10). Minimal code to reproduce the same behavior:
options(clustermq.scheduler = "multiprocess")
library(clustermq)
w <- workers(1L, log_worker = TRUE)
w$recv()
w$send_shutdown()
w$recv() # invalid vector index
However, this should throw an error in R, not crash the session.
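To illustrate the two points above (a loop bound that matches the number of tasks, and a late receive that surfaces as an R error rather than a crash), here is a minimal sketch in plain R. It makes no clustermq calls: `mock_pool` and its `send`, `recv`, `shutdown`, and `active` functions are hypothetical stand-ins invented for this example, so it shows the intended control flow only and does not reproduce the segfault.

```r
# Hypothetical stand-in for a worker pool; this is NOT the clustermq API.
mock_pool <- function(n_workers) {
  active <- n_workers
  pending <- integer(0L)  # tasks sent but not yet returned
  list(
    send = function(task) pending <<- c(pending, task),
    recv = function() {
      # Receiving with no live workers is reported as an ordinary R error.
      if (active == 0L) stop("no active workers left to receive from")
      if (length(pending) == 0L) return(NULL)
      task <- pending[1L]
      pending <<- pending[-1L]
      task
    },
    shutdown = function() active <<- max(active - 1L, 0L),
    active = function() active
  )
}

n_tasks <- 10L
pool <- mock_pool(2L)
queue <- seq_len(n_tasks)
done <- integer(0L)

# The bound matches the number of tasks, so the loop ends once every
# result is in instead of polling workers that are already shut down.
while (length(done) < n_tasks) {
  if (length(queue) > 0L) {
    pool$send(queue[1L])
    queue <- queue[-1L]
  }
  result <- pool$recv()
  if (!is.null(result)) done <- c(done, result)
}

# Shut down only after all results have arrived.
while (pool$active() > 0L) pool$shutdown()

# A further receive is a catchable error in this mock.
late <- tryCatch(pool$recv(), error = function(e) conditionMessage(e))
```

In this mock, the extra `recv()` after shutdown is caught by `tryCatch()`, which is the behavior requested above for the real package.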
Hmm... I tried to fix the original example to avoid calling a shutdown too many times:
options(clustermq.scheduler = "multiprocess")
library(clustermq)
w <- workers(2L, log_worker = TRUE)
active <- 2L
queue <- seq_len(10L)
running <- integer(0L)
done <- integer(0L)
while (length(done) < 100L) {
  result <- w$recv()
  if (!is.null(result)) {
    message("done task ", result)
    done <- c(done, result)
    running <- setdiff(running, result)
  }
  if (length(running) < 2L && length(queue) > 0L) {
    next_task <- queue[1L]
    message("send task ", next_task)
    queue <- queue[-1L]
    running <- c(running, next_task)
    w$send(cmd = index, index = next_task)
  } else if (length(queue) > 0L) {
    w$send_wait()
  } else if (active > 0L) {
    w$send_shutdown()
    active <- active - 1L
  }
}
It hung for several minutes without printing any messages to the R console. The log files show:
and
I used the CRAN version because I could not compile the development version.
remotes::install_github("mschubert/clustermq")
Using github PAT from envvar GITHUB_PAT
Downloading GitHub repo mschubert/clustermq@HEAD
'/usr/bin/git' clone --depth 1 --no-hardlinks --recurse-submodules https://github.com/zeromq/libzmq.git /var/folders/4v/vh7xp8553lsbl49svl48g7p00000gp/T//RtmpGZmLLp/remotes16e914d76c6fb/mschubert-clustermq-ed2bf6e/src/libzmq
Cloning into '/var/folders/4v/vh7xp8553lsbl49svl48g7p00000gp/T//RtmpGZmLLp/remotes16e914d76c6fb/mschubert-clustermq-ed2bf6e/src/libzmq'...
'/usr/bin/git' clone --depth 1 --no-hardlinks --recurse-submodules https://github.com/zeromq/cppzmq.git /var/folders/4v/vh7xp8553lsbl49svl48g7p00000gp/T//RtmpGZmLLp/remotes16e914d76c6fb/mschubert-clustermq-ed2bf6e/src/cppzmq
Cloning into '/var/folders/4v/vh7xp8553lsbl49svl48g7p00000gp/T//RtmpGZmLLp/remotes16e914d76c6fb/mschubert-clustermq-ed2bf6e/src/cppzmq'...
── R CMD build ──────────────────────────────────────────────────────────────────────────
✔ checking for file ‘/private/var/folders/4v/vh7xp8553lsbl49svl48g7p00000gp/T/RtmpGZmLLp/remotes16e914d76c6fb/mschubert-clustermq-ed2bf6e/DESCRIPTION’ ...
─ preparing ‘clustermq’: (1.1s)
✔ checking DESCRIPTION meta-information ...
─ cleaning src
─ running ‘cleanup’
─ checking for LF line-endings in source and make files and shell scripts (1.1s)
─ checking for empty or unneeded directories (2s)
Removed empty directory ‘clustermq/src/libzmq/build_qnx/nto/aarch64/le’
Removed empty directory ‘clustermq/src/libzmq/build_qnx/nto/aarch64’
Removed empty directory ‘clustermq/src/libzmq/build_qnx/nto/x86_64/o’
Removed empty directory ‘clustermq/src/libzmq/build_qnx/nto/x86_64’
Removed empty directory ‘clustermq/src/libzmq/build_qnx/nto’
Removed empty directory ‘clustermq/src/libzmq/builds/openwrt’
─ building ‘clustermq_0.9.0.tar.gz’
* installing *source* package ‘clustermq’ ...
** using staged installation
* no system libzmq found -> using bundled libzmq
autoreconf: export WARNINGS=
autoreconf: Entering directory '.'
autoreconf: configure.ac: not using Gettext
autoreconf: running: aclocal -I config --force -I config
m4:configure.ac:9: ERROR: end of file in string
autom4te: error: /opt/homebrew/opt/m4/bin/m4 failed with exit status: 1
aclocal: error: /opt/homebrew/Cellar/autoconf/2.71/bin/autom4te failed with exit status: 1
autoreconf: error: aclocal failed with exit status: 1
autogen.sh: error: autoreconf exited with status 1
./configure: line 61: die: command not found
./configure: line 64: ./configure: No such file or directory
make: *** No targets specified and no makefile found. Stop.
ERROR: configuration failed for package ‘clustermq’
* removing ‘/Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library/clustermq’
* restoring previous ‘/Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library/clustermq’
Warning messages:
1: In utils::install.packages(pkgs = pkgs, lib = lib, repos = myrepos, :
installation of package ‘/var/folders/4v/vh7xp8553lsbl49svl48g7p00000gp/T//RtmpGZmLLp/file16e913101b078/clustermq_0.9.0.tar.gz’ had non-zero exit status
2: In utils::install.packages(pkgs = pkgs, lib = lib, repos = myrepos, :
installation of package ‘/var/folders/4v/vh7xp8553lsbl49svl48g7p00000gp/T//RtmpGZmLLp/file16e913101b078/clustermq_0.9.0.tar.gz’ had non-zero exit status
Thanks for working on this. I got a similar compilation error:
> remotes::install_github("mschubert/clustermq", ref = "5612364c52f17ba98b241a3f1f7e067c02bad3fe")
Using github PAT from envvar GITHUB_PAT
Downloading GitHub repo mschubert/clustermq@5612364c52f17ba98b241a3f1f7e067c02bad3fe
'/usr/bin/git' clone --depth 1 --no-hardlinks --recurse-submodules https://github.com/zeromq/libzmq.git /var/folders/4v/vh7xp8553lsbl49svl48g7p00000gp/T//Rtmp0lFXsv/remotesdea51cd77bdf/mschubert-clustermq-5612364/src/libzmq
Cloning into '/var/folders/4v/vh7xp8553lsbl49svl48g7p00000gp/T//Rtmp0lFXsv/remotesdea51cd77bdf/mschubert-clustermq-5612364/src/libzmq'...
'/usr/bin/git' clone --depth 1 --no-hardlinks --recurse-submodules https://github.com/zeromq/cppzmq.git /var/folders/4v/vh7xp8553lsbl49svl48g7p00000gp/T//Rtmp0lFXsv/remotesdea51cd77bdf/mschubert-clustermq-5612364/src/cppzmq
Cloning into '/var/folders/4v/vh7xp8553lsbl49svl48g7p00000gp/T//Rtmp0lFXsv/remotesdea51cd77bdf/mschubert-clustermq-5612364/src/cppzmq'...
── R CMD build ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────
✔ checking for file ‘/private/var/folders/4v/vh7xp8553lsbl49svl48g7p00000gp/T/Rtmp0lFXsv/remotesdea51cd77bdf/mschubert-clustermq-5612364/DESCRIPTION’ ...
─ preparing ‘clustermq’: (1.6s)
✔ checking DESCRIPTION meta-information
─ cleaning src
─ running ‘cleanup’
─ checking for LF line-endings in source and make files and shell scripts (549ms)
─ checking for empty or unneeded directories (2.1s)
Removed empty directory ‘clustermq/src/libzmq/build_qnx/nto/aarch64/le’
Removed empty directory ‘clustermq/src/libzmq/build_qnx/nto/aarch64’
Removed empty directory ‘clustermq/src/libzmq/build_qnx/nto/x86_64/o’
Removed empty directory ‘clustermq/src/libzmq/build_qnx/nto/x86_64’
Removed empty directory ‘clustermq/src/libzmq/build_qnx/nto’
Removed empty directory ‘clustermq/src/libzmq/builds/openwrt’
─ building ‘clustermq_0.9.0.tar.gz’
* installing *source* package ‘clustermq’ ...
** using staged installation
sed: include/zmq_utils.h.orig: No such file or directory
autogen.sh: error: could not find autoreconf. autoconf and automake are required to run autogen.sh.
./configure: line 35: die: command not found
./configure: line 38: ./configure: No such file or directory
make: *** No targets specified and no makefile found. Stop.
ERROR: configuration failed for package ‘clustermq’
* removing ‘/Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library/clustermq’
Warning messages:
1: In utils::install.packages(pkgs = pkgs, lib = lib, repos = myrepos, :
installation of package ‘/var/folders/4v/vh7xp8553lsbl49svl48g7p00000gp/T//Rtmp0lFXsv/filedea533c9091/clustermq_0.9.0.tar.gz’ had non-zero exit status
2: In utils::install.packages(pkgs = pkgs, lib = lib, repos = myrepos, :
installation of package ‘/var/folders/4v/vh7xp8553lsbl49svl48g7p00000gp/T//Rtmp0lFXsv/filedea533c9091/clustermq_0.9.0.tar.gz’ had non-zero exit status
I see now: |
I installed |
It's a typo in the |
The following clustermq-only reprex is a simplified version of what targets is trying to do. (I omit w$cleanup() to test the w$send_shutdown().) Not every run segfaults, but many runs do.
On a segfault, the error log of the worker reads:
I am using Ubuntu for this test. (On Mac OS, as I have said, w$recv() hangs in a much simpler example.)