Unevaluated promise inflates data sent to workers #279

Closed
6 tasks done
wlandau opened this issue Jan 21, 2021 · 0 comments

wlandau commented Jan 21, 2021

Prework

  • Read and agree to the code of conduct and contributing guidelines.
  • If there is already a relevant issue, whether open or closed, comment on the existing thread instead of posting a new issue.
  • Post a minimal reproducible example like this one so the maintainer can troubleshoot the problems you identify. A reproducible example is:
    • Runnable: post enough R code and data so any onlooker can create the error on their own computer.
    • Minimal: reduce runtime wherever possible and remove complicated details that are irrelevant to the issue at hand.
    • Readable: format your code according to the tidyverse style guide.

Description

In this example on an SGE cluster, the targets deploy really slowly.

# _targets.R
library(targets)
library(tarchetypes)
options(clustermq.scheduler = "sge")
options(clustermq.template = "cmq.tmpl")
options(crayon.enabled = FALSE)
tar_rep(x, 0, batches = 1000, reps = 1)

# cmq.tmpl
#$ -N {{ job_name }}
#$ -t 1-{{ n_jobs }}
#$ -j y
#$ -cwd
#$ -V
module load R/4.0.3
CMQ_AUTH={{ auth }} R --no-save --no-restore -e 'clustermq:::worker("{{ master }}")'

The profiling study took several minutes.

px <- proffer::pprof(tar_make_clustermq(workers = 10, callr_function = NULL), host = "0.0.0.0")

I saw this flamegraph:

[Image: flamegraph screenshot, 2021-01-21 12:39:01 PM]

It tells me exactly where the bottleneck is:

self$crew$send_call(
  expr = target_run_worker(target),
  env = list(target = target)
)

And sure enough, when I changed only retrieval to "worker", everything went much faster.

# _targets.R
library(targets)
library(tarchetypes)
tar_option_set(retrieval = "worker") # worker retrieval
options(clustermq.scheduler = "sge")
options(clustermq.template = "cmq.tmpl")
options(crayon.enabled = FALSE)
tar_rep(x, 0, batches = 1000, reps = 1)

[Image: screenshot, 2021-01-21 12:44:37 PM]

Solution

Apparently, all I need to do is force() all the pre-loaded objects in the subpipeline. It looks like there is an unevaluated promise object that is consuming too much memory. Here I am debugging this example at $run_worker():

> tar_make_clustermq(callr_function = NULL)
● run target x_batch
● run branch x_be02823c
Called from: self$run_worker(target)
Browse[1]> object_size(target) # promise object
9.32 MB # way too big.
Browse[1]> tmp <- force(target$subpipeline$targets$x_batch_084b9b29$value$object)
Browse[1]> object_size(target) # evaluated object
193 kB # much better
Browse[1]> 
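
For anyone who wants to see the effect outside of targets, here is a minimal base-R sketch of how an unevaluated promise can inflate a serialized payload. It is not the targets internals: `big_env`, `holder`, `payload_size()`, and `force_bindings()` are hypothetical names made up for this illustration, and the serialized length is only a rough proxy for what gets sent to a worker.

```r
# Hypothetical stand-ins: `big_env` plays the role of a frame that holds
# large data, `holder` the object that would be shipped to a worker.
big_env <- new.env(parent = baseenv())
big_env$junk <- numeric(1e6) # about 8 MB of doubles

holder <- new.env(parent = emptyenv())
# Create a lazy binding: `object` is a promise that remembers `big_env`.
delayedAssign("object", 1 + 1, eval.env = big_env, assign.env = holder)

# Serialized length in bytes, a rough proxy for the data sent to a worker.
payload_size <- function(x) length(serialize(x, connection = NULL))

# The unevaluated promise drags `big_env` (and its vector) along.
size_promise <- payload_size(holder)

# Touch every binding so each promise is evaluated.
force_bindings <- function(env) {
  for (name in ls(env, all.names = TRUE)) get(name, envir = env)
  invisible(env)
}
force_bindings(holder)

# An evaluated promise no longer references the environment it came from.
size_forced <- payload_size(holder)

cat("before:", size_promise, "bytes; after:", size_forced, "bytes\n")
```

The same idea should apply to the pre-loaded objects in the subpipeline: evaluate them on the main process before the target is serialized, so the workers receive values rather than promises with their environments attached.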
wlandau self-assigned this Jan 21, 2021
wlandau changed the title from "Strange memory issue sending dependencies to workers" to "Unevaluated promise inflates data sent to workers" Jan 29, 2021