Timeout when initializing SSH tunnel #157

Closed
brendanf opened this issue Jul 8, 2019 · 6 comments

Comments

@brendanf
Contributor

brendanf commented Jul 8, 2019

I'm trying to submit jobs to my cluster (SLURM) via SSH in drake with the clustermq backend. I need to load a conda environment on the cluster before running R. I modified the template to activate the conda environment (and also set the R profile):

"conda activate <my_environment> && \
        cd {{ project_dir | <my_directory> }} && \
        R_PROFILE={{ remote_profile | config/slurm.Rprofile }} R --no-save --no-restore -e \
        'clustermq:::ssh_proxy(ctl={{ ctl_port }}, job={{ job_port }})' \
        > {{ ssh_log | /dev/null }} 2>&1"

This failed with the error message:

Error in .subset2(public_bind_env, "initialize")(...) : Remote R process did not respond after 5 seconds. Check your SSH server log.

Checking the SSH server log as suggested:

R version 3.5.1 (2018-07-02) -- "Feather Spray"
Copyright (C) 2018 The R Foundation for Statistical Computing
Platform: x86_64-conda_cos6-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> clustermq:::ssh_proxy(ctl=54344, job=50552)
master ctl listening at: tcp://localhost:54344
forwarding local network from: tcp://rackham2:8266
sent PROXY_UP to master ctl

I assume this is what the output should look like. My guess is that loading the conda environment and starting R together take longer than 5 seconds, so the initialization times out on the local side. This 5-second timeout appears to be hardcoded. Would it be possible to expose it as an option?
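To make the request concrete, here is a minimal sketch of what "exposing the timeout as an option" could look like, using only base R's `getOption()`. The helper name `ssh_timeout_ms` and the option name `clustermq.ssh.timeout` are assumptions for illustration, not clustermq's confirmed API; the fallback of 5 seconds mirrors the value in the error message above.

```r
# Hypothetical helper, not clustermq's actual code: read a timeout from an
# R option and fall back to the old hardcoded value when the option is unset.
ssh_timeout_ms <- function(default_seconds = 5) {
    # "clustermq.ssh.timeout" is an assumed option name; check the package docs
    seconds <- getOption("clustermq.ssh.timeout", default = default_seconds)
    stopifnot(is.numeric(seconds), length(seconds) == 1, seconds > 0)
    seconds * 1000  # many polling APIs expect milliseconds
}

# Usage: allow 30 seconds for conda activation plus R startup
options(clustermq.ssh.timeout = 30)
ssh_timeout_ms()
```

With a pattern like this, users who do not set the option keep the existing behaviour, while users with slow remote startup can raise the limit without patching the package.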

@brendanf
Contributor Author

brendanf commented Jul 9, 2019

I've tried this a few more times; it sometimes works, suggesting that it really is a timing issue.
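One way to check the timing hypothesis directly would be to measure how long the remote startup takes. The sketch below is only an illustration: `time_command` is a hypothetical helper built on base R's `system.time()` and `system2()`, and the host and environment names in the commented example are placeholders.

```r
# Hypothetical helper: run a command and report its exit status and
# wall-clock time in seconds.
time_command <- function(command, args = character()) {
    elapsed <- system.time(
        status <- system2(command, args, stdout = FALSE, stderr = FALSE)
    )[["elapsed"]]
    list(status = status, elapsed = elapsed)
}

# Example with placeholders: time the same activation + R startup that the
# template performs (assumes non-interactive SSH login is already set up).
# remote_cmd <- "conda activate <my_environment> && R --no-save --no-restore -e 'invisible(1)'"
# time_command("ssh", c("<user@host>", shQuote(remote_cmd)))
```

If the elapsed time is regularly close to or above 5 seconds, that would be consistent with the intermittent failures.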

@mschubert
Owner

Hi @brendanf 👋 ,

Thank you for spotting this!

Your reasoning makes sense: there may just not be enough time to activate an environment and start R within the given timeout.

Do you want to try and send a patch for this?

@brendanf
Contributor Author

OK, coming up. Is there any documentation of package options that I should add this to? I haven't been able to find a single place for them.

@mschubert
Owner

A lot of documentation on the options is still lacking, but I'm holding off on that until I structure them better (0.9, 1.0)

@mschubert
Owner

Thank you for #158, I just merged it.

Does this solve your problem here as well?

@brendanf
Contributor Author

> Does this solve your problem here as well?

Yes, although I added some documentation for the change in #159, and ropensci/drake#933 is still a (separate) issue with the SSH backend.
