SSH Cluster
BatchJobs allows the construction of a cluster of machines which are not managed by a true batch system like TORQUE, but are instead only accessible via SSH login. As configuring the package works a bit differently in this case and you might wonder how jobs are scheduled to nodes and so on, we will cover all necessary details on this page and also present some convenience functions to manage and query such a cluster. The prerequisites are:
- All machines must run Linux or a Unix-like operating system. Mac OS is fine, too.
- All machines (head and compute nodes) need a shared file system, e.g. a NFS mount.
- BatchJobs (including all dependencies) must be installed on all machines. The package version must be identical.
- You must configure the machines so that passwordless SSH authentication works. Some documentation is provided in the Arch Linux Wiki.
Assume you have three compute nodes named tukey, rao and kendall each having 6 cpu cores. In your configuration file you would now set up your cluster functions as follows:
cluster.functions = makeClusterFunctionsSSH(
  makeSSHWorker(nodename="tukey"),
  makeSSHWorker(nodename="rao"),
  makeSSHWorker(nodename="kendall")
)
staged.queries = TRUE
Basically, that's all you have to do to get started if R is on your default PATH. You can now use BatchJobs and BatchExperiments to compute on these machines. BatchJobs will auto-detect the number of cores on each machine and start your jobs via `R CMD BATCH`, which is issued through an `ssh` command. But let's look at a few more details.
Let's make it a bit more complicated and assume that on tukey and rao R is installed under "/opt/R/R-current" but on kendall R is installed under "/usr/local/R/".
To cope with such an installation you would use the `rhome` argument of `makeSSHWorker`:
cluster.functions = makeClusterFunctionsSSH(
  makeSSHWorker(nodename="tukey", rhome="/opt/R/R-current"),
  makeSSHWorker(nodename="rao", rhome="/opt/R/R-current"),
  makeSSHWorker(nodename="kendall", rhome="/usr/local/R")
)
staged.queries = TRUE
If you do not configure `rhome`, as in the first example, the R installation found on your PATH is used.
In some cases you might also want to control how R is invoked on the nodes when jobs are run. The default flags are `--no-save --no-restore --no-init-file --no-site-file`. If you want to change this, use the `r.options` argument of `makeSSHWorker`.
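For example, a minimal sketch that overrides the defaults (the chosen flags are only an illustration):
# Sketch: pass custom command line flags to R CMD BATCH on this node
makeSSHWorker(nodename="tukey",
  r.options=c("--no-save", "--no-restore", "--no-init-file", "--no-site-file", "--max-ppsize=100000"))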
The number of CPU cores per node is auto-detected via /proc/cpuinfo. But this takes a little bit of time each time the configuration is sourced (e.g., at package start-up), so you might want to set it manually, as you usually know the number of cores in advance:
makeSSHWorker(nodename="tukey", ncpus=6)
This also allows you to restrict the number of cores used for parallel computation, but a few more sophisticated options for such resource management are available.
In most environments you are not the only user allowed to compute on the configured nodes. Competing users might start lengthy R processes or other jobs at any point in time. We should make it clear that BatchJobs in SSH mode will not give you the full, reasonable feature set that a true batch system would provide in such scenarios, and we will not re-implement it for the SSH mode. If you need fine-grained control over resource allocation and job scheduling, consider installing a job scheduler such as SLURM.
Not all is lost, however: there are still plenty of options that give you some control over your SSH slaves. These ensure that you do not over-allocate your nodes with jobs, and we can also be nice and conservatively leave a few computational resources for our competing colleagues.
Here are the resource management options in detail that you can set individually for each SSH worker:
- You can set `max.load`. If the load of the node during the last 5 minutes is this value or higher, we consider it occupied and will not schedule a job to it. The default value is `ncpus - 1`. If you have the machine to yourself and want to be a bit more aggressive, you could set it to `ncpus - 0.5`.
- You can set a maximum number of jobs per registry via `max.jobs`. If you set this to 3, no more than 3 jobs of your current registry will ever be scheduled to that worker concurrently. The default value is `ncpus`.
- You can adjust the process priority by "nicing" your jobs invoked by `R CMD BATCH`. The Linux command `nice` is used for this. Niceness ranges from -20 (most favorable scheduling) to 19 (least favorable). The default is not to call `nice`.
- The running R jobs on each node are monitored. If more than 3 x `ncpus` R jobs are running, no job is submitted to this node (currently hard-wired). We also count the number of "expensive" R jobs, defined as jobs which currently create more than 50% load. If the number of these jobs is `ncpus` or more, we also consider the node completely occupied and do not schedule any further jobs on it until the load drops.
Let's assume that kendall should not be used extensively because somebody else wants to use the machine at the same time:
makeSSHWorker(nodename="kendall", ncpus=6, max.load=4, max.jobs=2)
This makes sure that we never push the load on this node above 4 and that at most 2 of our jobs run on it concurrently.
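If you additionally want to lower the scheduling priority of those jobs, here is a minimal sketch using the `nice` mechanism described above (the argument value is only an illustration; check ?makeSSHWorker on your installation):
# Sketch: run jobs on kendall with reduced priority (niceness 19)
makeSSHWorker(nodename="kendall", ncpus=6, max.load=4, max.jobs=2, nice=19)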
Our implemented SSH scheduler works in the following way: for each job that is submitted via `submitJobs`, the scheduler figures out which nodes are currently available. This means:
- The load must be less than `max.load`.
- The number of jobs from our registry on this node must be less than `max.jobs`.
- The number of "expensive" R processes of any user in total on this node must be less than `ncpus`.
From the available workers one is then selected randomly (workers with a lower load are selected with higher probability) and the job is started. If no worker is currently available, the waiting mechanism of `submitJobs` is invoked and after some seconds the scheduler tries to submit again.
If you run into problems on your system you have a couple of debugging options. Here are a few hints to get you started:
- Run the function `debugSSH(nodename, rhome, ...)` for a node (a short sketch follows below). The function will perform various tests on this node using the passed worker options. Note that this function does not access or use the information specified for your cluster functions in your configuration. Once you have passed all tests and thereby found working arguments for your worker, you should transfer the required options to your configuration file. If you run into errors and cannot figure out where the problem lies yourself, you should contact us.
- With a valid config file for SSH mode, you can set the option `debug=TRUE` and then run a simple test, like submitting one trivial job. This will output all issued system commands and their results on your R console, so you (or we) can inspect them.
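A minimal sketch of such a debugging call (the rhome value is only an illustration; pass the worker options you actually intend to use):
library(BatchJobs)
# Run connectivity and setup tests for a single node with trial worker options
debugSSH(nodename="kendall", rhome="/usr/local/R")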
We have created a couple of convenience functions in BatchJobs to make routine tasks on SSH clusters a bit simpler.
Maybe you are unsure how R is actually set up on some nodes and want to check that. You could then simply do
getSSHWorkersInfo(c("tukey", "rao"))
Calling function on: tukey,rao.
Node: tukey
R version 2.15.2 (2012-10-26)
Platform: x86_64-unknown-linux-gnu (64-bit)
R Home: /opt/R/R-2.15.2/lib64/R
First lib path: /home/bischl/R/x86_64-unknown-linux-gnu-library/2.15
Node: rao
R version 2.15.2 (2012-10-26)
Platform: x86_64-unknown-linux-gnu (64-bit)
R Home: /opt/R/R-2.15.2/lib64/R
First lib path: /home/bischl/R/x86_64-unknown-linux-gnu-library/2.15
and have a look at R versions, installation paths and lib paths. The function will display a warning if the first lib path on the worker is not writable, as this indicates potential problems in the node's configuration and `installPackagesOnSSHWorkers` will not work.
Installing packages is equally simple for the whole cluster or a subset of machines:
installPackagesOnSSHWorkers(nodenames=c("tukey", "rao"), pkgs="BBmisc")
The command above will successively log into each machine and simply call `install.packages` on that node. You will see the console output of the package installation on the master.
You could also install the packages in parallel by doing
installPackagesOnSSHWorkers(nodenames=c("tukey", "rao"), pkgs="BBmisc", consecutive=FALSE)
but you will not see any console output during package installation in that case. Also, only do this if you are sure that the package will be installed into different directories on each node and that you are not writing to the same location in parallel.
If you would like to figure out how many computational resources are currently available on your SSH cluster, simply call `showClusterStatus` and have a look at the resulting table:
        ncpus load n.rprocs n.rprocs.50 n.jobs
kendall     6 3.00        8           3      0
rao         6 3.03        8           3      0
tukey       6 9.00        7           5      0
This displays the number of CPUs, running R processes, running R processes with more than 50% load and, if you pass a registry via `showClusterStatus(reg)`, the number of running jobs of this registry for each node.
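For instance, a short sketch (the registry directory is hypothetical; use a registry you created with makeRegistry or loaded with loadRegistry):
reg = loadRegistry("my_registry_dir")   # hypothetical registry directory
showClusterStatus(reg)                  # n.jobs now counts running jobs of this registry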
Calling arbitrary functions on your SSH workers can also be done:
callFunctionOnSSHWorkers(c("tukey", "rao"), fun=sessionInfo, simplify=FALSE)
This will call the given function with the specified (here: empty) arguments on each node. The execution can be performed in parallel or consecutively, and in the latter case console output can be displayed on the master. Only use this for short administrative tasks; real computation should be done with the usual registry / mapping mechanism from BatchJobs! If you find yourself using this facility for day-to-day system administration tasks, look at better options such as [Cluster SSH](http://sourceforge.net/apps/mediawiki/clusterssh/index.php?title=Main_Page) or Fabric.
There are only two minor issues specific to SSH clusters that should be mentioned:
- If at some point your cluster is completely occupied, the job submission procedure will go to sleep. You will sometimes see something like this in the status bar: `Status: 1, zzz=10.0s msg=Workers busy: LLJ`. This just gives you a small hint WHY your machines are occupied. There is a single character for each node (in the same order as you specified them in the config), and the meaning of the characters is:
  - L: Load too high, more than `max.load`.
  - J: `max.jobs` from the registry already running.
  - R: Too many expensive R processes running (across all users on that node).
  - r: Too many R processes (> 3 x `ncpus`).
- If you have many jobs that you cannot schedule all at once, it can be nice to combine this waiting mechanism with a terminal multiplexer such as screen or tmux. Open up R inside a multiplexed session, submit the jobs (maybe configuring the waiting mechanism so it retries to submit very often), and detach. Jobs will then be continuously submitted to the workers as they become available, even over a long period of time. A sketch follows below.
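A minimal sketch of such a patient submission call (the wait function and retry count are only illustrations; see ?submitJobs for the exact arguments of your version):
library(BatchJobs)
reg = loadRegistry("my_registry_dir")   # hypothetical registry directory
# Retry roughly every 60 seconds, essentially forever, so jobs trickle onto free workers
submitJobs(reg, wait=function(retries) 60, max.retries=10000)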