[doc] launcher
As discussed in deepspeedai#662, this PR modifies the doc:
* explains what to use instead of CUDA_VISIBLE_DEVICES
* puts the `--hostfile` command-line arg in the correct place in the example invocation

Fixes: deepspeedai#662
stas00 authored Mar 18, 2021
1 parent 68c8481 commit 546c53a
Showing 1 changed file with 9 additions and 2 deletions.
11 changes: 9 additions & 2 deletions docs/_tutorials/getting-started.md
@@ -186,8 +186,8 @@ slots available.
 The following command launches a PyTorch training job across all available nodes and GPUs
 specified in `myhostfile`:
 ```bash
-deepspeed <client_entry.py> <client args> \
-  --deepspeed --deepspeed_config ds_config.json --hostfile=myhostfile
+deepspeed --hostfile=myhostfile <client_entry.py> <client args> \
+  --deepspeed --deepspeed_config ds_config.json
 ```

 Alternatively, DeepSpeed allows you to restrict distributed training of your model to a
@@ -264,3 +264,10 @@ not detected or passed in then DeepSpeed will query the number of GPUs on the
 local machine to discover the number of slots available. The `--include` and
 `--exclude` arguments work as normal, but the user should specify 'localhost'
 as the hostname.
+
+Also note that `CUDA_VISIBLE_DEVICES` can't be used with DeepSpeed to control
+which devices should be used. For example, to use only gpu1 of the current
+node, do:
+```bash
+deepspeed --include localhost:1 ...
+```
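
For reference, a hostfile such as the `myhostfile` passed above is a plain text file listing one worker per line with its available GPU slot count; the hostnames below are illustrative:

```
worker-1 slots=4
worker-2 slots=4
```

The `--include`/`--exclude` filters use the same `host:gpu-index` notation as the `localhost:1` example in the diff; for instance, `deepspeed --include="worker-1:0,1" ...` restricts the job to GPUs 0 and 1 of worker-1.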
