This repository has been archived by the owner on Dec 1, 2021. It is now read-only.

Adding --gpu={gpu index} flag to docker option in tutorials. #1212

Merged
merged 10 commits into master from adding_gpu_flag_to_docs
Sep 23, 2020

Conversation

ananno
Contributor

@ananno ananno commented Sep 16, 2020

What this patch does to fix the issue.

Adds the GPU flag to the docker commands in the docs so that the container can use the GPU. Without this, the instructions in the tutorials are incomplete and will not use the GPU.

Link to any relevant issues or pull requests.

@blueoil-butler blueoil-butler bot added the CI: auto-run Run CI automatically label Sep 16, 2020
@bo-code-review-bot

This PR needs Approvals as follows.

  • Ownership Approval for / from iizukak, tkng, ruimashita

Please choose reviewers and request reviews!

Click to see how to approve each review

You can approve this PR with trigger comments as follows.

  • Approve all reviews requested to you (readability and ownership) and LGTM review
    Approval, LGTM

  • Approve all ownership reviews
    Ownership Approval or OA

  • Approve all readability reviews
    Readability Approval or RA

  • Approve specified review targets

    • Example of Ownership Reviewer of /: Ownership Approval for / or OA for /
    • Example of Readability Reviewer of Python: Readability Approval for Python or RA for Python
  • Approve LGTM review
    LGTM

See all trigger comments

Please replace [Target] to review target

  • Ownership Approval
    • Ownership Approval for [Target]
    • OA for [Target]
    • Ownership Approval
    • OA
    • Approval
  • Readability Approval
    • Readability Approval for [Target]
    • RA for [Target]
    • [Target] Readability Approval
    • [Target] RA
    • Readability Approval
    • RA
    • Approval
  • LGTM
    • LGTM
    • lgtm

@CLAassistant

CLAassistant commented Sep 16, 2020

CLA assistant check
All committers have signed the CLA.

@ananno ananno self-assigned this Sep 16, 2020
@ananno ananno requested a review from suttang September 16, 2020 01:47
@ananno ananno requested a review from iizukak September 16, 2020 01:47
@iizukak
Member

iizukak commented Sep 16, 2020

@ananno
Sorry, I don't fully understand. Is CUDA_VISIBLE_DEVICES not enough for now?
If CUDA_VISIBLE_DEVICES is not working and --gpus is available, please remove the description about CUDA_VISIBLE_DEVICES.

@ananno
Contributor Author

ananno commented Sep 16, 2020

> @ananno
> Sorry, I don't fully understand. Is CUDA_VISIBLE_DEVICES not enough for now?
> If CUDA_VISIBLE_DEVICES is not working and --gpus is available, please remove the description about CUDA_VISIBLE_DEVICES.

@iizukak San:
CUDA_VISIBLE_DEVICES is an environment variable that restricts TensorFlow to particular GPU device(s). The `--gpus` flag, on the other hand, tells Docker how many GPU device(s) the container may use. CUDA_VISIBLE_DEVICES alone is not enough: without `--gpus`, Docker does not initialize CUDA at all.

I've tested different combinations of these two parameters. The GPU list is first filtered by CUDA_VISIBLE_DEVICES to select the allowed devices; Docker then selects GPU(s) from that filtered list according to `--gpus={count}`. Check the following combinations of GPU index used on lmsrv05 with 2 TitanX GPUs:

| CUDA_VISIBLE_DEVICES | --gpus | Training on GPU |
|----------------------|--------|-----------------|
| 0                    | 0      | Success         |
| 1                    | 0      | Success         |
| 0                    | 1      | Failed          |
| 1                    | 1      | Failed          |
| 0,1                  | 0      | Success         |
| 0,1                  | 1      | Success         |
| 0,1                  | 2      | Success         |
| Not provided         | 0      | Success         |
| Not provided         | 1      | Success         |
| Not provided         | 2      | Success         |

Not providing CUDA_VISIBLE_DEVICES means all available GPUs are visible.
`--gpus=0` means all available GPUs filtered by CUDA_VISIBLE_DEVICES.
Providing the number of GPUs in the `--gpus={}` parameter is required.

So it seems the GPU list is first filtered by CUDA_VISIBLE_DEVICES, and Docker then uses the given number of GPUs from that filtered list.
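As a concrete sketch of one "Success" row from the table (CUDA_VISIBLE_DEVICES not provided, `--gpus=2`), the tutorial's docker command would look roughly like this; the mounts, image name, and `{TAG}` follow the tutorial's existing pattern and are illustrative here:

```shell
# Sketch of one "Success" row: CUDA_VISIBLE_DEVICES is unset, so all host
# GPUs are visible, and Docker exposes 2 of them to the container.
# Mounts/image follow the tutorial's pattern; replace {TAG} as usual.
docker run --rm \
  --gpus=2 \
  -v $(pwd)/cifar:/home/blueoil/cifar \
  -v $(pwd)/config:/home/blueoil/config \
  -v $(pwd)/saved:/home/blueoil/saved \
  blueoil_$(id -un):{TAG} \
  blueoil train -c config/cifar10_test.py
```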

@suttang suttang requested a review from hadusam September 16, 2020 03:33
@suttang
Contributor

suttang commented Sep 16, 2020

@hadusam As I am unable to judge the validity of this content, could you please confirm it when you have time?

@joelN123
Contributor

joelN123 commented Sep 16, 2020

Oh, it's interesting. So it seems that --gpus=0 together with not providing CUDA_VISIBLE_DEVICES will mean that gpu1 is left free? So I think that's what @iizukak means: maybe the CUDA_VISIBLE_DEVICES option is actually not necessary?
p.s. maybe the --gpus flag is the newer version of --runtime=nvidia?
edit: oh, maybe --gpus device=0,1 is closer to the CUDA_VISIBLE_DEVICES="0,1" usage

@iizukak
Member

iizukak commented Sep 16, 2020

@ananno
I'm confused...
Why did training on the GPU succeed with --gpus=0? Doesn't 0 mean don't use any GPU?
And why did it fail with the --gpus=1 and CUDA_VISIBLE_DEVICES=0 or 1 pattern?

@ananno
Contributor Author

ananno commented Sep 16, 2020

> oh, it's interesting. So, it seems like --gpus=0 and not providing CUDA_VISIBLE_DEVICES will mean that gpu1 will be left free? So, I think that's what @iizukak means, that maybe the CUDA_VISIBLE_DEVICES option is actually not necessary?
> p.s. the --gpus flag is the newer version of --runtime=nvidia maybe?
> edit: oh, maybe --gpus device=0,1 is more similar to the CUDA_VISIBLE_DEVICES="0,1" usage

@joelN123 San @iizukak San:
I've tested it: --gpus=0 means all available GPUs. In this test I didn't use CUDA_VISIBLE_DEVICES, which by default exposes all available GPUs to TensorFlow. With --gpus=0, Docker runs training on all available GPUs; the effect is the same as not using CUDA_VISIBLE_DEVICES.

And to answer your question: if you do not use CUDA_VISIBLE_DEVICES on a multi-GPU server, a TensorFlow process is created on all the GPUs, each holding ~150MB of GPU memory. To avoid this and restrict the TensorFlow process to one or more specific GPUs, you have to use CUDA_VISIBLE_DEVICES. If you don't want to use the CUDA_VISIBLE_DEVICES env variable, you can use --gpus device=1 to select specific devices and restrict the TensorFlow process to GPU 1 only.
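The `--gpus device=…` alternative described above can be sketched like this (device index 1 is just an example; as noted elsewhere in this thread, some Docker versions need extra quoting, e.g. `--gpus '"device=1"'`):

```shell
# Expose only host GPU 1 to the container; no CUDA_VISIBLE_DEVICES needed,
# so TensorFlow inside the container sees exactly one GPU.
# Mounts/image follow the tutorial's pattern; replace {TAG} as usual.
docker run --rm \
  --gpus device=1 \
  -v $(pwd)/cifar:/home/blueoil/cifar \
  -v $(pwd)/config:/home/blueoil/config \
  -v $(pwd)/saved:/home/blueoil/saved \
  blueoil_$(id -un):{TAG} \
  blueoil train -c config/cifar10_test.py
```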

@joelN123
Contributor

joelN123 commented Sep 16, 2020

understood ! thank you. In that case, it might be good to have --gpus device=<gpu indices> in the documentation, and then can remove the CUDA_VISIBLE_DEVICES environment variable from the documentation? Maybe it's down to preference. Anyway, I think that is what I will be doing from now on, when running docker on gpu.

@ananno
Contributor Author

ananno commented Sep 16, 2020

> understood ! thank you. In that case, it might be good to have --gpus device=<gpu indices> in the documentation, and then can remove the CUDA_VISIBLE_DEVICES environment variable from the documentation? Maybe it's down to preference. Anyway, I think that is what I will be doing from now on, when running docker on gpu.

@joelN123 San:
I'm doing the same. I agree: let's remove CUDA_VISIBLE_DEVICES altogether and use the `--gpus device=<gpu indices>` option.

Contributor

@hadusam hadusam left a comment


@ananno
Thanks!
It looks good to me to use `--gpus` with `device=` instead of CUDA_VISIBLE_DEVICES.
I left some comments, please check them!

Comment on lines 119 to 120
> For old docker versions use `--runtime=nvidia` with `-e CUDA_VISIBLE_DEVICES=<gpu indices>` environment variable
> instead of `--gpu device=<gpu indices>`.
Contributor

Thanks, but I think we don't need these lines because our supported docker version is Docker >= 19.03.
https://docs.blue-oil.org/install/install.html#prerequisites

Contributor Author

OK. I'll remove it. Thanks for the review.

-v $(pwd)/cifar:/home/blueoil/cifar \
-v $(pwd)/config:/home/blueoil/config \
-v $(pwd)/saved:/home/blueoil/saved \
blueoil_$(id -un):{TAG} \
blueoil train -c config/cifar10_test.py

Just like init, set the value of `{TAG}` to the value obtained by `docker images`.
Change the value of `CUDA_VISIBLE_DEVICES` according to your environment.
Change the value of `--gpu device=<gpu indices>` according to your environment.
Contributor

there are some typos: --gpu -> --gpus, please check them. 🙇

Contributor Author

Done


@@ -106,15 +106,18 @@ If configuration finishes, the configuration file is generated in the `cifar10_t
Train your model by running `blueoil train` with model configuration.

$ docker run --rm \
-e CUDA_VISIBLE_DEVICES=0 \
--gpus device=0 \
Contributor

In my environment (Docker v19.03.7), this doesn't work.
I think we need to use --gpus '"device=0"'. It's not beautiful though... 😭
See: NVIDIA/nvidia-docker#1257
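For reference, the extra quoting keeps the inner double quotes intact so Docker receives the literal string `"device=0"` instead of a bare count; a sketch of the resulting command (image/tag follow the tutorial's pattern):

```shell
# The outer single quotes protect the inner double quotes from the shell;
# Docker then parses "device=0" as a device request rather than a GPU count.
docker run --rm \
  --gpus '"device=0"' \
  blueoil_$(id -un):{TAG} \
  blueoil train -c config/cifar10_test.py
```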

Contributor Author

@ananno ananno Sep 16, 2020

OK. I'm changing it.

Contributor Author

Done

@ananno ananno requested a review from hadusam September 16, 2020 04:53
@suttang suttang removed their request for review September 16, 2020 04:55
Contributor

@hadusam hadusam left a comment

@ananno
LGTM!

Could you please fix the title of this PR to use the correct option? The title will be used as the commit message of this PR.
Thanks for the nice PR!

Member

@iizukak iizukak left a comment

@ananno @joelN123
Thanks. Now I understand.
OA

@iizukak
Member

iizukak commented Sep 23, 2020

/ready

@bo-mergebot
Contributor

bo-mergebot bot commented Sep 23, 2020

⏳Merge job is queued...

@bo-mergebot bo-mergebot bot merged commit 2c78ec5 into master Sep 23, 2020
@bo-mergebot bo-mergebot bot deleted the adding_gpu_flag_to_docs branch September 23, 2020 01:34