[BREAKING] prevent multi-gpu usage #4749
Conversation
@@ -25,7 +25,7 @@ def eprint(*args, **kwargs):
 # reduced to fit onto 1 gpu but still be large
 rows3 = 5000  # small
 rows2 = 4360032  # medium
-rows1 = 42360032  # large
+rows1 = 32360032  # large
The original size crashes on my Titan V, so lowering it a bit.
Compare 62f8718 to 2a55290
rest looks good to me...
include/xgboost/generic_parameters.h
Outdated
@@ -40,7 +40,7 @@ struct GenericParameter : public dmlc::Parameter<GenericParameter> {
       .describe("The primary GPU device ordinal.");
   DMLC_DECLARE_FIELD(n_gpus)
       .set_default(0)
-      .set_lower_bound(-1)
+      .set_lower_bound(0)
I'm hoping no one uses the moving target (the head version) in their testing, as this is a change of behavior for their configuration.
Yes, this is a breaking change if you are using multi-GPU.
Please provide some informative messages instead of just setting the range: say that it is removed in 1.0, what the alternative options are, or point to the related doc page. I think the parameter is quite popular despite its limitations. :-)
The parameter description refers to distributed training with one process per GPU. Do we have a canonical document/tutorial for that? Maybe using dask?
We have a demo, but no official document yet, as the interface is not mature.
But sure, we should mention dask at least. I will work on it after sorting out the current PRs, assuming no one else beats me to it.
Happy to point to it when you have it ready. Do you want me to change the wording in the current PR?
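To make the earlier request for an informative message concrete, here is a rough sketch of how the declaration could carry the deprecation note in its description rather than only tightening the bound; the wording and the use of set_range here are illustrative assumptions, not what the PR actually merged:

  DMLC_DECLARE_FIELD(n_gpus)
      .set_default(0)
      .set_range(0, 1)  // illustrative: reject n_gpus > 1 at configuration time
      .describe("Deprecated. Single-process multi-GPU training is removed in 1.0; "
                "for multi-GPU training, use distributed training with one process "
                "per GPU (see the dask demo).");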
Finally got the CI to pass. Please take another look. @RAMitchell @trivialfis |
LGTM, perhaps slightly change the wording of the error message.
src/learner.cc
Outdated
  generic_param_.n_gpus = 1;
}
if (generic_param_.n_gpus != 1) {
  LOG(FATAL) << "Multi-GPU training is no longer supported. "
I think this error message could mislead a user into thinking multi-GPU training is completely removed. Perhaps it is better to say that multi-GPU training using threads is no longer supported, and that this can now be achieved using distributed training instead.
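For illustration, a reworded check along the lines of that suggestion might look like the sketch below; the exact message text is an assumption, not the final wording merged in this PR:

  if (generic_param_.n_gpus != 1) {
    // Fail fast with a pointer to the supported alternative.
    LOG(FATAL) << "Single-process (threaded) multi-GPU training is no longer "
               << "supported. Use distributed training with one process per "
               << "GPU instead, e.g. via dask; see the distributed training "
               << "documentation and demos.";
  }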
Done.
Please mention distributed training with dask. :)
Reworded the messages, PTAL.
@@ -585,8 +585,9 @@ class LearnerImpl : public Learner {
   generic_param_.n_gpus = 1;
 }
 if (generic_param_.n_gpus != 1) {
Not needed for this PR, but after removing n_gpus, the UseGPU function should follow ;-).
@RAMitchell What do you think? Ready to go?
See RFC #4531
Part of 1.0.0 roadmap #4680
@RAMitchell @trivialfis @sriramch