
Deterministic cudnn algorithms #11341

Closed
wenyangchu opened this issue Jun 20, 2018 · 10 comments

@wenyangchu
Contributor

Hi all,

I see that other frameworks such as PyTorch provide a flag to force cuDNN to use only deterministic algorithms, especially for convolution and max pooling:

like in pytorch:
example
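For reference, a minimal sketch of how the switch is exposed in PyTorch (using the standard torch.backends.cudnn attributes):

import torch

# Restrict cuDNN to deterministic convolution algorithms
torch.backends.cudnn.deterministic = True
# Disable the autotuner so the same algorithm is selected on every run
torch.backends.cudnn.benchmark = False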

Some references to cudnn:
maxpooling
cudnnConvolutionBackwardData

cudnnConvolutionBwdFilterAlgo_t

I am working on a medical product where reproducibility is a requirement, so this feature is very important for me.

One solution is to have a flag similar to MXNET_CUDNN_AUTOTUNE_DEFAULT.
Is anyone already working on this? I have reviewed the current code and would like to implement it.
Any suggestions or better solutions?
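For context, the existing autotune flag is already driven by an environment variable, so a determinism flag could follow the same pattern. A minimal sketch of how it is set today, using only the currently documented values:

import os
# Documented values today: 0 = no autotuning, 1 = best algo within the workspace limit,
# 2 = fastest algo regardless of workspace. Value 0 avoids run-to-run algorithm variation,
# but it does not by itself guarantee that the chosen algorithm is deterministic.
os.environ["MXNET_CUDNN_AUTOTUNE_DEFAULT"] = "0"
import mxnet as mx  # set the variable before the first import so the backend sees it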

Thanks

@vrakesh
Contributor

vrakesh commented Jun 20, 2018

Thank you for suggesting this, @wenyangchu; we will look into it. @sandeep-krishnamurthy, requesting that this be labeled under CUDA.

@DickJC123
Contributor

I prototyped this functionality last summer but did not PR it. I found at the time that the regression suite would not run if only deterministic algorithms were permitted. I'm not sure if the situation has improved in terms of universal coverage. Some questions to think about:

  1. Assuming an environment variable, should it be MXNET_PREFER_DETERMINISM with a warning issued if no deterministic implementation exists, or MXNET_REQUIRE_DETERMINISM with a fatal error if no deterministic implementation exists? (See the sketch after this list.)
  2. Should the directive be only for cudnn, or for the entire platform?
  3. Should the environment variable set a default, with new parameters added to the operators for individual operator instance control?
  4. What should the priority of determinism be relative to other constraints like "choose fastest algo" or "limit workspace"? [Determinism should win out, I would think]
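To make question 1 concrete, here is a purely hypothetical policy sketch; MXNET_PREFER_DETERMINISM and MXNET_REQUIRE_DETERMINISM are proposed names rather than existing MXNet variables, and check_determinism is an illustrative helper only:

import os
import warnings

def check_determinism(has_deterministic_algo, op_name):
    # Hypothetical policy helper: warn under PREFER, abort under REQUIRE.
    if has_deterministic_algo:
        return True
    if os.environ.get("MXNET_REQUIRE_DETERMINISM", "0") == "1":
        raise RuntimeError("no deterministic algorithm available for " + op_name)
    if os.environ.get("MXNET_PREFER_DETERMINISM", "0") == "1":
        warnings.warn("falling back to a non-deterministic algorithm for " + op_name)
    return False

Under MXNET_PREFER_DETERMINISM the caller would fall back to the fastest available algorithm after the warning; under MXNET_REQUIRE_DETERMINISM operator setup would abort.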

@wenyangchu
Contributor Author

wenyangchu commented Jun 22, 2018

Hi @DickJC123,
Thanks for your reply.
I did an implementation this week due to my urgent need. For now I have put it into a pull request meant for discussion:

#11361
Please check the last 2 commits.

It is meant as a starting point for working out a good solution later on.

For your questions:

  1. If MXNET_PREFER_DETERMINISM is set and no deterministic algorithm can be found, I suppose it has to raise a fatal error, because the user's requirement cannot be satisfied.

  2. I think it is a good idea to have it for the entire platform, but I will try to solve it for cuDNN first, because it is the most widely used backend, I suppose. I do not see other obvious issues in other backends yet; maybe someone else can suggest where non-determinism can occur?

I have tested the CPU version with Intel MKL in limited scenarios and it was deterministic for training. Maybe more tests need to be done?

  3. I think it is good to have a global determinism control if feasible. If it is also possible to have control over individual layers, that would be very good to have.

In the pull request I added a deterministic parameter (default=False) to max pooling:
nn.MaxPool2D(pool_size=(3, 3), strides=(2, 2), deterministic=True)

I also added environment values to select deterministic algorithms for the convolution back-propagation algorithms:
os.environ["MXNET_CUDNN_AUTOTUNE_DEFAULT"] = "3"

Old values:
# Value of 1 chooses the best algo in a limited workspace
# Value of 2 chooses the fastest algo whose memory requirements may be larger than the default workspace threshold
Added values:
# Value of 3 chooses the deterministic best algo in a limited workspace
# Value of 4 chooses the deterministic fastest algo whose memory requirements may be larger than the default workspace threshold

They could be replaced by a global deterministic flag.

  4. As you can see above, I actually think it is good to let the user select a deterministic algorithm according to their constraint: speed or memory size.

The problem with this solution is that if cuDNN chooses different deterministic algos across runs, reproducibility can still fail. I think it would be good to have another mechanism that lets the user select the cuDNN algorithm directly, if available.
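Putting the pieces above together, a hedged end-to-end sketch of how the proposal could be exercised; the deterministic argument and the autotune value "3" exist only in PR #11361, and everything else is ordinary Gluon usage with fixed seeds:

import os
# Proposed value from the PR: deterministic best algo in a limited workspace
os.environ["MXNET_CUDNN_AUTOTUNE_DEFAULT"] = "3"

import numpy as np
import mxnet as mx
from mxnet.gluon import nn

def run_once(seed=42):
    mx.random.seed(seed)
    np.random.seed(seed)
    net = nn.HybridSequential()
    net.add(nn.Conv2D(channels=8, kernel_size=3),
            # 'deterministic' is the flag proposed in the PR, not an existing Gluon argument
            nn.MaxPool2D(pool_size=(3, 3), strides=(2, 2), deterministic=True))
    net.initialize(ctx=mx.gpu())
    x = mx.nd.random.uniform(shape=(4, 3, 32, 32), ctx=mx.gpu())
    with mx.autograd.record():
        y = net(x)
    y.backward()
    return y.asnumpy()

# With deterministic algorithms, two identical runs should produce bit-identical outputs
assert np.array_equal(run_once(), run_once())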

@wenyangchu
Contributor Author

wenyangchu commented Jun 25, 2018

Hi @DickJC123, I have little knowledge of the regression tests in MXNet. Could you please let me know how you ran the tests? Thank you!

@DickJC123
Contributor

I often use tests/jenkins/run_test_ubuntu.sh to compile MXNet and run the regression tests. You may need to set DEV=0 in that script to get past compile warnings treated as errors.

@sandeep-krishnamurthy
Contributor

Hello @wenyangchu @DickJC123,
Can you please let me know the next steps here? How do you think we can take this forward?

@danithaca
Contributor

+1. Sometimes we need to be able to reproduce results even at the cost of worse performance. Is there a plan or next step? Thanks.

@apeforest
Contributor

apeforest commented Oct 23, 2018

@wenyangchu I have encountered the same problem, and your solution would be very helpful to us. Would you like to continue working on your PR (#11361)? In that case I will be glad to review it and help get it merged. Otherwise, if you are occupied with other commitments, I also wouldn't mind making this change. Please kindly suggest. Thanks!

@apeforest
Contributor

@wenyangchu Since I have not received a response from you, I took the liberty of implementing this feature in MXNet due to other urgent requests. Thank you very much for your investigation and suggestions. You are more than welcome to review my PR and provide further valuable suggestions.

@wenyangchu
Contributor Author

@apeforest Thanks for taking action to make this happen. Sorry, the notification went to my Gmail and was filtered.
