
Deterministic cudnn algorithms #11341

Closed
wenyangchu opened this issue Jun 20, 2018 · 10 comments

@wenyangchu
Contributor

Hi all,

I see that other frameworks such as PyTorch provide a flag to force cuDNN to use only deterministic algorithms, especially for convolution and max pooling:

like in pytorch:
example
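For reference, a minimal sketch of how the switch is exposed in PyTorch (using the standard torch.backends.cudnn attributes):

import torch

# Restrict cuDNN to deterministic convolution algorithms
torch.backends.cudnn.deterministic = True
# Disable the autotuner so the same algorithm is selected on every run
torch.backends.cudnn.benchmark = False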

Some references to cudnn:
maxpooling
cudnnConvolutionBackwardData

cudnnConvolutionBwdFilterAlgo_t

I am working on a medical product where reproducibility is a requirement, so this feature is very important for me.

One solution is to have a flag similar to MXNET_CUDNN_AUTOTUNE_DEFAULT.
Is anyone already working on this? I have reviewed the current code and would like to implement it.
Any suggestions or better solutions?
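For context, the existing autotune flag is already driven by an environment variable, so a determinism flag could follow the same pattern. A minimal sketch of how it is set today, using only the currently documented values:

import os
# Documented values today: 0 = no autotuning, 1 = best algo within the workspace limit,
# 2 = fastest algo regardless of workspace. Value 0 avoids run-to-run algorithm variation,
# but it does not by itself guarantee that the chosen algorithm is deterministic.
os.environ["MXNET_CUDNN_AUTOTUNE_DEFAULT"] = "0"
import mxnet as mx  # set the variable before the first import so the backend sees it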

Thanks

@vrakesh
Contributor

vrakesh commented Jun 20, 2018

Thank you for suggesting this, @wenyangchu; we will look into it. @sandeep-krishnamurthy, requesting that this be labeled under CUDA.

@DickJC123
Contributor

I prototyped this functionality last summer but did not PR it. I found at the time that the regression suite would not run if only deterministic algorithms were permitted. I'm not sure if the situation has improved in terms of universal coverage. Some questions to think about:

  1. Assuming an environment variable, should it be MXNET_PREFER_DETERMINISM with a warning issued if no deterministic implementation exists, or MXNET_REQUIRE_DETERMINISM with a fatal error if no deterministic implementation exists? (See the sketch after this list.)
  2. Should the directive be only for cudnn, or for the entire platform?
  3. Should the environment variable set a default, with new parameters added to the operators for individual operator instance control?
  4. What should the priority of determinism be relative to other constraints like "choose fastest algo" or "limit workspace"? [Determinism should win out, I would think]
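To make question 1 concrete, here is a purely hypothetical policy sketch; MXNET_PREFER_DETERMINISM and MXNET_REQUIRE_DETERMINISM are proposed names rather than existing MXNet variables, and check_determinism is an illustrative helper only:

import os
import warnings

def check_determinism(has_deterministic_algo, op_name):
    # Hypothetical policy helper: warn under PREFER, abort under REQUIRE.
    if has_deterministic_algo:
        return True
    if os.environ.get("MXNET_REQUIRE_DETERMINISM", "0") == "1":
        raise RuntimeError("no deterministic algorithm available for " + op_name)
    if os.environ.get("MXNET_PREFER_DETERMINISM", "0") == "1":
        warnings.warn("falling back to a non-deterministic algorithm for " + op_name)
    return False

Under MXNET_PREFER_DETERMINISM the caller would fall back to the fastest available algorithm after the warning; under MXNET_REQUIRE_DETERMINISM operator setup would abort.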

@wenyangchu
Contributor Author

wenyangchu commented Jun 22, 2018

Hi @DickJC123,
Thanks for your reply.
I did an implementation this week due to my urgent need. For now I have put it into a pull request meant for discussion:

#11361
Please check the last 2 commits.

It is meant as a starting point for working out a good solution later on.

For your questions:

  1. If MXNET_PREFER_DETERMINISM is set and no deterministic algorithm can be found, I suppose it has to raise a fatal error, because the user's requirement cannot be satisfied.

  2. I think it is a good idea to have it for the entire platform, but I will try to solve it for cuDNN first, because it is the most widely used backend, I suppose. I do not see other obvious issues in other backends yet; maybe someone else can suggest where non-determinism can occur?

I have tested the CPU version with Intel MKL in limited scenarios and it was deterministic for training. Maybe more tests need to be done?

  3. I think it is good to have a global determinism control if feasible. If it is also possible to have control over individual layers, that would be very good to have.

In the pull request I added a deterministic parameter (default=False) to max pooling:
nn.MaxPool2D(pool_size=(3, 3), strides=(2, 2), deterministic=True)

I also added environment values to select deterministic algorithms for the convolution back-propagation algorithms:
os.environ["MXNET_CUDNN_AUTOTUNE_DEFAULT"] = "3"

Old values:
# Value of 1 chooses the best algo in a limited workspace
# Value of 2 chooses the fastest algo whose memory requirements may be larger than the default workspace threshold
Added values:
# Value of 3 chooses the deterministic best algo in a limited workspace
# Value of 4 chooses the deterministic fastest algo whose memory requirements may be larger than the default workspace threshold

They could be replaced by a global deterministic flag.

  4. As you can see above, I actually think it is good to let the user select a deterministic algorithm according to their constraint: speed or memory size.

The problem with this solution is that if cuDNN chooses different deterministic algos across runs, reproducibility can still fail. I think it would be good to have another mechanism that lets the user select the cuDNN algorithm directly, if available.
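Putting the pieces above together, a hedged end-to-end sketch of how the proposal could be exercised; the deterministic argument and the autotune value "3" exist only in PR #11361, and everything else is ordinary Gluon usage with fixed seeds:

import os
# Proposed value from the PR: deterministic best algo in a limited workspace
os.environ["MXNET_CUDNN_AUTOTUNE_DEFAULT"] = "3"

import numpy as np
import mxnet as mx
from mxnet.gluon import nn

def run_once(seed=42):
    mx.random.seed(seed)
    np.random.seed(seed)
    net = nn.HybridSequential()
    net.add(nn.Conv2D(channels=8, kernel_size=3),
            # 'deterministic' is the flag proposed in the PR, not an existing Gluon argument
            nn.MaxPool2D(pool_size=(3, 3), strides=(2, 2), deterministic=True))
    net.initialize(ctx=mx.gpu())
    x = mx.nd.random.uniform(shape=(4, 3, 32, 32), ctx=mx.gpu())
    with mx.autograd.record():
        y = net(x)
    y.backward()
    return y.asnumpy()

# With deterministic algorithms, two identical runs should produce bit-identical outputs
assert np.array_equal(run_once(), run_once())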

@wenyangchu
Contributor Author

wenyangchu commented Jun 25, 2018

Hi @DickJC123, I have little knowledge of the regression tests in MXNet. Could you please let me know how you ran the tests? Thank you!

@DickJC123
Contributor

I often use tests/jenkins/run_test_ubuntu.sh to compile MXNet and run the regression tests. You may need to set DEV=0 in that script to get past compile warnings treated as errors.

@sandeep-krishnamurthy
Contributor

Hello @wenyangchu @DickJC123,
Can you please let me know the next steps here? How do you think we can take this forward?

@danithaca
Contributor

+1. Sometimes we need to be able to reproduce results even at the cost of worse performance. Is there a plan or next step? Thanks.

@apeforest
Contributor

apeforest commented Oct 23, 2018

@wenyangchu I have encountered the same problem, and your solution would be very helpful to us. Would you like to continue working on your PR (#11361)? In that case I will be glad to review it and help get it merged. Otherwise, if you are occupied with other commitments, I also wouldn't mind making this change. Please kindly suggest. Thanks!

@apeforest
Contributor

@wenyangchu Since I have not received a response from you, I took the liberty of implementing this feature in MXNet due to other urgent requests. Thank you very much for your investigation and suggestions. You are more than welcome to review my PR and provide further valuable suggestions.

@wenyangchu
Contributor Author

@apeforest Thanks for taking action to make this happen. Sorry, the notification went to my Gmail and was filtered.
