Deterministic cudnn algorithms #11341
Comments
Thank you for suggesting this, @wenyangchu. We will look into it. @sandeep-krishnamurthy, could you please label this under CUDA? |
I prototyped this functionality last summer but did not PR it. I found at the time that the regression suite would not run if only deterministic algorithms were permitted. I'm not sure if the situation has improved in terms of universal coverage. Some questions to think about:
|
Hi @DickJC123, my PR #11361 is meant as a starting point so that we can come up with a good solution later on. For your questions:
I have tested the CPU version with Intel MKL in a limited number of scenarios, and training was deterministic. Maybe more tests need to be done?
In the pull request I added a deterministic parameter (default = False) to max pooling, and I added environment parameters to select deterministic algorithms for convolution backpropagation. They could be replaced by a global deterministic flag.
The problem with this solution is that repeatability can still fail if cuDNN chooses different deterministic algorithms from one run to the next. I think it would be good to have another mechanism that lets the user select the cuDNN algorithm directly, if available. |
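The repeatability concern above can be checked mechanically by running the same job several times and comparing the results bit for bit. The following is a minimal sketch in plain Python, not MXNet code: `fingerprint`, `is_repeatable`, and `toy_training` are hypothetical names, and `toy_training` is only a stand-in for a real training run.

```python
import hashlib
import random
import struct


def fingerprint(values):
    """Hash a sequence of floats bit for bit, so even 1-ulp differences show up."""
    digest = hashlib.sha256()
    for value in values:
        digest.update(struct.pack("<d", value))
    return digest.hexdigest()


def is_repeatable(run, runs=3):
    """Call `run` several times and report whether every result is bit-identical."""
    fingerprints = {fingerprint(run()) for _ in range(runs)}
    return len(fingerprints) == 1


def toy_training(seed=0):
    """Hypothetical stand-in for a training run: a seeded random walk of one weight."""
    rng = random.Random(seed)
    weight = 0.0
    history = []
    for _ in range(100):
        weight += 0.01 * rng.uniform(-1.0, 1.0)
        history.append(weight)
    return history


if __name__ == "__main__":
    print("repeatable:", is_repeatable(toy_training))
```

In a real check, `run` would train a small network on the GPU and return a flattened copy of the weights or the loss history. Comparing bits rather than using a tolerance is deliberate, since the goal discussed here is exact reproducibility.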
Hi @DickJC123 , I have little knowledge on the regression test in mxnet. Could you please let me know how you ran the test? Thank you! |
I often use tests/jenkins/run_test_ubuntu.sh to compile MXNet and run the regression tests. You may need to set DEV=0 in that script to get past compile warnings treated as errors. |
Hello @wenyangchu @DickJC123 |
+1. Sometimes we need to be able to reproduce results even at the cost of worse performance. Is there a plan or next step? Thanks. |
@wenyangchu I have encountered the same problem, and your solution would be very helpful to us. Would you like to continue working on your PR (#11361)? If so, I will be glad to review it and help get it merged. Otherwise, if you are occupied with other commitments, I wouldn't mind making this change myself. Please let me know. Thanks! |
@wenyangchu Since I have not received a response from you, I took the liberty of implementing this feature in MXNet because of other urgent requests. Thank you very much for your investigation and suggestion. You are more than welcome to review my PR and provide other suggestions. |
@apeforest Thanks for taking action to make it happen. Sorry the notification was sent to my gmail and filtered. |
Hi all,
I see that other frameworks, such as PyTorch, provide a flag that forces cuDNN to use only deterministic algorithms, especially for convolution and max pooling:
like in pytorch:
example
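For context, the PyTorch switch referred to here is, as far as I know, the pair of attributes on `torch.backends.cudnn`. A minimal sketch follows; the wrapper function `force_deterministic_cudnn` is a hypothetical name, and the import guard is only there so the snippet also runs where PyTorch is not installed.

```python
try:
    import torch
except ImportError:  # PyTorch not installed; the sketch then does nothing
    torch = None


def force_deterministic_cudnn():
    """Ask PyTorch's cuDNN backend for deterministic algorithms only.

    Returns True if the flags were set, False if PyTorch is unavailable.
    """
    if torch is None:
        return False
    # Restrict cuDNN to deterministic algorithms.
    torch.backends.cudnn.deterministic = True
    # Disable the autotuner, which could otherwise pick a different
    # algorithm from one run to the next.
    torch.backends.cudnn.benchmark = False
    return True


if __name__ == "__main__":
    print("flags set:", force_deterministic_cudnn())
```

Setting the flags is cheap and needs no GPU; the performance cost only appears once convolutions actually run with the restricted algorithm set.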
Some references to cudnn:
maxpooling
cudnnConvolutionBackwardData
cudnnConvolutionBwdFilterAlgo_t
I am working on a medical product, and reproducibility is an issue for us. Therefore, this feature is very important for me.
One solution is to have a flag like MXNET_CUDNN_AUTOTUNE_DEFAULT.
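A rough sketch of how such a flag might be used from Python is shown here. `MXNET_CUDNN_AUTOTUNE_DEFAULT` is the existing variable (0 disables autotuning); `MXNET_CUDNN_DETERMINISTIC` is a hypothetical name used only to illustrate the proposal, and `request_deterministic_cudnn` is likewise a made-up helper.

```python
import os


def request_deterministic_cudnn(env=None):
    """Set the environment variables for a (proposed) deterministic cuDNN mode.

    `env` defaults to os.environ; passing a plain dict makes the helper easy to test.
    """
    env = os.environ if env is None else env
    # Existing MXNet variable: 0 turns off cuDNN autotuning, so the
    # algorithm choice does not depend on timing measurements.
    env["MXNET_CUDNN_AUTOTUNE_DEFAULT"] = "0"
    # HYPOTHETICAL variable name, illustrating the flag proposed in this issue.
    env["MXNET_CUDNN_DETERMINISTIC"] = "1"
    return env


if __name__ == "__main__":
    # Safest to call this before importing mxnet, so the settings are
    # in place before any operator reads them.
    request_deterministic_cudnn()
    print(os.environ["MXNET_CUDNN_AUTOTUNE_DEFAULT"])
```

Disabling autotuning alone does not guarantee determinism, because the default algorithm may itself be non-deterministic; that gap is what the proposed flag would close.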
Is anyone already working on this? I have reviewed the current code and would like to implement it.
Any suggestions or better solutions?
Thanks