GPU memory usage keeps increasing even when hybridized with static_alloc, when used in Flask debug mode, since mxnet 1.6.0post0. #19159
Comments
Hi @kohillyang, I think it is not related to MXNet. When there is a new connection, Flask creates a new Python process to handle it, which creates a new copy of the MXNet instance. To validate this, you can print the id of the predictor by
@wkcn But even if Flask had created a new process, the GPU memory should be freed once that process ends. The predictor is created in the main function, which is called only once, so there is only one predictor instance. Moreover, if the main process has already initialized a CUDA environment, MXNet in a subprocess will fail at inference, because the CUDA file descriptors cannot be shared between the main process and the subprocess. BTW, the pid of the process and the id of the predictor remain unchanged. I print them using the following code:
print(id(self))
print(os.getpid())
PS: Thread safety is important because sometimes you need to implement a Block that calls asnumpy, and it is too hard to implement every block as a HybridBlock in an asynchronous way. In PyTorch this is not a problem because we have DataParallel. It will start a thread for each CPU instance and gather the results, but this operation is not officially supported by MXNet because there are issues such as #13199 that need workarounds.
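The check described above (same pid, same predictor id) can be sketched without MXNet or Flask. This is a minimal sketch under assumptions: `Predictor` here is a hypothetical stand-in for the real class, and a thread pool stands in for the threaded debug server. It only shows how to tell "one process, one instance, several threads" apart from "a new process per request".

```python
import os
import threading
from concurrent.futures import ThreadPoolExecutor


class Predictor:
    """Hypothetical stand-in for the predictor discussed in this thread."""

    def identity(self):
        # Same pid and same object id across requests means one process and
        # one predictor instance; the thread id shows which thread served it.
        return os.getpid(), id(self), threading.get_ident()


def collect_identities(predictor, n_requests=8, n_workers=4):
    """Simulate concurrent request handlers and record where each one ran."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        futures = [pool.submit(predictor.identity) for _ in range(n_requests)]
        return [f.result() for f in futures]


if __name__ == "__main__":
    for pid, obj_id, tid in collect_identities(Predictor()):
        print(f"pid={pid} predictor_id={obj_id} thread_id={tid}")
```

If the pid and predictor id stay constant while the thread id varies, the requests are being served by several threads sharing one predictor, which is the situation described in this comment.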
@wkcn predictor is created by
@kohillyang So you are creating a new predictor in every HTTP call? If so, then yes, a new Block is created in every HTTP call, and due to #18328 the parameters of the Block won't be deallocated. https://github.com/apache/incubator-mxnet/pull/18328/files only contains Python changes. Would you like to try applying the changes to your MXNet files and see if the memory leak goes away? Thank you
Why do you think I'm creating a new predictor in each call? Apparently there is only one instance of Predictor.
Never mind, I didn't read your
Thus this is unrelated to #18328
Description
Hello, I'm using Flask with MXNet to write a server. Since it is a web app, we want the GPU memory to be fully statically allocated.
However, as the title says, I found that GPU memory usage keeps increasing until an OOM error is raised with mxnet 1.6.0post0 and 1.7.0, whereas with mxnet 1.5.1 everything works fine. Since Flask debug mode uses multi-threading, I think it may be caused by some calls that are not thread-safe.
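One way to probe the thread-safety hypothesis is to serialize all calls to the predictor and see whether memory still grows. The following is only a sketch under assumptions: `SerializedPredictor` and `predict_fn` are hypothetical names, not part of MXNet or Flask, and the wrapper changes nothing inside MXNet. It just ensures that only one thread at a time runs the wrapped call.

```python
import threading


class SerializedPredictor:
    """Wraps any callable so that only one thread runs it at a time."""

    def __init__(self, predict_fn):
        # predict_fn is hypothetical: e.g. the function that runs the model.
        self._predict_fn = predict_fn
        self._lock = threading.Lock()

    def __call__(self, *args, **kwargs):
        # Concurrent callers block here until the current call has finished.
        with self._lock:
            return self._predict_fn(*args, **kwargs)
```

If memory usage stops growing with the wrapper in place, concurrent calls are a likely factor; if it keeps growing, the cause is probably elsewhere. Flask's development server also accepts `threaded=False` in `app.run`, which may be a simpler way to run the same experiment.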
To Reproduce
This is a naive Flask server:
And just run the following code to request the server:
Environment
I'm using flask 1.0.2 and tornado 5.1, but I think it is independent of the versions of flask and tornado.