Conversation
Exposing this API as-is would force everything to stop while memory is released, which could be problematic for an asynchronous execution environment. Given the performance impact of not caching memory, a much more preferable option is to provide a way to turn off memory caching completely, so that people with special use cases can use it and be sure that mxnet is not holding extra memory.
@vladoovtcharov thanks for the contribution. Instead of exposing ReleaseAll, would you mind adding an environment variable for turning off memory caching?
Hi @szha, thanks for the feedback. I'm not sure I completely understood, though. At least for my use case I was hoping to keep using the memory pool/caching (for performance) but periodically pause and release the memory (so another process could use the GPU memory). I understand the release_all call could be expensive, but it would not and should not be called regularly, so overall it seems like it would affect performance less than using non-pooled memory.
@vladoovtcharov yes, they could use GPUDeviceStorage, and exposing the option in MXNET_GPU_MEM_POOL_TYPE is also fine. My general concern around exposing this type of knob at runtime is exactly to avoid supporting this type of use case, as it's hard to make any promise in this setting. For example, releasing objects in the memory pool likely won't solve all your problems, as there are other levels of caches. For now I think it's better to keep it as simple as on or off.
I added an additional MXNET_GPU_MEM_POOL_TYPE (Unpooled), but I am worried that it is not the best approach. It seems like using a pooled memory type with the ability to call ReleaseAll would be preferable in most use cases. Also, I'm not sure why the pooled memory managers would differ in what can be promised.
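For readers following along, here is a minimal sketch of how the Unpooled option would be selected, assuming the MXNET_GPU_MEM_POOL_TYPE value lands under that name as described in this thread and a GPU build of MXNet is available:

```python
import os

# Sketch (assumes the "Unpooled" value discussed in this thread): select the
# unpooled GPU storage manager so MXNet frees device memory immediately
# instead of caching it. Must be set before mxnet touches the GPU.
os.environ["MXNET_GPU_MEM_POOL_TYPE"] = "Unpooled"

import mxnet as mx

x = mx.nd.ones((1024, 1024), ctx=mx.gpu(0))
x.wait_to_read()
del x  # with pooling disabled, this allocation is returned to the driver right away
```

This trades allocation speed for predictable memory usage, which matches the simple on/off behaviour @szha is asking for.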
The problem is in the statement "the memory is freed". Not all memory is freed, and users shouldn't be concerned with how caching works in mxnet.
I'm happy to take this into further consideration in the 2.0 design (#9686). For now, let's not add the C API yet.
@szha ok, sounds good
@vladoovtcharov thanks for understanding. Shall we get the Unpooled option merged first? If so, it would be great if you could look into the CI problems.
Force-pushed from cfd530b to ec4eb73
Would it help to split into another pull request for just the unpooling? (One more formatting error to fix...)
Yes, feel free to do so if it's easier :)
@mxnet-label-bot add [pr-work-in-progress]
Force-pushed from ec4eb73 to 2bc55f2
@vladoovtcharov Did you get a chance to work on the changes requested by @szha?
@vladoovtcharov Gentle ping...
@vladoovtcharov Can you please resolve conflicts so that we can move forward with this PR?
Force-pushed from 2bc55f2 to b72472b
I went ahead and split this into two pull requests, as requested. The second is here: #14716
Force-pushed from b16a96c to 98cbc92
I'll update the C API naming/documentation as well.
@vladoovtcharov Thanks for your contribution! Did you get a chance to update the docs?
@karan6181 yes, everything should be checked in and ready to be merged.
* Allow releasing all gpu memory
* fix white space
* stuck ci checks
* Fix whitespace
* Rename release_all -> empty_cache and provide documentation
* fix indentation
* Rename c_api's MXStorageReleaseAll -> MXStorageEmptyCache and clarify documentation
* nudge ci
* Update context.py
Allows releasing memory from the pool via the C and Python APIs.
(Should resolve feature request #13482.)
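A minimal sketch of the resulting Python usage, assuming the Context.empty_cache name from the commit messages above and a GPU build of MXNet:

```python
import mxnet as mx

ctx = mx.gpu(0)

# Allocate something on the GPU so the memory pool ends up holding a cached block.
x = mx.nd.ones((2048, 2048), ctx=ctx)
x.wait_to_read()
del x  # the block goes back to MXNet's pool, not to the CUDA driver

# Hand the cached blocks back to the device so another process can use the memory.
ctx.empty_cache()
```

Per the discussion above, this is expected to go through the MXStorageEmptyCache C API and only releases blocks held by MXNet's own pool; other levels of caching may still retain memory.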