Possible memory leak in save
#5401
Comments
Does the memory continue to increase if it is run in a loop of 10 to 1000 times? What is the magnitude of the memory change? It's highly unlikely that there is an actual memory leak of O(image size) in Pillow. However, there are a few things that may lead to the appearance of one.

The biggest is that Pillow uses a memory allocator that does not release image chunks back to the OS, but rather retains them and reuses them for subsequent images. Note that this does not use the Python memory allocator or Python objects, so the internal Python GC is never involved with this memory.

Second, at the point where gc is run above, `im` is still in scope, so it's not going to be garbage collected. I know that you're expecting the "closed" memory to be released, but this indicates that object lifetimes are subtle.

Complicating things a bit: Python-based profiling doesn't tend to work well here, as most of the heavyweight operations and allocations are in the C layer. If you want a good view of what's really going on, a C-level tool such as valgrind will give a much clearer picture.
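As a rough illustration of the scoping point above (a minimal sketch; `example.png` and the function names are placeholders invented for the example, not anything from the original report):

```python
import gc
from PIL import Image


def looks_like_a_leak(path: str = "example.png") -> None:
    im = Image.open(path)
    im.load()
    im.close()
    gc.collect()   # `im` is still referenced here, so the Python object survives
    # `im` only becomes unreachable once the function returns


def scoped(path: str = "example.png") -> None:
    with Image.open(path) as im:
        im.load()
    del im         # drop the last reference before collecting
    gc.collect()   # the Image object can be collected now; Pillow's C-level
                   # block pool may still hold the raw pixel buffers, though
```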
Hi, thanks for answering, and, well, sorry for this "stream of consciousness", I was writing it while digging through the code 😅

If run in a loop 10 to 1000 times, the memory does not increase if the image is the same. Using different images, it looks like it goes up to the max point ever consumed for any of those images plus some extra, and then it either stops or continues to rise marginally. Which looks exactly like what you wrote about the allocator retaining memory for reuse.
About close, gc collect, and image object scope: my bad, bad example. But even with a bunch of nested methods and explicit cleanup, the overall picture doesn't change much.

My problem arose mainly because not returning memory but rather keeping it as a buffer for future use is pretty much fine, unless you use it in a multi-tenant environment (e.g. swarm) with slight memory overcommitment, or behind a WSGI server which fires multiple workers, meaning that each worker will try to keep this "max consumption" buffer, occupying much more memory than it actually uses. There are of course ways to deal with this behaviour at the process/WSGI-server level, but I'd say it was rather unexpected.

That said, do you know if there is any switch/option/call to force Pillow's allocator to release memory back to the OS rather than keeping it for further use?
I think you can set the number of retained blocks with the environment variable `PILLOW_BLOCKS_MAX`.
Ok, thank you, I'll try experimenting with this var.
First, about block size. I'm using this test in addition to the real setup to see how memory will behave:

```python
import gc
import os
from pathlib import Path

import memory_profiler

os.environ['PILLOW_BLOCK_SIZE'] = '1m'
os.environ['PILLOW_BLOCKS_MAX'] = '5'

from PIL import Image


@memory_profiler.profile
def _do(f: str):
    with open(f, 'rb') as fp:
        with Image.open(fp) as im:
            im.load()
    gc.collect()


@memory_profiler.profile
def test_run():
    _do(str(Path(__file__).parent / '1.png'))
    _do(str(Path(__file__).parent / '2.png'))
    _do(str(Path(__file__).parent / '3.png'))
    gc.collect()
```

Next, about valgrind. I ran some tests on the test env, and it actually reports a very low heap size, as low as 30 MB at the peak. But the VM reports 600 MB consumption, which dumbfounds me quite a bit. The VM-reported consumption jumps 10-20 MB each time.
Ok, it seems that uwsgi has its own opinion on how to fork workers and manage their memory, which does not really work well with Pillow's allocator.
@upcFrost so what would you like to happen with this issue?
Close it, probably, and leave it here for history. It's not a bug in Pillow; it's more like two design decisions (Pillow's and uwsgi's) not working well together.
Thanks
Hi,

I think there's a memory leak inside the `save` method, inside `_ensure_mutable` -> `_copy` to be precise. After the image is saved, the memory is not returned to the OS even if the gc is called manually. In the example below, please note that those tiff files are different. If the same file is used over and over, the memory consumption does not increase, which makes me think the problem is caused either by memory fragmentation or by some internal caching/buffers.

For multipage files, it seems to happen during the last page's processing (during the last "save"). Calling `im.close` at the end does decrease memory use, but only marginally compared to the increase reported for `save`.

Update: it seems to happen inside the TiffPlugin `load` method, when `decode` is called. Calling `load` without even saving the image causes memory to jump without going back. That is fine since the op is lazy, but this memory never seems to be marked as free, which doesn't look right.

What did you do?
Trying to save an image (png or tiff first page) as jpeg/png
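For reference, a minimal sketch of the kind of loop being described (the file names are hypothetical placeholders; the original report used several different multi-page TIFFs):

```python
from PIL import Image

# Hypothetical input files; the point is that they are all different images.
sources = ["page_a.tif", "page_b.tif", "page_c.tif"]

for name in sources:
    with Image.open(name) as im:
        im.load()                              # decode the first page
        im.convert("RGB").save(name + ".jpg")  # re-save it as JPEG
    # OS-reported memory stays elevated after each iteration, up to roughly
    # the largest image processed so far plus some overhead
```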
What did you expect to happen?
The image to be saved and the memory used during the save to be marked as free
What actually happened?
The memory is still reported as used
What are your OS, Python and Pillow versions?