
Possible memory leak in save #5401

Closed
upcFrost opened this issue Apr 11, 2021 · 9 comments

@upcFrost

upcFrost commented Apr 11, 2021

Hi,

I think there's a memory leak inside the save method, inside _ensure_mutable -> _copy to be precise. After the image is saved, the memory is not returned to the OS even if gc is called manually.

In the example below, please note that the tiff files are all different. If the same file is used over and over, the memory consumption does not increase, which makes me think the problem is caused either by memory fragmentation or by some internal caching/buffers.

For multipage files, it seems to happen while the last page is processed (during the last save).

im.close at the end does decrease memory use, but only marginally compared to the increase reported for save.

Update: it seems to happen inside the TiffPlugin load method, when decode is called. Calling load without even saving the image makes memory jump without coming back down. That is fine since the operation is lazy, but the memory never seems to be marked as free again, which doesn't look right.

What did you do?

Tried to save an image (PNG, or the first page of a TIFF) as JPEG/PNG.

What did you expect to happen?

The image to be saved and the memory used during the save to be marked as free afterwards.

What actually happened?

The memory is still reported as used

What are your OS, Python and Pillow versions?

  • OS: MacOS/Debian
  • Python: 3.7.9
  • Pillow: 8.1.1-8.2.0

import gc
import tempfile
from pathlib import Path

from PIL import Image
from memory_profiler import profile


@profile
def _do(i: int):
    # Each call opens a different numbered tiff located next to this script.
    image = Path(__file__).parent / f'{i}.tiff'
    im = Image.open(str(image))
    with tempfile.TemporaryDirectory() as tmpdir:
        # Saving as JPEG forces the lazy decode of the source image.
        im.save(f'{tmpdir}/{i}', "JPEG")
        im.close()
    # Manual collection does not bring the reported memory back down.
    gc.collect()


@profile
def test_save():
    _do(1)
    _do(2)
    _do(3)
    _do(4)
@wiredfool
Member

Does the memory continue to increase if it is run in a loop of 10 to 1000 times? What is the magnitude of the memory change?

It's highly unlikely that there is an actual memory leak of O(image size) in pillow. However, there are a few things that may lead to the appearance of one.

The biggest is that pillow uses a memory allocator that does not release image chunks back to the OS, but rather retains them and re-uses them for subsequent images. Note that this is not using the python memory allocator or python objects, so the internal python GC isn't ever used for this memory.

Second, at the point where gc is run above, im is still in scope, so it's not going to be garbage collected. I know that you're expecting the "closed" memory to be released, but this indicates that object lifetimes are subtle.

Complicating things a bit -- python based profiling doesn't tend to work well, as most of the heavyweight operations and allocations are in the C layer. If you want a good view of what's really going on, valgrind --tool=massif is the way to figure it out.
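
For what it's worth, a minimal sketch of the snippet above rewritten with context managers (assuming the same numbered .tiff files), so the image is closed deterministically before gc.collect() runs:

import gc
import tempfile
from pathlib import Path

from PIL import Image


def _do(i: int):
    image = Path(__file__).parent / f'{i}.tiff'
    with tempfile.TemporaryDirectory() as tmpdir:
        # Open and save inside a with block so the image is closed before
        # gc.collect() runs; any memory still held after that point should
        # live in the C-level allocator pool rather than in Python objects.
        with Image.open(image) as im:
            im.save(f'{tmpdir}/{i}', "JPEG")
    gc.collect()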

@upcFrost
Author

Hi, thanks for answering, and, well, sorry for this "stream of consciousness", I was writing it while digging through the code 😅

If running in a loop 10 to 1000 times, the memory does not increase when the image is the same. With different images, it looks like it goes up to the maximum ever consumed for any of those images plus some extra, and then either stops or continues to rise marginally. Which looks exactly like what you wrote:

pillow uses a memory allocator that does not release image chunks back to the OS, but rather retains them and re-uses them for subsequent images

About close, gc collect, and image object scope - my bad, bad example. But even with a bunch of nested methods and del calls to ensure that the object is definitely done for, it is still just as you said - the memory is kept to be reused.

My problem arose mainly because keeping memory as a buffer for future use instead of returning it is pretty much fine, until you run in a multi-tenant environment (e.g. a swarm) with slight memory overcommitment, or behind a WSGI server that fires up multiple workers: each worker then keeps its own "max consumption" buffer, occupying much more memory than it actually uses. There are of course ways to deal with this behaviour at the process/WSGI server level, but I'd say it was rather unexpected.

That said, do you know if there is any switch/option/call to force Pillow's allocator to release memory back to the OS rather than keeping it for further use?

@wiredfool
Member

I think you can set the number of retained blocks with the environment variable PILLOW_BLOCKS_MAX (https://github.com/python-pillow/Pillow/blob/master/src/PIL/Image.py#L3265); that controls how many allocated blocks are retained as the memory pool.
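
A minimal sketch of setting it from Python (placeholder values; the variable is read once when PIL is imported, so it has to be set before the import, or exported in the shell):

import os

# Placeholder values: PILLOW_BLOCKS_MAX caps how many allocated blocks are
# kept in Pillow's memory pool; PILLOW_BLOCK_SIZE sets the size of each block.
# Both must be set before PIL is imported, since Image.py reads them at import time.
os.environ['PILLOW_BLOCKS_MAX'] = '16'
os.environ['PILLOW_BLOCK_SIZE'] = '1m'

from PIL import Image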

@upcFrost
Author

Ok, thank you, I'll try experimenting with this var

@upcFrost
Author

upcFrost commented Apr 13, 2021

First, about block size. I'm using this test, in addition to the real setup, to see how the memory behaves:

import gc
import os
from pathlib import Path

import memory_profiler

# These have to be set before PIL is imported; Image.py reads them once at import time.
os.environ['PILLOW_BLOCK_SIZE'] = '1m'
os.environ['PILLOW_BLOCKS_MAX'] = '5'

from PIL import Image


@memory_profiler.profile
def _do(f: str):
    with open(f, 'rb') as fp:
        with Image.open(fp) as im:
            # load() triggers the actual decode; this is where the memory jump shows up.
            im.load()
    gc.collect()


@memory_profiler.profile
def test_run():
    _do(str(Path(__file__).parent / '1.png'))
    _do(str(Path(__file__).parent / '2.png'))
    _do(str(Path(__file__).parent / '3.png'))
    gc.collect()

  • With SIZE=1m and MAX=5, memory still jumps way higher than 5m. On the real setup it bumps into the 450m hard limit pretty fast (there, MAX=150). On this synthetic test, the memory actually jumps higher than with SIZE=16m.
  • With SIZE=16m, the memory consumption actually goes down in the test, but on the real setup it still climbs.
  • With SIZE=64m, memory_profiler reports memory going down during the load, which doesn't look right (I'll need to check it in valgrind), but on the real setup it still goes up. What is a bit strange is that it doesn't go up by 64mb; the profile is almost identical to the 16m one.

Next, about valgrind. I ran some tests on the test env, and it actually reports a very low heap size, as low as 30mb at the peak. But the VM reports 600mb consumption, which dumbfounds me quite a bit. The VM-reported consumption jumps 10-20mb each time im.load() is called, but massif doesn't show any corresponding spike. So it looks like some massive memory fragmentation is happening, and I'm honestly not sure whether it's related to Python, Pillow, uWSGI, or Docker.
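
One way to see that gap from inside the process is to compare the resident set size the OS reports against what the profiler shows; a rough sketch using the standard resource module ('1.png' is just a placeholder input, and ru_maxrss is a peak value with platform-dependent units):

import resource

from PIL import Image


def peak_rss_mb() -> float:
    # ru_maxrss is the peak resident set size so far: kilobytes on Linux,
    # bytes on macOS. This assumes Linux and converts KB to MB.
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024


print(f'before: {peak_rss_mb():.1f} MB')
with Image.open('1.png') as im:
    im.load()
print(f'after load: {peak_rss_mb():.1f} MB')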

@upcFrost
Author

Ok, it seems that uwsgi has its own opinion on how to fork its workers and manage their memory, which does not really work well with Pillow's allocator.

@radarhere
Member

@upcFrost so what would you like to happen with this issue?

@upcFrost
Author

Close it, probably, and leave it here for history. It's not a bug in Pillow; it's more that two design decisions (Pillow's and uwsgi's) don't work well together.

@radarhere
Member

Thanks
