Memory not being freed with Image.fromarray #8549

Closed
jmspereira opened this issue Nov 11, 2024 · 13 comments

@jmspereira

What did you do?

Hey everyone,
I have an application that uses Pillow to encode NumPy arrays as JPEGs, but I am seeing strange behavior in its memory usage.

What did you expect to happen?

All allocated memory to be freed.

What actually happened?

Some memory is not freed.

What are your OS, Python and Pillow versions?

  • OS: ubuntu 22.04
  • Python: 3.10.12
  • Pillow: 11.0.0
--------------------------------------------------------------------
Pillow 11.0.0
Python 3.10.12 (main, Sep 11 2024, 15:47:36) [GCC 11.4.0]
--------------------------------------------------------------------
--- PIL CORE support ok, compiled for 11.0.0
--- TKINTER support ok, loaded 8.6
--- FREETYPE2 support ok, loaded 2.13.2
--- LITTLECMS2 support ok, loaded 2.16
--- WEBP support ok, loaded 1.4.0
--- JPEG support ok, compiled for libjpeg-turbo 3.0.4
--- OPENJPEG (JPEG2000) support ok, loaded 2.5.2
--- ZLIB (PNG/ZIP) support ok, loaded 1.2.11
--- LIBTIFF support ok, loaded 4.6.0
--- RAQM (Bidirectional Text) support ok, loaded 0.10.1, fribidi 1.0.8, harfbuzz 10.0.1
*** LIBIMAGEQUANT (Quantization method) support not installed
--- XCB (X protocol) support ok
--------------------------------------------------------------------

Code that reproduces the problem:

import time
import numpy as np
from io import BytesIO
from PIL import Image


def open_pillow_image():
    random_image = (np.random.rand(720, 1280, 3) * 255).astype(np.uint8)

    with BytesIO() as output, Image.fromarray(random_image) as pillow_image:
        pillow_image.save(output, format="jpeg")


def main():
    print("before")
    # Memory here is around 60 MB.
    time.sleep(10)
    open_pillow_image()

    # Memory here is around 65 MB.
    print("after")
    time.sleep(1000)


if __name__ == '__main__':
    main()

@Yay295
Contributor

Yay295 commented Nov 11, 2024

Does anything change if you add

import gc
gc.collect()

after open_pillow_image()?
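
To make the before/after comparison repeatable instead of reading it off a process monitor, the measurement can be scripted. The following is a minimal sketch, assuming Linux (it reads `VmRSS` from `/proc/self/status`). The names `current_rss_kib`, `measure` and `workload` are hypothetical, and `workload` is only a stand-in for `open_pillow_image()` so that the sketch runs without Pillow or NumPy installed.

```python
import gc


def current_rss_kib():
    """Return the resident set size in KiB, or None if /proc is unavailable."""
    try:
        with open("/proc/self/status") as status:
            for line in status:
                if line.startswith("VmRSS:"):
                    # The line looks like "VmRSS:     12345 kB".
                    return int(line.split()[1])
    except OSError:
        pass
    return None


def workload():
    # Stand-in for open_pillow_image(): allocate about 2.6 MiB and drop it.
    buffer = bytearray(720 * 1280 * 3)
    del buffer


def measure(func):
    """Return RSS before the call, after it, and after a forced collection."""
    before = current_rss_kib()
    func()
    after_call = current_rss_kib()
    gc.collect()
    after_gc = current_rss_kib()
    return before, after_call, after_gc


if __name__ == "__main__":
    print(measure(workload))
```

If the second and third values are the same, the garbage collector had nothing left to reclaim, which would match the answer given further down the thread.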

@radarhere
Member

#7935 (comment)

Pillow's memory allocator doesn't necessarily release the memory in the pool back as soon as an image is destroyed, as it uses that memory pool for future allocations. See Storage.c (https://github.com/python-pillow/Pillow/blob/main/src/libImaging/Storage.c#L310) for the implementation.

@jmspereira
Author

jmspereira commented Nov 12, 2024

@Yay295, calling the garbage collector explicitly does not make any difference.

@radarhere according to the documentation:

"There is now a memory pool to contain a supply of recently freed blocks, which can then be reused without going back to the OS for a fresh allocation. This caching of free blocks is currently disabled by default (...)" (https://pillow.readthedocs.io/en/stable/reference/block_allocator.html)

So the caching of free blocks should be disabled by default, and tweaking PILLOW_BLOCKS_MAX, as mentioned in the issue you referenced, does not make any difference.
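
One way to check what the allocator is actually doing is to ask Pillow itself. The sketch below assumes the `Image.core.get_blocks_max()` and `Image.core.get_stats()` helpers that Pillow's own test suite exercises, and assumes that `PILLOW_BLOCKS_MAX` is read when `PIL.Image` is first imported. `allocator_snapshot` is a hypothetical name, and the function returns `None` when Pillow is not installed.

```python
import os

# Assumption: Pillow reads PILLOW_BLOCKS_MAX when PIL.Image is first imported,
# so the variable has to be set before that import to have any effect.
os.environ.setdefault("PILLOW_BLOCKS_MAX", "0")


def allocator_snapshot():
    """Return (blocks_max, stats) from Pillow, or None if Pillow is missing."""
    try:
        from PIL import Image
    except ImportError:
        return None
    # Create and drop one image so the counters have something to report.
    image = Image.new("RGB", (1280, 720))
    del image
    return Image.core.get_blocks_max(), Image.core.get_stats()


if __name__ == "__main__":
    print(allocator_snapshot())
```

A `blocks_cached` value of 0 in the returned statistics would indicate that no freed blocks are being held in the pool at that moment.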

@radarhere
Member

I see, "caching of free blocks" refers to

memory_get_block(ImagingMemoryArena arena, int requested_size, int dirty) {
    ImagingMemoryBlock block = {NULL, 0};
    if (arena->blocks_cached > 0) {
        // Get block from cache
        arena->blocks_cached -= 1;
        block = arena->blocks_pool[arena->blocks_cached];
        // Reallocate if needed
        if (block.size != requested_size) {
            block.ptr = realloc(block.ptr, requested_size);
        }
        if (!block.ptr) {
            // Can't allocate, free previous pointer (it is still valid)
            free(arena->blocks_pool[arena->blocks_cached].ptr);
            arena->stats_freed_blocks += 1;
            return block;
        }
        if (!dirty) {
            memset(block.ptr, 0, requested_size);
        }
        arena->stats_reused_blocks += 1;
        if (block.ptr != arena->blocks_pool[arena->blocks_cached].ptr) {
            arena->stats_reallocated_blocks += 1;
        }

By default, the following is used instead.

    } else {
        if (dirty) {
            block.ptr = malloc(requested_size);
        } else {
            block.ptr = calloc(1, requested_size);
        }
        arena->stats_allocated_blocks += 1;
    }
    block.size = requested_size;
    return block;
}
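
The two branches quoted above can be modelled in a few lines of Python. This is only a toy model, not Pillow's implementation: `ToyArena` and its methods are hypothetical names, `bytearray` stands in for `malloc`/`realloc`, and the `return_block` side is an assumption based on the documentation's description of "a supply of recently freed blocks" rather than on the code quoted here.

```python
class ToyArena:
    """Toy model of the cache-or-allocate decision; not Pillow's real code."""

    def __init__(self, blocks_max=0):
        self.blocks_max = blocks_max  # 0 models the documented default: no caching
        self.pool = []                # freed blocks kept around for reuse
        self.stats = {"allocated": 0, "reused": 0, "freed": 0}

    def get_block(self, size):
        if self.pool:
            # Cached path: take a recently freed block instead of allocating.
            block = self.pool.pop()
            if len(block) != size:
                block = bytearray(size)  # stands in for realloc()
            self.stats["reused"] += 1
        else:
            # Default path: a fresh allocation every time.
            block = bytearray(size)
            self.stats["allocated"] += 1
        return block

    def return_block(self, block):
        if len(self.pool) < self.blocks_max:
            self.pool.append(block)   # keep the block for later reuse
        else:
            self.stats["freed"] += 1  # drop the block


default = ToyArena(blocks_max=0)
default.return_block(default.get_block(1024))
default.return_block(default.get_block(1024))
print(default.stats)  # two allocations, two frees, nothing reused

cached = ToyArena(blocks_max=1)
cached.return_block(cached.get_block(1024))
cached.return_block(cached.get_block(1024))
print(cached.stats)  # one allocation, one reuse, nothing freed
```

With `blocks_max=0` the pool never holds anything, so under this model the cached branch is never taken and the pool cannot account for memory that stays resident.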

Testing further, I think the issue occurs not when loading the array, but when saving.

@radarhere
Member

If I suggest that calling JpegImagePlugin directly improves the situation, do you agree?

from PIL import JpegImagePlugin
with BytesIO() as output, Image.fromarray(random_image) as pillow_image:
    pillow_image.encoderinfo = {}
    JpegImagePlugin._save(pillow_image, output, "filename")

@jmspereira
Author

jmspereira commented Nov 12, 2024

Hmm, it doesn't seem to make any difference.

@radarhere
Member

Do you agree that saving is the problem? As in, I think this code should be fine.

with BytesIO() as output, Image.fromarray(random_image) as pillow_image:
    pass

@jmspereira
Author

jmspereira commented Nov 12, 2024

Hmm, I do not think so. If I run this:

import time
from io import BytesIO

import numpy as np
from PIL import Image


def open_pillow_image():
    random_image = (np.random.rand(720, 1280, 3) * 255).astype(np.uint8)

    with BytesIO() as output, Image.fromarray(random_image) as pillow_image:
        pass


def main():
    print("before")
    time.sleep(10)
    open_pillow_image()
    print("after")
    time.sleep(1000)


if __name__ == '__main__':
    main()

The memory used by the script is larger after opening the image.

@radarhere
Member

Just to be sure, if you remove Pillow, does the problem go away?

import time
from io import BytesIO

import numpy as np


def open_pillow_image():
    random_image = (np.random.rand(720, 1280, 3) * 255).astype(np.uint8)

    with BytesIO() as output:
        pass


def main():
    print("before")
    time.sleep(10)
    open_pillow_image()
    print("after")
    time.sleep(1000)


if __name__ == '__main__':
    main()

@jmspereira
Author

Yes, the problem does not exist without pillow.

@wiredfool
Member

I've run this under Massif, starting with the first example. I've also run it with 100 loops, with the write commented out, with smaller images, and with the random value passed in rather than writing the JPEG. Valgrind/Massif ASCII art follows.

  • Running loops doesn't change the memory, i.e., there don't appear to be leaks. This is running 10 iterations, with a slow loop of +=1 between the trials.


    MB
31.48^                              #                                         
     |                              #                                         
     |                              #                                         
     |                              #                                         
     |                              #                                         
     |                              #                                         
     |                              #                                         
     |                              #                                         
     |                              #                                         
     |                              #                                         
     |                              #                                         
     |                              #                                         
     |                              #                                         
     |                              #:::::::::::::::::::::::@@@@::: ::: :::   
     |                              #:  :   :   :   :   :   @   :   :   :     
     |                              #:  :   :   :   :   :   @   :  ::  ::  @  
     |                              #:  :   :   :   :   :   @   :  ::  ::  @  
     |                        @:@@@@#:  :   :   :   :   :   @   :  ::  ::  @@ 
     |                  :@::::@:@   #:  :   :   :   :   :   @   :  ::  ::  @@@
     |               :@::@::::@:@   #:  :   :   :   :   :   @   :  ::  ::  @@@
   0 +----------------------------------------------------------------------->Gi
     0                                                                   3.609
  • The peak usage comes from the NumPy manipulation of the array. This is the same run, without any Pillow calls. Oddly, the NumPy data remains fully in memory here, whereas loading from it appears to release a large portion. Perhaps it's lazily evaluated?
    MB
31.47^                                    ##################################  
     |                                    #                                   
     |                                    #                                   
     |                                    #                                   
     |                                    #                                   
     |                                    #                                   
     |                                    #                                   
     |                                    #                                   
     |                                    #                                   
     |                                    #                                   
     |                                    #                                   
     |                                    #                                   
     |                                    #                                   
     |                                    #                                   
     |                                    #                                   
     |                                    #                                   
     |                                    #                                   
     |                             @::::::#                                 : 
     |                         @@:@@:::   #                                 ::
     |                     :@@:@@:@@:::   #                                 :@
   0 +----------------------------------------------------------------------->Gi
     0                                                                   3.857
  • It's really hard to get Valgrind to sample during a sleep, but running tight loops can make it work: sleep does not get sampled, while for _ in range(10000): i+=1 does.
  • There's a large ramp of memory that looks a lot like NumPy being loaded into the system. This is with a 1x1 image:
    MB
5.714^                                 #####################################  
     |                                 #                                    : 
     |                                 #                                    : 
     |                              @@:#                                    @ 
     |                              @@:#                                    @ 
     |                             @@@:#                                    @:
     |                             @@@:#                                    @:
     |                          @:@@@@:#                                    @@
     |                          @:@@@@:#                                    @@
     |                        @@@:@@@@:#                                    @@
     |                        @@@:@@@@:#                                    @@
     |                       :@@@:@@@@:#                                    @@
     |                       @@@@:@@@@:#                                    @@
     |                      :@@@@:@@@@:#                                    @@
     |                     @:@@@@:@@@@:#                                    @@
     |                     @:@@@@:@@@@:#                                    @@
     |  @::::::::::::::::::@:@@@@:@@@@:#                                    @@
     | :@:              : :@:@@@@:@@@@:#                                    @@
     |::@:              : :@:@@@@:@@@@:#                                    @@
     |::@:              : :@:@@@@:@@@@:#                                    @@
   0 +----------------------------------------------------------------------->Gi
     0                                                                   3.579

My suspicion here is that it's actually the code that's being loaded. 5MB is in the realm of the size I'd expect.
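
The distinction drawn above between a leak and a one-time cost can also be checked without Valgrind. The sketch below is a rough heuristic under stated assumptions, not a substitute for Massif: it uses `resource.getrusage` from the standard library (Unix only; `ru_maxrss` is in KiB on Linux and in bytes on macOS). The names `peak_rss`, `peak_after_each_call`, `keeps_growing` and `workload` are hypothetical, and `workload` is a stand-in for `open_pillow_image()`.

```python
import resource


def peak_rss():
    """Peak resident set size of this process (KiB on Linux, bytes on macOS)."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


def peak_after_each_call(func, iterations=10):
    """Call func repeatedly and record the peak RSS after every call."""
    samples = []
    for _ in range(iterations):
        func()
        samples.append(peak_rss())
    return samples


def keeps_growing(samples, tolerance=1024):
    """Heuristic leak check that skips the first sample.

    The first call absorbs one-time costs such as code being loaded, so
    growth is measured from the second sample to the last one.
    """
    if len(samples) < 3:
        return False
    return samples[-1] - samples[1] > tolerance


def workload():
    # Stand-in for open_pillow_image(); swap in the real function to test Pillow.
    buffer = bytearray(720 * 1280 * 3)
    del buffer


if __name__ == "__main__":
    samples = peak_after_each_call(workload)
    print(samples, keeps_growing(samples))
```

A jump on the first call followed by a plateau points to a one-time cost, whereas a peak that keeps climbing with every iteration would point to a leak.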

This is the Massif run from the original code, minus the trailing 1000-second sleep. It has all of the significant allocations in the process, at a few snapshots.
massif_run.zip

@radarhere
Member

@jmspereira did that answer your question?


github-actions bot commented Dec 7, 2024

Closing this issue as no feedback has been received.

github-actions bot closed this as not planned on Dec 7, 2024