Memory Leak in Image? #74

isparks · 2020-03-07T22:24:03Z

I have a document where I add a few hundred ~300 KB images using the Image tag. These are PNG images with dimensions 1920x1080.

When I build my document, the memory usage goes off the charts. The following program, run against an example image, should demonstrate the problem. On my machine the memory usage goes to 3+ GB.

A similar program using platypus and reportlab Image directly barely registers at all.

# Test case that takes my machine to ~3 GB of RAM used to generate a 450kb PDF file

from z3c.rml import rml2pdf

images = []
for i in range(500):
    images.append('<img width="15cm" preserveAspectRatio="true" src="/path/to/283kb_file.jpg"></img>')

images = "\n".join(images)

content = f"""<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE document SYSTEM "rml.dtd">
<document filename="Test.pdf" xmlns:doc="http://namespaces.zope.org/rml/doc">

    <stylesheet/>

    <template>
         <pageTemplate id="main" pagesize="(595,842)">
            <frame id="first" x1="2cm" y1="2cm" width="17cm" height="26cm"/>
         </pageTemplate>
    </template>

    <story>
        { images }
    </story>
</document>
"""


pdf = rml2pdf.parseString(content)

# Use a context manager so the output file is closed even if the write fails.
with open("out.pdf", "wb") as f:
    f.write(pdf.read())

isparks · 2020-03-07T22:24:55Z

The inline image is the example I used in my program above.

strichter · 2020-12-08T11:04:01Z

I do not think that it is a memory leak. I bet you that if you call GC after generating the file and report memory it will be released -- though you have to be careful in Linux interpreting OS memory tool results and use Python tools instead.
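
To make the "call GC and measure with Python tools" suggestion concrete, here is a minimal sketch using only the standard library (`gc` and `tracemalloc`). The `measure` helper and the `fake_build` stand-in are hypothetical names for illustration; in the real test, `fake_build` would be replaced by the `rml2pdf.parseString(content)` call. One caveat: `tracemalloc` only sees allocations made through Python's memory allocators, so buffers that a C extension allocates directly may not show up.

```python
import gc
import tracemalloc


def measure(build):
    """Run build() and return (peak_bytes, retained_bytes).

    peak_bytes is the highest traced allocation while build() ran;
    retained_bytes is what is still allocated after a forced collection.
    """
    gc.collect()
    tracemalloc.start()
    try:
        build()
        _, peak = tracemalloc.get_traced_memory()
        gc.collect()  # collect garbage so only still-referenced objects remain
        retained, _ = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak, retained


def fake_build():
    # Hypothetical stand-in for rml2pdf.parseString(content): allocates
    # about 20 MiB of temporary buffers and drops them on return.
    buffers = [bytearray(1024 * 1024) for _ in range(20)]
    return len(buffers)


if __name__ == "__main__":
    peak, retained = measure(fake_build)
    print(f"peak: {peak / 1e6:.1f} MB, retained after GC: {retained / 1e6:.1f} MB")
```

A high peak with a low retained value would point to heavy temporary usage rather than a leak; a retained value that grows with the number of images would suggest objects are being kept alive.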

While the z3c.rml code is careful to not load images into memory or cache every image, it does open the file 500 times and sends it to reportlab.lib.utils.ImageReader.

@isparks Do you have the pure Reportlab example too?

iansparks · 2020-12-08T11:20:51Z

It's been a long while. I think this is the test program I used. The path to file is the same as used above. When I run it, memory barely moves.


from reportlab.platypus.flowables import Image
from reportlab.platypus import SimpleDocTemplate
from reportlab.lib.utils import ImageReader
from reportlab.lib.units import cm

image_path = "/home/isparks/Desktop/stars.jpg"

doc = SimpleDocTemplate('test.pdf')
story = []
for i in range(10000):

   args = dict(width=15*cm)

   with open(image_path, "rb") as f:
      img = ImageReader(f)
      iw, ih = img.getSize()
      if 'width' in args and 'height' not in args:
         args['height'] = args['width'] * ih / iw
      elif 'width' not in args and 'height' in args:
         args['width'] = args['height'] * iw / ih
      elif 'width' in args and 'height' in args:
         # In this case, the width and height specify a bounding box
         # and the size of the image within that box is maximized.
         if args['width'] * ih / iw <= args['height']:
            args['height'] = args['width'] * ih / iw
         elif args['height'] * iw / ih < args['width']:
            args['width'] = args['height'] * iw / ih
         else:
            # This should not happen.
            raise ValueError('Cannot keep image in bounding box.')

   story.append(Image(image_path, **args))


doc.build(story)

iansparks · 2020-12-08T11:22:13Z

I'm not saying the above is good code. Clearly I don't need to recompute the aspect ratio for the same image over and over, but I think I was trying to recreate the conditions in the rml code. As I said, it has been a while.
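
As a sketch of how the repeated aspect-ratio work could be avoided, the size lookup can be memoized per path so the image is read once no matter how often it appears in the story. Everything here is hypothetical illustration: `fit_size` restates the bounding-box arithmetic from the program above as a pure function, and `make_cached_sizer` takes a `read_size` callable, which with reportlab would presumably be something like `lambda p: ImageReader(p).getSize()`. Whether this changes the memory behaviour reported in this issue is untested.

```python
from functools import lru_cache


def fit_size(iw, ih, width=None, height=None):
    """Return (width, height) that preserves the iw:ih aspect ratio."""
    if width is not None and height is None:
        return width, width * ih / iw
    if width is None and height is not None:
        return height * iw / ih, height
    if width is not None and height is not None:
        # Both given: treat them as a bounding box and maximize the image in it.
        if width * ih / iw <= height:
            return width, width * ih / iw
        return height * iw / ih, height
    # No size was specified, so leave both unset.
    return None, None


def make_cached_sizer(read_size):
    """Wrap read_size(path) -> (iw, ih) so each distinct request is computed once."""

    @lru_cache(maxsize=None)
    def sized(path, width=None, height=None):
        iw, ih = read_size(path)
        return fit_size(iw, ih, width, height)

    return sized


if __name__ == "__main__":
    reads = []

    def fake_read_size(path):
        # Stand-in for opening the image; records how often it is called.
        reads.append(path)
        return 1920, 1080

    sized = make_cached_sizer(fake_read_size)
    for _ in range(500):
        size = sized("stars.jpg", width=15.0)
    print(size, "file reads:", len(reads))
```

The cache key is the (path, width, height) triple, so a file that changes on disk during the run would not be noticed; for a one-shot PDF build that seems an acceptable trade-off.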

strichter · 2020-12-08T12:41:53Z

Yeah, I think because I keep a file open some caching kicks in. Also, I rely on GC to close files, so not ideal either.

iansparks · 2020-12-08T12:57:41Z

You were able to see a difference? I expect that the use of XML causes some memory inflation but it seemed too much.

I did a bit more digging. Here is some code I ended up with. Now instead of <img... in RML we use <tg_img...

Most of it is copied from Image but check the line:

        # Magic line that ensures that files are only opened when needed, huge memory saving
        args["lazy"] = 2

Might be voodoo, but maybe it can help? As far as I recall, there was no way to pass lazy=2 through <img>.

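# Imports assumed from the z3c.rml and reportlab package layout; adjust if your versions differ.
import reportlab.platypus.flowables
from reportlab.lib import utils
from z3c.rml import attr, interfaces
from z3c.rml.flowable import Flowable


class ITGImage(interfaces.IRMLDirectiveSignature):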
    src = attr.Text(
        title=u"Image Source", description=u"The file that is used to extract the image data.", required=True
    )

    width = attr.Measurement(title=u"Image Width", description=u"The width of the image.", required=False)

    height = attr.Measurement(title=u"Image Height", description=u"The height of the image.", required=False)

    preserveAspectRatio = attr.Boolean(
        title=u"Preserve Aspect Ratio",
        description=(
            u"If set, the aspect ratio of the image is kept. When "
            u"both, width and height, are specified, the image "
            u"will be fitted into that bounding box."
        ),
        default=False,
        required=False,
    )

    mask = attr.Color(
        title=u"Mask",
        description=u'The color mask used to render the image, or "auto" to use the alpha channel if available.',
        default="auto",
        required=False,
        acceptAuto=True,
    )

    align = attr.Choice(
        title=u"Alignment",
        description=u"The alignment of the image within the frame.",
        choices=interfaces.ALIGN_TEXT_CHOICES,
        required=False,
    )

    vAlign = attr.Choice(
        title=u"Vertical Alignment",
        description=u"The vertical alignment of the image.",
        choices=interfaces.VALIGN_TEXT_CHOICES,
        required=False,
    )


class TGImage(Flowable):
    signature = ITGImage
    klass = reportlab.platypus.flowables.Image
    attrMapping = {"src": "filename", "align": "hAlign"}

    def process(self):
        args = dict(self.getAttributeValues(attrMapping=self.attrMapping))
        preserveAspectRatio = args.pop("preserveAspectRatio", False)
        if preserveAspectRatio:
            img = utils.ImageReader(args["filename"])
            iw, ih = img.getSize()
            if "width" in args and "height" not in args:
                args["height"] = args["width"] * ih / iw
            elif "width" not in args and "height" in args:
                args["width"] = args["height"] * iw / ih
            elif "width" in args and "height" in args:
                # In this case, the width and height specify a bounding box
                # and the size of the image within that box is maximized.
                if args["width"] * ih / iw <= args["height"]:
                    args["height"] = args["width"] * ih / iw
                elif args["height"] * iw / ih < args["width"]:
                    args["width"] = args["height"] * iw / ih
                else:
                    # This should not happen.
                    raise ValueError("Cannot keep image in bounding box.")
            else:
                # No size was specified, so do nothing.
                pass

        vAlign = args.pop("vAlign", None)
        hAlign = args.pop("hAlign", None)

        # Magic line that ensures that files are only opened when needed, huge memory saving
        args["lazy"] = 2

        img = self.klass(**args)
        if hAlign:
            img.hAlign = hAlign
        if vAlign:
            img.vAlign = vAlign
        self.parent.flow.append(img)
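
For readers wondering what `lazy=2` changes: as I understand the reportlab `Image` docstring, `lazy=1` means the image is not opened until required, and `lazy=2` means it is opened when required and then shut again. The sketch below is not reportlab's implementation; it is a stdlib-only illustration of the two policies using a hypothetical `LazyFile` class, showing how many handles stay open after every item has been drawn once.

```python
import os
import tempfile


class LazyFile:
    """Hypothetical flowable-like object that stores only a path.

    lazy=1: open on first use and keep the handle.
    lazy=2: open on each use and close it again right after.
    """

    def __init__(self, path, lazy=1):
        self.path = path
        self.lazy = lazy
        self._fp = None

    @property
    def is_open(self):
        return self._fp is not None

    def close(self):
        if self._fp is not None:
            self._fp.close()
            self._fp = None

    def draw(self):
        # Open only when the data is actually needed.
        if self._fp is None:
            self._fp = open(self.path, "rb")
        self._fp.seek(0)
        size = len(self._fp.read())
        if self.lazy >= 2:
            self.close()  # release the handle as soon as drawing is done
        return size


def count_open_after_drawing(lazy, n=50):
    """Draw n items backed by one temp file; return how many handles stay open."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(b"x" * 1024)
        items = [LazyFile(path, lazy=lazy) for _ in range(n)]
        for item in items:
            item.draw()
        still_open = sum(item.is_open for item in items)
        for item in items:
            item.close()
        return still_open
    finally:
        os.remove(path)


if __name__ == "__main__":
    print("lazy=1:", count_open_after_drawing(1))
    print("lazy=2:", count_open_after_drawing(2))
```

With the keep-open policy, every item still holds a handle (and whatever is buffered behind it) until it is closed or collected; with the open-then-shut policy nothing stays open between draws, which is consistent with the saving described above.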
