Memory Leak in Image? #74

Open
isparks opened this issue Mar 7, 2020 · 6 comments

@isparks

isparks commented Mar 7, 2020

I have a document where I add a few hundred ~300 KB images using the Image (`<img>`) tag. These are PNG images with dimensions of 1920x1080 pixels.

When I generate my document, memory usage goes off the charts. The following program, together with the example image, may demonstrate the problem. On my machine, memory usage climbs to 3+ GB.
[Inline image: 283kb_file]
A similar program that uses platypus and ReportLab's Image flowable directly barely registers at all.

```python
# Test case that takes my machine to ~3 GB of RAM to generate a 450 KB PDF file

from z3c.rml import rml2pdf

# 500 copies of the same ~283 KB image, each scaled to 15 cm wide
images = "\n".join(
    '<img width="15cm" preserveAspectRatio="true" src="/path/to/283kb_file.jpg"></img>'
    for _ in range(500)
)

content = f"""<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE document SYSTEM "rml.dtd">
<document filename="Test.pdf" xmlns:doc="http://namespaces.zope.org/rml/doc">

    <stylesheet/>

    <template>
        <pageTemplate id="main" pagesize="(595,842)">
            <frame id="first" x1="2cm" y1="2cm" width="17cm" height="26cm"/>
        </pageTemplate>
    </template>

    <story>
        {images}
    </story>
</document>
"""

# parseString() returns a file-like object holding the generated PDF bytes
pdf = rml2pdf.parseString(content)

# Use a context manager so the output file is closed deterministically
with open("out.pdf", "wb") as f:
    f.write(pdf.read())
```
@isparks
Author

isparks commented Mar 7, 2020

The inline image is the example I used in my program above.

@strichter
Contributor

I do not think this is a memory leak. I bet that if you call the garbage collector after generating the file and then report memory, it will have been released. Be careful interpreting OS memory tool results on Linux, though; use Python tools instead.
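A minimal sketch of the suggested check, using only the standard library's `tracemalloc` and `gc` modules: it records the peak traced allocation while building an object, drops that object, forces a collection, and reports what is still allocated. `traced_memory` is a hypothetical helper name. One caveat: `tracemalloc` only sees memory allocated through Python's allocators, so buffers that an image library allocates on its own in C may not be counted, and a low number here is not conclusive by itself.

```python
import gc
import tracemalloc


def traced_memory(build):
    """Run build() and return (peak_bytes, bytes_still_traced_after_gc).

    Only allocations made through Python's allocators are traced; buffers a
    C extension allocates on its own (e.g. decoded pixel data) may be missed.
    """
    tracemalloc.start()
    try:
        result = build()
        _, peak = tracemalloc.get_traced_memory()
        del result    # drop our only reference to whatever build() produced
        gc.collect()  # also reclaim objects kept alive by reference cycles
        current, _ = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak, current


if __name__ == "__main__":
    peak, after = traced_memory(lambda: [bytes(1_000_000) for _ in range(20)])
    print(f"peak: {peak / 1e6:.1f} MB, still allocated after gc: {after / 1e6:.1f} MB")
```

With the RML test case from the original report, the call would look something like `traced_memory(lambda: rml2pdf.parseString(content).read())`.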

While the z3c.rml code is careful not to load images into memory or cache every image, it does open the file 500 times and passes it to reportlab.lib.utils.ImageReader each time.
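If opening the same file hundreds of times is part of the cost, one mitigation is to read each image's dimensions once per path and memoize the result. Below is a minimal stdlib sketch. It assumes PNG input, as described in the original report, and uses a hypothetical `png_size` helper; it is not z3c.rml or ReportLab code. It reads only the 24-byte PNG header (signature plus the start of the IHDR chunk) rather than decoding pixels. For JPEG or other formats, a different size reader would go behind the same cache, for example ReportLab's `ImageReader(...).getSize()`.

```python
import struct
from functools import lru_cache

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


@lru_cache(maxsize=None)
def png_size(path):
    """Return (width, height) of a PNG file, opening it at most once per path.

    Reads only the 24-byte header (8-byte signature, 4-byte chunk length,
    b"IHDR", then big-endian width and height), never the pixel data.
    """
    with open(path, "rb") as f:  # closed when the block exits, not by the GC
        header = f.read(24)
    if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError(f"{path!r} does not look like a PNG file")
    width, height = struct.unpack(">II", header[16:24])
    return width, height
```

Because the cache is keyed on the path string, it would return a stale size if a file were replaced mid-run; `png_size.cache_clear()` resets it.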

@isparks Do you have the pure ReportLab example too?

@iansparks

It's been a long while. I think this is the test program I used. The image file is the same one as above. When I run it, memory usage barely moves.


```python
from reportlab.lib.units import cm
from reportlab.lib.utils import ImageReader
from reportlab.platypus import SimpleDocTemplate
from reportlab.platypus.flowables import Image

image_path = "/home/isparks/Desktop/stars.jpg"

doc = SimpleDocTemplate("test.pdf")
story = []
for i in range(10000):
    args = dict(width=15 * cm)

    # Open the file only long enough to read its size, recreating the
    # aspect-ratio handling of the RML code.
    with open(image_path, "rb") as f:
        img = ImageReader(f)
        iw, ih = img.getSize()
        if "width" in args and "height" not in args:
            args["height"] = args["width"] * ih / iw
        elif "width" not in args and "height" in args:
            args["width"] = args["height"] * iw / ih
        elif "width" in args and "height" in args:
            # In this case, the width and height specify a bounding box
            # and the size of the image within that box is maximized.
            if args["width"] * ih / iw <= args["height"]:
                args["height"] = args["width"] * ih / iw
            elif args["height"] * iw / ih < args["width"]:
                args["width"] = args["height"] * iw / ih
            else:
                # This should not happen.
                raise ValueError("Cannot keep image in bounding box.")

    # The flowable receives the path string, not the (now closed) file object.
    story.append(Image(image_path, **args))

doc.build(story)
```

@iansparks

I'm not saying the above is good code; clearly I don't need to recompute the aspect ratio for the same image over and over, but I think I was trying to recreate the conditions in the RML code. As I said, it has been a while.
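The aspect-ratio branches in the test above can be pulled out into a pure function, so the box is computed once per image instead of once per flowable. A sketch follows; `fit_size` is a hypothetical name, and the function mirrors the same four cases. Because `width * ih / iw <= height` is equivalent to `width / iw <= height / ih` for positive sizes, exactly one of the two bounding-box branches always applies, so the "should not happen" case is mathematically unreachable and is dropped here.

```python
def fit_size(iw, ih, width=None, height=None):
    """Return (width, height) for an iw x ih image, preserving its aspect ratio.

    - only width given:  derive height from it;
    - only height given: derive width from it;
    - both given:        treat them as a bounding box and maximize the image in it;
    - neither given:     return the inputs unchanged.
    """
    if width is not None and height is None:
        return width, width * ih / iw
    if width is None and height is not None:
        return height * iw / ih, height
    if width is not None and height is not None:
        # Width-limited if the full-width height still fits in the box,
        # otherwise height-limited.
        if width * ih / iw <= height:
            return width, width * ih / iw
        return height * iw / ih, height
    return width, height
```

For a 1920x1080 image in a 400x400 box, this gives `(400, 225.0)`: the image is width-limited and keeps its 16:9 ratio.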

@strichter
Contributor

Yeah, I think some caching kicks in because I keep a file open. Also, I rely on the garbage collector to close files, which is not ideal either.
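One way to confirm whether files are being closed by the garbage collector rather than explicitly: in CPython, a file object that is finalized while still open emits a `ResourceWarning` ("unclosed file ..."). These warnings are ignored by default, but running with `python -X dev` or `-W default::ResourceWarning` shows them. The sketch below records them programmatically. `unclosed_files`, `read_header_leaky`, and `read_header_closed` are hypothetical names used only for illustration.

```python
import gc
import warnings


def unclosed_files(action, *args):
    """Run action(*args) and return the ResourceWarnings it triggers.

    ResourceWarning is ignored by default, so it is enabled explicitly
    (and recorded instead of printed) while the action runs.
    """
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always", ResourceWarning)
        action(*args)
        gc.collect()  # finalize anything kept alive only by reference cycles
    return [w for w in caught if issubclass(w.category, ResourceWarning)]


def read_header_leaky(path):
    # Never closes the file; in CPython it is closed when the object is freed.
    return open(path, "rb").read(24)


def read_header_closed(path):
    # Closes the file deterministically when the with-block exits.
    with open(path, "rb") as f:
        return f.read(24)
```

With this check, `read_header_leaky` reports one warning per call, while `read_header_closed` reports none.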

@iansparks

iansparks commented Dec 8, 2020

You were able to see a difference? I expected the XML processing to cause some memory inflation, but this seemed like too much.

I did a bit more digging. Here is the code I ended up with. Instead of `<img ...>` in RML, we now use `<tg_img ...>`.

Most of it is copied from the `Image` directive, but note this line:

```python
# Magic line that ensures that files are only opened when needed, huge memory saving
args["lazy"] = 2
```

It might be voodoo, but maybe it can help? As I recall, there was no way to pass `lazy=2` through `<img>`.

```python
import reportlab.platypus.flowables
from reportlab.lib import utils

from z3c.rml import attr, interfaces
from z3c.rml.flowable import Flowable


class ITGImage(interfaces.IRMLDirectiveSignature):
    # Kept as a plain path string (attr.Text); mapped to ReportLab's `filename` below.
    src = attr.Text(
        title=u"Image Source", description=u"The file that is used to extract the image data.", required=True
    )

    width = attr.Measurement(title=u"Image Width", description=u"The width of the image.", required=False)

    height = attr.Measurement(title=u"Image Height", description=u"The height of the image.", required=False)

    preserveAspectRatio = attr.Boolean(
        title=u"Preserve Aspect Ratio",
        description=(
            u"If set, the aspect ratio of the image is kept. When "
            u"both, width and height, are specified, the image "
            u"will be fitted into that bounding box."
        ),
        default=False,
        required=False,
    )

    mask = attr.Color(
        title=u"Mask",
        description=u'The color mask used to render the image, or "auto" to use the alpha channel if available.',
        default="auto",
        required=False,
        acceptAuto=True,
    )

    align = attr.Choice(
        title=u"Alignment",
        description=u"The alignment of the image within the frame.",
        choices=interfaces.ALIGN_TEXT_CHOICES,
        required=False,
    )

    vAlign = attr.Choice(
        title=u"Vertical Alignment",
        description=u"The vertical alignment of the image.",
        choices=interfaces.VALIGN_TEXT_CHOICES,
        required=False,
    )


class TGImage(Flowable):
    signature = ITGImage
    klass = reportlab.platypus.flowables.Image
    attrMapping = {"src": "filename", "align": "hAlign"}

    def process(self):
        args = dict(self.getAttributeValues(attrMapping=self.attrMapping))
        preserveAspectRatio = args.pop("preserveAspectRatio", False)
        if preserveAspectRatio:
            img = utils.ImageReader(args["filename"])
            iw, ih = img.getSize()
            if "width" in args and "height" not in args:
                args["height"] = args["width"] * ih / iw
            elif "width" not in args and "height" in args:
                args["width"] = args["height"] * iw / ih
            elif "width" in args and "height" in args:
                # In this case, the width and height specify a bounding box
                # and the size of the image within that box is maximized.
                if args["width"] * ih / iw <= args["height"]:
                    args["height"] = args["width"] * ih / iw
                elif args["height"] * iw / ih < args["width"]:
                    args["width"] = args["height"] * iw / ih
                else:
                    # This should not happen.
                    raise ValueError("Cannot keep image in bounding box.")
            else:
                # No size was specified, so do nothing.
                pass

        vAlign = args.pop("vAlign", None)
        hAlign = args.pop("hAlign", None)

        # Magic line that ensures that files are only opened when needed, huge memory saving
        args["lazy"] = 2

        img = self.klass(**args)
        if hAlign:
            img.hAlign = hAlign
        if vAlign:
            img.vAlign = vAlign
        self.parent.flow.append(img)
```
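To make the `lazy` trade-off concrete, here is a small stdlib-only model. The class `LazyImageData` is hypothetical and is not ReportLab's implementation. It is based on my reading of the ReportLab `Image` docstring, which, as far as I know, describes `lazy=1` as deferring the open until the image is needed and `lazy=2` as opening it when needed and then shutting it again; check the source of your ReportLab version for the exact behavior. An eager mode (`lazy=0` here) is included only for comparison. In this model, hundreds of flowables that keep their data after first use hold one copy each, while flowables that release it after drawing hold nothing between draws.

```python
class LazyImageData:
    """Hypothetical model of ReportLab-style `lazy` modes (not ReportLab code).

    lazy=0: read the file up front and keep the bytes (for comparison);
    lazy=1: read the file on first use and keep the bytes afterwards;
    lazy=2: read the file on each use and drop the bytes afterwards.
    """

    def __init__(self, path, lazy=1):
        self.path = path
        self.lazy = lazy
        self.reads = 0
        self._data = self._read() if lazy == 0 else None

    def _read(self):
        self.reads += 1
        with open(self.path, "rb") as f:
            return f.read()

    def draw(self):
        """Pretend to render the image; returns the number of bytes used."""
        data = self._data if self._data is not None else self._read()
        if self.lazy == 1:
            self._data = data  # keep it for later draws
        return len(data)  # with lazy=2, `data` is released when this returns

    def held_bytes(self):
        """Bytes this object keeps alive between draws."""
        return len(self._data) if self._data is not None else 0
```

The cost of `lazy=2` in this model is re-reading the file on every draw, which trades I/O for a flat memory profile.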
