SystemError: <built-in function TextPage_extractIMGINFO> returned a result with an exception set #2905

myhloli · 2023-12-18T10:47:56Z

Description of the bug

When I call the API page.get_image_rects, it raises SystemError: <built-in function TextPage_extractIMGINFO> returned a result with an exception set

How to reproduce the bug

PDF file:
https://drive.google.com/file/d/1yCwGVTOwRSXvHCzNC0C0dsVJLQI-p2ZM/view
page_id: 115 (page_id starts from 0)

import fitz

pdf_docs = fitz.open("pdf", pdf_bytes)  # pdf_bytes: the PDF file content as bytes
for page_id, page in enumerate(pdf_docs):
    page_imgs = page.get_images()
    for img in page_imgs:
        # raises SystemError on page 115
        recs = page.get_image_rects(img, transform=True)

My code looks like this demo.

    recs = page.get_image_rects(img, transform=True)
           │    │               └ (2052, 0, 1895, 1248, 8, 'DeviceRGB', '', 'Im0', 'DCTDecode')
           │    └ <function get_image_rects at 0x0000027B025F5E40>
           └ page 115 of <memory, doc# 1>
    infos = page.get_image_info(hashes=True)
            │    └ <function get_image_info at 0x0000027B025F5DA0>
            └ page 115 of <memory, doc# 1>
    imginfo = tp.extractIMGINFO(hashes=hashes)
              │  │                     └ True
              │  └ <function TextPage.extractIMGINFO at 0x0000027B025E3D80>
              └ <fitz.fitz.TextPage; proxy of <Swig Object of type 'TextPage *' at 0x0000027B027579F0> >
    return _fitz.TextPage_extractIMGINFO(self, hashes)
           │     │                       │     └ True
           │     │                       └ <fitz.fitz.TextPage; proxy of <Swig Object of type 'TextPage *' at 0x0000027B027579F0> >
           │     └ <built-in function TextPage_extractIMGINFO>
           └ <module 'fitz._fitz' from

This is the stack trace I get when the error occurs.
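As an interim measure on the classic implementation, one option is to record the failure and keep processing the rest of the document. The following is a minimal sketch under assumptions: `collect_image_rects` is a hypothetical helper name, and `page` is only assumed to behave like a PyMuPDF page (it needs `get_images` and `get_image_rects`). It does not fix the underlying bug.

```python
def collect_image_rects(page, transform=True):
    """Return (rects_by_xref, failed) for one page (hypothetical helper).

    `page` is assumed to offer get_images() and get_image_rects(),
    as a PyMuPDF page does.
    """
    rects, failed = {}, []
    for img in page.get_images():
        xref = img[0]  # first item of the image tuple is the xref
        try:
            rects[xref] = page.get_image_rects(img, transform=transform)
        except SystemError as exc:
            # Remember which image failed instead of aborting the whole run.
            failed.append((xref, str(exc)))
    return rects, failed
```

The traceback above suggests the error comes from the page-wide get_image_info call, so on an affected page every image would probably end up in `failed`.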

PyMuPDF version

1.23.7

Operating system

Windows

Python version

3.11

JorjMcKie · 2023-12-18T15:06:08Z

We are in the process of migrating PyMuPDF to a new architecture. This implementation is already available via the import statement import fitz_new as fitz.
In other words, it is a drop-in replacement for the classic implementation.
We plan to roll out the new implementation after the upcoming 1.23.8 (presumably due this week).
Your issue is fixed in the new implementation, and we do not intend to apply the fix to the current version.

import fitz_new as fitz
doc = fitz.open("test.pdf")
page = doc[115]
for img in page.get_images():
    print(page.get_image_rects(img))

    
[Rect(-9.0, -9.89996337890625, 1270.400146484375, 833.0001220703125)]
[Rect(619.5001220703125, 45.3999137878418, 642.35009765625, 76.84991455078125)]
[Rect(255.75010681152344, 572.2999877929688, 364.2001037597656, 662.0999755859375)]

JorjMcKie · 2023-12-19T09:35:38Z

The reason for the failure in the classic implementation is the use of non-UTF8 characters in the colorspace name of image xref 2054 on that page. So an easy solution is to use an alternative Python method to create the respective string object.
Colorspace info of image 2054:

print(doc.xref_object(4516))
[ /Separation /#BA#DA#C9#AB /DeviceCMYK 4517 0 R ]

The bytes((0xBA,0xDA,0xC9,0xAB)) is not UTF8-decodable.
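The decoding failure can be reproduced in plain Python without PyMuPDF. This is a minimal sketch: the helper name `decode_pdf_name` and the Latin-1 fallback are assumptions for illustration, not a description of how the rebased implementation builds the string.

```python
RAW_NAME = bytes((0xBA, 0xDA, 0xC9, 0xAB))  # name bytes from the xref object above


def decode_pdf_name(raw: bytes) -> str:
    """Decode as UTF-8, falling back to Latin-1 (hypothetical helper)."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Latin-1 maps every byte to one code point, so this cannot fail.
        return raw.decode("latin-1")


try:
    RAW_NAME.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

print(strict_ok)                        # strict UTF-8 decoding fails
print(repr(decode_pdf_name(RAW_NAME)))  # the fallback still yields a str
```

The fallback does not recover the intended name; it only guarantees that a string object can be built from arbitrary bytes.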

myhloli · 2023-12-20T06:14:03Z

Thanks for your reply, but I have a question.
What's the difference between fitz_new and fitz?
Can I use fitz_new in place of fitz in my project?
Or should I wait for the PyMuPDF 1.23.8 update?

JorjMcKie · 2023-12-20T08:38:40Z

You can use import fitz_new as fitz immediately. This is the only change in your scripts to have the "rebased" version working.
There are no changes in the behavior.

Early next year (first two weeks), the final migration step will take place, so that you can again simply write import fitz.
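For scripts that must run both before and after that migration step, one possible approach is to try the module names in order of preference. This is a sketch only: `import_first` is a hypothetical helper, not part of PyMuPDF, and the commented usage line assumes PyMuPDF is installed.

```python
import importlib


def import_first(names):
    """Return the first importable module from `names` (hypothetical helper)."""
    for name in names:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    raise ImportError(f"none of {list(names)} could be imported")


# Usage in a project (assumes PyMuPDF is installed):
# fitz = import_first(["fitz_new", "fitz"])
```

Once import fitz provides the rebased implementation, the helper can be dropped in favor of a plain import.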

julian-smith-artifex-com · 2024-01-11T12:00:58Z

Fixed in 1.23.9 where import fitz gets the rebased implementation.

JorjMcKie added the wontfix no intention to resolve label Dec 18, 2023

julian-smith-artifex-com added the Works in rebased label Dec 18, 2023

JorjMcKie removed the wontfix no intention to resolve label Dec 19, 2023

julian-smith-artifex-com closed this as completed Jan 11, 2024
