TypeError: unhashable type: 'IndirectObject' when enumerating images #1955

michelcrypt4d4mus · 2023-07-09T07:58:47Z

The background info is the same as #1954 (an exception while iterating over the images on a page of a PDF), but the exception and traceback are slightly different.

╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ workspace/clown_sort/clown_sort/files/pdf_file.py:52 in extracted_text            │
│                                                                                                  │
│    49 │   │   │   │                                                                              │
│    50 │   │   │   │   # Extracting images is a bit fraught (lots of PIL and pypdf exceptions h   │
│    51 │   │   │   │   try:                                                                       │
│ ❱  52 │   │   │   │   │   for image_number, image in enumerate(page.images, start=1):            │
│    53 │   │   │   │   │   │   image_name = f"Page {page_number}, Image {image_number}"           │
│    54 │   │   │   │   │   │   self._log_to_stderr(f"   Processing {image_name}...")              │
│    55 │   │   │   │   │   │   page_buffer.print(Panel(image_name, expand=False))                 │
│                                                                                                  │
│ Library/Caches/pypoetry/virtualenvs/clown-sort-zLqmJuxs-py3.11/lib/python3.11/sit │
│ e-packages/pypdf/_page.py:2596 in __iter__                                                       │
│                                                                                                  │
│   2593 │                                                                                         │
│   2594 │   def __iter__(self) -> Iterator[ImageFile]:                                            │
│   2595 │   │   for i in range(len(self)):                                                        │
│ ❱ 2596 │   │   │   yield self[i]                                                                 │
│   2597 │                                                                                         │
│   2598 │   def __str__(self) -> str:                                                             │
│   2599 │   │   p = [f"Image_{i}={n}" for i, n in enumerate(self.ids_function())]                 │
│                                                                                                  │
│ Library/Caches/pypoetry/virtualenvs/clown-sort-zLqmJuxs-py3.11/lib/python3.11/sit │
│ e-packages/pypdf/_page.py:2592 in __getitem__                                                    │
│                                                                                                  │
│   2589 │   │   │   index = len_self + index                                                      │
│   2590 │   │   if index < 0 or index >= len_self:                                                │
│   2591 │   │   │   raise IndexError("sequence index out of range")                               │
│ ❱ 2592 │   │   return self.get_function(lst[index])                                              │
│   2593 │                                                                                         │
│   2594 │   def __iter__(self) -> Iterator[ImageFile]:                                            │
│   2595 │   │   for i in range(len(self)):                                                        │
│                                                                                                  │
│ Library/Caches/pypoetry/virtualenvs/clown-sort-zLqmJuxs-py3.11/lib/python3.11/sit │
│ e-packages/pypdf/_page.py:533 in _get_image                                                      │
│                                                                                                  │
│    530 │   │   │   return f                                                                      │
│    531 │   │   else:  # in a sub object                                                          │
│    532 │   │   │   ids = id[1:]                                                                  │
│ ❱  533 │   │   │   return self._get_image(ids, cast(DictionaryObject, xobjs[id[0]]))             │
│    534 │                                                                                         │
│    535 │   @property                                                                             │
│    536 │   def images(self) -> List[ImageFile]:                                                  │
│                                                                                                  │
│ Library/Caches/pypoetry/virtualenvs/clown-sort-zLqmJuxs-py3.11/lib/python3.11/sit │
│ e-packages/pypdf/_page.py:522 in _get_image                                                      │
│                                                                                                  │
│    519 │   │   │   │   │   raise KeyError("no inline image can be found")                        │
│    520 │   │   │   │   return self.inline_images[id]                                             │
│    521 │   │   │                                                                                 │
│ ❱  522 │   │   │   imgd = _xobj_to_image(cast(DictionaryObject, xobjs[id]))                      │
│    523 │   │   │   extension, byte_stream = imgd[:2]                                             │
│    524 │   │   │   f = ImageFile(                                                                │
│    525 │   │   │   │   name=f"{id[1:]}{extension}",                                              │
│                                                                                                  │
│ Library/Caches/pypoetry/virtualenvs/clown-sort-zLqmJuxs-py3.11/lib/python3.11/sit │
│ e-packages/pypdf/filters.py:826 in _xobj_to_image                                                │
│                                                                                                  │
│   823 │   if x_object_obj.get("/BitsPerComponent", 8) == 1:                                      │
│   824 │   │   mode = _get_imagemode("1bit", 0, "")                                               │
│   825 │   else:                                                                                  │
│ ❱ 826 │   │   mode = _get_imagemode(                                                             │
│   827 │   │   │   color_space,                                                                   │
│   828 │   │   │   2                                                                              │
│   829 │   │   │   if (                                                                           │
│                                                                                                  │
│ Library/Caches/pypoetry/virtualenvs/clown-sort-zLqmJuxs-py3.11/lib/python3.11/sit │
│ e-packages/pypdf/filters.py:682 in _get_imagemode                                                │
│                                                                                                  │
│   679 │   │   "/DeviceCMYK": "CMYK",                                                             │
│   680 │   }                                                                                      │
│   681 │   mode: mode_str_type = (                                                                │
│ ❱ 682 │   │   mode_map.get(color_space)  # type: ignore                                          │
│   683 │   │   or list(mode_map.values())[color_components]                                       │
│   684 │   │   or prev_mode                                                                       │
│   685 │   )  # type: ignore                                                                      │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯

pubpub-zz · 2023-07-09T08:23:19Z

@michelcrypt4d4mus, without data it is too difficult to analyze this. Would it be possible to share just the failing pages with us in private (by sending them by e-mail to @MartinThoma [email protected])?

MartinThoma · 2023-07-10T19:41:34Z

Let's leave this open until 31.07.2023. If somebody has an example PDF that can trigger this, we can work on it. Otherwise we have to close it to focus on stuff that we can work on.

michelcrypt4d4mus · 2023-07-11T18:13:10Z

> @michelcrypt4d4mus, without data it is too difficult to analyze this. Would it be possible to share just the failing pages with us in private (by sending them by e-mail to @MartinThoma [email protected])?

This may be possible. Can pypdf be used to extract a single page without any modifications?

pubpub-zz · 2023-07-11T20:36:29Z

If you use writer.append() to extract the page, the new file should contain the data (you should be able to confirm this by rerunning your test on the newly produced document).

pubpub-zz · 2023-07-14T15:54:17Z

@michelcrypt4d4mus
Have you been able to send a sample?

MartinThoma · 2023-07-15T20:17:47Z

@michelcrypt4d4mus I've forwarded the two other mails, but I think for this one I didn't receive an example

pubpub-zz · 2023-07-19T07:29:21Z

@michelcrypt4d4mus : +1?

MartinThoma · 2023-07-19T12:40:31Z

I'm closing this for the moment. If anybody has a PDF that causes this, we can re-open :-)

michelcrypt4d4mus · 2023-07-22T01:06:21Z

I sent a PDF to @MartinThoma, but it cannot be shared publicly or used in the test suite.

pubpub-zz · 2023-07-22T07:17:39Z

@michelcrypt4d4mus
Thanks, we will of course keep it private as requested.

pubpub-zz · 2023-07-24T08:16:31Z

The images in the PDF use a /Separation ColorSpace; this color space also requires color inversion. I've produced a test PDF for this:
TestWithSeparationBlack.pdf
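
For readers wondering why a color space leads to an "unhashable type" error: in the PDF specification a /Separation color space is written as an array, `[/Separation name alternateSpace tintTransform]`, rather than as a single name, and its entries may be indirect references. The failing frame in the traceback calls `mode_map.get(color_space)`, and a dict lookup has to hash its key. The sketch below only illustrates that failure mode in plain Python and is not the pypdf code or the fix. The `FakeIndirectRef` class, the `describe_lookup` helper, and the first two table entries are assumptions made for the example (only the `/DeviceCMYK` entry is visible in the traceback):

```python
# Name-to-mode table; only the "/DeviceCMYK" entry is visible in the traceback,
# the other two entries are assumed for illustration.
MODE_MAP = {"/DeviceGray": "L", "/DeviceRGB": "RGB", "/DeviceCMYK": "CMYK"}


class FakeIndirectRef:
    """Hypothetical stand-in for an indirect reference.

    Defining __eq__ without __hash__ makes instances unhashable in Python.
    """

    def __init__(self, idnum: int) -> None:
        self.idnum = idnum

    def __eq__(self, other: object) -> bool:
        return isinstance(other, FakeIndirectRef) and other.idnum == self.idnum


def describe_lookup(color_space) -> str:
    """Look up a color space in MODE_MAP and report the mode or the error."""
    try:
        return f"mode={MODE_MAP.get(color_space)}"
    except TypeError as exc:  # raised when the key cannot be hashed
        return f"TypeError: {exc}"


# A /Separation color space is an array, so it cannot be used as a dict key.
separation = ["/Separation", "/Black", "/DeviceGray", FakeIndirectRef(12)]

print(describe_lookup("/DeviceCMYK"))        # plain name: lookup succeeds
print(describe_lookup(separation))           # list key: unhashable
print(describe_lookup(FakeIndirectRef(12)))  # unresolved reference: unhashable
```

A name such as `/DeviceCMYK` is found directly, while the array and the unresolved reference both fail before the table is even consulted, which matches the shape of the errors reported here and in #1956.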

closes py-pdf#1955

Closes #1955

michelcrypt4d4mus mentioned this issue Jul 9, 2023

TypeError: unhashable type: 'ArrayObject' when enumerating images #1956

Closed

michelcrypt4d4mus changed the title ~~TypeError: unhashable type: 'IndirectObject'~~ TypeError: unhashable type: 'IndirectObject' when enumerating images Jul 9, 2023

pubpub-zz mentioned this issue Jul 9, 2023

ValueError: not enough image data when enumerating images #1957

Closed

pubpub-zz added the needs-pdf label ("The issue needs a PDF file to show the problem") Jul 9, 2023

MartinThoma added the is-bug label ("From a users perspective, this is a bug - a violation of the expected behavior with a compliant PDF") Jul 10, 2023

MartinThoma closed this as completed Jul 19, 2023

pubpub-zz reopened this Jul 22, 2023

pubpub-zz added the Has MCVE label ("A minimal, complete and verifiable example helps a lot to debug / understand feature requests") and removed the needs-pdf label ("The issue needs a PDF file to show the problem") Jul 22, 2023

pubpub-zz added a commit to pubpub-zz/pypdf that referenced this issue Jul 24, 2023

BUG: process Separation ColorSpace

80b5a0d

closes py-pdf#1955

pubpub-zz mentioned this issue Jul 24, 2023

BUG: Process /Separation ColorSpace #2007

Merged

MartinThoma closed this as completed in #2007 Jul 29, 2023

MartinThoma pushed a commit that referenced this issue Jul 29, 2023

BUG: Process /Separation ColorSpace (#2007)

6b70364

Closes #1955
