Validating the PDF of a dissertation ran for minutes before I aborted the process. I assume there is something in the PDF structure that causes JHOVE to get stuck in an infinite loop. The problem occurs with JHOVE GUI 1.28.0 (2023-05-18, Buffer Size -1, PDF-hul selected) and with PDF-hul 1.12.4 (16.03.2023). The same problem occurs with JHOVE 1.26.1, plugin version 1.2, plugin name PDF-hul-1.26.
Please note that I tested the dissertation file with JHOVE 1.26.1 (2022-07-14) on Windows 10, and it does produce results.
The file appears to be a well-formed and valid PDF 1.4, with a PDF-HUL-136 info message.
It just takes a long time, about half an hour on my Intel Xeon 3.10 GHz laptop with 8 GB RAM, and produces 13 MB of output.
See the attached output of "jhove.bat -m pdf-hul KALPAKCI_MakingCIAM_Dissertation.pdf > dissertation.txt": dissertation.zip
I'm getting the same errors with TIFF files after upgrading to Archivematica 1.15.1, which uses JHOVE 1.26.1. I tested the same files with an earlier version of JHOVE (1.20.0) without any errors.
Thank you for investigating the issue. Indeed, the file did not produce an infinite loop. Nevertheless, I feel that the issue should not be closed yet.
The result of the JHOVE validation (dissertation.txt, attached above by Rvan Veenendal) is a file of around 564,000 lines. Lines 47 to 555706 describe metadata for around 60,000 images. The metadata for each image takes 9 lines and looks OK.
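As a quick consistency check, the line range and the 9-lines-per-image figure quoted above give the implied image count directly. This sketch uses only the numbers from this comment; the variable names are illustrative.

```python
# Figures quoted in the comment: image metadata occupies lines 47..555706
# (inclusive) of dissertation.txt, and each image takes 9 lines.
first_line, last_line, lines_per_image = 47, 555706, 9

metadata_lines = last_line - first_line + 1  # +1 because the range is inclusive
images, leftover = divmod(metadata_lines, lines_per_image)

print(metadata_lines, images, leftover)  # 555660 61740 0
```

The range divides evenly, which fits every image block having the same 9-line layout, and the result (61,740) matches the "around 60,000" estimate.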
However, I can only find about 200 images in the PDF when I open it with Adobe. The validation with Adobe Preflight only takes a few seconds. Preflight only finds the 200 images, no attachments and nothing conspicuous. VeraPDF also only takes a few seconds, finds a small (but slightly higher) number of images, and states that the file is a valid PDF/A-1a file.
There are several methods of embedding images in a PDF, but the 60,000 images found by JHOVE puzzle me. Could this be a bug in JHOVE?
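To compare against the roughly 200 images Adobe reports, one rough cross-check is to count image XObject dictionaries directly in the file's bytes. This is a minimal heuristic sketch, not how JHOVE, Preflight or veraPDF count images. It assumes the object dictionaries are stored uncompressed, which generally holds for PDF 1.4 because object streams were only introduced in PDF 1.5. It does not see inline images, and objects rewritten by incremental updates may be counted more than once. The function name `count_image_dicts` is hypothetical.

```python
import re
import sys

# Matches "/Subtype /Image" (whitespace between the tokens is optional);
# \b stops it from matching longer names that merely start with "Image".
IMAGE_SUBTYPE = re.compile(rb"/Subtype\s*/Image\b")


def count_image_dicts(pdf_bytes: bytes) -> int:
    """Rough count of image XObject dictionaries in an uncompressed PDF body.

    Heuristic only: dictionaries hidden inside compressed object streams
    (PDF 1.5+) and inline images in content streams are not counted.
    """
    return len(IMAGE_SUBTYPE.findall(pdf_bytes))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python count_images.py KALPAKCI_MakingCIAM_Dissertation.pdf
    with open(sys.argv[1], "rb") as fh:
        print(count_image_dicts(fh.read()))
```

This counts each image definition once, however many pages draw it. A figure far below JHOVE's count would suggest checking whether JHOVE reports the same image more than once. That is a hypothesis to test, not something established in this thread.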
The dissertation "Making CIAM..." is too large to upload. The PDF is available from https://www.research-collection.ethz.ch/handle/20.500.11850/183653. The direct link is:
https://www.research-collection.ethz.ch/bitstream/handle/20.500.11850/183653/KALPAKCI_MakingCIAM_Dissertation.pdf?sequence=3&isAllowed=y