Fix content data stream concatenation mangling output in cfFilterPDFToPDF #56

Merged
merged 2 commits into from
Jun 11, 2024


sergio-gdr
Contributor

When running the following command on example.pdf
./pdftopdf 1 1 1 1 "" example.pdf > output.pdf

the resultant output.pdf is mangled.

This is better explained in this bug report.

It seems that the logic that collects a page's content data streams and concatenates them into a single stream in an XObject assumes that the contents of successive streams are already properly separated (for example, by trailing whitespace).
To illustrate: in example.pdf above (the sample PDFs have been run through qpdf --qdf --recompress-flate --compress-streams=n --object-streams=disable for illustration purposes), page 1's contents are

 /Contents [9 0 R 11 0 R 13 0 R 15 0 R]

and looking at objects 9-11:

%% Contents for page 1
%% object 9
9 0 obj
<</Length 10 0 R>>
stream
q
endstream
endobj

%% object 10
10 0 obj
1
endobj

%% object 11
11 0 obj
<</Length 12 0 R>>
stream
q 0.1 0 0 0.1 0 0 cm
%% only the beginning of object 11 shown
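
To inspect another file in the same readable form, the qpdf invocation mentioned above can be scripted. This is only a sketch: it assumes qpdf is installed and on PATH, and the helper names `qdf_command` and `normalize` and the output file name are hypothetical.

```python
import subprocess


def qdf_command(src, dst):
    """Build the qpdf command line quoted in the description above."""
    return [
        "qpdf",
        "--qdf",                     # human-readable, editable layout
        "--recompress-flate",
        "--compress-streams=n",      # leave stream data uncompressed
        "--object-streams=disable",  # keep every object at top level
        src,
        dst,
    ]


def normalize(src, dst):
    """Run qpdf; raises CalledProcessError if qpdf exits non-zero."""
    subprocess.run(qdf_command(src, dst), check=True)


# Example (not executed here): normalize("example.pdf", "example.qdf.pdf")
print(" ".join(qdf_command("example.pdf", "example.qdf.pdf")))
```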

This results in the following in output.pdf above:

%% resultant XObject. irrelevant info excluded
11 0 obj
<</Subtype /Form /Type /XObject /Length 12 0 R>>
stream
qq 0.1 0 0 0.1 0 0 cm

so concatenating streams 9 and 11 produces the (invalid) operator 'qq', which confuses PDF parsers and mangles the output.
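
To make the failure mode concrete, here is a minimal Python sketch that mimics joining content streams with no separator. It is not the filter's actual code (which is C++): the helpers `naive_concat` and `tokens` are hypothetical, and the whitespace-only tokenizer is a deliberate simplification of real PDF lexing.

```python
def naive_concat(streams):
    """Join content streams back to back, with no separator (the buggy behaviour)."""
    return b"".join(streams)


def tokens(data):
    """Very rough tokenizer: split on whitespace only.

    Real PDF lexing also handles delimiters, strings and comments;
    this is just enough to show the operator boundary being lost.
    """
    return data.split()


# Stream data as listed for objects 9 and 11. Object 10 gives stream 9 a
# /Length of 1, so its data is the single byte "q" with no trailing newline.
stream_9 = b"q"
stream_11 = b"q 0.1 0 0 0.1 0 0 cm\n"

joined = naive_concat([stream_9, stream_11])
print(joined)
print(tokens(joined)[0])  # the two "q" operators have fused into one token
```

Because the first stream carries no trailing whitespace, nothing separates its final operator from the first operator of the next stream.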

With the patch applied, the output becomes

%% resultant XObject. irrelevant info excluded
11 0 obj
<</Subtype /Form /Type /XObject /Length 12 0 R>>
stream
q
q 0.1 0 0 0.1 0 0 cm
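
The idea behind the patch can be sketched in a few lines of Python. Again, this is an illustration under assumptions rather than the actual C++ change: `concat_with_newlines` is a hypothetical helper, and the stream bytes are the ones listed for objects 9 and 11.

```python
def concat_with_newlines(streams):
    """Append a newline after every stream before joining them."""
    return b"".join(s + b"\n" for s in streams)


stream_9 = b"q"
stream_11 = b"q 0.1 0 0 0.1 0 0 cm\n"

fixed = concat_with_newlines([stream_9, stream_11])
print(fixed.decode("ascii"))

# With the separator in place, both "q" operators survive as distinct tokens.
print(fixed.split()[:2])
```

A stream that already ends in a newline simply gets a second one. That should be harmless as long as stream boundaries fall between tokens, which, as far as I understand the PDF specification, is required for a /Contents array. A variant could add the newline only when the previous stream does not already end in whitespace.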

I'm not sure this is the best solution to the problem, but hopefully the analysis can at least point toward one.

sergio-gdr and others added 2 commits June 11, 2024 10:44
…eStreamData

When concatenating the data streams for the page's contents, add a
newline at the end of each data stream to avoid cases where the
concatenation might corrupt the content.
E.g. (extracted from a real PDF):

        %% Contents for page 1
        %% Stream 1
        9 0 obj
        <<
          /Length 10 0 R
        >>
        stream
        q
        endstream
        endobj

        10 0 obj
        1
        endobj

        %% Stream 2
        11 0 obj
        <<
          /Length 12 0 R
        >>
        stream
        q 0.1 0 0 0.1 0 0 cm

the output pdf results in
        qq 0.1 0 0 0.1 0 0 cm

with the effect that 'qq' is not being parsed correctly, effectively
mangling the contents.

Signed-off-by: Sergio Gómez <[email protected]>