How to create a table of contents (document outline)? #127
Comments
Hello @Johann-S! I'm not sure what precisely you mean by "table of contents". You can certainly write out some text on a page to outline the contents of your document. Are you wanting to do something more than that?
Yep, some PDFs have a table of contents inside them, but not on a separate page. Sometimes these are called bookmarks ("signets" in French) too.
Yep, exactly!
Yes, this is definitely possible to do. As with page links, I created an example script to demonstrate how to do it. Here's the resulting PDF, along with a screenshot previewing the outline panel: with_outline.pdf

And here's the script itself:

// ...imports omitted...
const PAGE_WIDTH = 500;
const PAGE_HEIGHT = 750;

// Collect an indirect reference to every page in the document's page tree.
const getPageRefs = (pdfDoc) => {
  const refs = [];
  pdfDoc.catalog.Pages.traverse((kid, ref) => {
    if (kid instanceof PDFPage) refs.push(ref);
  });
  return refs;
};

// Build one outline item. `nextOrPrev` is stored under `Next`,
// or under `Prev` when `isLast` is true.
const createOutlineItem = (pdfDoc, title, parent, nextOrPrev, page, isLast = false) =>
  PDFDictionary.from(
    {
      Title: PDFString.fromString(title),
      Parent: parent,
      [isLast ? 'Prev' : 'Next']: nextOrPrev,
      // Destination: [page /XYZ left top zoom]; null keeps the viewer's current value.
      Dest: PDFArray.fromArray(
        [
          page,
          PDFName.from('XYZ'),
          PDFNull.instance,
          PDFNull.instance,
          PDFNull.instance,
        ],
        pdfDoc.index,
      ),
    },
    pdfDoc.index,
  );

const pdfDoc = PDFDocumentFactory.create();
const [fontRef, font] = pdfDoc.embedStandardFont(StandardFonts.Helvetica);

const contentStream1 = pdfDoc.register(
  pdfDoc.createContentStream(
    drawText(font.encodeText('PAGE 1'), {
      font: 'Helvetica',
      size: 50,
      x: 175,
      y: PAGE_HEIGHT - 100,
    }),
  ),
);
const contentStream2 = pdfDoc.register(
  pdfDoc.createContentStream(
    drawText(font.encodeText('PAGE 2'), {
      font: 'Helvetica',
      size: 50,
      x: 175,
      y: PAGE_HEIGHT - 100,
    }),
  ),
);
const contentStream3 = pdfDoc.register(
  pdfDoc.createContentStream(
    drawText(font.encodeText('PAGE 3'), {
      font: 'Helvetica',
      size: 50,
      x: 175,
      y: PAGE_HEIGHT - 100,
    }),
  ),
);

const page1 = pdfDoc
  .createPage([PAGE_WIDTH, PAGE_HEIGHT])
  .addFontDictionary('Helvetica', fontRef)
  .addContentStreams(contentStream1);
const page2 = pdfDoc
  .createPage([PAGE_WIDTH, PAGE_HEIGHT])
  .addFontDictionary('Helvetica', fontRef)
  .addContentStreams(contentStream2);
const page3 = pdfDoc
  .createPage([PAGE_WIDTH, PAGE_HEIGHT])
  .addFontDictionary('Helvetica', fontRef)
  .addContentStreams(contentStream3);

pdfDoc.addPage(page1);
pdfDoc.addPage(page2);
pdfDoc.addPage(page3);

const pageRefs = getPageRefs(pdfDoc);

// Reserve object numbers up front so the items can refer to each other.
const outlinesDictRef = pdfDoc.index.nextObjectNumber();
const outlineItem1Ref = pdfDoc.index.nextObjectNumber();
const outlineItem2Ref = pdfDoc.index.nextObjectNumber();
const outlineItem3Ref = pdfDoc.index.nextObjectNumber();

const outlineItem1 = createOutlineItem(
  pdfDoc,
  'Page 1',
  outlinesDictRef,
  outlineItem2Ref,
  pageRefs[0],
);
const outlineItem2 = createOutlineItem(
  pdfDoc,
  'Page 2',
  outlinesDictRef,
  outlineItem3Ref,
  pageRefs[1],
);
const outlineItem3 = createOutlineItem(
  pdfDoc,
  'Page 3',
  outlinesDictRef,
  outlineItem2Ref,
  pageRefs[2],
  true,
);

// Root of the outline hierarchy.
const outlinesDict = PDFDictionary.from(
  {
    Type: PDFName.from('Outlines'),
    First: outlineItem1Ref,
    Last: outlineItem3Ref,
    Count: PDFNumber.fromNumber(3),
  },
  pdfDoc.index,
);

pdfDoc.index.assign(outlinesDictRef, outlinesDict);
pdfDoc.index.assign(outlineItem1Ref, outlineItem1);
pdfDoc.index.assign(outlineItem2Ref, outlineItem2);
pdfDoc.index.assign(outlineItem3Ref, outlineItem3);

pdfDoc.catalog.set('Outlines', outlinesDictRef);

const pdfBytes = PDFDocumentWriter.saveToBytes(pdfDoc);
fs.writeFileSync('./with_outline.pdf', pdfBytes);

This is, of course, a very simple document outline without any nesting. If you'd like to create something more complex, with multiple nested levels, you can certainly do so. However, I'll refer you to section 12.3.3 Document Outline and annex H.6 Outline Hierarchy Example of the PDF specification for the details. I hope this helps. Please let me know if you have any additional questions!
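For the nested case, most of the work is bookkeeping: per section 12.3.3 of the PDF specification, each outline item refers to its parent, its previous and next siblings, and its first and last children. Below is a minimal sketch of that bookkeeping in plain JavaScript, without pdf-lib. The function name `planOutline`, the input shape, and the use of array indices in place of object references are assumptions made for illustration, and the counts assume every item is displayed open.

```javascript
// Hypothetical helper (not part of pdf-lib): flattens a nested outline
// description into nodes whose links mirror a PDF outline item's entries.
// Links are indices into the returned `nodes` array; -1 stands for the
// outline root dictionary and null means "entry absent".
function planOutline(items) {
  const nodes = [];

  const visit = (children, parent) => {
    const ids = children.map((child) => {
      const id = nodes.length;
      nodes.push({
        title: child.title,
        pageIndex: child.pageIndex,
        parent,
        prev: null,
        next: null,
        first: null,
        last: null,
        count: 0,
      });
      const childIds = visit(child.children || [], id);
      if (childIds.length > 0) {
        nodes[id].first = childIds[0];
        nodes[id].last = childIds[childIds.length - 1];
        // Count = number of descendants, assuming all of them are shown open.
        nodes[id].count = childIds.reduce((sum, c) => sum + 1 + nodes[c].count, 0);
      }
      return id;
    });
    // Siblings form a doubly linked list through Prev and Next.
    ids.forEach((id, i) => {
      nodes[id].prev = i > 0 ? ids[i - 1] : null;
      nodes[id].next = i < ids.length - 1 ? ids[i + 1] : null;
    });
    return ids;
  };

  const top = visit(items, -1);
  return {
    nodes,
    first: top.length > 0 ? top[0] : null,
    last: top.length > 0 ? top[top.length - 1] : null,
    count: top.reduce((sum, id) => sum + 1 + nodes[id].count, 0),
  };
}

// Usage: two chapters, the first with two subsections.
const nestedPlan = planOutline([
  {
    title: 'Chapter 1',
    pageIndex: 0,
    children: [
      { title: 'Section 1.1', pageIndex: 0 },
      { title: 'Section 1.2', pageIndex: 1 },
    ],
  },
  { title: 'Chapter 2', pageIndex: 2 },
]);
console.log(nestedPlan.nodes.map((n) => `${n.title} -> parent ${n.parent}`));
```

Each node could then be turned into a dictionary much like `createOutlineItem` does above, with the indices swapped for the object references reserved for each item.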
Thanks @Hopding, you're a PDF expert 👍
Ideally this would be handled by Chrome during printing. However, https://bugs.chromium.org/p/chromium/issues/detail?id=840455 is not implemented yet (and cannot rely on metadata extracted from the asciidoc format directly). Therefore this implements it with a post-processing step using `pdf-lib`.

If the `:toc:` property is set, the `.adoc` file is scanned for sections (respecting the `:toclevels:` attribute) and an outline is generated. This only works if a ToC is also generated within the document (or better: links exist to each section), because otherwise Chrome would not generate the necessary `Dests` fields within the PDF. Unfortunately, Chrome also has some bugs regarding umlauts in anchors, leading to omission of the relevant `Dests` fields. Therefore a warning is printed if an anchor cannot be located in the `Dests` field of the PDF catalog.

Based upon https://gitlab.pagedmedia.org/tools/pagedjs-cli/commit/df0d10bd1bb12d1e2077323e1fb5eec75da35a1e which itself is based on @Hopding's comment at: Hopding/pdf-lib#127 (comment)
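The section scan described in this commit message could look roughly like the sketch below. It is not the commit's actual code: the function name `scanSections` is hypothetical, it assumes sections are written as `== Title` style headings (two equals signs for level 1, three for level 2, and so on), it assumes a `:toclevels:` default of 2, and it ignores complications such as heading-like lines inside listing blocks.

```javascript
// Hypothetical helper: collects section titles from AsciiDoc source text,
// keeping only levels up to `tocLevels`.
function scanSections(adocSource, tocLevels = 2) {
  const sections = [];
  for (const line of adocSource.split(/\r?\n/)) {
    // "= Title" (a single "=") is the document title, so require at least two.
    const match = /^(={2,6})\s+(\S.*)$/.exec(line);
    if (!match) continue;
    const level = match[1].length - 1;
    if (level <= tocLevels) {
      sections.push({ level, title: match[2].trim() });
    }
  }
  return sections;
}

// Usage: the level-3 section is dropped with the assumed default of 2.
const sampleSource = [
  '= My Document',
  ':toc:',
  '',
  '== Introduction',
  '=== Background',
  '==== Fine print',
  '== Conclusion',
].join('\n');
console.log(scanSections(sampleSource));
```

The resulting list still has to be matched against the named destinations Chrome wrote into the PDF, which is where the warning for missing anchors comes in.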
--outline-tags specifies the HTML tags that should be considered for the outline. The tags are expected to be given in order of hierarchy. For example, 'h1,h2' generates an outline with h1 elements as top-level entries and h2 elements as their children.

Ideally this would not be required if Chromium added this directly. So if these bugs are closed, this can probably be removed again:
- https://bugs.chromium.org/p/chromium/issues/detail?id=840455
- puppeteer/puppeteer#1778

This code is heavily based on @Hopding's comment at: Hopding/pdf-lib#127 (comment)
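To illustrate how an ordered tag list such as 'h1,h2' can drive the nesting, here is a minimal sketch in plain JavaScript. It is not the code from that commit: the function name `buildOutlineTree` and the `{ tag, title }` input shape are assumptions, and headings are expected in document order.

```javascript
// Hypothetical helper: nests headings according to the order of `outlineTags`.
// A heading whose tag is not listed is left out of the outline.
function buildOutlineTree(headings, outlineTags) {
  const root = { title: null, children: [] };
  // Stack of currently open entries, from the root down to the latest heading.
  const stack = [{ depth: -1, node: root }];

  for (const { tag, title } of headings) {
    const depth = outlineTags.indexOf(tag.toLowerCase());
    if (depth === -1) continue;
    // Close entries at the same or a deeper level than the new heading.
    while (stack[stack.length - 1].depth >= depth) stack.pop();
    const entry = { title, children: [] };
    stack[stack.length - 1].node.children.push(entry);
    stack.push({ depth, node: entry });
  }
  return root.children;
}

// Usage: equivalent to passing 'h1,h2' on the command line.
const outlineTree = buildOutlineTree(
  [
    { tag: 'h1', title: 'Getting started' },
    { tag: 'h2', title: 'Install' },
    { tag: 'h3', title: 'Ignored detail' },
    { tag: 'h2', title: 'Configure' },
    { tag: 'h1', title: 'Reference' },
  ],
  'h1,h2'.split(','),
);
console.log(JSON.stringify(outlineTree, null, 2));
```

Tracking the depth on the stack, rather than relying on the stack position, keeps the result sensible when a document skips a level, for instance an h2 that appears before any h1.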
Hi @Hopding - seems to me this example might be based on an older version of pdf-lib? Could you let me know what changes I should think about for using with the latest version?
@jackwshepherd, I was facing the same problem, but this JS lib still seems to be the best for my needs. I did some deep inspection of the current code, compared it to the older versions, and managed to update @Hopding's solution. Do pardon me if my code is inefficient, as I'm still quite new to JS and I'm writing code to be run in Electron.

const { PDFDocument, PDFPageLeaf, PDFDict, PDFString, PDFArray, PDFName, PDFNull, PDFNumber } = require("pdf-lib");
const fs = require("fs");

async function createOutlines() {
  const doc = await PDFDocument.load(
    fs.readFileSync("##YOUR CURRENT FILE NAME##")
  );

  // Collect an indirect reference to every page in the document's page tree.
  const getPageRefs = (pdfDoc) => {
    const refs = [];
    pdfDoc.catalog.Pages().traverse((kid, ref) => {
      if (kid instanceof PDFPageLeaf) refs.push(ref);
    });
    return refs;
  };

  // (PDFDocument, string, PDFRef, PDFRef, PDFRef, boolean)
  const createOutlineItem = (pdfDoc, title, parent, nextOrPrev, page, isLast = false) => {
    // Destination array: [page /XYZ left top zoom]
    const array = PDFArray.withContext(pdfDoc.context);
    array.push(page);
    array.push(PDFName.of("XYZ"));
    array.push(PDFNull);
    array.push(PDFNull);
    array.push(PDFNull);

    const map = new Map();
    map.set(PDFName.Title, PDFString.of(title));
    map.set(PDFName.Parent, parent);
    map.set(PDFName.of(isLast ? "Prev" : "Next"), nextOrPrev);
    map.set(PDFName.of("Dest"), array);
    return PDFDict.fromMapWithContext(map, pdfDoc.context);
  };

  const pageRefs = getPageRefs(doc);

  // Reserve references up front so the items can point at each other.
  const outlinesDictRef = doc.context.nextRef();
  const outlineItem1Ref = doc.context.nextRef();
  const outlineItem2Ref = doc.context.nextRef();
  const outlineItem3Ref = doc.context.nextRef();

  const outlineItem1 = createOutlineItem(
    doc,
    "Page 1",
    outlinesDictRef,
    outlineItem2Ref,
    pageRefs[0]
  );
  const outlineItem2 = createOutlineItem(
    doc,
    "Page 2",
    outlinesDictRef,
    outlineItem3Ref,
    pageRefs[1]
  );
  const outlineItem3 = createOutlineItem(
    doc,
    "Page 3",
    outlinesDictRef,
    outlineItem2Ref,
    pageRefs[2],
    true
  );

  const outlinesDictMap = new Map();
  outlinesDictMap.set(PDFName.Type, PDFName.of("Outlines"));
  outlinesDictMap.set(PDFName.of("First"), outlineItem1Ref);
  outlinesDictMap.set(PDFName.of("Last"), outlineItem3Ref);
  // Number of outline items; change this to match the number of outlines you add.
  outlinesDictMap.set(PDFName.of("Count"), PDFNumber.of(3));

  // Point the "Outlines" entry of the PDF's catalog at the outlines dictionary.
  doc.catalog.set(PDFName.of("Outlines"), outlinesDictRef);

  const outlinesDict = PDFDict.fromMapWithContext(outlinesDictMap, doc.context);

  // First 'Outline' object. Refer to table H.3 in Annex H.6 of PDF Specification doc.
  doc.context.assign(outlinesDictRef, outlinesDict);

  // Actual outline items that will be displayed
  doc.context.assign(outlineItem1Ref, outlineItem1);
  doc.context.assign(outlineItem2Ref, outlineItem2);
  doc.context.assign(outlineItem3Ref, outlineItem3);

  const file = await doc.save();
  fs.writeFileSync("##YOUR DESTINATION FILE NAME##", file);
}

createOutlines();

It is a lot of work for 3 outlines. I will be working on nested outlines and I'd need that. Happy to share with anyone that might need it when I'm done with that.
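Much of that work is repetitive linking, so for a flat outline it can be computed in a loop. The sketch below plans the links for any number of entries in plain JavaScript; the name `linkFlatOutline` and the `{ title, pageIndex }` input shape are assumptions for illustration, and indices stand in for the references that `doc.context.nextRef()` would provide. Unlike the script above, it records both a previous and a next link for each item wherever one exists, which is how section 12.3.3 of the PDF specification describes sibling items.

```javascript
// Hypothetical helper: computes sibling links for a flat list of outline
// entries. Indices refer to positions in `items`; null means "entry absent".
function linkFlatOutline(entries) {
  const lastIndex = entries.length - 1;
  return {
    first: entries.length > 0 ? 0 : null,
    last: entries.length > 0 ? lastIndex : null,
    count: entries.length,
    items: entries.map((entry, i) => ({
      title: entry.title,
      pageIndex: entry.pageIndex,
      prev: i > 0 ? i - 1 : null,
      next: i < lastIndex ? i + 1 : null,
    })),
  };
}

// Usage: the same three bookmarks as above, one per page.
const flatPlan = linkFlatOutline(
  ['Page 1', 'Page 2', 'Page 3'].map((title, pageIndex) => ({ title, pageIndex })),
);
console.log(flatPlan);
```

With such a plan in hand, one reference can be reserved per item, and each dictionary can be built and assigned inside a single loop instead of being written out by hand.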
I tried adapting your code to allow merging any number of PDFs, with the option of adding a bookmark for each PDF under a name passed as a command-line argument. Code is as follows. Only two bookmarks are added (out of the expected 4 in the demo I was running), and they both have the same title. Any ideas where I've gone wrong?
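When bookmarking merged files, each bookmark has to point at the first page its source file occupies in the merged document, which is a running total of the earlier page counts. A minimal sketch of that calculation follows, in plain JavaScript. The name `bookmarkTargets` and the `{ title, pageCount }` input shape are hypothetical, and this is not the code from the comment above.

```javascript
// Hypothetical helper: given the files being merged, in order, returns one
// bookmark target per file pointing at that file's first page in the result.
function bookmarkTargets(parts) {
  let offset = 0;
  return parts.map(({ title, pageCount }) => {
    const target = { title, pageIndex: offset };
    offset += pageCount; // the next file starts right after this one
    return target;
  });
}

// Usage: four files merged in sequence.
const mergeTargets = bookmarkTargets([
  { title: 'Cover', pageCount: 1 },
  { title: 'Report', pageCount: 12 },
  { title: 'Appendix A', pageCount: 3 },
  { title: 'Appendix B', pageCount: 5 },
]);
console.log(mergeTargets);
```

Each target gets its own object with its own title, so the list can be fed directly into whatever code reserves one reference and builds one outline item per entry.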
@Resurg3nt @feodormak @jackwshepherd @Hopding @Johann-S
array.push(PDFName.of("XYZ"));
You are not doing this right. I had the same issue: you are not mapping the item references correctly. Check my implementation for the above lines of code. Hope this helps.
const outlinesDictRef = mergedPdf.context.nextRef();
See a better implementation here: The function you are looking for is `async function setOutline(doc: PDFDocument, outlines: readonly PDFOutline[])`
Hi, thanks for your good implementation! I tried to use your pdf.ts and setOutline to add some bookmarks to my PDF. The bookmark links work well: they can be clicked and jump to the right pages. The only issue is that the bookmark titles all seem to be invisible, as if transparent, in my Adobe PDF viewer (I tried different viewers, WPS and the Chrome browser, and none of them shows the title text for any bookmark that has been added to the PDF). Below is my code. Could you please help me find the cause of the issue? Thanks!
import { PDFDocument, PDFRef, rgb } from 'pdf-lib';
}
Oh, never mind, I figured it out: I wasn't importing the pdf-lib library packages from the internal code... Thanks.
Can you please help me with this implementation? I'm kind of stuck.
Can you share a working repo in Node.js? That would help me a lot, @yekaiLiu2022 @devnoname120.
I implemented an example using node. To demonstrate generality, I used additional libraries to generate a PDF and successfully added bookmarks/outlines. Thanks for the awesome lib 👍
Hi, I have the same problem. Can you tell me how to solve it? Thank you.
I know.
import { PDFString } from 'pdf-lib'
// const createOutline
// ....
// PDFHexString.fromText(outline.title)
PDFString.of(outline.title) // ok
Hi @Hopding,
Again thanks for your awesome lib 👍
Do you think it's possible to create a table of contents in a PDF with your lib? If it's possible, I would love to see how to do that.
Thanks 👍
BTW you should add a way to support your work 😉