Support ebooks and pdf export #88
Comments
I would like to support PDF and ebook formats. I think this could already be developed out of tree if you use the Renderer trait from mdBook. I am not sure I want to depend on a full-blown GUI tool though. There must surely be a better alternative for that. |
Not familiar with many conversion tools like this. Pandoc also seems like a plausible option. Don't know of any others. |
Yeah pandoc seems a lot better! |
Did some exploration on this and it seems doable. Here's the default epub version of the Rust book. Note that the chapters are out of order and the links do not work. To get good output, I think we would need to:
I'm interested in working on this but will be a bit slow. Useful info here: Pandoc commands and styling options |
@asolove, I have implemented this (among other transformations) in https://github.com/killercup/trpl-ebook, feel free to use my code. |
@killercup great, thanks! |
Great! Thanks for doing this :)
This is already done in the Rust code, the
Concatenating the markdown files is also not that hard, I do it for the print page. Replacing the links could be a little trickier: what should internal links look like for pandoc? I know that pulldown-cmark gives you the ability to transform the parsed markdown events before rendering, but it's not well documented. Maybe link replacing is within its capabilities. Static files, like images, will probably also need some special treatment to be included correctly?
That is absolutely no problem, there is no rush. I am also planning on doing a big refactor (#90) to clean up and create a better API. For example, I am thinking about adding a way to poll the |
FYI, I'm doing some regex work to transform links relative to the doc.rust-lang.org domain and make reference link names unique for the combined markdown file. |
Thanks! Does pandoc auto-generate the anchors from the markdown files in those formats? like |
@azerupi I'm pretty sure pandoc generates those. I've had problems before because pandoc generates slugs in a different way than rustdoc. It should be possible to add a specific id to each header, though. The syntax is You might also want to look at adjust_header_level.rs and adjust_reference_names.rs. |
Ok thanks for all the information, this will probably help @asolove a lot! :) |
Not sure if this will help you guys, but I've created a simple Rust tool which will collate multiple markdown files into one, resolving internal links and turning them into anchor links. We can use this in a pipeline on the way to converting to PDF:
Code can be found here: Happy to accept any PRs |
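The link-resolving step discussed above can be sketched in a few lines of shell. This is only an illustration, not the code of any tool linked in this thread: the function name `rewrite_links` is made up, and it assumes that the combined file exposes an anchor named after the original fragment, or after the chapter's file name when the link had no fragment. How pandoc actually names its generated anchors may differ.

```bash
#!/bin/bash
# Hypothetical helper (not part of mdBook or pandoc): rewrite links that point
# at other chapter files into in-document anchor links, reading stdin and
# writing stdout.
#   [text](ch01/intro.md#setup) -> [text](#setup)
#   [text](ch02.md)             -> [text](#ch02)   (assumes such an anchor exists)
# Targets containing ':' (e.g. https://...) are left untouched.
rewrite_links() {
    sed -E \
        -e 's@\]\(([^)#:]*\.md)#([^)]*)\)@](#\2)@g' \
        -e 's@\]\(([^)#:]*/)?([^)/#:]*)\.md\)@](#\2)@g'
}
```

It would be applied to each chapter while concatenating, for example `rewrite_links < chapter.md >> combined.md`.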
@cetra3 That is really cool! I am not sure I would add a dependency just for that functionality, because there is always the possibility that it will not be maintained actively. But it could be considered if it offers enough useful methods that we wouldn't have to reinvent here. |
I'm also sceptical about Calibre. We use it in the Russian translation of TRPL and we've come across several problems with EPUB (links are to descriptions in Russian, for reference):
|
Thanks for sharing your experience :)
I am not sure how this is handled with Pandoc, but having a custom theme could be a good thing. |
It's probably possible to wrap up those command line tools into a combined tool or expose it as a rust library. The last component (html to pdf) would need to use FFI as The complication arises in that markdown is a superset of HTML which means that you need something that can present HTML in a printable fashion. In my experience with this problem, Pandoc and Calibre will do a subset, but you won't get full parity. |
There are a few things to be aware of, but in general pandoc is really amazing at converting Markdown to LaTeX. Which is what you want, I think—it has some very nice features that you currently can't get with HTML-to-PDF converters. For example, my PDF versions of the Rust Book include cross-references like "This is a mutable variable binding (section 5, page 163)". If you're no LaTeX wizard (I'm not), you might want to look at this template I threw together. If you have any issues with this, just mention me. |
Thanks for all your help Pascal! |
+1 for the effort, I am looking forward to using It seems to have stalled a bit, is anyone currently working on this? |
Indeed, it has stalled a bit. In the last 6 months I have been overwhelmed with work at school 😕 I am (very) slowly working on the refactoring / clean-up that I wanted to do. And that work is probably going to change the way this specific feature is going to be implemented. Hopefully I will have some time in September to make significant progress on the internal rewrite so that I can work on new features again. |
@azerupi How much space is there for discussing this feature? There are some specific things I would be looking for in a CLI ebook helper, but maybe you have already decided which way to go. Some time ago I wrote prophecy, a Ruby gem to automate the tasks I needed when producing ebooks. This is an example of the output. It has been very useful for me, but I believe I am the only user :) I have been wanting to rewrite it with some of the hindsight since its early days, but when I saw this I thought maybe There is an asciinema recording to see the sort of things it does. |
I'm open to all ideas :) |
One of the things that doesn't seem to be mentioned anywhere on this ticket is the ability to highlight the important bits. I have used a Chrome extension called Hypothesis to do this until recently, but a) Chrome extension, ew, b) it's pretty sloppy about whether the highlights are saved under your personal view or public view (i.e., in some cases you can see others' highlights), and c) I say recently because I'm pretty sure that when the book gets updated, all my highlights and attached notes get deleted, too. Anyway, just wanted to add my two cents and support for a PDF version to be released in parallel to the online book's update. I realize that must be a lot harder than many of us make it out to be, and you all are doing a wonderful job regardless of the format in which we are all consuming your work. Thanks! |
Seems that mdproof can be used for such a task |
Trying to open
|
Nice feature request. |
Yes, I think this can be an important bit. Is this work useful? void-linux/void-docs#416 |
I can't install I think the maintainers would rather encourage users to use external plugins like |
As an example of how I'd like an ebook to look:
Rust by example book: https://flibusta.is/b/619885/
|
I see an almost blank page in Russian (and the text seems to say "The page is not found"). Which part of it do you mean you want an ebook to look like? |
yes, sorry. The right URL is https://flibusta.is/b/619928
|
Since there is zero interest in supporting that in mdBook, I recommend Quarto, a relatively new framework for creating books that is more flexible than the commonly known Bookdown. It's pandoc-based, so it can export to basically anything. You can see their gallery for samples of how the different formats and exports look. It's quite actively developed as well. |
@dustinmatlock ... I am afraid that I disagree:
A PDF file is usually complete in that a link from the Table of Contents, when clicked, goes to page #202, say. With a PDF file generated from a browser, clicking a hyperlink in the resulting PDF provides this informative explanation ...
An export function would at least support hyperlinks for a Table of Contents, an Index and Footnotes. Sometimes it is useful for an exported file to link externally. In such cases I believe the PDF format specifies "internal" and "external" links. Yes, a saved PDF is a usable solution for an offline document, but it is not a solution that I can save to a tablet and use when I'm out of touch with the internet, or late at night, etc. Have a good one ...! |
Hi all! I just created an mdBook backend named mdbook-pdf for generating PDFs based on headless Chrome and the Chrome DevTools Protocol's Page.printToPDF. It depends on Google Chrome / Microsoft Edge / Chromium. The generated pages look much like the ones you get by manually printing to PDF in your browser by opening For the issue aplatypus just mentioned above with this method #88 (comment) , I guess that for those "internal" links inside the book, work should be done on the mdBook side for |
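For context on how such a backend is wired in: mdBook runs an executable named `mdbook-<name>` for every `[output.<name>]` table it finds in book.toml. The sketch below shows one way to add such a table idempotently. The helper name `enable_backend` is hypothetical and not part of mdBook or mdbook-pdf.

```bash
#!/bin/bash
# Hypothetical helper: make sure book.toml contains an [output.<name>] table,
# which is what tells mdBook to invoke the external `mdbook-<name>` backend.
# Usage: enable_backend <name> [path/to/book.toml]
enable_backend() {
    local name=$1 config=${2:-book.toml}
    # -q quiet, -x match the whole line, -F treat the pattern as a fixed string
    grep -qxF "[output.$name]" "$config" \
        || printf '\n[output.%s]\n' "$name" >> "$config"
}
```

After `enable_backend pdf`, a plain `mdbook build` would call the `mdbook-pdf` binary if it is on the PATH. Note that once any `[output.*]` table is present, the HTML renderer only runs if `[output.html]` is listed as well.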
Hi, I wrote a quick bash script to generate a PDF from mdBook markdown using pandoc and the Eisvogel Pandoc/LaTeX template. Maybe it is of help to someone:
#!/bin/bash
# This script converts mdBook markdown output into a pdf using Pandoc/LaTeX and
# the eisvogel pandoc template (https://github.com/Wandmalfarbe/pandoc-latex-template).
# By default, it assumes that the script is placed in a direct subfolder of your
# mdBook project, next to the eisvogel.latex file, that your mdBook project root
# contains the book.toml and your markdown sources at ./src, and that the preprocessed
# markdown will be created in ./book/markdown. Your book.toml file needs to contain the line
# [output.markdown]
# (mdBook writes to ./book/markdown only when more than one [output.*] table is
# configured, e.g. [output.html] as well; with a single backend it writes to ./book.)
# The path of the resulting pdf file will be ./book/pdf/output.pdf
# Directory that this script is in (e. g. subfolder of PROJECT_DIR)
SCRIPT_DIR=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )
# Project directory (contains book.toml, src folder and book output folder)
# Change this if your script is not inside a subfolder (e.g. 'scripts') of the project directory.
PROJECT_DIR="$( dirname -- "$SCRIPT_DIR" )"
# Pandoc LaTeX template
# This script works with the eisvogel-Template
# (https://github.com/Wandmalfarbe/pandoc-latex-template).
# If you want to use this template, please put the file
# eisvogel.latex in the same directory as this script
# (e.g. $PROJECT_DIR/scripts/eisvogel.latex).
TPL="$SCRIPT_DIR/eisvogel.latex"
# Build markdown
# Ensure that your book.toml contains the line
# [output.markdown]
mdbook build "$PROJECT_DIR"
# Make output and temp folders
mkdir -p "$PROJECT_DIR/book/pdf"
mkdir -p "$PROJECT_DIR/book/markdown-temp/images"
# Copy all images to a single directory
find "$PROJECT_DIR/src" -name \*.png -exec cp {} "$PROJECT_DIR/book/markdown-temp/images" \;
# Define output file path
OUTPUT_FILE="$PROJECT_DIR/book/markdown-temp/output.md"
# Read meta information from book.toml
CONFIG_FILE_CONTENTS=$( < "$PROJECT_DIR/book.toml" )
# [^\"]* stops at the closing quote; a greedy .* would run on to the last quote in the file
[[ $CONFIG_FILE_CONTENTS =~ title\ +=\ +\"([^\"]*)\" ]] \
&& DOCUMENT_TITLE=${BASH_REMATCH[1]}
[[ $CONFIG_FILE_CONTENTS =~ language\ +=\ +\"([a-z]*)\" ]] \
&& DOCUMENT_LANGUAGE=${BASH_REMATCH[1]}
[[ $CONFIG_FILE_CONTENTS =~ authors\ +=\ +(\[[^\]]+])\ * ]] \
&& DOCUMENT_AUTHORS=${BASH_REMATCH[1]}
# Write the document title and configuration to output file
cat > "$OUTPUT_FILE" << EOF
---
title: ${DOCUMENT_TITLE}
author: ${DOCUMENT_AUTHORS}
date: "11.06.2022"
titlepage: true
fontsize: 10pt
logo: ""
logo-width: 110mm
toc: true
toc-own-page: true
keywords: [Markdown, Example]
...
EOF
# echo -e "# $DOCUMENT_TITLE\n" >> $OUTPUT_FILE
# Read SUMMARY.md, combine output titles and individual .md file contents
# into single output markdown file
while IFS= read -r line
do
[[ $line =~ Summary ]] && continue
# Write SUMMARY.md section titles to markdown file
# [[ $line =~ ^\# ]] && echo -e "$line\n" >> $OUTPUT_FILE
# Combine different markdown files, increasing the section level
# for each headline
# [[ $line =~ \((.*\.md)\) ]] \
# && sed -e 's/^#/##/g' \
# "$PROJECT_DIR/book/markdown/${BASH_REMATCH[1]}" \
# >> $OUTPUT_FILE
# Combine markdown files, leaving the section headings as they are
[[ $line =~ \((.*\.md)\) ]] \
&& cat "$PROJECT_DIR/book/markdown/${BASH_REMATCH[1]}" \
>> "$OUTPUT_FILE"
echo -e "\n" >> "$OUTPUT_FILE"
done < "$PROJECT_DIR/src/SUMMARY.md"
# Do pandoc conversion of markdown
cd "$PROJECT_DIR/book/markdown-temp"
pandoc -w latex --template "$TPL" -o ../pdf/output.pdf output.md --number-sections -V lang="$DOCUMENT_LANGUAGE"
|
The main issue when creating a PDF from mdBook sources is that the Markdown sources form a tree, potentially (and likely) interlinked at random, just like HTML, which is why the conversion to HTML is trivial. A PDF, in contrast, is a single, linear document. |
@hoijui Very nice project, I'll definitely check it out. |
@hoijui ... You could look to the open source Okular tool to see how they load a MD document and render it as PDF. |
@aplatypus As I wrote before, the issue is not how to render a single MD file as PDF, that is trivial and possible with many tools and libraries. The issue is, how to convert a tree of Markdown files/documents into a single Markdown file. |
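One small piece of flattening a tree of chapters into a single file is demoting the headings of nested chapters so that the combined document keeps a sensible outline. A minimal sketch follows, assuming ATX-style (`#`) headings and unindented code fences. The function name `shift_headings` is made up for illustration and does not come from any tool mentioned in this thread.

```bash
#!/bin/bash
# Hypothetical helper: add N extra '#' characters to every ATX heading read
# from stdin, skipping lines inside fenced code blocks so that shell comments
# in examples are not mistaken for headings.
# Usage: shift_headings <N> < chapter.md
shift_headings() {
    local extra=$1
    awk -v n="$extra" '
        BEGIN { for (i = 0; i < n; i++) prefix = prefix "#" }
        /^(```|~~~)/ { in_fence = !in_fence }   # toggle on opening/closing fences
        !in_fence && /^#+ / { $0 = prefix $0 }  # demote headings outside fences
        { print }
    '
}
```

A chapter nested one level deep in SUMMARY.md would then be appended with something like `shift_headings 1 < nested/chapter.md >> combined.md`.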
mdbook-pdf now supports a table of contents, see: HollowMan6/mdbook-pdf#1 (comment)
|
If you are looking for pdf output, check out the project I just posted in #815 (comment) |
I built |
Gitbook supports export to ebooks and pdfs via calibre. This might be easy to hook into.
See also rust-lang/rust-by-example#684 for problems this implementation creates for rustbyexample.