Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[for review] multi-language mdbook #200

Closed
wants to merge 51 commits into from

Conversation

gambhiro
Copy link
Contributor

@gambhiro gambhiro commented Jan 13, 2017

See a demo: Alice's Adventures in Wonderland, in three languages with mdbook. Chapter excerpts for testing. Markdown sources here.

Use the gambhiro/mdbook/multilang branch for compiling or reading the code.

This PR is for reviewing and discussing a large refactoring which involves:

This PR incorporates PR #147 (new book struct), as I started by merging that and working from there.

This also prepares the ground for #88 (ebooks), which after this can be implemented as another renderer.

Does everything still work?

Let's compose a list of user-level features that I can test and debug:

  • the Rust Book should build
  • CLI commands
  • single-language book
  • multi-language book
  • any more?

Rust book

Move src/img to assets/img for the images to be copied. Paths in the Markdown source don't have to change.

CLI commands

init

  • SUMMARY.md is parsed and missing chapter files are created
  • .gitignore is created on confirmation

build

  • the html builds
  • custom template is used if found in assets/_html-template
  • book assets are copied if found in assets

watch

  • making a change in a chapter rebuilds the book

serve

  • book is served on :3000
  • making a change in a chapter updates the page in the browser

test

  • chapter files are tested with rustdoc

Single language book

See src/tests/book-minimal

Works as expected as far as I can tell.

Multi-language book

See src/tests/book-wonderland-multilang

Works as expected as far as I can tell.

Features

Renderer

Renderer is a trait expecting two functions, .build() and .render():

pub trait Renderer {

    /// When the output format is determined (by a CLI argument for example),
    /// call `.build()` of the selected Renderer implementation.
    ///
    /// Constructs an `MDBook` struct given the path of the book project,
    /// preparing the project and calling `render()`, doing what is necessary
    /// for the particular output format.
    ///
    /// This involves parsing config options from `book.toml` and parsing the
    /// `SUMMARY.md` of each translation to a nested `Vec<TocItem>`.
    ///
    /// Finally it calls `render()` to process the chapters and static assets.
    fn build(&self, project_root: &PathBuf) -> Result<(), Box<Error>>;

    /// Responsible for rendering the chapters and copying static assets.
    fn render(&self, book_project: &MDBook) -> Result<(), Box<Error>>;

}

The general idea is that a data structure (MDBook) is constructed by parsing
the book's files, and this is given to the renderer's .render() function which
does whatever it needs to do with it to produce its output format.

.build() is responsible for calling the necessary functions to construct
MDBook, and .render() is responsible for writing the output based on that
MDBook.

So MDBook is not responsible to rendering or writing anything, but it should
represent all the information that the renderer might need to write all its
output files.

MDBook has a .render_intent attribute that is a descriptive enum which
internal functions can inspect if they need to make decisions based on what
output format has been selected.

pub struct MDBook {
    ...
    /// Informs other functions which renderer has been selected, either by
    /// default or CLI argument.
    render_intent: RenderIntent,
    ...
}

pub enum RenderIntent {
    HtmlHandlebars,
}

Static assets

The application's static assets are embedded in the binary from data/ using includedir.

The book's static assets are expected in assets/. Everything will be copied to the book's output folder, except for folders which start with underscore (i.e. user might have _sass or _html-template and so on).

chapter TOML headers

TOML headers can be added at the beginning of a chapter file, these will be parsed into the attributes of the Chapter struct. See babel.md.

+++
title = "The Library of Babel"
author = "Jorge Luis Borges"
translator = "James E. Irby"
+++

# Babel

The universe (which others call the Library) is composed of an indefinite and
perhaps infinite number of hexagonal galleries, with vast air shafts between,
surrounded by very low railings. From any of the hexagons one can see,
interminably, the upper and lower floors.

Multi-language books

See src/tests/book-wonderland-multilang

translation cross-linking

Automatic chapter-to-chapter linking is implemented with incrementally trying harder to find a translation, but never refusing to build the book. I.e. something should always happen, whatever the author does, and more and more should happen as they keep working on their content.

  • links to the top-level index pages of the translations are displayed above the TOC in the sidebar
  • chapter translations are displayed in the title bar when the application can find a translation, otherwise it displays a grayed-out text of the language code

Buidling on the ideas described in #5, finding a translation works step by step this way:

  • taking the manual links are given in the TOML header (see alice/rabbit-hole)
  • finding a match by a specific chapter.translation_id string given in the TOML header (see alice/long-tale)
  • finding a match by chapter.src_path (see alice/tears)
  • finding a match by section number, if the TOC is structurally the same, checking by counting the number of sections

This covers the following scenarios:

If the translator copy-pastes the SUMMARY.md of the original, changing only the titles, translations will be identified by the .src_path.

If they rename the file names too, so that the URLs also are in the target language, but the TOC keeps the same sectioning structure, translations will be identified by section numbers.

If they change the sectioning structure too, they can insert a chapter.translation_id string in the original and the translation. Any string, not necessarily an UUID. This would maintain cross-links when the original changes file name.

If nothing else, they can provide the translation link directly in the TOML header. This breaks when the target file changes file name.

It has to be kept in mind that translations are projects on their own, and can even present the same material in a different structure than the original. The original and its translation are likely to be at different stages at various times. In addition, people have different workflows, they don't necessarily work in a one-two-three disciplined way either.

book.toml

The main language is recognized as the first given in the TOML. Otherwise it has to be marked with is_main_book = true.

The language code will always be the translation key, the language name can be set optionally.

[[translations.en]]
title = "Alice's Adventures in Wonderland"
author = "Lewis Carroll"

[[translations.fr]]
title = "Alice au pays des merveilles"
author = "Lewis Carroll"
language_name = "Français"

[[translations.hu]]
title = "Alice Csodaországban"
author = "Lewis Carroll"

folder layout

book-wonderland-multilang
├── book.toml
├── assets
│  └── images
│     ├── Queen.jpg
│     ├── Rabbit.png
│     ├── Tail.png
│     └── Tears.png
└── src
   ├── en
   │  ├── SUMMARY.md
   │  ├── long-tale.md
   │  ├── rabbit-hole.md
   │  ├── tears.md
   │  └── titlepage.md
   ├── fr
   │  ├── SUMMARY.md
   │  ├── cocasse.md
   │  ├── larmes.md
   │  ├── terrier.md
   │  └── titre.md
   └── hu
      ├── SUMMARY.md
      ├── cimoldal.md
      ├── konnyto.md
      ├── nyuszi.md
      └── tarka-farka.md

Documentation

  • update documentation in the code
  • update documentation in book-example

Structs

structs

Catching up

The last common commit with master was 8a178e3, I will have to catch up with the updates since then.

  • catch up with updates
  • resolve conflicts with master

azerupi and others added 27 commits June 28, 2016 16:38
parses toc and chapters

renders html

hbs helpers and asset embedding

copy static assets by pattern

review

fix prev nav link

copy local assets when found

multilang renders

is_multilang as property

theme is template

cli init and build

bump version

structs diagram
rename with _
translation links
css for translation links
@steveklabnik
Copy link
Member

Oh neat! I will dig into the technical details of this later.

One comment about the language switcher: it always goes back to the index, rather than switching the page that you're on. Is this intended?

@gambhiro
Copy link
Contributor Author

@steveklabnik For now the translation just links to the index page of each translation,
except in this case where the linking is defined in a TOML header of the
markdown chapter.

@gambhiro
Copy link
Contributor Author

Onward to glory. At this point it's over to you. No rush, take your time. I think this is now a decent update. I'm sure there will be bugs but I think I used the binary from this branch enough to catch the obvious ones.

Conflicts with master resolved, docs updated, tests are passing with the advanced method of ignoring the failing ones. I'd rather describe them in a separate issue then struggling with them here.

On linux i686 something is wrong with ci/script.sh, and on Windows it's related to the path separator. It might even be just the way the test is written.

@azerupi
Copy link
Contributor

azerupi commented Feb 7, 2017

I am going to go through a couple of topics, but since this is a huuuge change, I am not going to go through every detail.

Renderer

pub trait Renderer {
    fn build(&self, project_root: &PathBuf) -> Result<(), Box<Error>>;

    fn render(&self, book_project: &MDBook) -> Result<(), Box<Error>>;

}

The renderer should not launch the build and construct MDBook, it should rather happen the other way around. MDBook should be the "command center" where you launch the build, it will then call the renderer(s) to render the book to a specific format.

The user may specify multiple rendering targets in the configuration file:

# "outputs" is a table where each entry is the identifier of a renderer
# containing the configuration options for that renderer
[output]
html = { path = "book/" }
pdf = { path = "pdf/mdBook.pdf" }
# OR alternatively
# [output.html]
# path = "book/"
#
# [output.pdf]
# path = "pdf/mdBook.pdf"

In that case, mdBook should render to all specified targets when running build. If the renderer initiates the build, then every renderer will individually create an MDBook struct, which results in redundant work.

Also, mdBook should allow to be used programatically. This means that the MDBook struct can/could be created by an alternative means than parsing the summary file. When rendering, you don't want that struct to be destroyed.

So, the way I see it, the renderer should take an (immutable?) reference to the MDBook struct and perform it's task with that.

I had some vague idea for a hypothetical feature that I may want to implement in the future:
A web interface where the user can modify his book directly from the browser and get "instant feedback" on how it would be rendered. But for this to work, the renderer should only be a small link in a chain and not the overarching, all-powerful construct.

Multi-language

From your examples, it seems that there is no "primary language", but in my opinion it is better to make a distinction between the primary language and the translations.

The primary language will be used as the landing page and translated chapters can then be mapped to the ones of the primary language.

[languages]
en = { name = "English", default = true }
fr = { name = "Français" }
# OR alternatively
# [languages.en]
# name = "English"
# default = true
#
# [languages.fr]
# name = "Français"

Others

I really like the chapter TOML headers, a lot more than YAML headers. The only thing I would add is a way to escape it (if it's not already supported).

Way Forward

This is a huge change, it touches almost the whole codebase and unfortunately I can't review this as a whole. I have been trying to go through it for the last couple of days but there is just too much. 😕

I don't want to discourage you, because I really want to merge this, but the only way I see this moving forward is if it is split into multiple smaller PRs and merged incrementally.

There are a couple of design decisions that I would like to see changed, the two points above are the most important ones. There are also a couple of small changes I would avoid merging and others would require some modifications or improvements.

So considering all this, I think it is better to incrementally add the changes by making smaller PRs focused on one aspect that can be more easily reviewed and discussed. It will be more gratifying for both of us to see small parts of this PR get merged one at the time instead of the whoel thing being blocked by a couple of issues.

I now have time to work on it too, so let's coordinate our efforts!

@gambhiro
Copy link
Contributor Author

gambhiro commented Feb 8, 2017

Re: Renderer and MDBook.

When the user calls mdbook build, a few general things have to happen in any
case, not necessarily in this order:

  • a) figure out the render format(s)
  • b) call the appropriate renderer(s)
  • c) construct general representation (chapters, etc.) of the book from the input sources
  • d) construct any further data specific to the renderer (ebook metadata, etc.)
  • e) write out the target format's files.

So you always have this separation between a representation you build from the
input files, and the process of creating files in the specific manner for the
target format.

I did notice that the renderer function was originally attached to the MDBook
struct, but this got confusing and difficult to work with. The borrow checker
can't inspect the renderer. You can't derive debug. You can't construct a simple
MDBook for testing.

So I gave up on that and treated the MDBook as purely the representation of the
book, which then you can pass around mutably for internal construction (such as
TOC building) or immutably for producing files from (rendering).

This made it much easier to think about how to implement features. The MDBook is
data, give it to a function that does some work.

The Renderer trait is separated to build() and render() because it makes things
clear to encapsulate what sequence of function call prepare the MDBook as data
before we are ready to write output files from it. This is easier to write test
for as well.

Re: multiple rendering targets.

There is no problem there, since the MDBook is just data, you don't have to
start figuring out how multiple kinds of format-specific behaviours should
happen.

The config example you gave above can work that way too. Keep in mind you don't
have to do everything with MDBook's functions. The behaviour logic gets very
complicated to think that way.

The cli command function is the closest to the user's interaction, determine any
and all rendering targets there and call build() and render(). The cli
command is in the best position to figure out the user's intention from cli
arguments and parsing specific sections of the book.toml.

every renderer will individually create an MDBook struct, which results in redundant work.

This is a benefit, as target formats have different construction procedures.

For every target format, it is very specific what data needs to be prepared
before you are ready to write files. There may be format specific structs which
the rendere wants to construct as well, such as metadata for an ebook or
publisher data for PDF.

The renderer's build() encapsulates that. It is good that every renderer
should build its MDBook and other supporting structs.

Also, mdBook should allow to be used programatically.

This is fine. The tests are written like that.

A web interface where the user can modify his book directly from the browser
and get "instant feedback" on how it would be rendered. But for this to work,
the renderer should only be a small link in a chain and not the overarching,
all-powerful construct.

Imagine how the pieces connect. MDBook is just data. The web server has access
to an MDBook which it's API calls can manipulate. You serialize the MDBook to
JSON and the frontend requests it with a GET call. Say if it's Clojurescript,
construct an atom from it. User types his book in the frontend, Clojurescript
updates the atom, the changes are rendered realtime in the window.

You don't have to talk to the server until you are saving files, and then you
serialize the atom to JSON, put it in a POST request to the server and it will
update the necessarily files on the disk.

I can recommend these to whet your appetite:

Reagent: Minimalistic React for ClojureScript

reagent-template

Lambda Days 2015 - Norbert Wójtowicz - ClojureScript + React.js

ClojureScript Release - Rich Hickey

Interactive programming Flappy Bird in ClojureScript

Clojurescript had an enormous creative effect on me, really enjoyed building UI with it, finally I could stop thinking in Javascript.

Other ppl like Vue.js, I suppose it's a matter of temperament or something.

The only thing you should stay away from is Angular, don't believe the marketing.
It was good in it's time but the world moved on.

Re: multi-language

The first language in the book.toml is taken as the main language, unless the
user specifically sets is_main_book = true on a different one. At the moment
it affect which index page is the root landing page for a multi-language book.

So the information is parsed, other features can do logic with it.

Re: escaping TOML

What you mean is a TOML block between +++ lines in the middle of the page?

If the block has anything else than whitespace before it (i.e. not the first
thing in a chapter file), it is not taken as chapter attributes and not parsed.

Re: small PRs

I don't see how that would work or why is it necessary. You can't move this in a
piece-by-piece fashion, it wouldn't even compile.

It is backwards compatible as far as CLI users are concerned. The can update and
their book should build just the same.

Test whether the user-visible features work as intended, and if that works, then
I don't see the blocking factor.

@gambhiro
Copy link
Contributor Author

Hm, what should we do about this?

It's accumulating conflicts, and I can resolve that, but beyond that?

@steveklabnik
Copy link
Member

I would like to try and check it out this coming week.

I share @azerupi 's feeling in general, smaller, incremental PRs are better. I haven't actually looked at the diff here yet, so I can't say specifically, but this is just always true for any open source.

@gambhiro
Copy link
Contributor Author

Yes, I realise that small PRs are easier to review, but this changes the main structs in ways that doesn't really offer a step-by-step progression. There is one big refactoring at the beginning and then the additions are fairly incremental.

It is perhaps easier to review by scrolling through the files in the source, rather than using the PR diff.

@gambhiro
Copy link
Contributor Author

@azerupi @steveklabnik I added you both as collaborators to the PR's repo, if you see something you wish to change directly, by all means go ahead and commit it, I'm mostly interested in that the features work, and maybe you see better ways of doing it. Remember the PR is in the multilang branch.

@gambhiro
Copy link
Contributor Author

How do you feel about this?

Maybe this multilang business is something cool, but not necessarily a goal for mdbook as a tool.

I know that in the next few months I won't have time to contribute much to this PR.

So how about cancelling this? It is a loose end now with no apparent direction.

This little PR allowed me to learn Rust, it was much easier to get into the language with your encouragement. Thank you for the time you spent on feedback and discussion, it kept me motivated.

So I don't mind if you'd rather cancel this and approach the problem at another time. It has been an excellent experience for me, and the valuable results are often not the tangible ones.

@steveklabnik
Copy link
Member

How do you feel about this?

I mostly feel still terrible that I have not had the time to truly dig into a PR this big.

Maybe this multilang business is something cool, but not necessarily a goal for mdbook as a tool.

multilang is something we need, generally. That is, at least for The Rust Programming Language, we want this feature. It's just hard to review something so big, and it'll probably be a few more weeks before I can really spend the time to do things other than fix small issues in mdbook.

So I don't mind if you'd rather cancel this and approach the problem at another time. It has been an excellent experience for me, and the valuable results are often not the tangible ones.

Let's close it for now, then, as that sounds like the best thing to do. When I start to really dig into this problem, I may base it off of this work, we'll see.

Sorry that this has dragged on. I'm glad you got some stuff out of it. Open source is tough. ❤️

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants