Add section on architecture
Closes GH-6.

Reviewed-by: Christian Murphy <[email protected]>
wooorm authored Feb 13, 2021
1 parent e4c2340 commit 1593eb5
Showing 1 changed file with 79 additions and 2 deletions.
81 changes: 79 additions & 2 deletions readme.md
@@ -69,6 +69,7 @@ time.
* [Footnotes](#footnotes)
* [Frontmatter](#frontmatter)
* [Differences from `@mdx-js/mdx`](#differences-from-mdx-jsmdx)
* [Architecture](#architecture)
* [Security](#security)
* [Related](#related)
* [License](#license)
@@ -1737,6 +1738,78 @@ export default MDXContent
* ± same as `main` branch of `@mdx-js/mdx`
* Fix JSX tags to prevent `<p><h1 /></p>`

## Architecture

To understand what this project does, it’s very important to first understand
what unified does: please read through the
[`unifiedjs/unified`](https://github.com/unifiedjs/unified) readme (everything
before the API section is required reading).

**xdm** is a unified pipeline — wrapped so that most folks don’t need to know
about unified: [`core.js#L76-L101`](https://github.com/wooorm/xdm/blob/e4c2340b41d3354617aa42350306fd35cb57967d/lib/core.js#L76-L101).
The processor goes through these steps:

1. Parse MDX (serialized markdown with embedded JSX, ESM, and expressions)
to mdast (markdown syntax tree)
2. Transform through remark (markdown ecosystem)
3. Transform mdast to hast (HTML syntax tree)
4. Transform through rehype (HTML ecosystem)
5. Transform hast to esast (JS syntax tree)
6. Do the work needed to get a component
7. Serialize esast as JavaScript
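As a rough mental model only, the seven steps can be pictured as a chain of functions where each one receives the previous step’s result. Every stage here is a hypothetical stand-in that just records which tree it now holds and what it did; the real parsing, transforming, and serializing is done by the packages described in the rest of this section, and this is not xdm’s code in `core.js`.

```javascript
// Hypothetical toy model of the seven steps: each stage is a stand-in that
// records which tree it produces and logs a description of its work.
function stage(description, tree) {
  return (state) => ({ tree, log: [...state.log, description] })
}

const pipeline = [
  stage('parse MDX to mdast', 'mdast'),
  stage('run remark plugins', 'mdast'),
  stage('transform mdast to hast', 'hast'),
  stage('run rehype plugins', 'hast'),
  stage('transform hast to esast', 'esast'),
  stage('build the component', 'esast'),
  stage('serialize esast as JavaScript', 'js')
]

// Run every stage in order, feeding each result into the next stage.
function compile(source) {
  return pipeline.reduce((state, run) => run(state), { tree: 'mdx', source, log: [] })
}

const result = compile('# Hello, {props.name}!')
console.log(result.tree) // prints: js
console.log(result.log.length) // prints: 7
```

The point of the model is that every step only depends on the output of the step before it, which is why plugins can be slotted in at the remark and rehype stages.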

The *input* is MDX (serialized markdown with embedded JSX, ESM, and
expressions).
The markdown is parsed with [`micromark`][micromark] and the embedded JS with
one of its extensions
[`micromark-extension-mdxjs`](https://github.com/micromark/micromark-extension-mdxjs)
(which in turn uses [acorn][]).
Then [`mdast-util-from-markdown`](https://github.com/syntax-tree/mdast-util-from-markdown)
and its extension
[`mdast-util-mdx`](https://github.com/syntax-tree/mdast-util-mdx) are used to
turn the results from the parser into a syntax tree:
[mdast](https://github.com/syntax-tree/mdast).
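To give a feel for that tree, here is a hand-written approximation of the mdast for a short MDX document, using node types that `mdast-util-mdx` adds (`mdxjsEsm`, `mdxTextExpression`, `mdxJsxFlowElement`). Positions and the embedded estree data are left out, so treat the exact fields as illustrative. The small recursive walk that follows (a hypothetical helper, `countTypes`) counts node types the way most unist utilities traverse `children`.

```javascript
// Approximate mdast for this MDX input (positions and estree data omitted):
//
//   import {Chart} from './chart.js'
//
//   # Hello, {props.name}!
//
//   <Chart year={2021} />
const tree = {
  type: 'root',
  children: [
    { type: 'mdxjsEsm', value: "import {Chart} from './chart.js'" },
    {
      type: 'heading',
      depth: 1,
      children: [
        { type: 'text', value: 'Hello, ' },
        { type: 'mdxTextExpression', value: 'props.name' },
        { type: 'text', value: '!' }
      ]
    },
    {
      type: 'mdxJsxFlowElement',
      name: 'Chart',
      attributes: [
        {
          type: 'mdxJsxAttribute',
          name: 'year',
          value: { type: 'mdxJsxAttributeValueExpression', value: '2021' }
        }
      ],
      children: []
    }
  ]
}

// Hypothetical helper: count nodes per type. It walks `children` only,
// so JSX attributes (stored in `attributes`) are not counted.
function countTypes(node, counts = {}) {
  counts[node.type] = (counts[node.type] || 0) + 1
  for (const child of node.children || []) countTypes(child, counts)
  return counts
}

const counts = countTypes(tree)
console.log(counts.text) // prints: 2
```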

Markdown is closest to the source format.
This is where [remark plugins][remark-plugins] come in.
Typically, there shouldn’t be much going on here.
But perhaps you want to support GFM (tables and such) or frontmatter?
Then you can add a plugin here: `remark-gfm` or `remark-frontmatter`,
respectively.
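A remark plugin is, roughly, a function that takes options and returns a transformer that changes the mdast tree. The sketch below follows that attacher-returns-transformer shape but calls the transformer by hand on a tiny tree instead of going through unified. `remarkDemoteHeadings` is a hypothetical plugin written for illustration, not a published package.

```javascript
// Hypothetical remark-style plugin: demotes every heading by `offset` levels,
// clamped at 6 (the deepest markdown heading). The attacher receives options
// and returns the transformer that operates on the mdast tree.
function remarkDemoteHeadings(options = {}) {
  const offset = options.offset ?? 1
  return function transformer(tree) {
    visit(tree)
    function visit(node) {
      if (node.type === 'heading') {
        node.depth = Math.min(6, node.depth + offset)
      }
      for (const child of node.children || []) visit(child)
    }
  }
}

// A tiny mdast tree for `# Title` followed by `## Section`.
const tree = {
  type: 'root',
  children: [
    { type: 'heading', depth: 1, children: [{ type: 'text', value: 'Title' }] },
    { type: 'heading', depth: 2, children: [{ type: 'text', value: 'Section' }] }
  ]
}

// Called by hand here; unified would normally invoke the transformer.
const transformer = remarkDemoteHeadings({ offset: 1 })
transformer(tree)
console.log(tree.children.map((node) => node.depth)) // prints: [ 2, 3 ]
```

Because the transformer only sees mdast, it works on any markdown or MDX document, before anything has become HTML or JavaScript.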

After markdown, we go to [hast](https://github.com/syntax-tree/hast) (HTML).
This transformation is done by
[`mdast-util-to-hast`](https://github.com/syntax-tree/mdast-util-to-hast).
Wait, why, what does HTML have to do with it?
Part of the reason is that we care about HTML semantics: we want to know that
something is an `<a>`, not whether it’s a link with a resource (`[text](url)`)
or a reference to a link definition (`[text][id]\n\n[id]: url`).
So an HTML AST is *closer* to where we want to go.
Another reason is that folks need many of the same things when they go MDX -> JS,
markdown -> HTML, or even only HTML -> HTML, and the latter two are use cases
other than xdm.
By having a single AST for all these cases, a plugin written for that AST
supports *all* of them (for example,
[`rehype-highlight`](https://github.com/rehypejs/rehype-highlight)
for syntax highlighting or
[`rehype-katex`](https://github.com/remarkjs/remark-math/tree/main/packages/rehype-katex)
for math).
So this is where [rehype plugins][rehype-plugins] come in, and probably where
most of your plugins will go.
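To make the first reason concrete: in mdast, `[text](url)` and `[text][id]` are different node types (`link` versus `linkReference` plus a `definition`), while in hast both are just an `a` element with an `href`. The toy converter below handles only those node types and is a hypothetical stand-in for `mdast-util-to-hast`. The rehype-style transformer after it, `markExternal`, is also hypothetical; it works on every `a` without caring which markdown syntax produced it.

```javascript
// Toy mdast -> hast conversion for links and text only (hypothetical; the
// real conversion is done by mdast-util-to-hast and covers every node type).
function toHast(node, definitions) {
  if (node.type === 'text') return { type: 'text', value: node.value }
  if (node.type === 'link' || node.type === 'linkReference') {
    // A resource link carries its URL; a reference looks it up by identifier.
    const url = node.type === 'link' ? node.url : definitions.get(node.identifier).url
    return {
      type: 'element',
      tagName: 'a',
      properties: { href: url },
      children: node.children.map((child) => toHast(child, definitions))
    }
  }
  throw new Error('Unsupported node type in this sketch: ' + node.type)
}

// `[docs](https://example.com)` versus `[docs][id]` with `[id]: https://example.com`
const definitions = new Map([
  ['id', { type: 'definition', identifier: 'id', url: 'https://example.com' }]
])
const resource = { type: 'link', url: 'https://example.com', children: [{ type: 'text', value: 'docs' }] }
const reference = { type: 'linkReference', identifier: 'id', referenceType: 'full', children: [{ type: 'text', value: 'docs' }] }

const a = toHast(resource, definitions)
const b = toHast(reference, definitions)

// Hypothetical rehype-style transformer: marks absolute http(s) links. It
// only looks at `a` elements, so the markdown syntax that made them is irrelevant.
function markExternal(node) {
  if (node.type === 'element' && node.tagName === 'a' && /^https?:/.test(node.properties.href)) {
    node.properties.rel = ['nofollow']
  }
  for (const child of node.children || []) markExternal(child)
}

markExternal(a)
markExternal(b)
console.log(JSON.stringify(a) === JSON.stringify(b)) // prints: true
```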

Then we go to JavaScript: esast (JS; an AST which is compatible with estree but
looks a bit more like other unist ASTs).
This transformation is done by
[`hast-util-to-estree`](https://github.com/syntax-tree/hast-util-to-estree).
This is a new ecosystem that does not have utilities or plugins yet.
But it’s where **xdm** does its thing: where it adds imports/exports, where it
compiles JSX away into `_jsx()` calls, and where it does the other cool things
that it provides.
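The `_jsx()` calls come from the automatic JSX runtime: an element such as `<h1 id="hello">Hello</h1>` becomes a call along the lines of `_jsx("h1", {id: "hello", children: "Hello"})`. The sketch below builds that call as an estree `CallExpression` from a simplified hast-like element. `elementToJsxCall` and `property` are hypothetical helpers that handle at most one text child; they are not how `hast-util-to-estree` or xdm structure their code.

```javascript
// Hypothetical helper: turn a simplified element with at most one text child
// into the estree for `_jsx("h1", {id: "hello", children: "Hello"})`.
function elementToJsxCall(element) {
  const props = Object.entries(element.properties).map(([key, value]) => property(key, value))
  if (element.children.length === 1) {
    props.push(property('children', element.children[0].value))
  } else if (element.children.length > 1) {
    throw new Error('This sketch only handles a single text child')
  }
  return {
    type: 'CallExpression',
    callee: { type: 'Identifier', name: '_jsx' },
    arguments: [
      { type: 'Literal', value: element.tagName },
      { type: 'ObjectExpression', properties: props }
    ],
    optional: false
  }
}

// An estree `Property` node for `key: "value"` inside an object literal.
function property(key, value) {
  return {
    type: 'Property',
    key: { type: 'Identifier', name: key },
    value: { type: 'Literal', value },
    kind: 'init',
    method: false,
    shorthand: false,
    computed: false
  }
}

const call = elementToJsxCall({
  type: 'element',
  tagName: 'h1',
  properties: { id: 'hello' },
  children: [{ type: 'text', value: 'Hello' }]
})
console.log(call.arguments[1].properties.map((p) => p.key.name)) // prints: [ 'id', 'children' ]
```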

Finally, the output is serialized JavaScript.
That final step is done by [astring](https://github.com/davidbonnet/astring), a
small and fast JS generator.
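A code generator like astring walks an estree and prints each node type (astring’s own entry point is a `generate` function that takes the tree and returns a string). The toy `generate` below is a hypothetical sketch that only knows the handful of node types in a `_jsx()` call, whereas astring covers all of estree.

```javascript
// Toy code generator for a tiny subset of estree (hypothetical; astring
// handles every node type). Each case prints one kind of node.
function generate(node) {
  switch (node.type) {
    case 'Identifier':
      return node.name
    case 'Literal':
      // JSON.stringify yields a valid JS literal for strings, numbers, booleans, and null.
      return JSON.stringify(node.value)
    case 'ObjectExpression':
      return '{' + node.properties.map(generate).join(', ') + '}'
    case 'Property':
      // Assumes a plain, non-computed `key: value` property.
      return generate(node.key) + ': ' + generate(node.value)
    case 'CallExpression':
      return generate(node.callee) + '(' + node.arguments.map(generate).join(', ') + ')'
    default:
      throw new Error('Unsupported node type in this sketch: ' + node.type)
  }
}

// estree for `_jsx("h1", {children: "Hello"})`
const tree = {
  type: 'CallExpression',
  callee: { type: 'Identifier', name: '_jsx' },
  arguments: [
    { type: 'Literal', value: 'h1' },
    {
      type: 'ObjectExpression',
      properties: [
        {
          type: 'Property',
          key: { type: 'Identifier', name: 'children' },
          value: { type: 'Literal', value: 'Hello' },
          kind: 'init',
          method: false,
          shorthand: false,
          computed: false
        }
      ]
    }
  ],
  optional: false
}

console.log(generate(tree)) // prints: _jsx("h1", {children: "Hello"})
```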

## Security

MDX is unsafe: it’s a programming language.
@@ -1755,9 +1828,9 @@ transforms, before finally serializing JavaScript.

Most of the work is done by:

-* [`micromark`](https://github.com/micromark/micromark)
+* [`micromark`][micromark]
— Handles parsing of markdown (CommonMark)
-* [`acorn`](https://github.com/acornjs/acorn)
+* [`acorn`][acorn]
— Handles parsing of JS (ECMAScript)
* [`unifiedjs.com`](https://unifiedjs.com)
— Ties it all together
@@ -1833,3 +1906,7 @@ Most of the work is done by:
[rollup]: #rollup

[caveats]: #caveats

[micromark]: https://github.com/micromark/micromark

[acorn]: https://github.com/acornjs/acorn
