Markdown syntax for roles and directives #63

Closed
choldgraf opened this issue Feb 8, 2020 · 23 comments
Labels
discussion no fixed close condition syntax descisions on syntax formats


@choldgraf
Member

Last week we had a few nice conversations around "how to extend Markdown to support roles and directives from Sphinx".

This is a quick issue to try and keep track of our thinking there.

What kinds of syntax chunks are there?

  • operate on raw text and yield a block element
  • operate on raw text and yield an inline element
  • operate on commonmark block content and yield a block element
  • operate on commonmark inline content and yield an inline element

What we arrived at

After a few conversations, we arrived at a syntax that uses triple backticks, followed by the directive name, followed by configuration given in one of two ways (either {key=val} on the opening line or YAML front-matter inside the code block).

So something like (ignore the slashes, just for rendering purposes):

\```mydirective {key=val}
\```

And for inline text, use single backticks followed by curly braces that hold the role name and any attributes associated with it:

This is `my role`{myrolename key=val}

This effectively treats everything as "raw text", with the idea that this would degrade gracefully by just rendering as a raw blob if the directive didn't exist.

How would this clash with current markdown or rST behavior?

Something like this:

  • If a triple-backtick block is found, check the language associated with it
  • See if the language exists in a list of directives in the current environment
    • If so, treat it as a directive block and not a language block. Anything in {} becomes configuration for the directive. Anything inside the backticks becomes content that is processed by the directive.
    • If not, treat it as a language block

Something similar could be done with in-line blocks
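A minimal sketch of the lookup described above, assuming the opening line looks like `mydirective {key=val}`. The function name `classify_fence` and the `known_directives` set are hypothetical, made up for illustration; they are not part of any existing parser.

```python
import re


def classify_fence(info_string, known_directives):
    """Classify the text that follows an opening triple-backtick fence.

    Returns ("directive", name, options) when the first word is a known
    directive, otherwise ("language", name, {}).
    """
    name, _, rest = info_string.strip().partition(" ")
    if name not in known_directives:
        # Unknown name: fall back to an ordinary language block.
        return ("language", name, {})
    # Anything inside {...} becomes key=val configuration for the directive.
    match = re.search(r"\{(.*?)\}", rest)
    pairs = match.group(1).split() if match else []
    options = dict(pair.split("=", 1) for pair in pairs if "=" in pair)
    return ("directive", name, options)
```

For example, `classify_fence("mydirective {key=val}", {"mydirective"})` is classified as a directive with `{"key": "val"}`, while `classify_fence("python", {"mydirective"})` stays a language block.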

What others have found

@choldgraf
Member Author

A suggestion from John Macfarlane:

Since we've been talking about dedicated syntax that would map onto a directive but wouldn't be confusable with code blocks: do what RMarkdown and Pandoc do and use {} for "special" inline or block literals, something like:

```{mydirective}
This is
my special section
literal
```

We could assume that any code blocks that had curly brackets were block-level directives, and reference the first element in the {} against our list of directives. If it doesn't exist, fall back to assuming it is just an attribute.

This would also be fairly parsable in other markdown parsers, since the {} pattern is quite common, and we wouldn't introduce any extra syntax. Also we could then still use

```language
This is
my language syntax
```
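A rough sketch of that fallback rule, assuming the opening line is either a bare language name or a single `{...}` group. `classify_braced_fence` and `known_directives` are hypothetical names used only for illustration.

```python
import re

# "{first rest...}" optionally followed by trailing text on the same line.
BRACED = re.compile(r"^\{\s*([^\s{}]+)\s*([^{}]*)\}\s*(.*)$")


def classify_braced_fence(info_string, known_directives):
    """Decide how to treat the opening line of a fenced block."""
    info_string = info_string.strip()
    match = BRACED.match(info_string)
    if match is None:
        # No curly brackets: a plain language block such as "python".
        return ("language", info_string)
    first = match.group(1)
    if first in known_directives:
        return ("directive", first)
    # Curly brackets, but not a known directive: assume plain attributes.
    return ("attributes", first)
```

With `known_directives = {"mydirective"}`, the opening line `{mydirective}` is treated as a directive, `language` as a language block, and something like `{.numberLines}` falls back to attributes.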

@choldgraf
Member Author

also - just a note for @rowanc1 here, I feel like if we end up using Sphinx and have a directive / role syntax for markdown, then maybe that's a place where components.ink pieces could be inserted into content at build time by writing a role/directive that injects the proper JS and HTML into the page (maybe as a separate sphinx extension?) curious what you think about that...

@rowanc1
Member

rowanc1 commented Feb 11, 2020

My interpretation of how this would apply is:

Some happy text.
```{ink-scope name=scope1}
``{ink-var name=x value=2}
My variable $x=$``{ink-display name=x}.

```

I am putting the scope in there as an example. It gets a bit messy, especially if you have multiple block directives. For example, styling an input as a callout box.

A couple of questions:

  • Would indentation or raw html input be allowed?
  • Any thoughts on empty content for inline elements?

For example: indentation

{ink-scope name=scope1}:
    ``{ink-var name=x value=2}
    {ink-callout kind=info}:
        Variable $x=$``{ink-display name=x}.

For example: html

<ink-scope name="scope1">
    ``{ink-var name=x value=2}
    Variable $x=$``{ink-display name=x}.
</ink-scope>

And that would either be ignored in other representations - or perhaps if you have an intermediate AST then it could last until there? I liked the comment you posted about the C markdown parser coming to a common xml representation that can be acted upon.

@chrisjsewell
Member

chrisjsewell commented Feb 11, 2020

Behold the first Markdown directive parser!

See the bottom of https://github.com/chrisjsewell/mistletoe/blob/myst/test.ipynb

Current format is:

````{note}
abcd *abc* [a](link)

```{warning}
xyz
```

````

which is transformed to docutils AST:

<document source="">
    <note>
        <paragraph>
            abcd 
            <emphasis>
                abc
             
            <pending_xref refdomain="True" refexplicit="True" reftarget="link" reftype="any" refwarn="True">
                <reference refuri="link">
                    a
        <warning>
            <paragraph>
                xyz

FYI for all the tests (which are extensive) see: https://travis-ci.org/chrisjsewell/mistletoe

@choldgraf
Member Author

@akhmerov
Contributor

akhmerov commented Feb 11, 2020

This looks totally awesome!

A quick question: should it be possible to configure default directive/role akin to sphinx? These could use blank {} for example.

@jstac
Member

jstac commented Feb 11, 2020

Very cool. I see you're picking up line numbers corresponding to cells in the AST. So ticking all the boxes already in terms of what was needed...

@chrisjsewell
Member

So I've added testing against most of the docutils directives (see here), and added parsing of arguments, e.g.

```{image} path/to/image
```

The last part is to parse options. Parsing them like ```{name key=value} has been suggested, but a major problem is that it would break the current code fence regex, which looks for a string with no spaces for the language component (I also don't think it looks very nice).

I think the YAML block is the best way and I was thinking, for efficient parsing, it would be good to signify in the first line if the block contains options. Something like:
(note the +)

```{image}+ path/to/image
height: 20
width: 40
---
Here is a *caption*.
```

Then it would read everything as YAML until either a --- is found or the end of the block is reached.
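As a sketch of that rule, assuming the block has already been split into its opening line and a list of body lines (`split_plus_block` is a hypothetical helper, not the code in the branch):

```python
def split_plus_block(first_line, body_lines):
    """Return (option_lines, content_lines) for a directive body.

    Options are only looked for when the opening line carries the "+"
    marker directly after the closing curly bracket, e.g. "{image}+ path".
    """
    has_options = first_line.split(" ", 1)[0].endswith("}+")
    if not has_options:
        # No marker: everything is content, no scan for "---" is needed.
        return [], list(body_lines)
    if "---" in body_lines:
        cut = body_lines.index("---")
        return body_lines[:cut], body_lines[cut + 1:]
    # Marker present but no separator: the whole block is options.
    return list(body_lines), []
```

The returned option lines would then be handed to a YAML parser; that step is left out here to keep the sketch free of third-party dependencies.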

@jstac
Member

jstac commented Feb 11, 2020

You've worked hard!!!

I personally find the YAML syntax far more readable than {name key=value} when there are multiple options. But opinion will be split on that point.

Regarding the YAML syntax, could you do

```{image} path/to/image
---
height: 20
width: 40
---
Here is a *caption*.
```

That seems a bit more symmetric --- and hence easy to remember.

@chrisjsewell
Member

Yeh that could also work ta.

Implemented roles and math as well now (no option key/val parsing yet). It actually could end up being more powerful than RST in some respects, because you can nest inline elements, which isn't possible in RST:

````{note}
abcd *abc* [a](link)

```{warning}
xyz
```

````

```{figure}+ path/to/image
height: 40
---
Caption
```

**{code}`` a=1{`} ``**

**$a=1$**

$$b=2$$

`` a=1{`} ``

goes to:

<document source="">
    <note>
        <paragraph>
            abcd 
            <emphasis>
                abc
             
            <pending_xref refdomain="True" refexplicit="True" reftarget="link" reftype="any" refwarn="True">
                <reference refuri="link">
                    a
        <warning>
            <paragraph>
                xyz
    <figure>
        <image height="40" uri="path/to/image">
        <caption>
            Caption
    <paragraph>
        <strong>
            <literal classes="code">
                a=1{`}
    <paragraph>
        <strong>
            <math>
                a=1
    <paragraph>
        <math_block xml:space="preserve">
            b=2
    <paragraph>
        <literal>
            a=1{`}

@chrisjsewell
Member

chrisjsewell commented Feb 12, 2020

@choldgraf @mmcky @jstac @AakashGfude I've added the Sphinx Parser 😃

You just install my fork of mistletoe (pip install -e .[sphinx,testing], on the myst branch), and add extensions = ["mistletoe"] to your conf.py and it will pick up all the .md files.

Note if you look in myst/test/test_sphinx/test_sphinx_builds.py, I have set up automated testing of sphinx builds, for folders in myst/test/test_sphinx/sourcedirs. So if you run that with pytest it will actually generate the _build folders (comment out the remove_sphinx_builds fixture, so that they are not removed at the end of the test).

@chrisjsewell
Member

@choldgraf FYI front-matter does start with --- (see here), so it makes sense in the directives to also do this, which I've now changed to:

```{name} argument text
---
option: 1
---
content with *markdown* **syntax**
```
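For illustration only, here is one way that format could be taken apart, assuming the opening line and the body are already separated. `parse_directive` is a hypothetical helper, and the options are read as plain `key: value` strings where a real implementation would hand the text between the `---` lines to a YAML parser.

```python
import re

# "{name}" followed by optional argument text on the same line.
OPENING = re.compile(r"^\{([^\s{}]+)\}\s*(.*)$")


def parse_directive(info_string, body):
    """Split a directive block into (name, argument, options, content)."""
    match = OPENING.match(info_string.strip())
    if match is None:
        raise ValueError("not a directive opening line: %r" % info_string)
    name, argument = match.groups()

    lines = body.splitlines()
    option_lines = []
    # Options exist only if the very first body line is "---".
    if lines and lines[0].strip() == "---":
        rest = lines[1:]
        stripped = [line.strip() for line in rest]
        if "---" in stripped:
            end = stripped.index("---")
            option_lines, lines = rest[:end], rest[end + 1:]

    options = {}
    for line in option_lines:
        key, sep, value = line.partition(":")
        if sep:
            options[key.strip()] = value.strip()
    return name, argument, options, "\n".join(lines)
```

Note that values stay strings in this sketch (`"1"` rather than `1`), which is one of the things a proper YAML parser would handle differently.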

@jstac
Member

jstac commented Feb 12, 2020

Love your work @chrisjsewell. Outstanding.

@choldgraf
Member Author

Duuude - it works! So cool! Tonight I'll try making a little sphinx documentation site in your myst branch using the content that @AakashGfude put together...I am curious how it'll look!

@jorisvandenbossche

(Chris pointed me to those discussions; I am an extensive sphinx user due to being one of the maintainers of the pandas docs, which is a quite big sphinx site. And I am excited about the issues you are tackling here: I love sphinx, but I also love to see improvements to it ;))

One thing I am wondering: to what extent are you already set on the syntax for roles and directives?

It seems you are now taking the syntax for code (both for inline and blocks) with adding a role/directive name in the {}.

This is closer to existing markdown syntax, so I can imagine this is easier to extend an existing parser for this? (and it's also closer to things in the existing standard / pandoc, which are very good reasons)

But thinking about some use cases for roles in the documentation projects I work with, I think something along the lines of the generic directives syntax proposal might be easier to work with (as an end user):

Small example rst snippet:

We can link to :meth:`pandas.DataFrame` in the API reference
or to another section :ref:`here <label>` (:issue:`1234`).

How it might look based on the role examples above (the details might not be correct):

We can link to `pandas.DataFrame`{meth} in the API reference
or to another section `here`{ref, id=label} (`1234`{issue}).

And how it might look with the linked proposal:

We can link to :meth[pandas.DataFrame] in the API reference
or to another section :ref[here]{label} (:issue[1234]).

Personally, I think the third snippet "looks" better than the second (but that's very subjective of course. Maybe that's because I am so used to having colons in rst .. ;-))
But maybe a slightly more objective argument: I think having the role name come first, instead of in the end, improves readability. And it also gives more contrast with actual code snippets.

@chrisjsewell
Member

chrisjsewell commented Feb 12, 2020

I think having the role name come first, instead of in the end, improves readability.

Yep, that's how it has now been implemented, as {name}`content`. I guess the issue with using square brackets is that they are not degradable when using a standard Markdown parser; with backticks the content will remain raw text, whereas in brackets it will be treated as Markdown.

Also with colons, this might clash with the potential syntax extension of field lists. For example, if you want to be able to use the :orphan: metadata token.
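A small sketch of how the `{name}` prefix form could be matched with a regular expression. This is an assumption for illustration (the pattern `ROLE` and the helper `find_roles` are hypothetical), not the tokenizer actually used in the branch.

```python
import re

# "{name}" immediately followed by a backtick run, the content, and a
# closing backtick run of the same length.
ROLE = re.compile(r"\{([a-zA-Z0-9_:.+-]+)\}(`+)(.+?)\2(?!`)")


def find_roles(text):
    """Return (role_name, content) pairs found in a line of text."""
    return [(m.group(1), m.group(3).strip()) for m in ROLE.finditer(text)]
```

A parser that doesn't know about roles would still see ordinary inline code after the `{name}` text, which is the degradation argument made above.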

@stefanv

stefanv commented Feb 12, 2020

This is great stuff, thanks @chrisjsewell!

Wondering about that yaml header: if you use two --- lines, that takes up the majority of space in the fenced block. Can you think of any risk in removing the first instance? I couldn't immediately see a downside.

```{name} argument text
---
--- and any such arbitrary text
​```

Quickly surveying the landscape: in pandoc, the yaml blocks are surrounded by --- and ... respectively (no idea why); Hugo uses matching ---; org-mode uses #+VARIABLE_NAME: value.

@choldgraf
Member Author

@stefanv I believe the main reason for this is because otherwise the regex search can become really expensive.

Imagine that you have lots of code blocks with parameters inside. Because --- is also valid markdown, you need to figure out if the --- is there because it is the break between YAML config and the content, or if it is just regular markdown ---. So you have to do some more complex search to figure it out.

If you know there's a character that defines "this is the start of config" then it becomes much easier, so adding a starting --- makes this trivial to figure out, at the cost of extra verbosity.

After using it a bit, I think a way we could get around this issue is to also support some kind of arguments in the first line, and suggest that people use this only if they have a very small number of arguments. Then if the number of args is non-trivial (maybe > 2 or so) they can use the YAML, and if the number of args is small they can keep it close to a one-liner.

@choldgraf
Member Author

Another option would be to denote the arguments section with a special character on each line. For example, parameters could be provided by starting a line with : at the beginning of the content block. E.g.:

```{directive}
:key: val
:key2: val2
:arg3:
:key4: |
  Val 4
Content
```

That would have the benefit of even more parity w/ rST. For a very short paragraph then you'd have something like:

```{code-block} python
:linenos:
My content
```

@chrisjsewell
Member

chrisjsewell commented Feb 12, 2020

Yeh as I've noted in #24, I think I will add in a block token for docutils field list syntax, which I didn't actually realise before was part of the RST spec. Then you should be able to use:

```{name} arguments
:option: a
:non-kwarg:

Content
```
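A minimal sketch of reading such leading `:key: value` lines, assuming single-line values only (the multi-line `|` form shown earlier is not handled). `FIELD` and `split_field_options` are hypothetical names for illustration.

```python
import re

# ":key: value" or a bare ":flag:" with no value.
FIELD = re.compile(r"^:([^:\s][^:]*):(?:\s+(.*))?$")


def split_field_options(body_lines):
    """Split leading ':key: value' lines from the rest of a directive body."""
    options = {}
    count = 0
    for line in body_lines:
        match = FIELD.match(line)
        if match is None:
            # First line that is not a field ends the options section.
            break
        options[match.group(1)] = match.group(2) or ""
        count += 1
    content = body_lines[count:]
    # Drop a single blank line separating the options from the content.
    if options and content and not content[0].strip():
        content = content[1:]
    return options, content
```

Because the scan stops at the first non-field line, only the top of the block is inspected, so no search through the whole body is needed.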

@choldgraf
Member Author

@chrisjsewell is the idea that this would replace the YAML parsing? Or just be an option? I quite like the YAML syntax. Instead of allowing full rST syntax could we just say that if the block starts with lines that begin with : then those will be parsed as YAML lines? (AKA it is just a shorthand to avoid requiring the --- fences?)

@chrisjsewell
Member

Yeh I don’t think I’m going to add actual parsing for these field lists any more; in favour of just using YAML. But yeh for directives you could maybe include that alternative approach.

@choldgraf
Member Author

choldgraf commented Feb 16, 2020

I think it'd be helpful to include the : short-hand for metadata. That way there are basically two options for YAML metadata, depending on whether you care about conciseness. As an example we could recommend:

If there are <= 2 configuration lines:

```{directivename}
:key: true
:key2: config2
```

If there are > 2 configuration lines:

```{directivename}
---
key: true
key2: config2
key3: config3
key4: |
  Multi line
  config
---
```

Either would be valid, but for cases where the directive just needs one or two config options (which is common) I think supporting : could keep things tighter. It would help avoid the case where there are more "configuration fence" lines than actual configuration options.


7 participants