First draft markdown flavor JEP #99

Draft
wants to merge 2 commits into
base: master

Contributor

@fcollonval commented Mar 10, 2023

This draft PR shares the output of the Notebook format workshop on specifying the Markdown flavor in the notebook file.

Please discuss this proposal in the associated issue: resolves #98.

For implementation-oriented JEPs, this section should focus on how other Jupyter
developers should think about the change, and give examples of its concrete impact. For policy JEPs, this section should provide an example-driven introduction to the policy, and explain its impact in concrete terms. -->

This proposal introduces a new key `mimetype` on the markdown cell. The key is optional, to preserve backward compatibility; the proposal therefore also introduces a _fallback value_ (TBD).
Member

This may be broader than I'd recommend, since a mimetype can name many unrelated, yet well-specified, types.

Conceptually though, calling it `markdown_format` would make sense. Limit it to an enum of possible variants:

  • marked
  • commonmark
  • gfm
  • commonmark+latex
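
A minimal sketch of the enum idea, assuming the key is named `markdown_format` as suggested above and that unknown values are treated as "not specified". The names `MarkdownFormat` and `parse_markdown_format` are hypothetical and only illustrate the shape of the check:

```python
from enum import Enum


class MarkdownFormat(str, Enum):
    """The variants listed in the comment above (illustrative, not a final list)."""

    MARKED = "marked"
    COMMONMARK = "commonmark"
    GFM = "gfm"
    COMMONMARK_LATEX = "commonmark+latex"


def parse_markdown_format(value):
    """Return the matching enum member, or None when the value is absent or unknown."""
    if value is None:
        return None
    try:
        return MarkdownFormat(value)
    except ValueError:
        # Assumption: unknown values are ignored rather than rejected.
        return None
```

With an enum, a value such as `text/html` simply fails to parse, which is the narrowing this comment argues for.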

@stevejpurves Apr 6, 2023

Apart from being a well-established way to identify content, the interesting thing about using mimetype here is that it could potentially be used to specify other content like YAML, HTML or SQL, which could be rendered appropriately in a "markdown" cell. Maybe we should think of the markdown cell more as an "input" cell.

This immediately opens up the case for notebooks with heterogeneous inputs (for better or worse), including markdown variants or things like YAML or SQL, and for that content to be properly identified. I believe all of this content is already in use in notebooks but not well supported.

Then the markdown cell can have a single `output` storing the rendered content as a mime bundle. If, for example, the mime bundle provides `text/html` rendered content, a tool like nbconvert could inject that HTML directly when converting the notebook to HTML.
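
A minimal sketch of how a converter might consume such an output. It assumes a hypothetical cell layout in which `output` holds a `data` mapping keyed by mimetype (modelled on existing code cell outputs). The function name `html_for_markdown_cell` and the `render_markdown` callable are illustrative and not part of nbconvert:

```python
def html_for_markdown_cell(cell, render_markdown):
    """Prefer pre-rendered text/html from the cell's output; otherwise render the source."""
    # Assumed layout: cell["output"]["data"] is a mime bundle (mimetype -> content).
    bundle = (cell.get("output") or {}).get("data", {})
    rendered = bundle.get("text/html")
    if rendered is not None:
        # Notebook strings may be stored as a list of lines.
        return "".join(rendered) if isinstance(rendered, list) else rendered

    source = cell.get("source", "")
    if isinstance(source, list):
        source = "".join(source)
    # No stored rendering: fall back to whatever Markdown renderer the tool ships.
    return render_markdown(source)
```

The converter only needs its own Markdown renderer when the cell carries no stored HTML, which is the case for every notebook written before such a change.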

Due to the markdown fragmentation, a side effect of this JEP is the need to define a fallback Markdown flavor. Judging from existing practice, the best path appears to be creating and maintaining an integration test suite for the syntax the fallback supports. This will clarify the supported syntax and the associated rendered HTML. This is similar to what [CommonMark](https://spec.commonmark.org/) and [GFM](https://github.github.com/gfm/) are defining.
A starting point would be this [pull request](https://github.com/jupyterlab/benchmarks/pull/97), moved into the [nbformat](https://github.com/jupyter/nbformat) repository. The open-source tools will need to be aligned; in particular, the question of aligning Python-based renderers like nbconvert with web-based renderers like JupyterLab/Notebook must be tackled.
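
A rough sketch of what such a suite could look like, assuming examples are stored as Markdown/HTML pairs in the style of the CommonMark spec tests. The `run_conformance` helper and the `render` callable are hypothetical; each renderer (nbconvert, JupyterLab, and so on) would plug in its own:

```python
def run_conformance(render, examples):
    """Return the examples whose rendered HTML differs from the expected HTML.

    `render` is any callable mapping Markdown source to an HTML string;
    `examples` is a list of {"markdown": ..., "html": ...} dictionaries.
    """
    failures = []
    for example in examples:
        actual = render(example["markdown"])
        if actual != example["html"]:
            # Keep the actual output alongside the example to ease debugging.
            failures.append({**example, "actual": actual})
    return failures
```

Running the same example file against a Python-based and a web-based renderer would show exactly where the two disagree.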
Member

Rather than do it per cell, I'd aim for the notebook document itself to have the markdown format specified at the top level. There are text/markdown outputs after all.

I'd love to see some more discussion on this point. We were a bit torn when discussing this at the workshop, and a few of the open questions point to this.

A couple of scenarios we thought about here:

  1. a notebook is created from scratch in a single client, and edited there
  2. a notebook is created by someone in client A, and then shared, opened and one cell is edited by someone else in a different client, client B. The two clients render/support different markdown flavours.

So, when setting the mimetype per cell:

  • it felt more in line with upcoming proposals to allow inputs to be fully mimebundle-based
  • in scenario (2), client B would have to make a best effort to render the markdown cells when displaying the notebook. When a user edits a markdown cell, would the client then assume the content is in its own markdown flavour and update the mimetype accordingly?
  • it opens the door to heterogeneous notebooks with different "Markdown" cells containing any mimetype-identified content, e.g. SQL or YAML, and for that to be rendered appropriately by extensions (no matter what is intended, once mimetype is available it'll likely be leveraged for this)

When setting the markdown type per notebook:

  • in scenario (2), what would happen on edit? Would a client change the notebook-wide markdown type to its own or leave it as is? Would it try to convert content to its own flavour? (It's probably not in a position to!)
  • or the mimetype is taken as a hint, to indicate what renderer is best suited to handle the content in the notebook
  • should clients expose UI to change this?

It feels like, without stepping out of this JEP and into the larger discussion of new mimetype-based inputs (which could accommodate a CommonMark markdown fallback alongside), it'll be hard to reconcile these issues around "editing" here. But adding the mimetype field at either level improves the rendering experience, provided the content is consistent with the declared type!

also note the discussion on this issue

},
```

The structure of the mime type must follow the standard defined in [RFC7763](https://www.rfc-editor.org/rfc/rfc7763): `text/markdown;variant=<variant name>` (variant parameter is optional).
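
A minimal sketch of reading the variant out of such a value. It is a simplified parser written for illustration (it does not handle quoted parameter values that contain semicolons), and `parse_markdown_variant` is a hypothetical name:

```python
def parse_markdown_variant(mimetype):
    """Return the variant named in a text/markdown mimetype, or None if it is omitted.

    Raises ValueError when the media type is not text/markdown.
    """
    media_type, *params = (part.strip() for part in mimetype.split(";"))
    if media_type.lower() != "text/markdown":
        raise ValueError(f"not a Markdown mimetype: {mimetype!r}")

    for param in params:
        name, sep, value = param.partition("=")
        # Parameter names are case-insensitive; the value is returned as written.
        if sep and name.strip().lower() == "variant":
            return value.strip().strip('"') or None
    # The variant parameter is optional, so its absence is not an error.
    return None
```

Returning `None` for a missing variant leaves the choice of fallback to the caller, since the fallback value itself is still to be defined.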
Member

TIL there's a way to declare a variant of markdown.

The resulting fully specified `mimetype` would be `text/markdown;variant=GFM`

The [RFC7763](https://www.rfc-editor.org/rfc/rfc7763.html) specifies a [registry provided by IANA](https://www.iana.org/assignments/markdown-variants/markdown-variants.xhtml).

Both Quarto and MyST have been added to the registry in the last few weeks, based on follow-ups from the notebook format meeting. It would be great to add them to the JEP!

Contributor Author

Thanks for the follow-up, @rowanc1! Would you mind providing guidance on the process for adding such a variant?

You can register a new variant by filling out this form: https://www.iana.org/form/protocol-assignment

@krassowski
Member

I think that this proposal should ideally address the question of the security model for the proposed outputs. The current Jupyter Notebook security model prohibits execution of JS and forces sanitization of HTML/CSS in all Markdown cells. Since outputs in this proposal are modelled on code cells (which can be trusted or not), would it be worth revisiting the trust question?

Please see more thoughts on potential trust evolution in #95 (comment).

@fcollonval
Contributor Author

We have started looking at this at the SSC meetings. We have decided to give at least another 2 weeks of discussion before moving forward.

- Switch to `outputs` to align better with existing data structure
- Define default/fallback variant
- Provide answers to open questions
- Add question about trust
@jjallaire

What is the current thinking around the precise content type of the fallback ("Rendered Markdown source")? This might be clear in the revised text of the proposal, but I wanted to be sure. The scenarios I'm thinking of are renderers like nbconvert and notebook editors that don't know about the specified mimetype. In these cases it seems like editing tools should provide a standard Jupyter Markdown version of the content (e.g. if a callout supported by MyST or Quarto is in the markdown, the fallback should be something suitable, such as a markdown blockquote). The alternative would be to provide e.g. text/html; however, if you are attempting to render a PDF or DOCX this won't be helpful (whereas Jupyter Markdown will be).
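
A minimal sketch of that trade-off, assuming the fallback is stored as a mime bundle. The `PREFERENCES` table and `pick_fallback` are hypothetical and only illustrate why a Markdown fallback serves more export targets than an HTML one:

```python
# Hypothetical preference order per export target; not part of the proposal.
PREFERENCES = {
    "html": ["text/html", "text/markdown"],
    "pdf": ["text/markdown"],
    "docx": ["text/markdown"],
}


def pick_fallback(bundle, target):
    """Return (mimetype, content) for the first representation usable by target, or None."""
    for mimetype in PREFERENCES.get(target, []):
        if mimetype in bundle:
            return mimetype, bundle[mimetype]
    # Nothing in the bundle can be used for this target.
    return None
```

Under these assumptions, a bundle holding only `text/html` yields nothing for a PDF or DOCX export, while one holding `text/markdown` works for all three targets.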

I also wonder whether, given that notebook editors won't immediately be aware of this extension, the actual source should always be written in Jupyter Markdown, with the source that uses an extended markdown variant provided separately. This would allow "variant aware" editors to deal with the content properly while allowing the rest of the ecosystem to keep working without changes. This might already be anticipated in an aspect of the proposal I'm not appreciating, but in case it's not, it is definitely worth considering.

Successfully merging this pull request may close these issues.

Pre-proposal: Specify the Markdown cell's markdown flavor
6 participants