Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix lean syntax highlighting. #77

Merged
merged 6 commits into from
Oct 11, 2022

Conversation

lovettchris
Copy link
Contributor

@lovettchris lovettchris commented Sep 20, 2022

and fixes #10

The snippets now look nice:

image

But there's a problem with "lean, error" blocks, can we drop the "lean" language prefix on these?

image

@david-christiansen
Copy link
Collaborator

How is that code highlighting the Lean? It looks to me like it's using an inaccurate regexp-based highlighter. For instance, in head?, the ? is not highlighted as part of the name, which I think has a strong potential to confuse readers. Is there a place where we can adjust the regexps being used to highlight to make sure that it's up to date with modern Lean 4 conventions?

The formatting of the error blocks (and presumably also the info blocks?) is much better, though!

@lovettchris
Copy link
Contributor Author

lovettchris commented Oct 10, 2022

Yes it is using highlight.js from mdbook, which we customized adding a "Lean" section to the end of it, so yes, it is mostly a regex based syntax highlighter and this is the same technique used in

So if you find bugs in the syntax highlighting then we need to fix highlight.js for all these books. There's some question marks in https://leanprover.github.io/theorem_proving_in_lean4/interacting_with_lean.html which seem ok, so we'll have to investigate what is going on with your example...

@david-christiansen
Copy link
Collaborator

The link in question doesn't have any question marks in the same syntactic category that I see it in in your screenshot above - a name being defined with def. My guess is that something like [a-zA-Z][a-zA-Z0-9_]+ needs to have an extra \?char put in it. Fairly classic issue.

Precisely because a trailing ? is so unfamiliar to programmers who don't already know Scheme or Ruby, I think it's important to highlight that accurately before we merge the overall highlighting patch. I'm happy to do any fixups of Lean to lean needed as part of the merge (that's a super easy find-replace while resolving conflicts), but I don't think I have the bandwidth to debug the hightlighting JS, at least not until the next release is out.

I'd merge a PR that improves the display of info and errors immediately, that's a real clear win. And it's fine to drop the Lean prefix as long as that doesn't break the CSS (I think mdbook sticks different classes on the Markdown language indication than it does on the Markdown class indications, so we might need to write something like:

```output error

instead of

```lean error

which is another easy find-replace.

What would it take to use LeanInk instead as part of the build script for the book? I'm imagining something like processing its intermediate JSON notation to at least get accurate highlighting, even if the full experience is too much work to do. But if we can just grep the LeanInk HTML for code bits and it works to just inline those and add some JS at the top, even better! I definitely don't have time to do it, though.

@lovettchris
Copy link
Contributor Author

Yes, totally agree about '?' being an unusual identifier char, confused me first time I saw it, I had to ask Leo about it. I think the LEAN_IDENT regex is wrong, and needs to include ? and not just at the end of the identifier:

def test?forteen? :=
  IO.println "yep"

#eval test?forteen?

I'll work on a fix to the highlight.js and ensure Leo and others are fine with the fix in the other books also so we have the same highlight.js used everywhere. I'll think more about your leanink question and post comment on that soon.

@lovettchris
Copy link
Contributor Author

Note that "LeanInk" seems to get the colors wrong also, see https://leanprover.github.io/lean4/doc/monads/transformers.lean.html where def indexOf? highlights the ? in purple making out that it's something different from the identifier, which it isn't. So I'll work on a fix for this also.

@lovettchris
Copy link
Contributor Author

Ok, I'm proposing this fix, we'll see what Leo and Sebastian say about this:
leanprover/lean4#1713

@lovettchris
Copy link
Contributor Author

lovettchris commented Oct 10, 2022

Ok, I just updated the highlight.js in this PR so it is in sync with the fixed version in Lean4, it does not change the formatting of meta variables because in Lean you cannot start an identifier with question mark, but it does fix things like this, in book/html/getting-to-know/conveniences.html

image

@lovettchris
Copy link
Contributor Author

Ok, now back to your LeanInk question. First off yes, we should change the error outputs to language output error or something other than lean, except in rare cases where the lean snippet is invoking a lean pretty printer :-)

But LeanInk doesn't actually do syntax highlighting, it is alectryon that does that and it is a big complex python program for generating formatted markdown, but yes, you could write your book in .lean then pre-process that with alectryon (which calls leanink just to get the elaborated type information) and get a bunch of markdown which you then run through mdbook.

But all your python needs to run BEFORE alectryon to insert the code snippets! So I think it is too complicated to think about integration of alectryon right now for your book. I think long term it would be better to port all the python code in alectryon to Lean, integrate that into LeanInk, and then port all your python code to .lean and integrate that into LeanInk then we have one command line to rule them all, replacing mdbook :-) But that may take a while... but it is something we are talking about doing...

@lovettchris
Copy link
Contributor Author

Note: with this "PR" you need to remember from here on out to use lower case lean on the triple back tick lean code blocks.

@david-christiansen
Copy link
Collaborator

This looks good!

RE Alectryon and LeanInk - I'm aware of how it works, my curiosity is more as to whether it would be plausible to replace the current build process:

Lean file -> Python data structure mapping named examples to code strings -> Markdown with substituted strings

with something like:

Lean file -> LeanInk JSON -> Alectryon HTML/JS -> Python data structure mapping named examples to code HTML -> Markdown with inline HTML

I suspect it would require truly fearsome regular expressions, or alternatively some smarter HTML matching library together with fearsome regular expressions. But it seems plausilble that it could work.

The other, simple alternative is:

Lean file -> LeanInk JSON -> Accurately syntactically highlighted HTML -> Python data structure mapping named examples to code HTML -> Markdown with inline HTML

This seems easy enough, if somewhat tedious to implement. I could do this build process today:

Lean file -> Emacs buffer with LSP-based accurate highlighting -> Accurately syntactically highlighted HTML -> Python data structure mapping named examples to code HTML -> Markdown with inline HTML

with about 10 lines of elisp and a bit more Python, but I don't want to include batch-mode Emacs in the build script :-)

In any case, I'll merge this for now, as it's a clear improvement over what we had, and I'll be watching for formatting issues in my next proofreading pass. Thanks!

@lovettchris
Copy link
Contributor Author

The fixed highlight.js was just merged into the lean repo for the lean reference manual to use.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

odd character
2 participants