Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add @mermaid-js/parser separate package #4450

Closed
wants to merge 190 commits into from

Conversation

Yokozuna59
Copy link
Member

@Yokozuna59 Yokozuna59 commented Jun 2, 2023

📑 Summary

Add @mermaid-js/parser separate package.

related issues:

📏 Design Decisions

Common:

  • Comments
  • Yaml
  • Directives
  • Title
  • Accessibilities

Diagrams:

  • C4
  • Class
  • Entity Relationship
  • Flow
  • Gantt
  • Git
  • Info
  • Mindmap
    • remaining
      mindmap
      root
        child1
          subchild
        child2
      
  • Pie
  • Quadrant
  • Requirement
  • Sankey
  • Sequence
  • State
  • Timeline
  • User Journey

📋 Tasks

Make sure you

  • 📖 have read the contribution guidelines
  • 💻 have added unit/e2e tests (if appropriate)
  • 📓 have added documentation (if appropriate)
  • 🔖 targeted develop branch

@Yokozuna59

This comment was marked as resolved.

@Yokozuna59

This comment was marked as resolved.

@Yokozuna59

This comment was marked as resolved.

@sidharthv96
Copy link
Member

@Yokozuna59 I've made parse in Diagram async and added pie back into mermaid.
But I'm getting the following errors when trying to run it in the browser. We shouldn't have the language server in the browser. The final mermaid library needs to be minimal, and contain only the parsing and AST logic.

I'd also avoid using vscode-uri if that was an option.

image

@sidharthv96
Copy link
Member

Also, I think we should move the code from mermaid-lint into packages/parser so that this PR actually has everything we need to know regarding the change to langium. We cannot have the grammar outside mermaid repo anyway.

@Yokozuna59
Copy link
Member Author

Yokozuna59 commented Jun 4, 2023

@Yokozuna59 I've made parse in Diagram async and added pie back into mermaid. But I'm getting the following errors when trying to run it in the browser. We shouldn't have the language server in the browser. The final mermaid library needs to be minimal, and contain only the parsing and AST logic.

Thanks! I'll try figuring out how to resolve this. I haven't seen any based langium to be used as parser without LSP. So I'm sure if I'm doing the best approach yet.

I'd also avoid using vscode-uri if that was an option.

It's appear it's not, I tried without and it throws exceptions.


Also, I think we should move the code from mermaid-lint into packages/parser so that this PR actually has everything we need to know regarding the change to langium. We cannot have the grammar outside mermaid repo anyway.

Will do asap.

@Yokozuna59
Copy link
Member Author

Should it be packages/parser or packages/mermaid-parser? So that it would be inline with other packages?

@msujew
Copy link

msujew commented Jun 4, 2023

@Yokozuna59 FYI, in your case, you don't need to perform the workspace initialization, making it unnecessary to make the whole parse function async. You don't even need to perform the build step, given that the language doesn't have any cross references and you don't need the diagnostics/validations to run in order to get an AST. You can get away with way less work:

function createParserServices() {
  const services = createMermaidServices(EmptyFileSystem).Mermaid;
  const parser = services.parser.LangiumParser;

  const parse = (input: string) => {
    return parser.parse<PieChart>(input);
  };

  return { services, parse };
}

If you turn Langium into nothing but a parser, most things in the framework become very much obsolete. However, since the service initialization needs to bootstrap the grammar from a json format, internal dependencies from Langium such as vscode-uri aren't optional. Treeshaking (i.e. removing all the LSP code etc.) probably also isn't doing too well, since we basically export the whole framework from the langium/index.js import. Note that in theory it should be possible to engineer a version of mermaid-parser that doesn't require these dependencies but it's more work than necessary, see below.

It really depends on how much you guys are interested in having full LSP or advanced diagnostics reporting for the language, but it seems to me like using Langium is a bit of overkill for the parsing aspect of mermaid. It basically turns Langium into a glorified EBNF to Chevrotain code transformer, with a lot of unnecessary overhead. As much as I'd love to see it used in here - we are avid users of mermaid ourselves, much of our documentation is rendered with it - I think a much slimmer solution using Chevrotain directly would benefit this effort greatly.

Just my 2 cents.

@sidharthv96
Copy link
Member

sidharthv96 commented Jun 4, 2023

@msujew I did a POC with Chevrotain few months back, but discovered Langium when using Zenstack. Eventhough I loved writing the grammar in TS for Chevrotain, I feel the langium syntax is a less disruptive and easier to read one.

If you turn Langium into nothing but a parser, most things in the framework become very much obsolete.

There's 2 aspects we need to handle.

  1. The mermaid runtime. This is what gets bundled into mermaid.min.js and gets used by most people. This needs to be as small as possible. A slim EBNF to Chevrotain code transformer might be what we need here.
  2. The mermaid "toolkit". Linter, Formatter, Code Highlighting, Completion, all the stuff that mermaid is lacking currently. This is mainly what we are trying to solve with Langium. This can be a separate package, which will be used by vscode plugins, eslint/prettier plugins, mermaid.live, etc.

We can't maintain both jison and langium grammars as they're guaranteed to diverge, and jison is a source of some of our problems. So, is there any way we can have both from the same grammar file?


@Yokozuna59 packages/parser is fine as it's our internal parser. We can release the package as @mermaid-js/parser or something.

@Yokozuna59
Copy link
Member Author

@Yokozuna59 FYI, in your case, you don't need to perform the workspace initialization, making it unnecessary to make the whole parse function async. You don't even need to perform the build step, given that the language doesn't have any cross references and you don't need the diagnostics/validations to run in order to get an AST. You can get await with way less work:

function createParserServices() {
  const services = createMermaidServices(EmptyFileSystem).Mermaid;
  const parser = services.parser.LangiumParser;

  const parse = (input: string) => {
    return parser.parse<PieChart>(input);
  };

  return { services, parse };
}

@msujew Thank you so much!

If you turn Langium into nothing but a parser, most things in the framework become very much obsolete. However, since the service initialization needs to bootstrap the grammar from a json format, internal dependencies from Langium such as vscode-uri aren't optional. Treeshaking (i.e. removing all the LSP code etc.) probably also isn't doing too well, since we basically export the whole framework from the langium/index.js import. Note that in theory it should be possible to engineer a version of mermaid-parser that doesn't require these dependencies but it's more work than necessary, see below.

It really depends on how much you guys are interested in having full LSP or advanced diagnostics reporting for the language, but it seems to me like using Langium is a bit of overkill for the parsing aspect of mermaid. It basically turns Langium into a glorified EBNF to Chevrotain code transformer, with a lot of unnecessary overhead. As much as I'd love to see it used in here - we are avid users of mermaid ourselves, much of our documentation is rendered with it - I think a much slimmer solution using Chevrotain directly would benefit this effort greatly.

I guess one of the reasons for making @sidharthv96 suggest Langium is because it supports LSP. It can be used (and mostly will be used if it is stable enough) in mermaid-live-editor since it supports and resolves many different errors in the current parser. So it's two birds with one stone.

So currently, it might be better to support the grammar of the current diagrams without affecting other parts, then support LSP with other services for enhancement. If we put all the effort into one diagram (supporting validation, format, completion, etc.), it would take forever instead of full migration, then support other features.

@sidharthv96 sidharthv96 added this to the v11 milestone Jun 4, 2023
@Yokozuna59
Copy link
Member Author

@sidharthv96 I'm currently having some issues for adding the files from the original repo, the tsconfig.json effects the imports in the source code

I have updated moduleResolution to be node which resolve most issues, but still cant import chevrotain.

image

@sidharthv96 sidharthv96 changed the base branch from develop to next June 4, 2023 12:15
@sidharthv96
Copy link
Member

We shouldn't go back to node. #4213


If we put all the effort into one diagram (supporting validation, format, completion, etc.), it would take forever instead of full migration, then support other features.

Yes, our plan should be to introduce langium in the easiest and least disruptive way, and then build the cool features on top afterwards.

@Yokozuna59
Copy link
Member Author

Yokozuna59 commented Jun 4, 2023

We shouldn't go back to node. #4213

But the generated grammar from langium don't add .js to files, it doesn't make since to add .js after each time building ast.

@msujew
Copy link

msujew commented Jun 4, 2023

I feel the langium syntax is a less disruptive and easier to read one.

That's my main gripe with Chevrotain as well: Grammars become really difficult to read/maintain once they reach a certain size and complexity, especially with embedded actions in the mix.

The mermaid "toolkit". Linter, Formatter, Code Highlighting, Completion, all the stuff that mermaid is lacking currently. This is mainly what we are trying to solve with Langium. This can be a separate package, which will be used by vscode plugins, eslint/prettier plugins, mermaid.live, etc.
We can't maintain both jison and langium grammars as they're guaranteed to diverge, and jison is a source of some of our problems. So, is there any way we can have both from the same grammar file?

Ok, with that in mind, I can see why you would want to use Langium. Editor support for web/vscode/cli/linters is the core use case we've been going for after all. And yes, I agree, having two sources of truth in regards to the grammar will always lead to problems.

I'll think about the idea of splitting of some of the Langium features into separate packages, so that features such as the grammar-to-chevrotain transformer can be consumed independently of the rest of the framework. The main issue here is that everything in Langium is bootstrapped, i.e. our own grammar language is written in the grammar language. This has some implications on what we can extract from the framework.

Anyway, glad to hear you're still trying to go forward with Langium! I'm always available if you want some input from my side or have any other questions :)

@sidharthv96
Copy link
Member

sidharthv96 commented Jun 4, 2023

But the generated grammar from langium don't add .js to files, it doesn't make since to add .js after each time building ast.

I suppose if it's a valid scenario, we could add a flag to the langium generator to add .js extensions?

@remcohaszing do you think using the new bundler will cause a downstream issue?

Tagging @aloisklink too.

@Yokozuna59 Installing @chevrotain/types fixed the issue for me.

@Yokozuna59

This comment was marked as resolved.

@Yokozuna59
Copy link
Member Author

@sidharthv96 I have pushed files from mermaid-lint repo, but still there is some issues with tsconfig.json and I couldn't resolve them tbh.

@Yokozuna59 Yokozuna59 changed the title add parser in mermaid-parser for pie chart add mermaid-parser separate package Jun 4, 2023
@Yokozuna59
Copy link
Member Author

@Yokozuna59 if you need to experiment with esbuild, you can use #4109.

The problem why we can't use it now is because we need the umd bundle, which had some issues with esbuild.

Okay, I'll give it a try.

You can also use a plugin to stub the unused vscode imports to keep the bundle size down. (But we can do that separately. Priority is getting a digram to render)

Is it possible to stub more than that? If we could stub the language server related stuff then it should render diagrams in browsers.

@sidharthv96
Copy link
Member

I guess you could stub any dependency required. As long as the stubbed dependencies are not used in runtime, it shouldn't be a problem. You could simply stub with an empty object most probably.

@Yokozuna59
Copy link
Member Author

Then should we try stubbing instead of esbuild? I guess stubbing would resolve two issues at the same time, whereas esbuild may resolve one.

@Yokozuna59 Yokozuna59 mentioned this pull request Aug 11, 2023
4 tasks
@sidharthv96
Copy link
Member

sidharthv96 commented Aug 12, 2023

image

Finally!

After using esbuild from #4109

@Yokozuna59
Copy link
Member Author

Hi there, @msujew. Thanks for your consistent support while helping us work on the new parser. We currently plan to support two diagrams within that parser, info and pie, since their files are converted to TS in #4501 and #4486. We will support both of those diagrams in separate PRs so that contributors and maintainers are aware of what files need to be modified and how to integrate new diagrams into Mermaid. So we're going to support info in #4727, and if you have time, please take a look at parser files that are inline with Langium services and possible to use for LSP in the future.
 
We're thinking of using the ServiceRegistry service you suggested here to detect diagrams instead of current, but not sure how it works or how to use it while we're exporting all separate grammars.

@sidharthv96
Copy link
Member

@Yokozuna59 I think the method suggested is the same that we are already doing. Testing the diagram text with a regex and using the corresponding object.

The only difference is that, in our case (currently), the testing will occur inside the mermaid part, and the parser will be called with the detected type and the text to parse.

@sidharthv96 sidharthv96 removed this from the v11 milestone Aug 18, 2023
@Yokozuna59
Copy link
Member Author

@msujew I tried using the ServiceRegistry with your suggest code, I have implemented with some modifications. Anyway, it keep throwing errors:

Uncaught Error: mermaid: no progress in tokenizer in rule: (unknown)

image

Error:  Error: The service registry is empty. Use `register` to register the services of a language.

image


The MermaidServiceRegistry will call register function and register all services, but the getServices function hasn't been call, I'm not sure how this service work. I tried using createInfoServices instead of createMermaidServices and it only thrown the first error but it was working fine and suggesting according to context.

Should I push those changes into a separate repo?

@msujew
Copy link

msujew commented Aug 25, 2023

@Yokozuna59 Please do. Having a small example repo would be great :)

@Yokozuna59
Copy link
Member Author

Yokozuna59 commented Aug 25, 2023

@msujew Sure, here it is https://github.com/Yokozuna59/mermaid-service-regitry-example.

To build:

pnpm build:web

To serve:

pnpm serve

@msujew
Copy link

msujew commented Aug 25, 2023

@Yokozuna59 There were a few issues in how you create the shared services, mainly, you're not allowed to override any property; I believe the underlying proxy doesn't even care that you "overwrote" a property. It will just ignore the assignment. Also, I'm not sure how well you can reuse a service container that already has another shared container. I think the shared container will point to the wrong reference, making some services probably behave incorrectly.

I additionally noticed an issue in my own advice: Since the documents factory requires the services (and vice versa), we need a quick way to identify the document type before going to the service registry. I've made a working version here.

@Yokozuna59
Copy link
Member Author

you're not allowed to override any property; I believe the underlying proxy doesn't even care that you "overwrote" a property. It will just ignore the assignment.

@msujew I see, then we'll need to use inject in this case.

Also, I'm not sure how well you can reuse a service container that already has another shared container. I think the shared container will point to the wrong reference, making some services probably behave incorrectly.

I additionally noticed an issue in my own advice: Since the documents factory requires the services (and vice versa), we need a quick way to identify the document type before going to the service registry.

So it would behave similar to detectType in mermaid, we'll need to remove all yaml, directives, and comments that are at the beginning of the document to let the detector work correctly.

I've made a working version here.

Thanks!


It's kind of weird; some parts of the grammar aren't supported; in this case, it's showData in info:

image
image


And it's stopped suggesting according to context; here it should suggest showData as a keyword:
image


And still this error is being thrown:
image

@msujew
Copy link

msujew commented Aug 28, 2023

And it's stopped suggesting according to context; here it should suggest showData as a keyword:

Yeah, the completion for whitespace/newline sensitive languages can sometimes be very weird. We're looking into improving that on the framework level.

And still this error is being thrown:

You're using an invalid monarch grammar (i.e. {}). Disabling the syntactical highlighting or fixing the monarch grammar resolves the issue.

So it would behave similar to detectType in mermaid, we'll need to remove all yaml, directives, and comments that are at the beginning of the document to let the detector work correctly.

Right, exactly :)

It's kind of weird; some parts of the grammar aren't supported; in this case, it's showData in info:

Isn't the keyword showInfo?

@Yokozuna59
Copy link
Member Author

Yeah, the completion for whitespace/newline sensitive languages can sometimes be very weird. We're looking into improving that on the framework level.

Awesome! But I'm not sure if the problem is because of whitespace/newline sensitive languages or because of ServiceRegistry, because when using createInfoServices it works fine, but when using createMermaidServices it doesn't.

You're using an invalid monarch grammar (i.e. {}). Disabling the syntactical highlighting or fixing the monarch grammar resolves the issue.

So that's why highlighting isn't working. I'll try figuring out how to resolve it later on. But I removed all unnecessary rules but it keeps throwing and syntax highlighting isn't working.

Isn't the keyword showInfo?

Oh, sorry :)

@EchoZhaoH
Copy link

This PR is great! Do you have any plans to merge?

@Yokozuna59
Copy link
Member Author

This PR is great! Do you have any plans to merge?

@EchoZhaoH No, we don’t plan to merge it.
This PR was just a POC parser using Langium. We’re currently creating separate PRs to implement each diagram parser individually:

The new package has already been merged into the next branch (v11), but others are under review.

In case you’re wondering why this PR is still open, it's because we haven't created PRs for each existing parser yet.
And in case you’re wondering why I haven’t created those parsers yet, it’s because of:

  1. I don’t have much time right now.
  2. We need to standardize and convert each diagram JS file to TS before or while integrating the parser.

@EchoZhaoH
Copy link

i see, thx @Yokozuna59

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment