JDL Compiler front-end Architecture & Implementation. #144

bd82 · 2017-09-09T13:13:02Z

Introduction

This issue is meant for discussion of the existing compiler architecture and implementation,
It's short comings and alternative approaches to resolve these short comings.

Existing Issues.

Root Issue

The root issue is that the existing solution solves the wrong problem.
The solution solves the problem of a compiler for code generation
Not a compiler that can also providing editor services.

Examples of missing requirements:

Display multiple syntax errors, not just the first one.
Content assist suggestions.
Reproduce the original input text, (example for formatting JDL source code).

I recommend viewing this 30mins video on Modern Compiler Construction by Anders Hejlsberg for more details.

Secondary Issue

The other issue is that there is not enough separation of concerns in the existing implementation
Specifically separating the syntactic analysis phase (parsing) and everything else (Ast Building).

A small code snippet from the existing code clearly demonstrates this:

//Entities
entityDecl
  = jd:comment? SPACE* ENTITY SPACE* e:ENTITY_NAME SPACE* tableName:entityTableNameDecl? SPACE* eb:entityBody? SPACE*  {
    return { name: e, tableName: tableName ? tableName: e, body: eb, javadoc: jd };
  }
/ ENTITY SPACE* e:ENTITY_NAME SPACE* eb:entityBody? { return { name: e, body: eb, javadoc: '' }; }

Can you tell at a glance:

What is this snippet is the grammar?
What in structure (AST) is built?

Alternative Approach

There would be two main differences:

Adding another step to the process:
Instead of:
- Lexing -> Parsing -> AST
  We will do:
- Lexing -> Parsing -> CST/Parse Tree -> AST.

This CST/Parse Tree structure will represents the original input, thus enabling the basis for these more advanced editor flows.

Replacing the Parsing library (peg.js) with one (Chevrotain) that supports other capabilities related to
supporting editor related flows such as error recovery and multiple start rules.

Poc Highlights.

Instead of writing too many words lets just point to references in a run-able POC.
This POC implements a small subset of the JDL grammar to demonstrates the proposed
architecture and tooling. There several things missing, for example:

Reconnecting Comments to the AST.
Saving position information on the AST.

But the existing content is more than enough to review and discuss.

Phases

Lexing Phase

Difference: Separate Lexing phase from parsing phase.
- Can ignore Whitespace/Comments easily.
- Can introduce ambiguities as parsing context is unavailable, for example different syntactic rules for
  naming entities/fields/types/... cannot be applied as the names have common prefixes and the lexer does not have any information on the current context.

Parsing Phase
- Difference: The parser only deals with syntactic analysis, there is no code inside the parser
  To create an AST. Instead the parser automatically creates a CST/Parse Tree which can later
  be consumed for different use cases (separation of concerns mentioned earlier).
- Difference: Many capabilities obtained for "free" due to using Chevrotain.
Ast Building Phase

Difference: Separated from the Parsing Phase (separation of concerns).
Can be modified to build ASTs from partially valid CSTs.

Bonus Phase

Formatter
- This works directly on the CST/Parse Tree, which shows another example of separation of concerns
  as different "end users" can implement different parser related logic without modifying the parser
  itself.

The text was updated successfully, but these errors were encountered:

MathieuAA · 2017-09-09T13:46:09Z

The solution solves the problem of a compiler for code generation

The point wasn't to have a compiler, but only a parser.

Display multiple syntax errors, not just the first one

That's one of the issues I have with PegJS.

Content assist suggestions

The project doesn't handle this point.

Reproduce the original input text, (example for formatting JDL source code)

What's the point in doing this?

I don't understand your second issue.

bd82 · 2017-09-09T16:36:10Z

Parser vs compiler's front-end

The point wasn't to have a compiler, but only a parser.

This is a matter of semantics, some refer to the lexing + parsing + ast creation + semantic checks
as a "compiler front-end", but the exact terms do not really matter as long as we agree on the scope.

Content-Assist

The project doesn't handle this point.

Yes indeed it does not, which is why there is logic duplication in the JDL-studio project
related to lex/parsing for content assist.

https://github.com/jhipster/jdl-studio/blob/gh-pages/js/codemirror/codemirror.jdl-mode.js

So if the grammar changes, for example in the context of enriching the grammar that was discussed recently, that duplicate logic may also have to be modified.
I would also be surprised if this duplicated logic matches perfectly with the original grammar.

Original Text reproduction.

What's the point in doing this?

Lets say you want to build a linter for JDL
The linter would warn about too many spaces before a comma
or missing newline when opening an entity's block.

How would you implement this using the existing data structure output from the parser?
Do you know the exact position of every single token after parsing?
This full textual information would be needed for editor related functionalities such as
code formatting / linting / styling.

Another example would be refactoring.
Lets assume you want to refactor the name of a JDL constant.
This would require first finding all the usages of that constant which is easy.
But secondly it would require knowing those usages exact offsets to perform the replacements.
Once again we come back to needing information on the original text, not just the parser's AST output.

I don't understand your second issue.

I will try to emphasize this second issue once I fill the "alternative approach" section.
Code samples may speak louder than words here 😄

bd82 · 2017-09-14T17:46:35Z

Pinging @MathieuAA @deepu105 😄

What do you think of this proposal?
- The good?
- The bad?
Is this relevant for the JHipster Core project?
- If so how do we proceed?

deepu105 · 2017-09-15T08:48:33Z

Since @MathieuAA knows much better about the current parser I'll leave the decision to him. My personal opinion is that this seems more readable and maintainable than PegJS but a little more work initially to migrate everything I guess

MathieuAA · 2017-09-15T23:27:21Z

@deepu105 I've finally finished testing and playing with everything, and I think this is the way to go.
@bd82 your work is really astounding, and I can't emphasize that enough.
I guess the next step is cutting loose ends (the big'ol PegJS parser) and eventually cleaning things up.

deepu105 · 2017-09-16T08:27:26Z

@bd82 @MathieuAA just one question currently I use the output json from PegJS parser for JDL studio, with this will the json structure change?

bd82 · 2017-09-16T10:16:16Z

with this will the json structure change?

I think it should not change, otherwise the scope of changes would be too large to safely handle.
The #3 phase (Ast building) should build an identical data structure to what the Peg parser outputs now, so everything post parsing stays the same.

The difficult part is to ensure this actually happens without unintended changes...

bd82 · 2017-09-16T10:25:31Z

I guess the next step is cutting loose ends (the big'ol PegJS parser) and eventually cleaning things up.

I could probably help by migrating the the grammar (Phases 1 & 2).
The Ast building phase should probably be done by someone already familiar with the existing AST.

And thanks for the compliment @MathieuAA 😄

bd82 · 2017-09-24T22:25:18Z

Just a small update.

I am hoping to start working on the grammar conversion (Phase 2) this weekend.

MathieuAA · 2017-12-10T11:21:27Z

This is getting messy, here's what I'm going to do:

put everything in the master branch into another branch
maybe remove the app parsing branch (for the time being)
make the dev branch the new master
make the app parsing work with PegJS
finally have a working release
migrate to the other system when everything is cleaned up

bd82 · 2017-12-10T16:26:55Z

Hi @MathieuAA

I've recently finished version 1.0.0 of Chevrotain, so more free time...
So I hope to invest some extra time to push this ahead, such as implementing the rest of the grammar myself.

However I can only minimize the amount of work needed not complete it all.
(As I've done by implementing a draft of phase1 and most of phase 2)

Hopefully I can minimize it enough to fit in your own free time constraints
😄 .

MathieuAA · 2017-12-10T16:52:45Z

I still have some cleaning up to do (both in the internal project and in the app generation with PegJS), and I don't think the migration to chevrotain will be a priority. In my opinion, my initial mistake was not to 100% finish the app generation (and release it) before beginning the work needed for the chevrotain system.
I'm clearly not going to let that happen again, and I'm going to release a working app generation with PegJS first.
After that, the new parsing system will be the top priority.

bd82 · 2017-12-10T17:41:25Z

The more incremental approach seems best 👍

I will try to push migration issue a little farther in the upcoming new "side" branch.
Once the higher priority tasks are finished, ping me if additional help is needed for the migration.

MathieuAA · 2017-12-10T18:01:56Z

Alright!

bd82 · 2018-06-26T09:39:10Z

I think we can close this issue as merged the initial implementation to master
and will hopefully soon finish handling the leftover issues.

MathieuAA · 2018-06-26T11:21:37Z

yes

bd82 mentioned this issue Sep 9, 2017

WIP - POC with Chevrotain Parser. #142

Merged

bd82 mentioned this issue Feb 14, 2018

Feedback requested from Parser Combinators authors jneen/parsimmon#226

Closed

MathieuAA closed this as completed Jun 26, 2018

bd82 mentioned this issue Jun 26, 2018

Less Parser Architecture Discussion less/less.js#3234

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

JDL Compiler front-end Architecture & Implementation. #144

JDL Compiler front-end Architecture & Implementation. #144

bd82 commented Sep 9, 2017 •

edited

Loading

MathieuAA commented Sep 9, 2017

bd82 commented Sep 9, 2017

bd82 commented Sep 14, 2017

deepu105 commented Sep 15, 2017

MathieuAA commented Sep 15, 2017

deepu105 commented Sep 16, 2017

bd82 commented Sep 16, 2017

bd82 commented Sep 16, 2017 •

edited

Loading

bd82 commented Sep 24, 2017

MathieuAA commented Dec 10, 2017

bd82 commented Dec 10, 2017

MathieuAA commented Dec 10, 2017

bd82 commented Dec 10, 2017

MathieuAA commented Dec 10, 2017

bd82 commented Jun 26, 2018

MathieuAA commented Jun 26, 2018

JDL Compiler front-end Architecture & Implementation. #144

JDL Compiler front-end Architecture & Implementation. #144

Comments

bd82 commented Sep 9, 2017 • edited Loading

Introduction

Existing Issues.

Root Issue

Secondary Issue

Alternative Approach

Poc Highlights.

Phases

Bonus Phase

MathieuAA commented Sep 9, 2017

bd82 commented Sep 9, 2017

Parser vs compiler's front-end

Content-Assist

Original Text reproduction.

bd82 commented Sep 14, 2017

deepu105 commented Sep 15, 2017

MathieuAA commented Sep 15, 2017

deepu105 commented Sep 16, 2017

bd82 commented Sep 16, 2017

bd82 commented Sep 16, 2017 • edited Loading

bd82 commented Sep 24, 2017

MathieuAA commented Dec 10, 2017

bd82 commented Dec 10, 2017

MathieuAA commented Dec 10, 2017

bd82 commented Dec 10, 2017

MathieuAA commented Dec 10, 2017

bd82 commented Jun 26, 2018

MathieuAA commented Jun 26, 2018

bd82 commented Sep 9, 2017 •

edited

Loading

bd82 commented Sep 16, 2017 •

edited

Loading