Skip to content
This repository has been archived by the owner on Mar 21, 2021. It is now read-only.

JDL Compiler front-end Architecture & Implementation. #144

Closed
bd82 opened this issue Sep 9, 2017 · 16 comments
Closed

JDL Compiler front-end Architecture & Implementation. #144

bd82 opened this issue Sep 9, 2017 · 16 comments

Comments

@bd82
Copy link
Contributor

bd82 commented Sep 9, 2017

Introduction

This issue is meant for discussion of the existing compiler architecture and implementation,
It's short comings and alternative approaches to resolve these short comings.

Existing Issues.

Root Issue

The root issue is that the existing solution solves the wrong problem.
The solution solves the problem of a compiler for code generation
Not a compiler that can also providing editor services.

Examples of missing requirements:

  • Display multiple syntax errors, not just the first one.
  • Content assist suggestions.
  • Reproduce the original input text, (example for formatting JDL source code).

I recommend viewing this 30mins video on Modern Compiler Construction by Anders Hejlsberg for more details.

Secondary Issue

The other issue is that there is not enough separation of concerns in the existing implementation
Specifically separating the syntactic analysis phase (parsing) and everything else (Ast Building).

A small code snippet from the existing code clearly demonstrates this:

//Entities
entityDecl
  = jd:comment? SPACE* ENTITY SPACE* e:ENTITY_NAME SPACE* tableName:entityTableNameDecl? SPACE* eb:entityBody? SPACE*  {
    return { name: e, tableName: tableName ? tableName: e, body: eb, javadoc: jd };
  }
/ ENTITY SPACE* e:ENTITY_NAME SPACE* eb:entityBody? { return { name: e, body: eb, javadoc: '' }; }

Can you tell at a glance:

  • What is this snippet is the grammar?
  • What in structure (AST) is built?

Alternative Approach

There would be two main differences:

  1. Adding another step to the process:
    Instead of:
    • Lexing -> Parsing -> AST
      We will do:
    • Lexing -> Parsing -> CST/Parse Tree -> AST.

This CST/Parse Tree structure will represents the original input, thus enabling the basis for these more advanced editor flows.

  1. Replacing the Parsing library (peg.js) with one (Chevrotain) that supports other capabilities related to
    supporting editor related flows such as error recovery and multiple start rules.

Poc Highlights.

Instead of writing too many words lets just point to references in a run-able POC.
This POC implements a small subset of the JDL grammar to demonstrates the proposed
architecture and tooling. There several things missing, for example:

  • Reconnecting Comments to the AST.
  • Saving position information on the AST.

But the existing content is more than enough to review and discuss.

Phases

  1. Lexing Phase
  • Difference: Separate Lexing phase from parsing phase.
    • Can ignore Whitespace/Comments easily.
    • Can introduce ambiguities as parsing context is unavailable, for example different syntactic rules for
      naming entities/fields/types/... cannot be applied as the names have common prefixes and the lexer does not have any information on the current context.
  1. Parsing Phase

  2. Ast Building Phase

  • Difference: Separated from the Parsing Phase (separation of concerns).
  • Can be modified to build ASTs from partially valid CSTs.

Bonus Phase

  1. Formatter
    • This works directly on the CST/Parse Tree, which shows another example of separation of concerns
      as different "end users" can implement different parser related logic without modifying the parser
      itself.
@MathieuAA
Copy link
Member

The solution solves the problem of a compiler for code generation

The point wasn't to have a compiler, but only a parser.

Display multiple syntax errors, not just the first one

That's one of the issues I have with PegJS.

Content assist suggestions

The project doesn't handle this point.

Reproduce the original input text, (example for formatting JDL source code)

What's the point in doing this?

I don't understand your second issue.

@bd82
Copy link
Contributor Author

bd82 commented Sep 9, 2017

Parser vs compiler's front-end

The point wasn't to have a compiler, but only a parser.

This is a matter of semantics, some refer to the lexing + parsing + ast creation + semantic checks
as a "compiler front-end", but the exact terms do not really matter as long as we agree on the scope.

Content-Assist

The project doesn't handle this point.

Yes indeed it does not, which is why there is logic duplication in the JDL-studio project
related to lex/parsing for content assist.

So if the grammar changes, for example in the context of enriching the grammar that was discussed recently, that duplicate logic may also have to be modified.
I would also be surprised if this duplicated logic matches perfectly with the original grammar.

Original Text reproduction.

What's the point in doing this?

Lets say you want to build a linter for JDL
The linter would warn about too many spaces before a comma
or missing newline when opening an entity's block.

How would you implement this using the existing data structure output from the parser?
Do you know the exact position of every single token after parsing?
This full textual information would be needed for editor related functionalities such as
code formatting / linting / styling.

Another example would be refactoring.
Lets assume you want to refactor the name of a JDL constant.
This would require first finding all the usages of that constant which is easy.
But secondly it would require knowing those usages exact offsets to perform the replacements.
Once again we come back to needing information on the original text, not just the parser's AST output.

I don't understand your second issue.

I will try to emphasize this second issue once I fill the "alternative approach" section.
Code samples may speak louder than words here 😄

@bd82
Copy link
Contributor Author

bd82 commented Sep 14, 2017

Pinging @MathieuAA @deepu105 😄

  • What do you think of this proposal?
    • The good?
    • The bad?
  • Is this relevant for the JHipster Core project?
    • If so how do we proceed?

@deepu105
Copy link
Member

Since @MathieuAA knows much better about the current parser I'll leave the decision to him. My personal opinion is that this seems more readable and maintainable than PegJS but a little more work initially to migrate everything I guess

@MathieuAA
Copy link
Member

@deepu105 I've finally finished testing and playing with everything, and I think this is the way to go.
@bd82 your work is really astounding, and I can't emphasize that enough.
I guess the next step is cutting loose ends (the big'ol PegJS parser) and eventually cleaning things up.

@deepu105
Copy link
Member

@bd82 @MathieuAA just one question currently I use the output json from PegJS parser for JDL studio, with this will the json structure change?

@bd82
Copy link
Contributor Author

bd82 commented Sep 16, 2017

with this will the json structure change?

I think it should not change, otherwise the scope of changes would be too large to safely handle.
The #3 phase (Ast building) should build an identical data structure to what the Peg parser outputs now, so everything post parsing stays the same.

The difficult part is to ensure this actually happens without unintended changes...

@bd82
Copy link
Contributor Author

bd82 commented Sep 16, 2017

I guess the next step is cutting loose ends (the big'ol PegJS parser) and eventually cleaning things up.

I could probably help by migrating the the grammar (Phases 1 & 2).
The Ast building phase should probably be done by someone already familiar with the existing AST.

And thanks for the compliment @MathieuAA 😄

@bd82
Copy link
Contributor Author

bd82 commented Sep 24, 2017

Just a small update.

I am hoping to start working on the grammar conversion (Phase 2) this weekend.

@MathieuAA
Copy link
Member

This is getting messy, here's what I'm going to do:

  • put everything in the master branch into another branch
  • maybe remove the app parsing branch (for the time being)
  • make the dev branch the new master
  • make the app parsing work with PegJS
  • finally have a working release
  • migrate to the other system when everything is cleaned up

@bd82
Copy link
Contributor Author

bd82 commented Dec 10, 2017

Hi @MathieuAA

I've recently finished version 1.0.0 of Chevrotain, so more free time...
So I hope to invest some extra time to push this ahead, such as implementing the rest of the grammar myself.

However I can only minimize the amount of work needed not complete it all.
(As I've done by implementing a draft of phase1 and most of phase 2)

Hopefully I can minimize it enough to fit in your own free time constraints
😄 .

@MathieuAA
Copy link
Member

I still have some cleaning up to do (both in the internal project and in the app generation with PegJS), and I don't think the migration to chevrotain will be a priority. In my opinion, my initial mistake was not to 100% finish the app generation (and release it) before beginning the work needed for the chevrotain system.
I'm clearly not going to let that happen again, and I'm going to release a working app generation with PegJS first.
After that, the new parsing system will be the top priority.

@bd82
Copy link
Contributor Author

bd82 commented Dec 10, 2017

The more incremental approach seems best 👍

I will try to push migration issue a little farther in the upcoming new "side" branch.
Once the higher priority tasks are finished, ping me if additional help is needed for the migration.

@MathieuAA
Copy link
Member

Alright!

@bd82
Copy link
Contributor Author

bd82 commented Jun 26, 2018

I think we can close this issue as merged the initial implementation to master
and will hopefully soon finish handling the leftover issues.

@MathieuAA
Copy link
Member

yes

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants