VIP: Custom Parser #563

Closed
fubuloubu opened this issue Dec 8, 2017 · 27 comments
Labels
VIP: Discussion Used to denote VIPs and more complex issues that are waiting discussion in a meeting

Comments

@fubuloubu
Member

fubuloubu commented Dec 8, 2017

Preamble

VIP: 563
Title: Custom Parser
Author: @fubuloubu @DavidKnott @jacqueswww
Type: Standard
Status: Draft
Created: 2017-12-08

Simple Summary

Implement a custom parser for Viper that doesn't tie us directly to Python-only syntax, enabling a more focused grammar for our language.

Abstract

We've been discussing this for a while. A custom parser would allow us to define our syntax more precisely rather than leveraging Python syntax and being tied to only what Python's syntax can provide. We will continue to use Python as a template due to its clarity and ease of reading, but we need to make decisions that diverge from Python, and a custom parser will enable that.

Motivation

There are specific things that have been discussed where this is necessary:

  • Clarifying the external contract type
  • Changing the mapping type for greater clarity
  • Custom Types
  • etc...

Specification

We may be able to leverage a Python-compatible lex/yacc library like ply. We should also leverage some of the work the K framework guys are doing in order to infer a grammar that is consistent and free of formalized conflicts.
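As a rough illustration of the "lex" half of that pipeline, here is a minimal sketch that uses only the standard library `re` module as a stand-in for a real lexer generator. It does not use ply, and the token names and the arrow-based mapping declaration being tokenized are illustrative assumptions rather than settled Viper syntax.

```python
import re

# Illustrative token table; order matters because earlier patterns win.
TOKEN_SPEC = [
    ("ARROW", r"->"),
    ("NAME", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("COLON", r":"),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("SKIP", r"[ \t]+"),
    ("MISMATCH", r"."),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def lex(text):
    """Turn a line of source into a list of (kind, text) pairs."""
    tokens = []
    for match in MASTER.finditer(text):
        kind = match.lastgroup
        if kind == "SKIP":
            continue  # whitespace carries no meaning in this toy lexer
        if kind == "MISMATCH":
            raise SyntaxError(
                f"unexpected character {match.group()!r} at column {match.start()}"
            )
        tokens.append((kind, match.group()))
    return tokens


tokens = lex("my_map: map(basetype1 -> basetype2)")
for token in tokens:
    print(token)
```

A yacc-style grammar would then consume this token stream, which is the part a library such as ply would generate from grammar rules.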

Backwards Compatibility

Try to maintain backwards compatibility initially; however, some of the VIPs this one will enable will be breaking changes in the syntax.

Copyright

Copyright and related rights waived via CC0

@dani-corie

Please don't. Let's not turn this into another Solidity. :(

@fubuloubu
Member Author

We're already running into limitations of the underlying Python syntax, and future growth will require violating that syntax in subtle and not-so-subtle ways. Viper is definitely a different language from Python. We will try to stick to the syntax as closely as possible, but there are situations that need a custom parser to handle.

@DavidKnott
Contributor

I don't think having a custom parser will negatively affect clarity. Given how different writing smart contract code is from writing Python code (particularly in terms of security), it would be helpful to customize certain parts of the parser. Here are a couple of examples where a custom parser could come in handy:

  1. Using a more descriptive keyword than class for defining external contracts
  2. Changing the syntax of logging to make it clearer, as MyLog: __log__({arg1: num}) is limited by python syntax
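To see how the Python parser reads the current logging declaration, here is a minimal sketch using the standard library `ast` module. The variable names are illustrative, and the snippet only inspects the parse tree; it is not how the compiler itself processes events.

```python
import ast

# The declaration is only valid because Python reads it as an annotated
# name whose annotation happens to be a function call.
source = "MyLog: __log__({arg1: num})"
node = ast.parse(source).body[0]

assert isinstance(node, ast.AnnAssign)        # "name: annotation" statement
assert isinstance(node.annotation, ast.Call)  # __log__(...) is a plain call
assert node.annotation.func.id == "__log__"

# The event fields arrive as an ordinary dict literal of Name -> Name.
fields = node.annotation.args[0]
event_fields = {key.id: value.id for key, value in zip(fields.keys, fields.values)}
print(node.target.id, event_fields)
```

Every piece of the declaration has to fit an existing Python construct (annotation, call, dict literal), which is the constraint being described.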

@tmke8

tmke8 commented Dec 15, 2017

The danger of a custom parser is bugs. (There were 8 critical bugs in the Serpent compiler when Augur ordered an audit...)

External contracts could be nicely specified if this suggestion was implemented. You could say that you can only inherit from Contract and ExternalContract and there can only be one Contract per file. The inheritance would basically do nothing.

More descriptive keywords could be realized by the preprocessor that is mentioned in this proposal.

For the mapping syntax, something like this should be possible: Mapping[KeyType, ValueType].
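A quick check that a subscript-style mapping annotation already parses with the stock Python parser, sketched with the standard library `ast` module. The names `balances`, `address`, and `num` are placeholders chosen for the example.

```python
import ast

source = "balances: Mapping[address, num]"
node = ast.parse(source).body[0]

subscript = node.annotation
assert isinstance(subscript, ast.Subscript)
assert subscript.value.id == "Mapping"

# Python 3.9+ stores the index tuple directly; 3.8 wraps it in ast.Index,
# which exposes the tuple through its `value` attribute.
index = getattr(subscript.slice, "value", subscript.slice)
key_type, value_type = (element.id for element in index.elts)
print(key_type, value_type)
```

No preprocessing or custom grammar is needed for this form, since it reuses Python's existing subscript syntax.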

@mslipper
Contributor

One potential solution is to copy the existing Python grammar, modify it to match the Viper language, then generate a parser from that grammar using a tool like ANTLR. This confers a number of benefits:

  • Generated parsers have been around forever and are used regularly in security-critical applications.
  • Development is simplified considerably. The Viper team can focus on the language's syntax and features without spending time maintaining a bespoke parser.
  • Parser generators favor a grammar-first development workflow. The grammar won't be just another document to maintain - it'll be an integral part of the language. Having an up-to-date grammar simplifies third-party tool development (i.e., static analysis tools, syntax highlighters, etc.) considerably.

If there's interest, I'd be happy to build a small prototype.

@fubuloubu
Member Author

@mslipper 👍 I think this is the approach we were getting at. Thanks for suggesting a tool!

Is there an easy way to integrate this with our Python flow (e.g. ANTLR wrapper module) so that the build process could be managed 100% in Python? ANTLR is a Java program, but I see some evidence that this is possible here

@fubuloubu
Member Author

Meeting Minutes:

  1. Can we use custom keywords in the Python AST module?
  2. Is there a better way to segment/streamline work with an external module to handle lexing/parsing?

@mslipper
Contributor

@fubuloubu ANTLR is written in Java, but it'll generate a parser in any language it supports. The python-target you linked is the exact solution you're looking for 😄. For build flow, I'd suggest adding a target to your Makefile that runs ANTLR prior to making the egg and running tests. That way there is no Java dependency for Viper's users, only developers.

@fubuloubu
Member Author

We discussed this along with a few other things in the call we had today. I think we're still a little reluctant to move fully to a custom solution. We were trying to figure out whether there is a way to modify or extend the AST module to get what we're looking for. To do that, I think we need a summary of the changes we are looking to make.

From the original post above, these are (with examples):

  1. Clarifying the external contract type (VIP: Contract data type #541)

      my_contract: contract(
          foo(),
          bar() -> num,
      )

  2. Changing the mapping syntax (VIP: Change Mapping Syntax #564)

      my_map: map(basetype1 -> basetype2)

  3. Allowing custom types/type aliasing (VIP: Named Structs #300)

      wei := num("wei")
      fee: wei
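To make the constraint concrete, the following sketch feeds each of the snippets above to the standard library `ast.parse` and records which ones the stock Python parser accepts. The helper name `parses_as_python` and the dictionary labels are made up for this example.

```python
import ast

# The proposed syntaxes from the list above, plus one plain annotation.
candidates = {
    "contract type": "my_contract: contract(\n    foo(),\n    bar() -> num,\n)",
    "mapping": "my_map: map(basetype1 -> basetype2)",
    "type alias": 'wei := num("wei")',
    "plain annotation": "fee: wei",
}


def parses_as_python(source):
    """Return True if the stock Python parser accepts the source."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


results = {name: parses_as_python(src) for name, src in candidates.items()}
for name, ok in results.items():
    print(f"{name}: {'parses' if ok else 'SyntaxError'}")
```

Only the plain `fee: wei` annotation is accepted. The arrow inside a call and a statement-level `:=` are both rejected, so those forms would need either a preprocessing step or a separate grammar.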

We also chatted a bit today about #584, and I believe my proposed solution may be able to sidestep all of this by changing how types are handled a bit. I think most of our reasons for wanting a custom parser have more to do with being able to specify and easily work with different kinds of globals. Check out the bottom of that issue and feel free to add to the discussion.

@DavidKnott added the VIP: Discussion and post beta labels Jan 15, 2018
@maurelian
Contributor

I've started working on defining the grammar in ANTLR. I'm following a similar approach used to create a js-solidity parser.

This will enable us to generate a parser to use in our Surya tool.

So far I've just been extracting the grammar from documentation and examples, but if the Vyper project itself might make use of it in the future, it would be better to ensure the names and structure of nodes are similarly defined. Would someone from the core team be willing to spend 30 minutes walking me through the parser code, or even collaborate on defining the grammar?

@jacqueswww
Contributor

@maurelian sure, glad to help - we can arrange a call time on gitter.

This will be a good start for defining a grammar: https://github.com/python/cpython/blob/master/Parser/Python.asdl

@fubuloubu
Member Author

fubuloubu commented Jul 25, 2018

One approach I've been wanting to take is a conversion step from the Python AST to a Vyper-specific AST. This can be defined in a friendly way for ANTLR or the K framework.

# Sketch of the proposed interface; `code` is a string of Vyper source
from vyper import ast
# Parses with the Python ast first, then converts to the Vyper ast
print('Vyper AST:', ast.parse(code))
# Prints out the grammar, perhaps in an ANTLR/K-friendly format
print('Vyper AST grammar rules:', ast._grammer)
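One way the conversion step could look, as a self-contained sketch built on the standard library only. The node classes `StateVariable` and `Module` and the function `to_vyper_ast` are hypothetical names invented for this example, and the only construct it understands is a `name: type` global.

```python
import ast
from dataclasses import dataclass


@dataclass
class StateVariable:
    """Hypothetical Vyper-specific node for a `name: type` global."""
    name: str
    type_name: str


@dataclass
class Module:
    """Hypothetical root node holding the converted globals."""
    body: list


def to_vyper_ast(source):
    """Parse with Python's ast, then convert to the language-specific nodes."""
    converted = []
    for node in ast.parse(source).body:
        if (
            isinstance(node, ast.AnnAssign)
            and isinstance(node.target, ast.Name)
            and isinstance(node.annotation, ast.Name)
        ):
            converted.append(StateVariable(node.target.id, node.annotation.id))
        else:
            # Valid Python that the narrower grammar does not define is rejected.
            raise SyntaxError(f"unsupported statement at line {node.lineno}")
    return Module(converted)


tree = to_vyper_ast("owner: address\nfee: num")
print(tree)
```

The second pass is where the language becomes narrower than Python: anything the converter does not recognize fails early instead of reaching code generation.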

@jakerockland
Contributor

jakerockland commented Nov 28, 2018

Arrived here from what I've been following in #300. Curious on what the status is for the pathway to implementing this. From previous calls/conversations was the consensus on moving forward with a solution built with Sly @fubuloubu? That's what it seemed from the convo with @charles-cooper on #300, which makes sense to me but was also curious if/why the parser generator route had been ruled out.

@fubuloubu
Member Author

fubuloubu commented Nov 28, 2018

@jakerockland we can discuss it for sure at the next meeting. If people want to take on this challenge, it may be the time to do it.

A few things to note:

  1. Writing a compiler front end is difficult. This will take at least one person-month to get right.
  2. Any good front end should be formally verified to avoid the risk of really insidious bugs. K framework can help.
  3. You will probably have to refactor a substantial portion of the compiler. That might not be the worst idea from a readability/maintainability perspective.

In regards to refactoring our current codebase, that is something @davesque was exploring. The current codebase mixes too many things from parsing into code generation. It would be nicer to see all compiler stages as separate modules with distinct interfaces between the stages, more akin to how you build compilers with functional languages like OCaml (which has an excellent set of libraries for that). I have a really, really, really old example of how that might be done here: https://github.com/fubuloubu/blocktract/blob/master/blocktract/ast.py

The idea is that each stage would be formalized in separate modules e.g. tokens.py, grammar.py, ast.py, a types/ directory, optimization/, etc. When adding new features or types, it would be pretty obvious how to do that, and there could be an auto-registration function that makes it trivial to register these new features with the overall compiler pipeline. That makes it easy to traverse through the stages and see how the code forms and morphs between each stage, making it really easy to debug when things go wrong and also easy to work on each step in series as you progress. Finally, the compiler interface should make it easy to hook into each stage and configure how the stages work together, which is important for handling optimization correctly. Example of that here
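A toy sketch of that layout, assuming a decorator-based registry and per-stage hooks. The names `STAGES`, `stage`, and `compile_source`, as well as the three toy stages, are invented for illustration and do not reflect the existing compiler.

```python
STAGES = []  # ordered (name, function) pairs, filled in by the decorator


def stage(func):
    """Register a function as the next stage of the pipeline."""
    STAGES.append((func.__name__, func))
    return func


@stage
def tokens(source):
    # Toy lexer: split a declaration into NAME, ':', TYPE.
    return source.replace(":", " : ").split()


@stage
def grammar(toks):
    # Toy grammar: exactly NAME ':' TYPE.
    name, colon, type_name = toks
    if colon != ":":
        raise SyntaxError("expected ':'")
    return {"name": name, "type": type_name}


@stage
def codegen(tree):
    return f"DECLARE {tree['name']} AS {tree['type']}"


def compile_source(source, hooks=()):
    """Run every stage in order, passing each output to the given hooks."""
    value = source
    for name, func in STAGES:
        value = func(value)
        for hook in hooks:
            hook(name, value)
    return value


trace = []
output = compile_source("fee: wei", hooks=[lambda name, value: trace.append(name)])
print(trace, output)
```

Because each stage only sees the previous stage's output, a hook can dump the intermediate form at any point, which is the debugging benefit described above.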

@jacqueswww
Contributor

jacqueswww commented Nov 28, 2018

I don't think a custom parser is a good idea at this point; we can plan it for the 0.2 release. At this stage there are plenty of other "not as flashy" issues to work on. Using the tokeniser for class isn't the most elegant solution, but it can be done without too much trouble. There is a reason we want to keep Vyper parsable by the Python ast, and that is so it will always stay firmly rooted in Python.
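A minimal sketch of what the tokeniser approach could look like, using the standard library `tokenize` and `ast` modules. The `contract` keyword and the helper `rewrite_keyword` are assumptions made for this example. The sketch is deliberately naive: it would also rewrite an ordinary variable named `contract` at the start of a statement.

```python
import ast
import io
import tokenize

SKIPPED = (tokenize.INDENT, tokenize.DEDENT, tokenize.COMMENT, tokenize.NL)


def rewrite_keyword(source, keyword="contract"):
    """Replace a statement-leading `keyword` with `class` before parsing."""
    lines = source.splitlines(keepends=True)
    statement_start = True
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in SKIPPED:
            continue  # layout tokens do not end the "start of statement" state
        if statement_start and tok.type == tokenize.NAME and tok.string == keyword:
            row, col = tok.start
            line = lines[row - 1]
            # Pad `class` to the keyword's width so later column offsets
            # in error messages still match the original source.
            padded = "class".ljust(len(keyword))
            lines[row - 1] = line[:col] + padded + line[col + len(keyword):]
        statement_start = tok.type == tokenize.NEWLINE
    return "".join(lines)


source = "contract Foo:\n    pass\n"
try:
    ast.parse(source)
    parsed_directly = True
except SyntaxError:
    parsed_directly = False

rewritten = rewrite_keyword(source)
tree = ast.parse(rewritten)
print(parsed_directly, type(tree.body[0]).__name__, tree.body[0].name)
```

The source stays parsable by the Python `ast` module after the rewrite, so the rest of the compiler can keep working on standard Python nodes.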

To me the codebase isn't ready for a custom parser (yet), and needs refactoring, whereafter one probably does not need the custom parser :P

Happy to discuss further on the next call.

@jakerockland
Contributor

@fubuloubu @jacqueswww Thank you both for all the input here! Would be great to loop back on this on the next call but definitely doesn't have to be a deep dive as there are a lot of hotter button issues that need to be resolved. Was mostly just curious what the state of this issue was 😄 👍

@davesque
Contributor

davesque commented Nov 28, 2018

@jakerockland I also had some thoughts about using a custom parser when I originally started looking at Vyper. However, I think I agree with @jacqueswww that there are higher priorities. Of course, I'm still learning a lot about the entire codebase so my opinion is tastier with salt. 😄

@charles-cooper
Member

A resource for researching different parser generators and tools https://wiki.python.org/moin/LanguageParsing

@charles-cooper
Member

Of the options in the above link, the following seem reasonably modern/maintained, and also use grammars defined as some variant of EBNF (rather than python code):
https://github.com/erikrose/parsimonious
https://github.com/lark-parser/lark/
https://github.com/neogeny/TatSu/
https://github.com/pyparsing/pyparsing/

lark, pyparsing and tatsu have pre-written python grammar examples:
lark example
tatsu example
pyparsing example

@pipermerriam
Contributor

pipermerriam commented Mar 11, 2019

We have been using https://github.com/erikrose/parsimonious for eth-abi for a bit and I'm in the process of using it to define the grammar for the s-expression format of webassembly. My experience thus far is quite positive, though I have little to compare it to. The library is quite small and simple and the manner in which the parsing happens has been easy to reason about.

@charles-cooper
Member

I'm kind of liking tatsu, it lets us create our VyperAST module by just annotating the grammar (https://tatsu.readthedocs.io/en/stable/mini-tutorial.html#object-models) and it has abstractions for ast traversal (https://tatsu.readthedocs.io/en/stable/mini-tutorial.html#one-rule-per-expression-type) and code generation (https://tatsu.readthedocs.io/en/stable/mini-tutorial.html#code-generation). Not sure how powerful the latter is but I can see its potential.

@fubuloubu
Member Author

Just noticed TatSu is a refactor of Grako, so it has a LOT more pedigree than the GitHub would lead you to believe!

@fubuloubu
Member Author

Also, to summarize a discussion we had, this PR will be split into a few "stages". The stages of this VIP that concern actually replacing the use of the AST module are beyond the scope of the v0.1 release, but the early stages will prepare for this to make it as seamless as possible.

@jacqueswww
Contributor

Closing in favour of #1363.

@fubuloubu
Member Author

fubuloubu commented Apr 4, 2019

Can we make #1363 a VIP then? Capture the important bits of this one?

@jacqueswww
Contributor

jacqueswww commented Apr 4, 2019

@fubuloubu We can, but we haven't really ever had to do a VIP for internal before? (or have we?)

@fubuloubu
Member Author

That's true! If it doesn't change syntax, then it's just a refactor. If there are any syntax changes, we should make sure to capture those separately as VIPs so people can stay informed.

@fubuloubu fubuloubu mentioned this issue Jul 15, 2020