-
Notifications
You must be signed in to change notification settings - Fork 20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement support for contextual keywords #598
Conversation
🦋 Changeset detectedLatest commit: d14285e The changes in this PR will be included in the next version bump. This PR includes changesets to release 1 package
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
I knew I forgot about something important...
It was introduced alongside the `revert` contextual keyword, see ethereum/solidity#11037.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a hack, this works, but I don't like the way it confuses the model re keywords in general, and my discomfort is telling me we should do it properly. Will discuss f2f.
Non-controversial bits separated from #598. This updates versions for the `revert` keyword (and the associated statement, introduced in 0.8.4) and the `global` (introduced in 0.8.13 for the using directive) contextual keywords in both the new DSL and our old YAML spec. --------- Co-authored-by: Omar Tawfik <[email protected]>
We decided to hold this off until we migrate to a DSL v2, as that will change how keywords will be defined and identifier<>keyword will interact. |
Closes #568 There is still one outstanding issue where we return a `Vec<TokenKind>` from `next_token`; it'd like to return a more specialized type and ideally pass it on stack (2x2 bytes), rather than on-heap (extra 3x8 bytes for the Vec handle + indirection). We should name it better and properly show that we can return at most 2 token kinds (single token kind or identifier + kw combo). To do: - [x] Return tokens from `next_token` via stack Apart from that, I think this is a more correct approach than #598, especially accounting for the new keyword definition format in DSL v2. The main change is that we only check the keyword trie and additionally the (newly introduced) compound keyword scanners only after the token has been lexed as an identifier. For each context, we collect Identifier scanners used by the keywords and attempt promotion there. The existing lexing performance is not impacted from what I've seen when running the sanctuary tests and I can verify (incl. CST tests) that we now properly parse source that uses contextual keywords (e.g. `from`) and that the compound keywords (e.g. `ufixedMxN`) are properly versioned. This adapts the existing `codegen_grammar` interface that's a leftover from DSLv1; I did that to work on finishing #638; once this is merged and we now properly parse contextual keywords, I'll move to clean it up and reduce the parser codegen indirection (right now we go from v2 -> v1 model -> code generator -> Tera templates; it'd like to at least cut out the v1 model and/or simplify visiting v2 from the existing `CodeGenerator`). Please excuse the WIP comments in the middle; the first and the last ones should make sense when reviewing. I can simplify this a bit for review, if needed.
Fixes #568.
First few commits are small cleanups to the PG code, the last one implements the support along with a new CST test.
This introduces a new
ScannerDefinitionNode::ContextualKeyword
that includes the literal and the underlying parser.The reason why the identifier parser is used is to be able to reference its token kind in the parser generator, as we don't intentionally specify any built-in Identifier parsers (well, we do hardcode
Identifier
in one place but it's a hack I'd rather not spread. Do we need to handleYulIdentifier
as well/separately?).For the version ranges that it's not a contextual keyword, it's still included in the trie as usual, so the old machinery works as expected and the lexer prioritises that match over a catch-all Identifier we use now.
However, for the version ranges that it is a contextual keyword, the version check is moved outside to a new
Lexer::as_contextual_keyword
, which attempts to promote a given token to a contextual keyword and eats that if it's the expected token kind.This way, we can eat
TokenKind::SomeContextualKeyword
if necessary, sinceparse_token
attempts to promote a keyword, but for a given version where it's contextual, it's not returned early from the lexer, so we lex an Identifier by default.