Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement support for contextual keywords #598

Closed
wants to merge 16 commits into from

Conversation

Xanewok
Copy link
Contributor

@Xanewok Xanewok commented Sep 19, 2023

Fixes #568.

First few commits are small cleanups to the PG code, the last one implements the support along with a new CST test.

This introduces a new ScannerDefinitionNode::ContextualKeyword that includes the literal and the underlying parser.
The reason why the identifier parser is used is to be able to reference its token kind in the parser generator, as we don't intentionally specify any built-in Identifier parsers (well, we do hardcode Identifier in one place but it's a hack I'd rather not spread. Do we need to handle YulIdentifier as well/separately?).

For the version ranges that it's not a contextual keyword, it's still included in the trie as usual, so the old machinery works as expected and the lexer prioritises that match over a catch-all Identifier we use now.

However, for the version ranges that it is a contextual keyword, the version check is moved outside to a new Lexer::as_contextual_keyword, which attempts to promote a given token to a contextual keyword and eats that if it's the expected token kind.

This way, we can eat TokenKind::SomeContextualKeyword if necessary, since parse_token attempts to promote a keyword, but for a given version where it's contextual, it's not returned early from the lexer, so we lex an Identifier by default.

@Xanewok Xanewok requested a review from a team as a code owner September 19, 2023 11:26
@changeset-bot
Copy link

changeset-bot bot commented Sep 19, 2023

🦋 Changeset detected

Latest commit: d14285e

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package
Name Type
@nomicfoundation/slang Minor

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

Copy link
Contributor

@AntonyBlakey AntonyBlakey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a hack, this works, but I don't like the way it confuses the model re keywords in general, and my discomfort is telling me we should do it properly. Will discuss f2f.

@Xanewok Xanewok mentioned this pull request Sep 21, 2023
github-merge-queue bot pushed a commit that referenced this pull request Sep 22, 2023
Non-controversial bits separated from #598.

This updates versions for the `revert` keyword (and the associated
statement, introduced in 0.8.4) and the `global` (introduced in 0.8.13
for the using directive) contextual keywords in both the new DSL and our
old YAML spec.

---------

Co-authored-by: Omar Tawfik <[email protected]>
@Xanewok
Copy link
Contributor Author

Xanewok commented Sep 24, 2023

We decided to hold this off until we migrate to a DSL v2, as that will change how keywords will be defined and identifier<>keyword will interact.

@Xanewok Xanewok closed this Sep 24, 2023
@Xanewok Xanewok deleted the feat-contextual-keywords branch January 4, 2024 17:27
github-merge-queue bot pushed a commit that referenced this pull request Jan 8, 2024
Closes #568 

There is still one outstanding issue where we return a `Vec<TokenKind>`
from `next_token`; it'd like to return a more specialized type and
ideally pass it on stack (2x2 bytes), rather than on-heap (extra 3x8
bytes for the Vec handle + indirection). We should name it better and
properly show that we can return at most 2 token kinds (single token
kind or identifier + kw combo).

To do:
- [x] Return tokens from `next_token` via stack

Apart from that, I think this is a more correct approach than #598,
especially accounting for the new keyword definition format in DSL v2.

The main change is that we only check the keyword trie and additionally
the (newly introduced) compound keyword scanners only after the token
has been lexed as an identifier. For each context, we collect Identifier
scanners used by the keywords and attempt promotion there.

The existing lexing performance is not impacted from what I've seen when
running the sanctuary tests and I can verify (incl. CST tests) that we
now properly parse source that uses contextual keywords (e.g. `from`)
and that the compound keywords (e.g. `ufixedMxN`) are properly
versioned.

This adapts the existing `codegen_grammar` interface that's a leftover
from DSLv1; I did that to work on finishing #638; once this is merged
and we now properly parse contextual keywords, I'll move to clean it up
and reduce the parser codegen indirection (right now we go from v2 -> v1
model -> code generator -> Tera templates; it'd like to at least cut out
the v1 model and/or simplify visiting v2 from the existing
`CodeGenerator`).

Please excuse the WIP comments in the middle; the first and the last
ones should make sense when reviewing. I can simplify this a bit for
review, if needed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Allow keywords to overlap identifiers
2 participants