-
Notifications
You must be signed in to change notification settings - Fork 20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement (contextual) keywords and use their versioning from v2 #723
Conversation
Also optimize a bit the kw scanner function and remove some comments.
🦋 Changeset detectedLatest commit: 4fb350e The changes in this PR will be included in the next version bump. This PR includes changesets to release 1 package
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
This also removes the `scan` from the NAPI but we wanted to hide that anyways ourselves.
Instead of pub enum ScannedToken {
Single(TokenKind),
IdentifierOrKeyword {
identifier: TokenKind,
kw: KeywordScan,
},
} Not only we don't go through heap for the often-called To keep the existing parser flow mostly the same, a helper fn This also introduces another helper fn |
$( | ||
{ | ||
if let result @ (KeywordScan::Present(..) | KeywordScan::Reserved(..)) = ($scanner) { | ||
if $ident.len() == $stream.position().utf8 - save.utf8 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not perfect, I admit; this should probably be handled with an automaton for the keywords and executed/matched on the string till it's exhausted but let's punt on this for now
.../solidity/testing/snapshots/cst_output/EventDefinition/transfer/generated/0.4.11-success.yml
Show resolved
Hide resolved
...ity/testing/snapshots/cst_output/YulVariableDeclarationStatement/keyword_ufixed8x8/input.sol
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a few suggestions.
This PR was opened by the [Changesets release](https://github.com/changesets/action) GitHub action. When you're ready to do a release, you can merge this and publish to npm yourself or [setup this action to publish automatically](https://github.com/changesets/action#with-publishing). If you're not ready to do a release yet, that's fine, whenever you add more changesets to main, this PR will be updated. # Releases ## @nomicfoundation/[email protected] ### Minor Changes - [#710](#710) [`2025b6cb`](2025b6c) Thanks [@Xanewok](https://github.com/Xanewok)! - CST children nodes are now named - [#723](#723) [`b3dc6bcd`](b3dc6bc) Thanks [@Xanewok](https://github.com/Xanewok)! - Properly parse unreserved keywords in an identifier position, i.e. `from`, `emit`, `global` etc. - [#728](#728) [`662a672c`](662a672) Thanks [@Xanewok](https://github.com/Xanewok)! - Remove Language#scan API; use the parser API instead - [#719](#719) [`1ad6bb37`](1ad6bb3) Thanks [@OmarTawfik](https://github.com/OmarTawfik)! - introduce strong types for all Solidity non terminals in the TypeScript API. ### Patch Changes - [#719](#719) [`1ad6bb37`](1ad6bb3) Thanks [@OmarTawfik](https://github.com/OmarTawfik)! - unify Rust/TypeScript node helpers: `*_with_kind()`, `*_with_kinds()`, `*_is_kind()`), ... - [#731](#731) [`3deaea2e`](3deaea2) Thanks [@OmarTawfik](https://github.com/OmarTawfik)! - add `RuleNode.unparse()` to the TypeScript API Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Closes #568
There is still one outstanding issue where we return a
Vec<TokenKind>
fromnext_token
; it'd like to return a more specialized type and ideally pass it on stack (2x2 bytes), rather than on-heap (extra 3x8 bytes for the Vec handle + indirection). We should name it better and properly show that we can return at most 2 token kinds (single token kind or identifier + kw combo).To do:
next_token
via stackApart from that, I think this is a more correct approach than #598, especially accounting for the new keyword definition format in DSL v2.
The main change is that we only check the keyword trie and additionally the (newly introduced) compound keyword scanners only after the token has been lexed as an identifier. For each context, we collect Identifier scanners used by the keywords and attempt promotion there.
The existing lexing performance is not impacted from what I've seen when running the sanctuary tests and I can verify (incl. CST tests) that we now properly parse source that uses contextual keywords (e.g.
from
) and that the compound keywords (e.g.ufixedMxN
) are properly versioned.This adapts the existing
codegen_grammar
interface that's a leftover from DSLv1; I did that to work on finishing #638; once this is merged and we now properly parse contextual keywords, I'll move to clean it up and reduce the parser codegen indirection (right now we go from v2 -> v1 model -> code generator -> Tera templates; it'd like to at least cut out the v1 model and/or simplify visiting v2 from the existingCodeGenerator
).Please excuse the WIP comments in the middle; the first and the last ones should make sense when reviewing. I can simplify this a bit for review, if needed.