Implement (contextual) keywords and use their versioning from v2 #723

Merged: 39 commits, Jan 8, 2024
Changes from 27 commits
Commits
e2a7cc5
cleanup: Use BTreeMap for CodeGenerator::{scanner,parser}_functions
Xanewok Sep 18, 2023
af43e8f
cleanup: Use BTreeMap for CodeGenerator::scanner_contexts
Xanewok Sep 18, 2023
be32592
cleanup: Mark top_level_scanner_names as unused in templates
Xanewok Sep 18, 2023
391c596
cleanup: Hoist the Identifier hack in PG trie code
Xanewok Sep 18, 2023
cd84e1a
WIP: Add a comment
Xanewok Dec 19, 2023
6c15e5d
refactor: Clean up a bit Trie code
Xanewok Dec 20, 2023
9c048ac
wtf?
Xanewok Dec 23, 2023
f3b520e
cleanup: Introduce a helper CodeGenerator::current_context fn
Xanewok Dec 23, 2023
5eeb0bd
Deduplicate longest_match in Lexer::next_token
Xanewok Dec 23, 2023
87d407a
WIP
Xanewok Dec 23, 2023
82d9ef6
WIP2
Xanewok Dec 23, 2023
e2da887
WIP3
Xanewok Dec 23, 2023
08b4396
WIP more
Xanewok Dec 23, 2023
0746345
WIP: Add some more
Xanewok Dec 23, 2023
7b12d89
Don't always rescan with underlying when trying to scan keywords
Xanewok Dec 23, 2023
3f80727
Make sure the identifier is always scanned as a last compound scanner
Xanewok Dec 27, 2023
9ff1b90
clean up some bits
Xanewok Dec 27, 2023
1c82302
Speed up lexing by only attempting kw promotion if it lexes as an ide…
Xanewok Dec 27, 2023
05edbe1
Bring back keyword lookup using trie
Xanewok Dec 27, 2023
05f154f
Simplify the trie
Xanewok Dec 27, 2023
26e55fb
Fix compound keyword promotion and add CST tests
Xanewok Dec 27, 2023
a0dc824
cleanup: remove unnecessary now wrong_self_convention lint
Xanewok Dec 27, 2023
90d8d88
Simplify emitted code for the compound keyword scanners
Xanewok Dec 27, 2023
d348dde
Remove unnecessary comment
Xanewok Dec 27, 2023
5180f98
cleanup: Remove some WIP code
Xanewok Dec 27, 2023
7de6920
Fix a typo
Xanewok Dec 27, 2023
e6b5c15
Add more comments
Xanewok Dec 27, 2023
8651b2e
Hold the scanned kw token kind in the KeywordScan enum
Xanewok Jan 2, 2024
8dc56e8
Don't Option-wrap keyword scan results when using a trie
Xanewok Jan 2, 2024
0a615ec
Introduce ScannedToken to separately handle ident/kw from the scanner
Xanewok Jan 2, 2024
581d611
Rename `identifier_scanners` to `identifier_scanner_names`
Xanewok Jan 2, 2024
317ce6f
Clean up a bit the resulting next_token
Xanewok Jan 2, 2024
93942ac
perf: Only attempt scanning a compound keyword if we didn't find one
Xanewok Jan 2, 2024
bbea7fe
Add a changeset file
Xanewok Jan 2, 2024
7b3ec72
Merge remote-tracking branch 'upstream/main' into keyword-idents-take-2
Xanewok Jan 3, 2024
322f105
Add comments about specific keyword reservation in the CST snapshots
Xanewok Jan 3, 2024
2336ad9
Rename `identifier_scanner_names` to `promotable_identifier_scanners`
Xanewok Jan 3, 2024
8fe0b94
Merge remote-tracking branch 'upstream/main' into keyword-idents-take-2
Xanewok Jan 4, 2024
4fb350e
Add more regression CST tests
Xanewok Jan 4, 2024
5 changes: 4 additions & 1 deletion crates/codegen/grammar/src/grammar.rs
@@ -4,7 +4,7 @@ use semver::Version;

 use crate::parser_definition::{ParserDefinitionRef, TriviaParserDefinitionRef};
 use crate::visitor::{GrammarVisitor, Visitable};
-use crate::{PrecedenceParserDefinitionRef, ScannerDefinitionRef};
+use crate::{KeywordScannerDefinitionRef, PrecedenceParserDefinitionRef, ScannerDefinitionRef};

 pub struct Grammar {
     pub name: String,
@@ -36,6 +36,7 @@ impl Grammar {
 #[derive(Clone)]
 pub enum GrammarElement {
     ScannerDefinition(ScannerDefinitionRef),
+    KeywordScannerDefinition(KeywordScannerDefinitionRef),
     TriviaParserDefinition(TriviaParserDefinitionRef),
     ParserDefinition(ParserDefinitionRef),
     PrecedenceParserDefinition(PrecedenceParserDefinitionRef),
@@ -45,6 +46,7 @@ impl GrammarElement {
     pub fn name(&self) -> &'static str {
         match self {
             Self::ScannerDefinition(scanner) => scanner.name(),
+            Self::KeywordScannerDefinition(scanner) => scanner.name(),
             Self::TriviaParserDefinition(trivia_parser) => trivia_parser.name(),
             Self::ParserDefinition(parser) => parser.name(),
             Self::PrecedenceParserDefinition(precedence_parser) => precedence_parser.name(),
@@ -80,6 +82,7 @@ impl Visitable for GrammarElement {
     fn accept_visitor<V: GrammarVisitor>(&self, visitor: &mut V) {
         match self {
             Self::ScannerDefinition(scanner) => scanner.accept_visitor(visitor),
+            Self::KeywordScannerDefinition(scanner) => scanner.accept_visitor(visitor),
             Self::TriviaParserDefinition(trivia_parser) => trivia_parser.accept_visitor(visitor),
             Self::ParserDefinition(parser) => parser.accept_visitor(visitor),
             Self::PrecedenceParserDefinition(precedence_parser) => {
7 changes: 6 additions & 1 deletion crates/codegen/grammar/src/parser_definition.rs
@@ -2,7 +2,10 @@ use std::fmt::Debug;
 use std::rc::Rc;

 use crate::visitor::{GrammarVisitor, Visitable};
-use crate::{PrecedenceParserDefinitionRef, ScannerDefinitionRef, VersionQualityRange};
+use crate::{
+    KeywordScannerDefinitionRef, PrecedenceParserDefinitionRef, ScannerDefinitionRef,
+    VersionQualityRange,
+};

 /// A named wrapper, used to give a name to a [`ParserDefinitionNode`].
 #[derive(Clone, Debug)]
@@ -59,6 +62,7 @@ pub enum ParserDefinitionNode {
     Sequence(Vec<Named<Self>>),
     Choice(Named<Vec<Self>>),
     ScannerDefinition(ScannerDefinitionRef),
+    KeywordScannerDefinition(KeywordScannerDefinitionRef),
     TriviaParserDefinition(TriviaParserDefinitionRef),
     ParserDefinition(ParserDefinitionRef),
     PrecedenceParserDefinition(PrecedenceParserDefinitionRef),
@@ -128,6 +132,7 @@ impl Visitable for ParserDefinitionNode {
             }

             Self::ScannerDefinition(_)
+            | Self::KeywordScannerDefinition(_)
             | Self::TriviaParserDefinition(_)
             | Self::ParserDefinition(_)
             | Self::PrecedenceParserDefinition(_) => {}
95 changes: 95 additions & 0 deletions crates/codegen/grammar/src/scanner_definition.rs
@@ -65,3 +65,98 @@ impl Visitable for ScannerDefinitionNode {
         }
     }
 }
+
+pub trait KeywordScannerDefinition: Debug {
+    fn name(&self) -> &'static str;
+    fn identifier_scanner(&self) -> &'static str;
+    fn definitions(&self) -> &[KeywordScannerDefinitionVersionedNode];
+}
+
+pub type KeywordScannerDefinitionRef = Rc<dyn KeywordScannerDefinition>;
+
+impl Visitable for KeywordScannerDefinitionRef {
+    fn accept_visitor<V: GrammarVisitor>(&self, visitor: &mut V) {
+        visitor.keyword_scanner_definition_enter(self);
+    }
+}
+
+#[derive(Debug)]
+pub struct KeywordScannerDefinitionVersionedNode {
+    // Underlying keyword scanner (i.e. identifier scanner)
+    pub value: KeywordScannerDefinitionNode,
+    /// When the keyword scanner is enabled
+    pub enabled: Vec<VersionQualityRange>,
+    /// When the keyword is reserved, i.e. can't be used in other position (e.g. as a name)
+    pub reserved: Vec<VersionQualityRange>,
+}
+
+#[derive(Clone, Debug)]
+pub enum KeywordScannerDefinitionNode {
+    Optional(Box<Self>),
+    Sequence(Vec<Self>),
+    Choice(Vec<Self>),
+    Atom(String),
+    // No repeatable combinators, because keywords are assumed to be finite
+}
+
+impl From<KeywordScannerDefinitionNode> for ScannerDefinitionNode {
+    fn from(val: KeywordScannerDefinitionNode) -> Self {
+        match val {
+            KeywordScannerDefinitionNode::Optional(node) => {
+                ScannerDefinitionNode::Optional(Box::new((*node).into()))
+            }
+            KeywordScannerDefinitionNode::Sequence(nodes) => {
+                ScannerDefinitionNode::Sequence(nodes.into_iter().map(Into::into).collect())
+            }
+            KeywordScannerDefinitionNode::Atom(string) => ScannerDefinitionNode::Literal(string),
+            KeywordScannerDefinitionNode::Choice(nodes) => {
+                ScannerDefinitionNode::Choice(nodes.into_iter().map(Into::into).collect())
+            }
+        }
+    }
+}
+
+/// A [`KeywordScannerDefinitionRef`] that only has a single atom value.
+///
+/// The main usage for this type is to construct a keyword trie in parser generator, as trie will
+/// only work with single atom values and keyword promotion needs to additionally account for
+/// keyword reservation, rather than just literal presence.
+#[derive(Clone)]
+pub struct KeywordScannerAtomic(KeywordScannerDefinitionRef);
+
+impl KeywordScannerAtomic {
+    /// Wraps the keyword scanner definition if it is a single atom value.
+    pub fn try_from_def(def: &KeywordScannerDefinitionRef) -> Option<Self> {
+        match def.definitions() {
+            [KeywordScannerDefinitionVersionedNode {
+                value: KeywordScannerDefinitionNode::Atom(_),
+                ..
+            }] => Some(Self(def.clone())),
+            _ => None,
+        }
+    }
+}
+
+impl std::ops::Deref for KeywordScannerAtomic {
+    type Target = KeywordScannerDefinitionRef;
+
+    fn deref(&self) -> &Self::Target {
+        &self.0
+    }
+}
+
+impl KeywordScannerAtomic {
+    pub fn definition(&self) -> &KeywordScannerDefinitionVersionedNode {
+        let def = &self.0.definitions().get(0);
+        def.expect("KeywordScannerAtomic should have exactly one definition")
+    }
+    pub fn value(&self) -> &str {
+        match self.definition() {
+            KeywordScannerDefinitionVersionedNode {
+                value: KeywordScannerDefinitionNode::Atom(atom),
+                ..
+            } => atom,
+            _ => unreachable!("KeywordScannerAtomic should have a single atom value"),
+        }
+    }
+}
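
The comment on `KeywordScannerDefinitionNode` says there are no repeatable combinators because keywords are assumed to be finite. The following is a minimal sketch of what that property allows, not code from this PR: it assumes a local copy of the enum (named `KeywordNode` here) and a hypothetical `expand` helper that flattens a definition into every literal string it can match. With a repetition combinator such a list could not be produced.

```rust
/// Local, simplified copy of `KeywordScannerDefinitionNode` (assumption: same four variants).
#[derive(Clone, Debug)]
pub enum KeywordNode {
    Optional(Box<KeywordNode>),
    Sequence(Vec<KeywordNode>),
    Choice(Vec<KeywordNode>),
    Atom(String),
}

/// Hypothetical helper: lists every string the node can match.
/// The recursion always terminates because no variant repeats its child.
pub fn expand(node: &KeywordNode) -> Vec<String> {
    match node {
        // A single atom matches exactly its own text.
        KeywordNode::Atom(text) => vec![text.clone()],
        // An optional node matches either nothing or whatever its child matches.
        KeywordNode::Optional(inner) => {
            let mut out = vec![String::new()];
            out.extend(expand(inner));
            out
        }
        // A choice matches the union of its alternatives.
        KeywordNode::Choice(nodes) => nodes.iter().flat_map(expand).collect(),
        // A sequence matches every concatenation of its parts, in order.
        KeywordNode::Sequence(nodes) => nodes.iter().fold(vec![String::new()], |prefixes, part| {
            let suffixes = expand(part);
            prefixes
                .iter()
                .flat_map(|p| suffixes.iter().map(move |s| format!("{p}{s}")))
                .collect()
        }),
    }
}
```

As an illustration with made-up atoms, `Sequence([Atom("bytes"), Optional(Choice([Atom("1"), Atom("2")]))])` expands to `bytes`, `bytes1` and `bytes2`. A definition that expands to more than one string is the case `KeywordScannerAtomic::try_from_def` rejects, since the trie only handles single atom values.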
7 changes: 4 additions & 3 deletions crates/codegen/grammar/src/visitor.rs
@@ -1,14 +1,15 @@
 use crate::{
-    Grammar, ParserDefinitionNode, ParserDefinitionRef, PrecedenceParserDefinitionNode,
-    PrecedenceParserDefinitionRef, ScannerDefinitionNode, ScannerDefinitionRef,
-    TriviaParserDefinitionRef,
+    Grammar, KeywordScannerDefinitionRef, ParserDefinitionNode, ParserDefinitionRef,
+    PrecedenceParserDefinitionNode, PrecedenceParserDefinitionRef, ScannerDefinitionNode,
+    ScannerDefinitionRef, TriviaParserDefinitionRef,
 };

 pub trait GrammarVisitor {
     fn grammar_enter(&mut self, _grammar: &Grammar) {}
     fn grammar_leave(&mut self, _grammar: &Grammar) {}

     fn scanner_definition_enter(&mut self, _scanner: &ScannerDefinitionRef) {}
+    fn keyword_scanner_definition_enter(&mut self, _scanner: &KeywordScannerDefinitionRef) {}
     fn trivia_parser_definition_enter(&mut self, _trivia_parser: &TriviaParserDefinitionRef) {}
     fn parser_definition_enter(&mut self, _parser: &ParserDefinitionRef) {}
     fn precedence_parser_definition_enter(&mut self, _parser: &PrecedenceParserDefinitionRef) {}
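
Because `keyword_scanner_definition_enter` is added with an empty default body, existing visitors keep compiling and only visitors that care about keywords need to override it. A minimal sketch of that pattern follows, using simplified stand-ins for the traits in this diff; `KeywordNameCollector`, `IgnoreEverything`, `Named` and the free function `accept` are hypothetical names introduced for illustration only.

```rust
use std::rc::Rc;

/// Simplified stand-in for the `KeywordScannerDefinition` trait (only `name` is kept).
pub trait KeywordScannerDefinition {
    fn name(&self) -> &'static str;
}

pub type KeywordScannerDefinitionRef = Rc<dyn KeywordScannerDefinition>;

/// Simplified stand-in for `GrammarVisitor`: just the new hook, with a no-op default.
pub trait GrammarVisitor {
    fn keyword_scanner_definition_enter(&mut self, _scanner: &KeywordScannerDefinitionRef) {}
}

/// Hypothetical visitor that records the name of every keyword scanner it sees.
#[derive(Default)]
pub struct KeywordNameCollector {
    pub names: Vec<&'static str>,
}

impl GrammarVisitor for KeywordNameCollector {
    fn keyword_scanner_definition_enter(&mut self, scanner: &KeywordScannerDefinitionRef) {
        self.names.push(scanner.name());
    }
}

/// Hypothetical visitor that relies entirely on the default no-op hook.
pub struct IgnoreEverything;

impl GrammarVisitor for IgnoreEverything {}

/// Hypothetical keyword definition used only for this demonstration.
pub struct Named(pub &'static str);

impl KeywordScannerDefinition for Named {
    fn name(&self) -> &'static str {
        self.0
    }
}

/// Mirrors the `Visitable` impl for `KeywordScannerDefinitionRef` shown in the diff:
/// visiting a keyword scanner simply calls the enter hook.
pub fn accept<V: GrammarVisitor>(def: &KeywordScannerDefinitionRef, visitor: &mut V) {
    visitor.keyword_scanner_definition_enter(def);
}
```

Passing a list of definitions through `accept` with a `KeywordNameCollector` leaves their names in `names` in visit order, while `IgnoreEverything` accepts the same definitions without any extra code.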