-
-
Notifications
You must be signed in to change notification settings - Fork 55
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
NodePattern
complete rewrite. Add support for multiple variadic terms.
- Loading branch information
Showing
32 changed files
with
2,536 additions
and
882 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,252 @@ | ||
= Hacker's guide to the `NodePattern` compiler | ||
|
||
This documentation is aimed at anyone wanting to understand / modify the `NodePattern` compiler. | ||
It assumes some familiarity with the syntax of https://github.com/rubocop-hq/rubocop-ast/blob/master/doc/modules/ROOT/pages/node_pattern.md[`NodePattern`], as well as the AST produced by the `parser` gem. | ||
|
||
== High level view | ||
|
||
The `NodePattern` compiler uses the same techniques as the `parser` gem: | ||
|
||
* a `Lexer` that breaks source into tokens | ||
* a `Parser` that uses tokens and a `Builder` to emit an AST | ||
* a `Compiler` that converts this AST into Ruby code | ||
|
||
Example: | ||
|
||
* Pattern: `+(send nil? {:puts :p} $...)+` | ||
* Tokens: `+'(', [:tNODE_TYPE, :send], [:tPREDICATE, :nil?], '{', ...+` | ||
* AST: `+s(:sequence, s(:node_type, :send), s(:predicate, :nil?), s(:union, ...+` | ||
* Ruby code: | ||
+ | ||
[source,ruby] | ||
---- | ||
node.is_a?(::RuboCop::AST::Node) && node.children.size >= 2 && | ||
node.send_type? && | ||
node.children[0].nil?() && | ||
(union2 = node.children[1]; ... | ||
---- | ||
|
||
The different parts are described below | ||
|
||
== Vocabulary | ||
|
||
*"node pattern"*: something that can be matched against a single `AST::Node`. | ||
While `(int 42)` and `#is_fun?` both correspond to node patterns, `+...+` (without the parenthesis) is not a node pattern. | ||
|
||
*"sequence"*: a node pattern that describes the sequence of children of a node (and its type): `+(type first_child second_child ...)+` | ||
|
||
*"variadic"*: element of a sequence that can match a variable number of children. | ||
`+(send _ int* ...)+` has two variadic elements (`int*` and `+...+`). | ||
`(send _ :name)` contains no variadic element. | ||
Note that a sequence is itself never variadic. | ||
|
||
*"atom"*: element of a pattern that corresponds with a simple Ruby object. | ||
`(send nil? | ||
:puts (str 'hello'))` has two atoms: `:puts` and `'hello'`. | ||
|
||
== Lexer | ||
|
||
The `lexer.rb` defines `Lexer` and has the few definitions needed for the lexer to work. | ||
The bulk of the processing is in the inherited class that is generated by https://github.com/seattlerb/oedipus_lex[`oedipus_lex`] | ||
|
||
[discrete] | ||
==== Rules | ||
|
||
https://github.com/seattlerb/oedipus_lex[`oedipus_lex`] generates the Ruby file `lexer.rex.rb` from the rules defined in `lexer.rex`. | ||
|
||
These rules map a Regexp to code that emits a token. | ||
|
||
`oedipus_lex` aims to be simple and the generated file is readable. | ||
It uses https://ruby-doc.org/stdlib-2.7.1/libdoc/strscan/rdoc/StringScanner.html[`StringScanner`] behind the scene. | ||
It selects the first rule that matches, contrary to many lexing tools that prioritize longest match. | ||
|
||
[discrete] | ||
==== Tokens | ||
|
||
The `Lexer` emits tokens with types that are: | ||
|
||
* string for the syntactic symbols (e.g. | ||
`'('`, `'$'`, `+'...'+`) | ||
* symbols of the form `:tTOKEN_TYPE` for the rest (e.g. | ||
`:tPREDICATE`) | ||
|
||
Tokens are stored as `[type, value]`, or `[type, [value, location]]` if locations are emitted. | ||
|
||
[discrete] | ||
==== Generation | ||
|
||
Use `rake generate:lexer` to generate the `lexer.rex.rb` from `lexer.rex` file. | ||
This is done automatically by `rake spec`. | ||
|
||
NOTE: the `lexer.rex.rb` is not under source control, but is included in the gem. | ||
|
||
== Parser | ||
|
||
Similarly to the `Lexer`, the `parser.rb` defines `Parser` and has the few definitions needed for the parser to work. | ||
The bulk of the processing is in the inherited class `parser.racc.rb` that is generated by https://ruby-doc.org/stdlib-2.7.0/libdoc/racc/parser/rdoc/Racc.html#module-Racc-label-Writing+A+Racc+Grammar+File[`racc`] from the rules in `parser.y`. | ||
|
||
[discrete] | ||
==== Nodes | ||
|
||
The `Parser` emits `NodePattern::Node` which are similar to RuboCop's node. | ||
They both inherit from ``parser``'s `Parser::AST::Source::Node`, and share additional methods too. | ||
|
||
Like for RuboCop's nodes, some nodes have specicialized classes (e.g. | ||
`Sequence`) while other nodes use the base class directly (e.g. | ||
`s(:number, 42)`) | ||
|
||
[discrete] | ||
==== Rules | ||
|
||
The rules follow closely the definitions above. | ||
In particular a distinction between `node_pattern_list`, which is a list of node patterns (each term can match a single node), while the more generic `variadic_pattern_list` is a list of elements, some of which could be variadic, others simple node patterns. | ||
|
||
[discrete] | ||
==== Generation | ||
|
||
Similarly to the lexer, use `rake generate:parser` to generate the `parser.racc.rb` from `parser.y` file. | ||
This is done automatically by `rake spec`. | ||
|
||
NOTE: the `parser.racc.rb` is not under source control, but is included in the gem. | ||
|
||
== Compiler | ||
|
||
The compiler's core is the `Compiler` class. | ||
It holds the global state (e.g. | ||
references to named arguments). | ||
The goal of the compiler is to produce `matching_code`, Ruby code that can be run against an `AST::Node`, or any Ruby object for that matter. | ||
|
||
Packaging of that `matching_code` into code for a `lambda`, or method `def` is handled separately by the `MethodDefiner` module. | ||
|
||
The compilation itself is handled by three subcompilers: | ||
|
||
* `NodePatternSubcompiler` | ||
* `AtomSubcompiler` | ||
* `SequenceSubcompiler` | ||
|
||
=== Visitors | ||
|
||
The subcompilers use the visitor pattern [https://en.wikipedia.org/wiki/Visitor_pattern] | ||
|
||
The methods starting with `visit_` are used to process the different types of nodes. | ||
For a node of type `:capture`, the method `visit_capture` will be called, or if none is defined then `visit_other_type` will be called. | ||
|
||
No argument is passed, as the visited node is accessible with the `node` attribute reader. | ||
|
||
=== NodePatternSubcompiler | ||
|
||
Given any `NodePattern::Node`, it generates the Ruby code that can return `true` or `false` for the given node, or node type for sequence head. | ||
|
||
==== `var` vs `access` | ||
|
||
The subcompiler can be called with the current node stored either in a variable (provided with the `var:` keyword argument) or via a Ruby expression (e.g. | ||
`access: 'current_node.children[2]'`). | ||
|
||
The subcompiler will not generate code that executes this `access` expression more than once or twice. | ||
If it might access the node more than that, `multiple_access` will store the result in a temporary variable (e.g. | ||
`union`). | ||
|
||
==== Sequences | ||
|
||
Sequences are the most difficult elements to handle and are deferred to the `SequenceSubcompiler`. | ||
|
||
==== Atoms | ||
|
||
Atoms are handled with `visit_other_type`, which defers to the `AtomSubcompiler` and converts that result to a node pattern by appending `=== cur_node` (or `=== cur_node.type` if in sequence head). | ||
|
||
This way, the two arguments in `(_ #func?(%1) %2)` would be compiled differently; | ||
`%1` would be compiled as `param1`, while `%2` gets compiled as `param2 === node.children[1]`. | ||
|
||
==== Precedence | ||
|
||
The code generated has higher or equal precedence to `&&`, so as to make chaining convenient. | ||
|
||
=== AtomSubcompiler | ||
|
||
This subcompiler produces Ruby code that gets evaluated to a Ruby object. | ||
E.g. | ||
`"42"`, `:a_symbol`, `param1`. | ||
|
||
A good way to think about it is when it has to be passed as arguments to a function call. | ||
For example: | ||
|
||
[source,ruby] | ||
---- | ||
# Pattern '#func(42, %1)' compiles to | ||
func(node, 42, param1) | ||
---- | ||
|
||
Note that any node pattern can be output by this subcompiler, but those that don't correspond to a Ruby literal will be output as a lambda so they can be combined. | ||
For example: | ||
|
||
[source,ruby] | ||
---- | ||
# Pattern '#func(int)' compiles to | ||
func(node, ->(compare) { compare.is_a?(::RuboCop::AST::Node) && compare.int_type? }) | ||
---- | ||
|
||
=== SequenceSubcompiler | ||
|
||
The subcompiler compiles the sequences' terms in turn, keeping track of which children of the `AST::Node` are being matched. | ||
|
||
==== Variadic terms | ||
|
||
The complexity comes from variadic elements, which have complex processing _and_ may make it impossible to know at compile time which children are matched by the subsequent terms. | ||
|
||
*Example* (no variadic terms) | ||
|
||
---- | ||
(_type int _ str) | ||
---- | ||
|
||
First child must match `int`, third child must match `str`. | ||
The subcompiler will use `children[0]` and `children[2]`. | ||
|
||
*Example* (one variadic terms) | ||
|
||
---- | ||
(_type int _* str) | ||
---- | ||
|
||
First child must match `int` and _last_ child must match `str`. | ||
The subcompiler will use `children[0]` and `children[-1]`. | ||
|
||
*Example* (multiple variadic terms) | ||
|
||
---- | ||
(_type int+ sym str+) | ||
---- | ||
|
||
The subcompiler can not use any integer and `children[]` to match `sym`. | ||
This must be tracked at runtime in a variable (`cur_index`). | ||
|
||
The subcompiler will use fixed indices before the first variadic element and after the last one. | ||
|
||
==== Node pattern terms | ||
|
||
The node pattern terms are delegated to the `NodePatternSubcompiler`. | ||
|
||
In the pattern `(:sym :sym)`, both `:sym` will be compiled differently because the first `:sym` is in "sequence head": `:sym === node.type` and `:sym == node.children[0]` respectively. | ||
The subcompiler indicates if the pattern is in "sequence head" or not, so the `NodePatternSubcompiler` can produce the right code. | ||
|
||
Variadic elements may not (currently) cover the sequence head. | ||
As a convenience, `+(...)+` is understood as `+(_ ...)+`. | ||
Other types of nodes will raise an error (e.g. | ||
`(<will not compile>)`; | ||
see `Node#in_sequence_head`) | ||
|
||
==== Precedence | ||
|
||
Like the node pattern subcompiler, it generates code that has higher or equal precedence to `&&`, so as to make chaining convenient. | ||
|
||
== Variant: WithMeta | ||
|
||
These variants of the Parser / Builder / Lexer generate `location` information (exactly like the `parser` gem) for AST nodes as well as comments with their locations (like the `parser` gem). | ||
|
||
Since this information is not typically used when one ony wants to define methods, it is not loaded by default. | ||
|
||
== Variant: Debug | ||
|
||
These variants of the Compiler / Subcompilers works by adding tracing code before and after each compilation of `NodePatternSubcompiler` and `SequenceSubcompiler`. | ||
A unique ID is assigned to each node and the tracing code flips a corresponding switch when the expression is about to be evaluated, and after (joined with `&&` so it only flips the switch if the node was a match). | ||
Atoms are not compiled differently as they are not really matchable (when not compiled as a node pattern) |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.