Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change Edge::label from optional to required #1205

Open
wants to merge 25 commits into
base: main
Choose a base branch
from

Conversation

mjoerussell
Copy link
Contributor

@mjoerussell mjoerussell commented Dec 30, 2024

This involves a couple of changes outside of just changing the field type:

  1. CursorIterator must be able to produce a label for every node now. This wasn't possible before because the root nodes of a cursor had no parent, and so could not derive a label. I've changed Cursor in the following ways to fix this:
    • Cursor::parent is a new type, Parent<T>, instead of Option<Rc<PathAncestor<T>>>. Parent<T> is an enum with three variants - None, Open(Rc<PathAncestor<T>>), and Closed(Rc<PathAncestor<T>>). A Closed parent is used when a cursor is spawned from an arbitrary non-root node, and means that the parent can be read but not traversed to. None is only used for the true root node.
  2. KindTypes::EdgeLabel is given a Default implementation, which for now just returns the first PredefinedLabel variant. This is needed in order to be able to set a "concrete" EdgeLabel value from the metaslang_cst crate.
  3. Added a Root variant to PredefinedLabel. This will be used when a node is the true root node. For now this will be the default value for KindTypes::EdgeLabel.

Closes #815

This involves a couple of changes outside of just changing the field type:
1. `CursorIterator` must be able to produce a label for every node now. This wasn't possible before because the root nodes of a cursor had no parent, and so could not derive a label. I've changed `Cursor` in the following ways to fix this:
    * `Cursor::parent` is a new type, `Parent<T>`, instead of `Option<Rc<PathAncestor<T>>>`. `Parent<T>` is an enum with three variants - `None`, `Open(Rc<PathAncestor<T>>)`, and `Closed(Rc<PathAncestor<T>>)`. A `Closed` parent is used when a cursor is spawned from an arbitrary non-root node, and means that the parent can be read but not traversed to. `None` is only used for the true root node.
2. `KindTypes::EdgeLabel` is given a `Default` implementation, which for now just returns the first `PredefinedLabel` variant. This is needed in order to be able to set a "concrete" `EdgeLabel` value from the `metaslang_cst` crate.
3. Added a `Root` variant to `PredefinedLabel`. This will be used when a node is the true root node. For now this will be the default value for `KindTypes::EdgeLabel`.
@mjoerussell mjoerussell self-assigned this Dec 30, 2024
@mjoerussell mjoerussell requested review from a team as code owners December 30, 2024 21:26
Copy link

changeset-bot bot commented Dec 30, 2024

🦋 Changeset detected

Latest commit: 68614b2

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package
Name Type
@nomicfoundation/slang Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

Copy link
Contributor

@OmarTawfik OmarTawfik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this!

@mjoerussell
Copy link
Contributor Author

looks like we are missing labels for UNRECOGNIZED nodes. Should we add a matching EdgeLabel::UNRECOGNIZED for these? thoughts?

looks like we are missing labels for MISSING nodes. I wonder if we are able to use the original label the parser was trying to parse, to indicate what is missing?

I've added an Unrecognized label for UNRECOGNIZED nodes. However, getting the correct label for MISSING nodes was not that straightforward, at least for how well I currently understand the parser. For now, I've left those as having the label Root. @OmarTawfik if you have any thoughts about how to get those labels, I'd be happy to go back and make those updates.

All the other comments should be addressed now so this PR is ready for another review.

@mjoerussell mjoerussell requested a review from OmarTawfik January 3, 2025 21:59
Copy link
Contributor

@OmarTawfik OmarTawfik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made another pass on all unresoved comments.
Thanks!

Copy link
Contributor

@OmarTawfik OmarTawfik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there is one unresolved comment still: #1205 (comment)

So, for the time being, WDYT of adding a similar EdgeLabel::MISSING for now, so that we don't mislabel them as root in the meantime?

Additionally, I suggest adding a changeset to describe the changes here.

Thanks!

@mjoerussell
Copy link
Contributor Author

I think there is one unresolved comment still: #1205 (comment)

Wasn't that added to the backlog as #1215? I thought that that was going to be resolved in the future.

Copy link
Contributor

@OmarTawfik OmarTawfik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe there are a few root labels in sub-trees still. For example, root꞉ Comma in
crates/solidity/testing/snapshots/cst_output/ContractMembers/separated_recovery/generated/0.6.0-failure.yml.

Maybe to catch such cases, we can add a check in crates/solidity/outputs/cargo/tests/src/cst_output/runner.rs to make sure we never put Label::default() in sub-trees. Thoughts?

let cursor = output.create_tree_cursor();
while cursor.go_to_next() {
    assert!(!cursor.label.is_default());
}

…the current node is the root and if it has the default label

* Use these utility functions in `cst_output/runner.rs` to assert that child nodes never have the default label (`root`).
@mjoerussell
Copy link
Contributor Author

I added the assertion but haven't fixed the issue, so some tests should be failing right now.

I've found some places where the Edge::root constructor is used, but I'm not sure how to determine what labels should be used instead. All of these calls to Edge::root occur in the context of building or reading a ParseResult, for example this code in lexer/mod.rs.

Using this I narrowed down the failures to three test cases:

contract_members::separated_recovery
expression::incomplete_operand
source_unit::pratt_precedence_recovery

These all occur in the context of a parsing failure, and the node given the root label is the node joining a successful match and a failure. My suspicion is that somewhere in error recovery we're creating a new subtree, and the root of that subtree is being given the root edge label. I just haven't been able to find exactly where that's happening yet.

@OmarTawfik do you have any thoughts on these?

@OmarTawfik
Copy link
Contributor

@mjoerussell Thanks for the analysis! I think there are two separate issues here:

For contract_members::separated_recovery, I wonder if it is a case of NOT applying separator_label in SeparatedHelper to the separator token when we recover?

For expression::incomplete_operand and source_unit::pratt_precedence_recovery, looks like you uncovered a bug in error recovery in the PRATT parser. We are not creating the right structure for the specific Expression variant.

For example, in incomplete_operand, we should be (idially) creating the parent Expression, with a single child MultiplicativeExpression (the specific expression variant), which would have three children:

  1. left_operand: Expression with a single DecimalNumberExpression child
  2. operand: Asterisk
  3. right_operand: Expression with a single NewExpression child

But instead, because of the current implementation, the three children are promoted to the top Expression, and the child MultiplicativeExpression only has the operator node:

  - (root꞉ Expression): # "2 * new\n" (0..8)
      - (variant꞉ DecimalNumberExpression) ► (literal꞉ DecimalLiteral): "2" # (0..1)
      - (root꞉ MultiplicativeExpression): # " *" (1..3)
          - (leading_trivia꞉ Whitespace): " " # (1..2)
          - (operator꞉ Asterisk): "*" # (2..3)
      - (variant꞉ NewExpression): # " new\n" (3..8)
          - (leading_trivia꞉ Whitespace): " " # (3..4)
          - (new_keyword꞉ NewKeyword): "new" # (4..7)
          - (trailing_trivia꞉ EndOfLine): "\n" # (7..8)
      - (missing꞉ MISSING): "" # (8..8)

I'm not sure what is the exact fix for PrecedenceHelper here, but happy to brainstorm/pair on this if you would like.

@mjoerussell
Copy link
Contributor Author

For contract_members::separated_recovery, I wonder if it is a case of NOT applying separator_label in SeparatedHelper to the separator token when we recover?

This was exactly the issue, and it was a pretty simple fix to get this case working again.

I'm not sure what is the exact fix for PrecedenceHelper here, but happy to brainstorm/pair on this if you would like.

I don't think that there's an issue with PrecedenceHelper, I think that the issue is actually in SequenceHelper. In SequenceHelper PrattOperatorMatch's get flattened when they're succeeded by a bad match. PrattElement::into_nodes creates an Edge::root when extracting the inner nodes of the element. This adds the root label that we're seeing, and it also prevents PrecedenceHelper from correctly reducing the expression.

@mjoerussell
Copy link
Contributor Author

@OmarTawfik Diving into SequenceHelper and PrattElement-related things is becoming a bigger rabbit hole than I'd expected. I'm not sure if fixing this bug is in the scope of this PR.

If it's OK with you, I can fix the current errors by forcing the flattened pratt expressions to be given a different label, perhaps Unrecognized, and we can return to the issue at a later date. Since this situation only occurs in the context of error recovery, I think that it should not impact parsing in general.

…root node of the flattened expression the label `Unrecognized` instead of `Root`.
@OmarTawfik
Copy link
Contributor

Since the current labels we produce are already inaccurate, so changing it to Unrecognized seems like an improvement, as long as this only affects invalid trees, and not correct parses.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

make EdgeLabel required in the CST
2 participants