Suppress output of unnecessary nodes #8

Open
oovm opened this issue Oct 20, 2021 · 10 comments
Labels
enhancement New feature or request

Comments

@oovm
Contributor

oovm commented Oct 20, 2021

This is my grammar file:

https://github.com/oovm/lrpeg-test/blob/master/projects/lrpeg/src/ygg.peg

This is the parse output for x=0:

https://github.com/oovm/lrpeg-test/blob/95b032c9ca0fd46c39ea31edf480d41eaecdec1b/projects/lrpeg/tests/assign.yaml#L23-L34

I don't understand why there are two Terminal nodes here, or why the statement nodes I need are wrapped in their children.

As I understand it, a Terminal is ε, a string, or a regex, so it should have no children.

@seanyoung
Owner

You are right, this looks broken.

@seanyoung
Owner

(statement IGNORE)* becomes a node in the tree, which is incorrectly labelled Terminal. I've just pushed a change which labels such nodes List (amongst others). Let me know how this works for you.

@oovm
Contributor Author

oovm commented Oct 20, 2021

I'm just translating from pest:

program  = {SOI ~ statement* ~ EOI}
vs
program  <- IGNORE (statement IGNORE)* EOI;
// IGNORE means anything that can be skipped
IGNORE <- space* / newline* / comment?

a ~ b   =>  a IGNORE b
a ~ b?  =>  a IGNORE b?
a ~ b*  =>  a IGNORE (b IGNORE)*
a ~ b+  => a ~ b ~ b*

Judging from the results, pest does not generate additional nodes.
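
The translation rules listed above can be sketched as a small Rust function. This is only an illustration: the Operand type and the desugar function are hypothetical names, not part of pest or lrpeg, and the b+ case is handled by applying the b rule and then the b* rule, as the last line of the table suggests.

```rust
/// The right-hand operand of a pest `a ~ ...` sequence, with its
/// repetition suffix. Hypothetical type, for illustration only.
#[derive(Debug, Clone, PartialEq)]
enum Operand {
    One(String),        // b
    Optional(String),   // b?
    ZeroOrMore(String), // b*
    OneOrMore(String),  // b+
}

/// Rewrites `a ~ operand` into a PEG sequence with explicit IGNORE.
fn desugar(a: &str, operand: &Operand) -> String {
    match operand {
        // a ~ b   =>  a IGNORE b
        Operand::One(b) => format!("{} IGNORE {}", a, b),
        // a ~ b?  =>  a IGNORE b?
        Operand::Optional(b) => format!("{} IGNORE {}?", a, b),
        // a ~ b*  =>  a IGNORE (b IGNORE)*
        Operand::ZeroOrMore(b) => format!("{} IGNORE ({} IGNORE)*", a, b),
        // a ~ b+  =>  a ~ b ~ b*, then apply the two rules above in turn
        Operand::OneOrMore(b) => {
            let head = desugar(a, &Operand::One(b.clone()));
            desugar(&head, &Operand::ZeroOrMore(b.clone()))
        }
    }
}
```

Under these assumptions, a ~ b+ expands to a IGNORE b IGNORE (b IGNORE)*, and every parenthesised group is where an extra List-style node would appear in the tree.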

@oovm
Contributor Author

oovm commented Oct 20, 2021

Okay, that makes sense. After I traverse the tree once and flatten it, the result is correct.

@seanyoung
Owner

lrpeg does generate too many nodes, and many of them carry no useful information.

Does pest have a way of marking a rule/line as "do not generate nodes for this", or is it clever in some other way?

@oovm
Contributor Author

oovm commented Oct 20, 2021

In my opinion, whether a node is useful depends on the purpose. My classification is:

  • Useless: hard to think of a use
    • ε, EOI
  • Ignored: formatters need these
    • comment, space, newline
  • Unnamed (weak semantics): macros need these
    • keywords, brackets, operators
  • Effective semantics:
    • everything else

Following this classification, my filter looks like this:

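/// Rebuilds `node` with wrapper and unwanted descendants filtered out.
pub fn flatten(node: Node) -> Node {
    let mut buffer = vec![];
    for child in node.children {
        flatten_rec(child, &mut buffer);
    }
    Node {
        rule: node.rule,
        start: node.start,
        end: node.end,
        children: buffer,
        alternative: node.alternative,
    }
}

/// Pushes `node` onto `buffer`, or its descendants if `node` is a wrapper.
pub fn flatten_rec(node: Node, buffer: &mut Vec<Node>) {
    match node.rule {
        // Wrapper nodes: splice their children into the parent.
        Rule::Any | Rule::List => {
            for child in node.children {
                flatten_rec(child, buffer);
            }
        }
        // Useless: never kept.
        Rule::EOI => {}
        // Ignored: always dropped with "no-ignored", otherwise only when empty.
        #[cfg(feature = "no-ignored")]
        Rule::IGNORE => {}
        #[cfg(not(feature = "no-ignored"))]
        Rule::IGNORE if node.start == node.end => {}
        // Unnamed: always dropped with "no-unnamed", otherwise only when empty.
        #[cfg(feature = "no-unnamed")]
        Rule::Terminal => {}
        #[cfg(not(feature = "no-unnamed"))]
        Rule::Terminal if node.start == node.end => {}
        // Effective semantics: keep the node and flatten its subtree.
        _ => buffer.push(flatten(node)),
    }
}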

@seanyoung
Owner

How can the parser generator decide which nodes to create and which to skip?

@seanyoung
Owner

We could take inspiration from pest and not create nodes for rules whose names start with an underscore.

@seanyoung seanyoung added the enhancement New feature or request label Oct 23, 2021
@seanyoung seanyoung changed the title Qustion about children of Terminal Suppress output of unnecessary nodes Oct 23, 2021
@oovm
Contributor Author

oovm commented Oct 27, 2021

It sounds like a feasible design, but if a node is hidden, what will hold the label and alternative attached to it?

E.g., what is the result of parsing 1+2 under these rules:

expr <- _expr0;
_expr0 <- 
    add:/ <lhs:_expr0> ("+"/"-") <rhs:_expr1>
        / _expr1;
_expr1 <-
    mul:/ <lhs:_expr1> ("*"/"/") <rhs:_expr2>
        / _expr2;
_expr2 <- 
    pow:/ <lhs:_expr3> "^" <rhs:_expr2>
        / _expr3;
_expr3 <- num:/num;

@seanyoung
Owner

So the idea is that if a node is hidden, its parent inherits its (non-hidden) children. So lhs and rhs are in the parse tree, even though _expr0 is not.

That does leave the question of what happens to the alternative, though.
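
The inheritance idea in this comment can be sketched as a post-processing pass in Rust. This is a minimal sketch under stated assumptions, not lrpeg's implementation: the Node struct here is a simplified stand-in that identifies rules by name, and is_hidden, hide, and splice are hypothetical helpers. It deliberately drops the label and alternative fields, so it does not answer the open question about where a hidden node's alternative should go.

```rust
/// Simplified stand-in for a parse-tree node (hypothetical; the real
/// generated node also carries positions and an alternative).
#[derive(Debug, Clone, PartialEq)]
struct Node {
    rule: String,
    children: Vec<Node>,
}

/// Assumed convention: rules whose names start with `_` are hidden.
fn is_hidden(rule: &str) -> bool {
    rule.starts_with('_')
}

/// Returns `node` with every hidden descendant removed. The visible
/// descendants of a hidden node are spliced into its nearest visible
/// ancestor. The root itself is always kept, hidden or not.
fn hide(node: Node) -> Node {
    let mut children = Vec::new();
    for child in node.children {
        splice(child, &mut children);
    }
    Node { rule: node.rule, children }
}

/// Pushes `node` onto `out` if it is visible; otherwise pushes its
/// visible descendants in document order.
fn splice(node: Node, out: &mut Vec<Node>) {
    if is_hidden(&node.rule) {
        for child in node.children {
            splice(child, out);
        }
    } else {
        out.push(hide(node));
    }
}
```

With a tree such as expr containing _expr0, which in turn contains lhs and a hidden _expr1 wrapping rhs, hide leaves expr with the two children lhs and rhs.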
