Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pratt parsing support #131

Open
2 tasks done
epage opened this issue Feb 7, 2023 · 33 comments · May be fixed by #614
Open
2 tasks done

Pratt parsing support #131

epage opened this issue Feb 7, 2023 · 33 comments · May be fixed by #614
Labels
A-combinator Area: combinators C-enhancement Category: Raise on the bar on expectations E-help-wanted Call for participation: Help is requested to fix this issue.

Comments

@epage
Copy link
Collaborator

epage commented Feb 7, 2023

Please complete the following tasks

winnow version

0.2.0

Describe your use case

This would be a performant way of handling precedence in parsing

See zesterer/chumsky#51

Describe the solution you'd like

See

Alternatives, if applicable

No response

Additional Context

No response

@epage epage added A-combinator Area: combinators E-help-wanted Call for participation: Help is requested to fix this issue. C-enhancement Category: Raise on the bar on expectations labels Feb 7, 2023
@epage
Copy link
Collaborator Author

epage commented Nov 14, 2023

For more technical details, see

@Zij-IT
Copy link

Zij-IT commented Jan 28, 2024

Heyyo! I was the one to handle a significant portion of the Pratt parser implementation that is available in chumsky :)

I would have to familiarize myself with how y'all do things here at Winnow, but I would be happy to take a crack at getting a Pratt parsing implementation going for Winnow.

Do you have an idea already for an API that you would prefer, or guidelines on how to keep the API winnow-style :D I'd love to contribute to getting some cross-pollination going between two great parsing crates!

@epage
Copy link
Collaborator Author

epage commented Jan 29, 2024

That would be greatly appreciated!

I have no formal background in parsing and only know of Pratt Parsing from matklad's posts and then your work with chumksy. I'm not even sure where or if I would use it in my own work. So its hard for me to give too specific of guidance on the API.

Overall, the philosophical divergence between chumsky and winnow is that chumsky is a framrwork while winnow is a toolbox. Chumsky owns the process and users plug into that (e.g. users are expected to not implement the Parser trait). Winnow provides common primitives and allows users to build up their parser out of that, as needed. We freely encourage mixing of imperative and functional / declarative parsers. Unsure how that might apply but thought I'd point that out.

One challenge I repeatedly run into is that when we give too high level of a helper, it also constrains the user. A trivial example of this is Parser::try_map which doesn't let you cut-to-top the error from try_map separate from the error from the parser it wraps (#180).

@Zij-IT
Copy link

Zij-IT commented Jan 30, 2024

No worries! I'll do what I can to explain what the plan is before jumping on the solution so that we can get at one that you feel both fits into winnow, matches it's approach and is performant!

Okay, so before getting into the implementation itself, would something like the following be acceptable? Below I have different variations of about the same concept listed. Which ones do you find more "winnow-ish"

Operator Parsers


For Pratt parsing there are three important pieces of information that are required in order to properly parse an operator:

  1. Its associativity (right or left)
  2. Its strength (how tightly it binds. e.g. + binds weaker than *)
  3. Whether its prefix, postfix or infix

To this end, its important to discuss how an operator parser should look like. Here are a couple ideas that I have:

  1. So, the first idea would be to take the implementation from Chumsky, and adapt the implementation for winnow. That would look like this:

    // Pretend that `unary` and `binary` create expressions of one type `Expr`
    prefix(Right(1), '-', |r| unary(r, Op::Neg));
    infix(Left(0), '+', |l, r| binary(l, r, Op::Add));
    postfix(Right(3), '!', |r| unary(r, Op::Fact));
  2. Another idea would be to add infix, prefix and postfix methods onto Parser instead of free-standing functions:

    '-'.prefix(Right(1), |r| unary(r, Op::Neg));
    '+'.infix(Left(0), |l, r| binary(l, r, Op::Add));
    '!'.prefix(Right(3), |r| unary(r, Op::Fact));

Pratt Parser


Assuming that you prefer the second option, here is an idea for how it would look like with a pratt method provided by the Parser trait. pratt could also be free-standing function which accepts two parsers: One for the "atom" expression and one for the operators.

// Allowing all the operators to be grouped in one tuple
let atom = digits1.map(Expr::Int);
let calc = atom.pratt((
    '-'.prefix(Right(1), |r| unary(r, Op::Neg));
    '+'.infix(Left(0), |l, r| binary(l, r, Op::Add));
    '!'.prefix(Right(3), |r| unary(r, Op::Fact));
));

General things


Would I be correct in assuming that this would be best put behind a feature "pratt", or would you want this to be included by default?

@epage
Copy link
Collaborator Author

epage commented Jan 30, 2024

Its strength (how tightly it binds. e.g. + binds weaker than *)

So this is "precedence", right?

Right(1),

I assume this is associativity and strength combined. Why have them together?

pratt could also be free-standing function which accepts two parsers:

This reminds me that I forgot to list one of the design principles. Grammar-level functionality (ie things you'd see in BNF like literals, repetition, alternatives) are free functions and Parser trait methods are reserved for adapting to the application domain (along with some grammar-level error reporting like Parser::try_map).

So I assume we'd go with

let calc = pratt(
    digits1.map(Expr::Int),
    (
        '-'.prefix(Right(1), |r| unary(r, Op::Neg));
        '+'.infix(Left(0), |l, r| binary(l, r, Op::Add));
        '!'.prefix(Right(3), |r| unary(r, Op::Fact));
    )
);
  • Looking at chumsky, prefix/postfix don't use an associativity, only strength. Thoughts on that?
  • I find their functions wrapping the Associativity enum odd. I guess just to force everything to be lower case?
  • Looks like chumsky accepts multiple function signatures for fold expressions. Is the assumption that this would do the same?

Naming is hard. I'm finding that I prefer names that works at the grammar-level like with #440. I've also had bad experiences looking at graph libraries and not being able to find what I want because they are named after their algorithm and not their purpose. Of course, there is the challenge of multiple algorithms that can meet the same purpose with different criteria, so I'm not too sure how to balance the two.

Some tools we have to help are

  • #[doc(alias = "pratt")] (if we don't name it pratt)
  • #[doc(alias = "separated")]
  • Cross-linking separated to pratt
  • Ensuring the doc-comment summary focuses on the intent

As an aside, when it comes to the combinator summary, feel free to leave all the entries blank because this won't fit into such a simple summary.

Would I be correct in assuming that this would be best put behind a feature "pratt", or would you want this to be included by default?

You could start it under unstable-pratt which gives us time to answer questions like that and lowers the bar for what the initial API looks like.

For when we get there, is there a reason to have this be a separate feature?

@Zij-IT
Copy link

Zij-IT commented Jan 30, 2024

So this is "precedence", right?

Yup!

I assume this is associativity and strength combined. Why have them together?

That's just how it was done with Chumsky. It would be totally fine to have another parameter for the strength.

Looking at chumsky, prefix/postfix don't use an associativity, only strength. Thoughts on that?

I just totally spaced that, and should have removed it from prefix and postfix.

I find their functions wrapping the Associativity enum odd. I guess just to force everything to be lower case?

This choice is largely just keystrokes I believe. It is indeed a bit odd in hindsight 😅

Looks like chumsky accepts multiple function signatures for fold expressions. Is the assumption that this would do the same?

It can be if that would be desired. It would just require an intermediate trait as in the Chumsky impl. It would also be possible to just restrict it to something in the form Fn(O, Op) -> O for pre- or postfix and Fn(O, Op, O) -> O for infix. I don't know if winnow has an equivalent to MapExtra that would be meaningful to pass in.

For when we get there, is there a reason to have this be a separate feature?

Not any that I can think of. It should be a rather small addition, but I wanted to check :) Chumsky has it under a flag, so I wanted to check.

@epage
Copy link
Collaborator Author

epage commented Jan 30, 2024

It can be if that would be desired. It would just require an intermediate trait as in the Chumsky impl. It would also be possible to just restrict it to something in the form Fn(O, Op) -> O for pre- or postfix and Fn(O, Op, O) -> O for infix. I don't know if winnow has an equivalent to MapExtra that would be meaningful to pass in.

We don't have MapExtra at this time.

I'm fine with a trait being used for this. I'm considering doing similar in other places.

This has been a great conversation however if we start with it under unstable-pratt, we can move forward with whatever and iterate on it over time without affecting semver.

@epage
Copy link
Collaborator Author

epage commented Jul 15, 2024

https://gist.github.com/ilonachan/3d92577265846e5327a3011f7aa30770 is interesting to consider which I came across via rust-bakery/nom#1362

@39555 39555 linked a pull request Nov 12, 2024 that will close this issue
8 tasks
@epage
Copy link
Collaborator Author

epage commented Nov 12, 2024

Trying to collect all styles of "pratt" parsers for easier comparison of their API

chumksy::pratt

let atom = text::int::<_, _, extra::Err<Simple<char>>>(10)
    .from_str()
    .unwrapped()
    .map(Expr::Literal)
    .padded();

let op = |c| just(c).padded();

let expr = atom.pratt((
    // We want factorial to happen before any negation, so we need its precedence to be higher than `Expr::Neg`.
    postfix(4, op('!'), |lhs| Expr::Factorial(Box::new(lhs))),
    // Just like in math, we want that if we write -x^2, our parser parses that as -(x^2), so we need it to have
    // exponents bind tighter than our prefix operators.
    infix(right(3), op('^'), |l, r| Expr::Pow(Box::new(l), Box::new(r))),
    // Notice the conflict with our `Expr::Sub`. This will still parse correctly. We want negation to happen before
    // `+` and `-`, so we set its precedence higher.
    prefix(2, op('-'), |rhs| Expr::Neg(Box::new(rhs))),
    prefix(2, op('*'), |rhs| Expr::Deref(Box::new(rhs))),
    // Our `-` and `+` bind the weakest, meaning that even if they occur first in an expression, they will be the
    // last executed.
    infix(left(1), op('+'), |l, r| Expr::Add(Box::new(l), Box::new(r))),
    infix(left(1), op('-'), |l, r| Expr::Sub(Box::new(l), Box::new(r))),
))
    .map(|x| x.to_string());

combine-language::expression_parser

fn op(l: Expr, o: &'static str, r: Expr) -> Expr {
    Op(Box::new(l), o, Box::new(r))
}

let op_parser = string("+").or(string("*"))
    .map(|op| {
        let prec = match op {
            "+" => 6,
            "*" => 7,
            _ => unreachable!()
        };
        (op, Assoc { precedence: prec, fixity: Fixity::Left })
    })
    .skip(spaces());
let term = many(letter())
    .map(Id)
    .skip(spaces());
let mut parser = expression_parser(term, op_parser, op);

proposed nom::precedence (rust-bakery/nom#1362)

  precedence(
    unary_op(1, tag("-")), //prefix operators
    fail, //postfix operators
    alt(( //binary operators
      binary_op(2, Assoc::Left, tag("*")),
      binary_op(2, Assoc::Left, tag("/")),
      binary_op(3, Assoc::Left, tag("+")),
      binary_op(3, Assoc::Left, tag("-")),
    )),
    alt(( //operands
      map_res(digit1, |s: &str| s.parse::<i64>()),
      delimited(tag("("), parser, tag(")")), //subexpression handled via recursion
    )),
    |op: Operation<&str, &str, &str, i64>| { //evaluating the expression step by step
      use nom::precedence::Operation::*;
      match op {
        Prefix("-", o) => Ok(-o),
        Binary(lhs, "*", rhs) => Ok(lhs * rhs),
        Binary(lhs, "/", rhs) => Ok(lhs / rhs),
        Binary(lhs, "+", rhs) => Ok(lhs + rhs),
        Binary(lhs, "-", rhs) => Ok(lhs - rhs),
        _ => Err("Invalid combination"),
      }
    }
  )(i)

https://gist.github.com/ilonachan/3d92577265846e5327a3011f7aa30770

            pratt(
                atom,
                alt((
                    binary_op(tag("="), 2, 1, map_infix),
                    ternary_op,
                    binary_op(tag("+").or(tag("-")), 5, 6, map_infix),
                    binary_op(tag("*").or(tag("/")), 7, 8, map_infix),
                    binary_op(tag("."), 14, 13, map_infix),
                )),
                alt((
                    unary_op(tag::<&str, &str, E>("+").or(tag("-")), 9, map_prefix),
                    group_op,
                )),
                alt((unary_op(tag("!"), 11, map_postfix), index_op)),
                start_bp,
            )
            .parse(i)

Initial #614

precedence(
    digit1.try_map(|d: &str| d.parse::<i32>()),
    (
        "-".value(2).prefix(|x| -1 * x),
        "+".value(2).prefix(|x| x),
        "!".value(2).postfix(|x| factorial(x)),
        "+".value(0).infix(|a, b| a + b),
        "-".value(0).infix(|a, b| a + b),
        "*".value(1).infix(|a, b| a * b),
        "/".value(1).infix(|a, b| a / b),
    ),
)
.parse_next(i)

@39555
Copy link

39555 commented Nov 17, 2024

Found another nice interface https://docs.rs/pest/latest/pest/pratt_parser/struct.PrattParser.html. Unfortunately all map closures are boxed.

let pratt =
    PrattParser::new()
        .op(Op::infix(Rule::add, Assoc::Left) | Op::infix(Rule::sub, Assoc::Left))
        .op(Op::infix(Rule::mul, Assoc::Left) | Op::infix(Rule::div, Assoc::Left))
        .op(Op::infix(Rule::pow, Assoc::Right))
        .op(Op::prefix(Rule::neg))
        .op(Op::postfix(Rule::fac));

    
pratt
        .map_primary(|primary| match primary.as_rule() {
            Rule::int  => primary.as_str().parse().unwrap(),
            Rule::expr => parse_expr(primary.into_inner(), pratt), // from "(" ~ expr ~ ")"
            _          => unreachable!(),
        })
        .map_prefix(|op, rhs| match op.as_rule() {
            Rule::neg  => -rhs,
            _          => unreachable!(),
        })
        .map_postfix(|lhs, op| match op.as_rule() {
            Rule::fac  => (1..lhs+1).product(),
            _          => unreachable!(),
        })
        .map_infix(|lhs, op, rhs| match op.as_rule() {
            Rule::add  => lhs + rhs,
            Rule::sub  => lhs - rhs,
            Rule::mul  => lhs * rhs,
            Rule::div  => lhs / rhs,
            Rule::pow  => (1..rhs+1).map(|_| lhs).product(),
            _          => unreachable!(),
        })
        .parse(pairs)
}

@epage
Copy link
Collaborator Author

epage commented Nov 18, 2024

For anyone subscribed to this but not following along, @39555 has been doing a lot of exploration for implementation and API.

In particular, we've been looking at recursive vs iterative. Recursion is generally faster but hides the recursive nature and prevents people from doing stack overflow prevention.

See

@epage
Copy link
Collaborator Author

epage commented Nov 18, 2024

In #614 (comment), @39555 called out a missing feature in most of the above pratt parsing APIs: the ability to parse within an operator.. https://gist.github.com/ilonachan/3d92577265846e5327a3011f7aa30770 is the only one I saw that handles this and uses it for ternary opeator, indexing, and parenthesis.

@39555 couldn't that be handled by changing

            delimited(multispace0, alt((
                dispatch! {any;
                    '!' => empty.value((20, (|_: &mut _, a| Ok(Expr::Fac(Box::new(a)))) as _)),
                    '?' => empty.value((3, (|i: &mut &str, cond| {

to

            delimited(multispace0, alt((
                dispatch! {any;
                    '!' => empty.value((20, (|_: &mut _, a| Ok(Expr::Fac(Box::new(a)))) as _)),
                    '?' => cut_err(seq!(
                        _: multispace0,
                        pratt_parser,
                        _: multispace0,
                        _: ':',
                        _: multispace0, 
                        pratt_parser,
                        _: multispace0,
                    )).value((3, (|i: &mut &str, cond| {

@39555
Copy link

39555 commented Nov 18, 2024

@epage There are 2 Problems. We can't capture results from the left and right parsers. It is unusual for the operator to produce additional operands.

'?' => cut_err(seq!(
                        _: multispace0,
                        pratt_parser,
                        _: multispace0,
                        _: ':',
                        _: multispace0, 
                        pratt_parser,
                        _: multispace0,
                    )).map(|(left, right)| (3, &(move |cond| {Expr::Ternary(Box::new(cond), Box::new(left), Box::new(right))})
  • The closure becomes FnOnce here, and since all closures are &dyn, I can't call it because calling a function moves FnOnce, and moving is not possible from the reference &dyn.

  • If it were something like |(left, right)| (3, &(move |cond| { Expr::Ternary(Box::new(cond), Box::new(left.clone()), Box::new(right.clone())) }), then the closure would be locally created, and we would return a reference to the local value."

@epage
Copy link
Collaborator Author

epage commented Nov 18, 2024

Wouldn't the "fold map" get

  • left: condition
  • right: (true_expr, false_expr)

So no state is needed and we can use Fn?

@39555
Copy link

39555 commented Nov 18, 2024

If I understand correctly, you propose implementing the ternary operator as an infix operator, so it would be (cond) ? (left : right)? In that case, the right operand would actually be unconstrained from ?, allowing incorrect syntax like 1 + 3 : 2.

Or do you mean if we could return whatever the parser produced back to the closure? This whatever would need to be the same type. But maybe it’s possible and a good idea!

@epage
Copy link
Collaborator Author

epage commented Nov 19, 2024

Or do you mean if we could return whatever the parser produced back to the closure? This whatever would need to be the same type. But maybe it’s possible and a good idea!

This. I suspect its a big if on making the fold accept the associated parser output rather than enforcing all of the fold's to be the same.

@39555
Copy link

39555 commented Nov 20, 2024

@epage
Copy link
Collaborator Author

epage commented Nov 20, 2024

From #620 (comment)

It is hard to beat the application stack in terms of flexibility, platform support and speed. The recursion grows only with prefixes like -+--++-++--+1 and with right associative infix operators such as pow: 1 ** 2 ** 3 ** 4 ** 5 ** 7 ** 2 ** 1. I assume the user can track the recursion inside the parser and the fold function using the Stateful<> input, as thefold occurs only after all the operands have been parsed.

|i: &mut Stateful<...>| {
  if i.state.recursion > 99 {
       return Err(...)
  }
  i.state.recursion +=1;
  ...
}.value((Right(20), |i: _, a: _, b:_ | {
     i.state.recursion -= 1;
     a + b
}))

@epage
Copy link
Collaborator Author

epage commented Nov 20, 2024

That is true that users can limit recursion on their side. We will need a warning block in our documentation that recursion happens and they may consider checking for recursion depth. Not as helpful as using the heap but likely good enough.

So I'm fine with moving forward with recusion.

So that just leaves which API design we should go for, right?

@39555
Copy link

39555 commented Nov 20, 2024

Yes. I hope we will find something nice and winnowish. The current interface is practical (it supports dispatch, recursion, custom allocators, error reporting) and somewhat understandable if the operators are wrapped in trace. While building the example, I started to like it. It is much better than whatever combine-language has. It is clearer than chumsky and nom in terms of how operators come first and form a column, making it easy to spot each one.

My concerns are:

  • Sometimes, when something is incorrect in one of the many fold closures, the compiler produces unclear error messages for the entire parser instead of pointing to the specific place.
  • A fold closure receives the mutable input along with its arguments, which is unusual for winnow. Moreover, it returns a Result, behaving somewhat between map, try_map, and a parser.
  • |_|{} as _
  • Not important: The separated arguments for prefix, postfix, and infix are nice, but while working on the example, I found myself wanting to sort operators by precedence (similar to the reference table) instead of jumping back and forth. However, I guess it's not possible or is difficult to achieve while keeping everything simple.

@epage
Copy link
Collaborator Author

epage commented Nov 20, 2024

Taking the current API from #620

pub fn precedence<I, ParseOperand, ParseInfix, ParsePrefix, ParsePostfix, Operand, E>(
    start_power: i64,
    mut operand: ParseOperand,
    mut prefix: ParsePrefix,
    mut postfix: ParsePostfix,
    mut infix: ParseInfix,
) -> impl Parser<I, Operand, E>
where
    I: Stream + StreamIsPartial,
    ParseOperand: Parser<I, Operand, E>,
    ParseInfix: Parser<I, (Assoc, fn(&mut I, Operand, Operand) -> PResult<Operand, E>), E>,
    ParsePrefix: Parser<I, (i64, fn(&mut I, Operand) -> PResult<Operand, E>), E>,
    ParsePostfix: Parser<I, (i64, fn(&mut I, Operand) -> PResult<Operand, E>), E>,
    E: ParserError<I>;

#[derive(Debug, Clone, Copy)]
pub enum Assoc {
    Left(i64),
    Right(i64),
    Neither(i64),
}
  • I am currently leaning towards the name expression
  • I'd prefer for the trace to be inside expression rather than requiring users to do it to have readable code
  • Keeping the scope smaller for each part and being more specific would probably help (the unclear error messages, the as _ coercion)

Those last two have me leaning to go ahead with methods on an Expression type with struct-tuples for each parser's return type. The struct tuples will hopefully do the coercion of functions and make the generic bounds clearer.

Maybe we could impl Parser for <struct-tuple> that takes care of empty.value(...) for users.

I do wonder about Assoc not having any value part of it like rust-bakery/nom#1362 so the precedence levels all line up for easier visibility.

For the behavior of the "fold", I wish there was a sane default which would let us make this a method, allowing us to pick a specific variant. A nasty idea is we could use tacit traits / marker trait specialization. People I approach about solving problems with tacit traits usually get turned off by the amount of generics complexity making it less approachable.

Not important: The separated arguments for prefix, postfix, and infix are nice, but while working on the example, I found myself wanting to sort operators by precedence (similar to the reference table) instead of jumping back and forth. However, I guess it's not possible or is difficult to achieve while keeping everything simple.

This does sadden me but between dispatch! and this, I lean towards dispatch!. There might be a way to make both work but I doubt it will be worth it.

@39555
Copy link

39555 commented Nov 20, 2024

Thanks for the tacit traits article! I will give it a try.

Those last two have me leaning to go ahead with methods on an Expression type with struct-tuples for each parser's return type. The struct tuples will hopefully do the coercion of functions and make the generic bounds clearer.

Do you mean

expression().prefix().infix().postfix() // or
expr = expression();
expr.prefix(dispatch!{...}); // or 
struct Infix(i64, Assoc, fn(...) -> ...);
expr.infix(Infix(dispatch{...}))

?

@epage
Copy link
Collaborator Author

epage commented Nov 20, 2024

expression(digit1.try_map(|d: &str| d.parse::<i32>()))
    .prefix(dispatch! {any;
        '+' => Prefix(9, |_: &mut _, a| Ok(a)),
        '-' => Prefix(9, |_: &mut _, a: i32| Ok(-a)),
        _ => fail
    })
    .infix(dispatch! {any;
        '+' => Infix(5, Assoc::Left, |_: &mut _, a, b| Ok(a + b)),
        '-' => Infix(5, Assoc::Left, |_: &mut _, a, b| Ok(a - b)),
        '*' => Infix(7, Assoc::Left, |_: &mut _, a, b| Ok(a * b)),
        '/' => Infix(7, Assoc::Left, |_: &mut _, a, b| Ok(a / b)),
        '%' => Infix(7, Assoc::Left, |_: &mut _, a, b| Ok(a % b)),
        '^' => Infix(9, Assoc::Right, |_: &mut _, a, b| Ok(a ^ b)),
        _ => fail
    })
    .parse_next(i)
  • It is redundant with infix / Infix but so would a method for removing as _
  • I wish we could default Assoc so it only shows up when its Right / Neither because that seems like the most common case

Maybe Infix could the enum

expression(digit1.try_map(|d: &str| d.parse::<i32>()))
    .prefix(dispatch! {any;
        '+' => Prefix(9, |_: &mut _, a| Ok(a)),
        '-' => Prefix(9, |_: &mut _, a: i32| Ok(-a)),
        _ => fail
    })
    .infix(dispatch! {any;
        '+' => Infix::Left(5, |_: &mut _, a, b| Ok(a + b)),
        '-' => Infix::Left(5, |_: &mut _, a, b| Ok(a - b)),
        '*' => Infix::Left(7, |_: &mut _, a, b| Ok(a * b)),
        '/' => Infix::Left(7, |_: &mut _, a, b| Ok(a / b)),
        '%' => Infix::Left(7, |_: &mut _, a, b| Ok(a % b)),
        '^' => Infix::Right(9, |_: &mut _, a, b| Ok(a ^ b)),
        _ => fail
    })
    .parse_next(i)
  • Users could use winnow::combiantor::Infix::Left as ... to alias it for short
  • Haven't thought through the impact on implementation
  • Doesn't have the full consistency for precedence levels that I was originally hoping for but syntax highlighting helps

Seeing the builder being used, makes me also wonder about

operand(digit1.try_map(|d: &str| d.parse::<i32>()))
    .prefix(dispatch! {any;
        '+' => Prefix(9, |_: &mut _, a| Ok(a)),
        '-' => Prefix(9, |_: &mut _, a: i32| Ok(-a)),
        _ => fail
    })
    .infix(dispatch! {any;
        '+' => Infix::Left(5, |_: &mut _, a, b| Ok(a + b)),
        '-' => Infix::Left(5, |_: &mut _, a, b| Ok(a - b)),
        '*' => Infix::Left(7, |_: &mut _, a, b| Ok(a * b)),
        '/' => Infix::Left(7, |_: &mut _, a, b| Ok(a / b)),
        '%' => Infix::Left(7, |_: &mut _, a, b| Ok(a % b)),
        '^' => Infix::Right(9, |_: &mut _, a, b| Ok(a ^ b)),
        _ => fail
    })
    .parse_next(i)
  • We name the parameter at each stage
  • Seems like the struct should be named Expression but this would be the first time a combinator doesn't match in function and struct names

@epage
Copy link
Collaborator Author

epage commented Nov 20, 2024

@39555 of course in all of this, feel free to share your opinions. If you think we should go in a different direction, I would love to hear your thoughts!

@39555
Copy link

39555 commented Nov 21, 2024

The as _ related to rust-lang/rust#58078

@39555
Copy link

39555 commented Nov 21, 2024

My experiments so far. I managed to get impl Parsers working so numbers are nicely aligned and Left is the default.(See .infix). But my attempt to support multiple signatures and eliminate the need for as _ has failed (see .prefix)

  expression(0, digit1.parse_to::<i32>())
      .prefix(dispatch! {any;
          // Cannot implement with tacit traits because the arguments of the closure are not recognized
          // when coercing to fn() without explicitly specifying the turbofish:
          // multiple applicable items in scope
          // candidate #1 is defined in an impl for the type `UnaryFn<fn(&mut I, O) -> PResult<O, E>>`
          // candidate #2 is defined in an impl for the type `UnaryFn<fn(O) -> O>`
          // Same with `UnaryFn<O>` or UnaryFn<(O,)>
          // '-' => UnaryFn::new(3, |_: &mut _, a: i32| Ok(-a)), // <- Error
           '+' => UnaryFn::<_>::new(3, |a: i32| -a),
          _ => fail
      })
      .infix(dispatch! {any;
         '+' => (5, (|_: &mut _, a, b| Ok(a + b)) as _),
         '-' => (5, (|_: &mut _, a, b| Ok(a - b)) as _),
         '*' => (7, (|_: &mut _, a, b| Ok(a * b)) as _),
         '/' => (7, (|_: &mut _, a, b| Ok(a / b)) as _),
         '%' => (7, (|_: &mut _, a, b| Ok(a % b)) as _),
         '^' => (9, Assoc::Right, (|_: &mut _, a, b| Ok(a ^ b)) as _),
         _ => fail
      })
      .parse_next(i)

impl<I: Stream, O, E: ParserError<I>> Parser<I, Infix<I, O, E>, E>
    for (i64, fn(&mut I, O, O) -> PResult<O, E>)
{
    fn parse_next(&mut self, input: &mut I) -> PResult<Infix<I, O, E>, E> {
        empty
            .value(Infix(self.0, Assoc::Left, self.1))
            .parse_next(input)
    }
}
impl<I: Stream, O, E: ParserError<I>> Parser<I, Infix<I, O, E>, E>
    for (i64, Assoc, fn(&mut I, O, O) -> PResult<O, E>)
{
    fn parse_next(&mut self, input: &mut I) -> PResult<Infix<I, O, E>, E> {
        empty.value(Infix(self.0, self.1, self.2)).parse_next(input)
    }
}

@epage
Copy link
Collaborator Author

epage commented Nov 21, 2024

Hmm, I feel like handling the two tuples like that will be confusing to users. The more layers through generics you go, the harder it is. This is the reason I'm hesitant to use tacit traits. I know chumsky documents each signature on each function that accepts them which helps.

Does Prefix(9, |_: &mut _, a| Ok(a)) / Infix::Left(5, |_: &mut _, a, b| Ok(a + b)) not need as _ and not need turbofish?

@39555
Copy link

39555 commented Nov 22, 2024

Works great #614 (comment)

use Infix::*;
expression(0, digit1.parse_to::<i32>())
    .prefix(dispatch! {any;
        '+' => Prefix(12, |_, a| Ok(a)),
        '-' => Prefix(12, |_, a: i32| Ok(-a)),
        _ => fail
    })
    .infix(dispatch! {any;
       '+' => Left(5, |_, a, b| Ok(a + b)),
       '-' => Left(5, |_, a, b| Ok(a - b)),
       '*' => Left(7, |_, a, b| Ok(a * b)),
       '/' => Left(7, |_, a, b| Ok(a / b)),
       '%' => Left(7, |_, a, b| Ok(a % b)),
       '^' => Right(9, |_, a, b| Ok(a ^ b)),
       _ => fail
    })
    .parse_next(i)

@epage
Copy link
Collaborator Author

epage commented Nov 22, 2024

I feel like I've been driving the API conversation at this point. Any thoughts of what could be better or done differently?

@39555
Copy link

39555 commented Nov 22, 2024

No. I tried what I wanted in #131 (comment). I think the current solution is much closer to the ideal than all previous ones. Maybe I will find something interesting over the weekend.

@39555
Copy link

39555 commented Nov 23, 2024

@epage What is your opinion about the api with one big fold in rust-bakery/nom#1362? Initially, I didn't like it, but now, after taking a closer look, it resembles the repeat parser, accumulate or fold.

@epage
Copy link
Collaborator Author

epage commented Nov 23, 2024

That would simplify the API and keep it consistent with other parts.

However, it would make it so each operator parser must return the same value which could make some of the nested parsing done with dispatch! more complex.

It also requires you re-match on the operator which will be slower and more error prone.

@39555
Copy link

39555 commented Nov 23, 2024

rereading the nom issue I found ilonachan's ideas quite interesting rust-bakery/nom#1362 (comment). A parser with only 2 arguments:

    def expression_parser() -> impl Parser<&str, i32, ContextError> {
        expression(
            dispatch!{any;
                // prefix
                '-' => |i: &mut &str| {
                    let op = parser.parse_next(i)?;
                    Ok(-op)
                },
                '+' => |i: &mut &str| {
                    let op = parser.parse_next(i)?;
                    Ok(op)
                },
                // operand
                _ => digit1.parse_to::<i32>()

            },
            move |a: i32| {
                dispatch!{any;
                    // postfix
                    '!' => |_| {
                        Ok(factorial(a))
                    },
                    '?' => |i: &mut &str| {
                        let (a, b) = separated_pair(parser, ':', parser).parse_next(i)?;
                        ...
                    },
                    // infix
                    '+' => |i: &mut &str| {
                        let b = parser.parse_next(i)?;
                        Ok(a + b)
                    },
                    _ => digit1.parse_to::<i32>()
                },
            }
        )
    }

+ User can control the recursion (or maybe they could use a Vec?).
+ No fn or Fn
+ more dispatch
- a lot of repetition for the infix
- I don't know how to control the precedence

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-combinator Area: combinators C-enhancement Category: Raise on the bar on expectations E-help-wanted Call for participation: Help is requested to fix this issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants