-
Notifications
You must be signed in to change notification settings - Fork 52
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Pratt parsing support #131
Comments
Heyyo! I was the one to handle a significant portion of the Pratt parser implementation that is available in chumsky :) I would have to familiarize myself with how y'all do things here at Winnow, but I would be happy to take a crack at getting a Pratt parsing implementation going for Winnow. Do you have an idea already for an API that you would prefer, or guidelines on how to keep the API winnow-style :D I'd love to contribute to getting some cross-pollination going between two great parsing crates! |
That would be greatly appreciated! I have no formal background in parsing and only know of Pratt Parsing from matklad's posts and then your work with chumksy. I'm not even sure where or if I would use it in my own work. So its hard for me to give too specific of guidance on the API. Overall, the philosophical divergence between chumsky and winnow is that chumsky is a framrwork while winnow is a toolbox. Chumsky owns the process and users plug into that (e.g. users are expected to not implement the One challenge I repeatedly run into is that when we give too high level of a helper, it also constrains the user. A trivial example of this is |
No worries! I'll do what I can to explain what the plan is before jumping on the solution so that we can get at one that you feel both fits into winnow, matches it's approach and is performant! Okay, so before getting into the implementation itself, would something like the following be acceptable? Below I have different variations of about the same concept listed. Which ones do you find more "winnow-ish" Operator ParsersFor Pratt parsing there are three important pieces of information that are required in order to properly parse an operator:
To this end, its important to discuss how an operator parser should look like. Here are a couple ideas that I have:
Pratt ParserAssuming that you prefer the second option, here is an idea for how it would look like with a // Allowing all the operators to be grouped in one tuple
let atom = digits1.map(Expr::Int);
let calc = atom.pratt((
'-'.prefix(Right(1), |r| unary(r, Op::Neg));
'+'.infix(Left(0), |l, r| binary(l, r, Op::Add));
'!'.prefix(Right(3), |r| unary(r, Op::Fact));
)); General thingsWould I be correct in assuming that this would be best put behind a feature |
So this is "precedence", right?
I assume this is associativity and strength combined. Why have them together?
This reminds me that I forgot to list one of the design principles. Grammar-level functionality (ie things you'd see in BNF like literals, repetition, alternatives) are free functions and So I assume we'd go with let calc = pratt(
digits1.map(Expr::Int),
(
'-'.prefix(Right(1), |r| unary(r, Op::Neg));
'+'.infix(Left(0), |l, r| binary(l, r, Op::Add));
'!'.prefix(Right(3), |r| unary(r, Op::Fact));
)
);
Naming is hard. I'm finding that I prefer names that works at the grammar-level like with #440. I've also had bad experiences looking at graph libraries and not being able to find what I want because they are named after their algorithm and not their purpose. Of course, there is the challenge of multiple algorithms that can meet the same purpose with different criteria, so I'm not too sure how to balance the two. Some tools we have to help are
As an aside, when it comes to the combinator summary, feel free to leave all the entries blank because this won't fit into such a simple summary.
You could start it under For when we get there, is there a reason to have this be a separate feature? |
Yup!
That's just how it was done with Chumsky. It would be totally fine to have another parameter for the strength.
I just totally spaced that, and should have removed it from
This choice is largely just keystrokes I believe. It is indeed a bit odd in hindsight 😅
It can be if that would be desired. It would just require an intermediate trait as in the Chumsky impl. It would also be possible to just restrict it to something in the form
Not any that I can think of. It should be a rather small addition, but I wanted to check :) Chumsky has it under a flag, so I wanted to check. |
We don't have I'm fine with a trait being used for this. I'm considering doing similar in other places. This has been a great conversation however if we start with it under |
https://gist.github.com/ilonachan/3d92577265846e5327a3011f7aa30770 is interesting to consider which I came across via rust-bakery/nom#1362 |
Trying to collect all styles of "pratt" parsers for easier comparison of their API let atom = text::int::<_, _, extra::Err<Simple<char>>>(10)
.from_str()
.unwrapped()
.map(Expr::Literal)
.padded();
let op = |c| just(c).padded();
let expr = atom.pratt((
// We want factorial to happen before any negation, so we need its precedence to be higher than `Expr::Neg`.
postfix(4, op('!'), |lhs| Expr::Factorial(Box::new(lhs))),
// Just like in math, we want that if we write -x^2, our parser parses that as -(x^2), so we need it to have
// exponents bind tighter than our prefix operators.
infix(right(3), op('^'), |l, r| Expr::Pow(Box::new(l), Box::new(r))),
// Notice the conflict with our `Expr::Sub`. This will still parse correctly. We want negation to happen before
// `+` and `-`, so we set its precedence higher.
prefix(2, op('-'), |rhs| Expr::Neg(Box::new(rhs))),
prefix(2, op('*'), |rhs| Expr::Deref(Box::new(rhs))),
// Our `-` and `+` bind the weakest, meaning that even if they occur first in an expression, they will be the
// last executed.
infix(left(1), op('+'), |l, r| Expr::Add(Box::new(l), Box::new(r))),
infix(left(1), op('-'), |l, r| Expr::Sub(Box::new(l), Box::new(r))),
))
.map(|x| x.to_string());
fn op(l: Expr, o: &'static str, r: Expr) -> Expr {
Op(Box::new(l), o, Box::new(r))
}
let op_parser = string("+").or(string("*"))
.map(|op| {
let prec = match op {
"+" => 6,
"*" => 7,
_ => unreachable!()
};
(op, Assoc { precedence: prec, fixity: Fixity::Left })
})
.skip(spaces());
let term = many(letter())
.map(Id)
.skip(spaces());
let mut parser = expression_parser(term, op_parser, op); proposed precedence(
unary_op(1, tag("-")), //prefix operators
fail, //postfix operators
alt(( //binary operators
binary_op(2, Assoc::Left, tag("*")),
binary_op(2, Assoc::Left, tag("/")),
binary_op(3, Assoc::Left, tag("+")),
binary_op(3, Assoc::Left, tag("-")),
)),
alt(( //operands
map_res(digit1, |s: &str| s.parse::<i64>()),
delimited(tag("("), parser, tag(")")), //subexpression handled via recursion
)),
|op: Operation<&str, &str, &str, i64>| { //evaluating the expression step by step
use nom::precedence::Operation::*;
match op {
Prefix("-", o) => Ok(-o),
Binary(lhs, "*", rhs) => Ok(lhs * rhs),
Binary(lhs, "/", rhs) => Ok(lhs / rhs),
Binary(lhs, "+", rhs) => Ok(lhs + rhs),
Binary(lhs, "-", rhs) => Ok(lhs - rhs),
_ => Err("Invalid combination"),
}
}
)(i) https://gist.github.com/ilonachan/3d92577265846e5327a3011f7aa30770 pratt(
atom,
alt((
binary_op(tag("="), 2, 1, map_infix),
ternary_op,
binary_op(tag("+").or(tag("-")), 5, 6, map_infix),
binary_op(tag("*").or(tag("/")), 7, 8, map_infix),
binary_op(tag("."), 14, 13, map_infix),
)),
alt((
unary_op(tag::<&str, &str, E>("+").or(tag("-")), 9, map_prefix),
group_op,
)),
alt((unary_op(tag("!"), 11, map_postfix), index_op)),
start_bp,
)
.parse(i) Initial #614 precedence(
digit1.try_map(|d: &str| d.parse::<i32>()),
(
"-".value(2).prefix(|x| -1 * x),
"+".value(2).prefix(|x| x),
"!".value(2).postfix(|x| factorial(x)),
"+".value(0).infix(|a, b| a + b),
"-".value(0).infix(|a, b| a + b),
"*".value(1).infix(|a, b| a * b),
"/".value(1).infix(|a, b| a / b),
),
)
.parse_next(i) |
Found another nice interface https://docs.rs/pest/latest/pest/pratt_parser/struct.PrattParser.html. Unfortunately all map closures are boxed. let pratt =
PrattParser::new()
.op(Op::infix(Rule::add, Assoc::Left) | Op::infix(Rule::sub, Assoc::Left))
.op(Op::infix(Rule::mul, Assoc::Left) | Op::infix(Rule::div, Assoc::Left))
.op(Op::infix(Rule::pow, Assoc::Right))
.op(Op::prefix(Rule::neg))
.op(Op::postfix(Rule::fac));
pratt
.map_primary(|primary| match primary.as_rule() {
Rule::int => primary.as_str().parse().unwrap(),
Rule::expr => parse_expr(primary.into_inner(), pratt), // from "(" ~ expr ~ ")"
_ => unreachable!(),
})
.map_prefix(|op, rhs| match op.as_rule() {
Rule::neg => -rhs,
_ => unreachable!(),
})
.map_postfix(|lhs, op| match op.as_rule() {
Rule::fac => (1..lhs+1).product(),
_ => unreachable!(),
})
.map_infix(|lhs, op, rhs| match op.as_rule() {
Rule::add => lhs + rhs,
Rule::sub => lhs - rhs,
Rule::mul => lhs * rhs,
Rule::div => lhs / rhs,
Rule::pow => (1..rhs+1).map(|_| lhs).product(),
_ => unreachable!(),
})
.parse(pairs)
} |
For anyone subscribed to this but not following along, @39555 has been doing a lot of exploration for implementation and API. In particular, we've been looking at recursive vs iterative. Recursion is generally faster but hides the recursive nature and prevents people from doing stack overflow prevention. See |
In #614 (comment), @39555 called out a missing feature in most of the above pratt parsing APIs: the ability to parse within an operator.. https://gist.github.com/ilonachan/3d92577265846e5327a3011f7aa30770 is the only one I saw that handles this and uses it for ternary opeator, indexing, and parenthesis. @39555 couldn't that be handled by changing delimited(multispace0, alt((
dispatch! {any;
'!' => empty.value((20, (|_: &mut _, a| Ok(Expr::Fac(Box::new(a)))) as _)),
'?' => empty.value((3, (|i: &mut &str, cond| { to delimited(multispace0, alt((
dispatch! {any;
'!' => empty.value((20, (|_: &mut _, a| Ok(Expr::Fac(Box::new(a)))) as _)),
'?' => cut_err(seq!(
_: multispace0,
pratt_parser,
_: multispace0,
_: ':',
_: multispace0,
pratt_parser,
_: multispace0,
)).value((3, (|i: &mut &str, cond| { |
@epage There are 2 Problems. We can't capture results from the left and right parsers. It is unusual for the operator to produce additional operands. '?' => cut_err(seq!(
_: multispace0,
pratt_parser,
_: multispace0,
_: ':',
_: multispace0,
pratt_parser,
_: multispace0,
)).map(|(left, right)| (3, &(move |cond| {Expr::Ternary(Box::new(cond), Box::new(left), Box::new(right))})
|
Wouldn't the "fold map" get
So no state is needed and we can use |
If I understand correctly, you propose implementing the ternary operator as an infix operator, so it would be Or do you mean if we could return whatever the parser produced back to the closure? This |
This. I suspect its a big if on making the fold accept the associated parser output rather than enforcing all of the fold's to be the same. |
The |
From #620 (comment)
|
That is true that users can limit recursion on their side. We will need a warning block in our documentation that recursion happens and they may consider checking for recursion depth. Not as helpful as using the heap but likely good enough. So I'm fine with moving forward with recusion. So that just leaves which API design we should go for, right? |
Yes. I hope we will find something nice and winnowish. The current interface is practical (it supports dispatch, recursion, custom allocators, error reporting) and somewhat understandable if the operators are wrapped in My concerns are:
|
Taking the current API from #620 pub fn precedence<I, ParseOperand, ParseInfix, ParsePrefix, ParsePostfix, Operand, E>(
start_power: i64,
mut operand: ParseOperand,
mut prefix: ParsePrefix,
mut postfix: ParsePostfix,
mut infix: ParseInfix,
) -> impl Parser<I, Operand, E>
where
I: Stream + StreamIsPartial,
ParseOperand: Parser<I, Operand, E>,
ParseInfix: Parser<I, (Assoc, fn(&mut I, Operand, Operand) -> PResult<Operand, E>), E>,
ParsePrefix: Parser<I, (i64, fn(&mut I, Operand) -> PResult<Operand, E>), E>,
ParsePostfix: Parser<I, (i64, fn(&mut I, Operand) -> PResult<Operand, E>), E>,
E: ParserError<I>;
#[derive(Debug, Clone, Copy)]
pub enum Assoc {
Left(i64),
Right(i64),
Neither(i64),
}
Those last two have me leaning to go ahead with methods on an Maybe we could I do wonder about For the behavior of the "fold", I wish there was a sane default which would let us make this a method, allowing us to pick a specific variant. A nasty idea is we could use tacit traits / marker trait specialization. People I approach about solving problems with tacit traits usually get turned off by the amount of generics complexity making it less approachable.
This does sadden me but between |
Thanks for the tacit traits article! I will give it a try.
Do you mean expression().prefix().infix().postfix() // or
expr = expression();
expr.prefix(dispatch!{...}); // or
struct Infix(i64, Assoc, fn(...) -> ...);
expr.infix(Infix(dispatch{...})) ? |
expression(digit1.try_map(|d: &str| d.parse::<i32>()))
.prefix(dispatch! {any;
'+' => Prefix(9, |_: &mut _, a| Ok(a)),
'-' => Prefix(9, |_: &mut _, a: i32| Ok(-a)),
_ => fail
})
.infix(dispatch! {any;
'+' => Infix(5, Assoc::Left, |_: &mut _, a, b| Ok(a + b)),
'-' => Infix(5, Assoc::Left, |_: &mut _, a, b| Ok(a - b)),
'*' => Infix(7, Assoc::Left, |_: &mut _, a, b| Ok(a * b)),
'/' => Infix(7, Assoc::Left, |_: &mut _, a, b| Ok(a / b)),
'%' => Infix(7, Assoc::Left, |_: &mut _, a, b| Ok(a % b)),
'^' => Infix(9, Assoc::Right, |_: &mut _, a, b| Ok(a ^ b)),
_ => fail
})
.parse_next(i)
Maybe expression(digit1.try_map(|d: &str| d.parse::<i32>()))
.prefix(dispatch! {any;
'+' => Prefix(9, |_: &mut _, a| Ok(a)),
'-' => Prefix(9, |_: &mut _, a: i32| Ok(-a)),
_ => fail
})
.infix(dispatch! {any;
'+' => Infix::Left(5, |_: &mut _, a, b| Ok(a + b)),
'-' => Infix::Left(5, |_: &mut _, a, b| Ok(a - b)),
'*' => Infix::Left(7, |_: &mut _, a, b| Ok(a * b)),
'/' => Infix::Left(7, |_: &mut _, a, b| Ok(a / b)),
'%' => Infix::Left(7, |_: &mut _, a, b| Ok(a % b)),
'^' => Infix::Right(9, |_: &mut _, a, b| Ok(a ^ b)),
_ => fail
})
.parse_next(i)
Seeing the builder being used, makes me also wonder about operand(digit1.try_map(|d: &str| d.parse::<i32>()))
.prefix(dispatch! {any;
'+' => Prefix(9, |_: &mut _, a| Ok(a)),
'-' => Prefix(9, |_: &mut _, a: i32| Ok(-a)),
_ => fail
})
.infix(dispatch! {any;
'+' => Infix::Left(5, |_: &mut _, a, b| Ok(a + b)),
'-' => Infix::Left(5, |_: &mut _, a, b| Ok(a - b)),
'*' => Infix::Left(7, |_: &mut _, a, b| Ok(a * b)),
'/' => Infix::Left(7, |_: &mut _, a, b| Ok(a / b)),
'%' => Infix::Left(7, |_: &mut _, a, b| Ok(a % b)),
'^' => Infix::Right(9, |_: &mut _, a, b| Ok(a ^ b)),
_ => fail
})
.parse_next(i)
|
@39555 of course in all of this, feel free to share your opinions. If you think we should go in a different direction, I would love to hear your thoughts! |
The |
My experiments so far. I managed to get impl expression(0, digit1.parse_to::<i32>())
.prefix(dispatch! {any;
// Cannot implement with tacit traits because the arguments of the closure are not recognized
// when coercing to fn() without explicitly specifying the turbofish:
// multiple applicable items in scope
// candidate #1 is defined in an impl for the type `UnaryFn<fn(&mut I, O) -> PResult<O, E>>`
// candidate #2 is defined in an impl for the type `UnaryFn<fn(O) -> O>`
// Same with `UnaryFn<O>` or UnaryFn<(O,)>
// '-' => UnaryFn::new(3, |_: &mut _, a: i32| Ok(-a)), // <- Error
'+' => UnaryFn::<_>::new(3, |a: i32| -a),
_ => fail
})
.infix(dispatch! {any;
'+' => (5, (|_: &mut _, a, b| Ok(a + b)) as _),
'-' => (5, (|_: &mut _, a, b| Ok(a - b)) as _),
'*' => (7, (|_: &mut _, a, b| Ok(a * b)) as _),
'/' => (7, (|_: &mut _, a, b| Ok(a / b)) as _),
'%' => (7, (|_: &mut _, a, b| Ok(a % b)) as _),
'^' => (9, Assoc::Right, (|_: &mut _, a, b| Ok(a ^ b)) as _),
_ => fail
})
.parse_next(i)
impl<I: Stream, O, E: ParserError<I>> Parser<I, Infix<I, O, E>, E>
for (i64, fn(&mut I, O, O) -> PResult<O, E>)
{
fn parse_next(&mut self, input: &mut I) -> PResult<Infix<I, O, E>, E> {
empty
.value(Infix(self.0, Assoc::Left, self.1))
.parse_next(input)
}
}
impl<I: Stream, O, E: ParserError<I>> Parser<I, Infix<I, O, E>, E>
for (i64, Assoc, fn(&mut I, O, O) -> PResult<O, E>)
{
fn parse_next(&mut self, input: &mut I) -> PResult<Infix<I, O, E>, E> {
empty.value(Infix(self.0, self.1, self.2)).parse_next(input)
}
} |
Hmm, I feel like handling the two tuples like that will be confusing to users. The more layers through generics you go, the harder it is. This is the reason I'm hesitant to use tacit traits. I know chumsky documents each signature on each function that accepts them which helps. Does |
Works great #614 (comment) use Infix::*;
expression(0, digit1.parse_to::<i32>())
.prefix(dispatch! {any;
'+' => Prefix(12, |_, a| Ok(a)),
'-' => Prefix(12, |_, a: i32| Ok(-a)),
_ => fail
})
.infix(dispatch! {any;
'+' => Left(5, |_, a, b| Ok(a + b)),
'-' => Left(5, |_, a, b| Ok(a - b)),
'*' => Left(7, |_, a, b| Ok(a * b)),
'/' => Left(7, |_, a, b| Ok(a / b)),
'%' => Left(7, |_, a, b| Ok(a % b)),
'^' => Right(9, |_, a, b| Ok(a ^ b)),
_ => fail
})
.parse_next(i) |
I feel like I've been driving the API conversation at this point. Any thoughts of what could be better or done differently? |
No. I tried what I wanted in #131 (comment). I think the current solution is much closer to the ideal than all previous ones. Maybe I will find something interesting over the weekend. |
@epage What is your opinion about the api with one big |
That would simplify the API and keep it consistent with other parts. However, it would make it so each operator parser must return the same value which could make some of the nested parsing done with It also requires you re- |
rereading the nom issue I found ilonachan's ideas quite interesting rust-bakery/nom#1362 (comment). A parser with only 2 arguments: def expression_parser() -> impl Parser<&str, i32, ContextError> {
expression(
dispatch!{any;
// prefix
'-' => |i: &mut &str| {
let op = parser.parse_next(i)?;
Ok(-op)
},
'+' => |i: &mut &str| {
let op = parser.parse_next(i)?;
Ok(op)
},
// operand
_ => digit1.parse_to::<i32>()
},
move |a: i32| {
dispatch!{any;
// postfix
'!' => |_| {
Ok(factorial(a))
},
'?' => |i: &mut &str| {
let (a, b) = separated_pair(parser, ':', parser).parse_next(i)?;
...
},
// infix
'+' => |i: &mut &str| {
let b = parser.parse_next(i)?;
Ok(a + b)
},
_ => digit1.parse_to::<i32>()
},
}
)
}
|
Please complete the following tasks
winnow version
0.2.0
Describe your use case
This would be a performant way of handling precedence in parsing
See zesterer/chumsky#51
Describe the solution you'd like
See
Alternatives, if applicable
No response
Additional Context
No response
The text was updated successfully, but these errors were encountered: