I'm writing a simple assembly-like language and I'm using Lark to parse its AST, but I'm having trouble with ambiguity. Here's a boiled-down minimal reproducible example:
from lark import Lark

grammar = """
program: statement* // zero or more statements
?statement: instruction
instruction: pneumonic [parameter]* // match zero or more expected parameters
pneumonic: CNAME
parameter: ESCAPED_STRING | INT
SPACING: /[ \t\f]+/
%ignore SPACING
%import common.CNAME
%import common.ESCAPED_STRING
%import common.INT
"""
asm_parser = Lark(grammar, start="program", ambiguity="explicit")
example = """PNEUMONIC "text" 10"""
syntax_tree = asm_parser.parse(example)
print(syntax_tree.pretty())
The above produces this output, which shows there's some phantom parameter between the pneumonic and the explicit strings and ints:
At first, I thought this was the parser matching the whitespace between the pneumonic and the first parameter, but removing this whitespace doesn't seem to help (especially since this whitespace is seemingly ignored anyway):
example = """PNEUMONIC"text" 10"""
# produces the same output as above
What did work, however, was removing the brackets within the instruction rule:
instruction: pneumonic parameter*
which resolves the ambiguity:
program
  instruction
    pneumonic PNEUMONIC
    parameter "text"
    parameter 10
From what I understand the brackets indicate an "expected value" and the parser supplies None when nothing is found, but what is the parser actually matching in-between the pneumonic and the first parameter in this case?
Which means that repeating empty rules is probably creating this pattern.
erezsh changed the title from "Phantom captures appearing when using brackets?" to "Empty matches appearing unnecessarily when repeating empty rules ambiguously" on Jul 30, 2023
Probably a duplicate of #1283. The solution is to make sure that you don't have ambiguities, and ideally you always want to use parser='lalr'. In this case, if you just use parameter* (no brackets, since they duplicate the empty-match possibility), it works and is even LALR-compatible.