Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Divergences with KDE on the interpretation of some syntax rules #170

Closed
jonathanpoelen opened this issue Jul 19, 2023 · 8 comments
Closed

Comments

@jonathanpoelen
Copy link

jonathanpoelen commented Jul 19, 2023

Hi, there are some discrepancies on the interpretation of some rules like <Int>. Original bug here: https://bugs.kde.org/show_bug.cgi?id=468394

Skylighting doesn't detect 12 in 12L as an integer. Here's what kate does:

$ kate-syntax-highlighter --output-format=ansi256colors --syntax-trace=format --stdin -s R
12
|
Int
12L
| |
Int
  |
  Normal Text

<Int>, <Float>, <HlCHex> and <HlCOct> check the delimiter on the left, but not on the right. I think the behavior diverges there.

For #156, the issue is related to line 802 of prolog.xml (<DetectIdentifier />), which doesn't specify an attribute and uses the context attribute (Syntax Error). However, we need to take the one from the context which includes, line 654 <IncludeRules context="quoted_last"/>) in the context <context name="dq" lineEndContext="#stay" attribute="&quot;double-quoted&quot;".

I note that #43 and #86 are fixed upstream and in skylighting.

@jgm
Copy link
Owner

jgm commented Jul 19, 2023

Is the L only allowed as a modifier on an Int, or also on Float, HlCHex, HlCOct?

@jonathanpoelen
Copy link
Author

For all 4, but not specially L, anything after the digits is ignored in these rules.

@jgm
Copy link
Owner

jgm commented Jul 19, 2023

I'm still confused about this.
Over at the KDE documentation it says this about Int:

Detect an integer number (as the regular expression: \b[0-9]+).

Nothing there indicates that the rule is supposed to consume any characters after the digits...

@jonathanpoelen
Copy link
Author

I probably misspoke. When I say ignored, I mean that the rule stops its evaluation and stops consuming, nothing more. From my point of view and based on -T flag, skylighting does the equivalent of \b[0-9]+\b (the second \b being too much). Also, it's not explained - or very implicitly - but \b is a fusion of deliminators and local deliminators (not a strict equivalent of regex).

@npearlmu
Copy link

npearlmu commented Jul 19, 2023

I'm the original reporter of issue #468394 that jonathanpoelen linked to (thanks, Jonathan, for opening this here). Just to (hopefully?) clarify: The KDE Int rule does seem to behave as in the documentation (ignoring whatever might follow a sequence of digits), so in KDE, it correctly tags the digits from a sequence like 12L and ignores the L. But (it seems) that same Int rule fails to tag the digits when it's interpreted through skylighting (at least as used by pandoc, specifically with the r.xml syntax highlighting file). I couldn't test elsewhere; but replacing the Int rule with a corresponding RegExpr rule (including the KDE-documentation-suggested regex) worked fine, tagging the digits and leaving following material (the L, etc.) for other rules. The Int rule does work as expected in skylighting/pandoc when there's not an alpha character immediately after the digits.

@jgm
Copy link
Owner

jgm commented Jul 19, 2023

Thanks for the clarifications.

@jgm
Copy link
Owner

jgm commented Jul 19, 2023

Yes we have

pDec :: A.Parser ()
pDec = do
  mbMinus
  _ <- A.takeWhile1 (A.inClass "0-9")
  guardWordBoundary

and the guardWordBoundary here is preventing this from being parsed as an int.

Can I just remove that? KDE will parse 12hello as an integer followed by other stuff?

@jonathanpoelen
Copy link
Author

Can I just remove that? KDE will parse 12hello as an integer followed by other stuff?

Yes

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants