self-expression: Add KDL grammar in KDL #475
It's not a "blob of text", it's a grammar description in a well-known grammar description language. Writing the grammar "in KDL" would require us to define a new grammar description language using KDL as the syntax base, which would be a project of its own. I don't see great value in that. It would also require anyone reading the grammar that defines KDL to already know the grammar of KDL. (Which, to be fair, is in broad strokes very simple, but still.) (I'll also note that what you have written there, while technically a valid KDL document, isn't a meaningful KDL document. It treats section comments as nodes, and requires you to examine the exact order of properties and arguments, since a grammar term is a property plus the arguments that follow it up to the next property; for example, "keyword" is defined by its value (
Which one, and why isn't it marked as such in spec.md? The description implies ABNF, but when I used an ABNF syntax highlighter, it failed since apparently (the bigger issue is that it's not really a formal grammar despite the superficial syntax familiarity, so you can't use it as a standalone description of the language but have to constantly cross-reference the description and the tests; but that's a separate, ignored issue, #64)
The great value is that it's not a grey blob of text but a syntax-highlighted doc, which improves readability.
Not really, you don't need to understand that.
It treats sections as nodes and comments as args, but more importantly
Yes, it requires (and is designed for) READING, not programmatic parsing.
So it is meaningful for reading, hence instead of
If the goal is syntax highlighting, I would much prefer we just go ahead and rewrite it as ABNF, which would make it more idiomatic for #461 anyway. While this self-referential grammar looks, on the surface, like a regular grammar, it has no good semantic meaning in KDL at all. It uses KDL syntax only superficially, while yielding a completely nonsensical document if parsed (properties and arguments have no mutual ordering requirement, so all the “productions” exist on a completely different plane from the arguments that follow them). I’m also not sure I like how, even if it did produce a valid KDL doc, the presentation/formatting relies on some rather clever tricks instead of formatting the text idiomatically.
The other goal was compactness (and also structural/visual alignment).
I don't get it: you both bring it up as an issue without explaining what the issue is. Are you parsing "ABNF" as a data document for use anywhere? No. Then what are you planning to do with a parsed grammar that requires it to make "formal sense"?
It should be viewed as a testament to KDL's expressive power that it can generate clean docs using such tricks.
The whole problem is that it’s NOT generating a sensible doc. The data is semantically lossy. Some parsers might preserve order, but they would be doing so outside the bounds of KDL proper. And, as a matter of fact, ABNF can be used for parser generators.
But you still haven't said what the problem is! What are your use cases for a "sensible doc", and why is it important that the better-formatted doc must conform to them while the current, unformatted doc doesn't have to?
In theory. In practice, this grammar can't be and isn't, and it likely won't ever be (I actually tried it for syntax generation way back with v1, since to me that's the most sensible way, but it didn't work due to deficiencies in both the grammar and the generator). Also, in theory the KDL version could be restructured into a "sensible doc", but that would likely cost readability, so with no value identified it doesn't make sense to bear that cost. After all, this issue doesn't prescribe any specific format; the screenshots/linked PR are just an example of what I used for myself, because working from a poorly structured, poorly formatted spec was too painful.
It’s confusing to me to read a document which superficially shares KDL’s syntax but is semantically meaningless as a KDL document. If we were to define the grammar in terms of KDL itself, I would like to see the resulting document formatted idiomatically, and I would like the document itself to meaningfully, semantically describe the grammar using idiomatic structuring of the productions.
Yeah, given your grammar fragment:
This is a P node, containing several arguments and properties. While the KDL data model treats the relative order of arguments as meaningful, it treats properties as unordered, and doesn't define a canonical ordering for arguments and properties relative to each other. It also treats all the string syntaxes as equivalent, so ident strings and quoted strings aren't required to be distinguished. That is, it would be valid, per the data model, to reformat that node to:
And this is, clearly, meaningless. This is what Kat means by your example being "meaningless as a KDL document" - it's a set of KDL constructs that happen to syntax-highlight well, but don't actually form a meaningful KDL document. It would be like a document written "in JSON" that interpreted [] differently based on whether it was formatted across several lines or compactly on one line. If we did want to do something like this, it would be with meaningful node structures, something like:
Here, sections are nodes with child nodes defining productions. The production's children are the alternatives, with node names defining the type of each value. In less trivial examples that require, say, sequences of productions, or modifiers, they'd be their own nodes with children, like:
This sort of structure meaningfully follows the KDL data model; reformatting won't change its meaning. It's also just the ABNF's parse tree, so it's more verbose and often harder to read (but is potentially clearer in complex situations, which carries its own value).
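To make the data-model point concrete, here is a minimal Python sketch. It assumes a simplified, hypothetical model of a KDL node (the `Node` class and `build` helper are illustrative, not any real KDL library's API): arguments are stored in order, properties as an unordered set, so where a property sat among the arguments is not recorded. The node name `P` and the `rule-a`/`rule-b` entries are placeholders, not the original grammar fragment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical, simplified model of a KDL node.

    Arguments keep their relative order; properties are an unordered set,
    so their position among the arguments is not part of the data.
    """
    name: str
    args: tuple = ()
    props: frozenset = frozenset()


def build(name, *entries):
    """Build a Node from source-ordered entries.

    Each entry is ("arg", value) or ("prop", key, value). Interleaving is
    discarded; for a repeated property key the last value wins.
    """
    args = []
    props = {}
    for entry in entries:
        if entry[0] == "arg":
            args.append(entry[1])
        else:
            _, key, value = entry
            props[key] = value  # a later duplicate overrides an earlier one
    return Node(name, tuple(args), frozenset(props.items()))


# Layout 1: each property is followed by "its" argument.
original = build("P", ("prop", "rule-a", "x"), ("arg", "y"),
                 ("prop", "rule-b", "z"), ("arg", "w"))
# Layout 2: same entries, properties moved to the front.
reordered = build("P", ("prop", "rule-b", "z"), ("prop", "rule-a", "x"),
                  ("arg", "y"), ("arg", "w"))
print(original == reordered)  # True: the grouping of args under props is lost
```

Because both layouts produce equal nodes, any meaning carried by "this argument belongs to the preceding property" exists only in the source text, not in the parsed document.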
Indeed it is, which runs counter to my main goal with this suggestion: to have a single structured and easily readable reference doc (and in your extreme example of having to name every literal as And since you still haven't identified a single use case for the data model being "valid" (this doc is meant to be read as a reference by humans creating their parsers/documents), it's a pure downside.
Hard to say for certain without an example situation, but potentially this is also achieved via extra verbosity/comments, not necessarily via a valid data model, which I think won't help you see that the pre-closing dedent is the primary one (do you need variables in the grammar for this besides terms and nts?) and that equality is strict regarding the composition of spaces.
If it is just about highlighting, we could bake the highlighting into the HTML (or indeed use ABNF). I don't see how it's relevant that we could format the grammar in a way that happens to be parsable as KDL.
HTML adds a huge, unergonomic dependency: a browser (e.g., for basic things like search, and no editing). ABNF adds a little extra dependency outside the KDL ecosystem. The relevant part here is self-containment. And there is a tiny benefit that you could use the grammar as a test document itself (it helped me catch a couple of bugs). Maybe you could also have a cleaner doc (like without Although ABNF could in theory give you LSP-like conveniences such as in-place rule popups or jumping to a rule's definition, I don't think these exist in practice. So its only practical benefit currently is that it can be highlighted.
I could see something like this perhaps.
Note that you should distinguish between strings that refer to other rules and literal strings, especially since the grammar is normative. Technically
Hence I think a specialized format is preferable.
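The point about distinguishing rule references from literal strings can be illustrated with a small Python sketch. It assumes a hypothetical in-memory grammar representation: the `Ref`/`Lit` types, the two-rule mini-grammar, and `unresolved_refs` are illustrative only and are not taken from the KDL spec. Tagging each term with its kind means `"boolean"` as a rule name and `"boolean"` as literal text can never be confused, and a checker can verify that every reference resolves.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass(frozen=True)
class Ref:
    """A reference to another grammar rule, by name."""
    name: str


@dataclass(frozen=True)
class Lit:
    """A literal string that must appear verbatim in the input."""
    text: str


Term = Union[Ref, Lit]

# Hypothetical mini-grammar: rule name -> alternatives -> sequence of terms.
grammar: Dict[str, List[List[Term]]] = {
    "boolean": [[Lit("#true")], [Lit("#false")]],
    "keyword": [[Ref("boolean")], [Lit("#null")]],
}


def unresolved_refs(g: Dict[str, List[List[Term]]]) -> set:
    """Return names that are referenced as rules but never defined."""
    missing = set()
    for alternatives in g.values():
        for sequence in alternatives:
            for term in sequence:
                if isinstance(term, Ref) and term.name not in g:
                    missing.add(term.name)
    return missing


print(unresolved_refs(grammar))                                # set()
print(unresolved_refs({**grammar, "value": [[Ref("keywrd")]]}))  # {'keywrd'}
```

With bare strings, the misspelled `keywrd` would silently read as a literal; with explicit tags it is reported as a dangling reference.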
Or you can just add a grammar rule: if it's not obvious from the bare string's value or the grammar's description, a bare string refers to another rule if such a rule exists left of
You can check the example in the linked PR, which encodes everything more succinctly than the current one, but it should be viewed with a syntax highlighter so that you can see you don't need to dirty every rule with
I think this discussion has run its course. We are not interested in formatting our grammar as a meaningless KDL document solely because KDL syntax highlighting happens to look reasonable on it. If you want a highlighted grammar instead of plain text, that is something we'd be willing to take, with HTML colorizing, and we could put it into the spec document.
Why would you force everyone to use a browser to see colors??? And as far as I understand, you actually get HTML colorizing from the grammar being a valid KDL document; at least the https://kdl.dev/play/ playground highlights different elements differently. So the only thing preventing you from having your HTML colorizing is being stuck on a meaningless requirement of a "meaningful" data structure.
Currently the grammar is only available as an unformatted blob of text in the markdown document.
In addition, it would be great if KDL could express itself, so that you could read a properly formatted/highlighted grammar document.
So that instead of this gray uniformity
you could have something like this