Document divergences from the spec #8

Open
Lucretiel opened this issue Sep 21, 2021 · 5 comments
Labels
documentation Improvements or additions to documentation

Comments

@Lucretiel
Owner

Lucretiel commented Sep 21, 2021

There are a few cases where we're making a conscious choice to diverge from the KDL spec. These should be documented near the top. Currently these include:

  • Many entities in KDL are defined in terms of code points (for instance, KDL identifiers are made up of "any code point except for ..."). Rust strings and char are sequences of Unicode Scalar Values, rather than Code Points. The scalar values are all of the code points except the high and low surrogates (U+D800 through U+DFFF). In practice we don't expect this to cause any issues.
  • KDL calls for duplicate property keys to be last-key-wins, with the earlier duplicates ignored. We will instead use the ordinary serde map handling for these cases (i.e., next_key_seed will always return the next key, without any consideration for duplicates). We prefer the flexibility this offers, since Deserialize types have the opportunity to define their own behavior when receiving duplicate keys. HashMap, for instance, uses the last-key-wins strategy, while structs with derive(Deserialize) will fail with an error on a duplicate key.
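
To make the two duplicate-key outcomes concrete, here is a minimal std-only sketch. It does not use serde or kaydle; the function names `last_key_wins` and `reject_duplicates` and the sample properties are hypothetical, chosen only to mirror the HashMap-style and derived-struct-style behaviors described in the list above.

```rust
use std::collections::HashMap;

/// Last-key-wins: collecting into a `HashMap` inserts each pair in order,
/// so a later value for the same key overwrites the earlier one.
fn last_key_wins<'a>(props: &[(&'a str, &'a str)]) -> HashMap<&'a str, &'a str> {
    props.iter().copied().collect()
}

/// Strict: fail on the first repeated key, similar in spirit to the
/// duplicate-field error raised by a derived `Deserialize` struct.
fn reject_duplicates<'a>(
    props: &[(&'a str, &'a str)],
) -> Result<HashMap<&'a str, &'a str>, String> {
    let mut map = HashMap::new();
    for &(key, value) in props {
        // `insert` returns the previous value if the key was already present.
        if map.insert(key, value).is_some() {
            return Err(format!("duplicate property key `{key}`"));
        }
    }
    Ok(map)
}

fn main() {
    // Hypothetical properties of a node written as: node a=1 b=2 a=3
    let props = [("a", "1"), ("b", "2"), ("a", "3")];

    let map = last_key_wins(&props);
    assert_eq!(map["a"], "3");
    assert_eq!(map.len(), 2);

    assert!(reject_duplicates(&props).is_err());
}
```

Because the deserializer hands every key to the receiving type, either policy can be chosen by the type being deserialized rather than by the parser.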
@Lucretiel Lucretiel added the documentation Improvements or additions to documentation label Sep 21, 2021
@tbmreza

tbmreza commented Sep 23, 2021

What is the main selling point of the kaydle implementation? (Or does it need one?)

@Lucretiel
Owner Author

serde is definitely the main selling point.

@CAD97

CAD97 commented Sep 27, 2021

> Many entities in KDL are defined in terms of code points ... Rust strings and char are sequences of Unicode Scalar Values

See also kdl-org/kdl#207

I argue that while the spec refers to "code points," the top-level requirement that the document be UTF-8 encoded eliminates the possibility of surrogates showing up, as well-formed UTF-8 is an encoded sequence of USV and MUST NOT include surrogates, unpaired nor paired. I think the only location where surrogate code points may actually show up per the spec is in \u{...} escapes, and the requirement that "Strings MUST be represented as UTF-8 values" may also rule out code points which are not USV in that location.

Or IOW, I think this one may formally be a non-issue.
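
The argument above can be checked against the Rust standard library, which enforces both rules. This is a small sketch; the helper names `is_scalar_value` and `is_well_formed_utf8` are hypothetical wrappers around `char::from_u32` and `std::str::from_utf8`.

```rust
/// True if `cp` is a Unicode scalar value, i.e. a code point that is not a
/// surrogate and not above U+10FFFF. `char::from_u32` returns `None` otherwise.
fn is_scalar_value(cp: u32) -> bool {
    char::from_u32(cp).is_some()
}

/// True if `bytes` is well-formed UTF-8 according to the standard library.
fn is_well_formed_utf8(bytes: &[u8]) -> bool {
    std::str::from_utf8(bytes).is_ok()
}

fn main() {
    // Surrogate code points have no `char` representation.
    assert!(!is_scalar_value(0xD800));
    assert!(!is_scalar_value(0xDFFF));
    // Their immediate neighbours are ordinary scalar values.
    assert!(is_scalar_value(0xD7FF));
    assert!(is_scalar_value(0xE000));

    // ED A0 80 is the three-byte pattern U+D800 would produce if it were
    // pushed through the UTF-8 encoding; it is rejected as ill-formed.
    assert!(!is_well_formed_utf8(&[0xED, 0xA0, 0x80]));
    // A surrogate pair encoded the same way (CESU-8 style) is rejected too.
    assert!(!is_well_formed_utf8(&[0xED, 0xA0, 0x80, 0xED, 0xB0, 0x80]));
    // U+E000 encodes to EE 80 80, which is accepted.
    assert!(is_well_formed_utf8(&[0xEE, 0x80, 0x80]));
}
```

So a parser that takes a `&str` as input can never see a surrogate in the raw document text; only escapes would need a separate check.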

@Lucretiel
Owner Author

> well-formed UTF-8 is an encoded sequence of USV and MUST NOT include surrogates, unpaired nor paired

Oh no kidding? This would actually be news to me, that's interesting.

@Lucretiel
Owner Author

Thinking more about the duplicate property keys thing. While I like the flexibility offered by leaning into the serde model, I'm somewhat unhappy that this could cause intentionally valid KDL documents to be rejected (for instance, a configuration dumping tool could deliberately make use of the last-key-wins behavior). I've had an idea for how to implement this in the parser (by adding a lookahead to NodeProcessor::next_event), so I'll probably make it runtime configurable, since opting into the conforming behavior incurs a performance penalty due to the lookahead.
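
As a rough illustration of why conforming behavior needs lookahead, here is a std-only sketch that emits a property only when its key does not reappear later in the same node. It is not kaydle's NodeProcessor; the function `drop_shadowed` and the slice-of-pairs representation are assumptions made for the example.

```rust
/// Keep only the properties that survive last-key-wins, in document order.
/// Each property is checked against everything after it, which is the
/// lookahead cost mentioned above (quadratic in this naive form).
fn drop_shadowed<'a>(props: &[(&'a str, &'a str)]) -> Vec<(&'a str, &'a str)> {
    let mut kept = Vec::new();
    for (i, &(key, value)) in props.iter().enumerate() {
        // Look ahead: does any later property reuse this key?
        let shadowed = props[i + 1..].iter().any(|&(later, _)| later == key);
        if !shadowed {
            kept.push((key, value));
        }
    }
    kept
}

fn main() {
    // Hypothetical properties of a node written as: node a=1 b=2 a=3
    let props = [("a", "1"), ("b", "2"), ("a", "3")];
    let kept = drop_shadowed(&props);
    assert_eq!(kept, vec![("b", "2"), ("a", "3")]);
}
```

With such a filter in front of the deserializer, a strict struct would no longer reject the document, at the price of scanning ahead for every property.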


3 participants