-
-
Notifications
You must be signed in to change notification settings - Fork 66
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove '#' from legal characters in a bare identifier #204
Conversation
* Use same order from grammar definition
/ is not a valid identifier character. This updates unusual_chars_in_bare_id.kdl to only include unusual but valid identifier characters, and adds separate invalid input tests for non-identifier characters. Fixes kdl-org#166
… id test (kdl-org#181) Co-authored-by: Hannah Kolbeck <[email protected]>
So KQL current picks out two more symbols that are currently legal identifiers: See https://github.com/kdl-org/kdl/blob/main/QUERY-SPEC.md#selectors Should we use those, or reserve something else? |
Do you for-see adding general regex support to string matchers? The natural operator seems like EDIT: Haha just kidding, I forgot |
|
oh I didn't realize that. That's right!
I like this a lot, but thinking about verbosity, I think it's must more straightforward to say I'm also pretty in favor of removing |
I don't think I'm in favor of removing |
I'm also kinda wanting to bring back |
if we do that, URLs would become legal identifiers in most cases? I'm trying to think of when they wouldn't be, but maybe "URL encodable" might be the target we want fo KDL 2.0? |
https://stackoverflow.com/questions/1547899/which-characters-make-a-url-invalid/1547940#1547940 Or maybe not. That means we would need to include |
I feel like
Is that a commented line or a node named |
oh right I forgot about comments haha |
Should we make a new branch and make this PR against that, btw? Like a 2.0 tracking branch that has 1.0-breaking changes in it? |
boom, done :) you might need to rebase onto it tho I think I did something bad. |
Yes it is, so long as the second character isn't a digit; the grammar just prevents confusion with numbers. It's allowed by itself as a bare-ident. |
Fwiw, the CSSWG (aka, me, in this case) tried to add |
So how about making the final list of removed characters just |
I'm fine with that. (We could keep |
ugh but + is so useful, esp as a first char. I wonder if there's something else we could use for "sibling"/"next"? Also, should we pre-emptively ban |
after thinking about it, I don't know if we should remove any extra characters beyond I think that'll take care of it, and we can just merge this branch. What do y'all think? |
Yeah, that sounds really good. It leaves KQL's evolution much more open without tying it so intimately to KDL's precise syntax model, and avoids restricting KDL itself unnecessarily. (Recall that CSS's implicit descendant combinator comes from the very first version, when there weren't any combinators at all. Child/next-sibling/general-sibling are all later inventions layered on top. We don't need to repeat that.) |
Re: It's certainly slightly clumsier than a simple blanket prohibition, but it also means we maintain maximum ident syntax without requiring excessive lookahead. |
I'm wavering. It's definitely clearer and probably easier to implement with parser generators to simply ban |
Yeah, so just outlawing the numberish or rawstringish starts are easy for hand-rolled parsers, but for the grammar itself, let's see...
I suspect the latter is the most compatible with parser gens, since the subtractions are single-character set subtractions, rather than sequence subtractions like the |
Rather than banning `#` in idents altogether, as #204 does, this takes the same approach as our number-like handling, and just bans idents from looking like a raw string *in their first two characters*: just as an ident can't start with `digit` or `sign digit`, it now can't start with `r#` or `r"` either. (That last isn't an ident, but it prevents a naive parser not using first-wins collision resolution from reading it as an ident `r` followed by junk.) It also refactors the bare-ident production a bit to make the structure clearer, as it was already getting a bit messy just trying to ban number-likes.
This is done in kdl-v2 now, so I'm closing this one. |
Based on discussion here: #200