Add identifier like Unquoted Strings. #62

Closed

@tef tef commented Feb 24, 2013

Allow identifiers as a special low-effort string. Make foo and "foo" the same. Make key and keygroup names be strings.

i.e.

[foo.baz]
key = bar

and

["foo"."baz"]
"key" = "bar"

Are the same thing.

Unquoted strings cannot contain dots, square brackets or spaces, and cannot start with a digit.

Note: ["foo.bar"] isn't the same as [foo.bar]. Dots inside quotes do not count.
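A minimal Python sketch of how a keygroup header could be split under this proposal. The exact character set for unquoted strings is an assumption here (letters, digits and underscores, which is stricter than the rule stated above), escape sequences inside quotes are ignored, and `parse_keygroup` is a hypothetical name, not part of any existing parser.

```python
import re

# Assumed token: a double-quoted string (no escape handling) or an
# unquoted identifier with no dots, brackets, spaces, or leading digit.
_PART = re.compile(r'"([^"]*)"|([A-Za-z_][A-Za-z0-9_]*)')


def parse_keygroup(header: str) -> list[str]:
    """Split a '[a.b]' style header into its name parts."""
    if not (header.startswith("[") and header.endswith("]")):
        raise ValueError(f"not a keygroup header: {header!r}")
    body = header[1:-1]
    parts, pos = [], 0
    while True:
        m = _PART.match(body, pos)
        if m is None:
            raise ValueError(f"expected a name at offset {pos} in {header!r}")
        # Group 1 is the quoted form, group 2 the unquoted form.
        parts.append(m.group(1) if m.group(1) is not None else m.group(2))
        pos = m.end()
        if pos == len(body):
            return parts
        # Only a dot outside quotes separates nesting levels.
        if body[pos] != ".":
            raise ValueError(f"expected '.' at offset {pos} in {header!r}")
        pos += 1


print(parse_keygroup('[foo.baz]'))      # ['foo', 'baz']
print(parse_keygroup('["foo"."baz"]'))  # ['foo', 'baz']
print(parse_keygroup('["foo.bar"]'))    # ['foo.bar']
```

Because a name must match one of the two token forms, headers such as `[]` and `[.]` raise `ValueError` in this sketch instead of producing empty names.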

Rationale:

  • Easier to tokenize TOML files.
  • Grammar is simpler
  • Keygroups can have .'s in the name, but you have to quote them.

Additionally, this eliminates a whole slew of stupid edge cases in the current spec, e.g.

'= foo' - an empty key
'[]' - an empty keygroup
'[.]' - two nested empty named keygroups
'a = foo = bar' - under current spec, could be parsed as 'a = foo', 'bar'

This should make the format easier to write a parser for, and lets people have strings as keys.
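To illustrate the edge cases listed above, here is a rough Python sketch of a key/value line parser in which both sides must be a single string token. It only handles string values, the bare-identifier character set is an assumption, quoted strings have no escape handling, and `parse_pair` is a made-up name for illustration.

```python
import re

# Assumed token: a quoted string (no escapes) or a bare identifier.
_STRING = r'"[^"]*"|[A-Za-z_][A-Za-z0-9_]*'
_LINE = re.compile(rf'\s*({_STRING})\s*=\s*({_STRING})\s*$')


def _unquote(token: str) -> str:
    """Strip the surrounding quotes, if any, so foo and "foo" compare equal."""
    return token[1:-1] if token.startswith('"') else token


def parse_pair(line: str) -> tuple[str, str]:
    """Parse 'key = value' where both sides are exactly one string token."""
    m = _LINE.match(line)
    if m is None:
        raise ValueError(f"not a key/value line: {line!r}")
    return _unquote(m.group(1)), _unquote(m.group(2))


print(parse_pair('key = bar'))      # ('key', 'bar')
print(parse_pair('"key" = "bar"'))  # ('key', 'bar')
```

With this grammar, `= foo` fails because the key token cannot be empty, and `a = foo = bar` fails because nothing may follow the value token.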

Allow identifiers as a special low-effort string.

It makes the format easier to parse (well, tokenise), with fewer weird edge cases about what can appear in keygroup/key names.
It's backwards compatible, and lets people deserialize things where they have been bad and put a . in the name.

i.e. ["foo.bar"] isn't [foo.bar]
@haileys
Contributor

haileys commented Feb 24, 2013

What about barewords with the same name as an existing key?

@tef
Author

tef commented Feb 24, 2013

Should have the same behaviour as a duplicate key, which afaik is to break.

i.e.

[foo]
key = "value"
key = "value2"

is just as broken as

[foo]
"key" = "value"
"key" = "value2"

@haileys
Copy link
Contributor

haileys commented Feb 24, 2013

I mean something like this:

[foo]
foo = bar
bar = foo

@tef
Author

tef commented Feb 24, 2013

Barewords was a bad choice of word. I originally called them unquoted strings.

[foo]
foo = bar
bar = foo

Should be the same as writing

[foo]
foo = "bar"
bar = "foo"

It isn't really anything other than a way to write simple strings without quotes. So, the following should be equivalent too:

["foo"]
"foo" = "bar"
"bar" = "foo"

@haileys
Contributor

haileys commented Feb 24, 2013

Unfortunately this will break my implementation

@tef
Author

tef commented Feb 24, 2013

If it breaks the implementation where you run a regex over it and eval the output, I'm even more for this change :trollface:

p.s. i've updated the first comment to be clearer and unfuck my markdown errors.

You can indent keys and their values as much as you like. Tabs or spaces. Knock
yourself out. Why, you ask? Because you can have nested hashes. Snap.

Nested hashes are denoted by key groups with dots in them. Name your key groups
whatever crap you please, just don't use a dot. Dot is reserved. OBEY.
Nested hashes are just keygroups with more than one string seperated by a dot (.).

s/seperated/separated/

@mrflip

mrflip commented Feb 24, 2013

-1 This feels like creeping yaml-ization.

Regarding quotes-less strings as values: "There should only be one way to do anything" means that strings should require quotes.

Regarding quoted strings in keys: Rather than figuring out how to robustly include arbitrary characters in keys, arbitrary characters should be banned. If we say that a key is one or more letters, numbers, and underscores, and must not start with a number; and that a key group is one or more keys, joined by dots, that would also address the ambiguities you mentioned.
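The counterproposal above (keys are letters, numbers and underscores, not starting with a number; a key group is one or more keys joined by dots) can be sketched as two regular expressions. This is only an illustration in Python: the ASCII-only character class and the function names are assumptions, not something the thread specifies.

```python
import re

# One key: letters, digits, underscores, not starting with a digit.
_KEY = r"[A-Za-z_][A-Za-z0-9_]*"
_KEY_RE = re.compile(_KEY)
# One key group: one or more keys joined by dots, in square brackets.
_KEYGROUP_RE = re.compile(rf"\[{_KEY}(?:\.{_KEY})*\]")


def is_valid_key(text: str) -> bool:
    return _KEY_RE.fullmatch(text) is not None


def is_valid_keygroup(text: str) -> bool:
    return _KEYGROUP_RE.fullmatch(text) is not None


print(is_valid_key("foo_1"))             # True
print(is_valid_key("1foo"))              # False
print(is_valid_keygroup("[foo.baz]"))    # True
print(is_valid_keygroup("[.]"))          # False
print(is_valid_keygroup('["foo.bar"]'))  # False
```

Under these rules the empty key, `[]` and `[.]` are all rejected, and so is any quoted name, since keys are not strings in this variant.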

@cbetta

cbetta commented Feb 24, 2013

-1, agree with @mrflip

@pygy
Contributor

pygy commented Feb 24, 2013

I disagree with this proposition. I think that each type of thing should have at most one representation (with the possible exception of hex floats, which add functionality).

Let keys be bare words (just forbid the empty key), and mandate quotes around string values.

@tef
Author

tef commented Feb 24, 2013

@mrflip "There should be only one way to do anything". Does this include translating hashes back into TOML?

For example, right now: {"foo":1}, {"foo=bar":{}} can be deserialized. {"foo.1":{}} can't, {"foo=bar":1} can't.

With this change, if you can represent it as a string, you can use it as a key name, keygroup name, or string. This removes three different ways to specify a string value depending on position, and replaces them with two simpler ones.
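As a rough sketch of what this means when writing hashes back out, a serializer could emit a name bare when it looks like an identifier and quoted otherwise. The Python below covers key and keygroup names only, assumes an ASCII identifier rule, and refuses names that would need escaping. `emit_name` and `emit_keygroup` are hypothetical helpers.

```python
import re

# Assumed rule for names that may be written without quotes.
_BARE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def emit_name(name: str) -> str:
    """Write a key or keygroup name, quoting it only when required."""
    if _BARE.fullmatch(name):
        return name
    if '"' in name or "\\" in name or "\n" in name:
        # Escape sequences are out of scope for this sketch.
        raise ValueError(f"name needs escaping: {name!r}")
    return f'"{name}"'


def emit_keygroup(path: list[str]) -> str:
    """Write a keygroup header; dots inside a name end up inside quotes."""
    return "[" + ".".join(emit_name(part) for part in path) + "]"


print(emit_keygroup(["foo", "baz"]))  # [foo.baz]
print(emit_keygroup(["foo.1"]))       # ["foo.1"]
print(emit_name("foo=bar"))           # "foo=bar"
```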

Your counterproposal also ends up with two rules for strings, but doesn't indicate how to handle keys with spaces in (which is currently supported), or key group names with dots in (currently unsupported). It would add the same complexity to implementation, but with none of the ability to handle the deserialization cases above, except {"foo":1}.

I do agree with your pull request to make key names and key group names follow one set of rules for allowable characters, but I do not think letting keys and strings be interchangeable counts as 'YAMLization'. For that, I think I'd need to be adding special cases rather than getting rid of them.

@mrflip

mrflip commented Feb 24, 2013

@tef My proposal hinges on banning keys with spaces, dots or anything special in them. If so, then there's only one rule for strings, because keys are not strings: they can only be \w+ (starting with a non-digit), and key groups are one or more keys, connected by dots, in [...] square brackets.

The argument for this hinges on the presumption this is a configuration file format: primitives must be comprehensive, but the overall data structure should be locked the hell down. My proposal is to use pretty much the same rules ruby requires with its new-school symbol hash shorthand.

@tef
Author

tef commented Feb 24, 2013

I think we're both in agreement in terms of adding identifiers to the format, and using them for key, key group names, but I'm still a bit of a weenie, in that I think strings should be valid too.

(The older issue #27 suggests your presumption about config over interchange may be right, but I don't see much added complexity by letting string keys be strings).

I keep saying they're strings, because it's implied by the spec when equivalent JSON snippets are shown alongside. I have a suspicion that in the wild, some strings have dots in them, and they are used as hash keys.

Don't get me wrong, I'm not suggesting that this should be a subset of JSON (someone else can argue for that, and that is yamlization...).

@tnm
Contributor

tnm commented Feb 26, 2013

I'd prefer to make the types as unambiguous and "one-way" as possible, so I can't support having both quoted and unquoted strings. I do see the point about quoted keys, but I think that would only be valuable if keys could be anything (a hash, or whatever).

@kelvinst

-1, totally agree with @mrflip

@mojombo
Member

mojombo commented Sep 22, 2013

Thanks for the suggestion! However, I think this adds too much ambiguity to TOML. This can be evidenced by having to say that true and false are special cased. I want to avoid special cases as much as possible in TOML. You should be able to tell that a string is a string no matter what.
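The special case mentioned here can be seen in a small Python sketch of a value parser that accepts unquoted strings. It is only an illustration under assumed rules (ASCII identifiers, no escapes, strings and booleans only), and `parse_value` is a hypothetical function.

```python
import re

_BARE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def parse_value(token: str):
    """Parse one value token, allowing unquoted strings."""
    # The booleans have to be checked first: without this special case,
    # an unquoted true would be read as the string "true".
    if token == "true":
        return True
    if token == "false":
        return False
    if len(token) >= 2 and token[0] == '"' and token[-1] == '"':
        return token[1:-1]
    if _BARE.fullmatch(token):
        return token  # unquoted string
    raise ValueError(f"cannot parse value: {token!r}")


print(parse_value("bar"))     # bar
print(parse_value("true"))    # True
print(parse_value('"true"'))  # true
```

In this sketch a reader cannot tell from the unquoted form alone whether a value is a string: `bar` is a string while `true` is a boolean, and the string "true" can only be written with quotes.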

@mojombo mojombo closed this Sep 22, 2013
@rossipedia
Contributor

@mojombo 👍

9 participants