Add syntax for tuples and re-enforce homogenous arrays. #154
Awesome. 👍 One clarification I'd like to suggest: are tuples smaller than length 2 allowed? I propose that they not be allowed. Tuples of length 1 carry no more information than a bare value, and tuples of length 0 are a bit weird. (A unit type.) Also, I think it might be worth adding an invalid example based on the size of tuples:
[ (1, 2), (1, 2, 3) ] # NOPE
Which shows that tuples have length encoded into their type. |
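A rough Python sketch of that rule, for illustration only. The helper names `type_signature` and `is_valid_array` are made up here and are not part of any TOML parser; the sketch simply assumes tuples are typed by their ordered component types and arrays by their element type.

```python
def type_signature(value):
    """Return a hashable description of a value's type.

    Tuples are described by the ordered signatures of their components,
    so both length and component types are part of the type. Arrays
    (Python lists here) are described by their element signature only,
    so their length does not matter.
    """
    if isinstance(value, tuple):
        return ("tuple",) + tuple(type_signature(v) for v in value)
    if isinstance(value, list):
        element_sigs = {type_signature(v) for v in value}
        if len(element_sigs) > 1:
            raise TypeError(f"heterogeneous array: {value!r}")
        # An empty array has no element signature yet.
        return ("array", element_sigs.pop() if element_sigs else None)
    return type(value).__name__


def is_valid_array(values):
    """True if the list (and everything nested in it) is homogeneous."""
    try:
        type_signature(values)
        return True
    except TypeError:
        return False
```

Under these assumptions `is_valid_array([(1, 2), (1, 2, 3)])` is false because the two tuples have different signatures, while `is_valid_array([[1, 2], [1, 2, 3]])` is true because array length is not part of the type.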
I'd probably not allow tuples of length 1 for sanity reasons, although I suspect implementations would resolve the tuples fine. That said, incidentally, Python (which has first-class tuples) resolves an (apparently) syntactic 1-tuple (which is, in fact, not a tuple) like this:
>>> (1,2).__class__
<type 'tuple'>
>>> (1).__class__
<type 'int'>
>>> ("math").__class__
<type 'str'> |
That's because that's not actually a tuple. In Python:
>>> (1).__class__
<type 'int'>
>>> (1,).__class__
<type 'tuple'>
There are advantages to allowing single-length tuples, the same as there are for single-length arrays, mainly that in code you can treat the configuration value as a sequence, regardless of how many elements it has. |
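To make the "treat it as a sequence" point concrete, here is a small Python sketch. The setting name `servers` and the helpers `describe` and `normalize` are hypothetical, chosen only for this illustration.

```python
def describe(servers):
    """Format every entry of a tuple-valued setting, whatever its length."""
    return [f"server: {name}" for name in servers]


# The same code path handles two, one, or zero elements.
two = describe(("alpha", "beta"))
one = describe(("alpha",))  # note the trailing comma: this is a real 1-tuple
none = describe(())


def normalize(value):
    """Workaround needed if 1-tuples are disallowed and a bare value is used.

    Without it, iterating the bare string "alpha" would yield its characters.
    """
    return value if isinstance(value, tuple) else (value,)
```

If single-element tuples are not allowed, every consumer needs something like `normalize` before it can loop over the value.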
@rossipedia — Aye, that's what I was getting at regarding the sanity reasons (clarified the comment). |
I'm not sure I understand what sanity reasons you mean. I'd think that requiring the type of a configuration element to change based on how many elements it has would make code working with that configuration element more complicated, and that's a Bad Thing™. |
Hm. Yeah the more I think about it, I suppose I don't see the harm in unit tuples, although I don't really see much usefulness in the config format. We might need to clarify the actual text since we say "They are represented by a comma separated list inside of parentheses." |
Don't forget about the empty tuple |
Well, since this isn't actually an execution language, we don't need to worry about +1 |
In some languages, the type of a tuple depends not only on the number of elements, but also on their types. For example, in Julia:
julia> isa((1,"e",3.4), (Int64,ASCIIString,Float64))
true
julia> isa((1,"e",3.4), (Any, Any, Any))
true
julia> isa((1,"e",3.4), Tuple)
true
As you can see, it offers some leeway, and it also does for arrays:
julia> [(1,"e",3.4)]
1-element (Int64,ASCIIString,Float64) Array:
(1,"e",3.4)
julia> ar = Array((Any,Any,Any),0)
0-element (Any,Any,Any) Array
julia> push!(ar,(2,"ER",4.5))
1-element (Any,Any,Any) Array:
(2,"ER",4.5)
How strict do you want to be regarding type homogeneity? |
@pygy Indeed. This is clarified in the commit provided by @mojombo :
See #131 for more details. Maybe the spec should be clearer, but I thought the above example was enough.
The point is to make arrays completely homogeneous with respect to the type of value it contains. The point of adding tuples is to provide a way to create arrays with non-homogeneous data. i.e., an Note that TOML has no explicit But TOML doesn't have to be strict like this for all types. Of particular note are anonymous hashes in #50. |
Indeed, I had missed that part. |
Should tuples behave like arrays regarding white space and comments? |
@pygy Absolutely. |
Yay for yet another type in the minimal language spec.
Don't add tuples, this complicates things. Instead, minimize. Proclaim that homogeneous lists are something the application should enforce, not TOML. Parsers can implement it as a "strict" variant, whatever. Tuples complicate things because you have to decide on all sorts of corner cases which will confuse the users:
And there's this peculiar idea to forbid tuples shorter than 2 elements. This makes auto-generating values tricky and will confuse users. There is also no need for it since parentheses aren't valid in other contexts. And people will hack around it by using lists instead. But they won't be able to since a list is homogeneous. Before long you'll start seeing I would minimize instead. Keep it simple. And obvious. |
I agree with @ambv on this one. Config files are unlikely to be ported as-is from app to app (and thus from language to language), and type strictness will confuse the least technically inclined, and annoy others. |
Then those parsers are not compliant with the spec.
None of the things you listed are "corner" cases. 1 is invalid because the types of tuples are different. 2 is valid because the length of a list does not affect its type. I don't understand 3; tuples and arrays are two different kinds of data. The whole point is that they serve two different purposes: arrays for homogeneous data and tuples for heterogeneous data. 4 is clarified in this proposal; tuples are ordered types, which means they are typed by the order of their component types.
Tuples shorter than 2 elements don't have to be banned, but I suggested it because they are peculiar things. And there's no reason to hack around such things. Tuples of 1 element have the same utility as just the bare element from the point of view of the type.
Did you read this proposal? That is allowed in the spec right now. If this proposal is accepted, then that thing won't be allowed. The whole point of this proposal is to type arrays by the type of value they contain. Similarly for tuples. You talk about confusing the user, but such things can be avoided by a parser that gives helpful error messages. A parser that doesn't give helpful error messages will confuse the user regardless of the spec. Moreover, your comment doesn't really address the problem trying to be solved by this proposal: provide a way to write well-typed structured data and to allow static languages to easily play along. Making static languages use parsers that only support fully homogeneous arrays makes them non-compliant with the spec and strictly less useful, since tuples won't exist as a way to make heterogeneous data. This is consistent with one of the objectives of the spec:
|
TOML is a configuration file format. Your application will be using it to hold domain-specific information. What I'm saying is that array homogeneity is a domain-specific need. A parser might provide a strict option which enables that. Just as you'd validate whether TCP ports are between 1 - 65535. Such validation doesn't make your application not compliant to the TOML spec.
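A sketch of what such application-side validation might look like in Python, assuming the parser has already produced a plain dict. The function name `validate`, the `strict_arrays` flag, and the key name `port` are all hypothetical and are not taken from any existing TOML library.

```python
def validate(config, strict_arrays=False):
    """Application-side checks on an already-parsed config dict."""
    # Domain rule: TCP ports must lie between 1 and 65535.
    port = config["port"]
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")

    # Optional domain rule: top-level arrays must not mix element types.
    if strict_arrays:
        for key, value in config.items():
            if isinstance(value, list) and len({type(v) for v in value}) > 1:
                raise ValueError(f"array {key!r} mixes types: {value!r}")
    return config
```

With `strict_arrays=False` a mixed array passes untouched; with `strict_arrays=True` the same file is rejected, and in neither case is the file format itself involved.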
Yes, I get that. The confusion I referred to is user-side, precisely because you need to inform non-programmers that there's a difference between
I must have failed to read the proposal because there's nothing in it that suggests that. It would also help to explicitly name those languages. Do those languages provide a parser for JSON or XML? If a language can specify a tuple, it can also specify a non-typed array. |
I agree that's a valid question. Truthfully, I don't know whether people will be confused by such things or not. As I said, one hopes that if your application needs to support such users, then it will have appropriate error messages.
Any static and strongly typed language.
Of course they do. Look at the TOML implementation list right now. There are loads of parsers for strong and static languages. I didn't say that static languages can't handle TOML without this proposal; I said it would be easier. I will also re-emphasize that it is nice to have well-typed structured data in a configuration file. You may claim that this leads to user confusion, but it can also lead to preventing the user from typing malformed data. (This can be provided by the application, as you say, but I think it is a worthy enough goal for it to be included in the spec itself.) |
I agree with ambv here. Adding tuples is a sign that the original decision of enforcing homogenous arrays was wrong. I agree that this should be left as an application check after the toml file has been parsed. If your static language has difficulty parsing a mixed type array then it will be equally difficult to parse a tuple. This is a non-argument. Reduce the complexity in this minimal data interchange format by losing the tuple idea and allowing heterogenous arrays. |
Not at all. Static languages can represent tuples as an appropriate type (like, say, any particular construction of a product type), which is typically distinct from an array. It's not a non-argument, because an assumption of homogeneity or heterogeneity buys you stuff in a static language. It allows the programmer to choose an appropriate data type to represent the TOML data. If TOML only provides heterogeneous arrays, then you never get that homogeneity assumption which restricts your choices in a static language. The idea here is to push those assumptions into the spec. The result does not benefit dynamic languages, but it does not harm them either. (e.g., A dynamic language with heterogeneous arrays could represent arrays and tuples in TOML in precisely the same way.) The result does benefit static languages. It can also benefit the user by catching malformed data. And as ambv pointed out, it can also detract from the user by having both the |
And my argument is, the decision to represent something as an array should be made by the application. Parse it as your most forgiving type, and allow it to be cast out as the more restrictive type on processing. Your language's toml parser can have convenient methods to make this simple for the app dev. The application says: I want this section of data to be a homogenous array of values, toml-parser. Make it so. The parser slurps up the data permissively and then the non-parsing side of the toml-parser (the rendere... I'm lacking terminology here) re-casts the data as the type requested. My description here is vague and hand-wavy because I haven't fully considered the call interface. However, I still believe it is better to reduce the file-format complexity and make your tools smarter. For languages that natively support mixed type arrays, they have less smarts they need to build into their toml-parsers. I guess you could still implement the type-checking API into all toml-parsers, if you wanted to. |
I know how to do what you ask. In fact, I've already done it for the current spec. It works great. (It's a decent demonstration of Go's reflection facilities IMO.) But this isn't a win-win situation. Allowing the user to enforce homogeneous arrays means they lose out on the ability for mixed data in a well-typed manner. It's all still doable, but like I've said, inconvenient and possibly less safe depending on the language used.
I like simplicity. I've been an advocate for it on this issue tracker. But I also like safety and convenience.
The type checking does add a bit more complexity to the parser. But not too much IMO. It took me a couple extra hours to add it in on a separate branch. (But I had these grand plans from the beginning.) |
You misunderstood one piece of my thinking - I wasn't suggesting that homogeneity be an all or nothing affair. I was suggesting that, per key, the app dev could get the toml-parser to validate that a collection was indeed homogenous and return it in the most efficient data structure (for the language/situation) accordingly. hand waving ahead: config = toml.parse(file) or however it is you might achieve that in the real world. |
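One possible shape for that per-key request, sketched in Python. This is only a guess at the idea, not the interface the commenter had in mind: the helper `get_array` is hypothetical, and a plain dict stands in for whatever a permissive parser would return.

```python
def get_array(config, key, element_type):
    """Return config[key], insisting every element has exactly element_type."""
    values = config[key]
    if not isinstance(values, list):
        raise TypeError(f"{key!r} is not an array")
    for index, item in enumerate(values):
        # bool is a subclass of int in Python, so compare exact types.
        if type(item) is not element_type:
            raise TypeError(
                f"{key}[{index}] is {type(item).__name__}, "
                f"expected {element_type.__name__}"
            )
    return values


# Stand-in for the result of a permissive parse.
config = {"ports": [8001, 8002], "mixed": [1, "two", 3.0]}

ports = get_array(config, "ports", int)  # homogeneity demanded for this key
mixed = config["mixed"]                  # left heterogeneous on purpose
```

The homogeneity check is requested per key, so one file can hold both kinds of collection.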
I didn't misunderstand. That's precisely how my parser operates. :-)
This is exactly the kind of thing I'd classify as inconvenient. Instead of type safety being baked into a parser (one-time effort), the type safety has to be redone in every client use of it. We are having the classic argument of where safety should live. I'm advocating for pushing some of it into the spec. It also makes working in a static language more convenient. |
But the problem with pushing it into the spec is that it complicates it and therefore the config files written in it. Those files can be touched by non-coders. My thinking is, leave those files as simple and intuitive as possible. Give your toml-parser an API that makes coercion and validation as simple as possible. Your coder knows about these things and is the right person for owning this responsibility. |
We are in agreement that adding another syntactic category will add complexity. But it must be evaluated as a trade off. The pros are more safety for all parsers, more safety against users typing malformed data and more convenience in static languages. The cons are more complexity in the spec/parser and user confusion. |
I really don't see the convenience argument. Surely it's okay for the app dev to explicitly enquire/demand of the toml-parser that a set of keys be homogenous. That homogeneity is an aspect of his specific application. Indeed, it is an aspect that may change over time. Version 2.0 might see the need to make some of those collections heterogenous. I look at this as a pyramid. Parser writers are generally more fastidious than app devs and they in turn more so than users. Put the responsibility of making the parser flexible, smart and a pleasure to use on the parser writers. Put the responsibility of ensuring type correctness and data validation on the app dev. Let the user blunder along with good error messages to guide them to safety and correctness. |
Imagine that TOML had only three data types: hashes, arrays and strings. Do you see how it is convenient to add integers, floats, bools and datetimes? In the same sense, but not the same magnitude, having real arrays and tuples is more convenient. It pushes type information and safety into the spec, and therefore doesn't require the client to have to verify the types themselves.
But you're missing trade-offs. Taken to the extreme, your only primitive data type in TOML would be a string. TOML includes some types because it moves safety and convenience into the parser. |
I remain unconvinced by this argument, and you're straw-manning by suggesting those extremes. I was not suggesting that we remove types from the toml spec. I was suggesting that we don't add the tuple type. All up, I favour a single heterogenous array type. I think we've articulated our opinions and perspectives well enough here for now. Let's see how it turns out. |
No. I'm not misrepresenting your argument. I'm merely trying to show that the decision is about a trade-off of safety, convenience and complexity, and not one of some pyramid of responsibility. Because invariably, the spec takes responsibility for some types. (Implying that app writers don't have full responsibility of types.) |
Edit: This was written offline (I'm on the road), and sent before I could check if it was still relevant,... and I don't have the time to read the whole thread right now. Sorry for the noise if it has been covered meanwhile.
On the other hand, enforcing type homogeneity in a dynamic language means
The real solution to this kind of problem is a schema validator. Type Rather than type strictness, I'd require compliant parsers to support some -- Pierre-Yves |
@pygy - The type checker is more work, but as I mentioned, I don't think it's much more work. It took me an hour or two to add myself (but I had grand plans from the beginning).
I think schema validation is a great idea. But as I've mentioned, sometimes safety is worth pushing into the spec. Also, enforcing homogeneity in arrays without tuples is much less expressive (which is why tuples go hand-in-hand with homogeneous arrays). |
With a schema validator, how about we lose arrays altogether and just have tuples? |
@dahu, and call them arrays? Voila! ;-) |
You got the sarcasm there then. ;-) |
Personally I fall on the side of strongly typed languages. I feel the comparison between tuples and arrays to be laughable. Here are two things to keep in mind. @dahu is right, there is definitely some enforcement that is the application's responsibility. At the same time, these kinds of explicit structures prevent accidental input errors, and I can back this up with a real world example. StarSector uses JSON to define its ship files. Here's an example:
Before the tools were developed to place these bounds with a GUI, someone was damn fool enough to do this:
Which of course led to the world's first trans-dimensional clam! For the record, this error went undetected for 8 months. Two insertion errors blew things up in an undetectable way. So why didn't the author of the ship format enforce something like this?:
because he wrote StarSector in Java of course! (the tragedy keeps on coming right?) His parser simply didn't provide him an easy way to translate those arrays into simple points. So he did what was simplest for him. The proposal has its advantages, most notably in how the markup would look:
And how simply it would parse into native structures or POJOs. |
Interesting example, @Ghoughpteighbteau :-) However, nothing here convinces me that we can't just keep a single simple syntax in the toml files, and provide the app dev with tools necessary to get where she wants to be. So, consider the equivalent data structure: bounds = [[-33.0,15.0] With an appropriate schema validator that ensured it was an array of 2-value arrays of floats. Or however you want to describe that. Imaginably, the schema might even have specs in it to control the casting... No, I don't like that, on reflection. That would not be as language neutral. Issues of casting should reside in the toml-parser for each language. Dynamic languages may not need any casting at all whereas the strongly typed languages might need a bit of a helping hand. So, to my original thinking: Let's assume Java (but that is not a strength, so I will hand-wave). The toml-parser there would slurp up (internally) the toml file into an array-of-arrays (or whatever best suits Java as the internal representation - perhaps that's a tuple type?) while checking the validating schema for correctness. Then the app-dev says, give me the bounds as an array of Points (or whatever actual type they should be). I believe it's the job of the toml-parser and the app-dev to cast the parsed form into the required form. At the file-parsing face, the toml-parser for all implementations would look fairly similar. It's only at the app-dev facing side of the toml-parser that extra casting code would be required for strongly typed languages. |
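A minimal Python sketch of the check-then-cast step described above, assuming the parser hands back nested lists. The `Point` class and the `to_points` helper are hypothetical names, and the check stands in for what a schema validator might enforce.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    x: float
    y: float


def to_points(raw):
    """Check that raw is an array of 2-value float arrays, then cast to Points."""
    points = []
    for index, pair in enumerate(raw):
        if not isinstance(pair, list) or len(pair) != 2:
            raise ValueError(
                f"bounds[{index}] must hold exactly 2 values, got {pair!r}"
            )
        if not all(type(v) is float for v in pair):
            raise ValueError(f"bounds[{index}] must hold floats, got {pair!r}")
        points.append(Point(pair[0], pair[1]))
    return points
```

An accidental extra value in one pair, the kind of insertion error described earlier in the thread, is then reported with the offending index instead of silently shifting every later coordinate.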
The utility of schema validators is not in question. Everyone knows that a schema can be used to artificially restrict values based on a number of criteria. Integer/float/datetime ranges, enumerations, array lengths, sum types, etc. The list goes on. Saying that "it can be pushed into a schema validator" isn't relevant here. What's relevant is the balance we all want to strike between complexity, safety and convenience in the specification.
There's more to the story, since you support a variety of types included in the TOML specification, which means that the client of a TOML parser is not completely responsible for casting types. The parser takes some responsibility from the spec. At this point, the typing implications of this proposal have been made clear. I believe that further discussion should be based on the trade offs that I and others have described. The ones that I can currently think of are:
Some of these points have already been discussed, and I'm sure there are others I've missed. But I believe an evaluation of these trade offs is the appropriate way to decide whether this proposal should be accepted or not. Therefore, the discussion should be focused on those points. Things like "it can be in the schema validator" are responses to any proposal that adds safety or convenience via types into the specification, including already existing types. |
I just realized that parsers have to handle mixed content in hashes, so having mixed-type arrays isn't much more work. The point of schema validators is that they allow you to enforce type strictness, if that's your thing, with much more control than what the TOML spec provides or could provide if this proposal were accepted (because you have domain-specific knowledge), and a whole bunch of other things, like value constraints. |
They are indeed cumbersome in a static language, but it's worth it. Homogeneous maps put severe restrictions on the ability to express data concisely. |
while the toml parsers are likely to give explicit errors accompanied by line numbers
This is a bit off topic, but I think there is an underlying worry that toml will not be able to express some concepts if arrays are not heterogeneous. I understand that worry, but I think all use-cases for that are covered here: #153 |
I guess I will add my comment here. Add to table, int, float, date, one more type... tuple. You have table which is a k/v map. Add a list. (1, "foo", 1979-05-27T07:32:00Z) Sometimes things make sense in tuples. Disparate types are good for expressing many concepts. Boon will have the 6th TOML implementation for Java. And I agree YAML has jumped the shark. Off topic: So in short... I agree you need Tuple. I really like that arrays are homogeneous. I also really like that my browser has spell correct because apparently I do not know how to spell homogeneous. http://rick-hightower.blogspot.com/2014/04/toml-what-if-plist-json-and-windows-ini.html |
Closing in favor of Inline Tables. Check them out on #235. |
Hurray for tuples! This PR also brings back homogenous arrays, which I would prefer, and the addition of tuples solves any decent use cases for mixed types.