Proposal: TOML Schema #792
Looks good so far, although I feel that the stated namespace restriction shuts down a promising opportunity before it can be explored. Although I'm inclined to avoid so-called "microformats" as being overly broad in scope, I'll set that objection aside for the time being. It may be possible to treat a subtable like it's its own TOML hash table, assess a separately specified schema against it, and integrate that assessment into the assessment of the original parent document. This could be done easily within a TOML file simply by giving a table

[subtable.toml-schema]
version = 2
location = "<url>"

What would your objections be to allowing such a sub-schema application? |
The Certainly not a standalone schema validator, because no configuration would be constructed when it runs. But is |
@eksortso my thinking is that the TOML parser should notice the existence of the schema reference, and then validate the document against the schema, and for any missing key, it will check for a default in the schema and grab the value from there, and construct the resulting TOML object with that value. |
@eksortso regarding namespaces, what I found to be challenging is the recursiveness of the schema. I'd be happy to support it if someone can bring a solution to the problem. |
@brunoborges Well, in a sense, there's already limited support for multiple schemas: Any part of a TOML configuration will have, at most, one schema to rule over it. The schema over that part would be defined by the The only complication in a plan like this is how we could assign a schema to every table in a table sequence. An array has no way to assign a table to a key with no table of its own. Perhaps the first table in the sequence can have a subtable called |
This is what I found to be difficult. The moment extra tables must be added later on in the document to support more metadata, the TOML document starts to lose its appeal. One idea that did come to mind was a namespace prefix, just like XML/XSD does.

[toml-schema]
version=1
location="url..."
[toml-schema.cust]
version=1
location="url for customer namespace schema"
[title] # this is top-level schema
name="Customers Orders Configuration"
[customers] # part of top-level schema
[cust:customers.orderSettings] # this one is a customer element
maxitems=3
region="North America"
shipping="UPS"
[customers.orderSettings.header] # this is still part of the customer namespace as it is a child of a table linked to the 'cust' namespace.
comment="some comment"

But then, how to reference the customer schema from the top-level, general schema? |
Is this not just a slightly-more-fleshed-out duplicate of #629, #76, or #116? On #629 specifically @pradyunsg makes the point that
I realize this proposal is 'better' than #629 in that the schema is itself a separate TOML document, and I appreciate the amount of thought that has gone into it, but TOML is supposed to be simple and human-oriented; I don't buy any claims that a schema would be solving a real problem. If the TOML is so complex that it requires the parser to perform context-aware validation against a nontrivial schema, maybe TOML was the wrong tool for the job to begin with. Also consider that by adding a 'url' you're implying the TOML parser needs to either have network awareness built-in (at the very least the ability to do a basic HTTP fetch), or require applications to implement that themselves via callbacks. Neither is a great option, and both are likely to be impossible in many contexts. One of TOML's selling points is its minimalism, both in syntax and in the resulting implementation. The requirements for implementing URL fetching will be a complexity/bloat bridge-too-far for many implementations. |
@marzer Here are a few points:
So in short: the proposal is to find common ground, without adding complexity to the TOML specification itself, but to ensure the specification recognizes the existence of the TOML Schema, and allows for a standard way for defining a pointer to a schema file. That's all. |
@marzer I agree with your essential point:
But at this point, toml-schema is a separate project, and the impression I get from everyone so far is that it'll always be separate from core TOML, even if it's heavily adopted. It imposes nothing on the core standard, and the schemas themselves are fully compliant TOML documents. I will disagree with you, vehemently, on the matter of complexity. Configurations always start small. But if a configuration is intended to scale up, there may come a time when a little help to keep things in line would be appreciated, especially when that help is a pure add-on with no additional load borne by the standard. |
Just adding a little perspective here: The syntax isn't the issue: The syntax for JSON or YAML or INI files isn't particularly complex. Heck, the syntax for XML isn't all that complex in most cases. The issue is knowing what keys are available and the expected/valid values for each key. Take, for example, Windows Terminal's Without a schema, remembering the names and values for each of the settings is a PITA and having to constantly refer to the docs is not productive. WITH a schema, editors like VSCode make writing settings a breeze: |
While I think that a TOML schema mechanism is a good idea, I agree with others here that it must be optional: TOML parsers may consider schemas, but they are not required to do so. A logical and in my viewpoint very important conclusion from this is that the absence or presence of a schema must not change the data structure resulting from parsing a valid (and schema-valid) document. Therefore, a ''default'' key as described above cannot be part of the TOML schema spec, since otherwise a schema-aware parser would parse documents into different data structures (with defaults added) than a schema-ignorant parser. Let's not go down that road, since it would fragment the TOML community. |
I don't think anyone is mandating that every TOML doc must have a schema. But we are advocating that TOML should offer/support schemas when presented. |
@ChristianSi one more time for the sake of the debate: XML and XSD are two separate specifications. One (XML) recognizes the existence of the other (XSD), but (XML) does not require it (XSD). Not all XML documents must have a schema. Therefore, the proposal is to discuss, along with the key TOML contributors and the TOML community in general, whether there is room for a TOML Schema specification, how it should work best, and how the TOML specification should recognize its existence in a way that is standardized (e.g. @eksortso right now, the grammar I drafted does not suggest a fully compliant TOML document, but something similar. If you look closely at the ABNF, it suggests a few keywords for built-in types that are not quoted as strings. Example:

[document.property]
type = array
arraytype = string

What do you think? |
Well as long as it remains fully optional, such that a parser can completely ignore a schema URI and still remain compliant, I guess I have no complaint. To that end, I second @ChristianSi 's point:
|
@brunoborges Making TOML schemas themselves fully compliant TOML document sounds like a very good idea. "Eat your own dogfood" and don't proliferate file formats and parser requirements needlessly. Just adding a few quotes here and there seems like a worthwhile price. |
I have two questions about the proposed syntax:
Note that neither question needs answering at all if the schema is not a part of the TOML document, and instead uses magic comments or similar. Something like:

##! toml-schema = { version = 1, location="url..." }

This also has the upside of appearing visually distinct from regular TOML, though it adds complexity to the language since that requires changes to the ABNF. |
@brunoborges I apologize, because I've been basing my assessment on the project README only, and that's not in sync with the project's ABNF. The README does imply that the schema must be a separate document, because the only thing that the TOML document needs to have is a Now regarding those non-TOML-compliant value keywords. As long as they remain in TOSD docs, then schemas could have special unquoted value keywords in the TOSD format. That could still have a knock-on effect:
I'd love to add enumerated values and option types to TOML, but I wouldn't do anything to encourage that, at least not just yet. |
@marzer Your comment suggestion reminds me of #522, which was specifically about TOML version pragmas. Could we use a similar pattern for referring to TOML schemas? Something like the following appearing at the top of the document?

# TOML Schema: v2 https://config.example.com/schema.tosd

@brunoborges Is the toml-schema.location = "https://config.example.com/schema_v2.tosd" |
@eksortso the idea of having the version, is to double check the intent. If So, while it is not necessary, it would add some protection. |
Yeah, I am not a fan of the unquoted enumerated values either. I just really thought they'd make things easier for extensions/plugins and therefore developer experience in general, but I think you are right to say that it is not impossible to add the quotes. That said, I'll document that a TOML Schema must be a TOML-compliant document. This does raise the question: Is there still a need for a TOML Schema ABNF grammar? I tend to believe that yes, it is still needed, to pin down the structure. @eksortso Any thoughts? |
@marzer here are my thoughts on your two questions:
It is unfortunate that the TOML specification does not set a meta-table format for information regarding the document type (e.g. the version). HTML for example has a standard way to do so:
If TOML specification allowed for such standardized construct, then the schema reference could be part of it, along with the TOML specification version that could inform parsers of other metadata. But, assuming that such construct will never be part of the specification, then my thinking is that we have a few options to consider: Reserve the table
|
I suppose I should clarify what I took "not schema-aware" to mean here: an old parser that knows nothing about this new feature. If it knows about schemas but chooses to ignore them, then it is schema-aware but also non-enforcing. Moot point, though; I agree with your points above that having it pragma-style in comments is likely the right direction. |
@ChristianSi I think you are making a really good point here regarding Ideally, a TOML file should output the same data regardless of what parser was used, as long as the parser is compliant with the version of the TOML specification. And if a schema-aware parser generates a data object that is different because it followed the schema and grabbed a few default values, then ultimately the file is different. In essence what you are saying is that TOML Schema must not influence/modify the data of a TOML file. A TOML Schema can only dictate the data structure and data types; never data input. I'm down with that. |
@brunoborges Well, I'm a big fan of dogfooding, so my advice would be to write the schema standard as TOML using itself to check it. This will hold a lot more weight once TOML v1.0.0 is finally released. I'm not saying this just to be flippant; after all, ABNF was defined using itself for its first specification. That said, if you want to keep the ABNF around, would it be possible to use the case-sensitive string syntax introduced in |
Hi all, I incorporated some of the feedback here, and for now, also decided to not focus on the ABNF grammar, and instead on a set of rules. I believe ABNF may be useful later to generate a parser that validates the overall structure of the TOML Schema document. It is also starting to seem possible to draft a recursive TOML Schema file to validate the TOML Schema itself. I'd appreciate those interested in this proposal if you could review the new README documentation. Thank you |
@brunoborges I've left some feedback/nit-picks on your Discussions page: brunoborges/toml-schema#4 (Is that where you want that sort of thing? Or here?) |
Would be awesome to get this across. Was just looking for it. |
I am inclined to believe that it would be much wiser to just use JSON Schema to validate the JSON object resulting from loading a TOML document. Instead of reinventing the wheel, try to reuse it. There is a huge number of validators and they could easily be retrofitted to also work with I am aware of at least one vscode extension that already implements support for using json-schemas to validate TOML files. See https://marketplace.visualstudio.com/items?itemName=tamasfe.even-better-toml#completion-and-validation-with-json-schema New schemas can be added to the https://www.schemastore.org/json/ database, which editors can use to pick up schemas automatically without having to implement manual associations. Sadly, the extension above does not seem to implement support for schemastore yet but I see no reason why they would not want that. Schemas can be either included in the database or just linked to their location. |
See the comment by @bitcrazed .
|
TOML handles some types (date, time, datetime) and values (+- inf, nan) that JSON (and therefore JSONschema) doesn't, so JSONschema would need to be extended a bit. |
I've used the Now I am looking to suggest TOML as a configuration format for a project I am currently working on, I have been quite disappointed to find there is no established schema, especially given the long history of it having been proposed. There seem to have been several excellent suggestions, most notably the one proposed here by @brunoborges. But the most recent comments about JSON Schema are disheartening when they clearly are missing the point. In general, seeing things like this is a red flag to me that the core format doesn't have enough support and requires external intervention to provide a proper feature set. Yes, JSON Schema can work, but
At this point, I have a few non-ideal choices:
It would be great to see this conversation make some progress. |
That being said, is there anything the existing proposals are missing, besides being accepted by the maintainers of the core product? I'd even be willing to help kickstart some tools around What would it take to get some traction here? |
Nothing really; I think everyone already agrees that TOML itself doesn't really need any additional features for toml-schema to work, and that it should always live as a separate specification. The only point is that there's no clear way to point at a TOML schema from the TOML document itself. The toml-schema document has a special IMHO the comment-based syntax suggested earlier makes more sense:
Although personally I'd probably opt for a simpler space-separated format so you don't need to parse TOML inside a comment:
This way you can easily add a TOML schema to any document without requiring any support from the application. The main missing part is just that no one has written any tooling for this; there is no So the questions here are basically:
My personal answer for 1 would be "yes, when there's at least one tool that implements it", and for 2 it would be "no, the toml-schema specification can use a comment-based pragma for this". |
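The space-separated pragma mentioned above is not shown in this thread, so the sketch below borrows the `# TOML Schema: v2 <url>` line suggested in an earlier comment as an assumed format. The regular expression, the function name `schema_pragma`, and the rule that the pragma must come before any non-comment line are all assumptions. Only the standard `re` module is needed.

```python
import re

# Assumed format: "# TOML Schema: v<version> <location>"
PRAGMA = re.compile(r"^#\s*TOML Schema:\s*v(\d+)\s+(\S+)\s*$")

def schema_pragma(text):
    """Return (version, location) from a leading pragma comment, or None."""
    for line in text.splitlines():
        stripped = line.strip()
        match = PRAGMA.match(stripped)
        if match:
            return int(match.group(1)), match.group(2)
        if stripped and not stripped.startswith("#"):
            return None  # real content reached before any pragma
    return None

doc = "# TOML Schema: v2 https://config.example.com/schema.tosd\nkey = 1\n"
found = schema_pragma(doc)
print(found)
```

A format like this needs no TOML parsing inside the comment, which is the simplification argued for above.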
I found taplo a few days ago, it uses comment based schema references, but json schema to provide validation/autocompletion for toml files. While json-schema clearly is an imperfect solution to specifying a toml document, I think it might make sense to take existing tooling into account. |
Let's say we have a simple TOML document:
With a simple schema:
The difficulty here is that the schema will be parsed to this:
And this data structure is a pain to work with, because all values are tables, but some tables represent a schema definition, and some represent further nested keys. But more importantly, what about:
And "type" is already defined so we can't add a key for this:
I guess we could make tables implicit, that "solves" it:
But it seems confusing and error-prone. And it creates a new problem as we end up with a data structure like:
So we loop over elements, and the only way to see that "author.type" refers to a type is by checking the value of that being a dict with a It's all super non-obvious and error-prone to write tooling for this. This is why my earlier comment said: you need to write implementations, because that's when these kinds of problems surface. So instead of using subtables, maybe just always use string keys:
This always parses to a flat data structure;
And is just much easier to work with. On the other hand, also a bit of a pain to write, and I can see people forgetting those quotes. Although people aren't going to be writing TOML schemas that often, so maybe that's okay? Either way, I think the current proposal is not going to work out well. I'll continue playing around to see what works, but definitely more work is needed here. |
Hopefully some year this comes to fruition. This is my +1 because of an early comment that it's useful for IDE support. I think it needs to accompany the TOML spec, and I'm not sure that a file should declare it... in the same way that a json file doesn't have a declaration for its json-schema. Instead relying on a well known name. The reason it needs to accompany the official spec though is that I suspect parsers need to know that they need to implement it. Given a schema document, along with a config file, the parser should be able to provide a list of errors. |
I wrote a lot of code for this last year, and have a somewhat working (but unfinished and rather ugly) implementation that does validation and some other stuff. I haven't worked on it in a long time though. My goal was to validate at least most of Cargo.toml and pyproject.toml. These are probably the most widespread TOML files, so that seems like a good place to start. To do this and actually make it useful I found you need to re-implement significant parts of JSON schema. While the syntax is perhaps a bit nicer, I don't think it's really worth the effort: you can just use JSON schema for TOML – this is what Taplo does for example, and it seems to work well enough. Perhaps there are a few things that can be improved in JSON schema to better support TOML, or how to use JSON-schema with TOML can be documented better, but this seems like the most useful path forward. My implementation is bad and unfinished enough that I'd rather not put it on GitHub. I actually don't quite remember what bits are and aren't done and what does and doesn't fully work. If someone really wants to work on it I can send it to them, I guess. The proposed spec here is nowhere near sufficient, nor is #116. Initially I thought "well, this is easy – we'll just do JSON schema like but without all that complexity", and slowly discovered a lot of that complexity is needed. It's required complexity. This was not obvious from the outset, and I strongly recommend that anyone working on this start with the implementation rather than the specification. This is why my code is so ugly: I had to switch gears several times during development. I think we should just close this issue. Anyone wanting to work on it can do so – nothing in the TOML specification is preventing that. This is primarily a matter of tooling, not a matter of specification. Later, when someone has some working tooling, we can perhaps consider adding a new separate specification for it. |
I started a discussion in #1038 as even though I think JSON Schema is the right tool to use to validate the structure and contents of TOML files, I also believe the core TOML project still has a role to play in describing how to do that validation well. |
Hi all,
I've been designing, along with @aalmiray, a grammar for a TOML Schema document. The proposal can be found in this repository: toml-schema.
The main difference between a TOML document and a TOML Schema document is the existence of key-value pairs with built-in values (keywords). This is one of the points I'd like to get feedback from the TOML community.
It is not the goal of this proposal to support namespaces. A TOML document cannot embed multiple schemas under nested namespaces. This would make this really, really hard to implement and support, with little to no benefit.
Feel free to comment here and/or create issues on the toml-schema project.
Thanks,
bb.