-
Notifications
You must be signed in to change notification settings - Fork 68
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add initial ProducersSection.md #65
Conversation
ProducersSection.md
Outdated
|
||
Custom section `name` field: `producers` | ||
|
||
The producers section may appear only once, and only after the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing I was thinking recently about this is that from a producer's perspective it'd be nice to relax this to saying that it can appear multiple time. Projects like rustc don't actually (or at least eventually won't) have any WebAssembly encoding/decoding functionality. The Rust compiler, for example, would exclusively rely on LLVM/LLD to produce the wasm file.
To that end it'd be easiest for Rust to simply seek to the end of the file and append a few bytes, possibly adding a duplicate producers
section. The binary format is albeit quite easy to parse, but producers appending values would have to find an existing section, if any, augment the list with another entry (or make a new list if one wasn't present), and then re-encode the section back out.
I think from a consumer perspective it might not be too hard to concatenate as well? In that sense I'm not sure that there's too big of a downside of allowing multiple sections to exist other than "it feels less clean"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can see how it would be easier, but I worry that this will create complexity for consumers (intermediate and otherwise) everywhere throughout the toolchain. It seems like the extra work of decoding and injecting a tool into the producers section could be handled by a single trivial command-line tool that you'd use like wasm-add-producer-tool key_name value_name
. Alternatively, tools like lld
could be liberal in what they accept so that wasm object files could have multiple producers sections but the output had a single merged producers section.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking though that some of the complexity which multiple instances of this custom section might add is sort of already there? For example, as specified, consumers would have to handle a section with multiple processed-by
fields as it's not necessarily guaranteed that they're all concatenated in one field with commas?
If that's the case, then it seems that processing independent entries already implies some degree of merging logic and now it'd just span sections instead of being within one section, which in theory wouldn't be adding all that much more complexity?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, it's not so much the complexity of a correct implementation as the likelihood that everyone independently does the correct thing.
For example, you make a good point w.r.t the field names; it'd be good I think to stipulate that they are unique like JSON.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh my mistake, I assumed that duplication of fields was intentional! Without that allowed it's definitely a different kind of decision to allow multiple.
I still personally though feel that this would ideally be relaxed for producers as there's likely far more producers than consumers. I don't really feel too strongly either way though, I'm happy to implement whatever in rustc and wasm bindgen!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems odd to me that we have a nice structure with a set of fields (just like elsewhere in the binary format) but then one of those fields is just a comma-separated text string. And of course that's the field that the most tools will have to modify. I would have expected to have multiple processed-by
fields (and why not multiple langauge
fields too?). For that matter, how do we decide which tool is the "sdk"? Is it Emscripten, or the Unity SDK that embeds Emscripten?
In terms of @alexcrichton's concerns about section-munging ease, having field duplication is sort of the worst of both worlds because consumers have to deal with multiples, and intermediate tools have to decode and re-encode the section. But I'd think that any tool that otherwise modifies the binary at all would easily have the primitives for that, and tools that don't modify the binary will probably be using primitives like WABT (we should definitely add a tool like wasm-add-producer-tool
or objcopy
or whatever to WABT).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tend to agree with derek regarding the comma separated thing. Would make more sense to have each item be its own string, preceded by a count. Alternatively require duplication.
WRT to ease of concatenation I'm a little sad that we loose the ability to do this in a generic way a la SHF_STRINGS but lld already does a whole lot of custom combination logic already so I guess that ship has sailed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good points!
Regarding the unnecessary use of commas in the string to provide structure that could instead be in the containing binary format. For that matter, the magic parens could also be removed and thus there would be no text analysis: just strings without any constraints. I'll update to do that.
Regarding limitation to single language/SDK: yeah, thinking about it again, you could have multiple of each. Will update.
ProducersSection.md
Outdated
# Known tools | ||
|
||
The following lists contain all the valid tool names for the fields listed below. | ||
**If your tool is not on this list and you'd like it to be, please submit a PR.** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While I understand why we want a strict validations for source lang/tool, I really doubt that any tool will be able to keep up-to-date with the amount of combinations that we can expect in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I should probably nuance this a bit in the doc, but my thinking is that having a tool name outside the known list wouldn't be a validation error, just a thing that evergreen consumers like browsers could warn about (to provide gentle pressure) but not reject.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PTAL at new wording regarding how tool names are checked.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm just worried about the maintainability of such a list, even if it's just for information. I don't see why an unrecognized pipeline would be a warning, or at least displayed to the end user.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well I guess those are two separable issues: attempting to maintain a list and having consumers issue warnings. In both cases, I may well be overly idealistic in thinking they could work (would not be the first time...), but since it's quite easy to just relax these requirements later, it seems worth it to at least shoot for the ideal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed that unknown tools shouldn't ever be a real problem; with a harmless diagnostic being the extent of it, if even that. Personally, I'd want to at least try this in FF for a while to help bootstrap the process, but this should be an easy thing to rip out if it's a pain and I can dial down the text in the proposal here. Another incentive is if browsers or npm analyses publish their telemetry for known tools on the list; then if you get on the list, you get free telemetry.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another thing I was thinking: if we want a known tools list, we may want to store it in a separate flat file, one item per line. That way it's easy for build scripts to grab it and use it in whatever way they want.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good idea. I was wondering about that myself too. What do you think the ideal trivial text format is?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking something like:
tool1
tool2
tool3
...
You could use JSON or something more complex, but I don't think we need anything fancier than that (except maybe comments?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Split by line works best (we don't prevent line breaks in the names?). JSON could be useful to store additional and JSON5 allows comments, but It sounds overkill to me.
Just an idea: the consumers will likely understand wast, we could store it in the data. I don't think that's a idea.
ProducersSection.md
Outdated
|
||
## Tool name string | ||
|
||
A tool name is a sequence of code points containing anything *other* than |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This overlaps a bit with the definition of a name
, could we just specify the unwanted codes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC, a spec name
can contain any code point, including the 3 we want to reject here. I was considering just doing [0-9a-zA-Z] but I worried this might be too latin-character-set-centric.
ProducersSection.md
Outdated
Example version strings: | ||
* (1) | ||
* (1.2) | ||
* (0.12.1.30) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know that agreeing on a version format will be difficult but what about using the semver format? Tooling could also take advantage of semver ranges.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, interesting idea. What about just dropping all constraints (other than rejecting parens/commas)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My point was that a consumer could check the version against a range to use certain features of that producer. However, this implies writing a specific condition so parsing the version should be fine too.
I was worried of not being able to use the same version parsing for every producer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can see the value of having a unified version parsing in package.json and others, but I wonder if there's much of a need for that by the time you have a fully compiled wasm that is a mix of many different tools. It seems like one would only be honed in on particular tools you know about and then know the version scheme.
|
||
# Text format | ||
|
||
TODO |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any advantage of using a syntax for that? I think that when custom sections are available in wast it will be easy to declare the producer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well even when custom sections are in wast, you'd still have to write out the encoded binary, which seems unpleasant to read or write. For example, if you look at a wasm module in the browser debugger, it'd be nice if you simply saw the toolchain.
ProducersSection.md
Outdated
# Known tools | ||
|
||
The following lists contain all the valid tool names for the fields listed below. | ||
**If your tool is not on this list and you'd like it to be, please submit a PR.** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's good to keep a list of known tools, but I agree that we should treat the presence of unknown tools as a normal and expected thing rather than an exceptional case. Not everyone is going to bother to add their tool to the registry, registries in browsers or other tools may get out of date, and none of that even covers the (likely-common?) case of "I have my own special fork of LLVM; do I pretend I'm just regular LLVM or another tool?"
Also I'm not sure having browsers emit diagnostics is broadly helpful because (I'm guessing?) the vast majority of people who see it will be web developers just using some toolchain or framework, and it won't be actionable for them.
Having said that, I'm open to other ideas to incentivize tool developers to put their tools in the registry? Maybe just the prospect of worldwide fame and notoriety is enough?
ProducersSection.md
Outdated
|
||
* `wat` | ||
* `C` | ||
* `C++` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No Rust?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are missing many more langaues and tools here but we can do follow up PR to keep this one focused on the RFC.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I was intentionally leaving out well-known producers so they can choose how to spell/capitalize/categorize their tools.
ProducersSection.md
Outdated
|
||
## Individual Tools | ||
|
||
* `wabt` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
WABT and LLVM are acronyms; do we want those uppercase or lowercase? (My vote is uppercase)
More generally we probably shouldn't be prescriptive about how people spell their tool names, but I guess we get to decide for our own tools.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I always use wabt personally, since it's really more of a backronym than an acronym. Then again, I didn't come up with the name! :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy to do what you both decide. I see 👍 for wabt
staying lowercase; so I'll just update LLVM for now.
ProducersSection.md
Outdated
| Field | Type | Description | | ||
| ----------- | ---- | ----------- | | ||
| field_name | [name](https://webassembly.github.io/spec/core/binary/values.html#names) | name of this field, chosen from one of the set of valid field names below | | ||
| field_value | [name](https://webassembly.github.io/spec/core/binary/values.html#names) | a string which match the specified pattern according to the table below | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"match" -> "matches" (or "must match")
ProducersSection.md
Outdated
|
||
Custom section `name` field: `producers` | ||
|
||
The producers section may appear only once, and only after the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems odd to me that we have a nice structure with a set of fields (just like elsewhere in the binary format) but then one of those fields is just a comma-separated text string. And of course that's the field that the most tools will have to modify. I would have expected to have multiple processed-by
fields (and why not multiple langauge
fields too?). For that matter, how do we decide which tool is the "sdk"? Is it Emscripten, or the Unity SDK that embeds Emscripten?
In terms of @alexcrichton's concerns about section-munging ease, having field duplication is sort of the worst of both worlds because consumers have to deal with multiples, and intermediate tools have to decode and re-encode the section. But I'd think that any tool that otherwise modifies the binary at all would easily have the primitives for that, and tools that don't modify the binary will probably be using primitives like WABT (we should definitely add a tool like wasm-add-producer-tool
or objcopy
or whatever to WABT).
Based on the above recommendation, the strings are just plain uninterpreted wasm name strings and the list-of-pairs structure is in the binary format which simplifies everything; not sure why I didn't start with that... anyhow PTAL, thanks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm % nits
Thanks for the review @sbc100! Anyone else want to comment before merging? |
Thanks all. As I mentioned before; as soon as anyone starts implementing a producer of this, well, producers section, I'm happy to validate the output by implementing the validation logic in FF to test it against, so give me a ping. |
Recently proposed in WebAssembly/tool-conventions#65 each wasm file will now have an optional `producers` section listing the tooling that went into producing it. Let's add `wasm-bindgen` in when it processes a wasm file!
Recently proposed in WebAssembly/tool-conventions#65 each wasm file will now have an optional `producers` section listing the tooling that went into producing it. Let's add `wasm-bindgen` in when it processes a wasm file!
Recently proposed in WebAssembly/tool-conventions#65 each wasm file will now have an optional `producers` section listing the tooling that went into producing it. Let's add `wasm-bindgen` in when it processes a wasm file!
This commit implements WebAssembly/tool-conventions#65 for wasm files produced by the Rust compiler. This adds a bit of metadata to wasm modules to indicate that the file's language includes Rust and the file's "processed-by" tools includes rustc. The thinking with this section is to eventually have telemetry in browsers tracking all this.
Recently proposed in WebAssembly/tool-conventions#65 each wasm file will now have an optional `producers` section listing the tooling that went into producing it. Let's add `wasm-bindgen` in when it processes a wasm file!
Recently proposed in WebAssembly/tool-conventions#65 each wasm file will now have an optional `producers` section listing the tooling that went into producing it. Let's add `wasm-bindgen` in when it processes a wasm file!
… r=estebank Encode a custom "producers" section in wasm files This commit implements WebAssembly/tool-conventions#65 for wasm files produced by the Rust compiler. This adds a bit of metadata to wasm modules to indicate that the file's language includes Rust and the file's "processed-by" tools includes rustc. The thinking with this section is to eventually have telemetry in browsers tracking all this.
@lukewagner I've implemented this for the Rust compiler and the This wasm file should have the I probably have a bug in at least one location, though! Just following up on your offer to implement a consumer in Firefox :) |
As discussed in #63. Feedback and suggestions welcome.
In the "Known tools" lists, I intentionally included the bare-minimum which @dschuff and team can review. After the initial PR merges, I'd expect Rust, Blazor, etc to file PRs, adding the tool names that make sense to them.
I think we should also introduce some custom .wat syntax for this custom section, but I'd like to see some agreement on the fundamental fields, as proposed in the binary format, before adding that, so I left this as TODO in this PR.
If anyone wants to implement a producer of this section, I'd be more than happy to validate the output in SpiderMonkey by implementing a consumer; just ping me.