Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add initial ProducersSection.md #65

Merged
merged 17 commits into from
Nov 16, 2018
Merged

Add initial ProducersSection.md #65

merged 17 commits into from
Nov 16, 2018

Conversation

lukewagner
Copy link
Member

@lukewagner lukewagner commented Oct 10, 2018

As discussed in #63. Feedback and suggestions welcome.

In the "Known tools" lists, I intentionally included the bare-minimum which @dschuff and team can review. After the initial PR merges, I'd expect Rust, Blazor, etc to file PRs, adding the tool names that make sense to them.

I think we should also introduce some custom .wat syntax for this custom section, but I'd like to see some agreement on the fundamental fields, as proposed in the binary format, before adding that, so I left this as TODO in this PR.

If anyone wants to implement a producer of this section, I'd be more than happy to validate the output in SpiderMonkey by implementing a consumer; just ping me.


Custom section `name` field: `producers`

The producers section may appear only once, and only after the
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing I was thinking recently about this is that from a producer's perspective it'd be nice to relax this to saying that it can appear multiple time. Projects like rustc don't actually (or at least eventually won't) have any WebAssembly encoding/decoding functionality. The Rust compiler, for example, would exclusively rely on LLVM/LLD to produce the wasm file.

To that end it'd be easiest for Rust to simply seek to the end of the file and append a few bytes, possibly adding a duplicate producers section. The binary format is albeit quite easy to parse, but producers appending values would have to find an existing section, if any, augment the list with another entry (or make a new list if one wasn't present), and then re-encode the section back out.

I think from a consumer perspective it might not be too hard to concatenate as well? In that sense I'm not sure that there's too big of a downside of allowing multiple sections to exist other than "it feels less clean"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see how it would be easier, but I worry that this will create complexity for consumers (intermediate and otherwise) everywhere throughout the toolchain. It seems like the extra work of decoding and injecting a tool into the producers section could be handled by a single trivial command-line tool that you'd use like wasm-add-producer-tool key_name value_name. Alternatively, tools like lld could be liberal in what they accept so that wasm object files could have multiple producers sections but the output had a single merged producers section.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking though that some of the complexity which multiple instances of this custom section might add is sort of already there? For example, as specified, consumers would have to handle a section with multiple processed-by fields as it's not necessarily guaranteed that they're all concatenated in one field with commas?

If that's the case, then it seems that processing independent entries already implies some degree of merging logic and now it'd just span sections instead of being within one section, which in theory wouldn't be adding all that much more complexity?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, it's not so much the complexity of a correct implementation as the likelihood that everyone independently does the correct thing.

For example, you make a good point w.r.t the field names; it'd be good I think to stipulate that they are unique like JSON.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh my mistake, I assumed that duplication of fields was intentional! Without that allowed it's definitely a different kind of decision to allow multiple.

I still personally though feel that this would ideally be relaxed for producers as there's likely far more producers than consumers. I don't really feel too strongly either way though, I'm happy to implement whatever in rustc and wasm bindgen!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems odd to me that we have a nice structure with a set of fields (just like elsewhere in the binary format) but then one of those fields is just a comma-separated text string. And of course that's the field that the most tools will have to modify. I would have expected to have multiple processed-by fields (and why not multiple langauge fields too?). For that matter, how do we decide which tool is the "sdk"? Is it Emscripten, or the Unity SDK that embeds Emscripten?

In terms of @alexcrichton's concerns about section-munging ease, having field duplication is sort of the worst of both worlds because consumers have to deal with multiples, and intermediate tools have to decode and re-encode the section. But I'd think that any tool that otherwise modifies the binary at all would easily have the primitives for that, and tools that don't modify the binary will probably be using primitives like WABT (we should definitely add a tool like wasm-add-producer-tool or objcopy or whatever to WABT).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tend to agree with derek regarding the comma separated thing. Would make more sense to have each item be its own string, preceded by a count. Alternatively require duplication.

WRT to ease of concatenation I'm a little sad that we loose the ability to do this in a generic way a la SHF_STRINGS but lld already does a whole lot of custom combination logic already so I guess that ship has sailed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good points!

Regarding the unnecessary use of commas in the string to provide structure that could instead be in the containing binary format. For that matter, the magic parens could also be removed and thus there would be no text analysis: just strings without any constraints. I'll update to do that.

Regarding limitation to single language/SDK: yeah, thinking about it again, you could have multiple of each. Will update.

ProducersSection.md Outdated Show resolved Hide resolved
# Known tools

The following lists contain all the valid tool names for the fields listed below.
**If your tool is not on this list and you'd like it to be, please submit a PR.**
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I understand why we want a strict validations for source lang/tool, I really doubt that any tool will be able to keep up-to-date with the amount of combinations that we can expect in the future.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should probably nuance this a bit in the doc, but my thinking is that having a tool name outside the known list wouldn't be a validation error, just a thing that evergreen consumers like browsers could warn about (to provide gentle pressure) but not reject.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PTAL at new wording regarding how tool names are checked.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm just worried about the maintainability of such a list, even if it's just for information. I don't see why an unrecognized pipeline would be a warning, or at least displayed to the end user.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well I guess those are two separable issues: attempting to maintain a list and having consumers issue warnings. In both cases, I may well be overly idealistic in thinking they could work (would not be the first time...), but since it's quite easy to just relax these requirements later, it seems worth it to at least shoot for the ideal.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed that unknown tools shouldn't ever be a real problem; with a harmless diagnostic being the extent of it, if even that. Personally, I'd want to at least try this in FF for a while to help bootstrap the process, but this should be an easy thing to rip out if it's a pain and I can dial down the text in the proposal here. Another incentive is if browsers or npm analyses publish their telemetry for known tools on the list; then if you get on the list, you get free telemetry.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another thing I was thinking: if we want a known tools list, we may want to store it in a separate flat file, one item per line. That way it's easy for build scripts to grab it and use it in whatever way they want.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good idea. I was wondering about that myself too. What do you think the ideal trivial text format is?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking something like:

tool1
tool2
tool3
...

You could use JSON or something more complex, but I don't think we need anything fancier than that (except maybe comments?)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Split by line works best (we don't prevent line breaks in the names?). JSON could be useful to store additional and JSON5 allows comments, but It sounds overkill to me.

Just an idea: the consumers will likely understand wast, we could store it in the data. I don't think that's a idea.


## Tool name string

A tool name is a sequence of code points containing anything *other* than
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This overlaps a bit with the definition of a name, could we just specify the unwanted codes?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC, a spec name can contain any code point, including the 3 we want to reject here. I was considering just doing [0-9a-zA-Z] but I worried this might be too latin-character-set-centric.

ProducersSection.md Outdated Show resolved Hide resolved
Example version strings:
* (1)
* (1.2)
* (0.12.1.30)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know that agreeing on a version format will be difficult but what about using the semver format? Tooling could also take advantage of semver ranges.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, interesting idea. What about just dropping all constraints (other than rejecting parens/commas)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My point was that a consumer could check the version against a range to use certain features of that producer. However, this implies writing a specific condition so parsing the version should be fine too.

I was worried of not being able to use the same version parsing for every producer.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see the value of having a unified version parsing in package.json and others, but I wonder if there's much of a need for that by the time you have a fully compiled wasm that is a mix of many different tools. It seems like one would only be honed in on particular tools you know about and then know the version scheme.


# Text format

TODO
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any advantage of using a syntax for that? I think that when custom sections are available in wast it will be easy to declare the producer.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well even when custom sections are in wast, you'd still have to write out the encoded binary, which seems unpleasant to read or write. For example, if you look at a wasm module in the browser debugger, it'd be nice if you simply saw the toolchain.

# Known tools

The following lists contain all the valid tool names for the fields listed below.
**If your tool is not on this list and you'd like it to be, please submit a PR.**
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's good to keep a list of known tools, but I agree that we should treat the presence of unknown tools as a normal and expected thing rather than an exceptional case. Not everyone is going to bother to add their tool to the registry, registries in browsers or other tools may get out of date, and none of that even covers the (likely-common?) case of "I have my own special fork of LLVM; do I pretend I'm just regular LLVM or another tool?"
Also I'm not sure having browsers emit diagnostics is broadly helpful because (I'm guessing?) the vast majority of people who see it will be web developers just using some toolchain or framework, and it won't be actionable for them.

Having said that, I'm open to other ideas to incentivize tool developers to put their tools in the registry? Maybe just the prospect of worldwide fame and notoriety is enough?


* `wat`
* `C`
* `C++`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No Rust?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are missing many more langaues and tools here but we can do follow up PR to keep this one focused on the RFC.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I was intentionally leaving out well-known producers so they can choose how to spell/capitalize/categorize their tools.


## Individual Tools

* `wabt`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WABT and LLVM are acronyms; do we want those uppercase or lowercase? (My vote is uppercase)

More generally we probably shouldn't be prescriptive about how people spell their tool names, but I guess we get to decide for our own tools.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I always use wabt personally, since it's really more of a backronym than an acronym. Then again, I didn't come up with the name! :-)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy to do what you both decide. I see 👍 for wabt staying lowercase; so I'll just update LLVM for now.

| Field | Type | Description |
| ----------- | ---- | ----------- |
| field_name | [name](https://webassembly.github.io/spec/core/binary/values.html#names) | name of this field, chosen from one of the set of valid field names below |
| field_value | [name](https://webassembly.github.io/spec/core/binary/values.html#names) | a string which match the specified pattern according to the table below |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"match" -> "matches" (or "must match")


Custom section `name` field: `producers`

The producers section may appear only once, and only after the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems odd to me that we have a nice structure with a set of fields (just like elsewhere in the binary format) but then one of those fields is just a comma-separated text string. And of course that's the field that the most tools will have to modify. I would have expected to have multiple processed-by fields (and why not multiple langauge fields too?). For that matter, how do we decide which tool is the "sdk"? Is it Emscripten, or the Unity SDK that embeds Emscripten?

In terms of @alexcrichton's concerns about section-munging ease, having field duplication is sort of the worst of both worlds because consumers have to deal with multiples, and intermediate tools have to decode and re-encode the section. But I'd think that any tool that otherwise modifies the binary at all would easily have the primitives for that, and tools that don't modify the binary will probably be using primitives like WABT (we should definitely add a tool like wasm-add-producer-tool or objcopy or whatever to WABT).

@lukewagner
Copy link
Member Author

lukewagner commented Oct 15, 2018

Based on the above recommendation, the strings are just plain uninterpreted wasm name strings and the list-of-pairs structure is in the binary format which simplifies everything; not sure why I didn't start with that... anyhow PTAL, thanks.

Copy link
Member

@sbc100 sbc100 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm % nits

ProducersSection.md Outdated Show resolved Hide resolved
ProducersSection.md Outdated Show resolved Hide resolved
ProducersSection.md Outdated Show resolved Hide resolved
ProducersSection.md Show resolved Hide resolved
ProducersSection.md Outdated Show resolved Hide resolved
@lukewagner
Copy link
Member Author

lukewagner commented Nov 16, 2018

Thanks for the review @sbc100! Anyone else want to comment before merging?

@lukewagner
Copy link
Member Author

Thanks all. As I mentioned before; as soon as anyone starts implementing a producer of this, well, producers section, I'm happy to validate the output by implementing the validation logic in FF to test it against, so give me a ping.

@lukewagner lukewagner merged commit c0a7179 into master Nov 16, 2018
@lukewagner lukewagner deleted the add-producers-section branch November 16, 2018 21:07
alexcrichton added a commit to alexcrichton/wasm-bindgen that referenced this pull request Nov 19, 2018
Recently proposed in WebAssembly/tool-conventions#65 each wasm file will
now have an optional `producers` section listing the tooling that went
into producing it. Let's add `wasm-bindgen` in when it processes a wasm
file!
alexcrichton added a commit to alexcrichton/wasm-bindgen that referenced this pull request Nov 19, 2018
Recently proposed in WebAssembly/tool-conventions#65 each wasm file will
now have an optional `producers` section listing the tooling that went
into producing it. Let's add `wasm-bindgen` in when it processes a wasm
file!
alexcrichton added a commit to alexcrichton/wasm-bindgen that referenced this pull request Nov 19, 2018
Recently proposed in WebAssembly/tool-conventions#65 each wasm file will
now have an optional `producers` section listing the tooling that went
into producing it. Let's add `wasm-bindgen` in when it processes a wasm
file!
alexcrichton added a commit to alexcrichton/rust that referenced this pull request Nov 19, 2018
This commit implements WebAssembly/tool-conventions#65 for wasm files
produced by the Rust compiler. This adds a bit of metadata to wasm
modules to indicate that the file's language includes Rust and the
file's "processed-by" tools includes rustc.

The thinking with this section is to eventually have telemetry in
browsers tracking all this.
alexcrichton added a commit to alexcrichton/wasm-bindgen that referenced this pull request Nov 19, 2018
Recently proposed in WebAssembly/tool-conventions#65 each wasm file will
now have an optional `producers` section listing the tooling that went
into producing it. Let's add `wasm-bindgen` in when it processes a wasm
file!
alexcrichton added a commit to alexcrichton/wasm-bindgen that referenced this pull request Nov 19, 2018
Recently proposed in WebAssembly/tool-conventions#65 each wasm file will
now have an optional `producers` section listing the tooling that went
into producing it. Let's add `wasm-bindgen` in when it processes a wasm
file!
pietroalbini added a commit to pietroalbini/rust that referenced this pull request Nov 25, 2018
… r=estebank

Encode a custom "producers" section in wasm files

This commit implements WebAssembly/tool-conventions#65 for wasm files
produced by the Rust compiler. This adds a bit of metadata to wasm
modules to indicate that the file's language includes Rust and the
file's "processed-by" tools includes rustc.

The thinking with this section is to eventually have telemetry in
browsers tracking all this.
@alexcrichton
Copy link
Collaborator

@lukewagner I've implemented this for the Rust compiler and the wasm-bindgen tool in rustwasm/wasm-bindgen#1041 and rust-lang/rust#56075, and all of the examples on https://rustwasm.github.io/wasm-bindgen/ should now be recompiled with at least the wasm-bindgen changes.

This wasm file should have the wasm-bindgen section, and this wasm file should have both rustc and wasm-bindgen sections.

I probably have a bug in at least one location, though! Just following up on your offer to implement a consumer in Firefox :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants