Add initial ProducersSection.md #65

lukewagner · 2018-10-10T23:25:04Z

As discussed in #63. Feedback and suggestions welcome.

In the "Known tools" lists, I intentionally included only the bare minimum, which @dschuff and team can review. After the initial PR merges, I'd expect Rust, Blazor, etc. to file PRs adding the tool names that make sense to them.

I think we should also introduce some custom .wat syntax for this custom section, but I'd like to see some agreement on the fundamental fields, as proposed in the binary format, before adding that, so I left this as TODO in this PR.

If anyone wants to implement a producer of this section, I'd be more than happy to validate the output in SpiderMonkey by implementing a consumer; just ping me.

alexcrichton · 2018-10-11T00:22:03Z

ProducersSection.md

+
+Custom section `name` field: `producers`
+
+The producers section may appear only once, and only after the


One thing I was thinking about recently is that, from a producer's perspective, it'd be nice to relax this so that the section can appear multiple times. Projects like rustc don't actually (or at least eventually won't) have any WebAssembly encoding/decoding functionality. The Rust compiler, for example, would exclusively rely on LLVM/LLD to produce the wasm file.

To that end, it'd be easiest for Rust to simply seek to the end of the file and append a few bytes, possibly adding a duplicate producers section. Granted, the binary format is quite easy to parse, but producers appending values would have to find an existing section, if any, augment the list with another entry (or make a new list if one wasn't present), and then re-encode the section back out.
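For illustration, here is a minimal sketch of the "seek to the end and append" approach. It relies only on the core binary format's framing for custom sections (section id 0, a LEB128 byte size, then the section name followed by the payload). The helper names and the one-byte placeholder payload are hypothetical and not taken from the proposal.

```python
def encode_u32_leb128(value: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128."""
    if value < 0:
        raise ValueError("value must be non-negative")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_name(text: str) -> bytes:
    """Encode a wasm name: LEB128 byte length followed by UTF-8 bytes."""
    data = text.encode("utf-8")
    return encode_u32_leb128(len(data)) + data


def custom_section(name: str, payload: bytes) -> bytes:
    """Frame a payload as a custom section (section id 0)."""
    body = encode_name(name) + payload
    return b"\x00" + encode_u32_leb128(len(body)) + body


# An empty module is just the magic number and the version.
EMPTY_MODULE = b"\x00asm\x01\x00\x00\x00"

# Placeholder payload (hypothetical); a real producer would encode its fields here.
appended = EMPTY_MODULE + custom_section("producers", b"\x00")
```

Nothing before the appended bytes has to be decoded, which is the appeal for a producer like rustc. The cost is that an existing `producers` section, if any, is left in place rather than merged.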

I think from a consumer perspective it might not be too hard to concatenate as well? In that sense I'm not sure that there's too big of a downside to allowing multiple sections to exist, other than "it feels less clean".

I can see how it would be easier, but I worry that this will create complexity for consumers (intermediate and otherwise) everywhere throughout the toolchain. It seems like the extra work of decoding and injecting a tool into the producers section could be handled by a single trivial command-line tool that you'd use like wasm-add-producer-tool key_name value_name. Alternatively, tools like lld could be liberal in what they accept so that wasm object files could have multiple producers sections but the output had a single merged producers section.

I was thinking though that some of the complexity which multiple instances of this custom section might add is sort of already there? For example, as specified, consumers would have to handle a section with multiple processed-by fields as it's not necessarily guaranteed that they're all concatenated in one field with commas?

If that's the case, then it seems that processing independent entries already implies some degree of merging logic and now it'd just span sections instead of being within one section, which in theory wouldn't be adding all that much more complexity?
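A rough sketch of the merging a consumer would need if multiple sections were allowed. It assumes each section has already been decoded into a list of `(field_name, values)` pairs; that in-memory shape and the function name are illustrative assumptions, not part of the proposal.

```python
from typing import Dict, Iterable, List, Tuple

# One decoded section: a list of (field_name, values) pairs (assumed shape).
DecodedSection = List[Tuple[str, List[str]]]


def merge_producers(sections: Iterable[DecodedSection]) -> Dict[str, List[str]]:
    """Merge decoded sections, grouping values by field name.

    Values keep their first-seen order and exact duplicates are dropped.
    """
    merged: Dict[str, List[str]] = {}
    for fields in sections:
        for field_name, values in fields:
            bucket = merged.setdefault(field_name, [])
            for value in values:
                if value not in bucket:
                    bucket.append(value)
    return merged
```

The same loop handles a duplicated field inside one section and the same field spread across several sections, which is the point being made above: the logic is small, but every consumer has to get it right independently.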

Well, it's not so much the complexity of a correct implementation as the likelihood that everyone independently does the correct thing.

For example, you make a good point w.r.t the field names; it'd be good I think to stipulate that they are unique like JSON.

Oh my mistake, I assumed that duplication of fields was intentional! Without that allowed it's definitely a different kind of decision to allow multiple.

Personally, though, I still feel that this would ideally be relaxed for producers, as there are likely far more producers than consumers. I don't feel too strongly either way, though; I'm happy to implement whatever in rustc and wasm-bindgen!

It seems odd to me that we have a nice structure with a set of fields (just like elsewhere in the binary format) but then one of those fields is just a comma-separated text string. And of course that's the field that most tools will have to modify. I would have expected to have multiple processed-by fields (and why not multiple language fields too?). For that matter, how do we decide which tool is the "sdk"? Is it Emscripten, or the Unity SDK that embeds Emscripten?

In terms of @alexcrichton's concerns about section-munging ease, having field duplication is sort of the worst of both worlds because consumers have to deal with multiples, and intermediate tools have to decode and re-encode the section. But I'd think that any tool that otherwise modifies the binary at all would easily have the primitives for that, and tools that don't modify the binary will probably be using primitives like WABT (we should definitely add a tool like wasm-add-producer-tool or objcopy or whatever to WABT).

I tend to agree with Derek regarding the comma-separated thing. It would make more sense to have each item be its own string, preceded by a count. Alternatively, require duplication.
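To make the "count, then one string per item" idea concrete, here is a small sketch using the usual wasm vector-of-names encoding (LEB128 count, then each name as LEB128 length plus UTF-8 bytes). The function names are hypothetical.

```python
from typing import List


def encode_u32_leb128(value: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_name(text: str) -> bytes:
    """Encode a wasm name: LEB128 byte length followed by UTF-8 bytes."""
    data = text.encode("utf-8")
    return encode_u32_leb128(len(data)) + data


def encode_name_vec(items: List[str]) -> bytes:
    """Encode a list of strings as a count followed by each name."""
    out = bytearray(encode_u32_leb128(len(items)))
    for item in items:
        out += encode_name(item)
    return bytes(out)
```

Because each item carries its own length, a tool name containing a comma or a paren needs no escaping, and consumers never have to split text.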

With regard to ease of concatenation, I'm a little sad that we lose the ability to do this in a generic way à la SHF_STRINGS, but lld already does a whole lot of custom combination logic, so I guess that ship has sailed.

Good points!

Regarding the unnecessary use of commas in the string: agreed, that structure could instead live in the containing binary format. For that matter, the magic parens could also be removed, and then there would be no text analysis at all: just strings without any constraints. I'll update to do that.

Regarding limitation to single language/SDK: yeah, thinking about it again, you could have multiple of each. Will update.


xtuc · 2018-10-11T05:45:12Z

ProducersSection.md

+# Known tools
+
+The following lists contain all the valid tool names for the fields listed below.
+**If your tool is not on this list and you'd like it to be, please submit a PR.**


While I understand why we want strict validation of the source language/tool, I really doubt that any tool will be able to keep up to date with the number of combinations that we can expect in the future.

I should probably nuance this a bit in the doc, but my thinking is that having a tool name outside the known list wouldn't be a validation error, just a thing that evergreen consumers like browsers could warn about (to provide gentle pressure) but not reject.

PTAL at new wording regarding how tool names are checked.

I'm just worried about the maintainability of such a list, even if it's just for information. I don't see why an unrecognized pipeline would be a warning, or at least displayed to the end user.

Well I guess those are two separable issues: attempting to maintain a list and having consumers issue warnings. In both cases, I may well be overly idealistic in thinking they could work (would not be the first time...), but since it's quite easy to just relax these requirements later, it seems worth it to at least shoot for the ideal.

Agreed that unknown tools shouldn't ever be a real problem; with a harmless diagnostic being the extent of it, if even that. Personally, I'd want to at least try this in FF for a while to help bootstrap the process, but this should be an easy thing to rip out if it's a pain and I can dial down the text in the proposal here. Another incentive is if browsers or npm analyses publish their telemetry for known tools on the list; then if you get on the list, you get free telemetry.

Another thing I was thinking: if we want a known tools list, we may want to store it in a separate flat file, one item per line. That way it's easy for build scripts to grab it and use it in whatever way they want.

That's a good idea. I was wondering about that myself too. What do you think the ideal trivial text format is?

I was thinking something like:

    tool1
    tool2
    tool3
    ...

You could use JSON or something more complex, but I don't think we need anything fancier than that (except maybe comments?)

Split by line works best (we don't prevent line breaks in the names?). JSON could be useful to store additional data, and JSON5 allows comments, but it sounds like overkill to me.
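A build script could consume such a flat file with very little code. The sketch below assumes one tool name per line, with blank lines ignored and whole-line comments starting with `#`; the comment syntax and the function name are assumptions, since no format has been agreed on here.

```python
from typing import List


def parse_known_tools(text: str) -> List[str]:
    """Parse a one-name-per-line list, skipping blanks and '#' comment lines."""
    tools = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        tools.append(line)
    return tools


# Hypothetical file contents, using tool names mentioned in this PR.
example = """\
# individual tools
wabt
LLVM
"""
known = parse_known_tools(example)
```

Only whole-line comments are handled, so a `#` inside a name (for example a language called `C#`) is preserved as part of the name.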

Just an idea: the consumers will likely understand wast, we could store it in the data. I don't think that's a idea.

xtuc · 2018-10-11T05:48:06Z

ProducersSection.md

+
+## Tool name string
+
+A tool name is a sequence of code points containing anything *other* than


This overlaps a bit with the definition of a name, could we just specify the unwanted codes?

IIUC, a spec name can contain any code point, including the 3 we want to reject here. I was considering just doing [0-9a-zA-Z] but I worried this might be too latin-character-set-centric.


xtuc · 2018-10-11T05:53:46Z

ProducersSection.md

+Example version strings:
+* (1)
+* (1.2)
+* (0.12.1.30)


I know that agreeing on a version format will be difficult but what about using the semver format? Tooling could also take advantage of semver ranges.

Ah, interesting idea. What about just dropping all constraints (other than rejecting parens/commas)?

My point was that a consumer could check the version against a range to use certain features of that producer. However, this implies writing a specific condition so parsing the version should be fine too.

I was worried about not being able to use the same version parsing for every producer.

I can see the value of having unified version parsing in package.json and others, but I wonder if there's much of a need for that by the time you have a fully compiled wasm that is a mix of many different tools. It seems like one would only home in on particular tools one knows about, and would then know their version scheme.
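As a sketch of that "only parse versions for tools you know" approach: the helper below handles plain dotted-numeric versions such as the examples in the diff (`1`, `1.2`, `0.12.1.30`) and returns `None` for anything else rather than guessing. Both function names are hypothetical, and this is not a semver implementation.

```python
from typing import Optional, Tuple


def parse_dotted_version(version: str) -> Optional[Tuple[int, ...]]:
    """Parse '0.12.1.30' into (0, 12, 1, 30); return None if not dotted-numeric."""
    parts = version.split(".")
    if not all(part.isascii() and part.isdigit() for part in parts):
        return None
    return tuple(int(part) for part in parts)


def is_at_least(version: str, minimum: Tuple[int, ...]) -> bool:
    """True if the version parses and compares >= minimum, component by component."""
    parsed = parse_dotted_version(version)
    return parsed is not None and parsed >= minimum
```

A consumer would call this only for a tool whose scheme it already knows; an unparseable version from an unfamiliar tool is simply treated as "unknown" instead of an error.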

xtuc · 2018-10-11T05:56:08Z

ProducersSection.md

+
+# Text format
+
+TODO


Is there any advantage of using a syntax for that? I think that when custom sections are available in wast it will be easy to declare the producer.

Well even when custom sections are in wast, you'd still have to write out the encoded binary, which seems unpleasant to read or write. For example, if you look at a wasm module in the browser debugger, it'd be nice if you simply saw the toolchain.

dschuff · 2018-10-12T17:16:53Z

ProducersSection.md

+# Known tools
+
+The following lists contain all the valid tool names for the fields listed below.
+**If your tool is not on this list and you'd like it to be, please submit a PR.**


I think it's good to keep a list of known tools, but I agree that we should treat the presence of unknown tools as a normal and expected thing rather than an exceptional case. Not everyone is going to bother to add their tool to the registry, registries in browsers or other tools may get out of date, and none of that even covers the (likely-common?) case of "I have my own special fork of LLVM; do I pretend I'm just regular LLVM or another tool?"
Also I'm not sure having browsers emit diagnostics is broadly helpful because (I'm guessing?) the vast majority of people who see it will be web developers just using some toolchain or framework, and it won't be actionable for them.

Having said that, I'm open to other ideas to incentivize tool developers to put their tools in the registry? Maybe just the prospect of worldwide fame and notoriety is enough?

dschuff · 2018-10-12T17:19:17Z

ProducersSection.md

+
+* `wat`
+* `C`
+* `C++`


We are missing many more languages and tools here, but we can do follow-up PRs to keep this one focused on the RFC.

Yeah, I was intentionally leaving out well-known producers so they can choose how to spell/capitalize/categorize their tools.

dschuff · 2018-10-12T17:21:53Z

ProducersSection.md

+
+## Individual Tools
+
+* `wabt`


WABT and LLVM are acronyms; do we want those uppercase or lowercase? (My vote is uppercase)

More generally we probably shouldn't be prescriptive about how people spell their tool names, but I guess we get to decide for our own tools.

I always use wabt personally, since it's really more of a backronym than an acronym. Then again, I didn't come up with the name! :-)

I'm happy to do what you both decide. I see 👍 for wabt staying lowercase; so I'll just update LLVM for now.

dschuff · 2018-10-12T17:29:44Z

ProducersSection.md

+| Field       | Type | Description |
+| ----------- | ---- | ----------- |
+| field_name  | [name](https://webassembly.github.io/spec/core/binary/values.html#names) | name of this field, chosen from one of the set of valid field names below |
+| field_value | [name](https://webassembly.github.io/spec/core/binary/values.html#names) | a string which match the specified pattern according to the table below |


"match" -> "matches" (or "must match")

dschuff · 2018-10-12T17:40:40Z

ProducersSection.md

+
+Custom section `name` field: `producers`
+
+The producers section may appear only once, and only after the



lukewagner · 2018-10-15T15:39:40Z

Based on the above recommendation, the strings are just plain uninterpreted wasm name strings and the list-of-pairs structure is in the binary format which simplifies everything; not sure why I didn't start with that... anyhow PTAL, thanks.
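For anyone writing a consumer, here is a minimal decoding sketch of a list-of-pairs payload. The exact layout used below is an assumption that should be checked against ProducersSection.md: a field count, then per field a name, a value count, and that many (name, version) string pairs. Rejecting duplicate field names follows the "Require field names to be unique" change; the function names are hypothetical.

```python
from typing import Dict, List, Tuple


def read_u32_leb128(data: bytes, pos: int) -> Tuple[int, int]:
    """Read an unsigned LEB128 integer; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]  # IndexError on truncated input
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def read_name(data: bytes, pos: int) -> Tuple[str, int]:
    """Read a wasm name (LEB128 length + UTF-8 bytes); return (text, next position)."""
    length, pos = read_u32_leb128(data, pos)
    end = pos + length
    if end > len(data):
        raise ValueError("truncated name")
    return data[pos:end].decode("utf-8"), end


def decode_producers(payload: bytes) -> Dict[str, List[Tuple[str, str]]]:
    """Decode the assumed layout into {field_name: [(name, version), ...]}."""
    fields: Dict[str, List[Tuple[str, str]]] = {}
    field_count, pos = read_u32_leb128(payload, 0)
    for _ in range(field_count):
        field_name, pos = read_name(payload, pos)
        if field_name in fields:
            raise ValueError("duplicate field name: " + field_name)
        value_count, pos = read_u32_leb128(payload, pos)
        values = []
        for _ in range(value_count):
            name, pos = read_name(payload, pos)
            version, pos = read_name(payload, pos)
            values.append((name, version))
        fields[field_name] = values
    if pos != len(payload):
        raise ValueError("trailing bytes after last field")
    return fields
```

The payload passed in is what follows the `producers` section name inside the custom section. The strings are returned uninterpreted, matching the decision to drop the comma and paren patterns.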

sbc100

lgtm % nits


lukewagner · 2018-11-16T00:10:23Z

Thanks for the review @sbc100! Anyone else want to comment before merging?

lukewagner · 2018-11-16T21:07:33Z

Thanks all. As I mentioned before, as soon as anyone starts implementing a producer of this, well, producers section, I'm happy to validate the output by implementing the validation logic in FF to test it against, so give me a ping.

Recently proposed in WebAssembly/tool-conventions#65 each wasm file will now have an optional `producers` section listing the tooling that went into producing it. Let's add `wasm-bindgen` in when it processes a wasm file!

This commit implements WebAssembly/tool-conventions#65 for wasm files produced by the Rust compiler. This adds a bit of metadata to wasm modules to indicate that the file's language includes Rust and the file's "processed-by" tools includes rustc. The thinking with this section is to eventually have telemetry in browsers tracking all this.


alexcrichton · 2018-11-28T00:12:23Z

@lukewagner I've implemented this for the Rust compiler and the wasm-bindgen tool in rustwasm/wasm-bindgen#1041 and rust-lang/rust#56075, and all of the examples on https://rustwasm.github.io/wasm-bindgen/ should now be recompiled with at least the wasm-bindgen changes.

This wasm file should have the wasm-bindgen section, and this wasm file should have both rustc and wasm-bindgen sections.

I probably have a bug in at least one location, though! Just following up on your offer to implement a consumer in Firefox :)

Luke Wagner added 3 commits October 10, 2018 18:04

Initial stab

ef4c409

Update intro

468bb51

Other tweaks

fc72743

lukewagner assigned dschuff Oct 10, 2018

alexcrichton reviewed Oct 11, 2018

xtuc reviewed Oct 11, 2018

Luke Wagner added 5 commits October 11, 2018 11:32

Generalize the version string

2b2c7aa

Fix typo

d467a57

Clarify wording of how tool names are checked against the known list

8158424

One more case

c49c7c9

Require field names to be unique

9c81528

dschuff reviewed Oct 12, 2018

Luke Wagner added 6 commits October 15, 2018 10:01

Uppercase LLVM

2427f04

Soften paragraph on unknown tools

a59c9b6

Replace string patterns with binary format structures

5fe8409

Be a little less link-happy

d34f3a7

Tweaks

e1d1eb8

Last tweak

5d6ea70

Note that this section should not be used for optimization hints

3a5eb25

sbc100 approved these changes Nov 15, 2018

Luke Wagner added 2 commits November 15, 2018 18:07

Address Sam's comments

e927bde

Move text format to the end

22bec59

dschuff approved these changes Nov 16, 2018

xtuc approved these changes Nov 16, 2018

lukewagner merged commit c0a7179 into master Nov 16, 2018

lukewagner deleted the add-producers-section branch November 16, 2018 21:07

alexcrichton mentioned this pull request Nov 19, 2018

Add wasm-bindgen to the producers section rustwasm/wasm-bindgen#1041

Merged

alexcrichton mentioned this pull request Nov 19, 2018

Encode a custom "producers" section in wasm files rust-lang/rust#56075

Merged

xtuc mentioned this pull request Nov 28, 2018

[wasm] emit producer section webpack/webpack#8429

Open

