Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update decoding of element/data segments in spec interpreter #1440

Closed

Conversation

alexcrichton
Copy link
Contributor

This commit updates the spec interpreter after the merge of the bulk
memory proposal to align with the textual specification for the encoding
of data and element segments. In original MVP wasm a data/element
segmented started with a leb128 for the memory/table index, but this
leb128 was repurposed as a flags byte in the bulk-memory/reference-types
proposals since it was always zero in practice (and never an over-long
zero such as "\80\00"). The spec interpreter, however, hasn't been
updated and was still reading a u32 for the flags byte, so this commit
updates it to instead read a single byte.

The tests have been updated accordinatly. Tests for overlong or invalid
index encodings were updated to use an encoding that explicitly
specifies the index (the prefix "\02" byte for both data and element
segments). New tests were added to ensure that an overlong encoding of
0, which was previously valid, is no longer valid.

This commit updates the spec interpreter after the merge of the bulk
memory proposal to align with the textual specification for the encoding
of data and element segments. In original MVP wasm a data/element
segmented started with a leb128 for the memory/table index, but this
leb128 was repurposed as a flags byte in the bulk-memory/reference-types
proposals since it was always zero in practice (and never an over-long
zero such as `"\80\00"`). The spec interpreter, however, hasn't been
updated and was still reading a u32 for the flags byte, so this commit
updates it to instead read a single byte.

The tests have been updated accordinatly. Tests for overlong or invalid
index encodings were updated to use an encoding that explicitly
specifies the index (the prefix `"\02"` byte for both data and element
segments). New tests were added to ensure that an overlong encoding of
0, which was previously valid, is no longer valid.
@rossberg
Copy link
Member

rossberg commented Apr 5, 2022

I assume this is to address #1439? PR looks good, but perhaps let's wait for that discussion to conclude.

@alexcrichton
Copy link
Contributor Author

While #1439 is somewhat related this is intended to fix the interpreter to match the current text spec. The text spec does not mention a leb for the leading byte discriminant for element/data segments but the spec interpreter uses a leb. This means that parsers which strictly implement the textual spec cannot pass the current spec tests and this PR is trying to fix that.

I recall that the decision at the time when this came up was to intentionally break any modules using overlong encodings for index 0. I found WebAssembly/bulk-memory-operations#153 as some discussion of this but I don't recall where other discussion happened. I thought, though, that #1439 was "intended breakage" and action wasn't necessary.

@rossberg
Copy link
Member

rossberg commented Apr 5, 2022

Well, the question raised on #1439 is whether this is a bug in the spec. But your recollection makes sense to me. Perhaps point this out on #1439, so it can be resolved?

@rossberg
Copy link
Member

rossberg commented May 6, 2022

With #1461 landed, this PR should no longer be needed. Closing.

@rossberg rossberg closed this May 6, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants