Update decoding of element/data segments in spec interpreter #1440

alexcrichton · 2022-04-01T15:49:04Z

This commit updates the spec interpreter after the merge of the bulk
memory proposal to align with the textual specification for the encoding
of data and element segments. In original MVP wasm a data/element
segmented started with a leb128 for the memory/table index, but this
leb128 was repurposed as a flags byte in the bulk-memory/reference-types
proposals since it was always zero in practice (and never an over-long
zero such as "\80\00"). The spec interpreter, however, hasn't been
updated and was still reading a u32 for the flags byte, so this commit
updates it to instead read a single byte.

The tests have been updated accordinatly. Tests for overlong or invalid
index encodings were updated to use an encoding that explicitly
specifies the index (the prefix "\02" byte for both data and element
segments). New tests were added to ensure that an overlong encoding of
0, which was previously valid, is no longer valid.

This commit updates the spec interpreter after the merge of the bulk memory proposal to align with the textual specification for the encoding of data and element segments. In original MVP wasm a data/element segmented started with a leb128 for the memory/table index, but this leb128 was repurposed as a flags byte in the bulk-memory/reference-types proposals since it was always zero in practice (and never an over-long zero such as `"\80\00"`). The spec interpreter, however, hasn't been updated and was still reading a u32 for the flags byte, so this commit updates it to instead read a single byte. The tests have been updated accordinatly. Tests for overlong or invalid index encodings were updated to use an encoding that explicitly specifies the index (the prefix `"\02"` byte for both data and element segments). New tests were added to ensure that an overlong encoding of 0, which was previously valid, is no longer valid.

rossberg · 2022-04-05T14:37:36Z

I assume this is to address #1439? PR looks good, but perhaps let's wait for that discussion to conclude.

alexcrichton · 2022-04-05T14:47:43Z

While #1439 is somewhat related this is intended to fix the interpreter to match the current text spec. The text spec does not mention a leb for the leading byte discriminant for element/data segments but the spec interpreter uses a leb. This means that parsers which strictly implement the textual spec cannot pass the current spec tests and this PR is trying to fix that.

I recall that the decision at the time when this came up was to intentionally break any modules using overlong encodings for index 0. I found WebAssembly/bulk-memory-operations#153 as some discussion of this but I don't recall where other discussion happened. I thought, though, that #1439 was "intended breakage" and action wasn't necessary.

rossberg · 2022-04-05T14:57:58Z

Well, the question raised on #1439 is whether this is a bug in the spec. But your recollection makes sense to me. Perhaps point this out on #1439, so it can be resolved?

rossberg · 2022-05-06T11:32:35Z

With #1461 landed, this PR should no longer be needed. Closing.

rossberg closed this May 6, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Update decoding of element/data segments in spec interpreter #1440

Update decoding of element/data segments in spec interpreter #1440

alexcrichton commented Apr 1, 2022

rossberg commented Apr 5, 2022

alexcrichton commented Apr 5, 2022

rossberg commented Apr 5, 2022

rossberg commented May 6, 2022

Update decoding of element/data segments in spec interpreter #1440

Update decoding of element/data segments in spec interpreter #1440

Conversation

alexcrichton commented Apr 1, 2022

rossberg commented Apr 5, 2022

alexcrichton commented Apr 5, 2022

rossberg commented Apr 5, 2022

rossberg commented May 6, 2022