Add data segments to binary format #301

titzer · 2015-08-17T16:58:24Z

Add a description of data segments, which are a way that the binary module can load initialized data into memory, similar to a .data section.
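To make the intended semantics concrete, here is a minimal Python sketch of what an engine might do with data segments at instantiation time. It assumes each segment is just an absolute byte offset plus a byte string, that linear memory starts zero-filled, and that an out-of-bounds segment is rejected. The names `DataSegment` and `instantiate_memory` are hypothetical and are not part of the binary format this PR describes.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class DataSegment:
    """Hypothetical model of a data segment: bytes destined for a fixed offset."""
    offset: int   # absolute byte offset in linear memory
    data: bytes   # initialized contents, similar to a .data section


def instantiate_memory(size: int, segments: list[DataSegment]) -> bytearray:
    """Zero-fill linear memory, then copy each segment in before any code runs."""
    memory = bytearray(size)
    for seg in segments:
        end = seg.offset + len(seg.data)
        if seg.offset < 0 or end > size:
            raise ValueError(f"segment [{seg.offset}, {end}) is out of bounds")
        memory[seg.offset:end] = seg.data  # later segments overwrite earlier ones
    return memory


mem = instantiate_memory(64, [DataSegment(8, b"hello"), DataSegment(32, b"\x01\x02")])
print(bytes(mem[8:13]))  # b'hello'
```

In this sketch overlapping segments are simply applied in order; whether overlap should be allowed at all is a separate design question.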

lukewagner · 2015-08-17T17:58:35Z

lgtm. It'd be nice to link to and from Modules.md#initial-state-of-linear-memory.

jfbastien · 2015-08-17T20:14:04Z

IIUC this means no addition to AST semantics, since the toolchain provides the address (or addresses) that the data segment is loaded at. Code then loads directly from that address, without any indirection / relocation / symbol. Correct?

lukewagner · 2015-08-17T20:26:34Z

In the MVP, that makes sense. With dynamic linking, though, I think we'll need to have global variables that are immutable, load-time initialized pointers into the heap where data sections are loaded (I explained this more in #154). This could be achieved by specifying that, in a shared module (in the sense of -shared), data sections don't get to name an absolute address but, rather, each segment declares a new global variable that is initialized with the address of that section. With a patching implementation, this should be equivalent in performance to non-shared global data; without patching, it'd be equivalent to -fPIC.
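A rough Python model of the distinction being drawn: in a main module the data address is a toolchain-chosen constant that code can load from directly, while in a shared module (as proposed here) each segment gets an immutable, load-time-initialized global holding its base address, and code adds offsets to that global. The names `SharedModuleGlobals`, `load_main_data`, and `load_shared_data`, and the addresses used, are hypothetical; a frozen dataclass stands in for an immutable wasm global.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SharedModuleGlobals:
    """Hypothetical immutable global, initialized once at load time."""
    data_base: int  # address where the loader placed this module's data segment


def load_main_data(memory: bytearray, addr: int, data: bytes) -> None:
    # Main module: the toolchain chose `addr`, so compiled code can embed it
    # as a constant and load from it with no indirection.
    memory[addr:addr + len(data)] = data


def load_shared_data(memory: bytearray, base: int, data: bytes) -> SharedModuleGlobals:
    # Shared module: the segment names no absolute address; the loader picks
    # `base` and records it in a read-only global for the module's code.
    memory[base:base + len(data)] = data
    return SharedModuleGlobals(data_base=base)


memory = bytearray(128)

MAIN_TABLE_ADDR = 16                        # constant baked into main-module code
load_main_data(memory, MAIN_TABLE_ADDR, b"\x0a\x0b\x0c")
main_value = memory[MAIN_TABLE_ADDR + 1]    # direct load: address known up front

g = load_shared_data(memory, 64, b"\x1a\x1b\x1c")
shared_value = memory[g.data_base + 1]      # -fPIC-style: global base + offset
```

The extra cost in the shared case is the add against `data_base`; with a patching implementation that base could be folded into the code at load time.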

titzer · 2015-08-17T20:56:42Z

Yes, these data segments would basically be a way to initialize an area of
memory before the program starts.

We could explore program control of loading of data segments as a further
step; e.g. after the program has been linked, it then issues "load data
segment" commands to blast bytes into memory at particular addresses. Those
"load data segment" commands could also be otherwise useful, IMO.

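The "load data segment" commands floated here could look roughly like the following Python sketch: the module carries its segments, but nothing is copied until the program issues an explicit command naming a segment index and a destination address. `ModuleWithSegments` and `load_data_segment` are hypothetical names; no such operator is part of this PR.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ModuleWithSegments:
    """Hypothetical module whose segments are kept aside rather than auto-copied."""
    segments: list[bytes] = field(default_factory=list)


def load_data_segment(memory: bytearray, module: ModuleWithSegments,
                      index: int, dest: int) -> None:
    """Hypothetical "load data segment" command: copy segment `index` to `dest`."""
    data = module.segments[index]
    if dest < 0 or dest + len(data) > len(memory):
        raise IndexError("destination out of bounds")
    memory[dest:dest + len(data)] = data


memory = bytearray(32)
module = ModuleWithSegments(segments=[b"abc", b"xyz"])
# After linking, the program decides where (and whether) each segment lands.
load_data_segment(memory, module, 1, 4)
load_data_segment(memory, module, 0, 20)
```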

lukewagner · 2015-08-17T21:23:18Z

@titzer If we have the ability to efficiently copy into linear memory from outside memory and from files (map_file), what is the remaining use case for a dynamic "load data segment"? The main difference I see is that a dynamic "load data segment" would allow you to bundle some binary data in your .wasm file, but:

- bundling/packaging to minimize the number of resource fetches is a general Web problem that is being attacked in a number of ways, so it seems like we might be attacking it at the wrong level here;
- a naive wasm engine will keep the wasm binary in memory (negating any benefit over just loading the data eagerly); it'll take non-trivial work to keep this dynamically-loadable data segment out of memory, so it seems better to leverage existing support for this (File API).

jfbastien · 2015-08-17T21:25:31Z

@lukewagner are you proposing that main modules be able to have a data section, but not dynamically loaded modules? I'm not sure I'm clear.

I agree that this interacts tightly with dynamic linking, and it would be good to have a nicely unified solution.

lukewagner · 2015-08-17T21:45:01Z

@jfbastien Nope: both would be able to have data sections: the difference is that main modules would be able to absolutely position their data sections in linear memory while dynamically-loaded modules would need to rely on const-global-ptrs that were declared by the data section.

jfbastien · 2015-08-17T21:57:55Z

In that case it kind of seems like doing addressof on a global symbol is easier and more consistent, regardless of whether the symbol comes from the main module or a dso.

lukewagner · 2015-08-17T22:23:08Z

We could force main modules to do the same thing as shared modules, but that would effectively be strictly taking away useful functionality from main modules:

- being able to place data sections anywhere in the address space;
- having the address be an a priori constant that can be transitively folded.

It does make sense that, for symmetry, we could allow main modules to use symbolic globals to refer to data sections, but until we have dynamic linking, that will be a superfluous feature.

jfbastien · 2015-08-17T23:18:47Z

You may be right. I would however like us to try to avoid designing two features when we know up front we could design one that'll serve both purposes. Could we let dynamically-linked modules decide where their data section is loaded? That would address your first point.

Constant folding: relocations and/or patching could take care of this?

lukewagner · 2015-08-17T23:50:55Z

I think there's just fundamental asymmetry between main and shared modules. A main module knows it has the whole [0, memory_size) range to itself and can put anything anywhere infallibly (an invariant that could be leveraged for interesting optimizations). For a shared module, since wasm semantics don't say what memory in [0, memory_size) is already in use, I've been assuming that we'd want to specify an allocate_global_data_section(length) callback that is specified to be called by the engine when loading a shared module and allows the app to decide exactly how it wants to lay out global data. FWIW, the same issue comes up with aliased thread-local state (which needs to go in the heap... but where?) and could have the same callback solution. (There's a lot of symmetry between thread-local state and dynamically linked global state.)
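The `allocate_global_data_section(length)` callback described above might be modeled like this: the engine asks the application for a base address before copying a shared module's data in, and the application answers with whatever layout policy it likes. The bump-allocation policy, the 8-byte alignment, and the `make_bump_allocator` and `load_shared_module` helpers are assumptions made for this sketch.

```python
from typing import Callable

AllocateFn = Callable[[int], int]


def make_bump_allocator(start: int, limit: int, align: int = 8) -> AllocateFn:
    """Application-side policy: hand out aligned regions after `start`."""
    next_free = start

    def allocate_global_data_section(length: int) -> int:
        nonlocal next_free
        base = (next_free + align - 1) // align * align
        if base + length > limit:
            raise MemoryError("no room for data section")
        next_free = base + length
        return base

    return allocate_global_data_section


def load_shared_module(memory: bytearray, data: bytes, allocate: AllocateFn) -> int:
    """Engine side: ask the app where the data goes, then copy it there."""
    base = allocate(len(data))
    memory[base:base + len(data)] = data
    return base  # would initialize the module's immutable data-base global


memory = bytearray(256)
allocate = make_bump_allocator(start=20, limit=len(memory))  # main data in [0, 20)
base1 = load_shared_module(memory, b"abc", allocate)
base2 = load_shared_module(memory, b"hello", allocate)
```

With these inputs the first module's data lands at 24 and the second at 32. The same callback shape could serve aliased thread-local state, as the comment notes.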

For constant folding: I'm thinking compound expression trees that include global addresses at the leaves that could otherwise be folded at compile-to-wasm time.
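A small Python sketch of the folding point: an address expression whose leaf is a data-segment address folds to a single constant when that address is known at compile-to-wasm time (main module), but only partially when it is a load-time global (shared module). The expression classes and the `fold` function are hypothetical, written only for this illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Const:
    value: int


@dataclass(frozen=True)
class SegAddr:
    name: str  # address of a data segment


@dataclass(frozen=True)
class Add:
    lhs: Expr
    rhs: Expr


@dataclass(frozen=True)
class Mul:
    lhs: Expr
    rhs: Expr


Expr = Union[Const, SegAddr, Add, Mul]


def fold(e: Expr, known: dict[str, int]) -> Expr:
    """Fold constants bottom-up; segment addresses fold only if known."""
    if isinstance(e, SegAddr):
        return Const(known[e.name]) if e.name in known else e
    if isinstance(e, Const):
        return e
    lhs, rhs = fold(e.lhs, known), fold(e.rhs, known)
    if isinstance(lhs, Const) and isinstance(rhs, Const):
        value = lhs.value + rhs.value if isinstance(e, Add) else lhs.value * rhs.value
        return Const(value)
    return type(e)(lhs, rhs)


# &table[3] for 4-byte elements: table + 4 * 3
expr = Add(SegAddr("table"), Mul(Const(4), Const(3)))
main_result = fold(expr, {"table": 1024})   # absolute address known
shared_result = fold(expr, {})              # address is a load-time global
```

Here `main_result` is `Const(1036)`, while `shared_result` keeps the leaf symbolic as `Add(SegAddr("table"), Const(12))`: only the `4 * 3` subtree folds.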

titzer · 2015-08-18T11:08:14Z

The main use case I see is that a module has a complete and efficient
specification of its initial state of memory. Maybe that only makes sense
for "main" modules, but it nevertheless has the nice property that such a
module has no dependencies on an outside linking process.


lukewagner · 2015-08-18T15:32:50Z

@titzer Totally agreed on that use case; maybe I misunderstood what you were asking. To be clear, I think dynamically-linked modules should have their own data segments that are copied into memory when the module is dynamically linked (see discussion with @jfbastien above). I thought you were asking for some sort of API to load data segments at arbitrary times (not just dynamic link time).

jfbastien · 2015-08-18T16:00:46Z

@titzer could you clarify what you mean by "outside linking"? Main and dynamic modules are inherently relying on a loader.

A few thoughts:

- Say a user wants some basic ASLR for their in-app data, and only has a single module (no dso). How would they achieve this? IIUC the current proposal is that they'd manually copy the automatically loaded data, and then use it as regular heap memory?
- How does user code implement the basic allocator for heap space? The allocator has to figure out where data starts / ends, and stay clear of that? That means that the generic allocator we auto-link into user code has to know this. This is resolvable, but I want to make sure we design this knowingly.
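One way the allocator question could be answered, sketched in Python under the assumption that the toolchain knows every data segment's offset and length when it links the main module: compute the first aligned address past the highest segment and bake it in as the heap base, so a simple allocator never hands out memory overlapping initialized data. `heap_base`, `BumpAllocator`, and the 16-/8-byte alignments are hypothetical choices for illustration.

```python
from __future__ import annotations


def align_up(value: int, align: int) -> int:
    """Round `value` up to a multiple of `align` (a power of two here)."""
    return (value + align - 1) & ~(align - 1)


def heap_base(segments: list[tuple[int, bytes]], align: int = 16) -> int:
    """First aligned address past the end of every (offset, data) segment."""
    end = max((offset + len(data) for offset, data in segments), default=0)
    return align_up(end, align)


class BumpAllocator:
    """Toy allocator that starts handing out memory at the heap base."""

    def __init__(self, base: int, limit: int) -> None:
        self.next_free = base
        self.limit = limit

    def malloc(self, size: int, align: int = 8) -> int:
        addr = align_up(self.next_free, align)
        if addr + size > self.limit:
            raise MemoryError("out of linear memory")
        self.next_free = addr + size
        return addr


segments = [(8, b"hello"), (32, b"\x01\x02")]
alloc = BumpAllocator(heap_base(segments), limit=1 << 16)
first = alloc.malloc(10)
second = alloc.malloc(4)
```

With these segments the highest data byte ends at 34, so the heap starts at 48; the two allocations land at 48 and 64.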

titzer · 2015-08-18T16:15:21Z

I agree that dynamic linking will need to deal with initialized data in
some way. I don't want to propose a general API for loading data segments
in this PR, just initialized data segments for the initial contents of
memory.


titzer · 2015-08-18T16:17:49Z

> Say a user wants some basic ASLR for their in-app data, and only have a single module (no dso). How would they achieve this? IIUC the current proposal is that they'd manually copy the automatically loaded data, and then use it as regular heap memory?

I'm not sure how the wasm engine could help for ASLR, since pointers are just offsets into the linear memory, so I'd hazard a guess that yes, they should manually copy the automatically loaded data segments.

> How does user code implement the basic allocator for heap space? The allocator has to figure out where data starts / ends, and stay clear of that? That means that the generic allocator we auto-link into user code has to know this. This is resolvable, but I want to make sure we design this knowingly.

I'm assuming that until we solve dynamic linking, the allocator, if any, would be compiled into the single (main) module and would inherently know where the initialized data segments lie.


jfbastien · 2015-08-18T16:24:09Z

I think we're getting into more design than what a PR should contain. Would
you mind committing this, and moving the discussion to an issue instead?

> I'm not sure how the wasm engine could help for ASLR, since pointers are just offsets into the linear memory, so I'd hazard a guess that yes, they should manually copy the automatically loaded data segments.

I wasn't suggesting the wasm engine help for ASLR as much as avoid getting in the way. I think this is all stuff we should offer on the toolchain side, but it would be nice if the basic mechanism wasm exposes were designed with the current state of the art in mind.

> I'm assuming that until we solve dynamic linking, the allocator, if any, would be compiled into the single (main) module and would inherently know where the initialized data segments lie.

That's indeed what my question leads to. So how does it figure this out? :-)

titzer · 2015-08-18T16:34:05Z

Merging based on above LGTM from @lukewagner

fbacd0a Add data segments to binary format

titzer added 2 commits August 18, 2015 18:31

e9bee50 Update BinaryEncoding.md
5d3dbeb Update Modules.md

titzer pushed a commit that referenced this pull request Aug 18, 2015

80378e1 Merge pull request #301 from WebAssembly/add_data_se

titzer merged commit 80378e1 into master Aug 18, 2015

jfbastien deleted the add_data_se branch August 18, 2015 16:35

titzer mentioned this pull request Aug 18, 2015: How do data segments interact with dynamic linking? #302 (Closed)
