-
Notifications
You must be signed in to change notification settings - Fork 3.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WasmFS] Fix FS.readFile + WASM_BIGINT #17316
[WasmFS] Fix FS.readFile + WASM_BIGINT #17316
Conversation
The wrong type was used for `makeGetValue` causing the length to be returned incorrectly when linking with `-sWASM_BIGINT`.
src/library_wasmfs.js
Outdated
// The integer length is returned in the first 8 bytes of the buffer. | ||
var length = {{{ makeGetValue('buf', '0', 'i64') }}}; | ||
// The integer length resides in the first 8 bytes of the buffer. | ||
var length = {{{ makeGetValue('buf', '0', 'i32') }}}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, how about changing
emscripten/system/lib/wasmfs/js_api.cpp
Line 37 in ae675c6
*(uint32_t*)result = size; |
to write an i64? I think that might be better. The size can be more than 32 bits, and we are all set up for that, just the wrong type is written in that C++.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, WasmFS currently uses 32-bit sizes in a lot of places where it should be using 64-bit sizes...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Commit 567cfd3 ensures that it writes an i64 in C++, and that it is read out as an i64 in JS. I've updated the PR description accordingly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
lgtm % last comment
`off_t` should already be a 64-bit type, even on wasm32, so we can safely use the normal types and APIs to get large file support.
…orator" This reverts commit f19dcad.
src/library_wasmfs.js
Outdated
var length = {{{ makeGetValue('buf', '0', 'i64') }}}; | ||
|
||
// Default return type is binary. | ||
// The buffer contents exist 8 bytes after the returned pointer. | ||
#if WASM_BIGINT | ||
// The length as BigInt must be converted to a Number. | ||
var ret = new Uint8Array(HEAPU8.subarray(buf + 8, buf + 8 + Number(length))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was considering using $bigintToI53Checked
here instead of Number(length)
.
emscripten/src/library_int53.js
Lines 127 to 136 in 5003f94
$MAX_INT53: '{{{ Math.pow(2, 53) }}}', | |
$MIN_INT53: '-{{{ Math.pow(2, 53) }}}', | |
// Counvert a bigint value (usually coming from Wasm->JS call) into an int53 | |
// JS Number. This is used when we have an incoming i64 that we know is a | |
// pointer or size_t and is expected to be withing the int53 range. | |
// Returns NaN if the incoming bigint is outside the range. | |
$bigintToI53Checked__deps: ['$MAX_INT53', '$MIN_INT53'], | |
$bigintToI53Checked: function(num) { | |
return (num < MIN_INT53 || num > MAX_INT53) ? NaN : Number(num); | |
}, |
That is, something similar to
{{{ receiveI64ParamAsI53('length', -cDefine('EOVERFLOW')) }}}
.Lines 1118 to 1122 in 5003f94
function receiveI64ParamAsI53(name, onError) { | |
if (WASM_BIGINT) { | |
// Just convert the bigint into a double. | |
return `${name} = bigintToI53Checked(${name}); if (isNaN(${name})) return ${onError};`; | |
} |
But I think this is redundant, since that is only a issue when using >= ~8GB buffers (_malloc(Number.MAX_SAFE_INTEGER + 1)
). I'm not aware of any VM that currently supports such large buffers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, that size buffer should not be a problem.
But actually I think that using readI53FromU64
may be the best thing here. We do actually want a JS number here for the reason you said, and that seems like the simplest way to get it. Then we wouldn't need an ifdef here: readI53FromU64
returns a Number either way.
@sbc100 what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps we should modify makeGetValue
above such that we can pass i53
instead of i64
? This would differ from i64
in that the return value would always be a Number?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sbc100 Nice idea! sgtm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, there is a difference between the signed and unsigned case though, which is why we have both readI53FromU64
and readI53FromI64
, so just i53
is not enough information I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We already support i64 and u64 in makeGetValue so we can support i53 and u53 too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I forgot that, i53/u53 might make sense then.
One issue is that functions need to include the helper function as a dep (and to the developer it's less visible without explicitly calling the dep). But maybe the auto-dep stuff you recently added could make that work automatically.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have i53ConversionDeps
for this purpose. Can makeGetValue not just call bigintToI53Checked
... I don't see any singed/unsigned version of bigintToI53Checked
..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Commit 8efae3e incorporates this (but this will probably conflict with PR #17459). According to POSIX, off_t
is a signed integer type, so readI53FromU64
is not appropriate here.
Note that this will not catch overflows, see e.g. these TODO items:
emscripten/src/library_int53.js
Line 84 in a41b1a3
// TODO: Add $readI53FromI64Signaling() variant. |
emscripten/src/library_int53.js
Line 91 in a41b1a3
// TODO: Add $readI53FromU64Signaling() variant. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, thanks!
@sbc100 did you want to take another look?
@@ -22,6 +22,8 @@ __wasi_fd_t wasmfs_create_file(char* pathname, mode_t mode, backend_t backend); | |||
// The buffer will also contain the file length. | |||
// Caller must free the returned pointer. | |||
void* _wasmfs_read_file(char* path) { | |||
static_assert(sizeof(off_t) == 8, "File offset type must be 64-bit"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need this assert here.. I'm pretty sure if this ever changed all kind of stuff would break.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Possibly the assert is a little redundant, but it's static so no overhead, and also it is helpful for the reader, I think, as it indicates this is a contract with the JS side that must not change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
@@ -408,6 +408,9 @@ function makeGetValue(ptr, pos, type, noNeedFirst, unsigned, ignore, align) { | |||
} | |||
|
|||
const offset = calcFastOffset(ptr, pos, noNeedFirst); | |||
if (type === 'i53' || type === 'u53') { | |||
return 'readI53From' + (unsigned ? 'U' : 'I') + '64(' + offset + ')'; | |||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I already have another use of this in #17459 !
Let's maybe consider the assert in a followup, as there is a user that has hit the bug, so let me land this fix now. |
_wasmfs_read_file
wrote a 32-bit unsigned integer, whilemakeGetValue
in JS read this as a 64-bit integer.This caused the length to be returned incorrectly when linking
with
-sWASM_BIGINT
. Fix this by ensuring that_wasmfs_read_file
will write a 64-bit integer.
Resolves: #17444.