-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add implementation of MultipartFormData
streaming encoder
#200
base: main
Are you sure you want to change the base?
Conversation
591475c
to
1c57fa0
Compare
const response = await fetch("https://http-me.glitch.me/meow?header=cat:é");
strictEqual(response.headers.get('cat'), "é"); |
1c57fa0
to
655b57d
Compare
@andreiltd, is this ready to review, or are you still expecting to make changes to it? |
Yes, this is ready to review :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Either I'm missing something in looking at the diff, or there is some code missing—see the inline comment on form-data-encoder.h
.
One thing that I can't fully evaluate, because it'll presumably be part of the implementation of encode_stream
: IIUC, field names, will always be first have their newlines normalized to CRLF
, and then have those escaped. It'd be good to fold those into one operation, if possible.
Otherwise, I left a few comments and suggestions. I'm a bit concerned about allocation and general failure handling, so it'd be good to go over those aspects in some detail.
static size_t query_length(JSContext *cx, HandleObject self); | ||
static JSObject *encode_stream(JSContext *cx, HandleObject self); | ||
static JSObject *create(JSContext *cx, HandleObject form_data); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The implementation of these seems to be missing? (And in the case of query_length
I also can't find any uses.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm... that's weird:
create
function is here: https://github.com/bytecodealliance/StarlingMonkey/pull/200/files#diff-1794d7449a6c94d4fb4e898ac1d44ebd87c84c0b52e2ddc2c9e1c522cfaae801R446encode_stream
function is here: https://github.com/bytecodealliance/StarlingMonkey/pull/200/files#diff-1794d7449a6c94d4fb4e898ac1d44ebd87c84c0b52e2ddc2c9e1c522cfaae801R434
I totally forgot about query_length, I'm sorry about that! The reason for this is that the fetch spec doesn't say what to do with Content-Length
in case of FormData
.
Set length to unclear, see html/6424 for improving this.
See here: https://fetch.spec.whatwg.org/#bodyinit-unions. It technically should be possible to calculate an encoded size beforehand without doing an actual encoding, but it's unclear to me if this is needed or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The query_length
function is now implemented. Without setting the Content-Length
header uploading FormData
tests were failing.
bool MultipartFormDataImpl::handle_entry_header(JSContext *cx, StreamContext &stream) { | ||
auto entry = stream.entries->begin()[chunk_idx_]; | ||
auto header = fmt::memory_buffer(); | ||
auto name = escape_newlines(entry.name).value(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this needs a null check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function is overloaded but the version that doesn't take JSContetext
is infallible. I can change this to separate function like escape_name
and escape_name_val
if we prefer explicit return value. Anyway, I've added the assertion for now.
// Hex encode bytes to string | ||
auto bytes = std::move(res.unwrap()); | ||
auto bytes_str = std::string_view((char *)(bytes.ptr.get()), bytes.size()); | ||
auto base64_str = base64::forgivingBase64Encode(bytes_str, base64::base64EncodeTable); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this need a null check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This returns just a std::string
.
auto leftover = datasz - written; | ||
if (leftover > 0) { | ||
MOZ_ASSERT(remainder_.empty()); | ||
remainder_.assign(first + written, last); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this an infallible operation? It seems like it should need to allocate memory, so I'm not sure I understand how that works. And ideally, if it does allocate, we should make that fallible—or at least assert that it didn't fail and abort execution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, it technically can throw bad_alloc
because it is internally calling std::string::reserve
if needed - I guess this will abort excution.
https://cplusplus.com/reference/string/string/assign/
A bad_alloc exception is thrown if the function needs to allocate storage and fails.
Hi @tschneidereit, as always, thank you so much for a through review!
So the spec says this:
It's really hard to follow the spec closely because the implementation is state-machine-based, and a lot of actions that the spec calls for correspond to different states in the implementation and cannot be done together. For example, processing both names and values: names are part of the entry-header state, whereas values are handled in the entry-body state. |
I've adopted a bunch of Unfortunately, I wasn't able to run those in
const formDataText = await (await fetch(
`/fetch/api/resources/echo-content.py`,
{
method: "POST",
body: formData,
},
)).text(); This returns an empty string. |
That's a great idea!
Can you say more about how you're trying to run the test? I can't see where in the PR the Taking a step back though, to fully adapt a test from WPT requires at least these steps:
|
Oh, I was just trying to run The failures that I described were from running
That's a good point. Would |
It would be, yes. Though if you're only doing this to work around weird issues with running WPT tests, then I would overall prefer figuring out what causes them. Can you give me steps to reproduce for the issue you were seeing with this test? |
I added the test like this: diff --git a/tests/wpt-harness/tests.json b/tests/wpt-harness/tests.json
index 349baa1..fb757b9 100644
--- a/tests/wpt-harness/tests.json
+++ b/tests/wpt-harness/tests.json
@@ -175,6 +175,7 @@
"FileAPI/blob/Blob-stream.any.js",
"FileAPI/blob/Blob-text.any.js",
"FileAPI/file/File-constructor.any.js",
+ "FileAPI/file/send-file-formdata.any.js",
"hr-time/basic.any.js",
"hr-time/idlharness.any.js",
"hr-time/monotonic-clock.any.js", and then running just wpt-test FileAPI/file/send-file-formdata.any.js gives me this error:
|
Just looking at how we import meta scripts here it looks like the meta scripts are evaluated separately from the test, shouldn't the meta and the test be concatenated instead? |
I don't entirely remember why I landed on this exact way of loading META scripts. I do remember that it was non-trivial to get things working, and included this weird thing where we replace |
Yeah, it would be really nice to have those tests working, it's worth dozens of wpt passes. Let me know if you plan to investigate otherwise I will allocate some time to it for next week. 🙂 |
Merging #209 Should fix failing wpt tests in this PR. |
I did a quick comparison of encoded
FormData
between SM and Chrome using following JS code:Simple FormData
The results are:
I also did some manual fuzzing to test the streaming logic by varying the buffer sizes the encoder writes into. Specifically, I tested these buffer sizes in bytes: 1, 2, 4, 8, 32, 1024, and 8192 by changing the size in the implementation. Though, having some unit test framework would be nice.
Relevant RFC: https://www.rfc-editor.org/rfc/rfc2046#section-5.1.1