S3 checksum middleware #102

Merged — 10 commits, Feb 15, 2018
Conversation

jeskew
Contributor

@jeskew jeskew commented Jan 19, 2018

This PR also includes the work described in https://sim.amazon.com/issues/d2307adc-302f-4461-ad6a-81841442c388, as it would have been additional work to support ReadableStream|Blob right before ripping ReadableStream support out.

!ArrayBuffer.isView(body) &&
!isArrayBuffer(body)
) {
body = await streamCollector(body);
Contributor

We don't actually want to collect an entire stream just to calculate a hash, though, right? If it helps, the Glacier middleware PR I submitted has packages for calculating hashes on streams (it throws an error if a non-file stream is encountered in Node). I can change it to allow any Hash to be provided instead of relying on SHA-256.

It doesn't cover WHATWG ReadableStreams because those aren't valid on requests yet, but we would just want to convert those to a Blob for now anyway, then use the BlobCollector.

Contributor Author

I'll take a look at the Glacier PR, but I don't see how we'd be able to entirely consume a stream (to calculate the hash) and then also send it as a request body without putting the bytes somewhere. Throwing when someone uses anything other than an undecorated file stream seems very limiting. (I thought you had wanted to put special code for file handling in a higher-level glacier abstraction, which would let the base API handle common use cases like zlib and crypto streams.)

What if we added a "CollectedStream" type parameter everywhere we currently have a Stream type parameter? You're absolutely right that we should be using Blobs as stream sinks in browsers, and maybe we should be using temp files for the same purpose in node. Another option would be to use separate implementations of this middleware for Node & the browser so that special cases like Blobs and file streams can be handled efficiently.

Contributor

I think it will make sense to either make this middleware environment-specific, or let its MiddlewareCustomizationDefinition define an environment-specific configuration to pass in something that can hash a stream/file.

You're right that we'll need temp files in node if we want to perform this hashing. I realize now that the streamCollector hopefully shouldn't need to be used unless the user turns off unsigned payloads. I originally assumed that wouldn't be the case, since in V2, we calculate the MD5 still even with unsigned payloads (though maybe that is no longer a requirement).

What would the CollectedStream type be? Uint8Array in node, Blob in browsers?

Contributor Author

After looking at your Glacier middleware PR, I think you were right that we need an environment-specific stream hasher instead of just reading the whole thing into memory. I would like to see streams transformed into a CollectedStream or RewindableStream before being fed into the hasher, but that can be a future improvement.

"importHelpers": true,
"noEmitHelpers": true,
"lib": [
"dom",
Contributor

Is "dom" needed?

Contributor Author

Oops, think I left that in after putting some debugging console.log statements in a few spots. Will remove.

Contributor

@chrisradek chrisradek left a comment

More than just checksum middleware, looks pretty cool!

PutBucketReplication: [
md5Checksum,
],
PutObject: [
Contributor

I really hope this isn't the case anymore, but I remember that in V2, when we used unsigned payloads with SigV4, we still had to calculate the MD5. Might be worth checking whether this is still a requirement.

Contributor Author

I verified that there is no technical requirement that we do so (sending PutObject operations with no Content-MD5 header and an X-Amz-Content-Sha256 header of UNSIGNED-PAYLOAD works just fine).

@jeskew jeskew changed the title Feature/s3 checksum middleware [WIP] S3 checksum middleware Feb 9, 2018
@jeskew jeskew changed the title [WIP] S3 checksum middleware S3 checksum middleware Feb 10, 2018
@@ -72,21 +69,11 @@ export class FetchHttpHandler implements HttpHandler<ReadableStream, BrowserHttp
transformedHeaders[pair[0]] = pair[1];
}

-      const httpResponse: HttpResponse<ReadableStream> = {
+      return response.blob().then<HttpResponse<Blob>>(body => ({
Contributor

I think this is a fair change for now since we're in dev preview and it isn't easy to convert a stream to a blob (and blobs are more useful than streams right now)
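For what it's worth, collecting a WHATWG ReadableStream into a Blob can be done manually where the stream API is available (a sketch assuming global `ReadableStream` and `Blob`, e.g. modern browsers or Node 18+; `streamToBlob` is a hypothetical name):

```javascript
// Drain a WHATWG ReadableStream and gather its chunks into a Blob,
// which can then be hashed or re-read as needed.
async function streamToBlob(readableStream) {
  const reader = readableStream.getReader();
  const chunks = [];
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    chunks.push(value);
  }
  return new Blob(chunks);
}
```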

Contributor Author

Someday, we can have browser streams; hopefully before GA!

@jeskew jeskew merged commit 3c642b4 into aws:master Feb 15, 2018
@jeskew jeskew deleted the feature/s3-checksum-middleware branch February 15, 2018 02:21
trivikr referenced this pull request in trivikr/aws-sdk-js-v3 Dec 10, 2018
* Add body-checksum-applying middleware

* Add a universal JS MD5 package

* Add S3 checksum middleware bindings

* Add SigV4 unsigned payload middleware

* Adapt body checksum and unsigned payload middleware to reflect recent changes to mw stack

* Add Glacier .gitignores

* Incorporate changes made for Glacier middleware addition

* Change browser stream type from ReadableStream to Blob

* Regenerate browser SDK packages

* Fix failing tests
lock bot commented Sep 26, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs and link to relevant comments in this thread.

@lock lock bot locked as resolved and limited conversation to collaborators Sep 26, 2019