S3 checksum middleware #102
Conversation
!ArrayBuffer.isView(body) &&
!isArrayBuffer(body)
) {
  body = await streamCollector(body);
We don't actually want to collect an entire stream to calculate a hash though, right? If it helps, the Glacier middleware PR I submitted has packages for calculating hashes on streams (it throws an error if a non-file stream is encountered in Node). I can change it to allow any Hash to be provided, instead of relying on sha256.
It doesn't cover WHATWG ReadableStreams because those aren't valid on requests yet, but we would just want to convert those to a Blob for now anyway, then use the BlobCollector.
I'll take a look at the Glacier PR, but I don't see how we'd be able to entirely consume a stream (to calculate the hash) and then also send it as a request body without putting the bytes somewhere. Throwing when someone uses anything other than an undecorated file stream seems very limiting. (I thought you had wanted to put special code for file handling in a higher-level glacier abstraction, which would let the base API handle common use cases like zlib and crypto streams.)
What if we added a "CollectedStream" type parameter everywhere we currently have a Stream type parameter? You're absolutely right that we should be using Blobs as stream sinks in browsers, and maybe we should be using temp files for the same purpose in Node. Another option would be to use separate implementations of this middleware for Node and the browser so that special cases like Blobs and file streams can be handled efficiently.
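To make the file-stream special case above concrete, here is a minimal, hypothetical sketch (the name fileStreamHasher and the overall shape are assumptions, not the Glacier package's actual API) of hashing a file-backed Node stream without buffering the full payload: the digest comes from a throwaway read of the underlying file, so the original stream can still be sent as the request body.

```ts
// Hypothetical sketch: hash a file-backed Node stream without buffering it.
// The hash is computed from a separate read of the file, so the request body
// can be a brand-new stream over the same path. Non-file streams are rejected,
// mirroring the limitation discussed above.
import { createHash } from "crypto";
import { createReadStream, ReadStream } from "fs";

async function fileStreamHasher(stream: ReadStream): Promise<string> {
  const { path } = stream;
  if (typeof path !== "string") {
    throw new Error("Unable to calculate a hash for a stream not backed by a file");
  }
  return new Promise<string>((resolve, reject) => {
    const hash = createHash("md5");
    createReadStream(path)
      .on("data", chunk => hash.update(chunk))
      .on("error", reject)
      .on("end", () => resolve(hash.digest("base64")));
  });
}

// Usage sketch: the original stream is never consumed, so it can still be sent.
// const checksum = await fileStreamHasher(createReadStream("/tmp/payload"));
// request.headers["Content-MD5"] = checksum;
```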
I think it will make sense to either make this middleware environment specific, or let its MiddlewareCustomizationDefinition define an environment-specific configuration to pass in something that can hash a stream/file.
You're right that we'll need temp files in Node if we want to perform this hashing. I realize now that the streamCollector hopefully shouldn't need to be used unless the user turns off unsigned payloads. I originally assumed that wouldn't be the case, since in V2 we still calculate the MD5 even with unsigned payloads (though maybe that is no longer a requirement).
What would the CollectedStream type be? Uint8Array in node, Blob in browsers?
After looking at your Glacier middleware PR, I think you were right that we need an environment-specific stream hasher instead of just reading the whole thing into memory. I would like to see streams transformed into a CollectedStream or RewindableStream before being fed into the hasher, but that can be a future improvement.
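As a rough illustration of what an environment-specific hasher contract could look like (the type names below are assumptions for illustration, not a final API), the middleware would only depend on a generic signature and let each runtime supply its own stream type:

```ts
// Assumed shape of an environment-specific stream hasher. Node would hash a
// file-backed Readable, the browser would hash a Blob, and both resolve to the
// raw digest bytes for the checksum middleware to encode.
import { Readable } from "stream";

interface Hash {
  update(toHash: string | Uint8Array): void;
  digest(): Promise<Uint8Array>;
}

interface HashConstructor {
  new (): Hash;
}

type StreamHasher<StreamType> = (
  hashCtor: HashConstructor,
  stream: StreamType
) => Promise<Uint8Array>;

// Each runtime package would export its own implementation:
declare const nodeStreamHasher: StreamHasher<Readable>; // replays a file path or temp file
declare const blobHasher: StreamHasher<Blob>;           // reads the Blob contents in the browser
```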
"importHelpers": true, | ||
"noEmitHelpers": true, | ||
"lib": [ | ||
"dom", |
Is "dom" needed?
Oops, I think I left that in after putting some debugging console.log statements in a few spots. Will remove.
More than just checksum middleware, looks pretty cool!
PutBucketReplication: [
  md5Checksum,
],
PutObject: [
I really hope this isn't the case anymore, but I remember that with V2, when we used unsignedPayload with SigV4, we had to calculate the MD5. Might be worth checking if this is still a requirement.
I verified that there is no technical requirement that we do so (sending PutObject operations with no Content-MD5 header and an X-Amz-Content-Sha256 header of UNSIGNED-PAYLOAD works just fine).
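For context on what the per-operation bindings above boil down to, here is a minimal sketch (the request shape and helper name are assumptions, not the real middleware code) of a Content-MD5 step applied to an already-collected body. Operations like PutBucketReplication require the header, while PutObject with UNSIGNED-PAYLOAD can omit it, per the verification above.

```ts
// Minimal sketch of an md5Checksum step: compute the MD5 of the serialized
// body and attach it as a Content-MD5 header if one isn't already present.
// The SimpleRequest shape here is an assumption for illustration.
import { createHash } from "crypto";

interface SimpleRequest {
  headers: Record<string, string>;
  body?: string | Uint8Array;
}

function applyMd5Checksum(request: SimpleRequest): SimpleRequest {
  if (request.body === undefined || request.headers["Content-MD5"] !== undefined) {
    return request;
  }
  const digest = createHash("md5").update(request.body).digest("base64");
  return {
    ...request,
    headers: { ...request.headers, "Content-MD5": digest },
  };
}
```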
… changes to mw stack
@@ -72,21 +69,11 @@ export class FetchHttpHandler implements HttpHandler<ReadableStream, BrowserHttp
    transformedHeaders[pair[0]] = pair[1];
  }

  const httpResponse: HttpResponse<ReadableStream> = {
  return response.blob().then<HttpResponse<Blob>>(body => ({
I think this is a fair change for now, since we're in dev preview and it isn't easy to convert a stream to a blob (and blobs are more useful than streams right now).
Someday, we can have browser streams; hopefully before GA!
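A stripped-down sketch of the handler change being discussed (simplified types and function name; the actual FetchHttpHandler carries more options and error handling), showing the response body resolved to a Blob:

```ts
// Simplified sketch: resolve the fetch response body to a Blob rather than
// exposing the ReadableStream, matching the change in this PR.
interface SimpleHttpResponse<BodyType> {
  statusCode: number;
  headers: Record<string, string>;
  body: BodyType;
}

async function handleFetch(url: string, init: RequestInit): Promise<SimpleHttpResponse<Blob>> {
  const response = await fetch(url, init);

  const transformedHeaders: Record<string, string> = {};
  response.headers.forEach((value, key) => {
    transformedHeaders[key] = value;
  });

  return {
    statusCode: response.status,
    headers: transformedHeaders,
    body: await response.blob(), // collect the stream into a Blob for now
  };
}
```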
* Add body-checksum-applying middleware
* Add a universal JS MD5 package
* Add S3 checksum middleware bindings
* Add SigV4 unsigned payload middleware
* Adapt body checksum and unsigned payload middleware to reflect recent changes to mw stack
* Add Glacier .gitignores
* Incorporate changes made for Glacier middleware addition
* Change browser stream type from ReadableStream to Blob
* Regenerate browser SDK packages
* Fix failing tests
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs and link to relevant comments in this thread.
This PR also includes the work described in https://sim.amazon.com/issues/d2307adc-302f-4461-ad6a-81841442c388, as it would have been additional work to support ReadableStream|Blob right before ripping ReadableStream support out.