Add PUT uploads to object storage client #25

Merged
merged 1 commit into from
Jan 28, 2024
20 changes: 12 additions & 8 deletions docs/cloud-object-storage.md
@@ -28,7 +28,7 @@ Dotenv files are commonly kept in [cloud object storage](https://en.wikipedia.or

Creating a signature is a [four-step process](https://docs.aws.amazon.com/general/latest/gr/sigv4_signing.html):

1. _[Create a canonical request](https://docs.aws.amazon.com/general/latest/gr/sigv4-create-canonical-request.html)_. "Canonical" just means that the string has a standard set of fields. These fields provide request metadata like the HTTP method and headers.
1. _[Create a canonical request](https://docs.aws.amazon.com/IAM/latest/UserGuide/create-signed-request.html)_. "Canonical" just means that the string has a standard set of fields. These fields provide request metadata like the HTTP method and headers.
2. _[Create a string to sign](https://docs.aws.amazon.com/general/latest/gr/sigv4-create-string-to-sign.html)_. In this step, a SHA256 hash of the canonical request is calculated, and combined with some additional authentication information to produce a new string called the "string to sign." The Python standard library package [`hashlib`](https://docs.python.org/3/library/hashlib.html) makes this straightforward.
3. _[Calculate a signature](https://docs.aws.amazon.com/general/latest/gr/sigv4-calculate-signature.html)_. To set up this step, a signing key is derived with successive rounds of HMAC hashing. The [concept behind HMAC](https://www.okta.com/identity-101/hmac/) ("Keyed-Hashing for Message Authentication" or "Hash-based Message Authentication Codes") is to generate hashes with mostly non-secret information, along with a small amount of secret information that both the sender and recipient have agreed upon ahead of time. The secret information here is the secret access key. The signature is then calculated with another round of HMAC, using the signing key and the string to sign. The Python standard library package [`hmac`](https://docs.python.org/3/library/hmac.html) does most of the hard work here.
4. _[Add the signature to the HTTP request](https://docs.aws.amazon.com/general/latest/gr/sigv4-add-signature-to-request.html)_. The hex digest of the signature is included with the request.
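
Steps 2 and 3 above can be sketched with only `hashlib` and `hmac`. This is a minimal illustration, not fastenv's implementation: the function names, the bucket host, and the abbreviated query string are assumptions made up for the example.

```python
import hashlib
import hmac


def hmac_sha256(key: bytes, message: str) -> bytes:
    """Return the raw HMAC-SHA256 digest of a message."""
    return hmac.new(key, message.encode(), hashlib.sha256).digest()


def sign(
    canonical_request: str,
    secret_access_key: str,
    timestamp: str,  # ISO 8601 basic format, like "20240128T000000Z"
    region: str,
    service: str = "s3",
) -> str:
    """Return the hex digest of an AWS Signature Version 4 signature."""
    date = timestamp[:8]
    scope = f"{date}/{region}/{service}/aws4_request"
    # Step 2: hash the canonical request and build the string to sign.
    hashed_request = hashlib.sha256(canonical_request.encode()).hexdigest()
    string_to_sign = "\n".join(
        ("AWS4-HMAC-SHA256", timestamp, scope, hashed_request)
    )
    # Step 3: derive the signing key with successive rounds of HMAC,
    # then sign the string to sign with one more round.
    key = f"AWS4{secret_access_key}".encode()
    for part in (date, region, service, "aws4_request"):
        key = hmac_sha256(key, part)
    return hmac.new(key, string_to_sign.encode(), hashlib.sha256).hexdigest()


# Step 1 (hypothetical, abbreviated): a canonical request for a presigned GET.
# The header line ends with "\n", which produces the required blank line
# between the canonical headers and the signed headers.
canonical_request = "\n".join(
    (
        "GET",
        "/.env",
        "X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Expires=3600",
        "host:example-bucket.s3.us-east-2.amazonaws.com\n",
        "host",
        "UNSIGNED-PAYLOAD",
    )
)
signature = sign(canonical_request, "example-secret", "20240128T000000Z", "us-east-2")
```

The secret access key never leaves the client. Only the resulting hex digest is sent with the request (step 4).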
@@ -39,15 +39,23 @@ Dotenv files are commonly kept in [cloud object storage](https://en.wikipedia.or

#### Download

Downloads with `GET` can be authenticated by including AWS Signature Version 4 information either with [request headers](https://docs.aws.amazon.com/AmazonS3/latest/API/sigv4-auth-using-authorization-header.html) or [query parameters](https://docs.aws.amazon.com/AmazonS3/latest/API/sigv4-query-string-auth.html). fastenv uses query parameters to generate [presigned URLs](https://docs.aws.amazon.com/AmazonS3/latest/userguide/using-presigned-url.html). The advantage to presigned URLs with query parameters is that URLs can be used on their own.

The download method generates a presigned URL, uses it to download file contents, and either saves the contents to a file or returns the contents as a string.

Downloads with `GET` can be authenticated by including AWS Signature Version 4 information either with [request headers](https://docs.aws.amazon.com/AmazonS3/latest/API/sigv4-auth-using-authorization-header.html) or [query parameters](https://docs.aws.amazon.com/AmazonS3/latest/API/sigv4-query-string-auth.html). fastenv uses query parameters to generate [presigned URLs](https://docs.aws.amazon.com/AmazonS3/latest/userguide/using-presigned-url.html). The advantage to presigned URLs with query parameters is that URLs can be used on their own.
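
To show why a presigned URL can be used on its own, here is a rough sketch of the query parameters it carries, built with the standard library. The function name, bucket host, and access key are placeholders, and the signature is a dummy value rather than a real one.

```python
import urllib.parse


def presigned_query_params(
    access_key: str,
    region: str,
    timestamp: str,  # ISO 8601 basic format, like "20240128T000000Z"
    expires: int = 3600,
    service: str = "s3",
) -> dict[str, str]:
    """Return the query parameters that are signed into a presigned URL."""
    scope = f"{timestamp[:8]}/{region}/{service}/aws4_request"
    return {
        "X-Amz-Algorithm": "AWS4-HMAC-SHA256",
        "X-Amz-Credential": f"{access_key}/{scope}",
        "X-Amz-Date": timestamp,
        "X-Amz-Expires": str(expires),
        "X-Amz-SignedHeaders": "host",
    }


params = presigned_query_params(
    "AKIAIOSFODNN7EXAMPLE", "us-east-2", "20240128T000000Z"
)
# The signature is calculated over the other parameters and appended last.
# A placeholder stands in for the real hex digest here.
params["X-Amz-Signature"] = "0" * 64
query = urllib.parse.urlencode(params)
url = f"https://example-bucket.s3.us-east-2.amazonaws.com/.env?{query}"
```

Everything the server needs to verify the request is in the URL, so any HTTP client can issue the `GET` without extra headers or credentials.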

A related operation is [`head_object`](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Client.head_object), which can be used to check if an object exists. The request is the same as a `GET`, except the [`HEAD` HTTP request method](https://developer.mozilla.org/en-US/docs/Web/HTTP/Methods/HEAD) is used. fastenv does not provide an implementation of `head_object` at this time, but it could be considered in the future.

#### Upload

[Uploads with `POST`](https://docs.aws.amazon.com/AmazonS3/latest/API/sigv4-UsingHTTPPOST.html) work differently than downloads with `GET`. A typical back-end engineer might ask, "Can't I just `POST` binary data to an API endpoint with a bearer token or something?" To which AWS might respond, "No, not really. Here's how you do it instead: pretend like you're submitting a web form." "What?"
The upload method uploads source contents to an object storage bucket, selecting the appropriate upload strategy based on the cloud platform being used. Uploads can be done with either `POST` or `PUT`.

[Uploads with `PUT` can use presigned URLs](https://docs.aws.amazon.com/AmazonS3/latest/userguide/PresignedUrlUploadObject.html). Unlike downloads with `GET`, presigned `PUT` URL query parameters do not necessarily contain all the required information. Additional information may need to be supplied in request headers. In addition to supplying header keys and values with HTTP requests, header keys should be signed into the URL in the `X-Amz-SignedHeaders` query string parameter. These request headers can specify:

- [Object encryption](https://docs.aws.amazon.com/AmazonS3/latest/userguide/serv-side-encryption.html). Encryption information can be specified with headers including `X-Amz-Server-Side-Encryption`. Note that, although similar headers like `X-Amz-Algorithm` are included as query string parameters in presigned URLs, `X-Amz-Server-Side-Encryption` is not. If `X-Amz-Server-Side-Encryption` is included in query string parameters, it may be silently ignored by the object storage platform. [AWS S3 now automatically encrypts all objects](https://docs.aws.amazon.com/AmazonS3/latest/userguide/default-encryption-faq.html) and [Cloudflare R2 does also](https://docs.aws.amazon.com/AmazonS3/latest/userguide/default-encryption-faq.html), but [Backblaze B2 will only automatically encrypt objects if the bucket has default encryption enabled](https://www.backblaze.com/docs/cloud-storage-server-side-encryption).
- [Object metadata](https://docs.aws.amazon.com/AmazonS3/latest/userguide/UsingMetadata.html). Headers like `Content-Disposition`, `Content-Length`, and `Content-Type` can be supplied in request headers.
- [Object integrity checks](https://docs.aws.amazon.com/AmazonS3/latest/userguide/checking-object-integrity.html). The `Content-MD5` header, defined by [RFC 1864](https://www.rfc-editor.org/rfc/rfc1864), can supply a base64-encoded MD5 checksum. After the upload is completed, the object storage platform server will calculate a checksum for the object in the same manner. If the client and server checksums are the same, this means that all expected information was successfully sent to the server. If the checksums are different, this may mean that object information was lost in transit, and an error will be reported. Note that, although Backblaze B2 accepts and processes the `Content-MD5` header, it will report a SHA1 checksum to align with [uploads to the B2-native API](https://www.backblaze.com/docs/en/cloud-storage-file-information).

[Uploads with `POST`](https://docs.aws.amazon.com/AmazonS3/latest/API/sigv4-UsingHTTPPOST.html) work differently than `GET` or `PUT` operations. A typical back-end engineer might ask, "Can't I just `POST` binary data to an API endpoint with a bearer token or something?" To which AWS might respond, "No, not really. Here's how you do it instead: pretend like you're submitting a web form." "What?"

Anyway, here's how it works:

@@ -56,12 +64,8 @@ Dotenv files are commonly kept in [cloud object storage](https://en.wikipedia.or
3. _Calculate a signature_. This step is basically the same as for query string auth. A signing key is derived with HMAC, and then used with the string to sign for another round of HMAC to calculate the signature.
4. _Add the signature to the HTTP request_. For `POST` uploads, the signature is provided with other required information as form data, rather than as URL query parameters. An advantage of this approach is that it can also be used for browser-based uploads, because the form data can be used to populate the fields of an HTML web form. There is some overlap between items in the `POST` policy and fields in the form data, but they are not exactly the same.

The S3 API does also support [uploads with HTTP `PUT` requests](https://docs.aws.amazon.com/AmazonS3/latest/API/sig-v4-header-based-auth.html). fastenv does not use `PUT` requests at this time, but they could be considered in the future.

Backblaze uploads with `POST` are different, though there are [good reasons](https://www.backblaze.com/blog/design-thinking-b2-apis-the-hidden-costs-of-s3-compatibility/) for that (helps keep costs low). fastenv includes an implementation of the Backblaze B2 `POST` upload process.

The upload method uploads source contents to an object storage bucket, selecting the appropriate upload strategy based on the cloud platform being used.

#### List

fastenv does not currently have methods for listing bucket contents.
89 changes: 71 additions & 18 deletions fastenv/cloud/object_storage.py
@@ -189,9 +189,10 @@ def generate_presigned_url(
bucket_path: os.PathLike[str] | str,
*,
expires: int = 3600,
headers: httpx.Headers | dict[str, str] | None = None,
service: str = "s3",
) -> httpx.URL:
"""Generate a presigned URL for downloads from S3-compatible object storage.
"""Generate a presigned URL for S3-compatible object storage.

Requests to S3-compatible object storage can be authenticated either with
request headers or query parameters. Presigned URLs use query parameters.
@@ -207,6 +208,11 @@ def generate_presigned_url(
`expires`: seconds until the URL expires. The default and maximum
expiration times are the same as the AWS CLI and Boto3.

`headers`: HTTP request headers (not including the default HTTP `host` header)
that will be included with the request. These headers may include additional
`x-amz-*` headers, such as `X-Amz-Server-Side-Encryption`, or other headers
such as `Content-Type` known to be accepted by the API operation.

`service`: cloud service for which to generate the presigned URL.

https://awscli.amazonaws.com/v2/documentation/api/latest/reference/s3/presign.html
@@ -220,7 +226,7 @@ def generate_presigned_url(
raise ValueError("Expiration time must be between one second and one week.")
key = key if (key := str(bucket_path)).startswith("/") else f"/{key}"
params = self._set_presigned_url_query_params(
method, key, expires=expires, service=service
method, key, expires=expires, headers=headers, service=service
)
return httpx.URL(
scheme="https", host=self._config.bucket_host, path=key, params=params
@@ -232,6 +238,7 @@ def _set_presigned_url_query_params(
key: str,
*,
expires: int,
headers: httpx.Headers | dict[str, str] | None = None,
service: str = "s3",
payload_hash: str = "UNSIGNED-PAYLOAD",
) -> httpx.QueryParams:
@@ -271,13 +278,20 @@ def _set_presigned_url_query_params(
if self._config.session_token:
params["X-Amz-Security-Token"] = self._config.session_token
params["X-Amz-SignedHeaders"] = "host"
headers = {"host": self._config.bucket_host}
default_headers = {"host": self._config.bucket_host}
if headers:
signed_headers = httpx.Headers({**default_headers, **headers})
else:
signed_headers = httpx.Headers(default_headers)
params["X-Amz-SignedHeaders"] = (
";".join(keys) if len(keys := sorted(signed_headers)) > 1 else "host"
)
# 1. create canonical request
canonical_request = self._create_canonical_request(
method=method,
key=key,
params=params,
headers=headers,
headers=signed_headers,
payload_hash=payload_hash,
)
# 2. create string to sign
@@ -297,20 +311,28 @@ def _set_presigned_url_query_params(
def _create_canonical_request(
method: Literal["DELETE", "GET", "HEAD", "POST", "PUT"],
key: str,
params: dict[str, str],
headers: dict[str, str],
params: httpx.QueryParams | dict[str, str],
headers: httpx.Headers | dict[str, str],
payload_hash: str,
) -> str:
"""Create a canonical request for AWS Signature Version 4.

https://docs.aws.amazon.com/AmazonS3/latest/API/sigv4-query-string-auth.html
https://docs.aws.amazon.com/general/latest/gr/sigv4-create-canonical-request.html
https://docs.aws.amazon.com/IAM/latest/UserGuide/create-signed-request.html

There should be two line breaks after the `canonical_headers`.

`signed_headers` must be alphabetically-sorted, semicolon-separated, and
lowercased. Note that the `sorted` built-in function ("builtin") sorts strings
case-sensitively by default. To sort case-insensitively, strings should be
lowercased before the function call (done automatically by `httpx.Headers`) or
lowercased during the function call (`sorted(key=str.lower)`).
https://docs.python.org/3/howto/sorting.html
"""
canonical_uri = urllib.parse.quote(key if key.startswith("/") else f"/{key}")
canonical_query_params = httpx.QueryParams(params)
canonical_query_string = str(canonical_query_params)
headers = httpx.Headers(headers)
header_keys = sorted(headers)
canonical_headers = "".join(f"{key}:{headers[key]}\n" for key in header_keys)
signed_headers = ";".join(header_keys)
@@ -392,7 +414,9 @@ async def upload(
source: os.PathLike[str] | str | bytes = ".env",
*,
content_type: str = "text/plain",
method: Literal["POST", "PUT"] = "PUT",
server_side_encryption: Literal["AES256", None] = None,
specify_content_disposition: bool = True,
) -> httpx.Response | None:
"""Upload a file to cloud object storage.

@@ -407,16 +431,43 @@ async def upload(
See Backblaze for a list of supported content types.
https://www.backblaze.com/b2/docs/content-types.html

`server_side_encryption`: optional encryption algorithm to specify,
which the object storage platform will use to encrypt the file for storage.
`method`: HTTP method to use for upload. S3-compatible object storage accepts
uploads with HTTP PUT via the PutObject API and presigned URLs, or POST
with authentication information in form fields.

`server_side_encryption`: optional encryption algorithm to specify for
the object storage platform to use to encrypt the file for storage.
This method supports AES256 encryption with managed keys,
referred to as "SSE-B2" on Backblaze or "SSE-S3" on AWS S3.
https://www.backblaze.com/b2/docs/server_side_encryption.html
https://www.backblaze.com/docs/cloud-storage-server-side-encryption
https://docs.aws.amazon.com/AmazonS3/latest/userguide/UsingServerSideEncryption.html

`specify_content_disposition`: the HTTP header `Content-Disposition` indicates
whether the content is expected to be displayed inline (in the browser) or
downloaded to a file (referred to as an "attachment"). Dotenv files are
typically downloaded instead of being displayed in the browser, so by default,
fastenv will add `Content-Disposition: attachment; filename="{filename}"`.
https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Disposition
"""
try:
content, message = await self._encode_source(source)
if self._config.bucket_host.endswith(".backblazeb2.com"):
content_length = len(content)
if method == "PUT":
content_md5 = base64.b64encode(hashlib.md5(content).digest())
headers = httpx.Headers({b"Content-MD5": content_md5})
headers["Content-Length"] = str(content_length)
headers["Content-Type"] = content_type
if specify_content_disposition:
filename = str(bucket_path).split(sep="/")[-1]
content_disposition = f'attachment; filename="{filename}"'
headers["Content-Disposition"] = content_disposition
if server_side_encryption:
headers["X-Amz-Server-Side-Encryption"] = server_side_encryption
url = self.generate_presigned_url(
method, bucket_path, expires=30, headers=headers
)
response = await self._client.put(url, content=content, headers=headers)
elif self._config.bucket_host.endswith(".backblazeb2.com"):
response = await self.upload_to_backblaze_b2(
bucket_path,
content,
@@ -426,7 +477,7 @@ async def upload(
else:
url, data = self.generate_presigned_post(
bucket_path,
content_length=len(content),
content_length=content_length,
content_type=content_type,
expires=30,
server_side_encryption=server_side_encryption,
@@ -479,11 +530,11 @@ def generate_presigned_post(
See Backblaze for a list of supported content types.
https://www.backblaze.com/b2/docs/content-types.html

`server_side_encryption`: optional encryption algorithm to specify,
which the object storage platform will use to encrypt the file for storage.
`server_side_encryption`: optional encryption algorithm to specify for
the object storage platform to use to encrypt the file for storage.
This method supports AES256 encryption with managed keys,
referred to as "SSE-B2" on Backblaze or "SSE-S3" on AWS S3.
https://www.backblaze.com/b2/docs/server_side_encryption.html
https://www.backblaze.com/docs/cloud-storage-server-side-encryption
https://docs.aws.amazon.com/AmazonS3/latest/userguide/UsingServerSideEncryption.html

`specify_content_disposition`: the HTTP header `Content-Disposition` indicates
@@ -776,7 +827,7 @@ async def get_backblaze_b2_upload_url(
"""Get an upload URL from Backblaze B2, using the authorization token
and URL obtained from a call to `b2_authorize_account`.

https://www.backblaze.com/b2/docs/uploading.html
https://www.backblaze.com/apidocs/b2-upload-file
https://www.backblaze.com/b2/docs/b2_get_upload_url.html
"""
authorization_response_json = authorization_response.json()
@@ -801,8 +852,10 @@ async def upload_to_backblaze_b2(
"""Upload a file to Backblaze B2 object storage, using the authorization token
and URL obtained from a call to `b2_get_upload_url`.

https://www.backblaze.com/b2/docs/uploading.html
https://www.backblaze.com/b2/docs/b2_upload_file.html
Backblaze B2 does not currently support single-part uploads with POST
to their S3 API. The B2 native API must be used.

https://www.backblaze.com/apidocs/b2-upload-file
"""
authorization_response = await self.authorize_backblaze_b2_account()
upload_url_response = await self.get_backblaze_b2_upload_url(