Add PUT uploads to object storage client (#25)
Dotenv files are commonly kept in cloud object storage. fastenv provides
an object storage client for downloading and uploading dotenv files.

S3-compatible object storage allows uploads with either `POST` or `PUT`.
This commit will implement uploads with `PUT`.

The new `method` argument to `fastenv.ObjectStorageClient.upload()` will
accept either `POST` or `PUT`. `POST` was previously the default.
#8
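
As a rough usage sketch (the client constructor call and bucket path below are placeholders, not taken from this commit), the new argument could look like this:

```python
import fastenv

async def upload_dotenv() -> None:
    # Hypothetical setup: credentials and bucket configuration are assumed
    # to come from environment variables or constructor arguments.
    client = fastenv.ObjectStorageClient()
    # `method` defaults to "PUT" after this commit, so it could be omitted;
    # it is passed explicitly here only to show the new argument.
    await client.upload("uploads/.env", ".env", method="PUT")
```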

`PUT` will now be the default. `PUT` uploads are more widely supported
and standardized. Backblaze B2 does not currently support single-part
uploads with `POST` to their S3 API (the B2 native API must be used
instead), and Cloudflare R2 does not support uploads with `POST` at all.
https://www.backblaze.com/apidocs/b2-upload-file
https://developers.cloudflare.com/r2/api/s3/presigned-urls/#supported-http-methods

Files will be opened in binary mode and attached with the `content`
argument (`httpx_client.put(content=content)`) as suggested in the
HTTPX docs (https://www.python-httpx.org/compatibility/).
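
A minimal standalone sketch of that HTTPX pattern, with the presigned URL supplied as a placeholder argument:

```python
import httpx

async def put_file(presigned_url: str, source: str = ".env") -> httpx.Response:
    # Open the dotenv file in binary mode and send the raw bytes
    # with the `content` argument, per the HTTPX compatibility guide.
    with open(source, "rb") as file:
        content = file.read()
    async with httpx.AsyncClient() as httpx_client:
        return await httpx_client.put(presigned_url, content=content)
```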

Unlike downloads with `GET`, presigned `PUT` URL query parameters do not
necessarily contain all the required information. Additional information
may need to be supplied in request headers. In addition to supplying
header keys and values with HTTP requests, header keys will be signed
into the URL in the `X-Amz-SignedHeaders` query string parameter
(alphabetically-sorted, semicolon-separated, lowercased).
https://docs.aws.amazon.com/IAM/latest/UserGuide/create-signed-request.html
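
For illustration, with placeholder header values, the query string value would be assembled roughly like this:

```python
headers = {
    "Content-MD5": "1B2M2Y8AsgTpgAmY7PhCfg==",  # placeholder: MD5 of an empty body
    "Content-Type": "text/plain",
    "host": "mybucket.s3.us-east-1.amazonaws.com",  # placeholder bucket host
}
# alphabetically-sorted, semicolon-separated, lowercased header keys
signed_headers = ";".join(sorted(key.lower() for key in headers))
assert signed_headers == "content-md5;content-type;host"
```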

These request headers, assembled in the short sketch after this list, can specify:

- Object encryption. Encryption information can be specified with
  headers including `X-Amz-Server-Side-Encryption`. Note that, although
  similar headers like `X-Amz-Algorithm` are included as query string
  parameters in presigned URLs, `X-Amz-Server-Side-Encryption` is not.
  If `X-Amz-Server-Side-Encryption` is included in query string
  parameters, it may be silently ignored by the object storage platform.
  AWS S3 and Cloudflare R2 now automatically encrypt all objects, but
  Backblaze B2 will only automatically encrypt objects if the bucket
  has default encryption enabled.
  https://docs.aws.amazon.com/AmazonS3/latest/userguide/default-encryption-faq.html
  https://docs.aws.amazon.com/AmazonS3/latest/userguide/serv-side-encryption.html
  https://www.backblaze.com/docs/cloud-storage-server-side-encryption
- Object integrity checks. The `Content-MD5` header defined by RFC 1864
  can supply a base64-encoded MD5 checksum. After upload, the object
  storage platform server will calculate a checksum for the object in
  the same manner. If the client and server checksums are the same, all
  expected information was successfully sent to the server. If the
  checksums are different, this may mean that object information was
  lost in transit, and an error will be reported. Note that, although
  Backblaze B2 accepts and processes the `Content-MD5` header, it will
  report a SHA1 checksum to align with uploads to the B2-native API.
  https://docs.aws.amazon.com/AmazonS3/latest/userguide/checking-object-integrity.html
  https://www.backblaze.com/docs/en/cloud-storage-file-information
- Object metadata. Headers like `Content-Disposition`, `Content-Length`,
  and `Content-Type` can be supplied in request headers.
  https://docs.aws.amazon.com/AmazonS3/latest/userguide/UsingMetadata.html
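
A rough sketch of assembling the headers described above before signing, using placeholder values:

```python
import base64
import hashlib

content = b"EXAMPLE_VARIABLE=example_value\n"  # placeholder dotenv content
headers = {
    # RFC 1864: base64-encoded 128-bit MD5 digest of the request body
    "Content-MD5": base64.b64encode(hashlib.md5(content).digest()).decode(),
    "Content-Disposition": 'attachment; filename=".env"',
    "Content-Length": str(len(content)),
    "Content-Type": "text/plain",
    "X-Amz-Server-Side-Encryption": "AES256",  # only if encryption is requested
}
```
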
br3ndonland authored Jan 28, 2024
1 parent 8492656 commit 6968353
Showing 3 changed files with 145 additions and 30 deletions.
20 changes: 12 additions & 8 deletions docs/cloud-object-storage.md
@@ -28,7 +28,7 @@ Dotenv files are commonly kept in [cloud object storage](https://en.wikipedia.or

Creating a signature is a [four-step process](https://docs.aws.amazon.com/general/latest/gr/sigv4_signing.html):

1. _[Create a canonical request](https://docs.aws.amazon.com/general/latest/gr/sigv4-create-canonical-request.html)_. "Canonical" just means that the string has a standard set of fields. These fields provide request metadata like the HTTP method and headers.
1. _[Create a canonical request](https://docs.aws.amazon.com/IAM/latest/UserGuide/create-signed-request.html)_. "Canonical" just means that the string has a standard set of fields. These fields provide request metadata like the HTTP method and headers.
2. _[Create a string to sign](https://docs.aws.amazon.com/general/latest/gr/sigv4-create-string-to-sign.html)_. In this step, a SHA256 hash of the canonical request is calculated, and combined with some additional authentication information to produce a new string called the "string to sign." The Python standard library package [`hashlib`](https://docs.python.org/3/library/hashlib.html) makes this straightforward.
3. _[Calculate a signature](https://docs.aws.amazon.com/general/latest/gr/sigv4-calculate-signature.html)_. To set up this step, a signing key is derived with successive rounds of HMAC hashing. The [concept behind HMAC](https://www.okta.com/identity-101/hmac/) ("Keyed-Hashing for Message Authentication" or "Hash-based Message Authentication Codes") is to generate hashes with mostly non-secret information, along with a small amount of secret information that both the sender and recipient have agreed upon ahead of time. The secret information here is the secret access key. The signature is then calculated with another round of HMAC, using the signing key and the string to sign. The Python standard library package [`hmac`](https://docs.python.org/3/library/hmac.html) does most of the hard work here.
4. _[Add the signature to the HTTP request](https://docs.aws.amazon.com/general/latest/gr/sigv4-add-signature-to-request.html)_. The hex digest of the signature is included with the request.
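
As a rough standalone illustration of steps 2 through 4 (not the fastenv implementation itself), the signing key derivation and final signature look roughly like this:

```python
import hashlib
import hmac

def _hmac_sha256(key: bytes, message: str) -> bytes:
    return hmac.new(key, message.encode(), hashlib.sha256).digest()

def calculate_signature(secret_key: str, date: str, region: str, string_to_sign: str) -> str:
    # Derive the signing key with successive rounds of HMAC-SHA256 (step 3),
    # then sign the "string to sign" from step 2 and return the hex digest (step 4).
    key_date = _hmac_sha256(f"AWS4{secret_key}".encode(), date)  # date like "20240128"
    key_region = _hmac_sha256(key_date, region)
    key_service = _hmac_sha256(key_region, "s3")
    key_signing = _hmac_sha256(key_service, "aws4_request")
    return hmac.new(key_signing, string_to_sign.encode(), hashlib.sha256).hexdigest()
```
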
@@ -39,15 +39,23 @@ Dotenv files are commonly kept in [cloud object storage](https://en.wikipedia.or

#### Download


The download method generates a presigned URL, uses it to download file contents, and either saves the contents to a file or returns the contents as a string.

Downloads with `GET` can be authenticated by including AWS Signature Version 4 information either with [request headers](https://docs.aws.amazon.com/AmazonS3/latest/API/sigv4-auth-using-authorization-header.html) or [query parameters](https://docs.aws.amazon.com/AmazonS3/latest/API/sigv4-query-string-auth.html). fastenv uses query parameters to generate [presigned URLs](https://docs.aws.amazon.com/AmazonS3/latest/userguide/using-presigned-url.html). The advantage to presigned URLs with query parameters is that URLs can be used on their own.

A related operation is [`head_object`](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Client.head_object), which can be used to check if an object exists. The request is the same as a `GET`, except the [`HEAD` HTTP request method](https://developer.mozilla.org/en-US/docs/Web/HTTP/Methods/HEAD) is used. fastenv does not provide an implementation of `head_object` at this time, but it could be considered in the future.

#### Upload

[Uploads with `POST`](https://docs.aws.amazon.com/AmazonS3/latest/API/sigv4-UsingHTTPPOST.html) work differently than downloads with `GET`. A typical back-end engineer might ask, "Can't I just `POST` binary data to an API endpoint with a bearer token or something?" To which AWS might respond, "No, not really. Here's how you do it instead: pretend like you're submitting a web form." "What?"
The upload method uploads source contents to an object storage bucket, selecting the appropriate upload strategy based on the cloud platform being used. Uploads can be done with either `POST` or `PUT`.

[Uploads with `PUT` can use presigned URLs](https://docs.aws.amazon.com/AmazonS3/latest/userguide/PresignedUrlUploadObject.html). Unlike downloads with `GET`, presigned `PUT` URL query parameters do not necessarily contain all the required information. Additional information may need to be supplied in request headers. In addition to supplying header keys and values with HTTP requests, header keys should be signed into the URL in the `X-Amz-SignedHeaders` query string parameter. These request headers can specify:

- [Object encryption](https://docs.aws.amazon.com/AmazonS3/latest/userguide/serv-side-encryption.html). Encryption information can be specified with headers including `X-Amz-Server-Side-Encryption`. Note that, although similar headers like `X-Amz-Algorithm` are included as query string parameters in presigned URLs, `X-Amz-Server-Side-Encryption` is not. If `X-Amz-Server-Side-Encryption` is included in query string parameters, it may be silently ignored by the object storage platform. [AWS S3 now automatically encrypts all objects](https://docs.aws.amazon.com/AmazonS3/latest/userguide/default-encryption-faq.html) and [Cloudflare R2 does also](https://docs.aws.amazon.com/AmazonS3/latest/userguide/default-encryption-faq.html), but [Backblaze B2 will only automatically encrypt objects if the bucket has default encryption enabled](https://www.backblaze.com/docs/cloud-storage-server-side-encryption).
- [Object metadata](https://docs.aws.amazon.com/AmazonS3/latest/userguide/UsingMetadata.html). Headers like `Content-Disposition`, `Content-Length`, and `Content-Type` can be supplied in request headers.
- [Object integrity checks](https://docs.aws.amazon.com/AmazonS3/latest/userguide/checking-object-integrity.html). The `Content-MD5` header, defined by [RFC 1864](https://www.rfc-editor.org/rfc/rfc1864), can supply a base64-encoded MD5 checksum. After the upload is completed, the object storage platform server will calculate a checksum for the object in the same manner. If the client and server checksums are the same, this means that all expected information was successfully sent to the server. If the checksums are different, this may mean that object information was lost in transit, and an error will be reported. Note that, although Backblaze B2 accepts and processes the `Content-MD5` header, it will report a SHA1 checksum to align with [uploads to the B2-native API](https://www.backblaze.com/docs/en/cloud-storage-file-information).
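
As a rough end-to-end sketch (the bucket path and file contents are placeholders), a presigned `PUT` upload with this client could look like the following:

```python
import fastenv
import httpx

async def upload_with_presigned_put(client: fastenv.ObjectStorageClient) -> httpx.Response:
    content = b"EXAMPLE_VARIABLE=example_value\n"  # placeholder dotenv content
    headers = httpx.Headers({"Content-Type": "text/plain"})
    # Header keys passed here are signed into X-Amz-SignedHeaders,
    # and the same headers are sent with the PUT request itself.
    url = client.generate_presigned_url("PUT", "uploads/.env", expires=30, headers=headers)
    async with httpx.AsyncClient() as httpx_client:
        return await httpx_client.put(url, content=content, headers=headers)
```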

[Uploads with `POST`](https://docs.aws.amazon.com/AmazonS3/latest/API/sigv4-UsingHTTPPOST.html) work differently than `GET` or `PUT` operations. A typical back-end engineer might ask, "Can't I just `POST` binary data to an API endpoint with a bearer token or something?" To which AWS might respond, "No, not really. Here's how you do it instead: pretend like you're submitting a web form." "What?"

Anyway, here's how it works:

@@ -56,12 +64,8 @@ Dotenv files are commonly kept in [cloud object storage](https://en.wikipedia.or
3. _Calculate a signature_. This step is basically the same as for query string auth. A signing key is derived with HMAC, and then used with the string to sign for another round of HMAC to calculate the signature.
4. _Add the signature to the HTTP request_. For `POST` uploads, the signature is provided with other required information as form data, rather than as URL query parameters. An advantage of this approach is that it can also be used for browser-based uploads, because the form data can be used to populate the fields of an HTML web form. There is some overlap between items in the `POST` policy and fields in the form data, but they are not exactly the same.
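
A rough sketch of the resulting request, assuming the URL and signed form fields have already been generated (for example by `generate_presigned_post` in the diff below) and that the platform expects the object bytes in a multipart `file` field:

```python
import httpx

async def post_form_upload(
    url: str, form_fields: dict[str, str], content: bytes
) -> httpx.Response:
    # The signed policy and related values travel as ordinary form fields;
    # the object bytes are attached as a multipart "file" field.
    files = {"file": ("example.env", content, "text/plain")}
    async with httpx.AsyncClient() as httpx_client:
        return await httpx_client.post(url, data=form_fields, files=files)
```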

The S3 API does also support [uploads with HTTP `PUT` requests](https://docs.aws.amazon.com/AmazonS3/latest/API/sig-v4-header-based-auth.html). fastenv does not use `PUT` requests at this time, but they could be considered in the future.

Backblaze uploads with `POST` are different, though there are [good reasons](https://www.backblaze.com/blog/design-thinking-b2-apis-the-hidden-costs-of-s3-compatibility/) for that (helps keep costs low). fastenv includes an implementation of the Backblaze B2 `POST` upload process.

The upload method uploads source contents to an object storage bucket, selecting the appropriate upload strategy based on the cloud platform being used.

#### List

fastenv does not currently have methods for listing bucket contents.
89 changes: 71 additions & 18 deletions fastenv/cloud/object_storage.py
@@ -189,9 +189,10 @@ def generate_presigned_url(
bucket_path: os.PathLike[str] | str,
*,
expires: int = 3600,
headers: httpx.Headers | dict[str, str] | None = None,
service: str = "s3",
) -> httpx.URL:
"""Generate a presigned URL for downloads from S3-compatible object storage.
"""Generate a presigned URL for S3-compatible object storage.
Requests to S3-compatible object storage can be authenticated either with
request headers or query parameters. Presigned URLs use query parameters.
@@ -207,6 +208,11 @@ def generate_presigned_url(
`expires`: seconds until the URL expires. The default and maximum
expiration times are the same as the AWS CLI and Boto3.
`headers`: HTTP request headers (not including the default HTTP `host` header)
that will be included with the request. These headers may include additional
`x-amz-*` headers, such as `X-Amz-Server-Side-Encryption`, or other headers
such as `Content-Type` known to be accepted by the API operation.
`service`: cloud service for which to generate the presigned URL.
https://awscli.amazonaws.com/v2/documentation/api/latest/reference/s3/presign.html
@@ -220,7 +226,7 @@ def generate_presigned_url(
raise ValueError("Expiration time must be between one second and one week.")
key = key if (key := str(bucket_path)).startswith("/") else f"/{key}"
params = self._set_presigned_url_query_params(
method, key, expires=expires, service=service
method, key, expires=expires, headers=headers, service=service
)
return httpx.URL(
scheme="https", host=self._config.bucket_host, path=key, params=params
@@ -232,6 +238,7 @@ def _set_presigned_url_query_params(
key: str,
*,
expires: int,
headers: httpx.Headers | dict[str, str] | None = None,
service: str = "s3",
payload_hash: str = "UNSIGNED-PAYLOAD",
) -> httpx.QueryParams:
@@ -271,13 +278,20 @@ def _set_presigned_url_query_params(
if self._config.session_token:
params["X-Amz-Security-Token"] = self._config.session_token
params["X-Amz-SignedHeaders"] = "host"
headers = {"host": self._config.bucket_host}
default_headers = {"host": self._config.bucket_host}
if headers:
signed_headers = httpx.Headers({**default_headers, **headers})
else:
signed_headers = httpx.Headers(default_headers)
params["X-Amz-SignedHeaders"] = (
";".join(keys) if len(keys := sorted(signed_headers)) > 1 else "host"
)
# 1. create canonical request
canonical_request = self._create_canonical_request(
method=method,
key=key,
params=params,
headers=headers,
headers=signed_headers,
payload_hash=payload_hash,
)
# 2. create string to sign
@@ -297,20 +311,28 @@ def _set_presigned_url_query_params(
def _create_canonical_request(
method: Literal["DELETE", "GET", "HEAD", "POST", "PUT"],
key: str,
params: dict[str, str],
headers: dict[str, str],
params: httpx.QueryParams | dict[str, str],
headers: httpx.Headers | dict[str, str],
payload_hash: str,
) -> str:
"""Create a canonical request for AWS Signature Version 4.
https://docs.aws.amazon.com/AmazonS3/latest/API/sigv4-query-string-auth.html
https://docs.aws.amazon.com/general/latest/gr/sigv4-create-canonical-request.html
https://docs.aws.amazon.com/IAM/latest/UserGuide/create-signed-request.html
There should be two line breaks after the `canonical_headers`.
`signed_headers` must be alphabetically-sorted, semicolon-separated, and
lowercased. Note that the `sorted` built-in function ("builtin") sorts strings
case-sensitively by default. To sort case-insensitively, strings should be
lowercased before the function call (done automatically by `httpx.Headers`) or
lowercased during the function call (`sorted(key=str.lower)`).
https://docs.python.org/3/howto/sorting.html
"""
canonical_uri = urllib.parse.quote(key if key.startswith("/") else f"/{key}")
canonical_query_params = httpx.QueryParams(params)
canonical_query_string = str(canonical_query_params)
headers = httpx.Headers(headers)
header_keys = sorted(headers)
canonical_headers = "".join(f"{key}:{headers[key]}\n" for key in header_keys)
signed_headers = ";".join(header_keys)
@@ -392,7 +414,9 @@ async def upload(
source: os.PathLike[str] | str | bytes = ".env",
*,
content_type: str = "text/plain",
method: Literal["POST", "PUT"] = "PUT",
server_side_encryption: Literal["AES256", None] = None,
specify_content_disposition: bool = True,
) -> httpx.Response | None:
"""Upload a file to cloud object storage.
@@ -407,16 +431,43 @@ async def upload(
See Backblaze for a list of supported content types.
https://www.backblaze.com/b2/docs/content-types.html
`server_side_encryption`: optional encryption algorithm to specify,
which the object storage platform will use to encrypt the file for storage.
`method`: HTTP method to use for upload. S3-compatible object storage accepts
uploads with HTTP PUT via the PutObject API and presigned URLs, or POST
with authentication information in form fields.
`server_side_encryption`: optional encryption algorithm to specify for
the object storage platform to use to encrypt the file for storage.
This method supports AES256 encryption with managed keys,
referred to as "SSE-B2" on Backblaze or "SSE-S3" on AWS S3.
https://www.backblaze.com/b2/docs/server_side_encryption.html
https://www.backblaze.com/docs/cloud-storage-server-side-encryption
https://docs.aws.amazon.com/AmazonS3/latest/userguide/UsingServerSideEncryption.html
`specify_content_disposition`: the HTTP header `Content-Disposition` indicates
whether the content is expected to be displayed inline (in the browser) or
downloaded to a file (referred to as an "attachment"). Dotenv files are
typically downloaded instead of being displayed in the browser, so by default,
fastenv will add `Content-Disposition: attachment; filename="{filename}"`.
https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Disposition
"""
try:
content, message = await self._encode_source(source)
if self._config.bucket_host.endswith(".backblazeb2.com"):
content_length = len(content)
if method == "PUT":
content_md5 = base64.b64encode(hashlib.md5(content).digest())
headers = httpx.Headers({b"Content-MD5": content_md5})
headers["Content-Length"] = str(content_length)
headers["Content-Type"] = content_type
if specify_content_disposition:
filename = str(bucket_path).split(sep="/")[-1]
content_disposition = f'attachment; filename="{filename}"'
headers["Content-Disposition"] = content_disposition
if server_side_encryption:
headers["X-Amz-Server-Side-Encryption"] = server_side_encryption
url = self.generate_presigned_url(
method, bucket_path, expires=30, headers=headers
)
response = await self._client.put(url, content=content, headers=headers)
elif self._config.bucket_host.endswith(".backblazeb2.com"):
response = await self.upload_to_backblaze_b2(
bucket_path,
content,
@@ -426,7 +477,7 @@ async def upload(
else:
url, data = self.generate_presigned_post(
bucket_path,
content_length=len(content),
content_length=content_length,
content_type=content_type,
expires=30,
server_side_encryption=server_side_encryption,
@@ -479,11 +530,11 @@ def generate_presigned_post(
See Backblaze for a list of supported content types.
https://www.backblaze.com/b2/docs/content-types.html
`server_side_encryption`: optional encryption algorithm to specify,
which the object storage platform will use to encrypt the file for storage.
`server_side_encryption`: optional encryption algorithm to specify for
the object storage platform to use to encrypt the file for storage.
This method supports AES256 encryption with managed keys,
referred to as "SSE-B2" on Backblaze or "SSE-S3" on AWS S3.
https://www.backblaze.com/b2/docs/server_side_encryption.html
https://www.backblaze.com/docs/cloud-storage-server-side-encryption
https://docs.aws.amazon.com/AmazonS3/latest/userguide/UsingServerSideEncryption.html
`specify_content_disposition`: the HTTP header `Content-Disposition` indicates
@@ -776,7 +827,7 @@ async def get_backblaze_b2_upload_url(
"""Get an upload URL from Backblaze B2, using the authorization token
and URL obtained from a call to `b2_authorize_account`.
https://www.backblaze.com/b2/docs/uploading.html
https://www.backblaze.com/apidocs/b2-upload-file
https://www.backblaze.com/b2/docs/b2_get_upload_url.html
"""
authorization_response_json = authorization_response.json()
@@ -801,8 +852,10 @@ async def upload_to_backblaze_b2(
"""Upload a file to Backblaze B2 object storage, using the authorization token
and URL obtained from a call to `b2_get_upload_url`.
https://www.backblaze.com/b2/docs/uploading.html
https://www.backblaze.com/b2/docs/b2_upload_file.html
Backblaze B2 does not currently support single-part uploads with POST
to their S3 API. The B2 native API must be used.
https://www.backblaze.com/apidocs/b2-upload-file
"""
authorization_response = await self.authorize_backblaze_b2_account()
upload_url_response = await self.get_backblaze_b2_upload_url(