Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace rusoto with custom implementation for AWS (#2176) #2352

Merged
merged 10 commits into from
Aug 15, 2022

Conversation

tustvold
Copy link
Contributor

@tustvold tustvold commented Aug 7, 2022

Which issue does this PR close?

Closes #2350
Part of #2176

Rationale for this change

See #2176

What changes are included in this PR?

Replaces rusoto with a custom implementation based on reqwest

Are there any user-facing changes?

Not that I'm aware of, but there is a possibility

@github-actions github-actions bot added the object-store Object Store Interface label Aug 7, 2022
&[DATE_HEADER, HASH_HEADER, TOKEN_HEADER, AUTH_HEADER];

impl<'a> RequestSigner<'a> {
fn sign(&self, request: &mut Request) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything about this process is ludicrous... TLS exists... Why oh why would you do this 😭

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Big cooperation firewall/VPN stuff with TLS stripping 🙈

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previous comment about this being ludicrous still stands 😆

object_store/src/aws/credential.rs Outdated Show resolved Hide resolved
object_store/src/aws/credential.rs Outdated Show resolved Hide resolved
///
/// Note: this does not use AsyncTrait as the lifetimes are difficult to manage
pub(crate) trait CloudMultiPartUploadImpl {
#[async_trait]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took the opportunity to clean up this interface a bit, this isn't public and so this isn't a breaking change. I may split this out into another PR

@tustvold
Copy link
Contributor Author

tustvold commented Aug 7, 2022

Looks like I found a bug in the azure implementation 😅

Edit: fix in #2354

@tustvold tustvold marked this pull request as ready for review August 8, 2022 14:20
}

/// <https://docs.aws.amazon.com/eks/latest/userguide/iam-roles-for-service-accounts-technical-overview.html>
async fn web_identity(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a good way to integration test this, but I have tested it with some real EKS tokens and it seems to work

Copy link
Contributor

@crepererum crepererum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finished the first round. Good job! I think this will definitely pay off.

use futures::Stream;
use std::future::Future;

/// Performs a paginated operation
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This docstring says close to nothing 😁

Maybe add a few words on how a stream is constructed out of a state and an async function and what this provides over unfold and poll_fn (which also exists for streams). Esp. that the output is paginated as well, it is NOT flattened.

Furthermore, I would suggest renaming list to list_fn or something, because initially I thought list is A list -- not a function to list -- that somehow gets unrolled here.

Comment on lines +103 to +106
Self::Generic {
store: "S3",
source: Box::new(err),
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't there be at least a NotFound error (and the generic test suite should probably assert that fetching an unknown object actually results in an "not found" error, not some generic one).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The NotFound variant is produced by the client

use bytes::{Buf, Bytes};
use chrono::{DateTime, Utc};
use percent_encoding::{utf8_percent_encode, AsciiSet, PercentEncode, NON_ALPHANUMERIC};
use reqwest::{Client, Method, Response, StatusCode};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
use reqwest::{Client, Method, Response, StatusCode};
use reqwest::{Client as ReqwestClient, Method, Response, StatusCode};

#[derive(Debug)]
pub(crate) struct S3Client {
config: S3Config,
client: Client,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
client: Client,
reqwest_client: ReqwestClient,

I was wondering what the client in a client in a client was.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image

}

impl CredentialExt for RequestBuilder {
/// Sign a request <https://docs.aws.amazon.com/general/latest/gr/sigv4_signing.html>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doc should go the the trait, not the impl (or both if the the impl is special).

fn sign(self, credential: &AwsCredential, region: &str, service: &str) -> Self;
}

impl CredentialExt for RequestBuilder {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm always a bit torn when it comes to extension traits for foreign types. Sure the make the code more elegant (I can clearly see that that in client.rs, which now feels like rather normal reqwest client stuff with very few parameters) but looking at code that uses is often makes me wonder "what's that new/fancy method on this type I've worked with before?!". So you can leave it as it is, just don't overdo it 😉

Also: they are technically prone to break under semantic versioning, because upstream may be free to add new methods (not a breaking change) which then collide with this method name. Since your "use side" doesn't use a fully qualified name, the code will stop compiling due to ambiguity.

One thing to avoid both (my confusion and the potential breakage) would be to choose a more specific method name like aws_sign or s3_sign.

/// Interface for [Amazon S3](https://aws.amazon.com/s3/).
#[derive(Debug)]
pub struct AmazonS3 {
client: Arc<S3Client>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the reason that the S3Client is not just inlined into the store object? Feels a bit like a layer that isn't really necessary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It helps with encapsulation, especially as many methods on ObjectStore map to some combination of actual upstream requests - e.g. the various types of Put request. Also made it easier to split into its own module

@@ -0,0 +1,526 @@
// Licensed to the Apache Software Foundation (ASF) under one
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This module makes me a bit unhappy. It's such a PITA that AWS requires all that stuff. However I see you clearly did your homework and it's documented and the tests work, so I just assume it does the right thing.

If you want me to double-check the code with the official docs, I can do that but I think it's not really required.


/// Canonicalizes headers into the AWS Canonical Form.
///
/// <http://docs.aws.amazon.com/general/latest/gr/sigv4-create-canonical-request.html>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// <http://docs.aws.amazon.com/general/latest/gr/sigv4-create-canonical-request.html>
/// <https://docs.aws.amazon.com/general/latest/gr/sigv4-create-canonical-request.html>

.github/workflows/object_store.yml Show resolved Hide resolved
@tustvold
Copy link
Contributor Author

tustvold commented Aug 9, 2022

https://github.com/influxdata/influxdb_iox/pull/5342 has been deployed and appears to be working correctly, testing both the Instance Metadata credentials and the WebIdentity

@alamb
Copy link
Contributor

alamb commented Aug 9, 2022

I would like to request holding off merging this PR until we have done the official 0.4 object store release from this repo, tracked by #2338 #2261

My rationale is that there are enough moving parts and we are (effectively) still testing some of this code in production so I don't want to accidentally introduce a regression that would require a new release quickly after the previous one

@alamb
Copy link
Contributor

alamb commented Aug 11, 2022

Sounds like the IOx 'testing in production' has gone very well ❤️

So to move this PR forward:

  1. It seems like like there are a few more comments from @crepererum to address.
  2. The 0.4.0 release is going along nicely,
  3. I think we are on track to merge this over the weekend or early next week

@alamb alamb changed the title Replace rusoto with custom implementation (#2176) Replace rusoto with custom implementation for AWS (#2176) Aug 11, 2022
@tustvold
Copy link
Contributor Author

I intend to get this in this afternoon unless anybody would like longer to review

@alamb
Copy link
Contributor

alamb commented Aug 15, 2022

I don't have any objections, though I haven't reviewed this PR in detail either. I will confirm we deployed these changes in our IOx environment and they seem to be working great

cc @roeap

1 similar comment
@alamb
Copy link
Contributor

alamb commented Aug 15, 2022

I don't have any objections, though I haven't reviewed this PR in detail either. I will confirm we deployed these changes in our IOx environment and they seem to be working great

cc @roeap

@tustvold tustvold merged commit 3f0e12d into apache:master Aug 15, 2022
@ursabot
Copy link

ursabot commented Aug 15, 2022

Benchmark runs are scheduled for baseline = 47a2c21 and contender = 3f0e12d. 3f0e12d is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

@alamb alamb mentioned this pull request Aug 31, 2022
7 tasks
@tustvold tustvold added the api-change Changes to the arrow API label Sep 8, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-change Changes to the arrow API object-store Object Store Interface
Projects
None yet
Development

Successfully merging this pull request may close these issues.

AmazonS3 Support IMDSv2
4 participants