Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add token provider authorization to azure store #2374

Merged
merged 9 commits into from
Aug 13, 2022

Conversation

roeap
Copy link
Contributor

@roeap roeap commented Aug 8, 2022

Which issue does this PR close?

Closes #2373

cc @tustvold @alamb

Rationale for this change

Allow a wider range of use by offering more ob the azure native auth methods. While this change might eventually becoem irrelevant, once we move away from SDKs, it is quite helpful in the meantime.

What changes are included in this PR?

Add an option to provide a TokenProvider credential via the MicrosoftAzureBuilder.
Add an options to provide client credentials via the MicrosoftAzureBuilder.

Are there any user-facing changes?

users now have more authorization options.

@github-actions github-actions bot added the object-store Object Store Interface label Aug 8, 2022
@roeap roeap changed the title feat: add token provider authorizatiojn to azure store feat: add token provider authorization to azure store Aug 8, 2022
Copy link
Contributor

@tustvold tustvold left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Supporting token auth makes sense to me, even in a world where we move away from the SDK. I do wonder if we can hide the azure_core type from the interface though?

object_store/src/azure.rs Outdated Show resolved Hide resolved
object_store/src/azure.rs Outdated Show resolved Hide resolved
/// Set a TokenCredential to be used for authentication (required - one of token_credential ot access key)
pub fn with_token_credential(
mut self,
token_credential: Arc<dyn TokenCredential>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there some way we could avoid leaking the azure_core type into the public interface, this will not only make it easier to use, but also survive a move away from the SDK?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't looked at the S§ implementation, but I seem to remember, that we do have some sense of a token credential as well. The problem I found with that, is that "get_token" accepts a client, while by now the client is somewhat hidden in the azure client. Also for gcs that was not a trait. Maybe thats the way to go?

An alternative might be to expose all (some) options required to initialize the token credentials. I guess we would always need that information regardless of the underlying implementation ..

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not familiar enough with the various authentication options available in Azure-land to know the implications of asking this, but if we can expose the information for the various options that would be ideal imo...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a few, but I can make that work ... also just realized I did this against the upcoming version, so I have to do some method renaming anyhow...

Co-authored-by: Raphael Taylor-Davies <[email protected]>
@roeap roeap marked this pull request as draft August 8, 2022 20:03
@roeap
Copy link
Contributor Author

roeap commented Aug 8, 2022

@tustvold - had to convert this back to draft, since the currently released version of the crates do not support identity based credentials just yet ... Maybe I'll wait until this gets released, and do the updates then, or just get it over with and migrate the implementation to a custom one in this crate ...

Which is also why there was not yet released APIs in use here :D

azure_core = { version = "0.2", optional = true, default-features = false, features = ["enable_reqwest_rustls"] }
azure_storage = { version = "0.2", optional = true, default-features = false, features = ["account"] }
azure_storage_blobs = { version = "0.2", optional = true, default-features = false, features = ["enable_reqwest_rustls"] }
azure_core = { version = "0.4", optional = true, default-features = false, features = [
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, I'm not sure what is formatting these, but I think it makes the file harder to read...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll make my prettier less eager :)

@roeap roeap marked this pull request as ready for review August 10, 2022 16:56
@roeap
Copy link
Contributor Author

roeap commented Aug 10, 2022

The latest version of the Azure crates finally got released. Since there were many api changes between the last attempt (#2193) I did not update that PR but included it all here. The relevant update - authorizing via identity - is also part of the latest azure release.

Working on this made me realize once more that we really want to use our own, more focussed implementation of API interactions. I will take a look at that asap. in the meantime i was hoping to get the token auth in the next release, so while a bit finicky in some places, using the SDKs was more realistic to get doe in time (hopefully :)).

@roeap roeap requested a review from tustvold August 10, 2022 17:05
Copy link
Contributor

@tustvold tustvold left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me, thank you 🎉

I'm going to hold off on merging this whilst the ongoing object_store 0.4 release process trundles along, just in case there are any bumps along the way (it is the first release following the Apache process). I'll then get this in

Working on this made me realize once more that we really want to use our own, more focussed implementation of API interactions. I will take a look at that asap.

Very exciting, btw if you haven't seen #2352 that moves away from rusuto for S3, so the Azure SDKs are the only ones that still remain.

object_store/src/azure.rs Show resolved Hide resolved
AzureErrorKind::HttpResponse {
status: StatusCode::NotFound,
..
} => crate::Error::NotFound {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️

object_store/src/azure.rs Outdated Show resolved Hide resolved
object_store/src/azure.rs Outdated Show resolved Hide resolved
@tustvold
Copy link
Contributor

There do appear to be failing lints and tests btw

Comment on lines +302 to +305
let first = stream.next().await.transpose()?.unwrap_or_default();
Ok(GetResult::Stream(Box::pin(
futures::stream::once(async { Ok(first) }).chain(stream),
)))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Somehow this felt a bit strange to collect the first element just to add it to a stream right after, but I saw no other obvious way to have the method raise an error. Before this we are never actually touching any results...

@roeap
Copy link
Contributor Author

roeap commented Aug 10, 2022

There do appear to be failing lints and tests btw

All tests and lints should be passing now.

Very exciting, btw if you haven't seen #2352 that moves away from rusuto for S3, so the Azure SDKs are the only ones that still remain.

Very exciting indeed - one more reason to get this done and have everything nice and clean :)

@tustvold tustvold merged commit 12a9d84 into apache:master Aug 13, 2022
@ursabot
Copy link

ursabot commented Aug 13, 2022

Benchmark runs are scheduled for baseline = f0d7d0b and contender = 12a9d84. 12a9d84 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

@roeap roeap deleted the azure-auth branch August 13, 2022 22:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
object-store Object Store Interface
Projects
None yet
Development

Successfully merging this pull request may close these issues.

object_store: Add TokenProvider authorization to azure
3 participants