[Discussion] Improve Azure e2e/integration testing #497
Comments
Thanks @thovoll, I agree with both points. The replay-based mocking method looks quite interesting 👍
I had a deeper look. The price we have to pay, however, is that we need to refactor the whole object store implementation :D, but I think it's worth the effort. Since the question of authorization has been raised (#527, #499), that could happen in the same migration.

Trying to figure out authorization in the new datalake client, I got a bit confused, however. To me it seems - and I might be totally off - that since the storage core crate is not yet migrated, there are actually two authorization mechanisms happening at once: whatever the underlying storage account client uses, as well as the authorization policy, which right now only accepts a bearer token. My personal feeling is that the best way forward is to try and advance the migration in the azure crates to a point where storage core and datalake are both using the pipeline architecture.

As for auto-refreshing credentials, right now it seems that that API does not allow us to customize the pipeline policy - I guess that would have to change? For the Python clients I know that auto-refresh is built into the SDKs, maybe such a policy is something for the Azure SDKs as well? Otherwise I believe adapting the current auto-refresh mechanics (mutex etc.) to a custom policy is quite straightforward, and will even be much simpler, since we no longer have to replace the entire client and just have to refresh the actual token within the policy.

Also, looking at the latest code, specifically in the cosmos crate, I believe the lifetime of a returned stream is no longer tied to the client, which (hopefully) makes for a nicer implementation of the list_* calls.

@thovoll, did you look into some of this already? I'd be happy to pick up some work, or if desired try and contribute some of these changes in the azure repo.
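To illustrate what I mean by adapting the mutex-based auto-refresh to a custom policy, a rough, self-contained sketch. The `Token` / `TokenCredential` types below are made-up stand-ins, not the real azure_core types (whose `Policy` trait is async and shaped differently); the point is only that the token gets cached behind a mutex and refreshed in place instead of swapping out the whole client:

```rust
use std::sync::Mutex;
use std::time::{Duration, Instant};

// Made-up stand-ins for illustration only; not the actual azure_core types.
struct Token {
    secret: String,
    expires_at: Instant,
}

trait TokenCredential {
    fn get_token(&self) -> Token;
}

/// Caches a bearer token behind a mutex and refreshes it shortly before
/// expiry, instead of rebuilding the whole client.
struct AutoRefreshAuthPolicy<C: TokenCredential> {
    credential: C,
    cached: Mutex<Option<Token>>,
}

impl<C: TokenCredential> AutoRefreshAuthPolicy<C> {
    fn new(credential: C) -> Self {
        Self { credential, cached: Mutex::new(None) }
    }

    /// Returns a valid bearer token, fetching a new one if the cached token
    /// is missing or within five minutes of expiring.
    fn bearer_token(&self) -> String {
        let mut cached = self.cached.lock().unwrap();
        let needs_refresh = match cached.as_ref() {
            Some(token) => token.expires_at <= Instant::now() + Duration::from_secs(300),
            None => true,
        };
        if needs_refresh {
            *cached = Some(self.credential.get_token());
        }
        cached.as_ref().unwrap().secret.clone()
    }
}
```

In a real pipeline policy, something like `bearer_token()` would be called inside `send` to set the `Authorization` header before handing the request to the next policy.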
Thanks @roeap, you are correct on ALL counts and I agree with the direction. Would be great to get your help on this. I'm restarting work soon and will finish #499 as discussed in #489, taking the shortest path to getting writes to work - I want to start testing the basic write path. Once that is in place, let's proceed as you outlined above.
On the matter of two authorization methods being used at the moment, I think the next step should be Azure/azure-sdk-for-rust#490. After that, the other parts of storage and data_lake can evolve independently. The next step for data_lake would probably be Azure/azure-sdk-for-rust#426. And finally Azure/azure-sdk-for-rust#543. At that point we are in a pretty good place from a delta-rs perspective.
@thovoll, happy to support where I can! Very much looking forward to having this integration :).

For some personal experiments I needed a Kusto client and recently refactored this to be compatible with the pipeline architecture. As part of this there is an auth policy based on token credentials, using essentially the same auto-refresh as in this crate. Could this already be a starting point? I think the link above could at least address Azure/azure-sdk-for-rust#426. The auto-refresh works as well, but requires tokio as a dependency right now; from what I understand the azure crates are so far meant to be independent of a specific runtime. More elaborate features leveraging refresh tokens etc. are not part of this in any case.

As for Azure/azure-sdk-for-rust#490, I could give this a try, the assumption being that migrating to the pipeline policy already gets us most of the way. Details are probably better discussed over in the Azure repo?
Thanks @roeap, yes let's talk details over in the Azure repo :) |
Description
Currently, there is an ignored test for Azure Storage that relies on @rtyler's Azure Resources.
I propose the following incremental improvements:
1. Make the existing test runnable against anyone's Azure resources, with all endpoint and auth info read from environment variables.
2. Add record/replay-based integration tests so CI can run without real Azure resources.
A bit more info on 2. The way it works is that integration tests can be run in record mode, where the tests call real Azure resources and capture the API responses (which are then committed to the repo). In CI, the integration tests are run in replay mode where no real Azure resources are involved and the HTTP calls get intercepted and the stored API responses are returned. More info is here.
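To make the record/replay mechanics concrete, a rough sketch is below. The type names and the `AZURE_TEST_MODE` variable are made up for illustration; the SDK's mock framework has its own request/response model and on-disk session format:

```rust
use std::collections::HashMap;
use std::env;

// Hypothetical types for illustration; the real mock transport works at the
// HTTP pipeline level and persists full request/response pairs.
#[derive(Clone)]
struct RecordedResponse {
    status: u16,
    body: String,
}

enum TestTransport {
    /// Call the real Azure endpoint and persist each response for later replay.
    Record { recordings: HashMap<String, RecordedResponse> },
    /// Never touch the network; answer each request from the stored recordings.
    Replay { recordings: HashMap<String, RecordedResponse> },
}

impl TestTransport {
    /// Pick the mode from an environment variable so CI defaults to replay.
    fn from_env(recordings: HashMap<String, RecordedResponse>) -> Self {
        match env::var("AZURE_TEST_MODE").as_deref() {
            Ok("record") => TestTransport::Record { recordings },
            _ => TestTransport::Replay { recordings },
        }
    }

    fn send(&mut self, request_key: &str) -> RecordedResponse {
        match self {
            TestTransport::Record { recordings } => {
                // In record mode the real HTTP call would happen here; the
                // response is stored and later committed to the repo.
                let response = RecordedResponse { status: 200, body: String::new() };
                recordings.insert(request_key.to_string(), response.clone());
                response
            }
            TestTransport::Replay { recordings } => recordings
                .get(request_key)
                .cloned()
                .expect("no recording found for this request"),
        }
    }
}
```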
This approach is different from AWS S3 where I understand the local simulator is workable. For Azure, this isn't currently feasible since Azurite doesn't yet support ADLS Gen2. As an alternative to the mock approach described above, we could wait until Azurite does support ADLS Gen2.
A bit more on 1. If we need static files to be present in the storage account, we can commit them to the repo along with a script that uploads those files as part of local dev setup. We'd also have to make sure all the endpoint and auth info is read from environment variables.
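For example, the test setup could look roughly like this; the variable names are only placeholders for whatever the Azure backend actually uses:

```rust
use std::env;

/// Connection settings for the Azure integration test, read from the
/// environment so the test is not tied to any one person's account.
struct AzureTestConfig {
    account: String,
    access_key: String,
    filesystem: String,
}

impl AzureTestConfig {
    /// Returns `None` when the variables are not set, so the test can be
    /// skipped (or stay `#[ignore]`d) on machines without Azure access.
    fn from_env() -> Option<Self> {
        Some(Self {
            account: env::var("AZURE_STORAGE_ACCOUNT").ok()?,
            access_key: env::var("AZURE_STORAGE_KEY").ok()?,
            filesystem: env::var("AZURE_STORAGE_FILESYSTEM").ok()?,
        })
    }
}
```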
Related Issue(s)
#486 (comment)