
Introduce StorageConnector for Azure #14660

Merged
31 commits merged into apache:master on Aug 9, 2023

Conversation

LakshSingla (Contributor) commented Jul 25, 2023

Description

This PR adds a storage connector for interacting with Azure's blob storage, using the Azure API currently used in Druid. This allows durable storage and MSQ's interactive APIs to work with Azure.

This also refactors the currently available S3 connector so that the chunked downloading currently done by the S3 connector can be extended to other connectors. (Note: this refactoring is ported from PR #14611, since that PR is currently parked.)
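The chunked-download pattern mentioned above can be sketched as follows. This is an illustrative sketch only, not the PR's actual code: the class and method names are hypothetical, and the real ChunkingStorageConnector works over input streams rather than returning byte ranges.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: splitting a ranged download into fixed-size chunks,
// the pattern that the refactored chunking connector generalizes so that
// connectors other than S3 can reuse it.
public class ChunkedRangeSketch
{
  /**
   * Plans the [offset, limit) byte ranges needed to download the range
   * [start, end) in pieces of at most chunkSize bytes.
   */
  public static List<long[]> planChunks(long start, long end, long chunkSize)
  {
    List<long[]> ranges = new ArrayList<>();
    for (long offset = start; offset < end; offset += chunkSize) {
      // The final chunk is clamped to the end of the requested range.
      ranges.add(new long[]{offset, Math.min(offset + chunkSize, end)});
    }
    return ranges;
  }
}
```

Each planned range would then be fetched with a ranged read against the backing store (S3, Azure blob storage), so a failed chunk can be retried without restarting the whole download.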

Testing plan

  1. Add unit tests for the Azure connector.
  2. Functionally test that the Azure connector works as expected.
  3. Sanity-test that the S3 connector still works as expected, since it has been refactored.
  4. Compare performance between the Azure connector (new feature) and the S3 connector (current benchmark).

Release note

An Azure storage connector has been introduced, so MSQ's fault tolerance and durable storage can now be used with Microsoft Azure's blob storage. In addition, the newly introduced queries from deep storage can now store and fetch their results in Azure's blob storage.


Key changed/added classes in this PR

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

public abstract class ChunkingStorageConnector<T> implements StorageConnector
Contributor: Can you please add Javadoc to this class, since it is the crux of this PR?

public ChunkingStorageConnectorParameters<T> build()
{
Preconditions.checkArgument(start >= 0, "'start' not provided or an incorrect value [%s] passed", start);
Preconditions.checkArgument(end >= 0, "'end' not provided or an incorrect value [%s] passed", end);
Contributor: Would end < start return a good error message?

Contributor (Author): Updated with a check for this as well in the PR!
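The validation being discussed might look like the following sketch. It is self-contained, so it uses plain IllegalArgumentException checks instead of Guava's Preconditions, and the class name is illustrative rather than the PR's actual ChunkingStorageConnectorParameters builder.

```java
// Hypothetical sketch of range validation: non-negative start and end,
// plus the reviewer-suggested check that end is not before start.
public class RangeParams
{
  public final long start;
  public final long end;

  public RangeParams(long start, long end)
  {
    if (start < 0) {
      throw new IllegalArgumentException(
          "'start' not provided or an incorrect value [" + start + "] passed");
    }
    if (end < 0) {
      throw new IllegalArgumentException(
          "'end' not provided or an incorrect value [" + end + "] passed");
    }
    // Without this check, end < start would pass both checks above and fail
    // later with a confusing error; checking it here gives a clear message.
    if (end < start) {
      throw new IllegalArgumentException(
          "'end' [" + end + "] should be greater than or equal to 'start' [" + start + "]");
    }
    this.start = start;
    this.end = end;
  }
}
```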

{
private static final long DOWNLOAD_MAX_CHUNK_SIZE_BYTES = 100_000_000;

public ChunkingStorageConnector()
Contributor: Does this need to be public?

Contributor (Author): Reverted the change so that the individual connectors can control the chunk sizes. This is used primarily for testing for now, though it can be extended to the real implementations as well.

@cryptoe cryptoe added the Area - MSQ For multi stage queries - https://github.com/apache/druid/issues/12262 label Aug 4, 2023
adarshsanjeev (Contributor) left a comment:

Looks good to me overall.

params.getMaxRetry()
),
outFile,
new byte[8 * 1024],
Contributor: I know this code was only moved, but could you add a comment on why these numbers were chosen?

Contributor (Author): Sure
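A sketch of the kind of copy loop and explanatory comment the reviewer is asking for. The rationale given in the comment is a common justification for an 8 KiB buffer, offered here as an assumption; the PR's actual reasoning and surrounding retry logic may differ.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical sketch of a buffered copy from a remote-storage stream to a
// local file stream, with the buffer-size rationale spelled out inline.
public class BufferedCopy
{
  public static long copy(InputStream in, OutputStream out) throws IOException
  {
    // 8 KiB buffer: large enough to amortize per-read call overhead, small
    // enough that per-download memory stays negligible even with many
    // concurrent downloads (assumed rationale; see lead-in).
    final byte[] buffer = new byte[8 * 1024];
    long total = 0;
    int read;
    while ((read = in.read(buffer)) != -1) {
      out.write(buffer, 0, read);
      total += read;
    }
    return total;
  }
}
```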

cryptoe (Contributor) left a comment:

Changes LGTM. The user-facing docs are still pending.

LakshSingla (Contributor, Author) commented:

Thanks, @adarshsanjeev @cryptoe for the reviews and @dhananjay1308 for testing the changes out on a cluster.
Testing for Azure has been ongoing for a day. Queries for durable storage on Azure are taking comparable times to durable storage on S3, and there don't seem to be any performance concerns for the new storage connector. Going ahead with the merge.

@LakshSingla LakshSingla merged commit 8f102f9 into apache:master Aug 9, 2023
@LakshSingla LakshSingla deleted the azure-storage-connector branch August 9, 2023 12:25
@LakshSingla LakshSingla added this to the 28.0 milestone Oct 12, 2023
Labels: Area - Documentation, Area - MSQ (for multi stage queries - https://github.com/apache/druid/issues/12262), Release Notes