Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Streamline S3 Repository- and Client-Settings #37393

Merged

Conversation

original-brownbear
Copy link
Member

  • Make clusterstate settings override static settings
  • Make repository settings override client settings
  • Cache clients according to settings not name to prevent conflicts between multiple repositories using the same named client with different custom settings causing conflicts
    • Introduce custom implementations for the AWS credentials here to be able to use them as part of a hash key

* Make clusterstate settings override static settings
* Make repository settings override client settings
* Cache clients according to settings
   * Introduce custom implementations for the AWS credentials here to be able to use them as part of a hash key
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed

@original-brownbear
Copy link
Member Author

Jenkins run Gradle build tests 1

ByteSizeValue bufferSize, String cannedACL, String storageClass) {
private final Tuple<RepositoryMetaData, Settings> settingsKey;

S3BlobStore(S3Service service, String bucket, boolean serverSideEncryption,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This constructor could be simplified now, making all the setting extraction from the metadata happen here. I just left that out of this step because it requires some noisy changes in the tests.

@ywelsch ywelsch self-requested a review January 14, 2019 11:39
@@ -51,15 +52,18 @@

private final StorageClass storageClass;

S3BlobStore(S3Service service, String clientName, String bucket, boolean serverSideEncryption,
ByteSizeValue bufferSize, String cannedACL, String storageClass) {
private final Tuple<RepositoryMetaData, Settings> settingsKey;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is perhaps a bit too coarse-grained as settings key? Should it just be repository metadata that's relevant for establishing connections?

Also, why are the settings (i.e. 2nd part of tuple) part of this key? These are static / fixed at node startup, so there's no need to cache them?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ywelsch

Also, why are the settings (i.e. 2nd part of tuple) part of this key? These are static / fixed at node startup, so there's no need to cache them?

I think I was a little paranoid here with reloads from the secret store but on second thought, that's wrong it's all handled by refreshAndClearCache just fine :). I'll simplify the key.

this is perhaps a bit too coarse-grained as settings key? Should it just be repository metadata that's relevant for establishing connections?

Maybe in theory, but does it matter in practice? Creating a new repo client isn't absurdly expensive and updating the repo settings is a rare event? I agree we could create another key here but it seems like it's just a bunch of extra code complexity and then if we ever add a new relevant (for establishing the connection) setting we have to maintain this as well => maybe the rare case of needlessly creating a new client isn't a problem?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe in theory, but does it matter in practice?

I was wondering if with this approach we never share clients between repo definitions (as these repos will probably be different in some way). Not sure how many repo definitions we typically have in clusters though and whether it matters.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait :) I can't take credit for it now, but I think I actually handled this just fine already :D
The actual client instances are cached by their S3ClientSettings which only contain stuff needed for instantiating the clients.
So org.elasticsearch.repositories.s3.S3Service#clientsCache should deduplicate clients just fine. All we get are some redundant settings instances in org.elasticsearch.repositories.s3.S3Service#derivedClientSettings where two different repo-metadata keys could point to equal settings instances.

@original-brownbear
Copy link
Member Author

@ywelsch ditched the redundant use of the static settings from the cache key and client settings loading in 8f2a85b

@original-brownbear
Copy link
Member Author

Jenkins run Gradle build tests 1

@ywelsch ywelsch self-requested a review January 21, 2019 12:11
@ywelsch
Copy link
Contributor

ywelsch commented Jan 21, 2019

Can you investigate how to test this PR?

@original-brownbear
Copy link
Member Author

original-brownbear commented Jan 21, 2019

@ywelsch yes on it today tomorrow :)

@original-brownbear
Copy link
Member Author

@ywelsch alright, added an IT that shows that we're not overriding the named clients anymore when using settings in the repository definition (I realize that it's less than elegant to test it the way I did not by breaking the repo, but our current IT infrastructure makes it extremely hard to pass in a correct endpoint here because we compute the replacements in the .yml files long before Minio starts up).
Also, added a UT that tests the basic workflow of overriding the settings, let me know if this is what you had in mind :)

@original-brownbear
Copy link
Member Author

Added some short docs here 46acd77

@original-brownbear
Copy link
Member Author

I just learnt that we could write really nice tests here if the ones I added are too hack using the test fixtures plugin. Looking into #37783 now.

clientName,
metadata.name());
}

if (S3ClientSettings.checkDeprecatedCredentials(metadata.settings())) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we also add deprecation warnings for all other client settings in REPO settings?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IDK, if I think about it it seems a little redundant:

  • We brought this back to life now, so it'll probably mostly be used by people getting the idea from the docs that say it's deprecated
  • As far as I understand this may not stay deprecated?

... but that said, I can add some if you want me to, I don't really have an opinion here :)

@original-brownbear original-brownbear merged commit 57823c4 into elastic:master Jan 30, 2019
@original-brownbear original-brownbear deleted the repo-setting-fixes branch January 30, 2019 05:22
@original-brownbear
Copy link
Member Author

@ywelsch thanks for the review! Merged this now so I can get going on the back port, but let me know if I should add a PR for the deprecation logging :)

original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request Jan 30, 2019
* Make repository settings override static settings
* Cache clients according to settings
   * Introduce custom implementations for the AWS credentials here to be able to use them as part of a hash key
original-brownbear added a commit that referenced this pull request Jan 30, 2019
* Make repository settings override static settings
* Cache clients according to settings
   * Introduce custom implementations for the AWS credentials here to be able to use them as part of a hash key
@original-brownbear
Copy link
Member Author

Backported to 6.x in #38010

jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Jan 30, 2019
* master: (29 commits)
  Fix limit on retaining sequence number (elastic#37992)
  Docs test fix, wait for shards active.
  Revert "Revert "Documented default values for index follow request parameters. (elastic#37917)""
  Revert "Documented default values for index follow request parameters. (elastic#37917)"
  Ensure date parsing BWC compatibility (elastic#37929)
  SQL: Skip the nested and object field types in case of an ODBC request (elastic#37948)
  Use mappings to format doc-value fields by default. (elastic#30831)
  Give precedence to index creation when mixing typed templates with typeless index creation and vice-versa. (elastic#37871)
  Add classifier to tar.gz in docker compose (elastic#38011)
  Documented default values for index follow request parameters. (elastic#37917)
  Fix fetch source option in expand search phase (elastic#37908)
  Restore a noop _all metadata field for 6x indices (elastic#37808)
  Added ccr to xpack usage infrastructure (elastic#37256)
  Fix exit code for Security CLI tools  (elastic#37956)
  Streamline S3 Repository- and Client-Settings (elastic#37393)
  Add version 6.6.1 (elastic#37975)
  Ensure task metadata not null in follow test (elastic#37993)
  Docs fix - missing callout
  Types removal - deprecate include_type_name with index templates (elastic#37484)
  Handle completion suggestion without contexts
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>bug :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs v6.7.0 v7.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants