more detail on storage class #31420
Conversation
Pinging @elastic/es-distributed
I only wonder if it would also be good to recommend keeping all files in a repository on a single storage class: set it via the storage class setting initially, and if changing class later, use the AWS API together with the storage class setting, rather than mixing storage classes between different snapshots in the same repository. There's not much value in a mixed approach for perceived lifecycling patterns (keep all monthlies on the standard class, intermediate snapshots on reduced redundancy/IA, etc.) because the incremental nature of snapshots means files rely on other files in the repository without much external visibility. So a general best-practice tip on the storage class setting would be: set all files in a repo to the same class.
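As a concrete sketch of this single-class recommendation (the bucket name here is hypothetical), the repository settings sent to the `PUT _snapshot/<repository>` API might look like:

```json
{
  "type": "s3",
  "settings": {
    "bucket": "my-snapshot-bucket",
    "storage_class": "standard_ia"
  }
}
```

Per the doc text under review, `storage_class` accepts `standard`, `reduced_redundancy`, or `standard_ia`, and defaults to `standard`.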
@tlrx any opinions here?
This change looks good. Maybe we could add a similar precision for canned_acl? I agree that mixing storage classes within the same repository does not add a lot of value. I wish this setting was not exposed at all and that we recommended using S3 Lifecycle Policies for setting the storage class. Maybe we can write a word about this, and emphasise that we recommend using the same storage class for all files in the same repo?
I added a bit about recommended usage of storage_class. |
I'm not an enormous fan of this change (in isolation) for a handful of reasons:
- We do not describe how we write objects to the repository, so after adding this I think users will still struggle to understand the implications of the various options for `storage_class`. I'm not sure if we want to describe the repository layout in enough detail to address this.
- I do not think we should explicitly recommend setting all objects to the same storage class. It makes the most sense to me, but the choice of storage class is irrelevant to Elasticsearch, and I think users should feel free to set it to whatever they want according to their usage.
- We should really call them "objects" rather than "files" in the context of S3.

Incidentally I just realised that lifecycle policies totally make sense on time-based indices, in contradiction to what I said in Slack.
So these are all good points.
I don't think that we should describe how we write objects to the repository, and I don't think it would help anyway. But I think it is valuable for users to understand the implication of changing the storage class on an existing repository, i.e. that the storage class is applied to newly snapshotted files only. The explanation of incremental snapshots looks sufficient to me.
I agree that the choice of storage class is out of scope, and I don't think this change recommends using a specific storage class. I still think that we must indicate a) what the default storage class is, b) that changing the setting on an existing repository only changes the storage class for newly created objects, resulting in a mixed usage of storage classes in the repo, and c) that users may prefer to use Lifecycle Policies to manage storage classes.
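To illustrate the Lifecycle Policy alternative mentioned above (the rule ID, empty prefix, and 30-day threshold here are all illustrative assumptions, not a recommendation), an S3 Lifecycle configuration that transitions a bucket's objects to `STANDARD_IA` might look like:

```json
{
  "Rules": [
    {
      "ID": "snapshots-to-standard-ia",
      "Status": "Enabled",
      "Filter": { "Prefix": "" },
      "Transitions": [
        { "Days": 30, "StorageClass": "STANDARD_IA" }
      ]
    }
  ]
}
```

A configuration like this could be applied with `aws s3api put-bucket-lifecycle-configuration`; unlike the repository setting, it also affects existing objects as they age.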
That's right, but it does recommend using the same storage class for all objects in the repo, and updating the storage class of existing objects via the API when changing the repo's storage class setting. I don't think we should do that.
+1
Oh I see. Yes, I agree on that point.
Made some changes to the wording based on recent comments. |
I suggested some minor tweaks. Also could you wrap the text so the lines are all about the same length?
docs/plugins/repository-s3.asciidoc
Outdated
```
@@ -190,8 +190,12 @@ The following settings are supported:

 `storage_class`::

-    Sets the S3 storage class type for the backup files. Values may be
+    Sets the S3 storage class for snapshot repository objects. Values may be
```
I think I'd prefer "objects stored in the snapshot repository" instead of "snapshot repository objects"?
docs/plugins/repository-s3.asciidoc
Outdated
```
`standard`, `reduced_redundancy`, `standard_ia`. Defaults to `standard`.
All new objects created will use the set storage class.
```
I think this sentence can be removed - the subsequent sentence says this and more.
docs/plugins/repository-s3.asciidoc
Outdated
```
`standard`, `reduced_redundancy`, `standard_ia`. Defaults to `standard`.
All new objects created will use the set storage class.
Changing this setting on an existing repository only affects the storage class for newly created objects,
resulting in a mixed usage of storage classes. Alternatively, S3 Lifecycle Policies can be used
```
"Alternatively" suggests you can't do both. "Additionally" would be better.
docs/plugins/repository-s3.asciidoc
Outdated
```
All new objects created will use the set storage class.
Changing this setting on an existing repository only affects the storage class for newly created objects,
resulting in a mixed usage of storage classes. Alternatively, S3 Lifecycle Policies can be used
to manage storage class.
```
I think it'd be better to say "the storage class of existing objects" instead of just "storage class" here.
Further updates made based on suggestions.
LGTM, thanks @geekpete
Thanks!
This branched off
@elasticmachine retest this please
It's unlikely that retesting this will lead to a green build. You need to merge the latest
It seems that this statement is a debug leftover since it currently adds an error message `{"jobs":[]}` after each successful REST test.
When dealing with a null group, the associated metric aggs need to return null as well Fix #34896
* Removed methods are just unused (the exceptions being isGeoPoint() and isFloatingPoint(), but those could more efficiently be replaced by enum comparisons to simplify the code) * Removed exceptions aren't thrown
Clarifies the width rules for snippets.
we are restructuring the docs, this migrates ILM docs outside of the x-pack doc structure.
`AbstractComponent` is trouble because its name implies that *everything* should extend from it. It *is* useful, but maybe too broadly useful. The things it offers access to, the `Settings` instance for the entire server and a logger, are nice to have around, but not really needed *everywhere*. The `Settings` instance especially adds a fair bit of ceremony to testing without any value. This removes the `nodeName` method from `AbstractComponent` so it is more clear where we actually need the node name.
Drops the `deprecationLogger` from `AbstractComponent`, moving it to places where we need it. This saves us from building a bunch of `DeprecationLogger`s that we don't need. Relates to #34488
Sometimes users are confused about whether they can use the Convert Processor to change an existing field's type to other types even if the existing one is already ingested. This confusion comes from the first line of the description. Changing this and also adding some detail to the code snippet.
* Moved `AcknowledgedResponse` to core package * Made AcknowledgedResponse not abstract and provided a default parser, so that in cases when the field name is not overwritten then there is no need for a subclass. Relates to #33824
* Consolidate the name of the qualified build version * Field name in response should not be redundant
Today we only apply `ignore_throttled` to expansions from wildcards, date math expressions and aliases. Yet, this is tricky since we might have resolved certain expressions in pre-filter steps like security. It's more consistent to apply this logic to all expressions including concrete indices. Relates to #34354
* ML: Adding missing data check class * reverting bad change * Adding bucket + missing data object for returns * reverting unnecessary change * adding license header * Make client calls synchronous, akin to DatafeedJob * Fixing line length * Renaming things, addressing PR comments
Adds basic rolling upgrade tests to check that lifecycles are still recognizable between rolling cluster upgrades and managed indices stay managed. This is a placeholder for discussing types of checks so they are ready once we backported This is a re-boot of the previous PR against index-lifecycle that needed to be reverted due to CI bwc issues. #32828
An auto follow pattern: * cannot start with `_` * cannot contain a `,` * can be encoded in UTF-8 * the length of UTF-8 encoded bytes is no longer than 255 bytes
removes fake allocation id after recovery is done Relates to #33432
This change adds a `frozen` engine that allows lazily open a directory reader on a read-only shard. The engine wraps general purpose searchers in a LazyDirectoryReader that also allows to release and reset the underlying index readers after any and before secondary search phases. Relates to #34352
…query (#35160) Today when a percolator query contains a date range then the query analyzer extracts that range, so that at search time the `percolate` query can exclude percolator queries efficiently that are never going to match. The problem is that if 'now' is used it is evaluated at index time. So the idea is to rewrite date ranges with 'now' to a match all query, so that the query analyzer can't extract it and the `percolate` query is then able to evaluate 'now' at query time.
A CCR test failure shows that the approach in #34474 is flawed. Restoring the LocalCheckpointTracker from an index commit can cause both FollowingEngine and InternalEngine to incorrectly ignore some deletes. Here is a small scenario illustrating the problem: 1. Delete doc with seq=1 => engine will add a delete tombstone to Lucene 2. Flush a commit consisting of only the delete tombstone 3. Index doc with seq=0 => engine will add that doc to Lucene but soft-deleted 4. Restart an engine with the commit (step 2); the engine will fill its LocalCheckpointTracker with the delete tombstone in the commit 5. Replay the local translog in reverse order: index#0 then delete#1 6. When process index#0, an engine will add it into Lucene as a live doc and advance the local checkpoint to 1 (seq#1 was restored from the commit - step 4). 7. When process delete#1, an engine will skip it because seq_no=1 is less than or equal to the local checkpoint. We should have zero document after recovering from translog, but here we have one. Since all operations after the local checkpoint of the safe commit are retained, we should find them if the look-up considers also soft-deleted documents. This PR fills the disparity between the version map and the local checkpoint tracker by taking soft-deleted documents into account while resolving strategy for engine operations. Relates #34474 Relates #33656
Since it's still possible to shrink an index when replicas are unassigned, we should not check that all copies are available when performing the shrink, since we set the allocation requirement for a single node. Resolves #35321
#35271) Ensure that Watcher is correctly started and stopped between tests for SmokeTestWatcherWithSecurityIT, SmokeTestWatcherWithSecurityClientYamlTestSuiteIT, SmokeTestWatcherTestSuiteIT, WatcherRestIT, XDocsClientYamlTestSuiteIT, and XPackRestIT The change here is to throw an `AssertionError` instead of `break;` to allow the `assertBusy()` to continue to busy wait until the desired state is reached. closes #33291, closes #29877, closes #34462, closes #30705, closes #34448
The lookup vars under params (namely _fields and _source) were inadvertently removed when scoring scripts were converted to using script contexts. This commit adds them back, along with deprecation warnings for those that should not be used.
Additional detail on how storage class setting affects snapshot files.
At this point I'd almost just do a new PR against recent master and just close this one and reference it from the new one, unless there's some force rebase/fix I can do to get this cleaned up...
+1 to that, if you have the last version of the PR still around.
A clean PR from original reference PR with discussion here: #31420
Closed in favour of #35400 |