
more detail on storage class #31420

Closed
wants to merge 2,901 commits into from

Conversation

geekpete
Member

Additional detail on how storage class setting affects snapshot files.

@colings86 colings86 added >docs General docs changes :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs labels Jun 19, 2018
@elasticmachine
Collaborator

Pinging @elastic/es-distributed

@geekpete
Member Author

geekpete commented Jun 19, 2018

I only wonder if it's also good to recommend keeping all files in a repository on a single storage class: set it via the storage class setting initially, and via the AWS API plus the storage class setting if changing storage class at a later point. That seems better than mixing and matching storage classes between different snapshots in the same repository, since there's not much value in a mixed approach for perceived lifecycling patterns (keep all monthlies as standard class, subsequent inter-month snaps as reduced redundancy/IA, etc.) due to the incremental nature of snapshots: snapshot files rely on other files in the repository without much external visibility. So perhaps provide a general best-practice tip on using the storage class setting, namely to keep all files in a repo on the same class.

@bleskes
Contributor

bleskes commented Jun 20, 2018

@tlrx any opinions here?

@tlrx
Member

tlrx commented Jun 20, 2018

This change looks good. Maybe we could add a similar clarification for `canned_acl`?

I agree that mixing storage classes within the same repository does not add a lot of value. I wish this setting was not exposed at all and that we recommended using S3 Lifecycle Policies to set the storage class. Maybe we can write a word about this, and emphasise that we recommend using the same storage class for all files in the same repo?
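To make the Lifecycle Policies suggestion concrete, a bucket-level rule along these lines could handle the storage class transition on the S3 side. This is only a sketch: the rule ID, empty prefix, target class, and 30-day threshold are made-up illustrative values, not a recommendation.

```json
{
  "Rules": [
    {
      "ID": "transition-snapshot-objects-to-ia",
      "Filter": { "Prefix": "" },
      "Status": "Enabled",
      "Transitions": [
        { "Days": 30, "StorageClass": "STANDARD_IA" }
      ]
    }
  ]
}
```

Such a configuration would be applied to the repository's bucket (for example via `aws s3api put-bucket-lifecycle-configuration`), independently of any Elasticsearch setting.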

@geekpete
Member Author

I added a bit about recommended usage of storage_class.

Contributor

@DaveCTurner DaveCTurner left a comment


I'm not an enormous fan of this change (in isolation) for a handful of reasons:

  • we do not describe how we write objects to the repository, so after adding this I think users will still struggle to understand the implications of the various options for storage_class. I'm not sure if we want to describe the repository layout in enough detail to address this.

  • I do not think we should explicitly recommend setting all objects to the same storage class. It makes the most sense to me, but the choice of storage class is irrelevant to Elasticsearch, and I think users should feel free to set it to whatever they want according to their usage.

  • we should really call them "objects" rather than "files" in the context of S3.

@DaveCTurner
Contributor

Incidentally I just realised that lifecycle policies totally make sense on time-based indices, in contradiction to what I said in Slack.

@geekpete
Member Author

So these are all good points.
Do we need to just specify that this setting only affects newly created objects and leave it at that?

@tlrx
Member

tlrx commented Jun 26, 2018

we do not describe how we write objects to the repository, so after adding this I think users will still struggle to understand the implications of the various options for storage_class. I'm not sure if we want to describe the repository layout in enough detail to address this.

I don't think that we should describe how we write objects to the repository, and I don't think it would help anyway. But I think it is valuable for users to understand the implication of changing the storage class on an existing repository, i.e. that the storage class is applied to newly snapshotted files only. The explanation of incremental snapshots looks sufficient to me.

I do not think we should explicitly recommend setting all objects to the same storage class. It makes the most sense to me, but the choice of storage class is irrelevant to Elasticsearch, and I think users should feel free to set it to whatever they want according to their usage.

I agree that the choice of storage class is out of scope, and I don't think this change recommends using a specific storage class? I still think that we must indicate a) what the default storage class is, b) that changing the setting on an existing repository only changes the storage class for newly created objects, resulting in a mixed usage of storage classes in the repo, and c) that users may prefer to use Lifecycle Policies to manage storage classes.

@DaveCTurner
Contributor

I don't think this change recommends using a specific storage class?

That's right, but it does recommend using the same storage class for all objects in the repo, and updating the storage class of existing objects via the API when changing the repo's storage class setting. I don't think we should do that.

I still think that we must indicate a) what the default storage class is, b) that changing the setting on an existing repository only changes the storage class for newly created objects, resulting in a mixed usage of storage classes in the repo, and c) that users may prefer to use Lifecycle Policies to manage storage classes.

+1
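For reference, the "updating the storage class of existing objects via the API" discussed above amounts to a copy-in-place of each object. With the AWS CLI it would look something like the following sketch; the bucket and object key here are hypothetical, and the Elasticsearch docs don't prescribe this.

```sh
# Rewrite an existing object in place with a new storage class.
# Bucket and key are made-up examples, not real repository paths.
aws s3 cp s3://my-snapshot-bucket/indices/0/__example \
          s3://my-snapshot-bucket/indices/0/__example \
          --storage-class STANDARD_IA
```

Doing this per object across a large repository is exactly the kind of churn that makes a bucket-wide Lifecycle Policy the simpler option.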

@tlrx
Member

tlrx commented Jun 26, 2018

That's right, but it does recommend using the same storage class for all objects in the repo, and updating the storage class of existing objects via the API when changing the repo's storage class setting. I don't think we should do that.

Oh I see. Yes, I agree on that point.

@geekpete
Member Author

Made some changes to the wording based on recent comments.

Contributor

@DaveCTurner DaveCTurner left a comment


I suggested some minor tweaks. Also could you wrap the text so the lines are all about the same length?

@@ -190,8 +190,12 @@ The following settings are supported:

`storage_class`::

Sets the S3 storage class type for the backup files. Values may be
Sets the S3 storage class for snapshot repository objects. Values may be

I think I'd prefer "objects stored in the snapshot repository" instead of "snapshot repository objects"?

`standard`, `reduced_redundancy`, `standard_ia`. Defaults to `standard`.
All new objects created will use the set storage class.

I think this sentence can be removed - the subsequent sentence says this and more.

`standard`, `reduced_redundancy`, `standard_ia`. Defaults to `standard`.
All new objects created will use the set storage class.
Changing this setting on an existing repository only affects the storage class for newly created objects,
resulting in a mixed usage of storage classes. Alternatively, S3 Lifecycle Policies can be used

"Alternatively" suggests you can't do both. "Additionally" would be better.

All new objects created will use the set storage class.
Changing this setting on an existing repository only affects the storage class for newly created objects,
resulting in a mixed usage of storage classes. Alternatively, S3 Lifecycle Policies can be used
to manage storage class.

I think it'd be better to say "the storage class of existing objects" instead of just "storage class" here.
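Pulling the reviewed wording together, the setting in question is applied when the repository is registered. A sketch in the docs' usual console format, with placeholder repository and bucket names:

```console
PUT _snapshot/my_s3_repository
{
  "type": "s3",
  "settings": {
    "bucket": "my-bucket",
    "storage_class": "standard_ia"
  }
}
```

Objects written by snapshots taken after this registration would use `standard_ia`; per the discussion above, objects already in the repository keep whatever class they were written with.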

@geekpete
Member Author

Further updates made based on suggestions.

Contributor

@DaveCTurner DaveCTurner left a comment


LGTM, thanks @geekpete

Member

@tlrx tlrx left a comment


Thanks!

@DaveCTurner
Contributor

This branched off master some time ago so I wouldn't be surprised if this build fails. If it does, merging in the latest master should fix it.

@geekpete
Member Author

geekpete commented Aug 1, 2018

@elasticmachine retest this please

@DaveCTurner
Contributor

It's unlikely that retesting this will lead to a green build. You need to merge the latest master since we've released versions since this branch was cut, so the list of released versions is out of date, causing this:

03:17:18 Execution failed for task ':distribution:bwc:next-bugfix-snapshot:buildBwcVersion'.
03:17:18 > Building bwc version didn't generate expected files [/var/lib/jenkins/workspace/elastic+elasticsearch+pull-request/distribution/bwc/next-bugfix-snapshot/build/bwc/checkout-6.3/distribution/archives/oss-zip/build/distributions/elasticsearch-oss-6.3.1-SNAPSHOT.zip, /var/lib/jenkins/workspace/elastic+elasticsearch+pull-request/distribution/bwc/next-bugfix-snapshot/build/bwc/checkout-6.3/distribution/archives/zip/build/distributions/elasticsearch-6.3.1-SNAPSHOT.zip, /var/lib/jenkins/workspace/elastic+elasticsearch+pull-request/distribution/bwc/next-bugfix-snapshot/build/bwc/checkout-6.3/distribution/packages/oss-deb/build/distributions/elasticsearch-oss-6.3.1-SNAPSHOT.deb, /var/lib/jenkins/workspace/elastic+elasticsearch+pull-request/distribution/bwc/next-bugfix-snapshot/build/bwc/checkout-6.3/distribution/packages/deb/build/distributions/elasticsearch-6.3.1-SNAPSHOT.deb, /var/lib/jenkins/workspace/elastic+elasticsearch+pull-request/distribution/bwc/next-bugfix-snapshot/build/bwc/checkout-6.3/distribution/packages/oss-rpm/build/distributions/elasticsearch-oss-6.3.1-SNAPSHOT.rpm, /var/lib/jenkins/workspace/elastic+elasticsearch+pull-request/distribution/bwc/next-bugfix-snapshot/build/bwc/checkout-6.3/distribution/packages/rpm/build/distributions/elasticsearch-6.3.1-SNAPSHOT.rpm]

imotov and others added 9 commits October 26, 2018 14:23
It seems that this statement is a debug leftover since it currently
adds an error message `{"jobs":[]}` after each successful REST test.
When dealing with a null group, the associated metric aggs need to
return null as well

Fix #34896
* Removed methods are just unused (the exceptions being isGeoPoint() and isFloatingPoint(), but those could more efficiently be replaced by enum comparisons to simplify the code)
* Removed exceptions aren't thrown
Clarifies the width rules for snippets.
we are restructuring the docs, this migrates ILM docs outside of the x-pack doc structure.
`AbstractComponent` is trouble because its name implies that
*everything* should extend from it. It *is* useful, but maybe too
broadly useful. The things it offers access too, the `Settings` instance
for the entire server and a logger are nice to have around, but not
really needed *everywhere*. The `Settings` instance especially adds a
fair bit of ceremony to testing without any value.

This removes the `nodeName` method from `AbstractComponent` so it is
more clear where we actually need the node name.
Drops the `deprecationLogger` from `AbstractComponent`, moving it to
places where we need it. This saves us from building a bunch of
`DeprecationLogger`s that we don't need.

Relates to #34488
insukcho and others added 22 commits November 7, 2018 17:01
Sometimes users are confused about whether they can use the Convert Processor to change an existing field's type to another type, even if the field has already been ingested. This confusion comes from the first line of the description. Changing this and also adding some detail to the code snippet.
* Moved `AcknowledgedResponse` to core package
* Made AcknowledgedResponse not abstract and provided a default parser,
so that in cases when the field name is not overwritten then there
is no need for a subclass.

Relates to #33824
* Consolidate the name of the qualified build version

* Field name in response should not be redundant
Today we only apply `ignore_throttled` to expansions from wildcards,
date math expressions and aliases. Yet, this is tricky since we might
have resolved certain expressions in pre-filter steps like security.
It's more consistent to apply this logic to all expressions including
concrete indices.

Relates to #34354
* ML: Adding missing data check class

* reverting bad change

* Adding bucket + missing data object for returns

* reverting unnecessary change

* adding license header

* Make client calls synchronous, akin to DatafeedJob

* Fixing line length

* Renaming things, addressing PR comments
Adds basic rolling upgrade tests to check that lifecycles are still recognizable between rolling cluster upgrades and managed indices stay managed.

This is a placeholder for discussing types of checks so they are ready once we backported

This is a re-boot of the previous PR against index-lifecycle that needed to be
reverted due to CI bwc issues. #32828
An auto follow pattern:
* cannot start with `_`
* cannot contain a `,`
* can be encoded in UTF-8
* the length of UTF-8 encoded bytes is no longer than 255 bytes
removes fake allocation id after recovery is done

Relates to #33432
This change adds a `frozen` engine that allows lazily opening a directory reader
on a read-only shard. The engine wraps general-purpose searchers in a LazyDirectoryReader
that also allows releasing and resetting the underlying index readers after any and before
secondary search phases.

Relates to #34352
…query (#35160)

Today when a percolator query contains a date range then the query
analyzer extracts that range, so that at search time the `percolate` query
can exclude percolator queries efficiently that are never going to match.

The problem is that if 'now' is used, it is evaluated at index time.
So the idea is to rewrite date ranges with 'now' to a match-all query,
so that the query analyzer can't extract it and the `percolate` query
is then able to evaluate 'now' at query time.
* [ILM] Check shard and relocation status in AllocationRoutedStep

This is a follow-up from #35161 where we now check for started and relocating
state in `AllocationRoutedStep`.

Resolves #35258
A CCR test failure shows that the approach in #34474 is flawed.
Restoring the LocalCheckpointTracker from an index commit can cause both
FollowingEngine and InternalEngine to incorrectly ignore some deletes.

Here is a small scenario illustrating the problem:

1. Delete doc with seq=1 => engine will add a delete tombstone to Lucene

2. Flush a commit consisting of only the delete tombstone

3. Index doc with seq=0  => engine will add that doc to Lucene but soft-deleted

4. Restart an engine with the commit (step 2); the engine will fill its
LocalCheckpointTracker with the delete tombstone in the commit

5. Replay the local translog in reverse order: index#0 then delete#1

6. When processing index#0, the engine will add it into Lucene as a live doc
and advance the local checkpoint to 1 (seq#1 was restored from the
commit - step 4).

7. When processing delete#1, the engine will skip it because seq_no=1 is
less than or equal to the local checkpoint.

We should have zero documents after recovering from translog, but here we
have one.

Since all operations after the local checkpoint of the safe commit are
retained, we should find them if the look-up considers also soft-deleted
documents. This PR fills the disparity between the version map and the
local checkpoint tracker by taking soft-deleted documents into account
while resolving strategy for engine operations.

Relates #34474
Relates #33656
Since it's still possible to shrink an index when replicas are unassigned, we
should not check that all copies are available when performing the shrink, since
we set the allocation requirement for a single node.

Resolves #35321
#35271)

Ensure that Watcher is correctly started and stopped between tests for
SmokeTestWatcherWithSecurityIT,
SmokeTestWatcherWithSecurityClientYamlTestSuiteIT,
SmokeTestWatcherTestSuiteIT, WatcherRestIT,
XDocsClientYamlTestSuiteIT, and XPackRestIT

The change here is to throw an `AssertionError` instead of `break;` to
allow the `assertBusy()` to continue to busy wait until the desired
state is reached.

closes #33291, closes #29877, closes #34462, closes #30705, closes #34448
The lookup vars under params (namely _fields and _source) were
inadvertently removed when scoring scripts were converted to using
script contexts. This commit adds them back, along with deprecation
warnings for those that should not be used.
Additional detail on how storage class setting affects snapshot files.
@cbuescher
Member

@geekpete @dct something went wrong with the master merge it seems. Is there any way to recover and maybe merge this?

@geekpete
Member Author

geekpete commented Nov 8, 2018

At this point I'd almost just do a new PR against recent master, close this one, and reference it from the new one, unless there's some force rebase/fix I can do to get this cleaned up...

@cbuescher
Member

At this point I'd almost just do a new PR against recent master

+1 to that, if you have the last version of the PR still around.

geekpete added a commit that referenced this pull request Nov 9, 2018
A clean PR from original reference PR with discussion here: #31420
@geekpete
Member Author

geekpete commented Nov 9, 2018

Closed in favour of #35400

@geekpete geekpete closed this Nov 9, 2018
@geekpete geekpete deleted the geekpete-patch-5 branch November 30, 2018 05:16