Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add minimal docs around upgrading clusters with ccr enabled #38037

Merged
merged 7 commits into from
Mar 13, 2019

Conversation

martijnvg
Copy link
Member

No description provided.

@martijnvg martijnvg added >docs General docs changes v7.0.0 :Distributed Indexing/CCR Issues around the Cross Cluster State Replication features v6.7.0 labels Jan 30, 2019
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed

martijnvg added a commit to martijnvg/elasticsearch that referenced this pull request Feb 3, 2019
This test starts 2 clusters, each with 3 nodes.
First the leader cluster is started and tests are run against it and
the the follower cluster is started and tests execute against this two cluster.

Then the follower cluster is upgraded, one node at a time.
After that the leader cluster is upgraded, one node at a time.
Every time a node is upgraded tests are ran while both clusters are online.
(and either leader cluster has mixed node versions or the follower cluster)

This commit only tests CCR index following, but could be used for CCS tests as well.
In particular for CCR, unidirectional index following is tested during a rolling upgrade.
During the test several indices are created and followed in the leader cluster before or
while the follower cluster is being upgraded.

This tests also verifies that attempting to follow an index in the upgraded cluster
from the not upgraded cluster fails. After both clusters are upgraded following the
index that previously failed should succeed.

Relates to elastic#37231 and elastic#38037
@jasontedor jasontedor added v8.0.0 and removed v7.0.0 labels Feb 6, 2019
martijnvg added a commit that referenced this pull request Feb 12, 2019
This test starts 2 clusters, each with 3 nodes.
First the leader cluster is started and tests are run against it and
then the follower cluster is started and tests execute against this two cluster.

Then the follower cluster is upgraded, one node at a time.
After that the leader cluster is upgraded, one node at a time.
Every time a node is upgraded tests are ran while both clusters are online.
(and either leader cluster has mixed node versions or the follower cluster)

This commit only tests CCR index following, but could be used for CCS tests as well.
In particular for CCR, unidirectional index following is tested during a rolling upgrade.
During the test several indices are created and followed in the leader cluster before or
while the follower cluster is being upgraded.

This tests also verifies that attempting to follow an index in the upgraded cluster
from the not upgraded cluster fails. After both clusters are upgraded following the
index that previously failed should succeed.

Relates to #37231 and #38037
martijnvg added a commit to martijnvg/elasticsearch that referenced this pull request Feb 13, 2019
This test starts 2 clusters, each with 3 nodes.
First the leader cluster is started and tests are run against it and
then the follower cluster is started and tests execute against this two cluster.

Then the follower cluster is upgraded, one node at a time.
After that the leader cluster is upgraded, one node at a time.
Every time a node is upgraded tests are ran while both clusters are online.
(and either leader cluster has mixed node versions or the follower cluster)

This commit only tests CCR index following, but could be used for CCS tests as well.
In particular for CCR, unidirectional index following is tested during a rolling upgrade.
During the test several indices are created and followed in the leader cluster before or
while the follower cluster is being upgraded.

This tests also verifies that attempting to follow an index in the upgraded cluster
from the not upgraded cluster fails. After both clusters are upgraded following the
index that previously failed should succeed.

Relates to elastic#37231 and elastic#38037
martijnvg added a commit that referenced this pull request Feb 14, 2019
* Add rolling upgrade multi cluster test module (#38277)

This test starts 2 clusters, each with 3 nodes.
First the leader cluster is started and tests are run against it and
then the follower cluster is started and tests execute against this two cluster.

Then the follower cluster is upgraded, one node at a time.
After that the leader cluster is upgraded, one node at a time.
Every time a node is upgraded tests are ran while both clusters are online.
(and either leader cluster has mixed node versions or the follower cluster)

This commit only tests CCR index following, but could be used for CCS tests as well.
In particular for CCR, unidirectional index following is tested during a rolling upgrade.
During the test several indices are created and followed in the leader cluster before or
while the follower cluster is being upgraded.

This tests also verifies that attempting to follow an index in the upgraded cluster
from the not upgraded cluster fails. After both clusters are upgraded following the
index that previously failed should succeed.

Relates to #37231 and #38037

* Filter out upgraded version index settings when starting index following (#38838)

The `index.version.upgraded` and `index.version.upgraded_string` are likely
to be different between leader and follower index. In the event that
a follower index gets restored on a upgraded node while the leader index
is still on non-upgraded nodes.

Closes #38835
martijnvg added a commit that referenced this pull request Feb 14, 2019
* Add rolling upgrade multi cluster test module (#38277)

This test starts 2 clusters, each with 3 nodes.
First the leader cluster is started and tests are run against it and
then the follower cluster is started and tests execute against this two cluster.

Then the follower cluster is upgraded, one node at a time.
After that the leader cluster is upgraded, one node at a time.
Every time a node is upgraded tests are ran while both clusters are online.
(and either leader cluster has mixed node versions or the follower cluster)

This commit only tests CCR index following, but could be used for CCS tests as well.
In particular for CCR, unidirectional index following is tested during a rolling upgrade.
During the test several indices are created and followed in the leader cluster before or
while the follower cluster is being upgraded.

This tests also verifies that attempting to follow an index in the upgraded cluster
from the not upgraded cluster fails. After both clusters are upgraded following the
index that previously failed should succeed.

Relates to #37231 and #38037

* Filter out upgraded version index settings when starting index following (#38838)

The `index.version.upgraded` and `index.version.upgraded_string` are likely
to be different between leader and follower index. In the event that
a follower index gets restored on a upgraded node while the leader index
is still on non-upgraded nodes.

Closes #38835
martijnvg added a commit to martijnvg/elasticsearch that referenced this pull request Feb 14, 2019
* Add rolling upgrade multi cluster test module (elastic#38277)

This test starts 2 clusters, each with 3 nodes.
First the leader cluster is started and tests are run against it and
then the follower cluster is started and tests execute against this two cluster.

Then the follower cluster is upgraded, one node at a time.
After that the leader cluster is upgraded, one node at a time.
Every time a node is upgraded tests are ran while both clusters are online.
(and either leader cluster has mixed node versions or the follower cluster)

This commit only tests CCR index following, but could be used for CCS tests as well.
In particular for CCR, unidirectional index following is tested during a rolling upgrade.
During the test several indices are created and followed in the leader cluster before or
while the follower cluster is being upgraded.

This tests also verifies that attempting to follow an index in the upgraded cluster
from the not upgraded cluster fails. After both clusters are upgraded following the
index that previously failed should succeed.

Relates to elastic#37231 and elastic#38037

* Filter out upgraded version index settings when starting index following (elastic#38838)

The `index.version.upgraded` and `index.version.upgraded_string` are likely
to be different between leader and follower index. In the event that
a follower index gets restored on a upgraded node while the leader index
is still on non-upgraded nodes.

Closes elastic#38835
martijnvg added a commit that referenced this pull request Feb 14, 2019
* Add rolling upgrade multi cluster test module (#38277)

This test starts 2 clusters, each with 3 nodes.
First the leader cluster is started and tests are run against it and
then the follower cluster is started and tests execute against this two cluster.

Then the follower cluster is upgraded, one node at a time.
After that the leader cluster is upgraded, one node at a time.
Every time a node is upgraded tests are ran while both clusters are online.
(and either leader cluster has mixed node versions or the follower cluster)

This commit only tests CCR index following, but could be used for CCS tests as well.
In particular for CCR, unidirectional index following is tested during a rolling upgrade.
During the test several indices are created and followed in the leader cluster before or
while the follower cluster is being upgraded.

This tests also verifies that attempting to follow an index in the upgraded cluster
from the not upgraded cluster fails. After both clusters are upgraded following the
index that previously failed should succeed.

Relates to #37231 and #38037

* Filter out upgraded version index settings when starting index following (#38838)

The `index.version.upgraded` and `index.version.upgraded_string` are likely
to be different between leader and follower index. In the event that
a follower index gets restored on a upgraded node while the leader index
is still on non-upgraded nodes.

Closes #38835
martijnvg added a commit to martijnvg/elasticsearch that referenced this pull request Feb 15, 2019
Follow index in follow cluster that follows an index in the leader cluster and another
follow index in the leader index that follows that index in the follow cluster.

During the upgrade index following is paused and after the upgrade
index following is resumed and then verified index following works as expected.

Relates to elastic#38037
martijnvg added a commit that referenced this pull request Feb 18, 2019
Follow index in follow cluster that follows an index in the leader cluster and another
follow index in the leader index that follows that index in the follow cluster.

During the upgrade index following is paused and after the upgrade
index following is resumed and then verified index following works as expected.

Relates to #38037
martijnvg added a commit that referenced this pull request Feb 18, 2019
Follow index in follow cluster that follows an index in the leader cluster and another
follow index in the leader index that follows that index in the follow cluster.

During the upgrade index following is paused and after the upgrade
index following is resumed and then verified index following works as expected.

Relates to #38037
martijnvg added a commit that referenced this pull request Feb 18, 2019
Follow index in follow cluster that follows an index in the leader cluster and another
follow index in the leader index that follows that index in the follow cluster.

During the upgrade index following is paused and after the upgrade
index following is resumed and then verified index following works as expected.

Relates to #38037
martijnvg added a commit that referenced this pull request Feb 18, 2019
Follow index in follow cluster that follows an index in the leader cluster and another
follow index in the leader index that follows that index in the follow cluster.

During the upgrade index following is paused and after the upgrade
index following is resumed and then verified index following works as expected.

Relates to #38037
Copy link
Member

@jasontedor jasontedor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for writing this up, and sorry for the delay in reviewing @martijnvg. I left some comments.

finally the first cluster.

[float]
==== Bidirectional index following
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/Bidirectional/Bi-directional

[float]
==== Bidirectional index following

In this kind of setup clusters index following happens in multiple clusters and
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In a bi-directional setup between two clusters, each cluster contains both leader and follower indices.

In this kind of setup clusters index following happens in multiple clusters and
so each cluster contains both leader and follower indices.

When upgrading clusters in this setup, index following needs to be paused prior
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/index following/all index following

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps link to the pause API?

so each cluster contains both leader and follower indices.

When upgrading clusters in this setup, index following needs to be paused prior
to upgrading all the clusters. After all clusters have been upgraded then
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/all the/both

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/After all/After both


When upgrading clusters in this setup, index following needs to be paused prior
to upgrading all the clusters. After all clusters have been upgraded then
index following can be resumed. Pausing index following is required, otherwise
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps link to the resume API?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that the last sentence can be dropped.

Otherwise index following may fail during a rolling upgrade, because of the following reasons:

* If a new index setting or mapping type is replicated from an upgraded cluster
to a not upgraded cluster then the upgraded cluster will reject that and will
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/not upgraded/non-upgraded

s/the upgraded cluster/the non-upgraded cluster

to a not upgraded cluster then the upgraded cluster will reject that and will
fail index following.
* Lucene is not forwards compatible and when index following is falling back to
file based recovery then a node in a not upgraded cluster will reject index files
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/not upgraded/non-upgraded

file based recovery then a node in a not upgraded cluster will reject index files
from a newer Lucene version compared to what it is using.

Rolling upgrading clusters with CCR is different in case of unidirectional index following and
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/unidirectional/uni-directional

from a newer Lucene version compared to what it is using.

Rolling upgrading clusters with CCR is different in case of unidirectional index following and
bidirectional index following.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/bidirectional/bi-directional

bidirectional index following.

[float]
==== Unidirectional index following
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/Unidirectional/Uni-directional

@martijnvg martijnvg requested a review from jasontedor March 8, 2019 08:41
@martijnvg
Copy link
Member Author

Thanks for reviewing @jasontedor. I've done the smaller changes in the first commit and the moving the upgrade docs into a separate page in the second commit.

Copy link
Member

@jasontedor jasontedor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left one more comment.

For example if there is a first cluster that contains all leader indices,
a second cluster that follows indices in the first cluster and a third
cluster that follows indices in the second clusters. In this case the
third cluster should be upgraded first, then the second cluster and
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should avoid referring to the clusters as "first", "second", and "third" only to turn around and say "third cluster [...] first", that's confusing. How about A, B, and C?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pushed: a91bbda

@martijnvg martijnvg requested a review from jasontedor March 11, 2019 07:40
Copy link
Member

@jasontedor jasontedor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left two more comments, sorry I should have noticed this on the first round.

[[ccr-upgrading]]
== Upgrading clusters

In case of upgrading clusters, clusters that are actively using CCR, require a
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I should have noticed this before. Let's use {ccr}. Let's reword this first sentence to: Clusters that are actively using {ccr} require a careful approach to upgrades.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pushed: e7c6f01

file based recovery then a node in a non-upgraded cluster will reject index
files from a newer Lucene version compared to what it is using.

Rolling upgrading clusters with CCR is different in case of uni-directional
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, {ccr}.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pushed: e7c6f01

@martijnvg martijnvg requested a review from jasontedor March 12, 2019 09:01
Copy link
Member

@jasontedor jasontedor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Left a minor nit, no need for another round.

using the {ref}/ccr-post-pause-follow.html[pause follower API] prior to
upgrading both clusters. After both clusters have been upgraded then index
following can be resumed using the
{ref}/ccr-post-resume-follow.html[resume follower API]].
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you end with a newline here?

@jasontedor
Copy link
Member

Thanks @martijnvg!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed Indexing/CCR Issues around the Cross Cluster State Replication features >docs General docs changes v6.7.0 v7.0.0-rc1 v7.2.0 v8.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants