Add rolling upgrade multi cluster test module #38277

Merged

martijnvg
Member

This test starts 2 clusters, each with 3 nodes.
First the leader cluster is started and tests are run against it, and
then the follower cluster is started and tests execute against both clusters.

Then the follower cluster is upgraded, one node at a time.
After that the leader cluster is upgraded, one node at a time.
Every time a node is upgraded, tests are run while both clusters are online
(and either the leader cluster or the follower cluster has mixed node versions).

This commit only tests CCR index following, but could be used for CCS tests as well.
In particular for CCR, unidirectional index following is tested during a rolling upgrade.
During the test several indices are created in the leader cluster and followed before or
while the follower cluster is being upgraded.

This test also verifies that attempting to follow an index in the upgraded cluster
from the cluster that has not been upgraded fails. After both clusters are upgraded,
following the index that previously failed should succeed.

Relates to #37231 and #38037
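
As a rough illustration of the upgrade order described above, here is a minimal sketch. It is not code from this PR: `RollingUpgradeSchedule`, `Step`, `steps` and `isMixed` are hypothetical names, and the initial phase where only the leader cluster is running is left out. It only enumerates the rounds in which tests would run, with the follower cluster upgraded first and the leader cluster second.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these names exist in the PR.
final class RollingUpgradeSchedule {

    private RollingUpgradeSchedule() {}

    /** One test round: how many nodes of each cluster already run the new version. */
    static final class Step {
        final int upgradedFollowerNodes;
        final int upgradedLeaderNodes;

        Step(int upgradedFollowerNodes, int upgradedLeaderNodes) {
            this.upgradedFollowerNodes = upgradedFollowerNodes;
            this.upgradedLeaderNodes = upgradedLeaderNodes;
        }

        @Override
        public String toString() {
            return "follower=" + upgradedFollowerNodes + ", leader=" + upgradedLeaderNodes;
        }
    }

    /** A cluster has mixed versions when some, but not all, of its nodes are upgraded. */
    static boolean isMixed(int upgradedNodes, int nodesPerCluster) {
        return upgradedNodes > 0 && upgradedNodes < nodesPerCluster;
    }

    /** Rounds in order: nothing upgraded, then the follower node by node, then the leader. */
    static List<Step> steps(int nodesPerCluster) {
        List<Step> steps = new ArrayList<>();
        steps.add(new Step(0, 0));
        for (int n = 1; n <= nodesPerCluster; n++) {
            steps.add(new Step(n, 0));
        }
        for (int n = 1; n <= nodesPerCluster; n++) {
            steps.add(new Step(nodesPerCluster, n));
        }
        return steps;
    }
}
```

With 3 nodes per cluster this yields 7 rounds, and in every round at most one of the two clusters has mixed node versions.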

@martijnvg martijnvg added the >test, v7.0.0, :Distributed Indexing/CCR and v6.7.0 labels Feb 3, 2019
@elasticmachine
Collaborator

Pinging @elastic/es-distributed

@alpar-t alpar-t added the :Delivery/Build Build or test infrastructure label Feb 4, 2019
@elasticmachine
Collaborator

Pinging @elastic/es-core-infra

@martijnvg
Member Author

This PR is the first step toward properly testing CCR during a rolling upgrade. Currently there is only a test that verifies that unidirectional index following works during a rolling upgrade. As a follow-up, auto-follow patterns and bidirectional index following should also be tested during a rolling upgrade (once we have decided how bidirectional index following should work in that situation).

Contributor

@ywelsch ywelsch left a comment

Thanks for these tests @martijnvg. I've left some smaller comments, looking very good already.

x-pack/qa/rolling-upgrade-multi-cluster/build.gradle (outdated, resolved)
x-pack/qa/rolling-upgrade-multi-cluster/build.gradle (outdated, resolved)
// At this point all nodes in both clusters have been updated and
// the leader cluster can now follow leader_index4 in the follower cluster:
followIndex(leaderClient(), "follower", "leader_index4", "follower_index4");
assertBusy(() -> verifyTotalHitCount("follower_index4", 64, leaderClient()));
Contributor

perhaps increase timeout on the assertBusy (same for the other ones in this class).

Member Author

Do you think the timeouts need to be increased because CI workers may be slow?
Locally I have not seen this fail for lack of time to replicate the documents from the leader to the follower index.

Contributor

Yes, I was worrying about slower CI workers. The assertBusy also waits on timed events e.g. on the internal auto-refresh on the follower index. Perhaps we can do an explicit refresh to speed things up?

Member Author

👍 yes, that makes sense.
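
To make the suggestion in this thread concrete, here is a minimal sketch of the pattern: trigger an explicit refresh before each check and retry until a generous timeout expires. This is not the test framework's `assertBusy`. `BusyWait` and `awaitAssertion` are hypothetical names, and the `refresh` callback stands in for whatever refresh call the real test would issue against the follower index.

```java
// Hypothetical helper, written only to illustrate "refresh, then retry with a longer timeout".
final class BusyWait {

    private BusyWait() {}

    /**
     * Runs refresh and then assertion repeatedly until the assertion stops throwing
     * AssertionError, or rethrows the last AssertionError once timeoutMillis has elapsed.
     */
    static void awaitAssertion(Runnable refresh, Runnable assertion, long timeoutMillis) {
        final long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        long sleepMillis = 1;
        while (true) {
            // Make recent writes visible now instead of waiting for a timed refresh.
            refresh.run();
            try {
                assertion.run();
                return;
            } catch (AssertionError e) {
                if (System.nanoTime() - deadline >= 0) {
                    throw e;
                }
            }
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", ie);
            }
            // Back off between attempts, capped so a slow worker is still polled regularly.
            sleepMillis = Math.min(sleepMillis * 2, 250);
        }
    }
}
```

The explicit refresh keeps the common case fast, while the longer timeout only matters on slow CI workers.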

ResponseException e = expectThrows(ResponseException.class,
() -> followIndex(leaderClient(), "follower", "leader_index4", "follower_index4"));
assertThat(e.getMessage(), containsString("the snapshot was created with Elasticsearch version ["));
assertThat(e.getMessage(), containsString("] which is higher than the version of this node ["));
Contributor

we should improve the exception message here /cc @tbrooks8

@ywelsch ywelsch added the v6.7.0 label Feb 11, 2019
@martijnvg
Member Author

@ywelsch I've updated the PR.

@ywelsch ywelsch self-requested a review February 11, 2019 18:14
@martijnvg martijnvg merged commit f6e8654 into elastic:master Feb 12, 2019
@martijnvg
Member Author

I will let this bake on master for a while before backporting.

martijnvg added a commit to martijnvg/elasticsearch that referenced this pull request Feb 13, 2019
martijnvg added a commit that referenced this pull request Feb 14, 2019
* Add rolling upgrade multi cluster test module (#38277)

* Filter out upgraded version index settings when starting index following (#38838)

The `index.version.upgraded` and `index.version.upgraded_string` settings are likely
to differ between the leader and follower index, for example when
a follower index is restored on an upgraded node while the leader index
is still on non-upgraded nodes.

Closes #38835
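
The idea behind this second commit can be sketched as follows. This is an assumption about the general approach, not the actual change in #38838: `FollowSettingsFilter`, `withoutUpgradeMarkers` and `sameReplicatedSettings` are hypothetical names, and settings are modelled as a plain string map. The two upgrade-marker keys are dropped from both sides before leader and follower settings are compared.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical sketch; the real change may be structured differently.
final class FollowSettingsFilter {

    private FollowSettingsFilter() {}

    // The two setting keys named in the commit message above.
    static final Set<String> UPGRADE_MARKERS = Collections.unmodifiableSet(
        new HashSet<>(Arrays.asList("index.version.upgraded", "index.version.upgraded_string")));

    /** Returns a copy of the settings without the upgrade-marker keys; the input is not modified. */
    static Map<String, String> withoutUpgradeMarkers(Map<String, String> settings) {
        Map<String, String> filtered = new TreeMap<>(settings);
        filtered.keySet().removeAll(UPGRADE_MARKERS);
        return filtered;
    }

    /** Compares leader and follower settings while ignoring the upgrade-marker keys. */
    static boolean sameReplicatedSettings(Map<String, String> leader, Map<String, String> follower) {
        return withoutUpgradeMarkers(leader).equals(withoutUpgradeMarkers(follower));
    }
}
```

With a filter like this, a follower index restored on an upgraded node would not be rejected just because it carries upgrade markers that the leader index lacks.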
martijnvg added a commit that referenced this pull request Feb 14, 2019
martijnvg added a commit to martijnvg/elasticsearch that referenced this pull request Feb 14, 2019
martijnvg added a commit that referenced this pull request Feb 14, 2019
@martijnvg
Member Author

This has been backported to the 7.x, 7.0 and 6.7 branches.

@mark-vieira mark-vieira added the Team:Delivery Meta label for Delivery team label Nov 11, 2020
Labels
:Delivery/Build, :Distributed Indexing/CCR, Team:Delivery, >test, v6.7.0, v7.0.0-rc1, v7.2.0, v8.0.0-alpha1