docs: Rook Ceph Upgrade #1165

surajssd · 2020-11-05T11:38:45Z

Add a guide explaining how to upgrade the rook-ceph component.

Fixes: #906

docs/how-to-guides/upgrade-rook-ceph.md

surajssd · 2020-11-05T14:28:12Z

Verify with the upstream docs to make sure that we are in sync.

invidian

Some nits, otherwise looks nice

docs/how-to-guides/upgrade-rook-ceph.md

invidian

Looking good, just some nits.

BTW, that's a lot of things to monitor during the upgrade. I wonder if we could simplify that somehow...

docs/how-to-guides/upgrade-rook-ceph.md

invidian

Looking good. BTW, the comment about the CRDs is still not addressed.

docs/how-to-guides/upgrade-rook-ceph.md

invidian

Looking good.

docs/how-to-guides/upgrade-rook-ceph.md

invidian · 2020-11-20T13:09:42Z

docs/how-to-guides/upgrade-rook-ceph.md

+
+Ensure that the `AUTOSCALE` column outputs `on` and not `warn`. If the output of the `AUTOSCALE`
+column says `warn`, then run the command below, to make sure that pool autoscaling is enabled. It is
+required to ensure that the placement groups scale up as the data in the cluster increases.


Suggested change

-required to ensure that the placement groups scale up as the data in the cluster increases.
+required to ensure that the placement groups scale up as the data in the cluster increases during the upgrade when individual OSDs get updated.

This should always be on regardless of upgrade or anything else.

This part still feels somehow incomplete to me. Maybe we can add that "it is recommended that this option is always on, especially during the upgrade when..." ?

Could you quote exactly what I should change it to? I am finding it hard to rephrase.
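As a rough illustration of the check discussed in this thread, the sketch below picks out pools whose `AUTOSCALE` column reads `warn`. It assumes `AUTOSCALE` is the last column of `ceph osd pool autoscale-status` output, which may not hold on every Ceph release. The helper name `pools_with_autoscale_warn` and the abbreviated sample output are hypothetical.

```bash
#!/usr/bin/env bash

# Print the names of pools whose AUTOSCALE column reads "warn".
# Assumption: AUTOSCALE is the last column of the autoscale-status output.
pools_with_autoscale_warn() {
  awk 'NR > 1 && $NF == "warn" { print $1 }'
}

# Hypothetical, abbreviated sample; real output has more columns.
sample_status='POOL         SIZE  PG_NUM  AUTOSCALE
replicapool  19M   32      warn
device-pool  0     1       on'

printf '%s\n' "$sample_status" | pools_with_autoscale_warn

# Against a live cluster (for example from the toolbox pod), one might run:
#   ceph osd pool autoscale-status | pools_with_autoscale_warn
# and then, for each pool name printed:
#   ceph osd pool set <pool-name> pg_autoscale_mode on
```

Keeping the parsing separate from the `ceph` calls means the filter can be tried on saved output before anything is changed on the cluster.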

docs/how-to-guides/upgrade-rook-ceph.md

invidian

Just one last nit, but I think we could merge regardless. Nice work @surajssd

iaguis

Some comments. One general concern is that we keep telling the user to monitor updates but we don't say what to do if something goes wrong. I don't know if there's much we can say about that but it feels weird.

docs/how-to-guides/upgrade-rook-ceph.md

iaguis · 2020-11-24T16:34:25Z

docs/how-to-guides/upgrade-rook-ceph.md

+> **IMPORTANT**: Don't proceed further if the output is anything other than `HEALTH_OK`.
+
+During the ongoing upgrade and after completion, make sure that the output stays in `HEALTH_OK`
+state. If the cluster is more than 60% full, then the output can sometimes turn into `HEALTH_WARN`.


So what happens if it shows `HEALTH_WARN` during the upgrade? Is it OK as long as the cluster doesn't get full, or does everything explode after that? What advice can we give to the user?

`HEALTH_WARN` is an OK state to be in. TBH, the user does not have control over the upgrade process, and if the upgrade fails for whatever reason, we can look into it on a case-by-case basis. This upgrade doc should reflect such cases if any come from upstream.

Then I guess we shouldn't say "make sure that the output stays in HEALTH_OK" because it's not actionable for the user. I'll make a suggestion.
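One way to make the health guidance from this thread more actionable is a small gate: require `HEALTH_OK` before starting, and treat `HEALTH_WARN` during the upgrade as tolerable but worth watching. The sketch below is only an illustration. The function name `classify_health` and the sample health strings are hypothetical, and the toolbox deployment name in the closing comment is an assumption.

```bash
#!/usr/bin/env bash

# Map the first token of `ceph health` output to a coarse verdict.
classify_health() {
  case "$1" in
    HEALTH_OK*)   echo "ok" ;;      # safe to start or continue
    HEALTH_WARN*) echo "warn" ;;    # tolerable during the upgrade, keep watching
    *)            echo "error" ;;   # anything else needs investigation
  esac
}

# Hypothetical sample strings, not captured from a real cluster.
for sample in "HEALTH_OK" "HEALTH_WARN 1 nearfull osd(s)" "HEALTH_ERR 1 full osd(s)"; do
  printf '%s -> %s\n' "$sample" "$(classify_health "$sample")"
done

# Against a live cluster, assuming a toolbox deployment named rook-ceph-tools
# in the `rook` namespace:
#   classify_health "$(kubectl -n rook exec deploy/rook-ceph-tools -- ceph health)"
```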

iaguis · 2020-11-24T16:35:31Z

docs/how-to-guides/upgrade-rook-ceph.md

+Keep an eye on the `STATUS` field of the following output, in another terminal window, from the
+`rook` namespace. Make sure that the pods are restarted in record time and don't go into


I find this phrasing weird.

Suggested change

-Keep an eye on the `STATUS` field of the following output, in another terminal window, from the
-`rook` namespace. Make sure that the pods are restarted in record time and don't go into
+Open another terminal window and keep an eye on the `STATUS` field of the following output. Make sure that the pods are restarted in record time and don't go into

Also, what does "pods are restarted in record time" mean?

It refers to the time after which the controller times out (this can happen for various reasons), starts showing errors, restarts pods because of some failure condition, etc.
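To make "keep an eye on the `STATUS` field" a bit more concrete, the sketch below filters pod listings down to pods that are neither `Running` nor `Completed`. It assumes the default `kubectl get pods --no-headers` column order (`NAME READY STATUS RESTARTS AGE`). The helper name `unhealthy_pods` and the sample pod names are hypothetical.

```bash
#!/usr/bin/env bash

# Print "NAME STATUS" for pods whose STATUS is neither Running nor Completed.
# Expects `kubectl get pods --no-headers` style input on stdin.
unhealthy_pods() {
  awk '$3 != "Running" && $3 != "Completed" { print $1, $3 }'
}

# Hypothetical sample listing, not captured from a real cluster.
sample_pods='rook-ceph-mon-a-5d4f7c9b8-x2k4q    1/1  Running           0  5m
rook-ceph-osd-0-7c6b5d9f4-q8z7w    0/1  CrashLoopBackOff  4  3m
rook-ceph-osd-prepare-node1-abcde  0/1  Completed         0  6m'

printf '%s\n' "$sample_pods" | unhealthy_pods

# Against a live cluster, using the `rook` namespace mentioned in the guide:
#   kubectl -n rook get pods --no-headers | unhealthy_pods
```

Pods briefly show up here while they restart during the upgrade; what matters is an entry that does not go away.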

docs/how-to-guides/upgrade-rook-ceph.md

iaguis · 2020-11-24T16:45:11Z

docs/how-to-guides/upgrade-rook-ceph.md

+With everything monitored, you can start the update process now by executing the following commands:
+
+```bash
+kubectl apply -f https://raw.githubusercontent.com/kinvolk/lokomotive/v0.5.0/assets/charts/components/rook/templates/resources.yaml


Is the idea to keep this tag updated as we release?

We have to update the doc on each upgrade of Rook Ceph. Things may change between rook-ceph releases.
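Since the tag in that URL has to change with each release, one option is to derive the URL from a single version variable. This is a minimal sketch, not something the guide currently does. The function name `rook_resources_url` and the `LOKOMOTIVE_VERSION` variable are hypothetical.

```bash
#!/usr/bin/env bash

# Build the resources.yaml URL for a given Lokomotive release tag.
rook_resources_url() {
  printf 'https://raw.githubusercontent.com/kinvolk/lokomotive/%s/assets/charts/components/rook/templates/resources.yaml\n' "$1"
}

# Hypothetical variable; v0.5.0 is the tag quoted in the reviewed hunk.
LOKOMOTIVE_VERSION="v0.5.0"
rook_resources_url "$LOKOMOTIVE_VERSION"

# The command from the guide would then become:
#   kubectl apply -f "$(rook_resources_url "$LOKOMOTIVE_VERSION")"
```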

surajssd · 2020-11-25T11:08:43Z

> One general concern is that we keep telling the user to monitor updates but we don't say what to do if something goes wrong. I don't know if there's much we can say about that but it feels weird.

I think all this visibility helps the user connect the dots in an automated upgrade process if something were to go wrong. A user can simply run `lokoctl component apply rook rook-ceph` and might wonder what is going on under the hood or why the cluster won't store data all of a sudden.

Anticipating failures will be hard, but if there are known issues, we should always document them in this guide. Still, we give the user performing the upgrade enough visibility and tools to dig deeper if something were to go wrong.

invidian

LGTM, but we should perhaps wait for @iaguis's approval.

iaguis

Some small changes.

docs/how-to-guides/upgrade-rook-ceph.md

iaguis · 2020-12-01T16:13:55Z

docs/how-to-guides/upgrade-rook-ceph.md

+> **IMPORTANT**: Don't proceed further if the output is anything other than `HEALTH_OK`.
+
+During the ongoing upgrade and after completion, make sure that the output stays in `HEALTH_OK`
+state. If the cluster is more than 60% full, then the output can sometimes turn into `HEALTH_WARN`.


Then I guess we shouldn't say "make sure that the output stays in HEALTH_OK" because it's not actionable for the user. I'll make a suggestion.

docs/how-to-guides/upgrade-rook-ceph.md

Add a guide to explain how to upgrade rook-ceph component. Signed-off-by: Suraj Deshmukh <[email protected]>

iaguis

lgtm

surajssd force-pushed the surajssd/upgrade-rook-ceph-doc branch 2 times, most recently from e95a674 to 35b697b Compare November 5, 2020 14:26

invidian suggested changes Nov 5, 2020

surajssd force-pushed the surajssd/upgrade-rook-ceph-doc branch 4 times, most recently from 1947d62 to e3f16e2 Compare November 6, 2020 09:21

surajssd marked this pull request as ready for review November 6, 2020 09:21

surajssd requested a review from invidian November 6, 2020 11:42

invidian suggested changes Nov 9, 2020

surajssd force-pushed the surajssd/upgrade-rook-ceph-doc branch from e3f16e2 to 323c693 Compare November 10, 2020 11:27

surajssd requested a review from ipochi November 11, 2020 10:29

surajssd force-pushed the surajssd/upgrade-rook-ceph-doc branch from 323c693 to c1eda33 Compare November 11, 2020 13:43

surajssd requested a review from invidian November 11, 2020 13:43

invidian suggested changes Nov 11, 2020

vbatts added the kind/documentation Issues about documentation label Nov 13, 2020

invidian added the priority/P3 Low priority label Nov 17, 2020

surajssd force-pushed the surajssd/upgrade-rook-ceph-doc branch 2 times, most recently from e347f7b to 98e3b8c Compare November 19, 2020 11:25

invidian suggested changes Nov 20, 2020

surajssd requested a review from invidian November 24, 2020 07:06

surajssd force-pushed the surajssd/upgrade-rook-ceph-doc branch from 98e3b8c to 7b78d2a Compare November 24, 2020 07:06

invidian previously approved these changes Nov 24, 2020

iaguis suggested changes Nov 24, 2020

surajssd dismissed invidian’s stale review via 52267ee November 25, 2020 11:15

surajssd force-pushed the surajssd/upgrade-rook-ceph-doc branch from 7b78d2a to 52267ee Compare November 25, 2020 11:15

surajssd requested a review from iaguis November 25, 2020 11:15

surajssd requested a review from invidian November 25, 2020 11:15

invidian previously approved these changes Nov 30, 2020

iaguis suggested changes Dec 1, 2020

docs: Rook Ceph Upgrade

2ba085e

Add a guide to explain how to upgrade rook-ceph component. Signed-off-by: Suraj Deshmukh <[email protected]>

surajssd dismissed invidian’s stale review via 2ba085e December 2, 2020 08:36

surajssd force-pushed the surajssd/upgrade-rook-ceph-doc branch from 52267ee to 2ba085e Compare December 2, 2020 08:36

surajssd requested a review from iaguis December 2, 2020 08:36

iaguis approved these changes Dec 2, 2020

surajssd merged commit ec7fc25 into master Dec 2, 2020

surajssd deleted the surajssd/upgrade-rook-ceph-doc branch December 2, 2020 13:48

invidian mentioned this pull request Dec 7, 2020

Release v0.6.0 #1064

Closed
