Skip to content
This repository has been archived by the owner on Jun 29, 2022. It is now read-only.

docs: Rook Ceph Upgrade #1165

Merged
merged 1 commit into from
Dec 2, 2020
Merged

docs: Rook Ceph Upgrade #1165

merged 1 commit into from
Dec 2, 2020

Conversation

surajssd
Copy link
Member

@surajssd surajssd commented Nov 5, 2020

Add a guide to explain how to upgrade rook-ceph component.

Fixes: #906

@surajssd surajssd force-pushed the surajssd/upgrade-rook-ceph-doc branch 2 times, most recently from e95a674 to 35b697b Compare November 5, 2020 14:26
@surajssd
Copy link
Member Author

surajssd commented Nov 5, 2020

  • Verify with the upstream docs to make sure that we are in sync.

Copy link
Member

@invidian invidian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some nits, otherwise looks nice

docs/how-to-guides/upgrade-rook-ceph.md Outdated Show resolved Hide resolved
docs/how-to-guides/upgrade-rook-ceph.md Outdated Show resolved Hide resolved
docs/how-to-guides/upgrade-rook-ceph.md Outdated Show resolved Hide resolved
docs/how-to-guides/upgrade-rook-ceph.md Outdated Show resolved Hide resolved
@surajssd surajssd force-pushed the surajssd/upgrade-rook-ceph-doc branch 4 times, most recently from 1947d62 to e3f16e2 Compare November 6, 2020 09:21
@surajssd surajssd marked this pull request as ready for review November 6, 2020 09:21
@surajssd surajssd requested a review from invidian November 6, 2020 11:42
Copy link
Member

@invidian invidian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good, just some nits.

BTW, that's a lot of things to monitor during the upgrade, I wonder if we could simplify that somehow...

docs/how-to-guides/upgrade-rook-ceph.md Outdated Show resolved Hide resolved
docs/how-to-guides/upgrade-rook-ceph.md Outdated Show resolved Hide resolved
docs/how-to-guides/upgrade-rook-ceph.md Outdated Show resolved Hide resolved
docs/how-to-guides/upgrade-rook-ceph.md Outdated Show resolved Hide resolved
docs/how-to-guides/upgrade-rook-ceph.md Outdated Show resolved Hide resolved
docs/how-to-guides/upgrade-rook-ceph.md Outdated Show resolved Hide resolved
@surajssd surajssd force-pushed the surajssd/upgrade-rook-ceph-doc branch from e3f16e2 to 323c693 Compare November 10, 2020 11:27
@surajssd surajssd requested a review from ipochi November 11, 2020 10:29
@surajssd surajssd force-pushed the surajssd/upgrade-rook-ceph-doc branch from 323c693 to c1eda33 Compare November 11, 2020 13:43
@surajssd surajssd requested a review from invidian November 11, 2020 13:43
Copy link
Member

@invidian invidian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good. BTW, comment about the CRDs is not addressed still.

docs/how-to-guides/upgrade-rook-ceph.md Outdated Show resolved Hide resolved
docs/how-to-guides/upgrade-rook-ceph.md Outdated Show resolved Hide resolved
docs/how-to-guides/upgrade-rook-ceph.md Outdated Show resolved Hide resolved
docs/how-to-guides/upgrade-rook-ceph.md Outdated Show resolved Hide resolved
docs/how-to-guides/upgrade-rook-ceph.md Outdated Show resolved Hide resolved
docs/how-to-guides/upgrade-rook-ceph.md Outdated Show resolved Hide resolved
docs/how-to-guides/upgrade-rook-ceph.md Show resolved Hide resolved
docs/how-to-guides/upgrade-rook-ceph.md Show resolved Hide resolved
docs/how-to-guides/upgrade-rook-ceph.md Show resolved Hide resolved
docs/how-to-guides/upgrade-rook-ceph.md Show resolved Hide resolved
@vbatts vbatts added the kind/documentation Issues about documentation label Nov 13, 2020
@invidian invidian added the priority/P3 Low priority label Nov 17, 2020
@surajssd surajssd force-pushed the surajssd/upgrade-rook-ceph-doc branch 2 times, most recently from e347f7b to 98e3b8c Compare November 19, 2020 11:25
Copy link
Member

@invidian invidian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good.

docs/how-to-guides/upgrade-rook-ceph.md Outdated Show resolved Hide resolved
docs/how-to-guides/upgrade-rook-ceph.md Outdated Show resolved Hide resolved
docs/how-to-guides/upgrade-rook-ceph.md Show resolved Hide resolved

Ensure that the `AUTOSCALE` column outputs `on` and not `warn`. If the output of the `AUTOSCALE`
column says `warn`, then run the command below, to make sure that pool autoscaling is enabled. It is
required to ensure that the placement groups scale up as the data in the cluster increases.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
required to ensure that the placement groups scale up as the data in the cluster increases.
required to ensure that the placement groups scale up as the data in the cluster increases during the upgrade when individual OSDs gets updated.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should always be on regardless of upgrade or anything else.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part still feels somehow incomplete to me. Maybe we can add that "it is recommended that this option is always on, especially during the upgrade when..." ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you quote what should I change it to exactly? I am finding it hard to rephrase.

docs/how-to-guides/upgrade-rook-ceph.md Outdated Show resolved Hide resolved
docs/how-to-guides/upgrade-rook-ceph.md Outdated Show resolved Hide resolved
@surajssd surajssd requested a review from invidian November 24, 2020 07:06
@surajssd surajssd force-pushed the surajssd/upgrade-rook-ceph-doc branch from 98e3b8c to 7b78d2a Compare November 24, 2020 07:06
invidian
invidian previously approved these changes Nov 24, 2020
Copy link
Member

@invidian invidian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one last nit, but I think we could merge regardless. Nice work @surajssd

Copy link
Contributor

@iaguis iaguis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments. One general concern is that we keep telling the user to monitor updates but we don't say what to do if something goes wrong. I don't know if there's much we can say about that but it feels weird.

docs/how-to-guides/upgrade-rook-ceph.md Outdated Show resolved Hide resolved
docs/how-to-guides/upgrade-rook-ceph.md Outdated Show resolved Hide resolved
> **IMPORTANT**: Don't proceed further if the output is anything other than `HEALTH_OK`.

During the ongoing upgrade and after completion, make sure that the output stays in `HEALTH_OK`
state. If the cluster is more than 60% full, then the output can sometimes turn into `HEALTH_WARN`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So what happens if during the upgrade it shows HEALTH_WARN is it ok if it doesn't get full or does everything explode after that? What advice can we give to the user?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HEALTH_WARN is ok to be in. TBH the user does not have control over the upgrade process and if the upgrade fails for whatever reason we can have look into it case by case basis. This upgrade doc should reflect such cases if there are any coming from upstream.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then I guess we shouldn't say "make sure that the output stays in HEALTH_OK" because it's not actionable for the user. I'll make a suggestion.

Comment on lines 67 to 68
Keep an eye on the `STATUS` field of the following output, in another terminal window, from the
`rook` namespace. Make sure that the pods are restarted in record time and don't go into
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this phrasing weird.

Suggested change
Keep an eye on the `STATUS` field of the following output, in another terminal window, from the
`rook` namespace. Make sure that the pods are restarted in record time and don't go into
Open another terminal window and keep an eye on the `STATUS` field of the following output. Make sure that the pods are restarted in record time and don't go into

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, what does "pods are restated in record time" mean?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The time after which the controller times out (this can be for various reasons) / starts showing error / restarts pods for some failure condition, etc.

docs/how-to-guides/upgrade-rook-ceph.md Outdated Show resolved Hide resolved
With everything monitored, you can start the update process now by executing the following commands:

```bash
kubectl apply -f https://raw.githubusercontent.com/kinvolk/lokomotive/v0.5.0/assets/charts/components/rook/templates/resources.yaml
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the idea to keep this tag updated as we release?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have to update the doc on each upgrade of Rook Ceph. Things may change between rook-ceph releases.

@surajssd
Copy link
Member Author

One general concern is that we keep telling the user to monitor updates but we don't say what to do if something goes wrong. I don't know if there's much we can say about that but it feels weird.

I think all this visibility helps user to connect dots in an automated upgrade process, if something were to go wrong. A user can simply run lokoctl component apply rook rook-ceph and might wonder what is going on under or why cluster won't store data all of sudden.

Expecting failures will be hard, but if there are known issues we should always document it in this document. But we provide the user doing upgrade enough visibility and tools to dig deeper if something were to go wrong.

@surajssd surajssd requested a review from invidian November 25, 2020 11:15
invidian
invidian previously approved these changes Nov 30, 2020
Copy link
Member

@invidian invidian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but we should perhaps wait for @iaguis approval.

Copy link
Contributor

@iaguis iaguis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some small changes.

docs/how-to-guides/upgrade-rook-ceph.md Outdated Show resolved Hide resolved
> **IMPORTANT**: Don't proceed further if the output is anything other than `HEALTH_OK`.

During the ongoing upgrade and after completion, make sure that the output stays in `HEALTH_OK`
state. If the cluster is more than 60% full, then the output can sometimes turn into `HEALTH_WARN`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then I guess we shouldn't say "make sure that the output stays in HEALTH_OK" because it's not actionable for the user. I'll make a suggestion.

docs/how-to-guides/upgrade-rook-ceph.md Outdated Show resolved Hide resolved
docs/how-to-guides/upgrade-rook-ceph.md Outdated Show resolved Hide resolved
Add a guide to explain how to upgrade rook-ceph component.

Signed-off-by: Suraj Deshmukh <[email protected]>
Copy link
Contributor

@iaguis iaguis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@surajssd surajssd merged commit ec7fc25 into master Dec 2, 2020
@surajssd surajssd deleted the surajssd/upgrade-rook-ceph-doc branch December 2, 2020 13:48
@invidian invidian mentioned this pull request Dec 7, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
kind/documentation Issues about documentation priority/P3 Low priority
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Create an upgrade doc for Rook Ceph on Lokomotive
4 participants