-
Notifications
You must be signed in to change notification settings - Fork 47
Conversation
e95a674
to
35b697b
Compare
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some nits, otherwise looks nice
1947d62
to
e3f16e2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good, just some nits.
BTW, that's a lot of things to monitor during the upgrade, I wonder if we could simplify that somehow...
e3f16e2
to
323c693
Compare
323c693
to
c1eda33
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good. BTW, comment about the CRDs is not addressed still.
e347f7b
to
98e3b8c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good.
|
||
Ensure that the `AUTOSCALE` column outputs `on` and not `warn`. If the output of the `AUTOSCALE` | ||
column says `warn`, then run the command below, to make sure that pool autoscaling is enabled. It is | ||
required to ensure that the placement groups scale up as the data in the cluster increases. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
required to ensure that the placement groups scale up as the data in the cluster increases. | |
required to ensure that the placement groups scale up as the data in the cluster increases during the upgrade when individual OSDs gets updated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should always be on
regardless of upgrade or anything else.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This part still feels somehow incomplete to me. Maybe we can add that "it is recommended that this option is always on, especially during the upgrade when..." ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you quote what should I change it to exactly? I am finding it hard to rephrase.
98e3b8c
to
7b78d2a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just one last nit, but I think we could merge regardless. Nice work @surajssd
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some comments. One general concern is that we keep telling the user to monitor updates but we don't say what to do if something goes wrong. I don't know if there's much we can say about that but it feels weird.
> **IMPORTANT**: Don't proceed further if the output is anything other than `HEALTH_OK`. | ||
|
||
During the ongoing upgrade and after completion, make sure that the output stays in `HEALTH_OK` | ||
state. If the cluster is more than 60% full, then the output can sometimes turn into `HEALTH_WARN`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So what happens if during the upgrade it shows HEALTH_WARN
is it ok if it doesn't get full or does everything explode after that? What advice can we give to the user?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
HEALTH_WARN is ok to be in. TBH the user does not have control over the upgrade process and if the upgrade fails for whatever reason we can have look into it case by case basis. This upgrade doc should reflect such cases if there are any coming from upstream.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then I guess we shouldn't say "make sure that the output stays in HEALTH_OK
" because it's not actionable for the user. I'll make a suggestion.
Keep an eye on the `STATUS` field of the following output, in another terminal window, from the | ||
`rook` namespace. Make sure that the pods are restarted in record time and don't go into |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find this phrasing weird.
Keep an eye on the `STATUS` field of the following output, in another terminal window, from the | |
`rook` namespace. Make sure that the pods are restarted in record time and don't go into | |
Open another terminal window and keep an eye on the `STATUS` field of the following output. Make sure that the pods are restarted in record time and don't go into |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, what does "pods are restated in record time" mean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The time after which the controller times out (this can be for various reasons) / starts showing error / restarts pods for some failure condition, etc.
With everything monitored, you can start the update process now by executing the following commands: | ||
|
||
```bash | ||
kubectl apply -f https://raw.githubusercontent.com/kinvolk/lokomotive/v0.5.0/assets/charts/components/rook/templates/resources.yaml |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the idea to keep this tag updated as we release?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have to update the doc on each upgrade of Rook Ceph. Things may change between rook-ceph releases.
I think all this visibility helps user to connect dots in an automated upgrade process, if something were to go wrong. A user can simply run Expecting failures will be hard, but if there are known issues we should always document it in this document. But we provide the user doing upgrade enough visibility and tools to dig deeper if something were to go wrong. |
7b78d2a
to
52267ee
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, but we should perhaps wait for @iaguis approval.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some small changes.
> **IMPORTANT**: Don't proceed further if the output is anything other than `HEALTH_OK`. | ||
|
||
During the ongoing upgrade and after completion, make sure that the output stays in `HEALTH_OK` | ||
state. If the cluster is more than 60% full, then the output can sometimes turn into `HEALTH_WARN`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then I guess we shouldn't say "make sure that the output stays in HEALTH_OK
" because it's not actionable for the user. I'll make a suggestion.
Add a guide to explain how to upgrade rook-ceph component. Signed-off-by: Suraj Deshmukh <[email protected]>
52267ee
to
2ba085e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
Add a guide to explain how to upgrade rook-ceph component.
Fixes: #906