Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve "rollout progress" dashboard #5113

Merged

Conversation

charleskorn
Copy link
Contributor

@charleskorn charleskorn commented May 31, 2023

What this PR does

This PR makes a number of small improvements to the "rollout progress" dashboard:

  • it fixes the issue where the "unhealthy pods" panel could just show a number, rather than the name of the workload and the number of unhealthy pods if only one workload had unhealthy pods
  • it adjusts the layout of the two left-most panels so that the "rollout progress" panel doesn't require scrolling
  • it adjusts the order of the columns in the "pods count per version" panel so that the workload name is shown first
  • it fixes the issue where some workloads aren't shown on the dashboard (now all Deployments and StatefulSets in the selected namespace are shown)
  • it modifies the "rollout progress" panel to show the number of updated and ready pods for each workload (previously it just showed the number of updated pods)

This is what the dashboard looks like after all of these changes for a test cell at Grafana Labs:

Screenshot 2023-05-31 at 4 59 31 pm

Which issue(s) this PR fixes or relates to

(none)

Checklist

  • [n/a] Tests updated
  • [n/a] Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@charleskorn charleskorn marked this pull request as ready for review May 31, 2023 07:36
@charleskorn charleskorn requested a review from a team as a code owner May 31, 2023 07:36
@krajorama krajorama self-requested a review May 31, 2023 08:21
@krajorama
Copy link
Contributor

The code LGTM, although I'm no expert on dashboard json.
On the other hand the double bars in the rollout progress and seeing all deployment/statefulsets is more subjective question. I'd do a poll to be honest. Personally the double bars are too busy, hard on the eyes. And if someone has loki and tempo running in same namespace the whole thing could get very busy.

@charleskorn
Copy link
Contributor Author

On the other hand the double bars in the rollout progress and seeing all deployment/statefulsets is more subjective question. I'd do a poll to be honest. Personally the double bars are too busy, hard on the eyes.

I'm very open to feedback / alternative suggestions here.

I chose blue and green for the bars to avoid colours that suggest something is wrong (eg. red, orange or yellow), but I agree it can be difficult to read. I could try some different colours perhaps? Or split updated and healthy to separate panels, replacing the "unhealthy pods" panel?

And if someone has loki and tempo running in same namespace the whole thing could get very busy.

Is this something we recommend or support?

@krajorama
Copy link
Contributor

On the other hand the double bars in the rollout progress and seeing all deployment/statefulsets is more subjective question. I'd do a poll to be honest. Personally the double bars are too busy, hard on the eyes.

I'm very open to feedback / alternative suggestions here.

I chose blue and green for the bars to avoid colours that suggest something is wrong (eg. red, orange or yellow), but I agree it can be difficult to read. I could try some different colours perhaps? Or split updated and healthy to separate panels, replacing the "unhealthy pods" panel?

I kind of like the unhealthy pods panel, shows me exactly what its name suggests. Personally I don't need the extra bar for unhealthy anywhere. But again, that's just me. Maybe we should just make the change in our environment and get people's feedback.

And if someone has loki and tempo running in same namespace the whole thing could get very busy.

Is this something we recommend or support?

That's a good question, I don't think we document this anywhere.
By the way , in "Pods count per version" we only show rows where the version has 'r.*', is that correct? Who is this dashboard for in that case? People should be using stable versions, those are not versioned that way.

@charleskorn
Copy link
Contributor Author

I kind of like the unhealthy pods panel, shows me exactly what its name suggests. Personally I don't need the extra bar for unhealthy anywhere. But again, that's just me.

For me, showing the percentage of unhealthy pods puts the number of unhealthy pods in context - for example, seeing that there's five unhealthy queriers doesn't tell me if that's worthy of concern or not, but seeing if 10% or 80% of all queriers are unhealthy tells me if I need to worry about it.

By the way , in "Pods count per version" we only show rows where the version has 'r.*', is that correct? Who is this dashboard for in that case? People should be using stable versions, those are not versioned that way.

Are you referring to this override for columns that start with r? If so, that just seems to be a formatting override, the query still selects and shows all versions.

@flxbk
Copy link
Contributor

flxbk commented Jun 5, 2023

The CHANGELOG has just been cut to prepare for the next Mimir release. Please rebase main and eventually move the CHANGELOG entry added / updated in this PR to the top of the CHANGELOG document. Thanks!

@charleskorn charleskorn force-pushed the charleskorn/rollout-progress-dashboard-improvements branch from 142337d to 450f7c0 Compare June 6, 2023 01:28
@krajorama
Copy link
Contributor

I kind of like the unhealthy pods panel, shows me exactly what its name suggests. Personally I don't need the extra bar for unhealthy anywhere. But again, that's just me.

For me, showing the percentage of unhealthy pods puts the number of unhealthy pods in context - for example, seeing that there's five unhealthy queriers doesn't tell me if that's worthy of concern or not, but seeing if 10% or 80% of all queriers are unhealthy tells me if I need to worry about it.

I see. I'm not sure how easy it is to see with the bars combined. Possibly you could mention the percentage of failed pods in the unhealthy PODs part. So if 0% of pods are unhealthy then the component is not shown and if 20% failed, you'd list it as:
| component | 4 | 20% failed |
Anyway. As discussed IRL we could just try it out internally and get feedback. Please update with main, approving.

By the way , in "Pods count per version" we only show rows where the version has 'r.*', is that correct? Who is this dashboard for in that case? People should be using stable versions, those are not versioned that way.

Are you referring to this override for columns that start with r? If so, that just seems to be a formatting override, the query still selects and shows all versions.

Ah, misunderstood that override.

Copy link
Contributor

@krajorama krajorama left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, let's try it

@charleskorn charleskorn force-pushed the charleskorn/rollout-progress-dashboard-improvements branch from 450f7c0 to 30186d9 Compare June 7, 2023 06:21
@charleskorn charleskorn enabled auto-merge (squash) June 7, 2023 06:21
@charleskorn charleskorn merged commit 021c531 into main Jun 7, 2023
@charleskorn charleskorn deleted the charleskorn/rollout-progress-dashboard-improvements branch June 7, 2023 06:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants