Skip to content
This repository has been archived by the owner on Sep 30, 2024. It is now read-only.

monitoring: add networking dashboard for Zoekt pods #29481

Merged
merged 6 commits into from
Jan 7, 2022
Merged

monitoring: add networking dashboard for Zoekt pods #29481

merged 6 commits into from
Jan 7, 2022

Conversation

ggilmore
Copy link
Contributor

@ggilmore ggilmore commented Jan 6, 2022

This PR adds a new dashboard for Zoekt that displays various networking metrics (transmission/receive rate, errors, dropped packets).

Screen Shot 2022-01-06 at 1 29 44 PM

It also includes two other minor changes:

  • better documentation for how to use the monitoring stack while using a dataset from another instance (like k8s.sgdev.org)
  • tracked down and fixed a bug in our monitoring generator that caused our template variables (that we use for per-instance metrics) to be much more permissive than intended.

Notes:

  • This section is hidden by default.
  • Networking metrics are only available at the pod level, not the container level (all containers in a pod share networking resources [ip, ports, etc.] - see the original k8s networking design docs for more on this in the "container to container" and "pod to pod" sections. For us, this means that the networking metrics for zoekt-indexserver and zoekt-webserver are aggregated together. Given this, how do y'all feel about copying this same set of dashboards on zoekt-webserver's Grafana page)?
  • These dashboards only work on Kubernetes instances for now. The labels they rely on are only provided in kubernetes deployments, there is a different set of labels in docker-compose deployments. I think that's possible to get it working on both environments by looking into label rewriting, but I figured this wasn't a blocker for now. (gitserver's metrics have the same limitation).

@cla-bot cla-bot bot added the cla-signed label Jan 6, 2022
@ggilmore ggilmore changed the title monitoring: add hidden networking dashboard for Zoekt pods monitoring: add networking dashboard for Zoekt pods Jan 6, 2022
@ggilmore ggilmore marked this pull request as ready for review January 6, 2022 21:53
@ggilmore ggilmore requested a review from a team January 6, 2022 21:53
@sourcegraph-bot
Copy link
Contributor

sourcegraph-bot commented Jan 6, 2022

Notifying subscribers in CODENOTIFY files for diff d0a7cba...ec0cf3e.

Notify File(s)
@bobheadxi monitoring/definitions/shared/container.go
monitoring/definitions/shared/provisioning.go
monitoring/definitions/shared/shared.go
monitoring/definitions/zoekt_index_server.go
monitoring/monitoring/dashboards.go
monitoring/monitoring/monitoring.go
@slimsag monitoring/definitions/shared/container.go
monitoring/definitions/shared/provisioning.go
monitoring/definitions/shared/shared.go
monitoring/definitions/zoekt_index_server.go
monitoring/monitoring/dashboards.go
monitoring/monitoring/monitoring.go
@sourcegraph/delivery doc/admin/observability/dashboards.md
@sourcegraph/distribution monitoring/definitions/shared/container.go
monitoring/definitions/shared/provisioning.go
monitoring/definitions/shared/shared.go
monitoring/definitions/zoekt_index_server.go
monitoring/monitoring/dashboards.go
monitoring/monitoring/monitoring.go

doc/dev/how-to/monitoring_local_dev.md Outdated Show resolved Hide resolved
doc/dev/how-to/monitoring_local_dev.md Show resolved Hide resolved
monitoring/definitions/zoekt_index_server.go Outdated Show resolved Hide resolved
monitoring/monitoring/monitoring.go Show resolved Hide resolved
@tsenart
Copy link
Contributor

tsenart commented Jan 7, 2022

This PR adds a new dashboard for Zoekt that displays various networking metrics (transmission/receive rate, errors, dropped packets).

Looks great!

Given this, how do y'all feel about copying this same set of dashboards on zoekt-webserver's Grafana page)?

TBH, I think we should merge zoekt-indexserver and zoekt-webserver dashboards into one indexed-search dashboard.

These dashboards only work on Kubernetes instances for now. The labels they rely on are only provided in kubernetes deployments, there is a different set of labels in docker-compose deployments. I think that's possible to get it working on both environments by looking into label rewriting, but I figured this wasn't a blocker for now. (gitserver's metrics have the same limitation).

Yeah, it would be great to have this already supported. Do we have examples to follow? What's the level of effort for that?

@ggilmore
Copy link
Contributor Author

ggilmore commented Jan 7, 2022

@tsenart

Yeah, it would be great to have this already supported. Do we have examples to follow? What's the level of effort for that?

I'm not super familiar with label rewriting in Prometheus (although I know we do it for other things, so I can use that as prior art). Roughly the steps involved would be:

  • kubernetes: publishing a new label that has the "pod name" value (the existing name label has the container name, so we can't reuse that) for all cadvisor metrics (although it probably makes sense to do it for sourcegraph specific metrics as well so that we're consistent)
  • docker-compose: publish a new label that has the service name (e.g. indexed-search-0, etc.) - this name of this label has to be the same as the same as the one that we create in kubernetes so that we can use the same grafana query on both deployment types

I'd estimate that this could take a day or two (reading up on label rewriting, trial and error with getting a good workflow to be able to see changes on both deployment types quickly, etc.)

@ggilmore ggilmore merged commit 074bb28 into main Jan 7, 2022
@ggilmore ggilmore deleted the zoekt-io branch January 7, 2022 16:53
bobheadxi added a commit that referenced this pull request Feb 10, 2022
The generated Grafana 'all' value introduced in #29481 does not work very well for all cases, especially for OptionsQuery where we can get a huge result set.

This PR creates an explicit option, WildcardAllValue, along with docs on when and why to use it as well as why we don't use a wildcard by default.
davejrt pushed a commit that referenced this pull request Feb 15, 2022
The generated Grafana 'all' value introduced in #29481 does not work very well for all cases, especially for OptionsQuery where we can get a huge result set.

This PR creates an explicit option, WildcardAllValue, along with docs on when and why to use it as well as why we don't use a wildcard by default.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants