-
Notifications
You must be signed in to change notification settings - Fork 167
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Clarifications in monitoring chapter (#1000)
* Clarifications in monitoring chapter * Update src/main/pages/che-7/administration-guide/assembly_monitoring-che.adoc Co-Authored-By: Florent BENOIT <[email protected]> * handle feedback from @skabashnyuk - add visibility to enabling and exposing che metrics * rework Viewing Che metrics on Grafana dashboards Co-authored-by: Florent BENOIT <[email protected]>
- Loading branch information
Showing
6 changed files
with
208 additions
and
216 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
10 changes: 10 additions & 0 deletions
10
...in/pages/che-7/administration-guide/proc_enabling-and-exposing-che-metrics.adoc
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
[id="enabling-and-exposing-{prod-id-short}-metrics_{context}"] | ||
= Enabling and exposing {prod-short} metrics | ||
|
||
This section describes how to enable and expose {prod-short} metrics. | ||
|
||
.Procedure | ||
|
||
. Set the `CHE_METRICS_ENABLED=true` environment variable | ||
|
||
. Expose the `8087` port as a service on the che-master host |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
169 changes: 14 additions & 155 deletions
169
.../che-7/administration-guide/proc_viewing-che-metrics-on-grafana-dashboards.adoc
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,174 +1,33 @@ | ||
[id="viewing-{prod-id-short}-metrics-on-grafana-dashboards_{context}"] | ||
= Viewing {prod-short} metrics on Grafana dashboards | ||
|
||
Grafana is used for informative representation of Prometheus metrics. Providing visibility for OpenShift, Grafana’s deployment configuration and ConfigMaps are located in the `che-monitoring.yaml` configuration file. | ||
This section describes how to view {prod-short} metrics on Grafana dashboards. | ||
|
||
.Prerequisites | ||
|
||
== Configuring and deploying Grafana | ||
* Prometheus is collecting metrics on the {prod-short} cluster. See xref:collecting-{prod-id-short}-metrics-with-prometheus_{context}[]. | ||
|
||
Grafana is run on port `3000` with a corresponding *service* and *route*. | ||
* Grafana 6.0 or above is running on port `3000` with a corresponding *service* and *route*. See link:https://grafana.com/docs/installation/[Installing Grafana]. | ||
|
||
Three ConfigMaps are used to configure Grafana: | ||
|
||
.Procedure | ||
|
||
. Deploy {prod-short}-specific dashboards on Grafana using the `che-monitoring.yaml` configuration file. | ||
+ | ||
Three ConfigMaps are used to configure Grafana: | ||
+ | ||
* `grafana-datasources` -- configuration for Grafana datasource, a Prometheus endpoint | ||
* `grafana-dashboards` -- configuration of Grafana dashboards and panels | ||
* `grafana-dashboard-provider` -- configuration of the Grafana dashboard provider API object, which tells Grafana where to look in the file system for pre-provisioned dashboards | ||
|
||
|
||
== Grafana dashboards overview | ||
|
||
{prod-short} provides several types of dashboards. | ||
|
||
|
||
=== {prod-short} server dashboard | ||
|
||
Use case: {prod-short} server-specific metrics related to {prod-short} components, such as workspaces or users. | ||
|
||
.The *General* panel | ||
image::monitoring/monitoring-che-che-server-dashboard-general-panel.png[] | ||
|
||
The *General* panel contains basic information, such as the total number of users and workspaces in the {prod-short} database. | ||
|
||
.The *Workspaces* panel | ||
image::monitoring/monitoring-che-che-server-dashboard-workspace-panel.png[link="{imagesdir}/monitoring/monitoring-che-che-server-dashboard-workspace-panel.png"] | ||
|
||
* *Workspace start rate* -- the ratio between successful and failed started workspaces | ||
* *Workspace stop rate* -- the ratio between successful and failed stopped workspaces | ||
* *Workspace Failures* -- the number of workspace failures shown on the graph | ||
* *Starting Workspaces* -- the gauge that shows the number of currently starting workspaces | ||
* *Average Workspace Start Time* -- 1-hour average of workspace starts or fails | ||
* *Average Workspace Stop Time* -- 1-hour average of workspace stops | ||
* *Running Workspaces* -- the gauge that shows the number of currently running workspaces | ||
* *Stopping Workspaces* -- the gauge that shows the number of currently stopping workspaces | ||
* *Workspaces started under 60 seconds* -- the percentage of workspaces started under 60 seconds | ||
* *Number of Workspaces* -- the number of workspaces created over time | ||
|
||
.The *Users* panel | ||
image::monitoring/monitoring-che-che-server-dashboard-users-panel.png[link="{imagesdir}/monitoring/monitoring-che-che-server-dashboard-users-panel.png"] | ||
|
||
* *Number of Users* -- the number of users known to {prod-short} over time | ||
|
||
|
||
.The *Tomcat* panel | ||
image::monitoring/monitoring-che-che-server-dashboard-tomcat-panel.png[link="{imagesdir}/monitoring/monitoring-che-che-server-dashboard-tomcat-panel.png"] | ||
|
||
* *Max number of active sessions* -- the max number of active sessions that have been active at the same time | ||
* *Number of current active sessions* -- the number of currently active sessions | ||
* *Total sessions* -- the total number of sessions | ||
* *Expired sessions* -- the number of sessions that have expired | ||
* *Rejected sessions* -- the number of sessions that were not created because the maximum number of active sessions was reached | ||
* *Longest time of an expired session* -- the longest time (in seconds) that an expired session had been alive | ||
|
||
.The *Request* panel | ||
image::monitoring/monitoring-che-che-server-dashboard-requests-panel.png[link="{imagesdir}/monitoring/monitoring-che-che-server-dashboard-requests-panel.png"] | ||
|
||
The *Requests* panel displays HTTP requests in a graph that shows the average number of requests per minute. | ||
|
||
.The *Executors* panel, part 1 | ||
image::monitoring/monitoring-che-che-server-dashboard-executors-panel-1.png[link="{imagesdir}/monitoring/monitoring-che-che-server-dashboard-executors-panel-1.png"] | ||
|
||
* *Threads running* - the number of threads that are not terminated aka alive. May include threads that are in a waiting or blocked state. | ||
* *Threads terminated* - the number of threads that was finished its execution. | ||
* *Threads created* - number of threads created by thread factory for given executor service. | ||
* *Created thread/minute* - Speed of thread creating for the given executor service. | ||
|
||
.The *Executors* panel, part 2 | ||
image::monitoring/monitoring-che-che-server-dashboard-executors-panel-2.png[link="{imagesdir}/monitoring/monitoring-che-che-server-dashboard-executors-panel-2.png"] | ||
|
||
* *Executor threads active* - number of threads that actively execute tasks. | ||
* *Executor pool size* - number of threads that actively execute tasks. | ||
* *Queued task* - the approximate number of tasks that are queued for execution | ||
* *Queued occupancy* - the percent of the queue used by the tasks that is waining for execution. | ||
|
||
.The *Executors* panel, part 3 | ||
image::monitoring/monitoring-che-che-server-dashboard-executors-panel-3.png[link="{imagesdir}/monitoring/monitoring-che-che-server-dashboard-executors-panel-3.png"] | ||
|
||
* *Rejected task* - the number of tasks that were rejected from execution. | ||
* *Rejected task/minute* - the speed of task rejections | ||
* *Completed tasks* - the number of completed tasks | ||
* *Completed tasks/minute* - the speed of task execution | ||
|
||
.The *Executors* panel, part 4 | ||
image::monitoring/monitoring-che-che-server-dashboard-executors-panel-4.png[link="{imagesdir}/monitoring/monitoring-che-che-server-dashboard-executors-panel-4.png"] | ||
|
||
* *Task execution seconds max* - 5min moving maximum of task execution | ||
* *Tasks execution seconds avg* - 1h moving average of task execution | ||
* *Executor idle seconds max* - 5min moving maximum of executor idle state. | ||
* *Executor idle seconds avg* - 1h moving average of executor idle state. | ||
|
||
.The *Traces* panel, part 1 | ||
image::monitoring/monitoring-che-che-server-dashboard-trace-panel-1.png[link="{imagesdir}/monitoring/monitoring-che-che-server-dashboard-trace-panel-1.png"] | ||
|
||
* *Workspace start Max* - maximum workspace start time | ||
* *Workspace start Avg* - 1h moving average of the workspace start time components | ||
* *Workspace stop Max* - maximum of workspace stop time | ||
* *Workspace stop Avg* - 1h moving average of the workspace stop time components | ||
|
||
.The *Traces* panel, part 2 | ||
image::monitoring/monitoring-che-che-server-dashboard-trace-panel-2.png[link="{imagesdir}/monitoring/monitoring-che-che-server-dashboard-trace-panel-2.png"] | ||
|
||
* *OpenShiftInternalRuntime#start Max* - maximum time of OpenShiftInternalRuntime#start operation | ||
* *OpenShiftInternalRuntime#start Avg* - 1h moving average time of OpenShiftInternalRuntime#start operation | ||
* *Plugin Brokering Execution Max* - maximum time of PluginBrokerManager#getTooling operation | ||
* *Plugin Brokering Execution Avg* - 1h moving average of PluginBrokerManager#getTooling operation | ||
|
||
.The *Traces* panel, part 3 | ||
image::monitoring/monitoring-che-che-server-dashboard-trace-panel-3.png[link="{imagesdir}/monitoring/monitoring-che-che-server-dashboard-trace-panel-3.png"] | ||
|
||
* *OpenShiftEnvironmentProvisioner#provision Max* - maximum time of OpenShiftEnvironmentProvisioner#provision operation | ||
* *OpenShiftEnvironmentProvisioner#provision Avg* -1h moving average of OpenShiftEnvironmentProvisioner#provision operation | ||
* *Plugin Brokering Execution Max* - maximum time of PluginBrokerManager#getTooling components execution time | ||
* *Plugin Brokering Execution Avg* - 1h moving average of time of PluginBrokerManager#getTooling components execution time | ||
|
||
.The *Traces* panel, part 4 | ||
image::monitoring/monitoring-che-che-server-dashboard-trace-panel-4.png[link="{imagesdir}/monitoring/monitoring-che-che-server-dashboard-trace-panel-4.png"] | ||
|
||
* *WaitMachinesStart Max* - maximim time of WaitMachinesStart operations | ||
* *WaitMachinesStart Avg* - 1h moving average time of WaitMachinesStart operations | ||
* *OpenShiftInternalRuntime#startMachines Max* - maximim time of OpenShiftInternalRuntime#startMachines operations | ||
* *OpenShiftInternalRuntime#startMachines Avg* - 1h moving average of the time of OpenShiftInternalRuntime#startMachines operations | ||
|
||
.The *Workspace detailed* panel | ||
image::monitoring/monitoring-che-che-server-dashboard-workspace-detailed-panel.png[link="{imagesdir}/monitoring/monitoring-che-che-server-dashboard-workspace-detailed-panel.png"] | ||
|
||
The *Workspace Detailed* panel contains heat maps, which illustrate the average time of workspace starts or fails. The row shows some period of time. | ||
|
||
|
||
=== {prod-short} server JVM dashboard | ||
|
||
Use case: JVM metrics of the {prod-short} server, such as JVM memory or classloading. | ||
|
||
.{prod-short} server JVM dashboard | ||
image::monitoring/monitoring-che-che-server-jvm-dashboard.png[link="{imagesdir}/monitoring/monitoring-che-che-server-jvm-dashboard.png"] | ||
|
||
.Quick Facts | ||
image::monitoring/monitoring-che-che-server-jvm-dashboard-quick-facts.png[link="{imagesdir}/monitoring/monitoring-che-che-server-jvm-dashboard-quick-facts.png"] | ||
|
||
.JVM Memory | ||
image::monitoring/monitoring-che-che-server-jvm-dashboard-jvm-memory.png[link="{imagesdir}/monitoring/monitoring-che-che-server-jvm-dashboard-jvm-memory.png"] | ||
|
||
.JVM Misc | ||
image::monitoring/monitoring-che-che-server-jvm-dashboard-jvm-misc.png[link="{imagesdir}/monitoring/monitoring-che-che-server-jvm-dashboard-jvm-misc.png"] | ||
|
||
.JVM Memory Pools (heap) | ||
image::monitoring/monitoring-che-che-server-jvm-dashboard-jvm-memory-pools-heap.png[link="{imagesdir}/monitoring/monitoring-che-che-server-jvm-dashboard-jvm-memory-pools-heap.png"] | ||
|
||
.JVM Memory Pools (Non-Heap) | ||
image::monitoring/monitoring-che-che-server-jvm-dashboard-jvm-memory-pools-non-heap.png[link="{imagesdir}/monitoring/monitoring-che-che-server-jvm-dashboard-jvm-memory-pools-non-heap.png"] | ||
|
||
.Garbage Collection | ||
image::monitoring/monitoring-che-che-server-jvm-dashboard-garbage-collection.png[link="{imagesdir}/monitoring/monitoring-che-che-server-jvm-dashboard-garbage-collection.png"] | ||
|
||
.Classloading | ||
image::monitoring/monitoring-che-che-server-jvm-dashboard-classloading.png[link="{imagesdir}/monitoring/monitoring-che-che-server-jvm-dashboard-classloading.png"] | ||
.Verification steps | ||
|
||
.Buffer Pools | ||
image::monitoring/monitoring-che-che-server-jvm-dashboard-buffer-pools.png[link="{imagesdir}/monitoring/monitoring-che-che-server-jvm-dashboard-buffer-pools.png"] | ||
* Use the Grafana console to view {prod-short} metrics. | ||
|
||
.Additional resources | ||
|
||
// [discrete] | ||
// == Additional resources | ||
// | ||
// * A bulleted list of links to other material closely related to the contents of the procedure module. | ||
// * For more details on writing procedure modules, see the link:https://github.com/redhat-documentation/modular-docs#modular-documentation-reference-guide[Modular Documentation Reference Guide]. | ||
// * Use a consistent system for file names, IDs, and titles. For tips, see _Anchor Names and File Names_ in link:https://github.com/redhat-documentation/modular-docs#modular-documentation-reference-guide[Modular Documentation Reference Guide]. | ||
* link:https://grafana.com/docs/installation/[Installing Grafana]. |
Oops, something went wrong.