From 8adeb4bd10fcce1a34092f92d5889ea93cdf8ac8 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Fabrice=20Flore-Th=C3=A9bault?= Date: Fri, 20 Dec 2019 16:10:05 +0100 Subject: [PATCH 1/4] Clarifications in monitoring chapter --- .../assembly_monitoring-che.adoc | 40 ++---------------- ...ollecting-che-metrics-with-prometheus.adoc | 42 +++++++++++-------- ...ing-che-metrics-on-grafana-dashboards.adoc | 9 ++++ 3 files changed, 38 insertions(+), 53 deletions(-) diff --git a/src/main/pages/che-7/administration-guide/assembly_monitoring-che.adoc b/src/main/pages/che-7/administration-guide/assembly_monitoring-che.adoc index 1ddc625751..2eebb7c9ec 100644 --- a/src/main/pages/che-7/administration-guide/assembly_monitoring-che.adoc +++ b/src/main/pages/che-7/administration-guide/assembly_monitoring-che.adoc @@ -15,41 +15,9 @@ summary: :context: monitoring-che -{prod-short} can expose certain data as metrics, that can be processed by Prometheus -ifeval::["{project-context}" == "che"] -and Grafana stack -endif::[] -. - -Prometheus is a monitoring system, that maintains the collection of metrics - time series key-value data which can represent consumption of resources like CPU and memory, amount of processed HTTP queries and their execution time, and {prod-short} specific resources, such as number of users and workspaces, the start and shutdown of workspaces, information about JsonRPC stack. - -Prometheus is powered with a special query language, that allows manipulating the collected data, and perform various binary, vector and aggregation operations with it, to help create a more refined view on data. - -ifeval::["{project-context}" == "che"] -Grafana offers a front-end "facade" with tools to create a various visual representation in the form of dashboards with various panels and graph types. -endif::[] - -Note that this monitoring stack is not an official production-ready solution, but rather has an introduction purpose. 
- -.The structure of {prod-short} monitoring stack -image::monitoring/monitoring-che-stack-structure.png[link="{imagesdir}/monitoring/monitoring-che-stack-structure.png"] +This chapter describes how to use additional tools to process data exposed as metrics by {prod-short}. -[id="enabling-{prod-id-short}-metrics-collections"] -== Enabling {prod-short} metrics collections - -[id='prerequisites-{context}',discrete] -.Prerequisites - -* Installed Prometheus 2.9.1 or above. See more link:https://prometheus.io/docs/introduction/first_steps/[https://prometheus.io/docs/introduction/first_steps/]. -* Installed Grafana 6.0 or above. See more at link:https://grafana.com/docs/installation/[https://grafana.com/docs/installation/] - -.Procedure - -. Set the `CHE_METRICS_ENABLED=true` environment variable -. Expose the `8087` port as a service on the che-master host -. Configure Prometheus to scrape metrics from the `8087` port -. Configure a Prometheus data source on Grafana -. Deploy {prod-short}-specific dashboards on Grafana +The monitoring stack is not an official production-ready solution, but rather has an introduction purpose. 
include::proc_collecting-che-metrics-with-prometheus.adoc[leveloffset=+1] @@ -59,8 +27,8 @@ include::proc_viewing-che-metrics-on-grafana-dashboards.adoc[leveloffset=+1] include::proc_developing-grafana-dashboards.adoc[leveloffset=+1] -include::proc_extending-che-monitoring-metrics.adoc[leveloffset=+1] - endif::[] +include::proc_extending-che-monitoring-metrics.adoc[leveloffset=+1] + :context: {parent-context-of-monitoring-che} diff --git a/src/main/pages/che-7/administration-guide/proc_collecting-che-metrics-with-prometheus.adoc b/src/main/pages/che-7/administration-guide/proc_collecting-che-metrics-with-prometheus.adoc index 7a34d19480..b64b1486f8 100644 --- a/src/main/pages/che-7/administration-guide/proc_collecting-che-metrics-with-prometheus.adoc +++ b/src/main/pages/che-7/administration-guide/proc_collecting-che-metrics-with-prometheus.adoc @@ -1,27 +1,19 @@ [id="collecting-{prod-id-short}-metrics-with-prometheus_{context}"] = Collecting {prod-short} metrics with Prometheus -Prometheus is a monitoring system that collects metrics in real time and stores them in a time series database. +This section describes how to use the Prometheus monitoring system to collect, store and query metrics about {prod-short}. -Prometheus comes with a console accessible at the `9090` port of the application pod. By default, a template provides an existing *service* and a *route* to access it. It can be used to query and view metrics. +.Procedure -ifeval::["{project-context}" == "che"] -image::monitoring/monitoring-che-prometheus-console.png[link="{imagesdir}/monitoring/monitoring-che-prometheus-console.png"] -endif::[] +. Install Prometheus 2.9.1 or above. See more link:https://prometheus.io/docs/introduction/first_steps/[https://prometheus.io/docs/introduction/first_steps/]. -== Prometheus terminology +. Set the `CHE_METRICS_ENABLED=true` environment variable -Prometheus offers: +. 
Expose the `8087` port as a service on the che-master host -counter:: the simplest numerical type of metric whose value can be only increased. A typical example is counting the amount of HTTP requests that go through the system. - -gauge:: numerical value that can be increased or decreased. Best suited for representing values of objects. - -histogram:: a more complex metric that is suited for performing observations. Metrics are collected and grouped in configurable buckets, which allwos to present the results, for instance, in a form of a heatmap. - -== Configuring Prometheus - -.Prometheus configuration +. Configure Prometheus to scrape metrics from the `8087` port ++ +.Prometheus configuration example [source,yaml,subs="+attributes"] ---- - apiVersion: v1 @@ -33,10 +25,26 @@ histogram:: a more complex metric that is suited for performing observations. Me scrape_configs: - job_name: 'che' static_configs: - - targets: ['{prod-host}:8087'] + - targets: ['{prod-host}:8087'] <3> kind: ConfigMap metadata: name: prometheus-config ---- ++ <1> rate, at which a target is scraped <2> rate, at which recording and alerting rules are re-checked (not used in our system at the moment) +<3> scrape metrics from the `8087` port + +.Verification steps + +. Navigate to the Prometheus console accessible at the `9090` port of the application pod: `++http://++{prod-host}:9090/`. The default template provides an existing *service* and a *route* to access it. + +. Use the Prometheus console to query and view metrics. + +.Additional resources + +* link:https://prometheus.io/docs/prometheus/latest/configuration/configuration/[Configuring Prometheus]. + +* link:https://prometheus.io/docs/prometheus/latest/querying/basics/[Querying Prometheus]. + +* link:https://prometheus.io/docs/concepts/metric_types/[Prometheus metric types]. 
diff --git a/src/main/pages/che-7/administration-guide/proc_viewing-che-metrics-on-grafana-dashboards.adoc b/src/main/pages/che-7/administration-guide/proc_viewing-che-metrics-on-grafana-dashboards.adoc index 391e753cc0..e7bd0f3139 100644 --- a/src/main/pages/che-7/administration-guide/proc_viewing-che-metrics-on-grafana-dashboards.adoc +++ b/src/main/pages/che-7/administration-guide/proc_viewing-che-metrics-on-grafana-dashboards.adoc @@ -1,8 +1,17 @@ [id="viewing-{prod-id-short}-metrics-on-grafana-dashboards_{context}"] = Viewing {prod-short} metrics on Grafana dashboards +Grafana offers a front-end "facade" with tools to create a various visual representation in the form of dashboards with various panels and graph types. + Grafana is used for informative representation of Prometheus metrics. Providing visibility for OpenShift, Grafana’s deployment configuration and ConfigMaps are located in the `che-monitoring.yaml` configuration file. +.Procedure + +. Install Grafana 6.0 or above. See more at link:https://grafana.com/docs/installation/[https://grafana.com/docs/installation/] + +. Configure a Prometheus data source on Grafana + +. 
Deploy {prod-short}-specific dashboards on Grafana == Configuring and deploying Grafana From 740b76991cfe9dd7d537a520ef036a25febb97ce Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Fabrice=20Flore-Th=C3=A9bault?= Date: Mon, 23 Dec 2019 09:26:56 +0100 Subject: [PATCH 2/4] Update src/main/pages/che-7/administration-guide/assembly_monitoring-che.adoc Co-Authored-By: Florent BENOIT --- .../che-7/administration-guide/assembly_monitoring-che.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/main/pages/che-7/administration-guide/assembly_monitoring-che.adoc b/src/main/pages/che-7/administration-guide/assembly_monitoring-che.adoc index 2eebb7c9ec..2291fc498f 100644 --- a/src/main/pages/che-7/administration-guide/assembly_monitoring-che.adoc +++ b/src/main/pages/che-7/administration-guide/assembly_monitoring-che.adoc @@ -15,7 +15,7 @@ summary: :context: monitoring-che -This chapter describes how to use additional tools to process data exposed as metrics by {prod-short}. +This chapter describes how to use additional tools to process data exposed as metrics by {prod-short}. The monitoring stack is not an official production-ready solution, but rather has an introduction purpose. 
From 800495730622abcce05b170fc20bd2af589cc0f2 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Fabrice=20Flore-Th=C3=A9bault?= Date: Mon, 23 Dec 2019 11:12:51 +0100 Subject: [PATCH 3/4] handle feedback from @skabashnyuk - add visibility to enabling and exposing che metrics --- .../assembly_monitoring-che.adoc | 4 ++-- .../proc_collecting-che-metrics-with-prometheus.adoc | 12 +++++++----- .../proc_enabling-and-exposing-che-metrics.adoc | 10 ++++++++++ 3 files changed, 19 insertions(+), 7 deletions(-) create mode 100644 src/main/pages/che-7/administration-guide/proc_enabling-and-exposing-che-metrics.adoc diff --git a/src/main/pages/che-7/administration-guide/assembly_monitoring-che.adoc b/src/main/pages/che-7/administration-guide/assembly_monitoring-che.adoc index 2291fc498f..9e5cda8189 100644 --- a/src/main/pages/che-7/administration-guide/assembly_monitoring-che.adoc +++ b/src/main/pages/che-7/administration-guide/assembly_monitoring-che.adoc @@ -15,9 +15,9 @@ summary: :context: monitoring-che -This chapter describes how to use additional tools to process data exposed as metrics by {prod-short}. +This chapter describes how to configure {prod-short} to expose metrics and how to build an example monitoring stack with external tools to process data exposed as metrics by {prod-short}. -The monitoring stack is not an official production-ready solution, but rather has an introduction purpose. 
+include::proc_enabling-and-exposing-che-metrics.adoc[leveloffset=+1] include::proc_collecting-che-metrics-with-prometheus.adoc[leveloffset=+1] diff --git a/src/main/pages/che-7/administration-guide/proc_collecting-che-metrics-with-prometheus.adoc b/src/main/pages/che-7/administration-guide/proc_collecting-che-metrics-with-prometheus.adoc index b64b1486f8..eea9efa2b3 100644 --- a/src/main/pages/che-7/administration-guide/proc_collecting-che-metrics-with-prometheus.adoc +++ b/src/main/pages/che-7/administration-guide/proc_collecting-che-metrics-with-prometheus.adoc @@ -3,15 +3,15 @@ This section describes how to use the Prometheus monitoring system to collect, store and query metrics about {prod-short}. -.Procedure +.Prerequisites -. Install Prometheus 2.9.1 or above. See more link:https://prometheus.io/docs/introduction/first_steps/[https://prometheus.io/docs/introduction/first_steps/]. +* {prod-short} is exposing metrics on port `8087`. See xref:enabling-and-exposing-{prod-id-short}-metrics_{context}[]. -. Set the `CHE_METRICS_ENABLED=true` environment variable +* Prometheus 2.9.1 or above is running. Prometheus console is accessible at the 9090 port of the application pod: http://che-host:9090/. See more link:https://prometheus.io/docs/introduction/first_steps/[First steps with Prometheus]. -. Expose the `8087` port as a service on the che-master host +.Procedure -. Configure Prometheus to scrape metrics from the `8087` port +* Configure Prometheus to scrape metrics from the `8087` port + .Prometheus configuration example [source,yaml,subs="+attributes"] @@ -43,6 +43,8 @@ This section describes how to use the Prometheus monitoring system to collect, s .Additional resources +* link:https://prometheus.io/docs/introduction/first_steps/[First steps with Prometheus]. + * link:https://prometheus.io/docs/prometheus/latest/configuration/configuration/[Configuring Prometheus]. * link:https://prometheus.io/docs/prometheus/latest/querying/basics/[Querying Prometheus]. 
diff --git a/src/main/pages/che-7/administration-guide/proc_enabling-and-exposing-che-metrics.adoc b/src/main/pages/che-7/administration-guide/proc_enabling-and-exposing-che-metrics.adoc new file mode 100644 index 0000000000..024f86624e --- /dev/null +++ b/src/main/pages/che-7/administration-guide/proc_enabling-and-exposing-che-metrics.adoc @@ -0,0 +1,10 @@ +[id="enabling-and-exposing-{prod-id-short}-metrics_{context}"] += Enabling and exposing {prod-short} metrics + +This section describes how to enable and expose {prod-short} metrics. + +.Procedure + +. Set the `CHE_METRICS_ENABLED=true` environment variable + +. Expose the `8087` port as a service on the che-master host From dbb9f816f88c3377824ae677f1d360e200e28c11 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Fabrice=20Flore-Th=C3=A9bault?= Date: Mon, 23 Dec 2019 12:40:41 +0100 Subject: [PATCH 4/4] rework Viewing Che metrics on Grafana dashboards --- .../assembly_monitoring-che.adoc | 2 + ...ollecting-che-metrics-with-prometheus.adoc | 6 +- ...proc_extending-che-monitoring-metrics.adoc | 16 +- ...ing-che-metrics-on-grafana-dashboards.adoc | 174 ++---------------- .../ref_grafana-dashboards-for-che.adoc | 145 +++++++++++++++ 5 files changed, 169 insertions(+), 174 deletions(-) create mode 100644 src/main/pages/che-7/administration-guide/ref_grafana-dashboards-for-che.adoc diff --git a/src/main/pages/che-7/administration-guide/assembly_monitoring-che.adoc b/src/main/pages/che-7/administration-guide/assembly_monitoring-che.adoc index 9e5cda8189..10ee3318c3 100644 --- a/src/main/pages/che-7/administration-guide/assembly_monitoring-che.adoc +++ b/src/main/pages/che-7/administration-guide/assembly_monitoring-che.adoc @@ -25,6 +25,8 @@ ifeval::["{project-context}" == "che"] include::proc_viewing-che-metrics-on-grafana-dashboards.adoc[leveloffset=+1] +include::ref_grafana-dashboards-for-che.adoc[leveloffset=+1] + include::proc_developing-grafana-dashboards.adoc[leveloffset=+1] endif::[] diff --git 
a/src/main/pages/che-7/administration-guide/proc_collecting-che-metrics-with-prometheus.adoc b/src/main/pages/che-7/administration-guide/proc_collecting-che-metrics-with-prometheus.adoc index eea9efa2b3..7574214b01 100644 --- a/src/main/pages/che-7/administration-guide/proc_collecting-che-metrics-with-prometheus.adoc +++ b/src/main/pages/che-7/administration-guide/proc_collecting-che-metrics-with-prometheus.adoc @@ -7,7 +7,7 @@ This section describes how to use the Prometheus monitoring system to collect, s * {prod-short} is exposing metrics on port `8087`. See xref:enabling-and-exposing-{prod-id-short}-metrics_{context}[]. -* Prometheus 2.9.1 or above is running. Prometheus console is accessible at the 9090 port of the application pod: http://che-host:9090/. See more link:https://prometheus.io/docs/introduction/first_steps/[First steps with Prometheus]. +* Prometheus 2.9.1 or above is running. The Prometheus console is running on port `9090` with a corresponding *service* and *route*. See link:https://prometheus.io/docs/introduction/first_steps/[First steps with Prometheus]. .Procedure @@ -37,9 +37,7 @@ This section describes how to use the Prometheus monitoring system to collect, s .Verification steps -. Navigate to the Prometheus console accessible at the `9090` port of the application pod: `++http://++{prod-host}:9090/`. The default template provides an existing *service* and a *route* to access it. - -. Use the Prometheus console to query and view metrics. +* Use the Prometheus console to query and view metrics.
.Additional resources diff --git a/src/main/pages/che-7/administration-guide/proc_extending-che-monitoring-metrics.adoc b/src/main/pages/che-7/administration-guide/proc_extending-che-monitoring-metrics.adoc index e04e348825..de16300346 100644 --- a/src/main/pages/che-7/administration-guide/proc_extending-che-monitoring-metrics.adoc +++ b/src/main/pages/che-7/administration-guide/proc_extending-che-monitoring-metrics.adoc @@ -1,6 +1,8 @@ [id="extending-{prod-id-short}-monitoring-metrics_{context}"] = Extending {prod-short} monitoring metrics +This section describes how to create a metric or a group of metrics to extend the monitoring metrics that {prod-short} exposes. + There are two major modules for metrics: * `che-core-metrics-core` -- contains core metrics module @@ -9,10 +11,10 @@ There are two major modules for metrics: .Procedure -To create a metric or a group of metrics, you need a class that extends the `MeterBinder` class. This allows to register the created metric in the overriden `bindTo(MeterRegistry registry)` method. - +* Create a class that implements the `MeterBinder` interface. This allows you to register the created metric in the overridden `bindTo(MeterRegistry registry)` method. ++ The following is an example of a metric that has a function that supplies the value for it: - ++ .Example metric [source,java] ---- @@ -40,13 +42,11 @@ public class UserMeterBinder implements MeterBinder { } } ---- - ++ Alternatively, the metric can be stored with a reference and updated manually in some other place in the code.
.Additional resources -For more information about the types of metrics and naming conventions, visit Prometheus documentation: - -* link:https://prometheus.io/docs/practices/naming/[Naming practices] -* link:https://prometheus.io/docs/concepts/metric_types/[Metric types] +* link:https://prometheus.io/docs/practices/naming/[Metric and label naming for Prometheus] +* link:https://prometheus.io/docs/concepts/metric_types/[Metric types for Prometheus] diff --git a/src/main/pages/che-7/administration-guide/proc_viewing-che-metrics-on-grafana-dashboards.adoc b/src/main/pages/che-7/administration-guide/proc_viewing-che-metrics-on-grafana-dashboards.adoc index e7bd0f3139..1d697d5905 100644 --- a/src/main/pages/che-7/administration-guide/proc_viewing-che-metrics-on-grafana-dashboards.adoc +++ b/src/main/pages/che-7/administration-guide/proc_viewing-che-metrics-on-grafana-dashboards.adoc @@ -1,183 +1,33 @@ [id="viewing-{prod-id-short}-metrics-on-grafana-dashboards_{context}"] = Viewing {prod-short} metrics on Grafana dashboards -Grafana offers a front-end "facade" with tools to create a various visual representation in the form of dashboards with various panels and graph types. +This section describes how to view {prod-short} metrics on Grafana dashboards. -Grafana is used for informative representation of Prometheus metrics. Providing visibility for OpenShift, Grafana’s deployment configuration and ConfigMaps are located in the `che-monitoring.yaml` configuration file. +.Prerequisites -.Procedure - -. Install Grafana 6.0 or above. See more at link:https://grafana.com/docs/installation/[https://grafana.com/docs/installation/] - -. Configure a Prometheus data source on Grafana +* Prometheus is collecting metrics on the {prod-short} cluster. See xref:collecting-{prod-id-short}-metrics-with-prometheus_{context}[]. -. Deploy {prod-short}-specific dashboards on Grafana +* Grafana 6.0 or above is running on port `3000` with a corresponding *service* and *route*. 
See link:https://grafana.com/docs/installation/[Installing Grafana]. -== Configuring and deploying Grafana -Grafana is run on port `3000` with a corresponding *service* and *route*. +.Procedure +. Deploy {prod-short}-specific dashboards on Grafana using the `che-monitoring.yaml` configuration file. ++ Three ConfigMaps are used to configure Grafana: - ++ * `grafana-datasources` -- configuration for Grafana datasource, a Prometheus endpoint * `grafana-dashboards` -- configuration of Grafana dashboards and panels * `grafana-dashboard-provider` -- configuration of the Grafana dashboard provider API object, which tells Grafana where to look in the file system for pre-provisioned dashboards -== Grafana dashboards overview - -{prod-short} provides several types of dashboards. - - -=== {prod-short} server dashboard - -Use case: {prod-short} server-specific metrics related to {prod-short} components, such as workspaces or users. - -.The *General* panel -image::monitoring/monitoring-che-che-server-dashboard-general-panel.png[] - -The *General* panel contains basic information, such as the total number of users and workspaces in the {prod-short} database. 
- -.The *Workspaces* panel -image::monitoring/monitoring-che-che-server-dashboard-workspace-panel.png[link="{imagesdir}/monitoring/monitoring-che-che-server-dashboard-workspace-panel.png"] - -* *Workspace start rate* -- the ratio between successful and failed started workspaces -* *Workspace stop rate* -- the ratio between successful and failed stopped workspaces -* *Workspace Failures* -- the number of workspace failures shown on the graph -* *Starting Workspaces* -- the gauge that shows the number of currently starting workspaces -* *Average Workspace Start Time* -- 1-hour average of workspace starts or fails -* *Average Workspace Stop Time* -- 1-hour average of workspace stops -* *Running Workspaces* -- the gauge that shows the number of currently running workspaces -* *Stopping Workspaces* -- the gauge that shows the number of currently stopping workspaces -* *Workspaces started under 60 seconds* -- the percentage of workspaces started under 60 seconds -* *Number of Workspaces* -- the number of workspaces created over time - -.The *Users* panel -image::monitoring/monitoring-che-che-server-dashboard-users-panel.png[link="{imagesdir}/monitoring/monitoring-che-che-server-dashboard-users-panel.png"] - -* *Number of Users* -- the number of users known to {prod-short} over time - - -.The *Tomcat* panel -image::monitoring/monitoring-che-che-server-dashboard-tomcat-panel.png[link="{imagesdir}/monitoring/monitoring-che-che-server-dashboard-tomcat-panel.png"] - -* *Max number of active sessions* -- the max number of active sessions that have been active at the same time -* *Number of current active sessions* -- the number of currently active sessions -* *Total sessions* -- the total number of sessions -* *Expired sessions* -- the number of sessions that have expired -* *Rejected sessions* -- the number of sessions that were not created because the maximum number of active sessions was reached -* *Longest time of an expired session* -- the longest time (in seconds) that 
an expired session had been alive - -.The *Request* panel -image::monitoring/monitoring-che-che-server-dashboard-requests-panel.png[link="{imagesdir}/monitoring/monitoring-che-che-server-dashboard-requests-panel.png"] - -The *Requests* panel displays HTTP requests in a graph that shows the average number of requests per minute. - -.The *Executors* panel, part 1 -image::monitoring/monitoring-che-che-server-dashboard-executors-panel-1.png[link="{imagesdir}/monitoring/monitoring-che-che-server-dashboard-executors-panel-1.png"] - -* *Threads running* - the number of threads that are not terminated aka alive. May include threads that are in a waiting or blocked state. -* *Threads terminated* - the number of threads that was finished its execution. -* *Threads created* - number of threads created by thread factory for given executor service. -* *Created thread/minute* - Speed of thread creating for the given executor service. - -.The *Executors* panel, part 2 -image::monitoring/monitoring-che-che-server-dashboard-executors-panel-2.png[link="{imagesdir}/monitoring/monitoring-che-che-server-dashboard-executors-panel-2.png"] - -* *Executor threads active* - number of threads that actively execute tasks. -* *Executor pool size* - number of threads that actively execute tasks. -* *Queued task* - the approximate number of tasks that are queued for execution -* *Queued occupancy* - the percent of the queue used by the tasks that is waining for execution. - -.The *Executors* panel, part 3 -image::monitoring/monitoring-che-che-server-dashboard-executors-panel-3.png[link="{imagesdir}/monitoring/monitoring-che-che-server-dashboard-executors-panel-3.png"] - -* *Rejected task* - the number of tasks that were rejected from execution. 
-* *Rejected task/minute* - the speed of task rejections -* *Completed tasks* - the number of completed tasks -* *Completed tasks/minute* - the speed of task execution - -.The *Executors* panel, part 4 -image::monitoring/monitoring-che-che-server-dashboard-executors-panel-4.png[link="{imagesdir}/monitoring/monitoring-che-che-server-dashboard-executors-panel-4.png"] - -* *Task execution seconds max* - 5min moving maximum of task execution -* *Tasks execution seconds avg* - 1h moving average of task execution -* *Executor idle seconds max* - 5min moving maximum of executor idle state. -* *Executor idle seconds avg* - 1h moving average of executor idle state. - -.The *Traces* panel, part 1 -image::monitoring/monitoring-che-che-server-dashboard-trace-panel-1.png[link="{imagesdir}/monitoring/monitoring-che-che-server-dashboard-trace-panel-1.png"] - -* *Workspace start Max* - maximum workspace start time -* *Workspace start Avg* - 1h moving average of the workspace start time components -* *Workspace stop Max* - maximum of workspace stop time -* *Workspace stop Avg* - 1h moving average of the workspace stop time components - -.The *Traces* panel, part 2 -image::monitoring/monitoring-che-che-server-dashboard-trace-panel-2.png[link="{imagesdir}/monitoring/monitoring-che-che-server-dashboard-trace-panel-2.png"] - -* *OpenShiftInternalRuntime#start Max* - maximum time of OpenShiftInternalRuntime#start operation -* *OpenShiftInternalRuntime#start Avg* - 1h moving average time of OpenShiftInternalRuntime#start operation -* *Plugin Brokering Execution Max* - maximum time of PluginBrokerManager#getTooling operation -* *Plugin Brokering Execution Avg* - 1h moving average of PluginBrokerManager#getTooling operation - -.The *Traces* panel, part 3 -image::monitoring/monitoring-che-che-server-dashboard-trace-panel-3.png[link="{imagesdir}/monitoring/monitoring-che-che-server-dashboard-trace-panel-3.png"] - -* *OpenShiftEnvironmentProvisioner#provision Max* - maximum time of 
OpenShiftEnvironmentProvisioner#provision operation -* *OpenShiftEnvironmentProvisioner#provision Avg* -1h moving average of OpenShiftEnvironmentProvisioner#provision operation -* *Plugin Brokering Execution Max* - maximum time of PluginBrokerManager#getTooling components execution time -* *Plugin Brokering Execution Avg* - 1h moving average of time of PluginBrokerManager#getTooling components execution time - -.The *Traces* panel, part 4 -image::monitoring/monitoring-che-che-server-dashboard-trace-panel-4.png[link="{imagesdir}/monitoring/monitoring-che-che-server-dashboard-trace-panel-4.png"] - -* *WaitMachinesStart Max* - maximim time of WaitMachinesStart operations -* *WaitMachinesStart Avg* - 1h moving average time of WaitMachinesStart operations -* *OpenShiftInternalRuntime#startMachines Max* - maximim time of OpenShiftInternalRuntime#startMachines operations -* *OpenShiftInternalRuntime#startMachines Avg* - 1h moving average of the time of OpenShiftInternalRuntime#startMachines operations - -.The *Workspace detailed* panel -image::monitoring/monitoring-che-che-server-dashboard-workspace-detailed-panel.png[link="{imagesdir}/monitoring/monitoring-che-che-server-dashboard-workspace-detailed-panel.png"] - -The *Workspace Detailed* panel contains heat maps, which illustrate the average time of workspace starts or fails. The row shows some period of time. - - -=== {prod-short} server JVM dashboard - -Use case: JVM metrics of the {prod-short} server, such as JVM memory or classloading. 
- -.{prod-short} server JVM dashboard -image::monitoring/monitoring-che-che-server-jvm-dashboard.png[link="{imagesdir}/monitoring/monitoring-che-che-server-jvm-dashboard.png"] - -.Quick Facts -image::monitoring/monitoring-che-che-server-jvm-dashboard-quick-facts.png[link="{imagesdir}/monitoring/monitoring-che-che-server-jvm-dashboard-quick-facts.png"] - -.JVM Memory -image::monitoring/monitoring-che-che-server-jvm-dashboard-jvm-memory.png[link="{imagesdir}/monitoring/monitoring-che-che-server-jvm-dashboard-jvm-memory.png"] - -.JVM Misc -image::monitoring/monitoring-che-che-server-jvm-dashboard-jvm-misc.png[link="{imagesdir}/monitoring/monitoring-che-che-server-jvm-dashboard-jvm-misc.png"] - -.JVM Memory Pools (heap) -image::monitoring/monitoring-che-che-server-jvm-dashboard-jvm-memory-pools-heap.png[link="{imagesdir}/monitoring/monitoring-che-che-server-jvm-dashboard-jvm-memory-pools-heap.png"] -.JVM Memory Pools (Non-Heap) -image::monitoring/monitoring-che-che-server-jvm-dashboard-jvm-memory-pools-non-heap.png[link="{imagesdir}/monitoring/monitoring-che-che-server-jvm-dashboard-jvm-memory-pools-non-heap.png"] -.Garbage Collection -image::monitoring/monitoring-che-che-server-jvm-dashboard-garbage-collection.png[link="{imagesdir}/monitoring/monitoring-che-che-server-jvm-dashboard-garbage-collection.png"] -.Classloading -image::monitoring/monitoring-che-che-server-jvm-dashboard-classloading.png[link="{imagesdir}/monitoring/monitoring-che-che-server-jvm-dashboard-classloading.png"] +.Verification steps -.Buffer Pools -image::monitoring/monitoring-che-che-server-jvm-dashboard-buffer-pools.png[link="{imagesdir}/monitoring/monitoring-che-che-server-jvm-dashboard-buffer-pools.png"] +* Use the Grafana console to view {prod-short} metrics. +.Additional resources -// [discrete] -// == Additional resources -// -// * A bulleted list of links to other material closely related to the contents of the procedure module. 
-// * For more details on writing procedure modules, see the link:https://github.com/redhat-documentation/modular-docs#modular-documentation-reference-guide[Modular Documentation Reference Guide]. -// * Use a consistent system for file names, IDs, and titles. For tips, see _Anchor Names and File Names_ in link:https://github.com/redhat-documentation/modular-docs#modular-documentation-reference-guide[Modular Documentation Reference Guide]. +* link:https://grafana.com/docs/installation/[Installing Grafana]. \ No newline at end of file diff --git a/src/main/pages/che-7/administration-guide/ref_grafana-dashboards-for-che.adoc b/src/main/pages/che-7/administration-guide/ref_grafana-dashboards-for-che.adoc new file mode 100644 index 0000000000..e7efb46ae7 --- /dev/null +++ b/src/main/pages/che-7/administration-guide/ref_grafana-dashboards-for-che.adoc @@ -0,0 +1,145 @@ +[id="grafana-dashboards-for-{prod-id-short}_{context}"] += Grafana dashboards for {prod-short} + +This section describes the Grafana dashboards that display metrics collected from {prod-short}. + +.The *General* panel +image::monitoring/monitoring-che-che-server-dashboard-general-panel.png[link="{imagesdir}/monitoring/monitoring-che-che-server-dashboard-general-panel.png"] + +The *General* panel contains basic information, such as the total number of users and workspaces in the {prod-short} database.
+ +.The *Workspaces* panel +image::monitoring/monitoring-che-che-server-dashboard-workspace-panel.png[link="{imagesdir}/monitoring/monitoring-che-che-server-dashboard-workspace-panel.png"] + +* *Workspace start rate* -- the ratio between successfully started and failed workspace starts +* *Workspace stop rate* -- the ratio between successfully stopped and failed workspace stops +* *Workspace Failures* -- the number of workspace failures, shown as a graph +* *Starting Workspaces* -- the gauge that shows the number of currently starting workspaces +* *Average Workspace Start Time* -- the 1-hour average time of workspace starts or failures +* *Average Workspace Stop Time* -- the 1-hour average time of workspace stops +* *Running Workspaces* -- the gauge that shows the number of currently running workspaces +* *Stopping Workspaces* -- the gauge that shows the number of currently stopping workspaces +* *Workspaces started under 60 seconds* -- the percentage of workspaces started in under 60 seconds +* *Number of Workspaces* -- the number of workspaces created over time + +.The *Users* panel +image::monitoring/monitoring-che-che-server-dashboard-users-panel.png[link="{imagesdir}/monitoring/monitoring-che-che-server-dashboard-users-panel.png"] + +* *Number of Users* -- the number of users known to {prod-short} over time + + +.The *Tomcat* panel +image::monitoring/monitoring-che-che-server-dashboard-tomcat-panel.png[link="{imagesdir}/monitoring/monitoring-che-che-server-dashboard-tomcat-panel.png"] + +* *Max number of active sessions* -- the maximum number of sessions that have been active at the same time +* *Number of current active sessions* -- the number of currently active sessions +* *Total sessions* -- the total number of sessions +* *Expired sessions* -- the number of sessions that have expired +* *Rejected sessions* -- the number of sessions that were not created because the maximum number of active sessions was reached +* *Longest time of an expired session* -- the longest time (in seconds) that
an expired session had been alive + +.The *Request* panel +image::monitoring/monitoring-che-che-server-dashboard-requests-panel.png[link="{imagesdir}/monitoring/monitoring-che-che-server-dashboard-requests-panel.png"] + +The *Requests* panel displays HTTP requests in a graph that shows the average number of requests per minute. + +.The *Executors* panel, part 1 +image::monitoring/monitoring-che-che-server-dashboard-executors-panel-1.png[link="{imagesdir}/monitoring/monitoring-che-che-server-dashboard-executors-panel-1.png"] + +* *Threads running* - the number of threads that are not terminated aka alive. May include threads that are in a waiting or blocked state. +* *Threads terminated* - the number of threads that was finished its execution. +* *Threads created* - number of threads created by thread factory for given executor service. +* *Created thread/minute* - Speed of thread creating for the given executor service. + +.The *Executors* panel, part 2 +image::monitoring/monitoring-che-che-server-dashboard-executors-panel-2.png[link="{imagesdir}/monitoring/monitoring-che-che-server-dashboard-executors-panel-2.png"] + +* *Executor threads active* - number of threads that actively execute tasks. +* *Executor pool size* - number of threads that actively execute tasks. +* *Queued task* - the approximate number of tasks that are queued for execution +* *Queued occupancy* - the percent of the queue used by the tasks that is waining for execution. + +.The *Executors* panel, part 3 +image::monitoring/monitoring-che-che-server-dashboard-executors-panel-3.png[link="{imagesdir}/monitoring/monitoring-che-che-server-dashboard-executors-panel-3.png"] + +* *Rejected task* - the number of tasks that were rejected from execution. 
+* *Rejected task/minute* - the rate of task rejections per minute
+* *Completed tasks* - the number of completed tasks
+* *Completed tasks/minute* - the rate of task completions per minute
+
+.The *Executors* panel, part 4
+image::monitoring/monitoring-che-che-server-dashboard-executors-panel-4.png[link="{imagesdir}/monitoring/monitoring-che-che-server-dashboard-executors-panel-4.png"]
+
+* *Task execution seconds max* - 5min moving maximum of task execution time
+* *Tasks execution seconds avg* - 1h moving average of task execution time
+* *Executor idle seconds max* - 5min moving maximum of executor idle time
+* *Executor idle seconds avg* - 1h moving average of executor idle time
+
+.The *Traces* panel, part 1
+image::monitoring/monitoring-che-che-server-dashboard-trace-panel-1.png[link="{imagesdir}/monitoring/monitoring-che-che-server-dashboard-trace-panel-1.png"]
+
+* *Workspace start Max* - maximum workspace start time
+* *Workspace start Avg* - 1h moving average of the workspace start time components
+* *Workspace stop Max* - maximum workspace stop time
+* *Workspace stop Avg* - 1h moving average of the workspace stop time components
+
+.The *Traces* panel, part 2
+image::monitoring/monitoring-che-che-server-dashboard-trace-panel-2.png[link="{imagesdir}/monitoring/monitoring-che-che-server-dashboard-trace-panel-2.png"]
+
+* *OpenShiftInternalRuntime#start Max* - maximum time of the `OpenShiftInternalRuntime#start` operation
+* *OpenShiftInternalRuntime#start Avg* - 1h moving average time of the `OpenShiftInternalRuntime#start` operation
+* *Plugin Brokering Execution Max* - maximum time of the `PluginBrokerManager#getTooling` operation
+* *Plugin Brokering Execution Avg* - 1h moving average time of the `PluginBrokerManager#getTooling` operation
+
+.The *Traces* panel, part 3
+image::monitoring/monitoring-che-che-server-dashboard-trace-panel-3.png[link="{imagesdir}/monitoring/monitoring-che-che-server-dashboard-trace-panel-3.png"]
+
+* *OpenShiftEnvironmentProvisioner#provision Max* - maximum time of the `OpenShiftEnvironmentProvisioner#provision` operation
+* *OpenShiftEnvironmentProvisioner#provision Avg* - 1h moving average time of the `OpenShiftEnvironmentProvisioner#provision` operation
+* *Plugin Brokering Execution Max* - maximum execution time of the `PluginBrokerManager#getTooling` components
+* *Plugin Brokering Execution Avg* - 1h moving average execution time of the `PluginBrokerManager#getTooling` components
+
+.The *Traces* panel, part 4
+image::monitoring/monitoring-che-che-server-dashboard-trace-panel-4.png[link="{imagesdir}/monitoring/monitoring-che-che-server-dashboard-trace-panel-4.png"]
+
+* *WaitMachinesStart Max* - maximum time of `WaitMachinesStart` operations
+* *WaitMachinesStart Avg* - 1h moving average time of `WaitMachinesStart` operations
+* *OpenShiftInternalRuntime#startMachines Max* - maximum time of `OpenShiftInternalRuntime#startMachines` operations
+* *OpenShiftInternalRuntime#startMachines Avg* - 1h moving average time of `OpenShiftInternalRuntime#startMachines` operations
+
+.The *Workspace detailed* panel
+image::monitoring/monitoring-che-che-server-dashboard-workspace-detailed-panel.png[link="{imagesdir}/monitoring/monitoring-che-che-server-dashboard-workspace-detailed-panel.png"]
+
+The *Workspace Detailed* panel contains heat maps that illustrate the average time of workspace starts or failures. Each row represents a period of time.
+
+
+=== {prod-short} server JVM dashboard
+
+This dashboard shows JVM metrics of the {prod-short} server, such as JVM memory or classloading.
+
+.{prod-short} server JVM dashboard
+image::monitoring/monitoring-che-che-server-jvm-dashboard.png[link="{imagesdir}/monitoring/monitoring-che-che-server-jvm-dashboard.png"]
+
+.Quick Facts
+image::monitoring/monitoring-che-che-server-jvm-dashboard-quick-facts.png[link="{imagesdir}/monitoring/monitoring-che-che-server-jvm-dashboard-quick-facts.png"]
+
+.JVM Memory
+image::monitoring/monitoring-che-che-server-jvm-dashboard-jvm-memory.png[link="{imagesdir}/monitoring/monitoring-che-che-server-jvm-dashboard-jvm-memory.png"]
+
+.JVM Misc
+image::monitoring/monitoring-che-che-server-jvm-dashboard-jvm-misc.png[link="{imagesdir}/monitoring/monitoring-che-che-server-jvm-dashboard-jvm-misc.png"]
+
+.JVM Memory Pools (heap)
+image::monitoring/monitoring-che-che-server-jvm-dashboard-jvm-memory-pools-heap.png[link="{imagesdir}/monitoring/monitoring-che-che-server-jvm-dashboard-jvm-memory-pools-heap.png"]
+
+.JVM Memory Pools (Non-Heap)
+image::monitoring/monitoring-che-che-server-jvm-dashboard-jvm-memory-pools-non-heap.png[link="{imagesdir}/monitoring/monitoring-che-che-server-jvm-dashboard-jvm-memory-pools-non-heap.png"]
+
+.Garbage Collection
+image::monitoring/monitoring-che-che-server-jvm-dashboard-garbage-collection.png[link="{imagesdir}/monitoring/monitoring-che-che-server-jvm-dashboard-garbage-collection.png"]
+
+.Classloading
+image::monitoring/monitoring-che-che-server-jvm-dashboard-classloading.png[link="{imagesdir}/monitoring/monitoring-che-che-server-jvm-dashboard-classloading.png"]
+
+.Buffer Pools
+image::monitoring/monitoring-che-che-server-jvm-dashboard-buffer-pools.png[link="{imagesdir}/monitoring/monitoring-che-che-server-jvm-dashboard-buffer-pools.png"]
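The dashboard descriptions above rely on a few derived quantities: moving averages and moving maximums over a time window, per-minute rates of counters, queue occupancy, and the share of workspaces started in under 60 seconds. The following Python sketch illustrates that arithmetic on plain `(timestamp, value)` samples. It is an assumption-based illustration, not the queries the dashboards actually run: the function names, the sample format, and the half-open trailing window are all hypothetical.

```python
from typing import Iterable, List, Optional, Sequence, Tuple

# Hypothetical sample format: (Unix timestamp in seconds, observed value).
Sample = Tuple[float, float]


def in_window(samples: Iterable[Sample], now: float, width_s: float) -> List[float]:
    """Return the values whose timestamp lies in the trailing window (now - width_s, now]."""
    return [value for ts, value in samples if now - width_s < ts <= now]


def moving_average(samples: Iterable[Sample], now: float, width_s: float = 3600.0) -> Optional[float]:
    """Mean over a trailing window; the default width corresponds to a '1h moving average'."""
    values = in_window(samples, now, width_s)
    return sum(values) / len(values) if values else None


def moving_max(samples: Iterable[Sample], now: float, width_s: float = 300.0) -> Optional[float]:
    """Maximum over a trailing window; the default width corresponds to a '5min moving maximum'."""
    values = in_window(samples, now, width_s)
    return max(values) if values else None


def per_minute_rate(counter: Sequence[Sample]) -> Optional[float]:
    """Average increase per minute of a monotonically increasing counter
    between its first and last sample. Counter resets are NOT handled."""
    if len(counter) < 2:
        return None
    (t0, v0), (t1, v1) = counter[0], counter[-1]
    if t1 <= t0:
        return None
    return (v1 - v0) / (t1 - t0) * 60.0


def queue_occupancy_percent(queued: int, capacity: int) -> float:
    """Percentage of the queue capacity used by tasks waiting for execution."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return 100.0 * queued / capacity


def percent_started_under(durations_s: Iterable[float], threshold_s: float = 60.0) -> Optional[float]:
    """Percentage of workspace start durations strictly below the threshold."""
    durations = list(durations_s)
    if not durations:
        return None
    fast = sum(1 for d in durations if d < threshold_s)
    return 100.0 * fast / len(durations)


if __name__ == "__main__":
    samples = [(0.0, 2.0), (1800.0, 4.0), (3600.0, 6.0)]
    print(moving_average(samples, now=3600.0))  # 5.0: the window (0, 3600] holds 4.0 and 6.0
    print(moving_max(samples, now=3600.0))  # 6.0
    print(per_minute_rate([(0.0, 10.0), (120.0, 40.0)]))  # 15.0
    print(queue_occupancy_percent(5, 20))  # 25.0
    print(percent_started_under([30.0, 45.0, 90.0, 120.0]))  # 50.0
```

In Prometheus, comparable values are normally computed server-side with PromQL functions such as `avg_over_time`, `max_over_time`, and `rate`. Unlike this sketch, `rate` returns a per-second value and compensates for counter resets, so the numbers on a real dashboard can differ from this simplified arithmetic.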