Feature/istio metrics #4253

Merged
merged 29 commits into from
Nov 18, 2022
Commits
8db832b
Sample metrics for proxy sidecars
gsantoro Sep 15, 2022
c2ac3cd
expose metrics datastream, minor changes to docs
gsantoro Sep 15, 2022
16169bf
working version of proxy metrics
gsantoro Sep 21, 2022
f8b1771
filters + period
gsantoro Sep 21, 2022
91085a8
renamed metrics for proxy
gsantoro Sep 21, 2022
3986911
Metrics for Istiod
gsantoro Sep 21, 2022
1c236fa
new entry in changelog
gsantoro Sep 21, 2022
2a6097a
extra processors to clean events, sample events and missing docs
gsantoro Sep 23, 2022
61f1e34
minor changes to sample events and pipelines, dynamic fields for labe…
gsantoro Sep 23, 2022
9d1bbb2
fix issue with mappings
gsantoro Sep 23, 2022
cd34265
renaming metrics mappings
gsantoro Sep 23, 2022
b9daa35
Update packages/istio/data_stream/proxy_metrics/manifest.yml
gsantoro Sep 23, 2022
e377a92
only scrape prometheus metrics if scrape annotation is true
gsantoro Sep 26, 2022
c1ed41d
trying to setup a system test for istio
gsantoro Oct 12, 2022
45a9664
fixed system tests
gsantoro Oct 13, 2022
e899bb3
docs for testing
gsantoro Oct 14, 2022
6411126
new dashboards
gsantoro Nov 2, 2022
64a2987
new istio data views + renamed dashboards and visualizations
gsantoro Nov 2, 2022
a2dc801
remove custom labels from gauges
gsantoro Nov 3, 2022
8cb83b9
updated dashboards to make tests pass
gsantoro Nov 3, 2022
d7770ee
added info on markdown component
gsantoro Nov 3, 2022
c604e81
removed istio data views
gsantoro Nov 17, 2022
5e79ed2
remove suffix from visualisations
gsantoro Nov 18, 2022
8d20cc0
tagcloud -> piecharts
gsantoro Nov 18, 2022
3184b75
screenshots
gsantoro Nov 18, 2022
12adee5
changed the order of the screenshots and fixed their size
gsantoro Nov 18, 2022
7f8b11d
changed release to ga
gsantoro Nov 18, 2022
16a052c
align migration versions
gsantoro Nov 18, 2022
e0cc33e
migrated visualisation version to 8.4.0 to make it compatible with th…
gsantoro Nov 18, 2022
60 changes: 59 additions & 1 deletion packages/istio/_dev/build/docs/README.md
@@ -1,6 +1,6 @@
# Istio Integration

This integration ingest access logs created by the [Istio](https://istio.io/) service mesh.
This integration ingests access logs and metrics created by the [Istio](https://istio.io/) service mesh.

## Compatibility

@@ -15,3 +15,61 @@ The `access_logs` data stream collects Istio access logs.
{{event "access_logs"}}

{{fields "access_logs"}}

Contributor

Can you add a section on how to test it locally? Steps on how to install Istio and configure it until you see metrics in ELK?

Contributor Author

I don't think all the instructions on how to test this locally should end up in the official docs. It doesn't seem to be standard practice in other packages. In order to test it "locally" you need to set up all sorts of VMs, a K8s cluster, and an Elastic Stack with custom properties. I don't think this kind of information should end up in this README. For the Istio-specific steps to set this up, I am using the getting started guide at https://istio.io/latest/docs/setup/getting-started/, which is already documented in the issue.

Contributor

We tried to change this a little: have a look here https://github.com/elastic/integrations/tree/main/packages/nginx_ingress_controller/_dev/build#how-to-setup-and-test-ingress-controller-locally

In general, of course, not instructions to install all components; you can assume that the user already has ELK, K8s, etc. But a reference to the getting started guide with some additional hints on what to install would be great, I think.

Contributor Author

OK. I can add something similar, but fortunately there aren't many custom configs or manual steps that I needed for either logs or metrics.

Contributor

I think there are other packages that have additional README files in testing that describe in detail how it should be tested. So users of the package will not see it, but any package dev has easy access to it.

Contributor Author

@gsantoro gsantoro Oct 13, 2022

@ruflin reading your comment, I would suggest that a more appropriate place for testing docs is packages/istio/data_stream/istiod_metrics/_dev/test/system or anywhere else hidden in the test folders. If I understand correctly, adding these test docs here (at packages/istio/_dev/build/docs/README.md) will modify packages/istio/docs/README.md, which is used to expose the package docs to the package users.

I'm not a fan of mixing testing docs with official user facing docs where we usually document only the fields exposed by the integration.


## Metrics

### Istiod Metrics

The `istiod_metrics` data stream collects Istiod metrics.

{{event "istiod_metrics"}}

{{fields "istiod_metrics"}}

### Proxy Metrics

The `proxy_metrics` data stream collects Istio proxy metrics.

{{event "proxy_metrics"}}

{{fields "proxy_metrics"}}


## How to set up and test Istio locally

1. Set up a Kubernetes cluster. Since the Istio sample app requires a lot of RAM (> 10 GB), it's preferable to use a managed Kubernetes cluster (any cloud provider will do).
2. Set up an EK cluster on Elastic Cloud. Because the Istio sample app requires a lot of RAM, it's not feasible to run the Elastic cluster on your laptop via elastic-package. Alternatively, ECK can be used.
3. Start Elastic Agents on the Kubernetes cluster. The easiest way to achieve this is by using Fleet Server. You can find instructions [here](https://www.elastic.co/guide/en/fleet/master/running-on-kubernetes-managed-by-fleet.html).
4. Download the Istio CLI following the [instructions](https://istio.io/latest/docs/setup/getting-started/#download).
5. Install Istio following the [instructions](https://istio.io/latest/docs/setup/getting-started/#install). The namespace `default` is used with this basic installation. This is the same namespace where we are going to run the Istio sample app.
6. Deploy the sample application following the [instructions](https://istio.io/latest/docs/setup/getting-started/#bookinfo).
7. Open the application to external traffic and determine the ingress IP and ports. This step is slightly different depending on where Kubernetes is running. More info [here](https://istio.io/latest/docs/setup/getting-started/#ip) and [here](https://istio.io/latest/docs/setup/getting-started/#determining-the-ingress-ip-and-ports). The following commands should be enough to get this working.

```bash
kubectl apply -f samples/bookinfo/networking/bookinfo-gateway.yaml
istioctl analyze

# since we are using a cloud environment with an external load balancer
export INGRESS_HOST=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
export INGRESS_PORT=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.spec.ports[?(@.name=="http2")].port}')
export SECURE_INGRESS_PORT=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.spec.ports[?(@.name=="https")].port}')
export GATEWAY_URL=$INGRESS_HOST:$INGRESS_PORT
```

From the same terminal, run the following command to open the page in a browser (`open` is the macOS command; on Linux, `xdg-open` is the usual equivalent). This verifies that the sample application is reachable.

```bash
open "http://$GATEWAY_URL/productpage"
```

8. Generate some traffic to the sample application.


```bash
for i in $(seq 1 100); do curl -s -o /dev/null "http://$GATEWAY_URL/productpage"; done
```

9. (Optional) You can visualize the graph of microservices in the sample app following the [instructions](https://istio.io/latest/docs/setup/getting-started/#dashboard).
10. Add the Istio integration from the registry.
11. View logs and/or metrics from the Istio integration using the Discover tab and selecting the right data view.
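
The traffic-generation step (step 8) can also be scripted in Python when a per-status summary is useful. This is a minimal sketch and not part of the integration: the function names `fetch_status` and `generate_traffic` are hypothetical, and it assumes the gateway URL was determined as in step 7.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict


def fetch_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code of a GET request to url (0 if unreachable)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses are raised as exceptions; keep their status code.
        return err.code
    except OSError:
        # Connection failures and timeouts are counted under status 0.
        return 0


def generate_traffic(
    url: str,
    requests: int = 100,
    fetch: Callable[[str], int] = fetch_status,
) -> Dict[int, int]:
    """Send `requests` GET requests to url and count responses per status code."""
    counts: Dict[int, int] = {}
    for _ in range(requests):
        status = fetch(url)
        counts[status] = counts.get(status, 0) + 1
    return counts
```

For example, `generate_traffic("http://" + os.environ["GATEWAY_URL"] + "/productpage")` sends the same 100 requests as the `curl` loop; a result dominated by status 200 suggests the sample app is reachable.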
21 changes: 21 additions & 0 deletions packages/istio/_dev/deploy/docker/docker-compose.yml
@@ -0,0 +1,21 @@
version: '2.3'
services:
  istio_is_ready:
    image: tianon/true
    depends_on:
      istio:
        condition: service_healthy
  istio:
    image: nginx:alpine
    ports:
      - 8080
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf
      - ./istiod.txt:/www/data/metrics/istiod.txt
      - ./proxy.txt:/www/data/metrics/proxy.txt
    healthcheck:
      interval: 1s
      retries: 120
      timeout: 120s
      test: |-
        curl -f -s http://localhost:8080/metrics/ -o /dev/null
525 changes: 525 additions & 0 deletions packages/istio/_dev/deploy/docker/istiod.txt

Large diffs are not rendered by default.

17 changes: 17 additions & 0 deletions packages/istio/_dev/deploy/docker/nginx.conf
@@ -0,0 +1,17 @@
worker_processes 1;

events { worker_connections 1024; }

http {
    sendfile on;

    server {
        listen 8080;

        root /www/data;

        location /metrics {
            autoindex on;
        }
    }
}
961 changes: 961 additions & 0 deletions packages/istio/_dev/deploy/docker/proxy.txt

Large diffs are not rendered by default.

5 changes: 5 additions & 0 deletions packages/istio/changelog.yml
@@ -1,4 +1,9 @@
# newer versions go on top
- version: "0.2.0"
  changes:
    - description: Metrics for Istiod and Proxy sidecar container
      type: enhancement # can be one of: enhancement, bugfix, breaking-change
      link: https://github.com/elastic/integrations/pull/4253
- version: "0.1.0"
  changes:
    - description: Initial release
2 changes: 1 addition & 1 deletion packages/istio/data_stream/access_logs/manifest.yml
@@ -3,7 +3,7 @@ type: logs
release: experimental
streams:
  - input: filestream
    title: Collect Istio access logs
    title: Istio access logs
    description: Collect Istio access logs either in text or json format
    vars:
      - name: paths
5 changes: 5 additions & 0 deletions packages/istio/data_stream/access_logs/sample_event.json
@@ -1,5 +1,10 @@
{
    "@timestamp": "2022-07-20T09:52:24.955Z",
    "data_stream": {
        "namespace": "default",
        "type": "logs",
        "dataset": "istio.access_logs"
    },
    "destination": {
        "address": "10.68.2.10:9080",
        "ip": "10.68.2.10",
@@ -0,0 +1,8 @@
service: istio
data_stream:
  vars:
    period: 1s
    hosts:
      - "http://{{Hostname}}:8080"
    metrics_path: "/metrics/istiod.txt"
    condition: "true"
@@ -0,0 +1,20 @@
metricsets: ["collector"]
period: {{period}}
hosts:
{{#each hosts}}
  - {{this}}
{{/each}}
condition: {{ condition }}
{{#if metrics_path}}
metrics_path: {{metrics_path}}
{{/if}}
metrics_filters.exclude:
{{#each metrics_filters.exclude}}
  - {{this}}
{{/each}}
metrics_filters.include:
{{#each metrics_filters.include}}
  - {{this}}
{{/each}}
use_types: true
rate_counters: true
@@ -0,0 +1,34 @@
---
description: Pipeline for renaming object
processors:
  - remove:
      field:
        - metricset.name
        - service.address
        - service.type
      ignore_missing: true
  - set:
      field: ecs.version
      value: '8.4.0'
  - set:
      field: event.module
      value: istio
  - set:
      field: event.kind
      value: metric
  - rename:
      field: prometheus.labels
      target_field: istio.istiod.labels
      ignore_missing: true
  - set:
      field: istio.istiod.labels.job
      value: istio
      override: true
  - rename:
      field: prometheus
      target_field: istio.istiod.metrics
      ignore_missing: true
on_failure:
  - set:
      field: error.message
      value: '{{ _ingest.on_failure_message }}'
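
To make the effect of the pipeline above easier to follow, here is a minimal Python sketch that applies the same steps to a plain nested dictionary. It is only an illustration, not how Elasticsearch executes ingest pipelines: the helper names `pop_path`, `set_path`, and `run_pipeline` are hypothetical, and the `on_failure` handler is not modelled.

```python
from copy import deepcopy
from typing import Any, Dict, Optional


def pop_path(doc: Dict[str, Any], path: str) -> Optional[Any]:
    """Remove and return the value at a dotted path, or None if it is missing."""
    *parents, leaf = path.split(".")
    node: Any = doc
    for key in parents:
        node = node.get(key)
        if not isinstance(node, dict):
            return None
    return node.pop(leaf, None)


def set_path(doc: Dict[str, Any], path: str, value: Any) -> None:
    """Set the value at a dotted path, creating intermediate objects as needed."""
    *parents, leaf = path.split(".")
    node = doc
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = value


def run_pipeline(event: Dict[str, Any]) -> Dict[str, Any]:
    """Apply the remove/set/rename steps of the istiod_metrics pipeline."""
    doc = deepcopy(event)
    for path in ("metricset.name", "service.address", "service.type"):
        pop_path(doc, path)  # remove, ignore_missing: true
    set_path(doc, "ecs.version", "8.4.0")
    set_path(doc, "event.module", "istio")
    set_path(doc, "event.kind", "metric")
    # Labels are moved first, so they do not end up under istio.istiod.metrics.
    labels = pop_path(doc, "prometheus.labels")
    if labels is not None:
        set_path(doc, "istio.istiod.labels", labels)
    set_path(doc, "istio.istiod.labels.job", "istio")  # override: true
    metrics = pop_path(doc, "prometheus")
    if metrics is not None:
        set_path(doc, "istio.istiod.metrics", metrics)
    return doc
```

The order of the two renames matters: once `prometheus.labels` has been moved out, whatever remains under `prometheus` is metric data only, so the second rename yields a clean `istio.istiod.metrics` object.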
@@ -0,0 +1,18 @@
- name: data_stream.type
  type: constant_keyword
  description: Data stream type.
- name: data_stream.dataset
  type: constant_keyword
  description: Data stream dataset.
- name: data_stream.namespace
  type: constant_keyword
  description: Data stream namespace.
- name: '@timestamp'
  type: date
  description: Event timestamp.
- name: event.module
  type: constant_keyword
  description: Event module
- name: event.dataset
  type: constant_keyword
  description: Event dataset
8 changes: 8 additions & 0 deletions packages/istio/data_stream/istiod_metrics/fields/ecs.yml
@@ -0,0 +1,8 @@
- name: ecs.version
  external: ecs
- name: error.message
  external: ecs
- name: event.ingested
  external: ecs
- name: event.kind
  external: ecs
35 changes: 35 additions & 0 deletions packages/istio/data_stream/istiod_metrics/fields/fields.yml
@@ -0,0 +1,35 @@
- name: istio.istiod
  type: group
  fields:
    - name: labels.*
      type: object
      object_type: keyword
      description: |
        Istiod metric labels
- name: istio.istiod.metrics.*.value
  type: object
  object_type: double
  object_type_mapping_type: "*"
  description: >
    Istiod gauge metric

- name: istio.istiod.metrics.*.counter
  type: object
  object_type: double
  object_type_mapping_type: "*"
  description: >
    Istiod counter metric

- name: istio.istiod.metrics.*.rate
  type: object
  object_type: double
  object_type_mapping_type: "*"
  description: >
    Istiod rated counter metric

- name: istio.istiod.metrics.*.histogram
  type: object
  object_type: histogram
  object_type_mapping_type: "*"
  description: >-
    Istiod histogram metric
58 changes: 58 additions & 0 deletions packages/istio/data_stream/istiod_metrics/manifest.yml
@@ -0,0 +1,58 @@
title: "Istiod Metrics"
release: experimental
type: metrics
streams:
  - input: prometheus/metrics
    title: Istiod metrics
    description: Collect Istiod metrics
    vars:
      - name: period
        type: text
        title: Period
        multi: false
        required: true
        show_user: true
        default: 10s
      - name: hosts
        type: text
        title: Hosts
        multi: true
        required: true
        show_user: true
        default:
          - ${kubernetes.pod.ip}:15014
      - name: metrics_path
        type: text
        title: Metrics Path
        multi: false
        required: true
        show_user: true
        default:
          - /metrics
      - name: condition
        title: Condition
        description: Condition to filter when to apply this datastream
        type: text
        multi: false
        required: true
        show_user: true
        default: ${kubernetes.labels.app} == 'istiod' and ${kubernetes.annotations.prometheus.io/scrape} == 'true'
Member

@ChrsMark ChrsMark Nov 18, 2022

Doing a rehearsal from https://www.elastic.co/blog/istio-monitoring-with-elastic-observability, I remember that istiod could be accessed directly through its Service. For example:

Metricbeat config:

- module: istio 
  metricsets: ['istiod'] 
  period: 10s 
  hosts: ['istiod.istio-system:15014']

So in that case you don't need to automatically discover the Pods using a condition, since you can directly access the endpoint through the k8s Service:
istiod.istio-system:15014, where istiod is the name of the Service and istio-system is the name of the Namespace it belongs to.

Is this still the case, or has something changed?

Contributor

Also, any update on this, @gsantoro? The PR can be merged for now, with a small enhancement opened for this if you think you need time.

Contributor Author


I think I prefer to merge and then open a new issue for that.

Member

But is istiod exposed through a Service today? We need to clarify this before proceeding.

Also, how many istiod Pods are running per cluster? If it's only one, which is also exposed through a Service, then we had better use istiod.istio-system:15014 as the host endpoint along with a leader election condition, similar to what we have for cluster-level metrics at https://github.com/elastic/integrations/blob/main/packages/kubernetes/data_stream/apiserver/manifest.yml#L21.
If there is more than one istiod Pod per cluster (for high availability, maybe), then we should check whether they all keep and expose the same metrics or each of them keeps and provides different metrics (using a sharding mechanism, for example). In that case we could use the "autodiscovery" approach; otherwise, if the endpoint of the control plane metrics is unique, we should not use the "autodiscovery" approach. See below:

The way we have it now makes it more complicated and resource-consuming, since we use an "autodiscovery" approach when the endpoint is unique per cluster and static through a Service. This means that every time the istiod Pod's state is updated, we will trigger an event that this Pod is updated and we will re-launch the "autodiscovery" event. This happens at https://github.com/elastic/elastic-agent/blob/main/internal/pkg/composable/providers/kubernetes/pod.go#L218-L232, and if you follow the codebase you can see that the processing load is considerable.

Add to this the fact that "autodiscovery"-based inputs are harder to troubleshoot.
Consequently, it's not a suggested practice to use "autodiscovery" conditions if not really needed, and I would prefer fixing it here in the first place instead of having a follow-up.
Let me know if that makes sense or if I'm missing anything here.

      - name: metrics_filters.exclude
        type: text
        title: Metrics Filters Exclude
        multi: true
        required: false
        show_user: true
        default:
          - "^up$"
      - name: metrics_filters.include
        type: text
        title: Metrics Filters Include
        multi: true
        required: false
        show_user: true
        default:
          - "galley_*"
          - "pilot_*"
          - "citadel_*"
          - "istio_*"
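
The default `metrics_filters` values are easier to reason about with a small model of the filtering. The sketch below assumes the patterns are treated as unanchored regular expressions, with the include list applied first and the exclude list applied to what remains, which is how the Metricbeat Prometheus collector documentation describes them. The function name `keep_metric` is hypothetical.

```python
import re
from typing import Sequence

# Defaults copied from the manifest above.
DEFAULT_INCLUDE = ["galley_*", "pilot_*", "citadel_*", "istio_*"]
DEFAULT_EXCLUDE = ["^up$"]


def keep_metric(
    name: str,
    include: Sequence[str] = DEFAULT_INCLUDE,
    exclude: Sequence[str] = DEFAULT_EXCLUDE,
) -> bool:
    """Return True if a metric family name passes the include/exclude filters."""
    # Assumption: an empty include list means "include everything".
    if include and not any(re.search(pattern, name) for pattern in include):
        return False
    return not any(re.search(pattern, name) for pattern in exclude)
```

Under this assumption, `pilot_*` is a regular expression rather than a glob: it matches `pilot` followed by zero or more underscores anywhere in the name, so a name such as `my_pilot` would also pass. Anchoring a pattern (for example `^pilot_`) restricts it to a prefix match.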
55 changes: 55 additions & 0 deletions packages/istio/data_stream/istiod_metrics/sample_event.json
@@ -0,0 +1,55 @@
{
    "istio": {
        "istiod": {
            "metrics": {
                "pilot_xds_config_size_bytes": {
                    "histogram": {
                        "counts": [
                            0,
                            0,
                            0,
                            0,
                            0,
                            0,
                            0
                        ],
                        "values": [
                            0.5,
                            5000.5,
                            505000,
                            2500000,
                            7000000,
                            25000000,
                            70000000
                        ]
                    }
                }
            },
            "labels": {
                "instance": "10.124.0.8:15014",
                "type": "type.googleapis.com/envoy.config.route.v3.RouteConfiguration",
                "job": "istio"
            }
        }
    },
    "@timestamp": "2022-09-23T09:30:56.055Z",
    "ecs": {
        "version": "8.4.0"
    },
    "data_stream": {
        "namespace": "default",
        "type": "metrics",
        "dataset": "istio.istiod_metrics"
    },
    "metricset": {
        "period": 10000
    },
    "event": {
        "duration": 10806443,
        "agent_id_status": "verified",
        "kind": "metric",
        "ingested": "2022-09-23T09:30:57Z",
        "module": "istio",
        "dataset": "istio.istiod_metrics"
    }
}
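
The `values` array in the sample event above does not contain raw Prometheus bucket bounds. The numbers are consistent with bucket midpoints, with the `+Inf` bucket extrapolated from the width of the last finite bucket, which appears to be how the Metricbeat Prometheus collector converts histograms when `use_types` is enabled. The sketch below reproduces the sample values under that assumption. The bucket bounds are also an assumption, chosen because they yield exactly the values shown, and the function name `bucket_centroids` is hypothetical.

```python
import math
from typing import List, Sequence


def bucket_centroids(upper_bounds: Sequence[float]) -> List[float]:
    """Convert cumulative-bucket upper bounds into one representative value each."""
    values: List[float] = []
    last_upper = 0.0  # upper bound of the previous finite bucket
    prev_upper = 0.0  # upper bound of the bucket before that
    for upper in upper_bounds:
        if math.isinf(upper):
            # +Inf has no midpoint: extend by the width of the last finite bucket.
            values.append(last_upper + (last_upper - prev_upper))
        else:
            # Midpoint between the previous bound and this one.
            values.append(last_upper + (upper - last_upper) / 2.0)
            prev_upper = last_upper
            last_upper = upper
    return values


# Assumed bounds for pilot_xds_config_size_bytes (not taken from the PR).
ASSUMED_BOUNDS = [1, 10_000, 1_000_000, 4_000_000, 10_000_000, 40_000_000, math.inf]
```

Calling `bucket_centroids(ASSUMED_BOUNDS)` gives the seven entries of `values` in the sample event. The `counts` array has one entry per value; all zeros here would mean no new observations were recorded in the collection period, assuming counts are reported as per-period deltas.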
@@ -0,0 +1,8 @@
service: istio
data_stream:
  vars:
    period: 1s
    hosts:
      - "http://{{Hostname}}:8080"
    metrics_path: "/metrics/proxy.txt"
    condition: "true"