Merge pull request #44 from showuon/operatorMonitor
add prometheus integration for Flink operator
showuon authored Jan 7, 2025
2 parents 4fc9f57 + 8ae7e02 commit da4117b
Showing 12 changed files with 50 additions and 12 deletions.
3 changes: 3 additions & 0 deletions README.md
@@ -65,11 +65,14 @@ If you choose to do this make sure you update the `data-generator.yaml` file for
--set podSecurityContext=null \
--set defaultConfiguration."log4j-operator\.properties"=monitorInterval\=30 \
--set defaultConfiguration."log4j-console\.properties"=monitorInterval\=30 \
--set defaultConfiguration."flink-conf\.yaml"="kubernetes.operator.metrics.reporter.prom.factory.class\:\ org.apache.flink.metrics.prometheus.PrometheusReporterFactory
kubernetes.operator.metrics.reporter.prom.port\:\ 9249 " \
-n flink
```
Note:<br>
(1) Set `podSecurityContext` to null so that the operator can run in an OpenShift environment.<br>
(2) Set `monitorInterval` in the log4j properties files so that the log level of the operator and the job/task managers can be changed dynamically.<br>
(3) Set the metrics reporter to Prometheus for [further integration](prometheus-install/README.md).
### Running an example
24 changes: 15 additions & 9 deletions prometheus-install/README.md
@@ -6,15 +6,17 @@ After deploying Flink cluster, you can then deploy Prometheus to monitor the met

**Linux:**
```
sed -i s/OPERATOR/$(kubectl get pods -lapp.kubernetes.io/name=flink-kubernetes-operator -n flink -o=jsonpath="{range .items[*]}{.status.podIP}{','}{end}" | cut -d ',' -f1)/g prometheus-install/prometheus-config.yaml
sed -i s/JOB_MANAGER/$(kubectl get pods -lapp=recommendation-app -n flink -o=jsonpath="{range .items[*]}{.status.podIP}{','}{end}" | cut -d ',' -f1)/g prometheus-install/prometheus-config.yaml
sed -i s/TASK_MANAGER/$(kubectl get pods -lapp=recommendation-app -n flink -o=jsonpath="{range .items[*]}{.status.podIP}{','}{end}" | cut -d ',' -f2)/g prometheus-install/prometheus-config.yaml
```
**macOS:**
```
sed -i '' s/OPERATOR/$(kubectl get pods -lapp.kubernetes.io/name=flink-kubernetes-operator -n flink -o=jsonpath="{range .items[*]}{.status.podIP}{','}{end}" | cut -d ',' -f1)/g prometheus-install/prometheus-config.yaml
sed -i '' s/JOB_MANAGER/$(kubectl get pods -lapp=recommendation-app -n flink -o=jsonpath="{range .items[*]}{.status.podIP}{','}{end}" | cut -d ',' -f1)/g prometheus-install/prometheus-config.yaml
sed -i '' s/TASK_MANAGER/$(kubectl get pods -lapp=recommendation-app -n flink -o=jsonpath="{range .items[*]}{.status.podIP}{','}{end}" | cut -d ',' -f2)/g prometheus-install/prometheus-config.yaml
```
Note: Here we assume there's only 1 job manager and 1 task manager. If you deployed more than that, please update the `prometheus-config.yaml` file.
Note: This assumes there is only one Flink Kubernetes operator, one job manager, and one task manager. If you deployed more than that, update the `prometheus-config.yaml` file accordingly.

2. Install Prometheus, its configuration, and its service:
```
@@ -25,22 +27,26 @@ After deploying Flink cluster, you can then deploy Prometheus to monitor the met
```
kubectl port-forward svc/prometheus-service -n flink 9090
```
4. Now you can monitor the metrics of the job manager and task manager in the Prometheus UI, which is accessible at localhost:9090.
![img.png](job_metric.png)
![img.png](task_metric.png)
4. Now you can monitor the metrics of the Flink Kubernetes operator, job manager, and task manager in the Prometheus UI, which is accessible at localhost:9090.
![img.png](images/operator_metric.png)
![img.png](images/job_metric.png)
![img.png](images/task_metric.png)
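
The placeholder substitution in step 1 can be tried without a cluster. The following is a minimal sketch, not the repository's script: the IP addresses and the helper names `nth_ip` and `substitute` are hypothetical, and the `operator_ips` and `app_ips` strings stand in for the output of the `kubectl get pods ... -o=jsonpath` commands. It also shows one possible guard, refusing to substitute an empty value, for the case where no pod matched the label selector.

```shell
# Hypothetical stand-ins for the kubectl jsonpath output:
# pod IPs joined by commas, with a trailing comma.
operator_ips='10.0.0.4,'
app_ips='10.0.0.5,10.0.0.6,'

# Pick the Nth comma-separated field, as the README commands do with cut.
nth_ip() {
  printf '%s' "$1" | cut -d ',' -f"$2"
}

# Replace a placeholder ($1) with a value ($2) on stdin.
# Fails instead of silently writing an empty address.
substitute() {
  if [ -z "$2" ]; then
    echo "no IP found for $1" >&2
    return 1
  fi
  sed "s/$1/$2/g"
}

# The targets line from prometheus-config.yaml, before substitution.
targets="- targets: ['OPERATOR:9249', 'JOB_MANAGER:9249', 'TASK_MANAGER:9249']"

rendered=$(printf '%s\n' "$targets" \
  | substitute OPERATOR "$(nth_ip "$operator_ips" 1)" \
  | substitute JOB_MANAGER "$(nth_ip "$app_ips" 1)" \
  | substitute TASK_MANAGER "$(nth_ip "$app_ips" 2)")

echo "$rendered"
```

The sketch writes to standard output rather than editing the file in place, which sidesteps the `sed -i` difference between Linux and macOS noted above.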

# Integrate Prometheus into Flink cluster deployed on OpenShift

Since OpenShift already has a built-in Prometheus installed and configured, we can integrate with it by deploying a `PodMonitor` CR for the Flink cluster:

1. Install the pre-configured `PodMonitor` CR:
1. Install the pre-configured `PodMonitor`, `Service`, and `ServiceMonitor` resources:
```
oc apply -f prometheus-install/podmonitor_example/flink-monitor.yaml -n flink
oc apply -f prometheus-install/openshift_monitor_example -n flink
```
Note: This CR is configured to select the `FlinkDeployment` created as part of the `recommendation-app` example. Please update the `selector.matchLabels` field in `flink-monitor.yaml` if you are running a different example.
Note: These resources are configured to select the Flink Kubernetes operator and the
`FlinkDeployment` created as part of the `recommendation-app` example.
Please update the `selector.matchLabels` field in `flink-monitor.yaml` if you are running a different example.

2. It takes around 5 minutes for the Prometheus operator to update the configuration of the Prometheus server. After that, you can query the metrics in the OpenShift UI as described [here](https://docs.openshift.com/container-platform/4.16/observability/monitoring/managing-metrics.html#querying-metrics-for-all-projects-as-an-administrator_managing-metrics).
![img.png](openshift_jobmanager.png)
![img.png](openshift_taskmanager.png)
![img.png](images/openshift_operator.png)
![img.png](images/openshift_jobmanager.png)
![img.png](images/openshift_taskmanager.png)

File renamed without changes
Binary file added prometheus-install/images/operator_metric.png
File renamed without changes
@@ -0,0 +1,15 @@
# This Service is created for the ServiceMonitor: it exposes the Prometheus port for scraping.
# Unlike FlinkDeployment, the Flink Kubernetes operator cannot configure a custom container port, so this Service is needed.
apiVersion: v1
kind: Service
metadata:
  name: flink-operator-prometheus-service
  labels:
    app: flink-operator-prometheus-service
spec:
  ports:
    - port: 9249
      targetPort: 9249
      name: prom
  selector:
    app.kubernetes.io/name: flink-kubernetes-operator
@@ -1,9 +1,10 @@
# Scraping for job managers/task managers
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
name: flink-metrics
name: flink-pod-monitor
labels:
app: flink-monitor
app: flink-pod-monitor
spec:
selector:
matchLabels:
@@ -0,0 +1,13 @@
# Scraping for Flink Kubernetes operators
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: flink-service-monitor
spec:
  endpoints:
    - interval: 10s
      port: prom
      scheme: http
  selector:
    matchLabels:
      app: flink-operator-prometheus-service
2 changes: 1 addition & 1 deletion prometheus-install/prometheus-config.yaml
@@ -10,4 +10,4 @@ data:
scrape_configs:
- job_name: 'flink'
static_configs:
- targets: ['JOB_MANAGER:9249', 'TASK_MANAGER:9249']
- targets: ['OPERATOR:9249', 'JOB_MANAGER:9249', 'TASK_MANAGER:9249']
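
The static target list above holds one entry per pod, so a deployment with several job managers or task managers needs a longer list. A minimal sketch of generating such a list from a comma-separated IP string follows. The function name `build_targets` and the IP addresses are hypothetical; the port 9249 is the reporter port used in this commit.

```shell
# Turn a comma-separated IP list (an optional trailing comma is fine)
# into a Prometheus static_configs targets list.
# $1 = comma-separated IPs, $2 = metrics port
build_targets() {
  ips=$1
  port=$2
  out=''
  old_ifs=$IFS
  IFS=','
  for ip in $ips; do
    # Skip empty fields, such as the one after a trailing comma.
    if [ -z "$ip" ]; then
      continue
    fi
    if [ -n "$out" ]; then
      out="$out, "
    fi
    out="$out'$ip:$port'"
  done
  IFS=$old_ifs
  printf '[%s]\n' "$out"
}

# Hypothetical pod IPs: one operator, one job manager, two task managers.
build_targets '10.0.0.4,10.0.0.5,10.0.0.6,10.0.0.7,' 9249
```

The printed list can then be pasted after `- targets:` in `prometheus-config.yaml` in place of the placeholder entries.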
