From 98c97acac523af32a2f1545979b12f698f43882e Mon Sep 17 00:00:00 2001
From: Greg Neiheisel
Date: Fri, 12 Oct 2018 01:51:13 -0400
Subject: [PATCH] [AIRFLOW-3177] Change scheduler_heartbeat from gauge to
 counter (#4027)

This updates the scheduler_heartbeat metric from a gauge to a counter to
better support the statsd_exporter for usage with Prometheus. A counter
allows users to track the rate of the heartbeat, and integrates with the
exporter better. A crashing or down scheduler will no longer emit the
metric, but the statsd_exporter will continue to show a 1 for the metric
value. This fixes that issue because a counter will continually change,
and the lack of change indicates an issue with the scheduler.

Add statsd change notice in UPDATING.md
---
 UPDATING.md     | 4 ++++
 airflow/jobs.py | 2 +-
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/UPDATING.md b/UPDATING.md
index 74337f3fe88de..5e1402576b867 100644
--- a/UPDATING.md
+++ b/UPDATING.md
@@ -52,6 +52,10 @@ To delete a user:
 airflow users --delete --username jondoe
 ```
 
+### StatsD Metrics
+
+The `scheduler_heartbeat` metric has been changed from a gauge to a counter. Each loop of the scheduler will increment the counter by 1. This provides a higher degree of visibility and allows for better integration with Prometheus using the [StatsD Exporter](https://github.com/prometheus/statsd_exporter). Scheduler upness can be determined by graphing and alerting using a rate. If the scheduler goes down, the rate will drop to 0.
+
 ### Custom auth backends interface change
 
 We have updated the version of flask-login we depend upon, and as a result any
diff --git a/airflow/jobs.py b/airflow/jobs.py
index b224f755459ce..3922939a868a7 100644
--- a/airflow/jobs.py
+++ b/airflow/jobs.py
@@ -1895,7 +1895,7 @@ def process_file(self, file_path, pickle_dags=False, session=None):
 
     @provide_session
     def heartbeat_callback(self, session=None):
-        Stats.gauge('scheduler_heartbeat', 1, 1)
+        Stats.incr('scheduler_heartbeat', 1, 1)
 
 
 class BackfillJob(BaseJob):
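
The rate-based liveness check described in the UPDATING.md note can be sketched roughly as follows. This is only an illustration, assuming the counter has already been scraped into `(timestamp, value)` pairs. The names `counter_rate` and `scheduler_is_up` and the sample format are hypothetical and are not part of Airflow, StatsD, or the statsd_exporter. In a real deployment the monitoring system would normally compute the rate.

```python
from typing import List, Tuple

# Hypothetical sample format: (unix_timestamp_seconds, counter_value).
Sample = Tuple[float, float]


def counter_rate(samples: List[Sample]) -> float:
    """Average per-second increase between the first and last sample.

    Assumes samples are ordered by time. A decrease (for example after a
    counter reset) is treated as zero increase, which is a simplification.
    """
    if len(samples) < 2:
        return 0.0
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    if t1 <= t0:
        return 0.0
    return max(v1 - v0, 0.0) / (t1 - t0)


def scheduler_is_up(samples: List[Sample], min_rate: float = 0.0) -> bool:
    """Treat the scheduler as up only while the heartbeat counter keeps growing."""
    return counter_rate(samples) > min_rate
```

With a gauge, a stopped scheduler leaves the last exported value (1) in place, so the series looks identical to a healthy one. With a counter, the same situation appears as a flat series, so `counter_rate` returns 0 and `scheduler_is_up` returns `False`.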