Support ingesting pg_settings for dbm users #14577
Conversation
DEFAULT_COLLECTION_INTERVAL = 30

PG_SETTINGS_QUERY = """
SELECT name, setting, category, short_desc FROM pg_settings
For productionizing: how expensive is this query? Hopefully not much, but the concern I have is that if the agent is restarting frequently, the 30m frequency will actually end up being more like a 15s frequency between restarts. So is the 30m a performance optimization at all?

(As noted in chat, only name and setting are necessary.)
I think this stuff is stored in the system catalogs, which requires a disk read, but then it should (in theory) be served from memory?

Seems OK to run this query once every 15s. I only made the interval so long because these settings aren't changed very often.
Kk, let's just review the explain plan and latency before merge, but otherwise sounds good to me.
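One way to eyeball the plan and latency before merge (a minimal sketch, assuming psycopg2 and a hypothetical local DSN; not part of this PR):

```python
import time

import psycopg2  # assumed available; any Postgres driver works the same way

conn = psycopg2.connect("dbname=postgres user=datadog")  # hypothetical DSN
with conn.cursor() as cur:
    # EXPLAIN ANALYZE prints the actual plan and execution time
    cur.execute("EXPLAIN ANALYZE SELECT name, setting FROM pg_settings")
    for (line,) in cur.fetchall():
        print(line)

    # rough wall-clock latency for the real query
    start = time.time()
    cur.execute("SELECT name, setting FROM pg_settings")
    cur.fetchall()
    print("latency: %.1f ms" % ((time.time() - start) * 1000))
conn.close()
```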
from datadog_checks.base.utils.tracking import tracked_method

# default collection interval in minutes
DEFAULT_COLLECTION_INTERVAL = 30
I know this is not frequently updating data, but 30 minutes is so long. Imagine looking at a 1-hour graph and having no idea where the change actually happened. 10m is the minimum I would declare for any data we send unless the query is very expensive. I think 5m would also be reasonable.
yeah @joelmarcotte mentioned the same thing, we can decrease it to 5m
def report_postgres_settings(self):
    rows = self._collect_postgres_settings()
    if rows:
        event = {
Highly, highly recommend including the postgres version in this payload since settings constants hardcoded on the backend/frontend will differ across versions.
ah yes i meant to do that, thanks for the reminder
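For illustration, the payload could carry the version alongside the rows. A hypothetical sketch; the key names are assumptions, not the PR's final schema:

```python
import time


def build_settings_event(host, version, rows):
    # hypothetical payload shape, for illustration only
    return {
        "host": host,
        "kind": "pg_settings",
        # the server version lets settings constants hardcoded on the
        # backend/frontend be matched to the right Postgres release
        "postgres_version": version,
        "timestamp": time.time() * 1000,
        "rows": rows,  # e.g. [{"name": "max_connections", "setting": "100"}, ...]
    }
```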
@@ -427,6 +427,22 @@ files:
    value:
      type: number
      example: 3500
  - name: database_metadata
Why is this nested? What does "metadata" mean to a customer?
I was thinking we'd have one config section for settings & schema, but this job will also emit resources. Do you have a better idea of how these settings should look? I'm not attached to this at all.
I honestly find the term “database_metadata” confusing when used as an external configuration API. Am I alone? Why not just top-level:

collect_settings:
  enabled: true
  collection_interval: 600

which follows the spec for query metrics and samples. https://xkcd.com/927/
      example: true
  - name: settings_collection_interval
    description: |
      Set the database settings collection interval (in minutes). Each collection involves a single query to
      `pg_settings`.
> in minutes

None of the other collection intervals are in minutes; this seems confusing to customers.
yeah I did minutes bc ... writing 300 for the default felt weird, but I can change it to seconds
from datadog_checks.base.utils.tracking import tracked_method

# default collection interval in minutes
DEFAULT_COLLECTION_INTERVAL = 5
Small python nit: adding a unit suffix (`_ms`, `_seconds`, `_min`) is appreciated if we're not standardizing on seconds (but I suggest aligning on seconds).
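One way the nit could look (illustrative only):

```python
# the suffix makes the unit unambiguous at every call site
DEFAULT_COLLECTION_INTERVAL_SECONDS = 600
```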
)

# by default, send resources every 5 minutes
self.collection_interval = min(300, self.pg_settings_collection_interval)
What's the purpose of `min` here? Do we not trust the configuration?
This sets how often the check will run: if the user wants to collect settings more often than the default check iteration we have for sending resources, then we should do that.
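In other words, the job loop ticks at the faster of the two cadences, and each tick decides what is due. A simplified model of that logic (hypothetical names, not the PR's actual class):

```python
import time

RESOURCE_INTERVAL_S = 300  # default cadence for sending resources


class MetadataJob:
    def __init__(self, settings_interval_s):
        self.settings_interval_s = settings_interval_s
        # tick often enough to honor whichever interval is shorter
        self.collection_interval = min(RESOURCE_INTERVAL_S, settings_interval_s)
        self._last_settings_query = 0.0

    def run_once(self):
        # resources go out on every tick; settings only when their
        # own (possibly longer) interval has elapsed
        if time.time() - self._last_settings_query >= self.settings_interval_s:
            self._last_settings_query = time.time()
            print("collecting pg_settings")  # stand-in for the real collection
```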
  - name: collect_settings
    description: |
      Enable collection of pg_settings. Requires `dbm: true`.
    value:
      type: boolean
      example: false
  - name: settings_collection_interval
    description: |
      Set the database settings collection interval (in seconds). Each collection involves a single query to
      `pg_settings`.
    value:
      type: number
      example: 600
Let's make these hidden to avoid early configuration and customers copying configs and messing themselves up later down the road.
> Let's make these hidden to avoid early configuration and customers copying configs and messing themselves up later down the road.

if we do this #14577 (comment), why do you assume we are going to change this? Also, it's an alpha feature, so it seems fair that we could break the config at some point if we want to?
elapsed_s = time.time() - self._time_since_last_settings_query
if elapsed_s >= self.pg_settings_collection_interval and self._collect_pg_settings_enabled:
    self._pg_settings_cached = self._collect_postgres_settings()
Nice!
@@ -716,6 +718,7 @@ def check(self, _):
        if self._config.dbm_enabled:
            self.statement_metrics.run_job_loop(tags)
            self.statement_samples.run_job_loop(tags)
            self.metadata_samples.run_job_loop(tags)
Not saying anything new, but we are adding yet another open connection for the datadog agent for this job loop. Let's make sure this is the last one and this class is used for all additional data not collected as part of the collection interval.
Send pg_settings to dbm backend (16f75ab)
Co-authored-by: Alex Normand <[email protected]>
What does this PR do?
This feature adds a new job, `database-metadata`, which requires `dbm`. The job has two main features: it collects `pg_settings` and forwards it to the backend to be displayed in the DBM UI.

An example of the settings in our UI is here:

Motivation
There are a number of relevant use cases for this, for example: