Make metricbeat sql module more versatile #22779

dagwieers · 2020-11-27T17:27:27Z

The metricbeat sql module is a lifesaver, we are very happy with this inclusion. \o/

But we think there are a few features that could make this even better.

Support a unique identifier for `_id`

We like to perform queries and updates docs in Elastic. For this we would need to provide the _id value, so it would be nice if we could configure which field is to be used as the _id value. This would allow us to have a rolling window for updating documents (e.g. every hour process the aggregated data for the previous X hours) and have it update those entries in Elastic.

I may be possible to related this to the primary key of a table. But a configurable column would be more convenient IMO

Support an identifier for the `@timestamp`

We already use a timestamp provided in the query as the timeField in the Index pattern, and this works very well already. But I think it could be useful to have the default @timestamp selected from the query as well so this works out of the box.

For metrics at the time of the query this offers no value, but for time-based aggregations in tables, having a way to influence @timestamp without the need to have ingest pipelines would be very useful.

Storing and using the timestamp of the last successful ingest

We are processing aggregated data grouped by hour, the first time we would like to get to the historical information (all information), but subsequent queries should only cover the most recent windows. In fact, if there would be a variable that would hold the timestamp of the last queried/ingested time we could use this to calculate the timeframe we need to process. In this case if we miss to run the queries in a timely fashion (agent was down) it would pick up where it left off (just like filebeat does when processing logfiles).

cc @amandahla @jsoriano

Originally posted by @dagwieers in #13257 (comment)

The text was updated successfully, but these errors were encountered:

elasticmachine · 2020-11-30T06:16:28Z

Pinging @elastic/integrations-services (Team:Services)

halfa · 2020-11-30T12:28:29Z

This usage (getting data from an SQL table in the form of logs) would be better in filebeat rather than metricsbeat, as the spirit is closer to how log collection operate (have a single instance of each record) rather than metrics (collection of a sensors at a certain point in time).

jsoriano · 2020-11-30T13:16:06Z

The metricbeat sql module is a lifesaver, we are very happy with this inclusion. \o/

Thanks for your feedback, happy to see you like this module 🙂

This usage (getting data from an SQL table in the form of logs) would be better in filebeat rather than metricsbeat, as the spirit is closer to how log collection operate (have a single instance of each record) rather than metrics (collection of a sensors at a certain point in time).

Yeah, I tend to agree with this, the features requested here can be problematic in Metricbeat (specially the one about setting the @timestamp, and the one about the last successful ingest). They may be more suitable as a filebeat input, that has a registry to continue from a given timestamp. Or for this use case, maybe logstash could be used, with the JDBC input plugin.

Regarding the support for a unique identifier: something like this may be already possible by using the fingerprint processor (not tested). It can create a unique fingerprint based on a list of other fields, then you could rename it to @metadata.id, so it is used as the id of the document.
But even if possible, it can have problems. Indexes created by default with Metricbeat are not intended for this kind of documents, these indexes are rotated using ILM, and on rotation, documents with the same _id can exist on different indexes.
Also, there are some discussions about allowing indexes without _id (see elastic/elasticsearch#48699), in the future indexes/data_streams storing time series might not have _id at all.

So I don't think this should be supported in the sql module (or any other Metricbeat module), but there are workarounds that may already work using processors and custom indexes, or using Logstash.

dagwieers · 2020-11-30T15:32:46Z

@jsoriano I get your point wrt. the id not being unique beyond the index used, but that would be a concern for the specific implementation. For some use-cases there would be no roll-over, and in our case we would also push only X historical records (e.g. the past 6 hours) so that a restart of the agent would recover from this kind of downtime.

But I agree with @halfa, that filebeat would be a better place to have such an sql module.

BTW We use a different timestamp in the definition of the index pattern and that works fine as well.

botelastic · 2022-01-27T13:48:08Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

botelastic · 2023-03-31T08:55:45Z

Hi!
We just realized that we haven't looked into this issue in a while. We're sorry!

We're labeling this issue as Stale to make it hit our filters and make sure we get back to it as soon as possible. In the meantime, it'd be extremely helpful if you could take a look at it as well and confirm its relevance. A simple comment with a nice emoji will be enough :+1.
Thank you for your contribution!

botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Nov 27, 2020

andresrc added the Team:Services (Deprecated) Label for the former Integrations-Services team label Nov 30, 2020

botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Nov 30, 2020

andresrc added the enhancement label Dec 1, 2020

botelastic bot added the Stalled label Jan 27, 2022

jlind23 added Team:Service-Integrations Label for the Service Integrations team and removed Team:Services (Deprecated) Label for the former Integrations-Services team labels Mar 31, 2022

botelastic bot removed the Stalled label Mar 31, 2022

botelastic bot added the Stalled label Mar 31, 2023

botelastic bot closed this as completed Sep 27, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make metricbeat sql module more versatile #22779

Make metricbeat sql module more versatile #22779

dagwieers commented Nov 27, 2020 •

edited

Loading

elasticmachine commented Nov 30, 2020

halfa commented Nov 30, 2020

jsoriano commented Nov 30, 2020

dagwieers commented Nov 30, 2020

botelastic bot commented Jan 27, 2022

botelastic bot commented Mar 31, 2023

Make metricbeat sql module more versatile #22779

Make metricbeat sql module more versatile #22779

Comments

dagwieers commented Nov 27, 2020 • edited Loading

Support a unique identifier for _id

Support an identifier for the @timestamp

Storing and using the timestamp of the last successful ingest

elasticmachine commented Nov 30, 2020

halfa commented Nov 30, 2020

jsoriano commented Nov 30, 2020

dagwieers commented Nov 30, 2020

botelastic bot commented Jan 27, 2022

botelastic bot commented Mar 31, 2023

dagwieers commented Nov 27, 2020 •

edited

Loading

Support a unique identifier for `_id`

Support an identifier for the `@timestamp`