Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make metricbeat sql module more versatile #22779

Closed
dagwieers opened this issue Nov 27, 2020 · 6 comments
Closed

Make metricbeat sql module more versatile #22779

dagwieers opened this issue Nov 27, 2020 · 6 comments
Labels
enhancement Stalled Team:Service-Integrations Label for the Service Integrations team

Comments

@dagwieers
Copy link

dagwieers commented Nov 27, 2020

The metricbeat sql module is a lifesaver, we are very happy with this inclusion. \o/

But we think there are a few features that could make this even better.

Support a unique identifier for _id

We like to perform queries and updates docs in Elastic. For this we would need to provide the _id value, so it would be nice if we could configure which field is to be used as the _id value. This would allow us to have a rolling window for updating documents (e.g. every hour process the aggregated data for the previous X hours) and have it update those entries in Elastic.

I may be possible to related this to the primary key of a table. But a configurable column would be more convenient IMO

Support an identifier for the @timestamp

We already use a timestamp provided in the query as the timeField in the Index pattern, and this works very well already. But I think it could be useful to have the default @timestamp selected from the query as well so this works out of the box.

For metrics at the time of the query this offers no value, but for time-based aggregations in tables, having a way to influence @timestamp without the need to have ingest pipelines would be very useful.

Storing and using the timestamp of the last successful ingest

We are processing aggregated data grouped by hour, the first time we would like to get to the historical information (all information), but subsequent queries should only cover the most recent windows. In fact, if there would be a variable that would hold the timestamp of the last queried/ingested time we could use this to calculate the timeframe we need to process. In this case if we miss to run the queries in a timely fashion (agent was down) it would pick up where it left off (just like filebeat does when processing logfiles).

cc @amandahla @jsoriano

Originally posted by @dagwieers in #13257 (comment)

@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Nov 27, 2020
@andresrc andresrc added the Team:Services (Deprecated) Label for the former Integrations-Services team label Nov 30, 2020
@elasticmachine
Copy link
Collaborator

Pinging @elastic/integrations-services (Team:Services)

@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Nov 30, 2020
@halfa
Copy link

halfa commented Nov 30, 2020

This usage (getting data from an SQL table in the form of logs) would be better in filebeat rather than metricsbeat, as the spirit is closer to how log collection operate (have a single instance of each record) rather than metrics (collection of a sensors at a certain point in time).

@jsoriano
Copy link
Member

The metricbeat sql module is a lifesaver, we are very happy with this inclusion. \o/

Thanks for your feedback, happy to see you like this module 🙂

This usage (getting data from an SQL table in the form of logs) would be better in filebeat rather than metricsbeat, as the spirit is closer to how log collection operate (have a single instance of each record) rather than metrics (collection of a sensors at a certain point in time).

Yeah, I tend to agree with this, the features requested here can be problematic in Metricbeat (specially the one about setting the @timestamp, and the one about the last successful ingest). They may be more suitable as a filebeat input, that has a registry to continue from a given timestamp. Or for this use case, maybe logstash could be used, with the JDBC input plugin.

Regarding the support for a unique identifier: something like this may be already possible by using the fingerprint processor (not tested). It can create a unique fingerprint based on a list of other fields, then you could rename it to @metadata.id, so it is used as the id of the document.
But even if possible, it can have problems. Indexes created by default with Metricbeat are not intended for this kind of documents, these indexes are rotated using ILM, and on rotation, documents with the same _id can exist on different indexes.
Also, there are some discussions about allowing indexes without _id (see elastic/elasticsearch#48699), in the future indexes/data_streams storing time series might not have _id at all.

So I don't think this should be supported in the sql module (or any other Metricbeat module), but there are workarounds that may already work using processors and custom indexes, or using Logstash.

@dagwieers
Copy link
Author

@jsoriano I get your point wrt. the id not being unique beyond the index used, but that would be a concern for the specific implementation. For some use-cases there would be no roll-over, and in our case we would also push only X historical records (e.g. the past 6 hours) so that a restart of the agent would recover from this kind of downtime.

But I agree with @halfa, that filebeat would be a better place to have such an sql module.

BTW We use a different timestamp in the definition of the index pattern and that works fine as well.

@botelastic
Copy link

botelastic bot commented Jan 27, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@botelastic botelastic bot added the Stalled label Jan 27, 2022
@jlind23 jlind23 added Team:Service-Integrations Label for the Service Integrations team and removed Team:Services (Deprecated) Label for the former Integrations-Services team labels Mar 31, 2022
@botelastic botelastic bot removed the Stalled label Mar 31, 2022
@botelastic
Copy link

botelastic bot commented Mar 31, 2023

Hi!
We just realized that we haven't looked into this issue in a while. We're sorry!

We're labeling this issue as Stale to make it hit our filters and make sure we get back to it as soon as possible. In the meantime, it'd be extremely helpful if you could take a look at it as well and confirm its relevance. A simple comment with a nice emoji will be enough :+1.
Thank you for your contribution!

@botelastic botelastic bot added the Stalled label Mar 31, 2023
@botelastic botelastic bot closed this as completed Sep 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Stalled Team:Service-Integrations Label for the Service Integrations team
Projects
None yet
Development

No branches or pull requests

6 participants