[Netflow] Support TSDS #7549

Open
BenB196 opened this issue Aug 27, 2023 · 10 comments
Labels
Integration:netflow (NetFlow Records), Team:Security-Deployment and Devices (Deployment and Devices Security team [elastic/sec-deployment-and-devices])

Comments

@BenB196
Contributor

BenB196 commented Aug 27, 2023

Hi All,

I was curious whether there are any plans to support enabling TSDS on the Netflow integration.

While this integration currently falls under the logs type, I think there would be significant value in allowing it to leverage TSDS.

NetFlow data contains a large number of metrics and, at scale, generally produces a significant number of time series events that need to be indexed and stored. I think the Netflow integration would benefit significantly from TSDS through faster indexing, lower storage usage, and faster searches and aggregations.

@elasticmachine

Pinging @elastic/security-external-integrations (Team:Security-External Integrations)

@narph added the Integration:netflow (NetFlow Records) label on Aug 28, 2023
@andrewkroh
Member

I was having the same idea, but for aws.vpcflow data, which is very similar, just with a smaller number of possible fields. I think we should run a test using TSDS on one of these flow log data sources. I think storage size would be the biggest benefit.

One thing that could cause an issue (particularly in the aws.vpcflow case) is late-arriving data, such as historical data read from S3. TSDS can only accept data that has a "recent" timestamp (see https://www.elastic.co/guide/en/elasticsearch/reference/current/tsds.html#tsds-accepted-time-range).
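A simplified way to see why late-arriving data is a problem: when a TSDS is first created, its backing index only accepts @timestamp values between roughly now minus index.look_back_time and now plus index.look_ahead_time. The Python sketch below models only that first-index window. The helper name is hypothetical and the 2h/30m window sizes are assumptions for illustration, so check the linked docs for the actual defaults in your version.

```python
from datetime import datetime, timedelta, timezone

# Assumed window sizes for illustration only; in Elasticsearch they come from
# the index.look_back_time and index.look_ahead_time settings.
LOOK_BACK = timedelta(hours=2)
LOOK_AHEAD = timedelta(minutes=30)


def accepted_by_new_tsds(ts: datetime, now: datetime,
                         look_back: timedelta = LOOK_BACK,
                         look_ahead: timedelta = LOOK_AHEAD) -> bool:
    """Hypothetical helper: simplified accepted-range check for a freshly
    created TSDS whose only backing index spans [now - look_back, now + look_ahead)."""
    start = now - look_back   # start time is inclusive
    end = now + look_ahead    # end time is exclusive
    return start <= ts < end


if __name__ == "__main__":
    now = datetime(2023, 8, 28, 12, 0, tzinfo=timezone.utc)
    live_flow = now - timedelta(minutes=5)  # near real-time NetFlow record
    backfill = now - timedelta(days=3)      # historical VPC flow log read from S3
    print(accepted_by_new_tsds(live_flow, now))
    print(accepted_by_new_tsds(backfill, now))
```

Under these assumptions a NetFlow record from a few minutes ago fits the window, while a three-day-old VPC flow log replayed from S3 falls outside it and would be rejected.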

@BenB196
Contributor Author

BenB196 commented Aug 29, 2023

Interesting, I didn't really think about historical ("backfill") data here, but it does make sense to consider.

I wonder if elastic/elasticsearch#98463 would help as well in this scenario. I don't really know the dynamics around use cases like importing VPC flow data, so I'm not sure how usable this feature would be in a "backfill" scenario.

@narph added the Team:Security-Linux Platform label and removed the Team:Security-External Integrations label on Jan 29, 2024
@jamiehynds

@andrewkroh added the Team:Security-Deployment and Devices label and removed the Team:Security-Linux Platform label on Feb 21, 2024
@elasticmachine

Pinging @elastic/sec-deployment-and-devices (Team:Security-Deployment and Devices)

@pkoutsovasilis
Contributor

Hello 👋 Full disclosure: I just started reading about TSDS, so here are some quick questions to pick your brains, @andrewkroh @BenB196.

quoting from here

> Only use a TSDS if you typically add metrics data to Elasticsearch in near real-time and @timestamp order.
> A TSDS is only intended for metrics data. For other timestamped data, such as logs or traces, use a regular data stream.

So, if my interpretation of the above is correct, to take advantage of TSDS we need to define which fields are metrics. At the moment, the fields extracted by the netflow input are not separated by which ones are eligible as metrics; e.g. I assume tcp_ack_total_count is a metric while ssl_server_name isn't? Maybe a coarse-grained criterion could be the field type, treating the fields that have a numeric type as metrics? (more info here)
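As a rough illustration of that coarse-grained, type-based criterion, here is a minimal Python sketch that tags numeric fields with time_series_metric in a mapping. The rule that names ending in "_total_count" are counters (everything else numeric is a gauge) is a hypothetical heuristic, and the field types in the example are assumptions, not the integration's actual field definitions.

```python
# Hypothetical heuristic, not the integration's actual field definitions:
# numeric fields become TSDS metrics; "_total_count" fields become counters.
NUMERIC_TYPES = {
    "byte", "short", "integer", "long", "unsigned_long",
    "half_float", "float", "double",
}


def tsds_metric_mappings(fields: dict[str, str]) -> dict[str, dict]:
    """Build mapping properties, tagging numeric fields with time_series_metric."""
    props: dict[str, dict] = {}
    for name, es_type in fields.items():
        mapping: dict = {"type": es_type}
        if es_type in NUMERIC_TYPES:
            # Assumption: names ending in "_total_count" are monotonic counters;
            # every other numeric field is treated as a gauge.
            kind = "counter" if name.endswith("_total_count") else "gauge"
            mapping["time_series_metric"] = kind
        props[name] = mapping
    return props


if __name__ == "__main__":
    # Field types here are assumed for illustration.
    example = {
        "netflow.tcp_ack_total_count": "long",
        "netflow.ssl_server_name": "keyword",
        "netflow.source_transport_port": "integer",
    }
    for field, mapping in tsds_metric_mappings(example).items():
        print(field, mapping)
```

The port field shows the weakness of a purely type-based rule: it is numeric, but it describes the flow rather than measuring it, so a real template would still need a per-field review.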

quoting from here

> In addition to a @timestamp, each document in a TSDS must contain one or more dimension fields. The matching index template for a TSDS must contain mappings for at least one keyword dimension.

Again, if my interpretation of the above is correct, we need at least one field to be a dimension; looking at the fields, candidates could be exporter.address, exporter.source_id, and exporter.version? But what happens if these are missing?

What are your thoughts on the above, guys? 🙂
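To make the dimension question concrete, here is a minimal Python sketch that builds a TSDS template body using the exporter fields as keyword dimensions, plus a pre-ingest check for documents that carry none of them (which, per the quoted docs, a TSDS would not accept). The full netflow.exporter.* paths, mapping all three as keyword, and the helper names are assumptions for illustration; the body would also need to live in an index template with data streams enabled.

```python
# Assumed field paths for the exporter fields mentioned above.
DIMENSIONS = [
    "netflow.exporter.address",
    "netflow.exporter.source_id",
    "netflow.exporter.version",
]


def tsds_template_body(dimensions: list[str]) -> dict:
    """Hypothetical helper: settings and mappings for a TSDS template."""
    return {
        "template": {
            "settings": {
                "index.mode": "time_series",
                "index.routing_path": dimensions,
            },
            "mappings": {
                "properties": {
                    # Dotted names are expanded into objects by Elasticsearch.
                    dim: {"type": "keyword", "time_series_dimension": True}
                    for dim in dimensions
                }
            },
        }
    }


def has_any_dimension(doc: dict, dimensions: list[str]) -> bool:
    """True if at least one dotted dimension path resolves to a non-null value
    in a nested document (flattened dotted keys are not handled here)."""
    for path in dimensions:
        node = doc
        for key in path.split("."):
            if not isinstance(node, dict) or key not in node:
                node = None
                break
            node = node[key]
        if node is not None:
            return True
    return False
```

A record with netflow.exporter.address set passes the check, while one that only carries, say, netflow.ssl_server_name would need a fallback value or a regular data stream.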

@BenB196
Contributor Author

BenB196 commented May 31, 2024

I think one of the main challenges with TSDS goes back to something @andrewkroh pointed out: the possibility of needing to backfill data, which isn't a great experience with TSDS.

Elastic is working on a new "LogsDB" index mode, elastic/elasticsearch#106462, which hopefully will provide many of the same benefits as TSDS, without the challenges of backfilling data. I don't think it will be 100% as efficient as TSDS could be, but it would still be a nice value add.

@pkoutsovasilis
Contributor

> I think one of the main challenges with TSDS going back to something @andrewkroh pointed out and that is the possibility of needing to backfill data, which doesn't have the greatest experience with TSDS.
>
> Elastic is working on a new "LogsDB" index mode, elastic/elasticsearch#106462, which hopefully will provide many of the same benefits of TSDS, without the challenges of backfilling data. I don't think it will be 100% as efficient as TSDS could possibly be, but would still be a nice value add.

Yes! LogsDB index mode sounds like the best of both worlds, with an acceptable trade-off. It ships as a tech preview in ES 8.14.0, but I think the target is to reach GA in 8.15.0.

@IanLee1521
Contributor

Now that LogsDB has landed, is there any thought of revisiting it as an option for this data?

@andrewkroh
Member

LogsDB should already be usable (as it is with most log data streams). You can test it out by adding index.mode: logsdb to the index template [1]. Every integration supports a @custom component template for adding your own settings. This is the place to add it so that upgrades to the package work seamlessly. Once you change the index settings, you need to run a _rollover to create a new backing index that uses the new settings.

This is the default index template for the netflow data stream. It already references the logs-netflow.log@custom component template (out of the box the component template itself won't exist).


Create the logs-netflow.log@custom component template and add the index.mode setting.
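The two API calls described here (creating the @custom component template with index.mode, then rolling over the data stream) could look roughly like the Python sketch below, which only builds and prints the requests rather than sending them. The component template name comes from the comment; the data stream name logs-netflow.log-default assumes the "default" namespace and, like the helper name, is a hypothetical example.

```python
import json
from typing import Optional

COMPONENT_TEMPLATE = "logs-netflow.log@custom"
# Assumption: the integration writes to the "default" namespace.
DATA_STREAM = "logs-netflow.log-default"


def logsdb_requests(component: str,
                    data_stream: str) -> list[tuple[str, str, Optional[dict]]]:
    """Hypothetical helper: (method, path, body) for enabling LogsDB, then rolling over."""
    return [
        # Create the @custom component template carrying the index.mode setting.
        ("PUT", f"/_component_template/{component}",
         {"template": {"settings": {"index.mode": "logsdb"}}}),
        # Roll over so that a new backing index picks up the new setting.
        ("POST", f"/{data_stream}/_rollover", None),
    ]


if __name__ == "__main__":
    for method, path, body in logsdb_requests(COMPONENT_TEMPLATE, DATA_STREAM):
        print(method, path)
        if body is not None:
            print(json.dumps(body, indent=2))
```

Note that a PUT replaces the whole component template, so if logs-netflow.log@custom already holds other settings, they should be merged into the body rather than overwritten.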


Preview of the final index mapping that shows index.mode was integrated into the final settings.


Footnotes

  1. https://www.elastic.co/guide/en/elasticsearch/reference/current/logs-data-stream.html#how-to-use-logsds


8 participants