Consider defaulting index.codec in metricbeat ES template to best_compression? #3141

Closed
peterskim12 opened this issue Dec 7, 2016 · 6 comments
Labels: Metricbeat

Comments

@peterskim12

I could be wrong on this, but I'd imagine that the vast majority of queries on data delivered by Metricbeat are aggregations, not lookups of specific documents where _source would be retrieved.

If this is the case, there might not be much of a downside in defaulting the index.codec in the metricbeat.template.json to best_compression. I haven't done any testing to validate the extent of storage reduction but I could run a quick test.
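
As a rough illustration of the proposed change, here is a minimal Python sketch that sets `index.codec` to `best_compression` in a template's settings. The `template` dict is a hypothetical, heavily trimmed stand-in for metricbeat.template.json, and `with_best_compression` is an illustrative helper, not part of Beats.

```python
import json

# Hypothetical, heavily trimmed stand-in for metricbeat.template.json;
# the real template contains many more settings and mappings.
template = {
    "template": "metricbeat-*",
    "settings": {"index.refresh_interval": "5s"},
    "mappings": {},
}


def with_best_compression(tpl):
    """Return a copy of the template with index.codec set to best_compression."""
    updated = json.loads(json.dumps(tpl))  # deep copy via a JSON round trip
    updated.setdefault("settings", {})["index.codec"] = "best_compression"
    return updated


updated_template = with_best_compression(template)
print(json.dumps(updated_template, indent=2))
```

Because `index.codec` is applied when an index is created, a template change like this would only affect indices created after the template is loaded.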

peterskim12 added the Metricbeat label Dec 7, 2016
@ruflin
Contributor

ruflin commented Dec 9, 2016

This is definitely worth testing. It would also be interesting to know the impact on indexing speed with best_compression enabled.

@Shaoranlaos

Could the _source field not be disabled completely for Metricbeat, as proposed in the documentation for the _source field (_source docs)?
Or does that have consequences that I'm not seeing?
Does the same apply to the _all field?

@ruflin
Contributor

ruflin commented Feb 27, 2017

@Shaoranlaos Definitely a discussion worth having. One problem with disabling _source is that you can no longer reindex the data, for example if you want to change the structure of some old fields.
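
For reference, disabling _source is a mapping-level switch. The sketch below shows what that fragment could look like; the `mapping` dict and the `example_metric` field are hypothetical, and `without_source` is an illustrative helper rather than anything in Beats.

```python
import json

# Hypothetical, trimmed-down mapping; the field name is illustrative only.
mapping = {
    "properties": {
        "example_metric": {"type": "float"},
    },
}


def without_source(m):
    """Return a copy of the mapping with the _source field disabled."""
    updated = dict(m)
    updated["_source"] = {"enabled": False}
    return updated


no_source_mapping = without_source(mapping)
print(json.dumps(no_source_mapping, indent=2))
```

With a mapping like this the original JSON document is not stored, which is why there would be nothing to read back when reindexing.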

A related discussion is the default number of shards: #3431. I think it is important that we keep this configurable, as not all users will want the same thing. Having #3654 in the future will hopefully allow generating the correct template on demand, based on the Beat config file.

@Shaoranlaos

Yes, reindexing is a use case that I overlooked.

What about the _all field? Are there similar issues I'm not seeing?

@peterskim12
Author

@Shaoranlaos You make a good point on the _all field. The _all field is going to be removed in Elasticsearch 6.0 so even without any changes to Beats, this optimization will occur by default in the 6.0 product line. This will result in an improvement in indexing time and index size.

Pre-6.0, disabling the _all field may have downstream impact -- e.g. Kibana's handling of data that doesn't have an _all field.
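
A small sketch of how a pre-6.0 template could opt out of _all, assuming the template is generated per Elasticsearch major version (that generation step and the `all_field_override` helper are assumptions for illustration, not existing Beats code):

```python
def all_field_override(es_major_version):
    """Return the mapping fragment that disables _all before 6.0.

    For 6.0 and later the sketch returns nothing, since no override
    is expected to be needed there.
    """
    if es_major_version < 6:
        return {"_all": {"enabled": False}}
    return {}


print(all_field_override(5))
print(all_field_override(6))
```

Whether Kibana copes with indices that lack _all would still need to be checked before applying this to 5.x templates.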

@tsg
Contributor

tsg commented Sep 5, 2017

This was already done for Metricbeat, and with #5095 it's also done for Heartbeat.

tsg closed this as completed Sep 5, 2017