Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Metricbeat/Kibana/stats] exclude_usage=true by default #22654

Closed

Conversation

afharo
Copy link
Member

@afharo afharo commented Nov 18, 2020

What does this PR do?

It adds changes to the Metricbeat's kibana module: it adds a new configuration flag stats.exclude_usage (default true) that is used in the metricset stats in the method shouldCollectUsage to decide whether it should collect usage (no matter the time).

The reason for this change is to disable the usage collection via the /api/stats endpoint but to still allow the collection of the usage via explicitly setting the flag to false if necessary.

Why is it important?

We are working towards decoupling Monitoring and Telemetry in Kibana. At the moment, Monitoring holds in the .monitoring-kibana-* indices the Kibana's Usage details only to be able to report them when sending the telemetry reports. This data is already sent to the remote telemetry cluster via Kibana's local collectors, so there's no need for this duplication in the efforts. Thanks to these changes, Monitoring will only maintain in its indices the data they really need for its own features.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • [ ] I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Author's Checklist

  • Did I filled the default config in the right way?
  • Not a Go developer myself. Did I mess anything up? 😇

How to test this PR locally

Setup Metricbeat with the kibana-xpack module enabled and notice it never asks for the /api/stats?extended&legacy&exclude_usage=false request (the last query is always true by default).

Related issues

Use cases

Explained in the issue #22651

Screenshots

Logs

@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Nov 18, 2020
@afharo afharo force-pushed the metricbeat/kibana/stats/exclude_usage branch from 384ac16 to 124ecf5 Compare November 18, 2020 15:09
@afharo afharo added needs_backport PR is waiting to be backported to other branches. Feature:Stack Monitoring needs_reviewer PR needs to be assigned a reviewer review Team:Services (Deprecated) Label for the former Integrations-Services team labels Nov 18, 2020
@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Nov 18, 2020
@afharo afharo added v8.0.0 and removed needs_backport PR is waiting to be backported to other branches. needs_reviewer PR needs to be assigned a reviewer labels Nov 18, 2020
@afharo afharo force-pushed the metricbeat/kibana/stats/exclude_usage branch from 124ecf5 to e79d83a Compare November 18, 2020 15:32
@elasticmachine
Copy link
Collaborator

elasticmachine commented Nov 18, 2020

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview

Expand to view the summary

Build stats

  • Build Cause: [Pull request #22654 updated]

  • Start Time: 2020-11-20T12:28:33.120+0000

  • Duration: 62 min 28 sec

Test stats 🧪

Test Results
Failed 0
Passed 2218
Skipped 505
Total 2723

💚 Flaky test report

Tests succeeded.

Expand to view the summary

Test stats 🧪

Test Results
Failed 0
Passed 2218
Skipped 505
Total 2723

@afharo afharo force-pushed the metricbeat/kibana/stats/exclude_usage branch from e79d83a to 0721f10 Compare November 18, 2020 15:59
@andresrc
Copy link
Contributor

cc @sayden for awareness given the ongoing changes

@afharo afharo marked this pull request as ready for review November 18, 2020 17:44
@elasticmachine
Copy link
Collaborator

Pinging @elastic/stack-monitoring (Stack monitoring)

@elasticmachine
Copy link
Collaborator

Pinging @elastic/integrations-services (Team:Services)

@afharo afharo requested review from a team and chrisronline November 18, 2020 17:46
@afharo
Copy link
Member Author

afharo commented Nov 18, 2020

cc @chrisronline for awareness

@chrisronline
Copy link
Contributor

@sayden @ycombinator This change relates a larger initiative we are undertaking to decouple telemetry and monitoring. For Kibana, we will no longer collect and index usage data, as we will rely entirely in local telemetry collection within Kibana instances (which is enabled by default). See elastic/kibana#83626 for more information

@ycombinator ycombinator self-requested a review November 20, 2020 15:01
@ycombinator
Copy link
Contributor

ycombinator commented Nov 23, 2020

Hey @afharo, thanks for this PR.

I understand the reasons for this change and am generally ++ for it. However, I'd like to explore the possibility of not introducing a configuration setting to toggle this change. Instead, I'd like to explore if we could programmatically determine the right value for the exclude_usage flag based on some conditions.

With the change in this PR, the kibana/stats metricset will no longer be responsible for collecting usage stats from Kibana and indexing it into Stack Monitoring indices. And subsequently, Telemetry code in Kibana will no longer have access to these usage stats from the Stack Monitoring indices. So I imagine there is a parallel effort happening in Kibana to get the same usage stats into Telemetry some other way. Which version of Kibana is this parallel effort expected to land in? Maybe we could toggle the exclude_usage flag in the kibana/stats metricset based on this version (we do this sort of thing already in a couple of places:

// IsStatsAPIAvailable returns whether the stats API is available in the given version of Kibana
func IsStatsAPIAvailable(currentKibanaVersion *common.Version) bool {
return elastic.IsFeatureAvailable(currentKibanaVersion, StatsAPIAvailableVersion)
}
// IsSettingsAPIAvailable returns whether the settings API is available in the given version of Kibana
func IsSettingsAPIAvailable(currentKibanaVersion *common.Version) bool {
return elastic.IsFeatureAvailable(currentKibanaVersion, SettingsAPIAvailableVersion)
}
). Essentially what we'll be saying is: starting from version X, Kibana knows how to collect usage stats on its own, so no need for the kibana/stats metricset to collect it.

@afharo
Copy link
Member Author

afharo commented Nov 23, 2020

Hey @ycombinator,

Thanks for coming back to me! These are very valid points! Our intention is to ship the parallel effort in Kibana 7.11 (almost there 😅).

I'd love to use your knowledge here to deep dive on this version-dependency: The way Telemetry will use the Monitoring data depends on the Monitoring cluster's version, but not the monitored production clusters. This means that:

  • If the Monitoring cluster is 7.11 and all the production clusters are running any other version, the telemetry will be fine.
  • If the Production cluster is 7.11, but the Monitoring cluster is not at least 7.11, the reported telemetry might be incomplete.

Do you know if Metricbeat is, somehow, aware of the version of the Monitoring cluster it is sending the stats to?

FYI: Telemetry will still be fine if Monitoring data contains Kibana Usage on 7.11. So we could take this into a longer effort and wait for Metricbeat to stop reporting this data on a future version (i.e.: Make exclude_usage=false by default for now and change it in a future release).

@ycombinator
Copy link
Contributor

ycombinator commented Nov 23, 2020

Ah, thanks for clarifying the monitoring vs. monitored cluster situation here. There's no way for Metricbeat to know which monitoring cluster it's sending the data to so we can't check it's version.

Given that, I think what you have in this PR is a good approach for now - we default to the current behavior but provide a flag to turn it off in a future release. One question then becomes: do we need documentation to tell users to explicitly set the setting to false if they are running 7.11 or greater in their monitoring cluster?

Alternatively, we could just turn it off completely in this PR - i.e. no setting to toggle it - but only merge it into master, i.e. we won't backport it to 7.x. This will mean that starting 8.0, the kibana/stats metricset in Metricbeat will never collect usage and versions before that (7.x) will always collect usage (which is the current behavior).

@afharo
Copy link
Member Author

afharo commented Nov 23, 2020

Given that, I think what you have in this PR is a good approach for now - we default to the current behavior but provide a flag to turn it off in a future release.

In this PR I actually did the opposite: disabled by default, but a flag to turn it back on if needed. But I'm happy to flip the default value if you think it's safer.

One question then becomes: do we need documentation to tell users to explicitly set the setting to false if they are running 7.11 or greater in their monitoring cluster?

Given that we are doing the opposite change in this PR (for now), I think we should tell our users to explicitly set false if their monitoring cluster is 7.10 or earlier to maintain backwards compatiblity.

Alternatively, we could just turn it off completely in this PR - i.e. no setting to toggle it - but only merge it into master, i.e. we won't backport it to 7.x. This will mean that starting 8.0, the kibana/stats metricset in Metricbeat will never collect usage and versions before that (7.x) will always collect usage (which is the current behavior).

I think, ultimately, we'll complete this breaking change and remove the option flag entirely. Whether to keep collecting it until 8.0 or not, I don't have a strong opinion. I'd like to know @chrisronline's and @kobelb's opinions on that.

@ycombinator
Copy link
Contributor

ycombinator commented Nov 23, 2020

So, maybe the naming of the setting is confusing me :)

Given that we are doing the opposite change in this PR (for now), I think we should tell our users to explicitly set false if their monitoring cluster is 7.10 or earlier to maintain backwards compatiblity.

I think it's a bit odd to ask users to take some action in a minor version to maintain backwards compatibility. I think backwards compatibility should be expected without any extra actions by the user between minor version upgrades.

So I think the default value of the setting should result in the behavior we have today (which is for the kibana/stats metricset to collect usage stats). That way, users of Metricbeat 7.11 (assuming this PR makes it to that release) should not need to do anything extra right after they upgrade just to maintain backwards compatibility. Then, we also add to documentation saying: if your monitoring cluster is >= 7.11, please set X setting to Y value for so and so reason. And from a user perspective there is a compelling reason: usage stats docs can be quite large so not collecting them via Metricbeat saves memory + size on the wire.

Then in 8.0 / master we can simply remove the logic from the kibana/stats metricset to collect usage stats as well as the setting to allow users to toggle this collection. This would be a breaking change, but that's okay since it would be across a major version boundary.

@afharo
Copy link
Member Author

afharo commented Nov 23, 2020

@ycombinator sounds like a very sensible plan to me! Thank you! I'll apply those changes to this PR.

@kobelb
Copy link

kobelb commented Nov 23, 2020

It's unclear to me why changing the behavior of Metricbeat to exclude the usage would be a breaking change from the end-users perspective. The output of Kibana's stats endpoint is indexed into the .monitoring-kibana-* and end-users shouldn't be querying these documents directly. Changing this behavior should only affect the telemetry data that is sent to Elastic itself, and we're tolerant of this change in a minor version.

@ycombinator
Copy link
Contributor

ycombinator commented Nov 23, 2020

That's a good distinction and clarification, @kobelb. From the POV of a maintainer of the kibana Metricbeat module, I've been considering end-users and internal (Elastic) users both as users of this module, in that they are both affected by what data this module collects.

Asking the kibana/stats Metricbeat module to stop collecting usage telemetry is definitely not a breaking change for end-users. And if internal users are tolerant to not having usage telemetry in case an end-user runs Metricbeat 7.11+ against a monitoring cluster that's < 7.11 (as @afharo explained in #22654 (comment)), then I'm ++ removing usage telemetry collection in kibana/stats in 7.11.

In fact, I think we could simplify this PR even more then. Is there any reason we couldn't simply stop collecting usage data starting Metricbeat 7.11?

@afharo
Copy link
Member Author

afharo commented Nov 23, 2020

And if internal users are tolerant to not having usage telemetry in case an end-user runs Metricbeat 7.11+ against a monitoring cluster that's < 7.11 (as @afharo explained in #22654 (comment)), then I'm ++ removing usage telemetry collection in kibana/stats in 7.11.

AFAIK, we should be OK in that scenario. The 7.11+ (in fact, 7.2+) production Kibana instance will be reporting their locally collected telemetry anyway (that's why we are after stopping Monitoring from doing so).

In fact, I think we could simplify this PR even more then. Is there any reason we couldn't simply stop collecting usage data starting Metricbeat 7.11?

I was a bit wary of simply deleting the piece of code 😅. But if we are all happy to do so, tomorrow I'll create a new PR with the removal and we can choose which one we like best :)

Thank you both @kobelb and @ycombinator for your thoughts on this!

@afharo
Copy link
Member Author

afharo commented Nov 24, 2020

@ycombinator I've created #22732 superseding this one. It fully removes the exclude_usage=false logic

@afharo
Copy link
Member Author

afharo commented Nov 25, 2020

Closing this PR in favour of #22732

@afharo afharo closed this Nov 25, 2020
@afharo afharo deleted the metricbeat/kibana/stats/exclude_usage branch November 25, 2020 12:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Feature:Stack Monitoring Metricbeat Metricbeat review Team:Services (Deprecated) Label for the former Integrations-Services team v7.11.0 v8.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Metricbeat][Kibana][stats] Stop collecting usage
7 participants