Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Nats module improvements #22445

Merged
merged 20 commits into from
Nov 17, 2020
Merged

Conversation

ChrsMark
Copy link
Member

@ChrsMark ChrsMark commented Nov 5, 2020

What this PR adds

This PR adds more metrics at connections and routes metricsets. Prior to this no metrics have been collected for each of the connections and for each of the routes. With this PR we collect metrics like in/out bytes etc for each the connections and for each of the routes. In order to have routes in a NATS server a more than 1 NATS servers should be presented creating a NATS cluster. For this docker-compose file of the module adds an extra NATS instance so as populate routes and their respective metrics.

Why it is important

It related to elastic/integrations#359. Next steps would be to enhance/improve dashboards.

How to test this manually

  1. Start the nats cluster using docker-compose file of nats module
  2. Enable nats metricbeat module connections and routes metricsets and point them to the monitoring port of the nats container (exported port of 8222 internal)
  3. Make sure that connection and route details are being collected.

Signed-off-by: chrismark <[email protected]>
@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Nov 5, 2020
@ChrsMark ChrsMark added the Team:Platforms Label for the Integrations - Platforms team label Nov 5, 2020
@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Nov 5, 2020
@ChrsMark ChrsMark self-assigned this Nov 5, 2020
@elasticmachine
Copy link
Collaborator

elasticmachine commented Nov 5, 2020

💚 Flaky test report

Tests succeeded.

Expand to view the summary

Test stats 🧪

Test Results
Failed 0
Passed 2247
Skipped 508
Total 2755

@elasticmachine
Copy link
Collaborator

elasticmachine commented Nov 5, 2020

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview

Expand to view the summary

Build stats

  • Build Cause: [Pull request #22445 updated, Started by user Chris Mark, Rebuilds build #21]

  • Start Time: 2020-11-17T09:07:03.019+0000

  • Duration: 82 min 41 sec

Test stats 🧪

Test Results
Failed 0
Passed 2253
Skipped 508
Total 2761

💚 Flaky test report

Tests succeeded.

Expand to view the summary

Test stats 🧪

Test Results
Failed 0
Passed 2253
Skipped 508
Total 2761

Signed-off-by: chrismark <[email protected]>
Signed-off-by: chrismark <[email protected]>
Signed-off-by: chrismark <[email protected]>
Signed-off-by: chrismark <[email protected]>
Signed-off-by: chrismark <[email protected]>
@ChrsMark ChrsMark marked this pull request as ready for review November 10, 2020 11:09
@elasticmachine
Copy link
Collaborator

Pinging @elastic/integrations-platforms (Team:Platforms)

Signed-off-by: chrismark <[email protected]>
@ChrsMark ChrsMark added [zube]: In Review needs_reviewer PR needs to be assigned a reviewer review labels Nov 11, 2020
Signed-off-by: chrismark <[email protected]>
@jsoriano jsoriano self-requested a review November 11, 2020 11:00
Copy link
Member

@jsoriano jsoriano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that a metricset that collects events per connection shouldn't be enabled by default.

With this change, we are modifying the behaviour of a default metricset from collecting a summary of metrics for all the connections, to collect one event per connection. I guess that a production nats server can have hundreds or thousands of connections at a given time, would this generate such number of events per fetch period?

I think that the summary, and the metrics per connection should be in different metricsets.

I see that we also collect the number of connections in the stats metricset (assuming that nats.stats.total_connections is the same thing).

Maybe a plan could be to:

  • Leave the summarized metrics (the total count would be the only one at least by now) in the stats metricset.
  • Deprecate the connections metricset in 7.x/7.11, and remove it in master/8.0, the only metric it collects is also collected by the stats metricset.
  • Add a new, non-default connection metricset, that collects the metrics per connection. Other reason to move from connections to connection is that singular namespaces is usually preferred in metric names (e.g: system.process..., postgresql.statement...).

For the routes we could follow a similar plan, to deprecate routes and create a new non-default route metricset, but we would first need to add the count of routes to the stats metricset if possible.

metricbeat/module/nats/_meta/run.sh Outdated Show resolved Hide resolved
metricbeat/module/nats/docker-compose.yml Outdated Show resolved Hide resolved
metricbeat/module/nats/docker-compose.yml Outdated Show resolved Hide resolved
metricbeat/module/nats/_meta/run.sh Show resolved Hide resolved
@ChrsMark
Copy link
Member Author

Added 2 new extra metricsets, disabled by default, to collect metrics per connection/route. After this we can deprecate connections and routes metricsets adding their sum under stats metricset.

@jsoriano jsoriano self-requested a review November 12, 2020 11:47
Signed-off-by: chrismark <[email protected]>
metricbeat/module/nats/_meta/config.reference.yml Outdated Show resolved Hide resolved
metricbeat/module/nats/_meta/config.yml Outdated Show resolved Hide resolved
metricbeat/module/nats/connection/data.go Outdated Show resolved Hide resolved
metricbeat/module/nats/route/data.go Outdated Show resolved Hide resolved
metricbeat/module/nats/util/util.go Show resolved Hide resolved
Signed-off-by: chrismark <[email protected]>
@ChrsMark ChrsMark requested a review from jsoriano November 12, 2020 16:39
Copy link
Member

@jsoriano jsoriano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks for addressing all the comments. I think we could keep nats.server.time as deprecated instead of removing it, but I am also ok with removing it now, I don't think this field is used much.

@@ -13,7 +13,3 @@
type: keyword
description: >
The server ID
- name: server.time
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may want to keep this in existing modules to avoid breaking changes, though I don't think this field is going to be missed.

if err != nil {
return errors.Wrap(err, "failure applying module schema")
}
timestamp, err := util.GetNatsTimestamp(moduleFields)
moduleFields.Delete("server.time")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may want to keep this value to avoid a smallish breaking change.

@ChrsMark ChrsMark requested a review from a team as a code owner November 17, 2020 08:38
@ChrsMark ChrsMark merged commit 70ff0a7 into elastic:master Nov 17, 2020
ChrsMark added a commit to ChrsMark/beats that referenced this pull request Nov 17, 2020
ChrsMark added a commit that referenced this pull request Nov 20, 2020
* Nats module improvements (#22445)


(cherry picked from commit 70ff0a7)

* Update CHANGELOG.next.asciidoc
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs_reviewer PR needs to be assigned a reviewer review Team:Platforms Label for the Integrations - Platforms team v7.11.0 [zube]: In Review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants