-
Notifications
You must be signed in to change notification settings - Fork 98
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Prometheus metrics support to Swarm Client #246
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apart from adding a few more JARs to the equation, I do not see anything wrong here. Could be a good addition for those who wants to do direct monitoring of the agent nodes. requesting changes, because some user documentation is needed for this new feature
1eb3414
to
e0205f8
Compare
@oleg-nenashev Thanks for the feedback! I've added some documentation in e0205f8. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
@oleg-nenashev Thanks! Who is responsible for merging PRs in this repo? |
@basil is the current maintainer |
Thanks for the PR, @nre-ableton! I definitely want to see monitoring support added to Swarm. This has been requested by users since 2017 (see JENKINS-43119). I am not very familiar with this problem space, so I need to educate myself further about best practices before I commit to maintaining any specific solution. One of my open questions from a design perspective is how to support multiple reporting backends. Since Swarm is a general-purpose tool that is intended to be used in a variety of configurations, I want to make sure that it can be extended with support for multiple backends, such as Graphite, CloudWatch (requested in JENKINS-43119), Prometheus (used by your company), Datadog (used by my company), etc. To that end, I want to make sure there is a clean API/implementation separation with an API that is backend-agnostic and implementations that are backend-specific. Note that I am not necessarily asking you to implement support for a second reporting backend right now, but I am asking that the API (including any relevant command-line options) be designed with future extensibility in mind, and the best way to prove out such an API would be to implement support for a second reporting backend. I did some preliminary research on this topic and found some Java libraries like Dropwizard Metrics and Micrometer which advertise themselves as delivering such a vendor-neutral API. For example, Dropwizard can integrate with Prometheus. At face value, it seems that one of these solutions would be desirable for a general-purpose tool like Swarm that is intended to be used in a variety of configurations. But I am not sure about the details, since I have not worked with these technologies directly. Some prototyping would be helpful here. Are you familiar with any of these technologies, Nik? If so, maybe you could help me answer some of these questions and make the implementation more generic, ideally by adding support for at least a second reporting backend. If not, I will need to do some more research about this and possibly prototype some solutions using the other backends and libraries mentioned above. As this is a volunteer effort for me, it may take several weeks or months for me to get to this. |
@basil Thanks for the detailed feedback! I agree that having a more general metrics reporting framework would be beneficial for users with other metric collection platforms. I am not familiar with the libraries that you listed, but I had a quick look at their webpages and think that Micrometer looks quite promising. I'll rework this PR to use Micrometer instead. |
b2e4879
to
1ceb5b8
Compare
@basil Ok, I've reworked this PR to use Micrometer. It currently still only supports Prometheus, but of course this leaves the door open to easily support other platforms in the future should we choose. Mind having another look? Thanks! |
Hmm, it seems that CI is broken, though it was working before. I'll have a look at this. |
1ceb5b8
to
9f4ce71
Compare
Ok, CI is fixed! Mind taking another look @basil ? Thanks 👍 |
re-ping @basil, in case the notification got lost. 😄 |
Hey @nre-ableton, sorry for the delay on this. I meant to test this out but got busy with other things. It occurs to me that if there was a test in |
9f4ce71
to
b5c7496
Compare
Sounds good, I'll try to come up with some good unit tests and ping you when this PR has them. |
This commit adds preliminary support to the Swarm Client for Prometheus metrics (see https://prometheus.io), by way of Micrometer (see https://micrometer.io). Currently, only process resource usage and JVM statistics are reported, but in the future we could expand this to report stats specific to Swarm Client. One reason for adding this feature is to facilitate monitoring of Swarm Client nodes. If the Swarm Client service itself crashes, then alertmanager (see https://github.com/prometheus/alertmanager) can be used to send alerts about the service being down. At the moment, only Prometheus is configured as an endpoint, but in the future other metrics platforms could also be supported.
b5c7496
to
3131c1b
Compare
@basil Ok, I've added a unit test which attempts to fetch the metrics page from the client after starting it up. As it stands, it's more of a sanity test of the feature rather than a comprehensive test. We could add separate unit tests to try to get different metrics names for the various reporters, but I worry that this will be a bit brittle and prone to breakage since these names change periodically. Should I add any additional tests, or is this one sufficient? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great! Thanks for adding the integration test. This test is sufficient. Just a few more minor comments and this should be ready to merge.
plugin/src/test/java/hudson/plugins/swarm/SwarmClientIntegrationTest.java
Outdated
Show resolved
Hide resolved
client/pom.xml
Outdated
<dependency> | ||
<groupId>io.micrometer</groupId> | ||
<artifactId>micrometer-registry-prometheus</artifactId> | ||
<version>1.5.4</version> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In order for Dependabot to be able to automatically update this library, the version number needs to be factored out into a Maven property as in jenkinsci/email-ext-plugin#229. Please also update this to the latest version (1.6.0).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See 0c282d7.
client/pom.xml
Outdated
<dependency> | ||
<groupId>io.micrometer</groupId> | ||
<artifactId>micrometer-registry-prometheus</artifactId> | ||
<version>1.5.4</version> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In order for Dependabot to be able to automatically update this library, the version number needs to be factored out into a Maven property as in jenkinsci/email-ext-plugin#229. Please also update this to the latest version (1.6.0).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See 0c282d7.
Co-authored-by: Basil Crow <[email protected]>
Co-authored-by: Basil Crow <[email protected]>
Co-authored-by: Basil Crow <[email protected]>
@basil Thanks for the suggestions; I've applied them as fixups to the branch and also made another fixup for the Micrometer version in |
@basil thanks! 👍 |
Thank you for the PR! If you can confirm that the final build is working then I'll go ahead and cut a release of this. |
@basil ok, I just did a quick test with the JAR you linked to from that Jenkins server, and the Swarm Client itself still seems to work, and the new Prometheus feature is working as well. So I think that we're good to go as far as a release is concerned. |
Released in 3.24. |
This commit adds preliminary support to the Swarm Client for
Prometheus metrics (see https://prometheus.io). Currently, only stats
about JVM usage are reported, but in the future we could expand this
to report stats specific to Swarm Client.
One reason for adding this feature is to facilitate monitoring of
Swarm Client nodes. If the Swarm Client service itself crashes, then
alertmanager (see https://github.com/prometheus/alertmanager) can be
used to send alerts about the service being down.