Add Prometheus metrics support to Swarm Client
This commit adds preliminary support to the Swarm Client for
Prometheus metrics (see https://prometheus.io). Currently, only stats
about JVM usage are reported, but in the future we could expand this
to report stats specific to Swarm Client.

One reason for adding this feature is to facilitate monitoring of
Swarm Client nodes. If the Swarm Client service itself crashes, then
alertmanager (see https://github.com/prometheus/alertmanager) can be
used to send alerts about the service being down.
nre-ableton committed Aug 17, 2020
1 parent ff52450 commit e0205f8
Showing 5 changed files with 67 additions and 0 deletions.
4 changes: 4 additions & 0 deletions README.md
@@ -37,6 +37,7 @@ settings unexpectedly.
 * [Changelog](CHANGELOG.md)
 * [Global Security Configuration](docs/security.md)
 * [Logging](docs/logging.md)
+* [Prometheus](docs/prometheus.md)
 * [Proxy Configuration](docs/proxy.md)
 
 ## Available Options
@@ -102,6 +103,9 @@ $ java -jar swarm-client.jar -help
 refuse to start if this file exists
 and the previous process is still
 running.
+-prometheusPort N : If defined, then start an HTTP
+service on this port for Prometheus
+metrics. (default: -1)
 -retry N : Number of retries before giving up.
 Unlimited if not specified. (default:
 -1)
15 changes: 15 additions & 0 deletions client/pom.xml
@@ -166,5 +166,20 @@
       <artifactId>oshi-core</artifactId>
       <version>5.2.3</version>
     </dependency>
+    <dependency>
+      <groupId>io.prometheus</groupId>
+      <artifactId>simpleclient</artifactId>
+      <version>0.9.0</version>
+    </dependency>
+    <dependency>
+      <groupId>io.prometheus</groupId>
+      <artifactId>simpleclient_hotspot</artifactId>
+      <version>0.9.0</version>
+    </dependency>
+    <dependency>
+      <groupId>io.prometheus</groupId>
+      <artifactId>simpleclient_httpserver</artifactId>
+      <version>0.9.0</version>
+    </dependency>
   </dependencies>
 </project>
5 changes: 5 additions & 0 deletions client/src/main/java/hudson/plugins/swarm/Options.java
@@ -203,4 +203,9 @@ public class Options {
                     + " missing.",
             forbids = "-disableWorkDir")
     public boolean failIfWorkDirIsMissing = false;
+
+    @Option(
+            name = "-prometheusPort",
+            usage = "If defined, then start an HTTP service on this port for Prometheus metrics.")
+    public int prometheusPort = -1;
 }
21 changes: 21 additions & 0 deletions client/src/main/java/hudson/plugins/swarm/SwarmClient.java
@@ -5,6 +5,8 @@
 import hudson.remoting.Launcher;
 import hudson.remoting.jnlp.Main;
 
+import io.prometheus.client.exporter.HTTPServer;
+import io.prometheus.client.hotspot.DefaultExports;
 import org.apache.commons.codec.digest.DigestUtils;
 import org.apache.commons.lang.StringUtils;
 import org.apache.http.Header;
@@ -75,6 +77,7 @@ public class SwarmClient {
     private final Options options;
     private final String hash;
     private String name;
+    private HTTPServer prometheusServer = null;
 
     public SwarmClient(Options options) {
         this.options = options;
@@ -104,6 +107,10 @@ public SwarmClient(Options options) {
                         "Problem reading labels from file " + options.labelsFile, e);
             }
         }
+
+        if (options.prometheusPort > 0) {
+            startPrometheusService(options.prometheusPort);
+        }
     }
 
     public String getName() {
@@ -635,13 +642,27 @@ private static String hash(File remoteFsRoot) {
     }
 
     public void exitWithStatus(int status) {
+        if (prometheusServer != null) {
+            prometheusServer.stop();
+        }
         System.exit(status);
     }
 
     public void sleepSeconds(int waitTime) throws InterruptedException {
         Thread.sleep(waitTime * 1000);
     }
 
+    private void startPrometheusService(int port) {
+        try {
+            logger.fine("Starting Prometheus service on port " + port);
+            prometheusServer = new HTTPServer(port);
+            logger.info("Started Prometheus service on port " + port);
+            DefaultExports.initialize();
+        } catch (IOException e) {
+            logger.severe("Failed to start Prometheus service: " + e.getMessage());
+        }
+    }
+
     private static class DefaultTrustManager implements X509TrustManager {
 
         final List<String> allowedFingerprints = new ArrayList<>();
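The lifecycle added to `SwarmClient` in this commit is: start the service only when the port is positive, keep a reference to the server, stop it in `exitWithStatus`, and log if startup fails. The following is a minimal stdlib-only sketch of that pattern. It assumes the JDK's `com.sun.net.httpserver.HttpServer` as a stand-in for `io.prometheus.client.exporter.HTTPServer`, so it runs without the Prometheus jars. `MetricsEndpointSketch`, `startIfEnabled` and the fixed text payload are hypothetical. The real service instead renders the metrics registered by `DefaultExports.initialize()`.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical, JDK-only stand-in for the commit's Prometheus service lifecycle.
final class MetricsEndpointSketch {
    private static final Logger logger = Logger.getLogger(MetricsEndpointSketch.class.getName());

    // Null until a server has been started successfully (like `prometheusServer = null`).
    private HttpServer server;

    // Starts the endpoint only for a positive port, mirroring `options.prometheusPort > 0`.
    boolean startIfEnabled(int port, String payload) {
        if (port <= 0) {
            return false; // the -1 default leaves the endpoint disabled
        }
        byte[] body = payload.getBytes(StandardCharsets.UTF_8);
        try {
            HttpServer candidate = HttpServer.create(new InetSocketAddress(port), 0);
            // Serve the payload at the root path, where the docs say the metrics live.
            candidate.createContext("/", exchange -> {
                exchange.getResponseHeaders().set("Content-Type", "text/plain; charset=utf-8");
                exchange.sendResponseHeaders(200, body.length);
                try (OutputStream out = exchange.getResponseBody()) {
                    out.write(body);
                }
            });
            candidate.start();
            server = candidate;
            logger.info("Started metrics endpoint on port " + port);
            return true;
        } catch (IOException e) {
            // Passing the exception (not just its message) keeps the stack trace in the log.
            logger.log(Level.SEVERE, "Failed to start metrics endpoint on port " + port, e);
            return false;
        }
    }

    // Stops the endpoint if it is running; calling it twice is harmless.
    void stop() {
        if (server != null) {
            server.stop(0);
            server = null;
        }
    }

    boolean isRunning() {
        return server != null;
    }

    public static void main(String[] args) throws InterruptedException {
        int port = args.length > 0 ? Integer.parseInt(args[0]) : -1;
        MetricsEndpointSketch endpoint = new MetricsEndpointSketch();
        if (endpoint.startIfEnabled(port, "example_metric 1.0\n")) {
            Thread.sleep(30_000); // serve for 30 seconds, then shut down
            endpoint.stop();
        }
    }
}
```

The sketch differs from the commit in one deliberate way. It passes the `IOException` to `Logger.log` so the stack trace is kept, whereas the commit logs only `e.getMessage()`.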
22 changes: 22 additions & 0 deletions docs/prometheus.md
@@ -0,0 +1,22 @@
# Prometheus monitoring

The Jenkins Swarm Client has support for [Prometheus](https://prometheus.io) monitoring,
which allows a Prometheus server to scrape metrics from the client. To start a Prometheus
endpoint, pass a positive port number to the `-prometheusPort` option when starting the
client JAR. The service is stopped when the Swarm Client exits.

The metrics are served at the root level of the service. For example, if the node's IP
address is `169.254.10.12` and `9100` is passed to `-prometheusPort`, then the metrics can
be accessed at `http://169.254.10.12:9100/`.
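To check the endpoint by hand, you can issue the same plain HTTP GET that a Prometheus server makes with the JDK 11+ `java.net.http.HttpClient`. This is a minimal sketch. `ScrapeSketch`, `fetchMetrics` and the 5-second timeouts are illustrative assumptions and are not part of the Swarm Client.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical helper that fetches the metrics page from the root of the service.
final class ScrapeSketch {
    private static final HttpClient CLIENT = HttpClient.newBuilder()
            .connectTimeout(Duration.ofSeconds(5))
            .build();

    // Returns the text body served at http://host:port/, or throws if it is not a 200.
    static String fetchMetrics(String host, int port) {
        URI uri = URI.create("http://" + host + ":" + port + "/");
        HttpRequest request = HttpRequest.newBuilder(uri)
                .timeout(Duration.ofSeconds(5))
                .GET()
                .build();
        try {
            HttpResponse<String> response = CLIENT.send(request, HttpResponse.BodyHandlers.ofString());
            if (response.statusCode() != 200) {
                throw new IllegalStateException("Unexpected HTTP status " + response.statusCode() + " from " + uri);
            }
            return response.body();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for callers
            throw new IllegalStateException("Interrupted while scraping " + uri, e);
        }
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "127.0.0.1";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 9100;
        System.out.print(fetchMetrics(host, port));
    }
}
```

With the values from the example above, `fetchMetrics("169.254.10.12", 9100)` would return the text exposition body. If the endpoint is unreachable, the call throws instead.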

## Data Reported

The following metrics are reported by the client:

- Basic process info, including:
  - Process uptime
  - CPU time consumed
  - Virtual memory consumed
  - Resident memory consumed
  - File descriptors consumed
- JVM metrics such as CPU and memory usage, thread states, etc.
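The scrape body uses Prometheus's text exposition format: `# HELP`/`# TYPE` comment lines followed by `name{labels} value` samples. In the Prometheus Java client, the process metrics above typically appear under names such as `process_start_time_seconds`, `process_cpu_seconds_total`, `process_virtual_memory_bytes`, `process_resident_memory_bytes` and `process_open_fds`. Uptime is not exported as a metric of its own, but you can derive it from the start time. The sketch below is a hypothetical `ExpositionSketch` helper, not a complete parser. It reads only unlabeled samples and skips labeled ones such as `jvm_threads_state{state="RUNNABLE"}` for brevity. The sample lines are illustrative and were not captured from a real Swarm Client.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: reads unlabeled samples from Prometheus text exposition output.
final class ExpositionSketch {
    private ExpositionSketch() {}

    // Returns metric name -> value for every unlabeled sample ("name value [timestamp]").
    static Map<String, Double> unlabeledSamples(String body) {
        Map<String, Double> samples = new LinkedHashMap<>();
        for (String raw : body.split("\n")) {
            String line = raw.trim();
            // Skip blank lines and "# HELP" / "# TYPE" metadata.
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            String[] parts = line.split("\\s+");
            // Labeled samples such as name{k="v"} are ignored in this sketch.
            if (parts.length < 2 || parts[0].contains("{")) {
                continue;
            }
            samples.put(parts[0], parseValue(parts[1]));
        }
        return samples;
    }

    // The exposition format spells infinities as +Inf / -Inf, which Double.parseDouble rejects.
    private static double parseValue(String token) {
        switch (token) {
            case "+Inf":
                return Double.POSITIVE_INFINITY;
            case "-Inf":
                return Double.NEGATIVE_INFINITY;
            default:
                return Double.parseDouble(token); // also accepts "NaN"
        }
    }

    // Uptime in seconds, derived from process_start_time_seconds and the current epoch time.
    static double uptimeSeconds(Map<String, Double> samples, double nowEpochSeconds) {
        Double start = samples.get("process_start_time_seconds");
        if (start == null) {
            throw new IllegalArgumentException("process_start_time_seconds not present");
        }
        return nowEpochSeconds - start;
    }

    public static void main(String[] args) {
        String sample = "# TYPE process_start_time_seconds gauge\n"
                + "process_start_time_seconds 1.6E9\n"
                + "process_open_fds 42.0\n";
        Map<String, Double> samples = unlabeledSamples(sample);
        System.out.println("open fds: " + samples.get("process_open_fds"));
        System.out.println("uptime (s): " + uptimeSeconds(samples, System.currentTimeMillis() / 1000.0));
    }
}
```

Combined with a body fetched from the node, `uptimeSeconds(unlabeledSamples(body), now)` gives the process uptime from the first bullet above.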
