-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add vault.agent.authenticated metric #26570
Add vault.agent.authenticated metric #26570
Conversation
The below can be used to demonstrate the new metric Generate TLS certificates
Generate configuration
Start vault
Extract unseal keys and root token
Unseal Vault
Enable approle auth
Create approle
Generate Vault Agent configuration
Start Vault Agent
Get Vault Agent Metrics
Start Modified Vault Agent
Get Vault Agent Metrics
|
b88be6e
to
aafebf7
Compare
changelog/26570.txt
Outdated
@@ -0,0 +1,3 @@ | |||
```release-note:feature |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would probably qualify as an improvement!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
ah.logger.Info("renewed auth token") | ||
|
||
case <-credCh: | ||
ah.logger.Info("auth method found new credentials, re-authenticating") | ||
ah.logger.Info("autreh method found new credentials, -authenticating") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably revert this change
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
Hi @markafarrell, thank you so much for your PR! Before we proceed - I'd love to understand your use case a little bit better. Currently, we can see when authentication has succeeded and the agent has a valid token in the server logs (ie. https://github.com/hashicorp/vault/blob/main/command/agentproxyshared/auth/auth.go#L480 ). Is there a reason that telemetry might better serve your needs than the server logs? |
@divyaac Having a metric makes it much easier to integrate with alerting tools like Prometheus alert manager. Then you can get an alert when that metric goes to zero so you can promptly act. Instead of having to look at logs to see that the agent is not authenticated |
Thanks for your response @markafarrell . I think adding this metric would make sense. After addressing the comments we should be able to get move this PR along! |
@@ -276,6 +288,8 @@ func (ah *AuthHandler) Run(ctx context.Context, am AuthMethod) error { | |||
if err != nil { | |||
ah.logger.Error("error creating client for wrapped call", "error", err, "backoff", backoffCfg) | |||
metrics.IncrCounter([]string{ah.metricsSignifier, "auth", "failure"}, 1) | |||
// Set unauthenticated when authentication fails |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment can be deleted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
@@ -478,6 +511,8 @@ func (ah *AuthHandler) Run(ctx context.Context, am AuthMethod) error { | |||
} | |||
|
|||
metrics.IncrCounter([]string{ah.metricsSignifier, "auth", "success"}, 1) | |||
// Set authenticated when authentication succeeds |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment can be deleted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
@@ -144,12 +144,18 @@ func (ah *AuthHandler) Run(ctx context.Context, am AuthMethod) error { | |||
backoffCfg := newAutoAuthBackoff(ah.minBackoff, ah.maxBackoff, ah.exitOnError) | |||
|
|||
ah.logger.Info("starting auth handler") | |||
|
|||
// Set unauthenticated when starting up | |||
metrics.SetGauge([]string{ah.metricsSignifier, "auth", "authenticated"}, 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- The comment can be deleted (and done for the rest of the new additions)
- The name of the metric can be
metrics.SetGauge([]string{ah.metricsSignifier, "authenticated"}, 0)
aka, we can remove the "auth" prefix from the string array
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once the changes are addressed we can move forward!
e336c63
to
54ebad9
Compare
f15336a
to
f60907f
Compare
f60907f
to
a6b2e25
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doc updates lgtm
3bf2951
to
1eb2322
Compare
fix metric name
1eb2322
to
d1e0696
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Content update lgtm. Feel free to merge once ENG approves
Thanks for this! I chatted with @divyaac and she's approved it, I resolved the merge conflicts (I think they were mostly my fault!) and I'll try and get this merged if everything passes. Great work :D |
This is awesomesauce! This resolves a query I raised on discuss.hashicorp.com about how to use the pre-existing telemetry to effectively monitor a fleet of vault agents. |
More correctly: this resolves an issue that has made me reluctant to roll out a fleet of vault agents in the first place. |
This adds an additional metric to vault agent telemetry that allows you to see if vault agent is currently authenticated and has a valid token.
When the metric is set to 1 it means that the agent has successfully authenticated with the vault server and has a valid token.
When it is set to 0 it means that the agent does not have a valid token.
fixes #26569