Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SIGSEGV when seems vault have stale state #4921

Closed
tantra35 opened this issue Nov 26, 2018 · 5 comments
Closed

SIGSEGV when seems vault have stale state #4921

tantra35 opened this issue Nov 26, 2018 · 5 comments

Comments

@tantra35
Copy link
Contributor

tantra35 commented Nov 26, 2018

Nomad version

0.8.6

Issue

We found this in our server logs

Nov 24 19:56:06 vol-cl-control-01 nomad.sh[2388]:     2018/11/24 19:56:06.640571 [ERR] http: Request /v1/acl/token/b5ff0f93-0ae9-0f77-b706-af148b36bc3e, error: rpc error: Cannot delete nonexistent tokens: b5ff0f93-0ae9-0f77-b706-af148b36
Nov 24 19:56:08 vol-cl-control-01 nomad.sh[2388]: panic: runtime error: invalid memory address or nil pointer dereference
Nov 24 19:56:08 vol-cl-control-01 nomad.sh[2388]: [signal SIGSEGV: segmentation violation code=0x1 addr=0x30 pc=0xfb1d1b]
Nov 24 19:56:08 vol-cl-control-01 nomad.sh[2388]: goroutine 45 [running]:
Nov 24 19:56:08 vol-cl-control-01 nomad.sh[2388]: github.com/hashicorp/nomad/nomad.(*vaultClient).parseSelfToken(0xc42023af70, 0x1, 0x0)
Nov 24 19:56:08 vol-cl-control-01 nomad.sh[2388]:         /opt/gopath/src/github.com/hashicorp/nomad/nomad/vault.go:603 +0x9b
Nov 24 19:56:08 vol-cl-control-01 nomad.sh[2388]: github.com/hashicorp/nomad/nomad.(*vaultClient).establishConnection(0xc42023af70)
Nov 24 19:56:08 vol-cl-control-01 nomad.sh[2388]:         /opt/gopath/src/github.com/hashicorp/nomad/nomad/vault.go:428 +0x164
Nov 24 19:56:08 vol-cl-control-01 nomad.sh[2388]: github.com/hashicorp/nomad/nomad.(*vaultClient).(github.com/hashicorp/nomad/nomad.establishConnection)-fm()
Nov 24 19:56:08 vol-cl-control-01 nomad.sh[2388]:         /opt/gopath/src/github.com/hashicorp/nomad/nomad/vault.go:247 +0x2a
Nov 24 19:56:08 vol-cl-control-01 nomad.sh[2388]: github.com/hashicorp/nomad/nomad.wrapNilError.func1(0x0, 0x0)
Nov 24 19:56:08 vol-cl-control-01 nomad.sh[2388]:         /opt/gopath/src/github.com/hashicorp/nomad/nomad/vault.go:1186 +0x24
Nov 24 19:56:08 vol-cl-control-01 nomad.sh[2388]: github.com/hashicorp/nomad/vendor/gopkg.in/tomb%2ev2.(*Tomb).run(0xc420338f00, 0xc420374240)
Nov 24 19:56:08 vol-cl-control-01 nomad.sh[2388]:         /opt/gopath/src/github.com/hashicorp/nomad/vendor/gopkg.in/tomb.v2/tomb.go:153 +0x2b
Nov 24 19:56:08 vol-cl-control-01 nomad.sh[2388]: created by github.com/hashicorp/nomad/vendor/gopkg.in/tomb%2ev2.(*Tomb).Go
Nov 24 19:56:08 vol-cl-control-01 nomad.sh[2388]:         /opt/gopath/src/github.com/hashicorp/nomad/vendor/gopkg.in/tomb.v2/tomb.go:149 +0xb9
Nov 24 19:56:08 vol-cl-control-01 systemd[1]: nomad.service: Main process exited, code=exited, status=2/INVALIDARGUMENT

This happens after demaged vault in our enviroment, and when we restore it state(vault) from backup

@tantra35 tantra35 changed the title SIGSEGV when seems not nonexistent removes SIGSEGV when seems nonexistent requested Nov 26, 2018
@tantra35 tantra35 changed the title SIGSEGV when seems nonexistent requested SIGSEGV when seems vault have stale state Nov 26, 2018
@tantra35
Copy link
Contributor Author

Seems that code must be changed like this:

diff --git a/nomad/vault.go b/nomad/vault.go
index fc60075fb..05829b3d4 100644
--- a/nomad/vault.go
+++ b/nomad/vault.go
@@ -595,8 +595,9 @@ func (v *vaultClient) parseSelfToken() error {
                if err != nil {
                        return fmt.Errorf("failed to lookup Vault periodic token: %v", err)
                }
+       } else {
+               self = secret
        }
-       self = secret

        // Read and parse the fields
        var data tokenData

Because absolutely assignment self = secret looks like typo

@tantra35
Copy link
Contributor Author

tantra35 commented Nov 27, 2018

or may be changed a little bit simpler, like this:

diff --git a/nomad/vault.go b/nomad/vault.go
index fc60075fb..ca59bf356 100644
--- a/nomad/vault.go
+++ b/nomad/vault.go
@@ -588,7 +588,7 @@ func (v *vaultClient) parseSelfToken() error {
        var self *vapi.Secret

        // Try looking up the token using the self endpoint
-       secret, err := auth.LookupSelf()
+       self, err := auth.LookupSelf()
        if err != nil {
                // Try looking up our token directly
                self, err = auth.Lookup(v.client.Token())
@@ -596,7 +596,6 @@ func (v *vaultClient) parseSelfToken() error {
                        return fmt.Errorf("failed to lookup Vault periodic token: %v", err)
                }
        }
-       self = secret

        // Read and parse the fields
        var data tokenData

@tantra35
Copy link
Contributor Author

@preetapan what do you think about this?

@preetapan
Copy link
Contributor

@tantra35 thanks for reporting this. This looks similar to another sigsegv we fixed recently in https://github.com/hashicorp/nomad/pull/4904/files. The root cause is that the vendored Vault client library returns nil(

) on empty responses which we didn't handle correctly above
We are planning a 0.8.7 release so we'll fix this and will be auditing the vault code again for similar issues.

notnoop added a commit that referenced this issue Nov 29, 2018
vault: protect against empty Vault secret response

Fixes #4921

Sadly, we don't have proper mechanism to mock Vault client, so not sure how to best test this.

I inspected the Vault client interactions, specially for cases where returned value is nil even if the error is also nil.  I believe we covered all correctly now:
* [`v.client.Sys().InitStatus()`](https://github.com/hashicorp/nomad/blob/f3853f11daa51336a2d46d883b1aff6feeddc7ec/nomad/vault.go#L427) - the value is non-nil boolean
* [`v.client.Sys().CapabilitiesSelf(path)`](https://github.com/hashicorp/nomad/blob/f3853f11daa51336a2d46d883b1aff6feeddc7ec/nomad/vault.go#L812): Capabilities handles empty bodies in [`hasCapability`](https://github.com/hashicorp/nomad/blob/f3853f11daa51336a2d46d883b1aff6feeddc7ec/vendor/github.com/hashicorp/vault/api/sys_capabilities.go#L43-L45) - also the `nil` array is handled with proper fail-safe default.
* [`v.client.Logical().Read(fmt.Sprintf("auth/token/roles/%s", role))`](https://github.com/hashicorp/nomad/blob/f3853f11daa51336a2d46d883b1aff6feeddc7ec/nomad/vault.go#L834-L840) handles when `rsecret` is nil
@github-actions
Copy link

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 27, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

2 participants