Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Store and return RPC error in ACL cache entries #12885

Merged
merged 2 commits into from
May 3, 2022
Merged

Conversation

kyhavlov
Copy link
Contributor

This PR adds an Error field to the IdentityCacheEntry to preserve the error returned when attempting to resolve a token. Currently we return acl.ErrNotFound in this case which bubbles up to the client agent that made the request and the down policy ends up not being respected if it's set to allow.

@kyhavlov kyhavlov added pr/no-metrics-test pr/no-docs PR does not include docs and should not trigger reminder for cherrypicking them. labels Apr 28, 2022
@kyhavlov kyhavlov requested a review from a team April 28, 2022 17:57
@rboyer rboyer requested review from a team and kisunji and removed request for a team April 29, 2022 20:57
@kisunji kisunji self-assigned this Apr 29, 2022
Comment on lines 1286 to 1293
// Attempt to resolve the token - this should set up the cache entry with a nil
// identity and permission denied error.
authz, err := r.ResolveToken(secretID)
require.NoError(t, err)
require.NotNil(t, authz)
entry := r.cache.GetIdentity(tokenSecretCacheID(secretID))
require.Nil(t, entry.Identity)
require.Equal(t, acl.ErrPermissionDenied, entry.Error)
Copy link
Contributor

@kisunji kisunji May 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original context was that after this initial ResolveToken primes the cache, the followup call would fail on acl.ErrNotFound and not apply the ACLDownPolicy (bug since it should fail on ACLRemoteError and apply the ACLDownPolicy).

I think this test needs to repeat the above chunk (you'll see that it fails the second time around because ResolveToken returns the cached error)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After applying the proposed change in acl.go the second ResolveToken call passes for me

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I wasn't clear:

		authz, err := r.ResolveToken(secretID)
		require.NoError(t, err)
		require.NotNil(t, authz)
		entry := r.cache.GetIdentity(tokenSecretCacheID(secretID))
		require.Nil(t, entry.Identity)
		require.Equal(t, acl.ErrPermissionDenied, entry.Error)

		authz, err = r.ResolveToken(secretID)
		require.NoError(t, err) <- your implementation fails here but it shouldn't
		require.NotNil(t, authz)
		entry = r.cache.GetIdentity(tokenSecretCacheID(secretID))
		require.Nil(t, entry.Identity)
		require.Equal(t, acl.ErrPermissionDenied, entry.Error)

@@ -416,7 +416,7 @@ func (r *ACLResolver) resolveIdentityFromToken(token string) (structs.ACLIdentit
waitForResult := cacheEntry == nil || r.config.ACLDownPolicy != "async-cache"
if !waitForResult {
// waitForResult being false requires the cacheEntry to not be nil
return cacheEntry.Identity, nil
return cacheEntry.Identity, cacheEntry.Error
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the caller has special handling for ACLRemoteError, we need to make sure we wrap the error like we do in L430, otherwise we'll miss the ACLRemoteError codepath

Suggested change
return cacheEntry.Identity, cacheEntry.Error
if cacheEntry.Error != nil && !acl.IsErrNotFound(cacheEntry.Error) {
return cacheEntry.Identity, ACLRemoteError{Err: cacheEntry.Error}
}
return cacheEntry.Identity, cacheEntry.Error

Copy link
Contributor Author

@kyhavlov kyhavlov May 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I totally missed this part about needing the return the err wrapped in an ACLRemoteError - I thought the bug was only that we weren't caching the error at all and returning an ErrNotFound when we shouldn't be

localTokens: false,
localPolicies: false,
tokenReadFn: func(_ *structs.ACLTokenGetRequest, reply *structs.ACLTokenResponse) error {
return acl.ErrPermissionDenied
Copy link
Contributor

@kisunji kisunji May 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

slight nit but can we return something like testErr := fmt.Errorf("network error")?

The type of error returned here isn't too important but I think using an error from the acl package (and the fact that we have special handling for a bunch of ACL error types) makes it a bit confusing.

Copy link
Contributor

@kisunji kisunji left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to add special handling to the newly returned errors and should update the test a bit to replicate the bug this PR aims to address

Copy link
Contributor

@kisunji kisunji left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks for working on this. Hopefully this cuts down on all the flakes that the bug subtly caused

@kyhavlov kyhavlov merged commit 0696ed2 into main May 3, 2022
@kyhavlov kyhavlov deleted the acl-err-cache branch May 3, 2022 17:44
@hc-github-team-consul-core
Copy link
Collaborator

🍒 If backport labels were added before merging, cherry-picking will start automatically.

To retroactively trigger a backport after merging, add backport labels and re-run https://circleci.com/gh/hashicorp/consul/659162.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pr/no-docs PR does not include docs and should not trigger reminder for cherrypicking them. pr/no-metrics-test
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants