Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refresh the verification JWK sets on demand #5003

Closed
sberyozkin opened this issue Oct 29, 2019 · 19 comments · Fixed by #9891
Closed

Refresh the verification JWK sets on demand #5003

sberyozkin opened this issue Oct 29, 2019 · 19 comments · Fixed by #9891
Assignees
Labels
area/oidc kind/enhancement New feature or request
Milestone

Comments

@sberyozkin
Copy link
Member

Description
As discussed with Paulo and Pedro, the adapter should try to optimize the token verification process by caching the JWK keys while also refreshing them periodically to support the key rotation

CC @pedroigor @stianst

@sberyozkin sberyozkin added kind/enhancement New feature or request area/oidc labels Oct 29, 2019
@stianst
Copy link
Contributor

stianst commented Oct 30, 2019

Keys don't need to be loaded periodically, but rather on-demand. Basically if there is an unknown kid try to fetch keys again. Otherwise there would be a period between keys being rotated on the server side and not being available in the client.

@eiden
Copy link

eiden commented Feb 21, 2020

Basically if there is an unknown kid try to fetch keys again

Keep in mind that this can be used as an attack vector for a DoS since the application will do a http call when it encounters an unknown key.

Otherwise there would be a period between keys being rotated on the server side and not being available in the client.

Look at the JWKS keys returned by for instance google:
https://www.googleapis.com/oauth2/v3/certs

They return two keys. When a key is rotated the following happens:

  1. Add a new key to the JWKS endpoint
  2. Wait until clients have refreshed their keys (controlled via http cache headers)
  3. Start signing new JWT's with the new key
  4. After all the JWT's signed with the old key is expired, remove the old key

By taking the JWT's expiry into account in combination with cache control headers, the issuer can rotate a key without there being a period where clients is unaware of the new key.

@sberyozkin
Copy link
Member Author

Vertx has a load JWKs method which gets the new set and I'd like to have a Timer task calling it at the configured period of time. (My initial assumption the introspection is done on every request was wrong by the way, Vertx would only do it for the opaque tokens).
Apart from this timer task doing the periodic set pulls, we can also do a single pull if no key is available right now.

@sberyozkin sberyozkin self-assigned this Mar 13, 2020
@sberyozkin sberyozkin added this to the 1.4.0 milestone Mar 13, 2020
@stianst
Copy link
Contributor

stianst commented Mar 13, 2020

You don't want to use a timer task to fetch keys. Keys are rotated at a fairly irregular time (monthly, not daily), and a timer would either be far to regular, or it could end taking way to long before new keys are fetched.

As such what you want is to do it on demand when you see a new KID as I mentioned earlier. Of course like everything else you want to have some protection against a DoS attack. A basic approach in this case is to have a configurable minimum time between each request.

@sberyozkin
Copy link
Member Author

sberyozkin commented Mar 13, 2020

@stianst Hi, sure, for the latter I was indeed thinking of having a property which even allows such an on demand request, that would be a start. Plus the other property you suggest :-)
Likewise enabling a timer would be optional, if someone sets to pull every 24 hours for example or every 2 weeks for the long running apps, it won't really make things worse I guess and might save an on demand trip. I'll CC you when PR gets ready, I won't have much problem with dropping the timer idea if it won't really look right...
Cheers

@sberyozkin
Copy link
Member Author

sberyozkin commented Mar 13, 2020

@stianst Hi Stian, by the way, can you please remind about the idea of caching the access tokens for service applications, is it even possible without a nonce ? May be checking jti can work but it would require parsing the token... Cheers

@sberyozkin
Copy link
Member Author

@stianst Yes, I see it being documented that they can be rotated just few times per the year by default :-)

@stianst
Copy link
Contributor

stianst commented Mar 16, 2020

On demand requests should by enabled by default.

I see no reason of a timer based approach in addition. What purpose does it serve? It would only serve a purpose for an application with very few requests, and with a short interval on the timer. Makes no sense IMO.

@stianst
Copy link
Contributor

stianst commented Mar 16, 2020

@sberyozkin I'm not really following what you are talking about caching the access token. Are you talking about caching token introspection endpoint results?

@sberyozkin
Copy link
Member Author

sberyozkin commented Apr 2, 2020

@stianst Right I have abandoned the timer idea for now and PR in the works to the support an on-demand pull. The tricky bit is how to test it :-).
Re the access tokens, I had some vague recollection of the discussion to do with caching something related to ATs, and now I see it was about introspection endpoint results. I recall doing something awhile back around a hawk authentication scheme which was involving checking the nonce and rate limiting how often a token could be used, but that is orthogonal.
Now that I know more about Vertx code, I see it does do the remote introspection only for the opaque tokens. So the introspection response caching, if it is to be done, would only be relevant to the opaque tokens case

@sberyozkin sberyozkin changed the title Cache the verification JWK sets Load the verification JWK sets on demand Apr 2, 2020
@sberyozkin sberyozkin changed the title Load the verification JWK sets on demand Refresh the verification JWK sets on demand Apr 2, 2020
@sberyozkin
Copy link
Member Author

sberyozkin commented Apr 2, 2020

@stianst @pedroigor Unfortunately it is not possible to implement refreshing on demand cleanly on top of Vertx in the short term since its JWK class does not record its kid header and instead uses an internal algorithm to calculate a kid property. The verification is done by iterating over all JWKs which match an algorithm which is OK perfomance wise as I don't really expect that OIDC servers would return a set with say 2 keys with RS256 for example. It might be considered a pretty sensitive change at the Vertx level to use only those JWKs whose kids match that of the current token especially if they have some cases where a set has no keys with the kid properties.
Paulo also recommended for the integrating code to decide if the set should be refreshed.
But there's no way to know if the verification has failed due to the missing JWK or due to the invalid signature.

What I'd like to do is to add a property min-jwk-refresh-interval (in days ! :-) ) and every TenantConfigContext will have a timestamp of the last time the refresh has happened for a given tenant.

if min-jwk-refresh-interval is 0 then no refresh ever happens. If it is > 0 and Vertx reports a validation exception then if there was no refresh yet at all or it happened min-jwk-refresh-interval + 1 days earlier then the keys are reloaded.

The possible problem with reloading on a generic verification failure is that if the attacker provided a token with the randomly modified signature which would be known to fail then when the good client call fails due to Vertx having no uptodate cache, quarkus-oidc will fail such a client if the refresh has happened just few mins earlier and will continue blocking until min-jwk-refresh-interval expires.

So it looks like we really need to know, before refreshing the keys, that the reason the verification has failed is because there were no key with a matching kid. I'm going to open a issue with Vertx and suggest a small improvement which would still keep the Vertx current verification logic.

@stianst sorry, I keep getting back now to the idea of the timer, I get it it will be too regular and might miss badly the recent rotation :-) but it is a low cost tool, will run every month or few months and likewise might get the latest rotation on its own thread just in time before the client with the stale token kid fails and saving it waiting for the refresh call :-). It won't harm and will be optional of course.
We need some solution now I believe before Quarkus will get upgraded to the next version of Vertx OAuth2
Also CC-ing to @eiden who has contributed to the discussion earlier.

@sberyozkin
Copy link
Member Author

Actually, if the attacker has caused a refresh just a few mins before the good client token with a stale kid arrives then there would be no harm, the good client would just get a matching JWK.
But I believe there would always be some edge cases with the min-jwk-refresh-interval anyway.
I think I can open a draft PR pretty soon, though I'll still open a Vertx issue

@stianst
Copy link
Contributor

stianst commented Apr 2, 2020

All I can do is say again that this approach is completely wrong.

When verifying a token the jwt header is important. Only keys that march kid, alg and use should be considered. Anything else is a security risk.

New keys must be fetched on demand. Otherwise you will have issues everytime keys are rotated at the idp.

@stianst
Copy link
Contributor

stianst commented Apr 2, 2020

Adding any measurements to protect against a dos with invalid keys should be in terms of minutes, not hours or days. The longer this value is the bigger risk that you are actually creating a dos yourself. Fail to fetch keys and your not fetching the new keys for days will result in service unavailability in that period of time. Fetching keys is not a massively expensive operation, especially not it you don't parse kids you already have parsed.

@sberyozkin
Copy link
Member Author

@stianst thanks, I think I indirectly came to a similar conclusion above but I did not suspect that making this refresh interval in minutes was in fact the best solution, interesting.
I'm not sure if I can influence the Vertx changes (even though I agree), one should expect I guess that any certified OIDC would not return a set with keys which can actually not be used for the verification. I'll suggest it to Paulo and I'm pretty sure that we can update Vertx to report a more informative exception which would contain a set of kids of the currently available keys and then we can check in Quarkus if the token kid matches one of them or not

@stianst
Copy link
Contributor

stianst commented Apr 2, 2020

So this is a plain bug in Vert.x, there really is no discussion about that, if you open a bug against Vert.x and it is rejected let me know and I'll comment on it, as it is important to pick the correct key from a JWKs set, not just randomly check them all. If keys where supposed to be loaded by trial-and-error they wouldn't have done a header in JWTs ;)

@sberyozkin
Copy link
Member Author

Hi Stian I'm sure the Verx guys will support us :-) as they have done before. OK, let me deal with it. thanks

@sberyozkin
Copy link
Member Author

@stianst I was not quite correct, kid is detected if it is available and calculated otherwise. But in any case it will be a straightforward fix, thanks

@stianst
Copy link
Contributor

stianst commented Apr 2, 2020

@sberyozkin that's okay then. Calculating kid is probably there for some weirdness we had in Keycloak in the olden days, can probably be removed. There's nothing standard on how a kid is generated really so can never be portable cross different IdPs and Keycloak has had the kid in the token for a long time now.

@gsmet gsmet modified the milestones: 1.4.0.CR1, 1.4.0.Final Apr 14, 2020
@gsmet gsmet modified the milestones: 1.4.0.Final, 1.5.0 Apr 22, 2020
@gsmet gsmet removed this from the 1.5.0.CR1 milestone May 19, 2020
@gsmet gsmet added this to the 1.6.0 - master milestone Jun 25, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/oidc kind/enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants