Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Recurring redirects in OIDC against Azure AD #20363

Closed
artkonr opened this issue Sep 23, 2021 · 15 comments · Fixed by #20431
Closed

Recurring redirects in OIDC against Azure AD #20363

artkonr opened this issue Sep 23, 2021 · 15 comments · Fixed by #20431
Labels
area/oidc kind/bug Something isn't working
Milestone

Comments

@artkonr
Copy link

artkonr commented Sep 23, 2021

Describe the bug

Hey team,

I am trying to enable OIDC security for my web app (authorization_code flow) with Azure AD being the identity provider. The problem is that endpoint declaration like below fails with recurring redirects:

@GET
@RolesAllowed("admin-role")
public Response doSmth() {}

The error flow goes as follows:

  1. The request gets routed to the endpoint.
  2. Authentication mechanism kicks in.
  3. I get first proper redirect to the login page of the IDP, enter the data and press "ok".
  4. Redirects start firing endlessly and are effectively killed by Chrome, endpoint returns 401.

The logs aren't too helpful and look like this:

2021-09-23 18:07:40,537 DEBUG [org.jbo.res.res.i18n] (executor-thread-0) RESTEASY002315: PathInfo: /tellme/admin
2021-09-23 18:07:40,538 DEBUG [io.qua.oid.run.CodeAuthenticationMechanism] (executor-thread-0) Authentication request redirect_uri parameter: http://localhost:8080/tellme/admin
2021-09-23 18:07:40,538 DEBUG [io.qua.oid.run.CodeAuthenticationMechanism] (executor-thread-0) q_auth cookie 'max-age' parameter is set to 1800
2021-09-23 18:07:40,722 DEBUG [io.qua.oid.run.CodeAuthenticationMechanism] (vert.x-eventloop-thread-4) Token request redirect_uri parameter: http://localhost:8080/tellme/admin
2021-09-23 18:07:41,068 DEBUG [io.qua.oid.run.CodeAuthenticationMechanism] (vert.x-eventloop-thread-4) q_session cookie 'max-age' parameter is set to 3900
2021-09-23 18:07:41,068 DEBUG [io.qua.oid.run.CodeAuthenticationMechanism] (vert.x-eventloop-thread-4) Final redirect URI: http://localhost:8080/tellme/admin
2021-09-23 18:07:41,076 DEBUG [org.jbo.res.res.i18n] (executor-thread-0) RESTEASY002315: PathInfo: /tellme/admin
2021-09-23 18:07:41,077 DEBUG [io.qua.oid.run.CodeAuthenticationMechanism] (executor-thread-0) Authentication request redirect_uri parameter: http://localhost:8080/tellme/admin
2021-09-23 18:07:41,077 DEBUG [io.qua.oid.run.CodeAuthenticationMechanism] (executor-thread-0) q_auth cookie 'max-age' parameter is set to 1800
2021-09-23 18:07:41,234 DEBUG [io.qua.oid.run.CodeAuthenticationMechanism] (vert.x-eventloop-thread-4) Token request redirect_uri parameter: http://localhost:8080/tellme/admin
2021-09-23 18:07:41,578 DEBUG [io.qua.oid.run.CodeAuthenticationMechanism] (vert.x-eventloop-thread-4) q_session cookie 'max-age' parameter is set to 3900
2021-09-23 18:07:41,579 DEBUG [io.qua.oid.run.CodeAuthenticationMechanism] (vert.x-eventloop-thread-4) Final redirect URI: http://localhost:8080/tellme/admin
2021-09-23 18:07:42,856 DEBUG [io.qua.oid.run.CodeAuthenticationMechanism] (vert.x-eventloop-thread-4) The state cookie is missing after a redirect from IDP, authentication has failed

It seems to me like something goes wrong with how redirect_uri parameter is passed and treated by Azure. The value of the parameter seems to be valid and does not change between redirects, though.

Expected behavior

Once I log in on the IDP page, the browser redirects me to a URI set in the redirect_uri parameter, which should match the endpoint I hit.

Actual behavior

Once I log in on the IDP page I get redirected to the endpoint I hit.

How to Reproduce?

To reproduce:

1. Create a simple quarkus app with Resteasy and OIDC

The only thing that matters is that the app's controller class should have an RBAC-protected endpoint.

The contents of application.properties:

quarkus.oidc.auth-server-url=https://login.microsoftonline.com/TENANT_ID/v2.0
quarkus.oidc.application-type=web_app
quarkus.oidc.client-id=CLIENT_ID
quarkus.oidc.credentials.secret=CLIENT_SECRET
quarkus.oidc.authentication.scopes=offline_access

2. Create an App in Azure, create the roles and permissions.

In Azure Portal:

  1. Create a new Azure app at Azure Active Directory -> App Registrations -> New Registration.
  2. From the screen at Azure Active Directory -> App Registrations -> "your-app-name"
    • copy "Application ID" to CLIENT_ID, "Directory ID" to TENANT_ID placeholders in application.properties above.
    • press "Client credentials", generate a secret and replace CLIENT_SECRET placeholder.
    • press "Redirect URIs" and enter the full URI to the RBAC-protected endpoint in your app.
  3. Go to Azure Active Directory -> App Registrations -> "your-app-name" -> App Roles and create at least one app role. The "Value" attribute will be the name of the role included in the ID JWT.
  4. Navigate to Azure Active Directory -> Enterprise Applications -> "your-app-name" -> Users and Groups -> Add User/Group and add your user account to the role created in p.2.
  5. Go to Azure Active Directory -> App Registrations -> "your-app-name" -> API Permissions and press "Grant admin consent".

3. Enter the name of the created role into @RolesAllowed on the protected endpoint.
4. Hit the endpoint.

Output of uname -a or ver

Darwin C8049 20.5.0 Darwin Kernel Version 20.5.0: Sat May 8 05:10:33 PDT 2021; root:xnu-7195.121.3~9/RELEASE_X86_64 x86_64

Output of java -version

openjdk version "16.0.1" 2021-04-20 OpenJDK Runtime Environment (build 16.0.1+9-24) OpenJDK 64-Bit Server VM (build 16.0.1+9-24, mixed mode, sharing)

GraalVM version (if different from Java)

not used

Quarkus version or git rev

2.2.3.Final

Build tool (ie. output of mvnw --version or gradlew --version)

Apache Maven 3.8.1 (05c21c65bdfed0f71a2f2ada8b84da59348c4c5d)

Additional information

No response

@artkonr artkonr added the kind/bug Something isn't working label Sep 23, 2021
@quarkus-bot
Copy link

quarkus-bot bot commented Sep 23, 2021

/cc @pedroigor, @sberyozkin

@sberyozkin
Copy link
Member

sberyozkin commented Sep 23, 2021

Hi @artkonr Hmm... A few Azure users have confirmed it was working for them and I thought we have fixed all the redirect loops problems.

The last message explains the problem, the state cookie set during the initial redirect is not visible when the user is redirected back to Quarkus so it is impossible to confirm Quarkus initiated it - I'm surprised it is happening since the cookie path is set to / by default, but perhaps it is also a different domain ?
Please see https://quarkus.io/guides/security-openid-connect-web-authentication#oidc-cookies and experiment with setting cookie path and domain.
But in any case it must be an immediate 401 without the browser terminating the loop

@artkonr
Copy link
Author

artkonr commented Sep 24, 2021

Hi @sberyozkin,

Thanks for the info.

Odd bit here is that in my case, the setup is probably the simplest one, no overlapping roots and the app itself runs on localhost only. Besides, I got a feeling that this last statement in the log is actually an evidence of Chrome breaking the redirect loop, as if denying the state cookie is part of the recursion control.

I also tried setting .cookie-domain and .cookie-path - to little avail.

You mentioned other users reporting similar behavior, could you share the issue details, please? Maybe there could be something I could try as well.

Interesting though, I tried with another tenant on Azure and the behavior was slightly different: the redirects went on until Azure returned "error signing in" page.

@sberyozkin
Copy link
Member

sberyozkin commented Sep 24, 2021

@artkonr Have a look please at https://github.com/quarkusio/quarkus/issues?q=is%3Aissue+is%3Aclosed+azure, a few of them are there.

I got a feeling that this last statement in the log is actually an evidence of Chrome breaking the redirect loop, as if denying the state cookie is part of the recursion control.

Unlikely IMHO, the browser is redirecting the user back to Quarkus which fails the authentication - the problem the exception it throws is not recognized as the authentication completion failure but a failure requiring a challenge so Quarkus again sends a redirect to the OIDC provider for the user to authenticate - Azure has already authenticated this user, redirects back to Quarkus, etc. I'll look at fixing the problem with the repeated redirect attempts on the Quarkus end - but the underlying problem is about the cookie not being available.

Can you give me a favor please and create a test user/account for me to try ? You can ping me privately at https://quarkusio.zulipchat.com/ with the test account connection details or please type them here if it would be a very short term test account.

@artkonr
Copy link
Author

artkonr commented Sep 24, 2021

@sberyozkin, gotcha, just pinged you in Zulip.

@artkonr
Copy link
Author

artkonr commented Sep 24, 2021

OK, I have it.

  1. I hit the endpoint.
  2. I get redirected to auth page of Azure AD and log in.
  3. Azure redirects to my initial target endpoint with authorization code.
  4. Further exchange returns ID token.
  5. Quarkus attempts to save ID JWT into q_session cookie.
  6. Browser rejects the attempt, because in my case the ID token value is simply too long (4900 characters vs. 4096 which is permitted by all browsers to be max cookie value length).
  7. With the cookie not set, Quarkus cannot read ID JWT and tries to get it again and so the loop begins.
  8. At the end of the day, Chrome simply kills the redirect loop, Quarkus sees that the authentication is not complete, because the q_session is not set. Hence the 401.

Theoretically, splitting ID token in parts and saving in separate cookies could solve the issue for the cases when the token is too big. This should IMHO be considered a valid case, because JWT can contain arbitrary amount of data and may vary between tenants even for one IDP.

For the record, OIDC web-app example is perfectly valid, because the realm settings for the Keycloak produce a token only ~2700 chars long.

@sberyozkin
Copy link
Member

Hi @artkonr Nice work, thanks for this investigation.

We have an option to customize the way the session data is represented, https://quarkus.io/guides/security-openid-connect-web-authentication#tokenstatemanager.
We can even restrict it to ID token only by default.
But if a single ID token > 4096 then it is a problem which can only be addressed by a custom manager - please use a basic map as POC for now, the key can be a random number, value - AuthorizationCodeTokens.

In meantime we'll need to think about offering more options. One idea is to ship a database manager which will store the tokens in a database, but it should not be a default. @pedroigor has proposed an option the other day, which, as far as I recall, was about keeping the few most relevant ID token claims only, I'll explore it

Thanks

@sberyozkin
Copy link
Member

@artkonr By the way, I may've misread your conclusion. It looks like you refer to the q_session's cookie length being > 4096.
But by default q_session != ID token. It is ID token + access token + refresh token, please check the docs I linked to.

I think that if you don't expect to propagate the access token to the downstream service or use to request UserInfo - usually IdToken itself has all the details, then quarkus.oidc.token-state-manager.strategy=id-refresh-token setting will do for you, as it s worth keeping a refresh token to refresh the session.

Can you try it please ? I'm nearly sure now it will fix it for you.
I'd still be interested in exploring creating more compact cookies by default but it can be done as part of a separate enhancement request.
Here, we'd need to add a warning when q_session length is > 4096 and advise what alternatives are available in that warning...

@artkonr
Copy link
Author

artkonr commented Sep 25, 2021

Oh, sure, I'll give it a shot, haven't thought of that, thank you!

@artkonr
Copy link
Author

artkonr commented Sep 27, 2021

@sberyozkin, I tried a different token management strategy as you proposed, worked like a charm that, thank you very much! I guess I just didn't see the token value delimiter in the cookie value, did not connect it to the code flow I saw.

I'm curious, though, if this has any performance implications, would not this force the app to always re-request the access token?

@sberyozkin
Copy link
Member

@artkonr Thanks for verifying the alt strategy works.

I'm curious, though, if this has any performance implications, would not this force the app to always re-request the access token?

No, in the code flow ID token is a primary one, the access token is not used by Quarkus unless: 1) the endpoint needs to do a remote request to get UserInfo in addition to what one gets in IdToken 2) Quarkus endpoint, lets call it ServiceA needs to propagate it downstream to Service B (via HTTP Authorization) 3) RBAC is needed but IdToken can not be set up to have the roles and only AT can have them.

So unless 1)-3) requirements have to be met it is not needed and avoiding storing it by default would even make things a bit faster (fewer bytes to carry in the cookie value :-) ).

But there will be cases where 1)-3) have to be met and the default cookie value is > 4096.
In such cases a custom token manager would always help - but we'll try to offer some more options for it to work OOB so that if 1)-3) has to be met and q_session > 4096 - then, optionally, the id token is automatically trimmed.

What I'll do I'll resolve this issue with a PR which will add a warning when the cookie is > 4096 so it can be much easier to trace it :-), and will open a new enhancement request to optimize ID tokens

@artkonr
Copy link
Author

artkonr commented Sep 27, 2021

Oh, OK, it's good that you mentioned the downstream service case, which is effectively the case I will be pursuing: I need Azure AD access to query MS Graph API, which is authenticated with an access token, so it looks like changing of the strategy won't cut it. Guess I'll have to look into overriding TokenStateManager.

I guess a warning would indeed be a good thing to have at least for the time being. Thank you for your assistance!

UPD: actually I feel a bit stupid, because the case of having a large cookie is mentioned on the OIDC doc page :D

@sberyozkin
Copy link
Member

sberyozkin commented Sep 27, 2021

@artkonr I see, you can use https://quarkus.io/guides/security-openid-connect-client#token-propagation for propagating the access token acquired via the code flow.
Np at all - OIDC docs have to be improved, there is even an Epic issue for that, hopefully we'll get to it sooner rather than later.

Oh, I forgot, setting only quarkus.oidc.token-state-manager.split-tokens=true will definitely fix it for you, all 3 tokens will be kept as individual cookies and there will be no need to create a custom manager. See, you missed a few lines in a large OIDC doc and I forgot what I wrote there :-)

So I won't even have to try to optimize the token - as it won't work if the logout is requested and an id token hint has to be provided or if the endpoint code needs to access the claims which were deemed redundant - can be brittle.

Please let me know if quarkus.oidc.token-state-manager.split-tokens=true alone resolves the problem

@artkonr
Copy link
Author

artkonr commented Sep 27, 2021

@sberyozkin , I tried it and it worked immediately, I needed to tweak the claim path to make the entire thing work, do I'm all set now. Thank you so much for the help!

@sberyozkin
Copy link
Member

@artkonr Np at all, glad it worked. Thanks for highlighting how hard it can be to trace the problem related to the cookie size, at least one user did investigate it before, which is why a token state manager was introduced but I did not guess back then to add a warning along the way...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/oidc kind/bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants