-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[feat][broker] OneStageAuth State: move authn out of constructor #19295
[feat][broker] OneStageAuth State: move authn out of constructor #19295
Conversation
/pulsarbot rerun-failure-checks |
// Authentication is already completed | ||
return CompletableFuture.completedFuture(null); | ||
} | ||
this.authenticationDataSource = new AuthenticationDataCommand( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have one question:
When having AuthChallenge
, do we need to renew AuthenticationState
?
It looks like so.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is a class for single stage authentication, there are no AuthChallenges
. The primary reason to set this here instead of the constructor is that we set it with the authData
that we authenticate. Note that this class always has isExpired
return false
, which means there are never any AuthChallenges
generated.
If this were a multi-stage auth, this solution would not work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your means that a single-stage authentication doesn't refresh authentication data, if so, when having AuthChallenge
, we must new an AuthenticationState
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
single-stage authentication doesn't refresh authentication data
Yes, that is what I mean. However, I don't agree that it means we need to create a new AuthenticationState
.
My understanding of the the protocol is that the AuthenticationState
method isExpired
returns true
, Pulsar will then call refreshAuthentication()
to get the bytes to send to the client. The client responds with AuthData
, and Pulsar passes the response to authenticate(AuthData)
. Since isExpired
is always false for OneStateAuthenticationState
, there is no expectation of getting an AuthChallenge
back. The single stage authentication is truly one stage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good, I see.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are the failing tests related to these changes?
@lhotari - yes, that proxy test is failing because of my changes. Pushing up a fix now. Note that the fix includes a new call to |
Looks like
|
Looks like the error was transient. I am not sure what would have led to that error. @lhotari - PTAL |
That's a known issue, #13620 (comment) . Mockito is not thread safe. |
PIP: #12105 ### Motivation When authentication fails in the `ServerCnx`, the state is left in `Start` if the primary `authData` fails authentication and in `Connecting` or `Connected` if the `originalAuthData` authentication fails. To prevent any kind of unexpected behavior, we should go to `Failed` state. Note that the tests verify the current behavior where a failed `originalAuthData` results first in a `Connected` command from the broker and then an `Error` command. I documented that I think this is sub optimal here #19311. ### Modifications * Update `ServerCnx` state to `Failed` when there is an authentication exception during `handleConnect` and during `handleAuthResponse`. * Update `handleAuthResponse` reply to `"Unable to authenticate"` instead of the `AuthenticationState` exception. ### Verifying this change A new test is added. The added test covers the change made in #19295 where we updated `ServerCnx` so that we call `AuthState#authenticate` instead of relying on the implementation detail that the initialization calls `authenticate`. That PR should have added a test. ### Does this pull request potentially affect one of the following parts: This is not a breaking change. ### Documentation - [x] `doc-not-needed` ### Matching PR in forked repository PR in forked repository: michaeljmarshall#18
…he#19312) PIP: apache#12105 When authentication fails in the `ServerCnx`, the state is left in `Start` if the primary `authData` fails authentication and in `Connecting` or `Connected` if the `originalAuthData` authentication fails. To prevent any kind of unexpected behavior, we should go to `Failed` state. Note that the tests verify the current behavior where a failed `originalAuthData` results first in a `Connected` command from the broker and then an `Error` command. I documented that I think this is sub optimal here apache#19311. * Update `ServerCnx` state to `Failed` when there is an authentication exception during `handleConnect` and during `handleAuthResponse`. * Update `handleAuthResponse` reply to `"Unable to authenticate"` instead of the `AuthenticationState` exception. A new test is added. The added test covers the change made in apache#19295 where we updated `ServerCnx` so that we call `AuthState#authenticate` instead of relying on the implementation detail that the initialization calls `authenticate`. That PR should have added a test. This is not a breaking change. - [x] `doc-not-needed` PR in forked repository: #18 (cherry picked from commit 8049690)
…he#19312) PIP: apache#12105 When authentication fails in the `ServerCnx`, the state is left in `Start` if the primary `authData` fails authentication and in `Connecting` or `Connected` if the `originalAuthData` authentication fails. To prevent any kind of unexpected behavior, we should go to `Failed` state. Note that the tests verify the current behavior where a failed `originalAuthData` results first in a `Connected` command from the broker and then an `Error` command. I documented that I think this is sub optimal here apache#19311. * Update `ServerCnx` state to `Failed` when there is an authentication exception during `handleConnect` and during `handleAuthResponse`. * Update `handleAuthResponse` reply to `"Unable to authenticate"` instead of the `AuthenticationState` exception. A new test is added. The added test covers the change made in apache#19295 where we updated `ServerCnx` so that we call `AuthState#authenticate` instead of relying on the implementation detail that the initialization calls `authenticate`. That PR should have added a test. This is not a breaking change. - [x] `doc-not-needed` PR in forked repository: #18 (cherry picked from commit 8049690) (cherry picked from commit 3ef3bf1) (cherry picked from commit 467cd32) (cherry picked from commit 1fab71b)
This change was introduced by apache#19295. That PR had more changes than are worth cherry-picking, though, so this commit only has the additional call to authenticate the original auth data. As a result, this commit is slightly less efficient because in some implementations, the authdata will be validated twice. (cherry picked from commit f9727ca) (cherry picked from commit 1935f07) (cherry picked from commit 870bf04)
I forgot why I tagged this commit for a commit to all of the release branches. Those commits reference the fact that I added a call to authenticate the |
if (authRole == null) { | ||
throw new AuthenticationException("Must authenticate before calling getAuthRole"); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR looks like a breaking change. In KoP, this method is used to get the role from the token. After upgrading the dependency, many authorization related tests failed. See https://github.com/streamnative/kop/actions/runs/4343910383/jobs/7586547352
Not sure if it should be reverted in master branch, but it should not be cherry-picked to release branches. @michaeljmarshall
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@BewareMyPower - I agree that we shouldn't cherry pick this line. I don't think I did though, taking a quick look at one of the linked commits. I primarily cherry picked the code that calls authenticate
on the original auth data. Are you seeing this error in the kop logs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand now. That KoP test references pulsar 3.0.0, so it must be from a recent build of master. First, I'll note that I didn't cherry pick the behavior change to old release branches. I cherry-picked the extra authentication call for original auth data, which was code introduced in this PR. Second, I think this change is necessary for enabling PIP 97. I first documented the contract here: #19283. The primary issue is that creating the AuthState
should not trigger authenticating the AuthData
. KoP relied on this implementation detail, and that is why the tests failed. Given that the only impacted parties are plugin developers, I think a PIP, documented release notes, a major version bump, and a fail fast behavior (the object throws an exception), make this change acceptable for 3.0.0. Let me know what you think, thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I missed the fact that you have removed these cherry-picked
labels.
Regarding this change, I think you should add another constructor to OneStageAuthenticationState
. Currently, the first argument authData
is actually not used. Maybe you didn't remove it because it's public. (However, you removed the exception signature) But the semantic changed and it looked weird now.
To keep the compatibility, it's better to mark the constructor as deprecated first and keep it the semantic as "authenticate synchronously". To use the asynchronous semantic, you should add another constructor to do that.
public OneStageAuthenticationState(SocketAddress remoteAddress,
SSLSession sslSession,
AuthenticationProvider provider) {
this.provider = provider;
this.remoteAddress = remoteAddress;
this.sslSession = sslSession;
}
@Deprecated
public OneStageAuthenticationState(AuthData authData,
SocketAddress remoteAddress,
SSLSession sslSession,
AuthenticationProvider provider) throws AuthenticationException {
this(remoteAddress, sslSession, provider);
this.authenticationDataSource = new AuthenticationDataCommand(
new String(authData.getBytes(), StandardCharsets.UTF_8), remoteAddress, sslSession);
this.authRole = provider.authenticate(this.authenticationDataSource);
}
KoP relied on this implementation detail
I'd rather use "semantics" instead of the "implementation detail". Take Producer#send
as example:
- Semantics: the send is synchronous
- Implementation details: the exception message representation, etc.
If you made a change that makes send
asynchronous, it should be treated as a breaking change. But if you just modified the exception message, for example, adding a common prefix to the exception message, it would be okay.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@BewareMyPower - thanks for the context and the suggestion.
I think you should add another constructor to
OneStageAuthenticationState
.
Thanks for the suggestion, I agree this is the best way to move forward without breaking any other applications when 3.0.0 is released. I also agree that it makes the API clearer because the unused AuthData
won't be passed in. I shoul be able to get a PR submitted within 24 hours or so.
(However, you removed the exception signature)
I didn't realize this would cause an issue. I'll add it back when I fix the constructor. Out of curiosity, what is the consequence of breaking source compatibility by removing the exception from the method signature?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Out of curiosity, what is the consequence of breaking source compatibility by removing the exception from the method signature?
No. I just noticed this point when I reviewed again. After looking for some documents and testing the ABI compatibility locally, it seems that removing the exception signature does not affect the ABI compatibility.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@BewareMyPower - I finally got a chance to come back to this, and now I remember that I did try to use a similar solution that you proposed while writing this PR, but I found it didn't actually work as you propose. There are a few details to show the issue:
First, the OneStageAuthenticationState
is not used directly by the broker or by plugins. Instead, the broker and plugins use AuthenticationProvider#newAuthState
to create the AuthenticationState
object. Here is the code:
Lines 96 to 104 in fb7f14c
/** | |
* Create an authentication data State use passed in AuthenticationDataSource. | |
*/ | |
default AuthenticationState newAuthState(AuthData authData, | |
SocketAddress remoteAddress, | |
SSLSession sslSession) | |
throws AuthenticationException { | |
return new OneStageAuthenticationState(authData, remoteAddress, sslSession, this); | |
} |
As such, changing the constructor in the way you propose is unlikely to help users that built their own extensions. For example, KoP does the following:
Users must rely on the AuthenticationProvider#newAuthState
in order to make the AuthenticationProvider
pluggable.
When I noticed this detail while writing this PR, I thought about adding a new method to the AuthenticationProvider
that would not take the AuthData
object, in the same way that you proposed for the new OneStageAuthenticationState
constructor. However, the main issue is how to default that method in the interface. We could use the following definition:
default AuthenticationState newAuthState(SocketAddress remoteAddress, SSLSession sslSession)
throws AuthenticationException {
return new OneStageAuthenticationState(remoteAddress, sslSession, this);
}
This would be analogous to the current newAuthState
method, but the issue is that custom AuthenticationProvider
implementations that override the current newAuthState
method likely don't want to use the OneStageAuthenticationState
, and that would lead to unexpected behavior.
Another option could be:
default AuthenticationState newAuthState(SocketAddress remoteAddress, SSLSession sslSession)
throws AuthenticationException {
return newAuthState(null, remoteAddress, sslSession);
}
However, that would require breaking the current implementation in the way you're concerned about. (For what it's worth, this option might be the best long term solution for cleaning up the interface if we want to remove the authData
that is passed. The main point is that it doesn't take care of your concern of not breaking user code.)
Another option:
AuthenticationState newAuthState(SocketAddress remoteAddress, SSLSession sslSession);
This unimplemented option has two problems. First, it will break user code if they do not implement the method. Since this whole exercise is an effort to try to prevent users from needing to change any code, that isn't a good option. Second, it removes the OneStageAuthenticationState
default already present in the interface, which seems problematic.
In my view, the primary challenge is that the AuthenticationProvider
interface has a brittle default behavior.
Given the above, do you have any ideas on how to make this better for users building their own broker extensions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but the issue is that custom AuthenticationProvider implementations that override the current newAuthState method likely don't want to use the OneStageAuthenticationState, and that would lead to unexpected behavior.
Could you show an example of a possible unexpected behavior? IMO, the existing authentication provider only overrides the previous newAuthState
method still has the previous behavior and is not affected by the new newAuthState
method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you show an example of a possible unexpected behavior? IMO, the existing authentication provider only overrides the previous
newAuthState
method still has the previous behavior and is not affected by the newnewAuthState
method.
The challenge comes from the fact that Apache Pulsar will transition to using the new method to build the state object and if that method is not overridden by the custom AuthenticationProvider
implementation, the OneStageAuthenticationState
will be used. That will lead to unexpected behavior.
A good example is the AuthenticationProviderToken
class. That one would "work" with the OneStageAuthenticationState
with the key difference being the isExpired
value from OneStageAuthenticationState
always returns false
while the TokenAuthenticationState
returns true
when the token's exp
claim indicates the token has expired.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I got it. The key point is that Pulsar will prefer the new overload but the existing implementations might still implement the old overload. It seems that there is no better way to solve it perfectly.
### Modifications - Upgrade the Pulsar dependency to 3.0.0.1-SNAPSHOT - Use `List<EntryFilter>` from apache/pulsar#19364 as the filters - Remove the removed APIs in `AuthorizationService` - Use `TopicType` instead of String. - Fix authenticator failure caused by apache/pulsar#19295 Co-authored-by: Demogorgon314 <[email protected]>
…che#19295) (cherry picked from commit e8695bf)
…he#19312) PIP: apache#12105 When authentication fails in the `ServerCnx`, the state is left in `Start` if the primary `authData` fails authentication and in `Connecting` or `Connected` if the `originalAuthData` authentication fails. To prevent any kind of unexpected behavior, we should go to `Failed` state. Note that the tests verify the current behavior where a failed `originalAuthData` results first in a `Connected` command from the broker and then an `Error` command. I documented that I think this is sub optimal here apache#19311. * Update `ServerCnx` state to `Failed` when there is an authentication exception during `handleConnect` and during `handleAuthResponse`. * Update `handleAuthResponse` reply to `"Unable to authenticate"` instead of the `AuthenticationState` exception. A new test is added. The added test covers the change made in apache#19295 where we updated `ServerCnx` so that we call `AuthState#authenticate` instead of relying on the implementation detail that the initialization calls `authenticate`. That PR should have added a test. This is not a breaking change. - [x] `doc-not-needed` PR in forked repository: #18 (cherry picked from commit 8049690)
PIP: #12105
Motivation
In order to implement PIP 97, we must replace
authenticate
calls withauthenticateAsync
. This PR makes those changes forOneStageAuthenticationState
. It also moves the call to authenticate theAuthData
out of the constructor and into theauthenticateAsync
method.One assumption I make in this PR, is that there will not be concurrent calls to the same
AuthenticationState#authenticateAsync
method. That seems like a reasonable assumption because Pulsar uses this state object in the netty eventloop handlers, but please let me know if you disagree.Modifications
authenticate
call out of the constructor so that authentication is only done on the first pass through theauthenticateAsync
method.authenticate
backwards compatible for any plugins that are using it.Verifying this change
New tests are added to verify the new and the old code.
Does this pull request potentially affect one of the following parts:
This could break 3rd party plugins in the broker if they were relying on authentication to happen in the constructor. In order to make those implementations fail fast, this PR includes a change to throw an exception when the
getAuthRole
is called without first callingauthenticateAsync
orauthenticate
. That makes these changes semi-backwards compatible.Documentation
doc-not-needed
Matching PR in forked repository
PR in forked repository: michaeljmarshall#17