Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] core: Allow LoadBalancer2 to use a different authority per SubChannel #2662

Closed
wants to merge 1 commit into from

Conversation

kkaneda
Copy link
Contributor

@kkaneda kkaneda commented Jan 26, 2017

Add a variant of createSubChannel() to LbHelperImpl. The new method has an additional argument for authority.

Background: Square has the following configuration of NameResolver and LoadBalancer.

  • NameResolver resolves an RPC service name (e.g., ExampleService) to DNS names (e.g., example.datacenter1.squareup.com, example.datacenter2.squareup.com).
  • LoadBalancer creates a SubChannel per DNS name.

Each Subchannel gets its authority from ManagedChannel#authority(), which returns the service authority that the NameResolver uses (= ExampleService).

On the other hand, our certificate is not configured to support ExampleService as an SNI. Thus, we got the following error:

Updating sub channel state: TRANSIENT_FAILURE(Status{code=UNAVAILABLE,description=null, cause=javax.net.ssl.SSLHandshakeException: General OpenSslEngine problem
   ...
  at java.lang.Thread.run(Thread.java:745)
  Caused by: java.security.cert.CertificateException: No subject alternative DNS name matching ExampleService found.

We can of course update our certificates, but it would be nice if we can just use a different authority per Subchannel.

Add a variant of createSubChannel() to LbHelperImpl. The new method
has an additional argument for authority.

Background: Square has the following configuration of NameResolver and LoadBalancer.

- NameResolver resolves an RPC service name (e.g., ExampleService) to
  DNS names (e.g., example.datacenter1.squareup.com, example.datacenter2.squareup.com).
- LoadBalancer creates a SubChannel per DNS name.

Each Subchannel gets its authority from
ManagedChannel#authority(), which returns the service authority that the
NameResolver uses (= ExampleService).

On the other hand, our certificate is not configured to support ExampleService
as an SNI. Thus, we got the following error:

Updating sub channel state: TRANSIENT_FAILURE(Status{code=UNAVAILABLE,description=null, cause=javax.net.ssl.SSLHandshakeException: General OpenSslEngine problem
   ...
  at java.lang.Thread.run(Thread.java:745)
  Caused by: java.security.cert.CertificateException: No subject alternative DNS name matching ExampleService found.

We can of course update our certificates, but it would be nice if we
can just use a different authority per subchannel.
@kkaneda kkaneda changed the title [WIP] core: Allow LoadBalancer2 use a different authority per subchannel [WIP] core: Allow LoadBalancer2 to use a different authority per subchannel Jan 26, 2017
@kkaneda kkaneda changed the title [WIP] core: Allow LoadBalancer2 to use a different authority per subchannel [WIP] core: Allow LoadBalancer2 to use a different authority per SubChannel Jan 26, 2017
@lukaszx0
Copy link
Contributor

We can workaround this, but I can imagine other people having similar issues in the future.

@ejona86 @zhangkun83 thoughts?

@zhangkun83
Copy link
Contributor

ManagedChannel#authority() is from NameResolver#getServiceAuthority(). Have you tried letting NameResolver to return the desired authority?

@kkaneda
Copy link
Contributor Author

kkaneda commented Jan 26, 2017

Hmm, I'm not sure that's doable. The javadoc of NameResolver#getServiceAuthority() says "An implementation must generate it locally and must keep it unchanged". However, the desired authority here is either example.datacenter1.squareup.com or example.datacenter2.squareup.com depending on which DNS name is used by SubChannel to connect. So, it's not static.

@lukaszx0
Copy link
Contributor

Yeah. So currentlyNameResolver#getServiceAuthority() returns a service name (fully qualified grpc proto service name) but the NameResolver can resolve the target for which we crate channel to multiple clusters across multiple DCs and for each DC will have a different VIP/DNS name.

@jorgheymans
Copy link
Contributor

@zhangkun83
Copy link
Contributor

@lukaszx0, looks like you are misusing authority. The authority returned by NameResolver is meant to be used directly for making TLS connections. If authority is locally derived by audited code from the target provided by user, we can make sure that the servers we connect to and send RPCs to are authenticated as the identity represented by the target. This is critical for protecting user data.

If NameResolver were to derive authority from network, the name resolution must be secure so that the authority is not tampered with on the network. This is a bigger topic than a Java API change, and should involve all other languages.

@jboeuf is working on the security aspect of gRPC. He may give a better insight of this issue.

@kkaneda
Copy link
Contributor Author

kkaneda commented Jan 27, 2017

I see. That makes sense. If this is not a recommended approach, our option would be to just modify certs, I guess. Could you let us if there is any other option?

@zhangkun83
Copy link
Contributor

What does your target string look like? Using the RPC service name as the authority is unusual. Have you considered using something like scheme:///squareup.com/ExampleService or scheme:///squareup.com/?service=ExampleService for the target?

@lukaszx0
Copy link
Contributor

@zhangkun83 thanks, this makes sense.

Currently our secinfra is set up in a way that doesn't support it, but I just talked with some folks and I think we can easily fix/workaround this.

What does your target string look like? Using the RPC service name as the authority is unusual. Have you considered using something like scheme:///squareup.com/ExampleService or scheme:///squareup.com/?service=ExampleService for the target?

I think we'll continue to use service name as authority with which we'll update our certs to make it work.

The targets that we're currently using are what we call cluster specs. So for example for the GeocodingService, we create channel with target geocoding/nyc1/production (app/dc/cluster name). The cluster spec gets resolved by NameResolver and we create connection to hosts returned from it in LB.

@jboeuf
Copy link
Contributor

jboeuf commented Jan 27, 2017

Just my 2 cents here.

gRPC is doing the right thing. If the certificate doesn't list the authority, the call should fail.
The (sneaky) attack is the following.

  1. The attacker has control over DNS and resolves a legitimate site (e.g. mytrustedbank.com) to the same IP as the evil site that it controls (e.g mycatpictures.com).
  2. The attacker tricks a client to open a connection to mycatpicturesites.com and then to go to mytrustedbank.com.
  3. The gRPC stack resolves mytrustedbank.com and sees that it resolves as the same IP as a current connection (mycatpictures.com). Hypothetically, it could try to optimize the flow by not opening a new channel but just reusing the channel to this IP.
  4. The user enters its credentials to login to mytrustedbank.com. If the authority is not checked against the cert for this connection, the user is owned.

@kkaneda
Copy link
Contributor Author

kkaneda commented Jan 27, 2017

Thanks, everyone! I'll close the PR.

(Just to clarify, using scheme:///<some_service_name>.squareup.com was recommended internally than just scheme:///squareup.com. As for "If the certificate doesn't list the authority, the call should fail", the change in this PR won't change the property, but it will allow LoadBalancer to dynamically change authority based on a result coming from NameResolver.)

@kkaneda kkaneda closed this Jan 27, 2017
@jboeuf
Copy link
Contributor

jboeuf commented Jan 27, 2017

(Just to clarify, using scheme:///<some_service_name>.squareup.com was recommended internally than just scheme:///squareup.com. As for "If the certificate doesn't list the authority, the call should fail", the change in this PR won't change the property, but it will allow LoadBalancer to dynamically change authority based on a result coming from NameResolver.)

If your name resolver is not trusted (such as it is the case for unsecure DNS), it is a security issue to change the authority based on it.

In my example, if you replace DNS by another untrusted naming system that can return an authority, all the attacker would have to do is to say that the authority for mycatpictures.com is actually mytrustedbank.com and voilà!

I asked @zhangkun83 to document the fact that implementations NameResolver#getServiceAuthority() must not rely on untrusted services.

@kkaneda
Copy link
Contributor Author

kkaneda commented Jan 27, 2017

Yes, I agreed. Thanks for the detailed explanation!

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants