
Discussion: envoy internal connection #11725

Closed
lambdai opened this issue Jun 24, 2020 · 23 comments
Labels
area/http, stale (stalebot believes this issue/PR has not been touched recently)

Comments

@lambdai
Contributor

lambdai commented Jun 24, 2020

I was exploring dispatching h2 CONNECT to listeners of the same envoy process.
The mental topology is something like below.

curl -> H2 CONNECT -> ListenerA_HCM -> TCP -> ListenerB_TcpProxy.

Without any change, the listenerA_HCM -> TCP -> ListenerB_TcpProxy hop will be carried by a real TCP connection.
That is expensive in terms of

  1. Resource usage: a TCP 4-tuple and socket fd per connection, plus TIME_WAIT states and ephemeral port consumption.
  2. Latency: it takes an extra epoll cycle to relay the data.
  3. CPU usage: extra memory copies to and from the kernel.

From my point of view, where envoy's role is a sidecar, the tunnel HCM is pluggable while ListenerB_TcpProxy carries the business logic. The TcpProxy should need only minimal changes when ListenerA_HCM is switched on and off. I believe this deployment is significantly different from envoy as a central proxy.

To keep the change at the TcpProxy listener minimal, the reasonable approach is to create a non-socket-based connection that still provides the connection interface (read/write), so that

  1. The cluster used by the tunnel HCM can create such a virtual connection,
  2. The dispatcher creates both a VirtualClientConnection and a VirtualServerConnection,
  3. The TcpListener finds the matching filter chain and instantiates it for the VirtualServerConnection, optionally bypassing listener filters (see the sketch after this list).
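As a rough sketch of the idea (hypothetical names, not Envoy's actual Connection/SocketInterface types), the two virtual endpoints could simply share a pair of in-process buffers; the cluster side would hold the "client" endpoint and the listener side the "server" endpoint of the same pipe:

```cpp
#include <deque>
#include <memory>
#include <string>

// Shared state for one virtual connection: one buffer per direction.
struct VirtualPipe {
  std::deque<char> client_to_server;
  std::deque<char> server_to_client;
};

// One endpoint of the virtual connection. A real implementation would sit
// behind Envoy's Connection interface and also handle events, watermarks,
// close semantics, etc.; this only shows the read/write plumbing.
class VirtualEndpoint {
public:
  VirtualEndpoint(std::shared_ptr<VirtualPipe> pipe, bool is_client)
      : pipe_(std::move(pipe)), is_client_(is_client) {}

  // Append data to the peer-facing buffer. A real version would also wake
  // the peer, e.g. by scheduling a read event on its dispatcher.
  void write(const std::string& data) {
    auto& buf = is_client_ ? pipe_->client_to_server : pipe_->server_to_client;
    buf.insert(buf.end(), data.begin(), data.end());
  }

  // Drain whatever the peer has written so far.
  std::string read() {
    auto& buf = is_client_ ? pipe_->server_to_client : pipe_->client_to_server;
    std::string out(buf.begin(), buf.end());
    buf.clear();
    return out;
  }

private:
  std::shared_ptr<VirtualPipe> pipe_;
  bool is_client_;
};
```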

I think the major challenges include simulating delayed close, half close, and watermarks, supporting other transport sockets, etc. But not all of them are necessary at the very beginning.

This approach could be an optimization for any envoy cluster connecting to envoy itself.

Pros:

  1. Builds on top of the existing abstractions: listener, cluster, host, connection.
  2. The new functionality lives in extensions; the lifetime of the core components above is not impacted.
  3. Almost zero overhead for anyone who doesn't use it.
  4. Minimal control plane change.

Cons:

  1. The pipe-ish connection doesn't have the highest performance among the alternative solutions.
  2. The new simulated connection is not TCP feature-complete (though I doubt we need the whole feature set).

Alternative solutions
Cluster network filter, or HTTP upstream filter: the dispatch of connections is achieved by matching on clusters/routes instead of listeners. Migration from traditional listener filter chain matching to cluster matching is non-trivial, and it's a huge burden when switching the tunnel on and off.

CC @alyssawilk @PiotrSikora

@lambdai
Contributor Author

lambdai commented Jun 24, 2020

Looks like this is a specialization of #11618

@lambdai
Contributor Author

lambdai commented Jun 24, 2020

I looked at my experimental code. My major change is to strip the existing socket from Connection. Looks like I could use the socket interface instead.

@antoniovicente
Contributor

Would you get equivalent behavior if ListenerA_HCM were to select ListenerB_TcpProxy's upstream directly instead of going through some virtual connections? I guess that you need more than just listenerB's upstream since there may be some L4 filters that need to operate on the request, and the socket type for listenerB could be something like SSL, in which case you'd need to hook into https://wiki.openssl.org/index.php/BIO to perform SSL operations without an underlying real fd.

From past experience, mimicking an fd API without a real fd behind it is error-prone and challenging.
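For reference, OpenSSL's BIO pair is the usual way to drive TLS without a real fd: the SSL engine talks to one in-memory BIO while the application shuttles ciphertext through the other. A minimal sketch (error handling omitted, not tied to Envoy's transport socket code):

```cpp
#include <openssl/bio.h>
#include <openssl/err.h>
#include <openssl/ssl.h>

int main() {
  SSL_CTX* ctx = SSL_CTX_new(TLS_method());
  SSL* ssl = SSL_new(ctx);

  // Two connected in-memory BIOs: bytes written to one side become readable
  // on the other. No file descriptor is involved anywhere.
  BIO* internal_bio = nullptr;
  BIO* network_bio = nullptr;
  BIO_new_bio_pair(&internal_bio, 0 /* default size */, &network_bio, 0);

  // The SSL engine reads and writes the internal side of the pair.
  SSL_set_bio(ssl, internal_bio, internal_bio);
  SSL_set_connect_state(ssl);

  // Kick off the handshake; with no peer attached this stops and reports
  // SSL_ERROR_WANT_READ. The application would then move ciphertext between
  // network_bio (via BIO_read/BIO_write) and whatever transport it owns,
  // e.g. a virtual connection like the one discussed above.
  int rc = SSL_do_handshake(ssl);
  int err = SSL_get_error(ssl, rc);
  (void)err;  // expected: SSL_ERROR_WANT_READ

  SSL_free(ssl);  // also releases internal_bio
  BIO_free(network_bio);
  SSL_CTX_free(ctx);
  return 0;
}
```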

@antoniovicente
Contributor

I looked at my experimental code. My major change is to strip the existing socket from Connection. Looks like I could use the socket interface instead.

Which connection? You still need to go through the H2 codec on listenerA and there could even be multiple H2 CONNECTs active on the same client connection. Passing A's client connection to B's listener won't work.

@lambdai
Contributor Author

lambdai commented Jun 24, 2020

Would you get equivalent behavior if ListenerA_HCM were to select ListenerB_TcpProxy's upstream directly instead of going through some virtual connections? I guess that you need more than just listenerB's upstream since there may be some L4 filters that need to operate on the request, and the socket type for listenerB could be something like SSL, in which case you'd need to hook into https://wiki.openssl.org/index.php/BIO to perform SSL operations without an underlying real fd.

I agree with you that a 100% fd API is extremely hard, especially when listenerB has corner-case options.

But there are questions to answer first:

  1. Does listenerB need to handle both real TCP connections and virtual connections?
  2. Does listenerB need to implement all the TCP listener features?

In my scenario, both answers are NO.
I am inclined to introduce another UserspacePipeListener. It starts with a self-defined socket interface and buffer-destination/buffer-source transport sockets (which excludes SSL), but it can carry L4 filters, including non-terminating ones (e.g. authz) and terminating ones (e.g. tcp proxy).
I am fine with that because in istio the control plane owns both listenerA_HCM and listenerB_TcpProxy. If ListenerB has an exotic filter or uses a not-yet-implemented feature, listenerA_HCM can choose to go through real TCP connections instead.

@lambdai
Contributor Author

lambdai commented Jun 24, 2020

Regarding the SSL concern: it's fine with me if the control plane creates a listenerB1 terminating SSL connections and a listenerB2 terminating virtual connections.

@lambdai
Contributor Author

lambdai commented Jun 24, 2020

I looked at my experimental code. My major change is to strip the existing socket from Connection. Looks like I could use the socket interface instead.

Which connection? You still need to go through the H2 codec on listenerA and there could even be multiple H2 CONNECTs active on the same client connection. Passing A's client connection to B's listener won't work.

Oh, my. It looks like you are much more ambitious than my plan.
Original: client <-C1, connect stream-> HCM <-C2,rawtcp-> TcpProxy <-C3, rawtcp-> server
My plan: client <-C1, connect stream-> HCM <-C2,virtual-> TcpProxy <-C3, rawtcp -> server

My proposal affects only C2. The HCM still needs to relay the bytes to C2 from the CONNECT stream in C1. I am not going to relay the fd to C2 or C3.

In terms of the SocketInterface, C2 is a non-fd socket from HCM's point of view.

Does it make sense?

@antoniovicente
Contributor

antoniovicente commented Jun 25, 2020

I looked at my experimental code. My major change is to strip the existing socket from Connection. Looks like I could use the socket interface instead.

Which connection? You still need to go through the H2 codec on listenerA and there could even be multiple H2 CONNECTs active on the same client connection. Passing A's client connection to B's listener won't work.

Oh, my. It looks like you are much more ambitious than my plan.
Original: client <-C1, connect stream-> HCM <-C2,rawtcp-> TcpProxy <-C3, rawtcp-> server
My plan: client <-C1, connect stream-> HCM <-C2,virtual-> TcpProxy <-C3, rawtcp -> server

My proposal affects only C2. The HCM still needs to relay the bytes to C2 from the CONNECT stream in C1. I am not going to relay the fd to C2 or C3.

In terms of the SocketInterface, C2 is a non-fd socket from HCM's point of view.

Does it make sense?

I think the original picture is actually:
client <-C1, connect stream-> HCM <-upstream connection C2-> <-C3 listener connection-> TcpProxy <-C4, rawtcp-> server

Use of a socket pair or pipe for C2 and C3 seems like a good place to start in order to work around the local ephemeral port limits issue. Most things would continue to work in that case. Yielding to the event loop between writes to C2 and reads from C3 is actually a good thing; you want to allow other players to take a turn. I also think it's important to keep call stacks relatively shallow to reduce complexity.

Going beyond that, it depends on what layer you'd like to hook virtual connections into. Network::ConnectionImpl has several important responsibilities related to handling of high/low watermarks. IOHandle doesn't have a way to schedule events directly and needs to rely on Event::FileEvent or TransportSocketCallbacks::setReadBufferReady() to schedule resumption. Operations that get the local and remote socket address from the socket may also present some difficulties.

Our current non-Envoy based system uses virtual connections for a lot of stuff. I think the move to SSL BIO for virtual connections may be unavoidable. It is mostly a matter of getting it implemented.

@lambdai
Contributor Author

lambdai commented Jun 25, 2020

client <-C1, connect stream-> HCM <-upstream connection C2-> <-C3 listener connection-> TcpProxy <-C4, rawtcp-> server

Good point!

Use of a socket pair or pipe for C2 and C3 seems like a good place to start in order to work around the local ephemeral port limits issue.

Agree. An envoy connection wrapping a socket pair, plus "connection attributes" and "connect()", solves the ephemeral port problem.
It solves everything except the overhead of the byte copies and the fact that the data is relayed only on the next poll cycle.
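For illustration, a minimal socketpair(2) sketch of that idea: a connected AF_UNIX pair consumes no IP 4-tuple and no ephemeral port, and leaves no TIME_WAIT state, though the bytes still cross the kernel.

```cpp
#include <sys/socket.h>
#include <unistd.h>

#include <cstdio>
#include <cstring>

int main() {
  int fds[2];
  // fds[0] could back the upstream connection (C2) and fds[1] the
  // listener-accepted connection (C3); both are already "connected".
  if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0) {
    perror("socketpair");
    return 1;
  }

  const char msg[] = "relayed from the CONNECT stream";
  write(fds[0], msg, sizeof(msg));  // HCM/upstream side writes...

  char buf[64] = {};
  read(fds[1], buf, sizeof(buf));   // ...TcpProxy/listener side reads.
  printf("received: %s\n", buf);

  close(fds[0]);
  close(fds[1]);
  return 0;
}
```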

@lambdai
Contributor Author

lambdai commented Jun 25, 2020

Going beyond that, it depends on what layer you'd like to hook virtual connections into. Network::ConnectionImpl has several important responsibilities related to handling of high/low watermarks. IOHandle doesn't have a way to schedule events directly and needs to rely on Event::FileEvent or TransportSocketCallbacks::setReadBufferReady() to schedule resumption. Operations that get the local and remote socket address from the socket may also present some difficulties.

I see the challenges there. I cannot even close my virtual connection at this moment :(

@lambdai
Contributor Author

lambdai commented Jun 30, 2020

I have an end-to-end demo showing that we can chain HCM and TcpProxy without going through an OS fd. It actually saves 2 fds; see C2 and C3 above.

https://github.com/lambdai/envoy-dai/tree/hostconn

@mattklein123
Member

One other comment: this issue feels very similar to what we have already done with the API listener, which is being used by Envoy Mobile. We have already discussed having a TCP variant of the API listener which would work with TCP proxy type setups. I wonder if that would help here? Basically maybe this is a special upstream/socket which actually loops back through an API listener interface? cc @junr03 @goaway

@lambdai
Contributor Author

lambdai commented Jul 14, 2020

We have already discussed having a TCP variant of the API listener which would work with TCP proxy type setups.

Great to see that looping back into a listener has a wider use case! To me the "tcp proxy type" is a little vague.

What I am proposing here is to hide the loop between the upstream cluster and the destination listener.

@mattklein123
Member

Sorry, my point is that you can have an API listener variant that injects raw TCP bytes vs. HTTP messages.

@lambdai
Contributor Author

lambdai commented Jul 15, 2020

I brainstormed the usage of ApiListener and it probably works if we

  1. Support listener update.
  2. Can name and reference the listener via the socket interface ("api://<api_listener_name>").
  3. Expand the API listener to be per worker thread.
  4. Allow attaching more than the terminating L4 filter (HCM, maybe TcpProxy).

It requires extra complexity added to SyntheticConnection.

I think we are heading toward a similar goal: obtaining a user-space connection not backed by an OS fd.

@chradcliffe
Contributor

The issue here refers to h2 CONNECT, but am I correct in thinking that this would also apply to HTTP/1.1 CONNECT?

A few months back I posted to the envoy-dev mailing list asking about the best approach to inspect traffic over HTTP/1.1 CONNECT -- in particular, how we would go about running the CONNECT-tunneled traffic through the HTTP stack. The solution that was suggested -- creating an "inner" HCM -- more or less works, but being able to feed the CONNECT tunnel back into a listener sounds like a much better solution, and it sounds like it would also give us the ability to process tunneled TLS.

@stale

stale bot commented Aug 24, 2020

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or other activity occurs. Thank you for your contributions.

@stale stale bot added the stale stalebot believes this issue/PR has not been touched recently label Aug 24, 2020
@lambdai
Contributor Author

lambdai commented Aug 27, 2020

Design doc updated to reflect the dev branch:
https://docs.google.com/document/d/1ok9oyMw-39lUXcO8ihhY2bL6b5gCoyK3-UyFoPEWCgo/edit?usp=sharing

The strawman branch mainly describes the byte stream after the connection is established.
The connection establishment is somewhat orthogonal.

@stale stale bot removed the stale stalebot believes this issue/PR has not been touched recently label Aug 27, 2020
@ggreenway
Contributor

A couple ideas on the topic:

  • This could combine nicely with multiple addresses per listener (Support multiple addresses per listener #11184)
  • If it's combined with multiple addresses, we could skip TLS for the local case (if the listener was configured for TLS).

@github-actions

github-actions bot commented Dec 9, 2020

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale stalebot believes this issue/PR has not been touched recently label Dec 9, 2020
@github-actions

This issue has been automatically closed because it has not had activity in the last 37 days. If this issue is still valid, please ping a maintainer and ask them to label it as "help wanted" or "no stalebot". Thank you for your contributions.

@YaoZengzeng
Member

@lambdai Based on my research, this "internal connection" does not seem to be finished yet? Is there rough data on the performance improvement compared with the original way (connection through the kernel)? Thanks! :)

@liverbirdkte
Contributor

@lambdai I have a use case with two listeners listenerA_HCM and listenerB_HCM:
curl -> H2 CONNECT -> ListenerA_HCM -> TLS -> ListenerB_HCM -> Upstream server. listenerA_HCM is used for tunneling; ListenerB_HCM terminates the TLS connection and inspects the payload to do some security checks.

Do you have any plan to support this SSL use case? Thanks
