Dependency Disruptor #53

Open
pablochacin opened this issue Nov 17, 2022 · 7 comments
Labels
enhancement New feature or request needs evaluation issue needs evaluation to assess viability or impact

Comments

@pablochacin
Collaborator

pablochacin commented Nov 17, 2022

It is a common use case to test the effect of known patterns of behavior in external dependencies (services that are not under the control of the organization). Using the xk6-disruptor, this could be accomplished by implementing a Dependency Disruptor, which instead of disrupting a service (or a group of pods), disrupts the requests these pods make to other services.

This could be implemented with an approach similar to the one the disruptor already uses: injecting a transparent proxy, but in this case for outgoing requests.

This approach will work well if the service is a dependency for a small set of pods (for example, the pods that back an internal service) but will not work well if many different pods (e.g. many different internal services) use this external dependency.

From the implementation perspective, the two main blockers for this functionality are:

  1. TLS termination. For external services, the most common scenario is encrypted communication over TLS. In this case, the disruptor cannot modify the response (e.g. the status code). Moreover, the traffic cannot be intercepted with a simple proxy because the TLS handshake would fail. Using eBPF may open some alternatives.

  2. How to identify the IP address(es) of the dependency. Currently, the disruptor uses iptables to redirect the traffic to the proxy that injects the faults. In the case of the dependency disruptor the traffic going to the external service is the one that must be intercepted. However, the IP address of this external dependency may not be known at the time the disruptor agent is installed, or it can change during the execution of the disruption (for example, if the external dependency uses DNS load balancing).
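To make the second blocker concrete, here is a minimal sketch of what tracking a changing set of dependency IPs would involve. It is not how the disruptor agent works today: the `redirect_rule` and `reconcile` helpers, the OUTPUT chain, and the ports are assumptions for illustration, and the addresses come from the documentation range. Something would still have to re-resolve the hostname periodically and apply the resulting commands, and traffic sent between a DNS change and the next update would bypass the proxy.

```python
import ipaddress


def redirect_rule(action: str, ip: str, dest_port: int, proxy_port: int) -> str:
    """Build one iptables command redirecting outgoing TCP traffic to a local proxy.

    action is "-A" (append) or "-D" (delete). IPv4 only in this sketch.
    """
    ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    return (
        f"iptables -t nat {action} OUTPUT -p tcp -d {ip} "
        f"--dport {dest_port} -j REDIRECT --to-ports {proxy_port}"
    )


def reconcile(old_ips: set, new_ips: set, dest_port: int, proxy_port: int) -> list:
    """Commands needed after the DNS answer changes from old_ips to new_ips."""
    stale = [redirect_rule("-D", ip, dest_port, proxy_port) for ip in sorted(old_ips - new_ips)]
    fresh = [redirect_rule("-A", ip, dest_port, proxy_port) for ip in sorted(new_ips - old_ips)]
    return stale + fresh


if __name__ == "__main__":
    # Hypothetical DNS answers observed at two different times.
    first = {"203.0.113.10", "203.0.113.11"}
    second = {"203.0.113.11", "203.0.113.12"}
    for command in reconcile(first, second, dest_port=443, proxy_port=8080):
        print(command)
```

The sketch only generates the commands. Running them requires root inside the pod's network namespace, which is the same requirement the existing iptables redirection already has.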

@pablochacin pablochacin added the enhancement New feature or request label Dec 5, 2022
@pablochacin pablochacin added the needs evaluation issue needs evaluation to assess viability or impact label May 24, 2023
@nadiamoe

This comment was marked as outdated.

@pablochacin
Collaborator Author

pablochacin commented Aug 21, 2023

Apparently, the Squid proxy implements a mechanism called sslBump for intercepting HTTPS traffic:

    ssl-bump    For each CONNECT request allowed by ssl_bump ACLs,
                establish secure connection with the client and with
                the server, decrypt HTTPS messages as they pass through
                Squid, and treat them as unencrypted HTTP messages,
                becoming the man-in-the-middle.

How this is implemented:

Establish a TLS connection with the server (using client SNI, if any) and establish a TLS connection with the client (using a mimicked server certificate).

According to this tutorial, this requires a self-signed CA root certificate to be deployed in the client's SSL configuration.

@nadiamoe
Member

Dropping some thoughts I've been having as well. I think that if we want to intercept TLS connections to trusted sources, we will be forced to perform a (benign) man-in-the-middle attack, just as described above. I see only two ways we can do this:

  1. Change (patch) the code that checks for certificate validity
  2. Add our certificate to the list of valid certificates the code checks

Regarding route 1, for all libraries I know, the code that decides whether a certificate is trusted is part of a library used by the application, so there is no universal (or reasonably wide-scoped) way to do this. Some applications will link dynamically to this library, while others will link statically. The code would need to intrude into the application, like a debugger would, and intercept these points.

Route 2 offers similar compatibility concerns, as different systems and libraries pick this list from different places. However, these systems and libraries often offer mechanisms to perform this specific task of adding certificates to the trusted pool. OpenSSL and Go, for example, will trust any certificate present on a directory listed in the SSL_CERT_DIR environment variable.
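As a small illustration of the environment-variable mechanism, Python's `ssl` module (which wraps OpenSSL) reports the names of the variables OpenSSL consults. The sketch below reads them and builds an environment for a restarted process. The helper names and the `/tmp/disruptor-ca` path are hypothetical. Two caveats, as far as I know: setting the variable replaces the default directory rather than extending it, and OpenSSL expects the files in that directory to use its hashed naming scheme.

```python
import os
import ssl
from typing import Optional


def trust_env_vars() -> tuple:
    """Names of the env vars OpenSSL consults for the CA file and CA directory."""
    paths = ssl.get_default_verify_paths()
    return paths.openssl_cafile_env, paths.openssl_capath_env


def env_with_ca_dir(ca_dir: str, base_env: Optional[dict] = None) -> dict:
    """Copy of an environment with the CA directory variable pointing at ca_dir.

    The copy is meant to be passed to the process that gets (re)started;
    the current process is not modified.
    """
    env = dict(os.environ if base_env is None else base_env)
    _, capath_var = trust_env_vars()
    env[capath_var] = ca_dir
    return env


if __name__ == "__main__":
    print(trust_env_vars())
    # Hypothetical directory holding the disruptor's CA certificate.
    print(env_with_ca_dir("/tmp/disruptor-ca", {"PATH": "/usr/bin"}))
```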

An unfortunate requirement would be that this list is typically loaded once when the application starts, so to make any change to the list effective, one would need to start the application after changing these env vars. After it has been restarted, the disruption can still be turned on and off through the usual means (iptables rules).

Another option within route 2 would be to drop certificates in places where we already know the system will look. This could be useful if a library does not support specifying additional paths through environment variables.

To close the route 2 approach list, there's also the option of intercepting system calls for files that look like CAs, and appending ours to the result. This would be more cumbersome but possible to do if we find libraries that read certificates from unpredictable paths and do not allow modifying those paths externally.

As a summary of my thoughts, the options I can think of for TLS proxying are poking into things with a debugger (hard, non-portable) or adding our CAs to the system's trust (less hard, requires restarting). We would need to assess whether, given who is going to use the disruptor and in which environment they will use it, restarting the application is a downside we can live with.

@pablochacin
Collaborator Author

> To close the route 2 approach list, there's also the option of intercepting system calls for files that look like CAs, and appending ours to the result. This would be more cumbersome but possible to do if we find libraries that read certificates from unpredictable paths and do not allow modifying those paths externally.

Which mechanism could be used for this purpose? eBPF, for instance, does not allow modifying the results of a syscall.

@pablochacin
Collaborator Author

> Add our certificate to the list of valid certificates the code checks

This is the same approach described in the comment above.

> Route 2 offers similar compatibility concerns, as different systems and libraries pick this list from different places. However, these systems and libraries often offer mechanisms to perform this specific task of adding certificates to the trusted pool. OpenSSL and Go, for example, will trust any certificate present on a directory listed in the SSL_CERT_DIR environment variable.
>
> An unfortunate requirement would be that this list is typically loaded once when the application starts,

Could you elaborate on why this is the case? I would expect the CA root certificate to be loaded on demand when validating a certificate that refers to it.

@nadiamoe
Member

nadiamoe commented Aug 25, 2023

> Which mechanism could be used for this purpose? ebpf for instance does not allow modifying the results of a syscall.

I think this could be done by hijacking libc's open() with a wrapping library that we then LD_PRELOAD. However, this is far from ideal, as it is very intrusive and poses potential compatibility problems. Programs that do not use libc would not go through this path, for example.

> Could you elaborate on why this is the case? I would expect the CA root certificate to be loaded on demand when validating a certificate that refers to it

This could certainly vary between libraries and languages. I believe Go does it only once, upon the first verification, as the loading is wrapped in a sync.Once here. Different libraries may do it differently, although I would not expect them to do it very often, as reading potentially hundreds of TLS certificates from disk can be a performance hit.
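A toy model of the load-once behaviour shows why a restart is needed. This is not Go's actual code: the `OnceLoadedRoots` class and the file names are made up for illustration, and real libraries parse the certificates rather than just listing file names.

```python
import tempfile
import threading
from pathlib import Path


class OnceLoadedRoots:
    """Reads the trusted-root file names from a directory on first use only."""

    def __init__(self, cert_dir: str) -> None:
        self._cert_dir = Path(cert_dir)
        self._lock = threading.Lock()
        self._roots = None

    def roots(self) -> frozenset:
        # The first caller scans the directory; every later call reuses the result.
        with self._lock:
            if self._roots is None:
                self._roots = frozenset(p.name for p in self._cert_dir.glob("*.pem"))
            return self._roots


def demo() -> tuple:
    """Add a CA after the first load and compare a running pool with a fresh one."""
    with tempfile.TemporaryDirectory() as cert_dir:
        Path(cert_dir, "system-root.pem").write_text("placeholder")
        pool = OnceLoadedRoots(cert_dir)
        before = pool.roots()  # first verification triggers the load
        Path(cert_dir, "disruptor-ca.pem").write_text("placeholder")
        after = pool.roots()  # same process: the new file is not seen
        restarted = OnceLoadedRoots(cert_dir).roots()  # models a restart
        return before, after, restarted


if __name__ == "__main__":
    for name, value in zip(("before", "after", "restarted"), demo()):
        print(name, sorted(value))
```

In this model the running pool never sees `disruptor-ca.pem`, while a freshly created pool does, which corresponds to restarting the application after the CA has been installed.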

@pablochacin
Collaborator Author

The test implemented in this POC exploits a not well-documented feature in Docker that allows attaching a container to the network stack (or network namespace) of another container.

I don't see how this can work without restarting the application, which is not an option in our scenario.
