End to end encryption of traffic with ACM managed certs #39

Closed
jamsajones opened this issue Nov 28, 2018 · 45 comments

@jamsajones

No description provided.

@mumoshu

mumoshu commented Dec 5, 2018

This is very interesting indeed!

I was planning to use Istio to enable end-to-end encryption between microservices, so that there is no chance of connecting to the wrong service because of reused VPC IPs. But maintaining Istio CA/Auth/Citadel and the other parts of Istio's control plane, plus its foundation, a K8S cluster, just for a service mesh seemed like overkill.

I'd expect App Mesh integrated with ACM to provide the same benefit, without the operational burden.

@pda

pda commented Dec 11, 2018

This is the biggest blocker for us moving from traditional serviceA → ALB → serviceB approach to serviceA → AppMesh → serviceB — we always want inter-service requests to use HTTPS with ACM-issued certificates.

But, I can't see how this will be possible in the current model; the App Mesh architecture has the Envoy proxy running under our (the AWS user) control as a sidecar container rather than under AWS' control (as is the case with ALB etc). ACM must not hand the certificate/privkey over to Envoy. So App Mesh would need to introduce another layer of proxy, or perhaps integrate ALB to terminate the HTTPS/TLS.

I'm very keen to hear more about how this might work.

@bcelenza bcelenza changed the title End to snd encryption of traffic with ACM managed certs End to end encryption of traffic with ACM managed certs Dec 31, 2018
@kiranmeduri

@pda I would like to understand more about the concern around "ACM must not hand the certificate/privkey over to Envoy". I am assuming this is in the context of TLS termination on the Envoy for incoming traffic on the service endpoint. If the cert pair is specific to the service then what is the concern in giving the secrets to Envoy that is going to terminate TLS?

@coultn coultn transferred this issue from aws/aws-app-mesh-examples Mar 28, 2019
@pda

pda commented Mar 28, 2019

@kiranmeduri Thanks for the reply.

When I say “ACM must not …” I'm referring to my understanding that ACM by design always keeps certificates/secrets in AWS-managed context where the AWS customer cannot access it. e.g. there's no way a customer can retrieve a certificate private key attached to an ALB.

Whereas App Mesh has envoy running in customer-managed space. If App Mesh / ACM passes the certificate & private key to the envoy proxy, wouldn't it be possible for the AWS customer to access/exfiltrate it?

It's quite likely I'm misunderstanding some aspect of this.

@mhausenblas mhausenblas added the Roadmap: Proposed We are considering this for inclusion in the roadmap. label Apr 3, 2019
@bcelenza
Contributor

@pda You are correct for ACM certificates that are publicly verifiable: the private key cannot be retrieved. For private certificates (from ACM PCA), the private key can be retrieved on behalf of the customer through a secure channel. Private certificates would be useful for service-to-service communication (and mTLS) within a VPC or other private network.
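As a rough illustration of the retrieval described above, here is a minimal Python sketch. It assumes the boto3 ACM client's `export_certificate` call, which requires a passphrase and returns the private key encrypted with it. The `FakeAcmClient` class and the helper name are hypothetical, present only so the sketch runs without AWS credentials; this is not how App Mesh itself performs the export.

```python
from typing import Any, Dict


def export_private_certificate(
    acm_client: Any, certificate_arn: str, passphrase: bytes
) -> Dict[str, str]:
    """Export a private certificate, its chain, and its encrypted private key."""
    if not passphrase:
        raise ValueError("a passphrase is required to encrypt the exported key")
    response = acm_client.export_certificate(
        CertificateArn=certificate_arn,
        Passphrase=passphrase,
    )
    return {
        "certificate": response["Certificate"],
        "chain": response["CertificateChain"],
        # The key comes back encrypted with the passphrase supplied above.
        "encrypted_private_key": response["PrivateKey"],
    }


class FakeAcmClient:
    """Hypothetical stand-in for boto3.client("acm") so the sketch runs offline."""

    def export_certificate(self, CertificateArn: str, Passphrase: bytes) -> Dict[str, str]:
        return {
            "Certificate": "-----BEGIN CERTIFICATE-----...",
            "CertificateChain": "-----BEGIN CERTIFICATE-----...",
            "PrivateKey": "-----BEGIN ENCRYPTED PRIVATE KEY-----...",
        }


exported = export_private_certificate(
    FakeAcmClient(),
    "arn:aws:acm:region:123456789012:certificate/example",  # placeholder ARN
    b"example-passphrase",
)
print(sorted(exported))
```

With a real client in place of the fake, the same helper would return PEM material for a certificate issued by a private CA.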

I'm currently researching a number of scenarios we'd like to support, both public and private, and will follow-up here once I have a good handle on what we're proposing.

@pda

pda commented Apr 12, 2019

Thanks for the clarification @bcelenza — I look forward to hearing more on this front 👍

@bcelenza bcelenza self-assigned this Apr 12, 2019
@tom-schultz

Any updates here?

@bcelenza
Contributor

bcelenza commented Apr 26, 2019

@tom-schultz I'm currently talking with the ACM team and working on a design proposal for this feature. I'll have an update here soon.

In the meantime, I'm looking for more input from anyone willing to take the time.

Here are some questions I have. Feel free to answer any that pertain to you. And of course, if I've missed something you feel needs mention, I'd be happy to hear that as well.

A big thanks in advance for anyone who takes the time to provide additional input here.

Questions

  1. Would your service mesh use a single private certificate authority, or multiple? If multiple, what use cases govern that need?
  2. How frequently would you want to renew certificates for your service mesh?
  3. Would you want App Mesh to automatically issue and/or renew certificates for your service mesh?

I'm also curious, for any customers who use ACM PCA today, do you always use ACM-validated domains, or do you use the issue-certificate and import-certificate APIs for certain things?

@alvarow

alvarow commented May 6, 2019

@bcelenza

  1. I don't have a preference in that regard. My preference would be to not manage a CA! I'd happily use either Amazon Public CA, or a private CA on my account.

  2. I am not particularly picky in this one, but the security team in my company prefers 1 year certs, no longer than that. Certificate Manager fits nicely.

  3. Most definitely!

For the curiosity question, I mostly use ACM-validated domains, but I have one scenario where I need an internal domain hosted on our intranet, for which I use the internal PKI to issue the certificate (DC applications making calls to my VPC-hosted app).

This feature right here is what prevents me from using App Mesh. I need E2E encryption and I am making do with NLB and Vault issued certs, I would love to drop Vault.

@arnuschky

Questions

  1. Would your service mesh use a single private certificate authority, or multiple? If multiple, what use cases govern that need?

A single one per mesh, I guess. We have multiple products, each using its own mesh and CA. But these are separated by AWS account, so it is fine for us to have a 1:1 mapping between PCA and mesh.

  2. How frequently would you want to renew certificates for your service mesh?

No special requirements beyond following common security guidelines (1-2 years). However, we prefer automatic issuance, at which point you can rotate much more frequently.

  3. Would you want App Mesh to automatically issue and/or renew certificates for your service mesh?

Ideally yes. We'd prefer it to interface with ACM; similar to CloudFront distributions (create new / reuse existing).

Apart from that we'd love to have authentication too, if possible.

@ntwaddell

It would be cool if this could integrate with the AWS Private CA possibly.

@bcelenza
Contributor

bcelenza commented Jun 10, 2019

App Mesh will soon be adding support for enabling TLS between services in the mesh. This first pass will allow you to provide a certificate directly from AWS Certificate Manager (ACM) and enable TLS for a given VirtualNode listener. VirtualNodes that act as downstream clients of a TLS-enabled VirtualNode will automatically receive the appropriate validation context to validate the certificate you provide.

With this change, you will be able to use the following options to secure traffic between services:

  1. A certificate issued by your Private Certificate Authority for which ACM manages the private key and certificate renewal (see Request a Private Certificate).
  2. A certificate that has been imported to ACM.

Please note that at this time you cannot use a public certificate provided by ACM.

To enable TLS with a private or imported certificate, we're proposing the following API settings on the VirtualNode listener.

$ aws appmesh create-virtual-node --mesh-name my-mesh \
    --virtual-node-name my-node \
    --spec
{
    "listeners": [
        {
            // Existing port mapping settings.
            "portMapping": {
                "port": 443,
                "protocol": "http"
            },
            // Optional settings for TLS configuration on this listener. When not
            // specified, TLS is disabled.
            "tls": {
                // (REQUIRED) Determines how TLS will be configured on the appropriate listener.
                // Allowed modes:
                // * STRICT: Listener only accepts connections with TLS enabled.
                // * PERMISSIVE: Listener accepts connections with or without TLS enabled.
                // * DISABLED: Listener only accepts connections without TLS.
                "mode": "STRICT",
                // (REQUIRED) Certificate settings for this listener.
                "certificate": {
                    "acm": {
                        // (REQUIRED) The ARN of the certificate to bind to this listener.
                        "certificateArn": "arn:aws:acm:region:123456789012:certificate/12345678-1234-1234-1234-123456789012"
                    }
                }
            }
        }
    ]
}

These changes will enable TLS between services with the use of a server certificate. Please note that client certificates for mTLS are covered in separate roadmap items (#34, #68).
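For anyone scripting this, the proposed listener settings can be built programmatically. The following is a minimal Python sketch that mirrors the structure of the proposal above; the helper name `build_virtual_node_spec` is hypothetical, and the exact field names in the released API may differ from this proposal.

```python
import json
from typing import Any, Dict

# The three modes listed in the proposal above.
ALLOWED_TLS_MODES = ("STRICT", "PERMISSIVE", "DISABLED")


def build_virtual_node_spec(
    port: int, protocol: str, tls_mode: str, certificate_arn: str
) -> Dict[str, Any]:
    """Build a VirtualNode spec with a single TLS-enabled listener."""
    if tls_mode not in ALLOWED_TLS_MODES:
        raise ValueError(f"tls mode must be one of {ALLOWED_TLS_MODES}, got {tls_mode!r}")
    return {
        "listeners": [
            {
                "portMapping": {"port": port, "protocol": protocol},
                "tls": {
                    "mode": tls_mode,
                    "certificate": {"acm": {"certificateArn": certificate_arn}},
                },
            }
        ]
    }


spec = build_virtual_node_spec(
    443,
    "http",
    "STRICT",
    "arn:aws:acm:region:123456789012:certificate/12345678-1234-1234-1234-123456789012",
)
print(json.dumps(spec, indent=4))
```

The printed JSON has no comments, so it could be saved to a file and passed to the CLI with `--spec file://spec.json`, or the dictionary could be handed to boto3's `create_virtual_node` as the `spec` argument.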

Let us know if these changes fit your service traffic encryption use cases, and if not, what else you'd like to see.

@ntwaddell

That would be perfect @bcelenza

@bcelenza
Contributor

bcelenza commented Jul 3, 2019

Heads up! For customers who will be using the ACM integration with App Mesh, you will need to update the IAM policy associated with the Envoy Proxy connecting to App Mesh's Envoy Management Service. See #80 for details.
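The exact permissions are described in #80 and are not listed in this thread. As a hedged sketch only, the Python below builds a policy document on the assumption that the proxy needs `acm:ExportCertificate` for listener certificates and `acm-pca:GetCertificateAuthorityCertificate` for the CA used in validation. Both action names are assumptions here, and the helper name is hypothetical; check #80 before relying on them.

```python
import json
from typing import Any, Dict, List


def build_envoy_tls_policy(certificate_arns: List[str], ca_arns: List[str]) -> Dict[str, Any]:
    """Build an IAM policy document for a proxy that loads TLS material."""
    statements: List[Dict[str, Any]] = []
    if certificate_arns:
        # Assumed action: lets the proxy fetch the listener certificate and key.
        statements.append(
            {
                "Effect": "Allow",
                "Action": "acm:ExportCertificate",
                "Resource": certificate_arns,
            }
        )
    if ca_arns:
        # Assumed action: lets the proxy fetch the CA certificate for validation.
        statements.append(
            {
                "Effect": "Allow",
                "Action": "acm-pca:GetCertificateAuthorityCertificate",
                "Resource": ca_arns,
            }
        )
    return {"Version": "2012-10-17", "Statement": statements}


policy = build_envoy_tls_policy(
    ["arn:aws:acm:region:123456789012:certificate/12345678-1234-1234-1234-123456789012"],
    ["arn:aws:acm-pca:region:123456789012:certificate-authority/example"],  # placeholder
)
print(json.dumps(policy, indent=2))
```

Scoping each statement to specific ARNs rather than `*` keeps the proxy's role limited to the certificates its own virtual node uses.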

@bcelenza
Contributor

bcelenza commented Aug 6, 2019

Hey all, this is ready for trial in our preview environment. Check out the walkthrough to get started using TLS w/ ACM in App Mesh, and the docs for more info. Let us know what you think!

@joshuabaird

joshuabaird commented Aug 7, 2019

@bcelenza I noticed that it's recommended that new virtual nodes are created with TLS enabled. Can the TLS configuration not be applied to existing virtual nodes? If not, can you shed some light on why new virtual nodes are required? Hopeful that this will be possible once this feature goes GA.

The implementation of App Mesh has been rather time consuming for us, due to the AWS requirements that certain pieces of infrastructure need to be completely re-created (and not updated) such as enabling service discovery on existing ECS services, changing the type of target-group to "ip" (needed for awsvpc enablement), etc.

@bcelenza
Contributor

Client policy support for ACM certificates is enabled in preview, and the walkthrough has been updated to incorporate the new functionality: https://github.com/aws/aws-app-mesh-examples/tree/master/walkthroughs/tls-with-acm

Looking forward to any feedback on client policies and TLS enforcement! We'll follow-up when we have a closer estimate for when all of this will graduate to GA.
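This comment does not show the client policy shape, so the following Python sketch is an assumption based on the `backendDefaults.clientPolicy.tls` structure in the App Mesh API as I understand it; the walkthrough linked above is the authoritative source. The helper name `build_backend_defaults` is hypothetical.

```python
import json
from typing import Any, Dict, List, Optional


def build_backend_defaults(
    ca_arns: List[str], ports: Optional[List[int]] = None, enforce: bool = True
) -> Dict[str, Any]:
    """Build a client policy that validates backend certificates against a CA."""
    if not ca_arns:
        raise ValueError("at least one certificate authority ARN is required")
    tls: Dict[str, Any] = {
        # When enforce is true, the client refuses to talk to backends without TLS.
        "enforce": enforce,
        "validation": {"trust": {"acm": {"certificateAuthorityArns": list(ca_arns)}}},
    }
    if ports:
        # Optionally restrict the policy to specific upstream ports.
        tls["ports"] = list(ports)
    return {"backendDefaults": {"clientPolicy": {"tls": tls}}}


fragment = build_backend_defaults(
    ["arn:aws:acm-pca:region:123456789012:certificate-authority/example"],  # placeholder
    ports=[443],
)
print(json.dumps(fragment, indent=4))
```

The returned fragment would be merged into the VirtualNode spec of the downstream (client) node, alongside its listeners and backends.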

@eddgrant

eddgrant commented Jan 29, 2020

Hi folks,

We’re currently evaluating the end-to-end encryption feature in the preview channel but have hit somewhat of a roadblock. We wanted to share our scenario to ascertain whether we’re just doing something wrong or whether our use case isn’t actually supported by this feature.

Background

We are migrating approx 900 microservices to ECS and want to use AppMesh. The microservices all have a non-negotiable requirement to address each other using TLS endpoints: If microservice-x wants to call microservice-y it will do so by addressing microservice-y at a TLS endpoint. The microservice containers themselves don’t terminate TLS, they each currently run a sidecar in their ECS task which does the TLS termination and forwards on the plain HTTP connection to the microservice container, within the ECS task instance.

e.g.

microservice-x --> (https://microservice-y.example.com:443) --> tls-terminating-sidecar
tls-terminating-sidecar --> http://microservice-y:8080 --> microservice-y

This is (hopefully) demonstrated by the following image:

[Image: tls-offload-traffic]

Our aim was to replace our TLS-terminating sidecar with Envoy (managed by AppMesh), having Envoy terminate TLS instead. However, in our PoC we have discovered that, whilst traffic between the Envoy sidecars is encrypted, the Envoy instances which capture outbound (egress) requests (in this case the call from microservice-x to microservice-y) do not present a TLS endpoint. Instead they require the microservice containers to make plain HTTP calls. This is (hopefully) demonstrated in the image below:

[Image: envoy-traffic]

Questions

Are we doing it wrong? 😄 Is there a way to configure the Envoy instances which capture outbound (egress) microservice calls to present a TLS certificate?

Really grateful for any insight here, whether it be that we're simply doing it wrong, or suggestions for approaches which might help us achieve what we're after.

Also, as a final note, I realise that, from a security perspective, our desired approach doesn't really add much value compared to letting the microservice instances talk to Envoy over plain HTTP. However, our hands are tied in this regard.

Many thanks,

Edd

@bcelenza
Contributor

bcelenza commented Jan 29, 2020

@eddgrant Hey Edd, thanks for taking the time to write up your scenario, and for the diagrams -- always super helpful to get the context and visuals.

You are right that, currently, we expect communication from the application to the local Envoy Proxy to occur via plaintext, and then Envoy to originate TLS to its upstream dependencies (where TLS is of course again terminated at the Envoy).

It's certainly possible that we could build in support to have the egress listener also terminate TLS, which would become particularly important if the proxy is moved off-host. But to date we've had no formal plans to implement this.

A few clarifying questions so I can better understand the need and scope out what this would look like:

  1. Would the Egress listener for Envoy use a different certificate than the ingress listener? Do you have any specific requirements for this certificate? One thing that comes to mind is SNI -- the certificate would essentially need a SAN for every backend.
  2. Presumably, the application would validate the egress listener's certificate against a CA -- is this the same CA that's issuing the certificates for the ingress listener?
  3. Can you share any details on the intent behind the security requirement? Is this designed to protect traffic in the event the host/task/pod has been compromised, or to ensure the application is making requests to an intended endpoint (the proxy), or something else?

@eddgrant

Hey @bcelenza ,

Thanks for getting back to me and for clarifying the position r.e. TLS.

I'll try and answer your questions below:

  1. I think the only requirement we would have here is that the Envoy egress listener is able to present a certificate which is valid for the service being accessed. In my example case, the egress Envoy listener in service-x's task would need to present a certificate which identifies it as service-y. However thinking about this I can see that this could get complicated when a service can call multiple other services as the Envoy egress listener would need to be able to present a certificate which matched any of the potential service calls that service-x might make, which in practice might mean frequent generation of a certificate with lots of SANs on it?
  2. Yep, in our case we're validating certificates against a CA, so in this case we'd validate the egress listener's certificate against the CA. I can't currently think of any reasons why it would be a problem for us if the same CA were used for both the ingress and egress listeners.
  3. Yeah sure, a few things come to mind here:
    a. We don't operate the services that we're migrating, they are written and operated by internal service teams for whom we provide an internal platform. Sadly the config for these services is very disparate and we don't have any central levers we can pull to switch their endpoint configuration to use HTTP. The reality is we'd have to ask each team to change their config and re-deploy each service etc. This in itself is not an insurmountable issue, but so far we've been trying to make the migration happen invisibly to them, so having to involve each service team would drastically change the profile and extend the time it took to migrate the services to ECS.
    b. We need to migrate all services with zero downtime, our migration strategy has us "dual-running" the services over a period of time while they get migrated in batches to ECS. The legacy system requires them to address each other over TLS and so we'd break connectivity with the legacy system if we were to re-configure each service to speak in plain HTTP as we migrated it to ECS. (Having said this I'm going to put some more thought in to this as I don't think I've thought about it thoroughly enough yet)
    c. Finally, we've done quite a lot of work in getting end-to-end TLS working in our current systems. I think it would be possible to argue that the security profile of the current AppMesh TLS model is extremely similar to what we've got, however (please don't laugh 😆 ), I can also imagine it's going to be a hard sell for me to try to explain this to various architects/ security folk within the organisation as there's a common perception that any plain HTTP calls will undermine all other security. I'm sure you've heard it all before! That's very much a local cultural problem though and I'm sure it can be overcome!

Hope that helps (and makes sense), don't hesitate to ask further questions if useful.

Edd

@bcelenza
Contributor

bcelenza commented Jan 29, 2020

However thinking about this I can see that this could get complicated when a service can call multiple other services as the Envoy egress listener would need to be able to present a certificate which matched any of the potential service calls that service-x might make, which in practice might mean frequent generation of a certificate with lots of SANs on it?

Yeah, either one certificate with all possible service names, or individual certificates per service name. I'm not certain if there's any benefit to either approach in this particular case -- for example, if an attacker on the host can compromise any individual cert, they can likely compromise them all.
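To make the SAN requirement concrete, here is a small Python sketch that checks which backend names a given certificate's SAN list would leave uncovered. It uses a simplified matching rule (exact match, or a wildcard covering exactly one left-most label), which is an assumption for illustration and not a full implementation of certificate name validation. Hostnames other than microservice-y.example.com and both function names are hypothetical.

```python
from typing import List


def san_covers(hostname: str, san: str) -> bool:
    """Return True if a single SAN entry covers the hostname (simplified rule)."""
    hostname = hostname.lower().rstrip(".")
    san = san.lower().rstrip(".")
    if san.startswith("*."):
        # A wildcard stands in for exactly one left-most label.
        head, _, tail = hostname.partition(".")
        return bool(head) and tail == san[2:]
    return hostname == san


def uncovered_backends(backends: List[str], sans: List[str]) -> List[str]:
    """List the backend hostnames that no SAN entry covers."""
    return [b for b in backends if not any(san_covers(b, s) for s in sans)]


backends = ["microservice-y.example.com", "microservice-z.example.com"]

# One certificate naming only microservice-y leaves the other backend uncovered.
print(uncovered_backends(backends, ["microservice-y.example.com"]))

# One certificate carrying a SAN per backend covers both.
print(uncovered_backends(backends, backends))
```

Under either option discussed here, every backend name the application may call has to appear on some certificate the egress listener can present, so adding a backend means reissuing or adding a certificate.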

We don't operate the services that we're migrating, they are written and operated by internal service teams for whom we provide an internal platform. Sadly the config for these services is very disparate and we don't have any central levers we can pull to switch their endpoint configuration to use HTTP.

This is a good point. I'd like to dig into Envoy a bit more here and see what's possible. There may be other steps we can take to simplify this workflow.

Finally, we've done quite a lot of work in getting end-to-end TLS working in our current systems. I think it would be possible to argue that the security profile of the current AppMesh TLS model is extremely similar to what we've got, however (please don't laugh 😆), I can also imagine it's going to be a hard sell for me to try to explain this to various architects/ security folk within the organisation as there's a common perception that any plain HTTP calls will undermine all other security. I'm sure you've heard it all before! That's very much a local cultural problem though and I'm sure it can be overcome!

Definitely have heard this sort of challenge before.

I think it's best we move this discussion to a new issue -- the question of where certificates can come from on the egress listener is a good one, and ACM might be an option here, but would like to keep the overall discussion of encryption between application and proxy separate. Filed as #162.

@bcelenza
Contributor

bcelenza commented Mar 2, 2020

TLS with ACM managed private certificates is now generally available in the App Mesh APIs, SDKs, and CloudFormation for all regions that App Mesh operates in. Check out the latest user guide for more information.

Please note that at this time the App Mesh console experience has not been updated. Additionally, support in the Kubernetes controller for App Mesh is pending merge and release (see this PR for the latest).

We'll be holding this issue open until everything is closed out, after which a more formal announcement will be made.

A huge thanks to all who have provided feedback to us through the design and preview period for the feature.

@bcelenza
Contributor

Kubernetes controller changes for TLS are now available in release v0.4.0. See the release notes for the full list of changes and improvements we've made in this release.

@bcelenza
Contributor

The AWS console now also supports TLS in App Mesh.

You can read more about the release in this blog post.

We're closing this issue, but please check out these additional roadmap items for TLS support:

  1. Allowing the application to optionally originate TLS instead of the proxy: Feature Request: TLS negotiation between the downstream application and upstream service #162
  2. Support for revocation lists from ACM PCA: Feature Request: Allow the use of ACM PCA CRLs when using TLS #172

@bcelenza bcelenza added Roadmap: Shipped and removed Phase: Coming Soon Roadmap: Accepted We are planning on doing this work. labels Mar 13, 2020
@harinjy

harinjy commented Jun 11, 2021

@bcelenza In @eddgrant's example with App Mesh, should microservice-x use an https URL to call microservice-y, even when client policies and backends have SSL set to STRICT? Will the Envoy sidecar enforce TLS if microservice-x uses an http URL?
