Simplify external service egress traffic setup #2
Comments
An additional use case here is egress traffic to a service that runs within a k8s/ECS cluster or local network but is not part of the mesh (i.e., not modeled as a VirtualNode). Egress policies should be flexible enough to support modeling this. |
With the App Mesh GA release, we introduced a change to the Mesh resource that will allow you to enable egress traffic for the entire mesh. You can read about this in the Egress Filter API document. However, we feel there are more improvements we'd like to make here, so I'm leaving this issue open and will move it back to the researching phase in our roadmap. |
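For readers who want to see the shape of that mesh-level setting, here is a minimal sketch in Python. It only builds the Mesh `spec` as a plain dict; the helper name `build_mesh_spec` is made up for illustration, and the assumption is that the filter types are `ALLOW_ALL` and `DROP_ALL` as described in the Egress Filter API document.

```python
import json

# Egress filter types assumed from the App Mesh Mesh resource.
VALID_EGRESS_FILTER_TYPES = {"ALLOW_ALL", "DROP_ALL"}


def build_mesh_spec(egress_filter_type: str) -> dict:
    """Return a Mesh spec dict carrying the given egress filter type.

    Hypothetical helper: it only validates the type and builds the dict,
    it does not call any AWS API.
    """
    if egress_filter_type not in VALID_EGRESS_FILTER_TYPES:
        raise ValueError(f"unsupported egress filter type: {egress_filter_type!r}")
    return {"egressFilter": {"type": egress_filter_type}}


if __name__ == "__main__":
    # Print the spec that enables egress for the entire mesh.
    print(json.dumps(build_mesh_spec("ALLOW_ALL"), indent=2))
```

The resulting dict is meant to be passed as the `spec` argument when creating or updating the mesh (for example through an AWS SDK client); that call is left out here so the sketch stays runnable without credentials.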
We have a setup with a VPC hosting an Aurora cluster, an ElastiCache Redis and an EKS cluster. We have created an AppMesh within the EKS cluster. From the AppMesh, our services can contact the Redis and public internet services without issues (the mesh was created with Accessing Aurora from a regular pod in EKS works fine as well. However, accessing Aurora from within the AppMesh within EKS results in MySQL error 2013. Using the mysql CLI from the same pod results in 2013 as well:
Basically, the connections to Aurora stop at the service's sidecar. Any suggestions on how to access an Aurora cluster from an AppMesh in an EKS cluster, all in the same VPC? |
@dfw6000 Thanks for reporting. I can confirm this is a bug with how App Mesh is configuring the egress listener for Envoy, and it's specific to the MySQL protocol. I've posted this as a separate bug with details here: #62 I'm currently investigating fixes and workarounds for this issue, and will report back on that issue once I have more news. |
We have an App Mesh consisting of ECS Fargate services running in a private subnet. Egress filter type is set to What would be the recommended way of egressing from an App Mesh ECS service in a private subnet to Cognito in another region while controlling where it can egress to through an allow list? Even though |
Hey @treynash,
One thing worth mentioning up front: the current functionality of Regarding regionality: the current proxy configuration does not restrict regionality for the Regarding Cognito: If I'm reading between the lines of your question, what you're actually looking for is something like an |
Although
I didn't think about modeling Cognito endpoints with Virtual Nodes. If I set up a Virtual Node and Virtual Service to model |
I'm encountering some challenges egressing from App Mesh in a private subnet. I have the NAT setup properly because if I set the egress policy to When I set the policy back to I cranked up the logs in envoy up to
Is it complaining because DNS lookup resulted in an IPv6 address? Is there a way to correct this? |
Hey @treynash, responding to your most recent comment first. You are correct that the connection error is due to the IPv6 address returned from DNS resolution. Envoy's default behavior for DNS resolution is to favor IPv6. There's currently no way to change this through the App Mesh APIs, but we do have a feature request for supporting it (#121). Do you actually have a need for connecting to an endpoint which is resulting in IPv6 DNS query responses, or was the google.com call just a test? I'd be curious if your actual destination would be OK due to only getting a v4 query response. |
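One rough way to check this from a test host is to look at which address families DNS returns for the destination. The sketch below uses Python's standard `socket.getaddrinfo`; the helper names are made up for illustration, and the result reflects the local resolver rather than Envoy's own DNS behavior, so treat it as a hint only.

```python
import socket


def address_families(addrinfos) -> set:
    """Map getaddrinfo() results to the set of families they contain."""
    names = {socket.AF_INET: "IPv4", socket.AF_INET6: "IPv6"}
    # Each getaddrinfo() entry is a tuple whose first item is the family.
    return {names[info[0]] for info in addrinfos if info[0] in names}


def resolve_families(hostname: str, port: int = 443) -> set:
    """Resolve hostname with the local resolver and report the families."""
    infos = socket.getaddrinfo(hostname, port, type=socket.SOCK_STREAM)
    return address_families(infos)


if __name__ == "__main__":
    import sys

    # Usage: python check_families.py <hostname> [<hostname> ...]
    for host in sys.argv[1:]:
        print(host, sorted(resolve_families(host)))
```

If a destination reports only `IPv4`, it is less likely to trip over the IPv6 preference described above; if it reports both, the connection error seen with google.com may show up again.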
Okay, now some specific items from your previous comment.
I think this is worth consideration, yes. I'll bring this up with our team and follow-up.
Your approach makes sense, though we'd of course love to support the tight traffic control using the sort of rule sets we're discussing (which is why this issue is still open). I'm going to write up a quick proposal internally for the ALLOW_SPECIFIED functionality, since this seems generally useful to folks. Regarding the security of the traffic via the proxy: the iptables rules we use to redirect traffic from the application to the Envoy Proxy ignore UID 1337 (the UID that Envoy is running as). Any user-space application that's able to run as UID 1337 will bypass the Envoy automatically as a result, so we generally do not consider App Mesh and Envoy as a strong security boundary for egress traffic.
You'll specifically need to model all possible subdomains as distinct Virtual Services and Virtual Nodes, which may not be ideal depending on how many there are. I think us either allowing Cognito domains implicitly, or implementing domain-based rule sets are better options for the mesh-level configuration -- but hopefully Virtual Service and Virtual Node definition gets you unblocked for now. One area you may hit a rough edge on is TLS origination. If you can configure the Cognito calls from your application not to negotiate TLS, you can have the proxy negotiate it for you (using a Client Policy on the associated Virtual Node backend to the Cognito virtual service). This will allow the proxy to see the destination and route appropriately. If the TLS session is negotiated by the application, the traffic will be encrypted to the proxy, and at the moment it will not be able to route appropriately (see #162 for more info on that). |
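As a rough illustration of the Client Policy approach, the sketch below builds a Virtual Node backend entry that asks the proxy to originate TLS toward an external virtual service. It is plain Python producing a dict; the helper name, the virtual service name and the CA bundle path are all hypothetical, and the field layout is an assumption based on the App Mesh Virtual Node backend schema.

```python
import json


def backend_with_tls_origination(virtual_service_name: str,
                                 ca_bundle_path: str,
                                 port: int = 443) -> dict:
    """Build a backend entry whose client policy has the proxy negotiate TLS.

    Hypothetical helper: it only assembles the dict, it does not call AWS.
    """
    return {
        "virtualService": {
            "virtualServiceName": virtual_service_name,
            "clientPolicy": {
                "tls": {
                    # Require TLS from the proxy to this backend on the port.
                    "enforce": True,
                    "ports": [port],
                    # Trust anchor the proxy uses to validate the server cert.
                    "validation": {
                        "trust": {"file": {"certificateChain": ca_bundle_path}}
                    },
                }
            },
        }
    }


if __name__ == "__main__":
    # Both values below are placeholders, not taken from this thread.
    backend = backend_with_tls_origination(
        "external.example.com", "/etc/ssl/certs/ca-bundle.crt"
    )
    print(json.dumps(backend, indent=2))
```

With a backend like this, the application would send plain HTTP to the external hostname and the proxy would handle the TLS session, which is what lets it see the destination and route on it.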
Thanks @bcelenza!
Sounds good! Looking forward to such a mechanism to make this easier.
Thanks for the clarification. Indeed, we're taking a defense in depth approach here and this is just one of the cogs in the wheel.
We don't have a lot, and this approach would work out just fine except that for every external domain I have tried, I continue to get bitten by the IPv6 issue. In the meantime, does it sound reasonable as a workaround to fork the envoy repo and create a build that defaults to IPv4? We're already copying the AWS envoy proxy from the official ECR repo into our own ECR repo during deployment of our service.
Thank you for pointing this out. I did come across #162 while investigating this one. If we were to get Virtual Nodes for external services to work, that is without being affected by the IPv6 issue, then we'd most definitely need to use this technique when communicating with the Cognito endpoints. Currently, it appears we'll have to go with the squid proxy approach that I described initially until these wrinkles get worked out and/or the feature you are proposing is implemented. |
No problem, @treynash!
While I think this would be possible, I don't think it would be trivial to do, so reasonable depends on your level of comfort using a custom build of Envoy. We've just recently reclassified #121 as a bug and circled it back around through our priority queue, so I think we'll be looking into defaulting to IPv4 (or allowing you to default via the API) soon.
Following back up on this one as well -- I've opened #236 to track whether implicitly allowing Cognito destinations is something more folks are interested in. Please give us your 👍 on that issue if you get a chance. |
Following back up on this, we have an idea of what we'd like to do to simplify cases where many (if not all) endpoints in the mesh require the ability to send traffic to external destinations. Our current proposal is to add a new type of egress filter called Here's an example of a Mesh configuration (in CloudFormation YAML format) demonstrating the use of this policy for both specific and wildcard domains:
We believe this provides the right balance of flexibility and control for the mesh owner to provide broad access to common destinations from endpoints within the mesh. It also gives us some flexibility long term to introduce alternate behaviors, such as proxying these destinations through a gateway for additional control. One caveat we're aware of: for destinations which use only TCP without TLS, we cannot determine their destination and route traffic appropriately. This is because TCP without TLS only provides the proxy with the source and destination IPs and ports (for which we cannot match against a domain name at this time). However, we've seen that most customers are using TLS when connecting to external destinations. We'd love to hear your feedback on this proposal, and whether you have any other use cases which are not covered directly by it. We're aware there is also a need for allowing individual external destinations at the Virtual Node level, which we are tracking in a separate issue (#241). |
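To make the matching idea concrete, here is a small Python sketch of checking a requested hostname (for example, one taken from the TLS SNI) against a list of specific and wildcard domains. This is not how App Mesh or Envoy implements it; the function name and the rule that `*.` matches any depth of subdomain but not the bare parent are assumptions for illustration.

```python
def matches_allowed_domain(hostname: str, allowed: list) -> bool:
    """Return True if hostname matches an exact or '*.'-prefixed pattern."""
    host = hostname.lower().rstrip(".")
    for pattern in allowed:
        pattern = pattern.lower().rstrip(".")
        if pattern.startswith("*."):
            suffix = pattern[1:]  # e.g. ".example.org"
            # Wildcard: any subdomain matches, the bare parent does not.
            if host.endswith(suffix) and len(host) > len(suffix):
                return True
        elif host == pattern:
            return True
    return False


if __name__ == "__main__":
    # Placeholder domains, not taken from the proposal.
    allowed = ["api.example.com", "*.example.org"]
    for name in ["api.example.com", "a.b.example.org", "example.org"]:
        print(name, matches_allowed_domain(name, allowed))
```

The caveat above still applies: this kind of check needs a hostname to look at, so plain TCP without TLS offers nothing to match beyond IPs and ports.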
This is really a massive blocker, as it goes against the principles of a service mesh. Please, we need this feature. First, as mentioned above, we should not have to do TLS from the application when reaching external services. Second, an external backend whose virtual route/node is TCP doesn't support retries. Only HTTP routes support retries, and App Mesh does not support those for TLS external services. So not only can we not use App Mesh for TLS to external services, we also cannot use its retries. Third, we unfortunately have a lot of external services. I work for a bank, and we have many services that are not on App Mesh. If we have to do retries and TLS from within our app, App Mesh becomes less and less useful; we might as well not use it, since we are already having to do this in code. Please can we get traction on this issue, as this is a big disadvantage. |
My experience with app mesh has been miserable. I've spent a good number of days and hours trying to configure it for a typical cluster scenario. It has so many pain points and gotchas that are just not well documented or straightforward. Why is it incredibly hard to do simple things like external https calls or calls to other aws services? That is the basics and 90% of services are going to do this. I just don't understand how this can be overlooked for over 3 years. It uses envoy, yet you can't tap into the power of envoy because it is all obscured by app mesh which only gives a small window of functionality. |
As discussed in #74, the current way to model an external service that a service within the mesh can route to is by modeling the external service as a VirtualNode. For example, suppose you had two services named Service-A and Service-B, where Service-B is an external service (e.g. gitlab) hosted at the DNS name gitlab.my-intranet.com. If you wanted the VirtualNode representing Service-A to be able to egress traffic to Service-B, you would model your mesh configuration as:
Service-A:
Service-B:
The VirtualNode model contains many specifications which would not normally apply to an external service not within the control of the mesh (such as backends), while others still do (such as health checks).
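A minimal sketch of what that modeling could look like, written as Python functions that build the specs as plain dicts. The DNS name `gitlab.my-intranet.com` comes from the description above; everything else (helper names, the Service-A hostname, ports, the node name `service-b`) is a hypothetical placeholder, and the field layout assumes the GA-era API shape where backends reference virtual services.

```python
import json

EXTERNAL_HOSTNAME = "gitlab.my-intranet.com"  # from the issue description


def external_virtual_node_spec(hostname: str, port: int, protocol: str = "http") -> dict:
    """Spec for a VirtualNode standing in for an external service (Service-B).

    No backends are set: the mesh does not control what the external
    service itself calls.
    """
    return {
        "listeners": [{"portMapping": {"port": port, "protocol": protocol}}],
        "serviceDiscovery": {"dns": {"hostname": hostname}},
    }


def virtual_service_spec(virtual_node_name: str) -> dict:
    """Spec for a VirtualService provided by the named VirtualNode."""
    return {"provider": {"virtualNode": {"virtualNodeName": virtual_node_name}}}


def caller_virtual_node_spec(hostname: str, port: int, backend_names: list,
                             protocol: str = "http") -> dict:
    """Spec for an in-mesh VirtualNode (Service-A) that lists its backends."""
    spec = external_virtual_node_spec(hostname, port, protocol)
    spec["backends"] = [
        {"virtualService": {"virtualServiceName": name}} for name in backend_names
    ]
    return spec


if __name__ == "__main__":
    # Hostname and ports for Service-A are placeholders.
    service_a = caller_virtual_node_spec(
        "service-a.example.internal", 8080, [EXTERNAL_HOSTNAME]
    )
    service_b = external_virtual_node_spec(EXTERNAL_HOSTNAME, 80)
    print(json.dumps({"Service-A": service_a, "Service-B": service_b}, indent=2))
    print(json.dumps(virtual_service_spec("service-b"), indent=2))
```

The sketch shows the asymmetry the paragraph above points at: the Service-B node carries listener and discovery fields that may not mean much for something outside the mesh's control, while only Service-A needs a backends list.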
This issue is to track the investigation of a general simplification of modeling external entities within the mesh.