Envoy crash while using TCP socket tapping of upstream cluster #13608

Closed
jphx opened this issue Oct 16, 2020 · 0 comments · Fixed by #13638
Labels: area/tap, bug, no stalebot

jphx (Contributor) commented Oct 16, 2020

Because this problem involves a crash, I originally reported it to the [email protected] mailing list. However, since the traffic tapping capability is intended for debugging, I was asked to report it here instead.

I'm trying out Envoy's TCP socket traffic tapping capability. With a fairly simple configuration in an upstream cluster definition, Envoy crashes when I submit an HTTP request to the cluster. Here's the upstream cluster's tapping configuration:

"transport_socket": {
   "name": "envoy.transport_sockets.tap",
   "typed_config": {
     "@type": "type.googleapis.com/envoy.extensions.transport_sockets.tap.v3.Tap",
     "common_config": {
       "static_config": {
         "match_config": {
           "any_match": true
         },
         "output_config": {
           "sinks": [
             {
               "file_per_tap": {
                 "path_prefix": "/tmp/mtu.hserver-multi-tenant-upstream.cerberus"
               }
             }
           ],
           "streaming": true
         }
       }
     },
     "transport_socket": {
       "name": "envoy.transport_sockets.tls"
     }
   }
 },

If I remove streaming: true, it works as expected, but with streaming: true, it crashes whenever an HTTP request is submitted. I activated Envoy's trace-level debugging just before submitting the request. I'm attaching the part of the log starting at the first message at trace level (file crash-log.txt). Unfortunately, the log includes some processing from a Kubernetes readiness probe that happened to be going on at the same time, but you can ignore that. The request that triggered the crash is the GET to /http-serving/this/is/a/test. The stack trace is not particularly helpful, though, due to this problem. I've reproduced this crash with Envoy versions 1.15.0, 1.15.2, and 1.16.0.
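For comparison, here is a sketch of the variant reported above to work as expected: the same fragment with only the streaming field removed. Nothing else is changed, and the trailing comma is kept because this is an excerpt from a larger cluster definition rather than a standalone JSON document.

```json
"transport_socket": {
   "name": "envoy.transport_sockets.tap",
   "typed_config": {
     "@type": "type.googleapis.com/envoy.extensions.transport_sockets.tap.v3.Tap",
     "common_config": {
       "static_config": {
         "match_config": {
           "any_match": true
         },
         "output_config": {
           "sinks": [
             {
               "file_per_tap": {
                 "path_prefix": "/tmp/mtu.hserver-multi-tenant-upstream.cerberus"
               }
             }
           ]
         }
       }
     },
     "transport_socket": {
       "name": "envoy.transport_sockets.tls"
     }
   }
 },
```

The only difference from the crashing configuration is the absence of "streaming": true inside output_config.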

To get a better stack trace of the failure, I built a debug version of Envoy 1.15.2 and put that executable into my docker image. With that executable, Envoy produces a more useful stack trace when the failure occurs; I've attached it (file stack-trace.txt). I also managed to capture a core file from that failure, and with some effort I was able to get gdb to produce a stack trace as well, including the values of local variables, in case that helps (file gdb-backtrace-with-locals.txt).

debug-data.tar.gz

jphx added the bug and triage labels Oct 16, 2020
mattklein123 added the area/tap and no stalebot labels and removed the triage label Oct 16, 2020
mattklein123 self-assigned this Oct 16, 2020
mattklein123 added this to the 1.17.0 milestone Oct 16, 2020
mattklein123 added a commit that referenced this issue Oct 19, 2020
mattklein123 added a commit that referenced this issue Oct 20, 2020
lizan pushed a commit to envoyproxy/data-plane-api that referenced this issue Oct 20, 2020
Fixes envoyproxy/envoy#13608

Signed-off-by: Matt Klein <[email protected]>

Mirrored from https://github.com/envoyproxy/envoy @ b0cf5f6a272e4f79f2287a22364b65a791f8981e
rexengineering pushed a commit to rexengineering/istio-envoy that referenced this issue Oct 15, 2021