
HTTP/1.1 outbound external call loses response -- response never received by sending container #3605

Closed
rocketraman opened this issue Oct 19, 2019 · 6 comments


@rocketraman
Contributor

Bug Report

What is the issue?

I have a long-running HTTP/1.1 call outbound from a linkerd2-proxy-injected container. Even when the external service (running at 10.254.2.5:7071) has sent its response, the response is never received by my container. Smaller, shorter request/responses work fine, but this longer, larger request does not. The debug logs from when the remote service finishes processing are below. From these logs, I noticed this line, which may be causing the response to be "lost":

x-5dcfcf7b-85hmn linkerd-proxy DBUG [   385.538607s] hyper::proto::h1::dispatch error writing: Broken pipe (os error 32)    

Note also that setting the annotation config.linkerd.io/skip-outbound-ports=7071 does not solve the problem -- the response is still never received.
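
For context, that annotation goes on the workload's pod template; here is a sketch of what was set. Only the annotation key and the port value come from this report -- the Deployment name, labels, and container are hypothetical placeholders:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: x                    # hypothetical workload name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: x
  template:
    metadata:
      labels:
        app: x
      annotations:
        linkerd.io/inject: enabled
        config.linkerd.io/skip-outbound-ports: "7071"   # port of the external service in this report
    spec:
      containers:
      - name: app            # hypothetical
        image: example/app   # hypothetical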

The only solution I've found is to remove the linkerd injection from this container, and then everything works fine.

How can it be reproduced?

I don't have a full reproduction; however, in my environment I can easily reproduce the problem by sending a particular request to this external service with linkerd2 enabled. The service takes about 2 minutes to process the request and then responds, but the response is never received by the calling container.

Logs, error output, etc

x-5dcfcf7b-85hmn linkerd-proxy DBUG [   385.537719s] proxy={client=out dst=10.254.2.5:7071 proto=Http1 { keep_alive: true, wants_h1_upgrade: false, was_absolute_form: false }} hyper::proto::h1::io read 5624 bytes    
x-5dcfcf7b-85hmn linkerd-proxy DBUG [   385.537778s] proxy={client=out dst=10.254.2.5:7071 proto=Http1 { keep_alive: true, wants_h1_upgrade: false, was_absolute_form: false }} hyper::proto::h1::io parsed 9 headers    
x-5dcfcf7b-85hmn linkerd-proxy DBUG [   385.537799s] proxy={client=out dst=10.254.2.5:7071 proto=Http1 { keep_alive: true, wants_h1_upgrade: false, was_absolute_form: false }} hyper::proto::h1::conn incoming body is content-length (2986572 bytes)    
x-5dcfcf7b-85hmn linkerd-proxy DBUG [   385.538056s] hyper::proto::h1::io flushed 5592 bytes    
x-5dcfcf7b-85hmn linkerd-proxy DBUG [   385.538560s] proxy={client=out dst=10.254.2.5:7071 proto=Http1 { keep_alive: true, wants_h1_upgrade: false, was_absolute_form: false }} hyper::proto::h1::io read 8192 bytes    
x-5dcfcf7b-85hmn linkerd-proxy DBUG [   385.538607s] hyper::proto::h1::dispatch error writing: Broken pipe (os error 32)    
x-5dcfcf7b-85hmn linkerd-proxy DBUG [   385.538619s] linkerd2_proxy::proxy::server http1 server error: Error(BodyWrite, Os { code: 32, kind: BrokenPipe, message: "Broken pipe" })

linkerd check output

kubernetes-api
--------------
√ can initialize the client
√ can query the Kubernetes API

kubernetes-version
------------------
√ is running the minimum Kubernetes API version
√ is running the minimum kubectl version

linkerd-config
--------------
√ control plane Namespace exists
√ control plane ClusterRoles exist
√ control plane ClusterRoleBindings exist
√ control plane ServiceAccounts exist
√ control plane CustomResourceDefinitions exist
√ control plane MutatingWebhookConfigurations exist
√ control plane ValidatingWebhookConfigurations exist
√ control plane PodSecurityPolicies exist

linkerd-existence
-----------------
√ 'linkerd-config' config map exists
√ heartbeat ServiceAccount exist
√ control plane replica sets are ready
√ no unschedulable pods
√ controller pod is running
√ can initialize the client
√ can query the control plane API

linkerd-api
-----------
√ control plane pods are ready
√ control plane self-check
√ [kubernetes] control plane can talk to Kubernetes
√ [prometheus] control plane can talk to Prometheus
√ no invalid service profiles

linkerd-version
---------------
√ can determine the latest version
√ cli is up-to-date

control-plane-version
---------------------
√ control plane is up-to-date
√ control plane and cli versions match

Status check results are √

Environment

  • Kubernetes Version: 1.12.7
  • Cluster Environment: AKS (via AKS-engine)
  • Host OS: Ubuntu 16.04.6 LTS
  • Linkerd version:
$ linkerd version
Client version: stable-2.6.0
Server version: stable-2.6.0
@grampelberg
Contributor

So, I tried to replicate this. The following YAML appears to work (with a 4-minute wait). Is your application doing anything special with the response?

apiVersion: v1
kind: Namespace
metadata:
  name: test
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: curl
  namespace: test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: curl
  template:
    metadata:
      annotations:
        linkerd.io/inject: enabled
      labels:
        app: curl
    spec:
      containers:
      - image: alpine
        imagePullPolicy: Always
        name: alpine

        command: ["/bin/sh", "-c"]
        args:
          - |-
            apk add curl && while true; do curl -vv sink:3000 || true; done
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sink
  namespace: test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: sink
  template:
    metadata:
      labels:
        app: sink
      annotations:
        linkerd.io/inject: enabled
    spec:
      containers:
      - image: buoyantio/bb:latest
        imagePullPolicy: Always
        name: sink

        command: ["/bin/sh", "-c"]
        args:
          - |-
            /out/bb terminus \
              --sleep-in-millis 240000 \
              --h1-server-port 3000 \
              --response-text BANANA
---
apiVersion: v1
kind: Service
metadata:
  name: sink
  namespace: test
spec:
  selector:
    app: sink
  clusterIP: None
  ports:
  - name: test
    port: 3000
    targetPort: 3000

@rocketraman
Contributor Author

rocketraman commented Oct 25, 2019

@grampelberg I don't think the issue is due solely to the length of time it takes to send the response, but also to the size of the response. My "sink" is sending back tens of megabytes of data; your sink is sending back 6 bytes.

@grampelberg
Contributor

It would be helpful if you could describe that a little more so that we can replicate it =) A packet capture of the response would be insanely valuable.

@rocketraman
Contributor Author

@grampelberg Do you mean a packet capture from the debug container? Can you help me define the capture expression? Ideally we'd see what is going into the proxy (i.e. the response from the remote service), as well as what is going out of the proxy (i.e. the response back into the container), which appeared to be "gone".

@grampelberg
Contributor

I'd just do tcpdump -i any -s 65535 -w out.pcap; that'll capture everything and we can slice and dice.
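
One way to run that capture next to the proxy is the Linkerd debug sidecar, which ships tcpdump. A sketch of enabling it on the affected workload's pod template -- treat the exact placement shown here as an assumption about the setup, not a required step:

  # pod template excerpt (hypothetical placement) for the affected Deployment
  template:
    metadata:
      annotations:
        linkerd.io/inject: enabled
        config.linkerd.io/enable-debug-sidecar: "true"   # injects a linkerd-debug container with tcpdump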

@rocketraman
Contributor Author

rocketraman commented Dec 20, 2019

It looks like this was related to the behavior of my HTTP client, combined with the default setting of net.netfilter.nf_conntrack_tcp_timeout_close_wait in my Kubernetes pods, which was only 60 seconds.

My HTTP client was doing a half-close of the TCP connection when it finished transmitting the request. While this is legal according to the TCP specification, it is somewhat unusual and generally frowned upon for HTTP clients (see httpwg/http-core#22 and ktorio/ktor#1456).

The problem was that the iptables rules installed by linkerd use nf_conntrack, which "forgets" the connection state after the close_wait timeout, so the client program never receives the response. This is also why the problem was still visible even when the outbound port was skipped by the proxy -- the mere existence of those iptables rules was sufficient to cause it.

The solution is either to configure the client to disable its TCP half-close behavior, or to increase the value of net.netfilter.nf_conntrack_tcp_timeout_close_wait inside the pod (see https://kubernetes.io/docs/tasks/administer-cluster/sysctl-cluster/), as sketched below.
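
A minimal sketch of the second option, assuming the node kernel namespaces the conntrack sysctls and the kubelet allows this one via --allowed-unsafe-sysctls (it is not in Kubernetes' default safe list). The pod name, container, and timeout value are illustrative, not taken from this report:

apiVersion: v1
kind: Pod
metadata:
  name: client                 # hypothetical
spec:
  securityContext:
    sysctls:
    - name: net.netfilter.nf_conntrack_tcp_timeout_close_wait
      value: "3600"            # illustrative; the default observed here was 60 seconds
  containers:
  - name: app                  # hypothetical
    image: example/app         # hypothetical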

Thanks so much to @grampelberg for some marathon debugging / analysis of this issue with me.

github-actions bot locked as resolved and limited conversation to collaborators Jul 17, 2021