-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HTTP/1.1 outbound external call loses response -- response never received by sending container #3605
Comments
So, I tried to replicate this. The following YAML looks like it is working (with a wait for 4 minutes). Anything special your application is doing with the response? apiVersion: v1
kind: Namespace
metadata:
name: test
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: curl
namespace: test
spec:
replicas: 1
selector:
matchLabels:
app: curl
template:
metadata:
annotations:
linkerd.io/inject: enabled
labels:
app: curl
spec:
containers:
- image: alpine
imagePullPolicy: Always
name: alpine
command: ["/bin/sh", "-c"]
args:
- |-
apk add curl && while true; do curl -vv sink:3000 || true; done
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: sink
namespace: test
spec:
replicas: 1
selector:
matchLabels:
app: sink
template:
metadata:
labels:
app: sink
annotations:
linkerd.io/inject: enabled
spec:
containers:
- image: buoyantio/bb:latest
imagePullPolicy: Always
name: sink
command: ["/bin/sh", "-c"]
args:
- |-
/out/bb terminus \
--sleep-in-millis 240000 \
--h1-server-port 3000 \
--response-text BANANA
---
apiVersion: v1
kind: Service
metadata:
name: sink
namespace: test
spec:
selector:
app: sink
clusterIP: None
ports:
- name: test
port: 3000
targetPort: 3000 |
@grampelberg I don't think the issue is solely due to the length of time it takes to send the response, but also because of the size of the response. My "sink" is sending back tens of megabytes of data, your sink is sending back 6 bytes. |
It would be helpful if you could define that a little bit more so that we can replicate it =) A packet capture of the response would be insanely valuable. |
@grampelberg Do you mean a packet capture from the debug container? Can you help me define the capture expression? Ideally we'd see what is going into the proxy i.e. the response from the remote service, as well as what is going out of the proxy i.e. the response back into the container, which appeared to be "gone". |
I'd just do |
It looks like this was related to the behavior of my HTTP client, along with the default setting for My HTTP client was doing a half-close of the TCP connection when it was finished transmitting the request. While this is legal according to TCP specifications, it is somewhat unusual and generally frowned upon for HTTP clients to do (see httpwg/http-core#22 and ktorio/ktor#1456). The problem was that the iptables rules installed by linkerd use nf_conntrack, which "forgets" about the connection state after the close_wait timeout, and so the client program never receives the response. This is why the problem was still visible even if the outbound port was skipped by the proxy -- the mere existence of the iptables rules installed by linkerd was sufficient to cause this. The solution is to either configure the client to turn off its TCP half-close behavior, or increase the value of Thanks so much to @grampelberg for some marathon debugging / analysis of this issue with me. |
Bug Report
What is the issue?
I have a long-running http/1.1 call outbound from a linkerd2-proxy injected container. Even when the external service (running at 10.254.2.5:7071) has sent its response, the response is never received by my container. Smaller and shorter request/responses work fine, but this longer bigger request does not. The debug logs when the remote service is finished processing are below. From these logs, I noticed this line which may be causing the response to be "lost":
Note also that setting config
config.linkerd.io/skip-outbound-ports=7071
does not solve the problem -- the response is still never received.The only solution I've found is to remove the linkerd injection from this container, and then everything works fine.
How can it be reproduced?
I don't have a full reproduction, however in my environment I can easily reproduce it by sending a particular request to this external service with linkerd2 enabled. The service takes about 2 minutes to run and then responds, but the response is never received by the calling container.
Logs, error output, etc
linkerd check
outputEnvironment
The text was updated successfully, but these errors were encountered: