What is the possible cause of "Connection reset by peer" when watching ListNamespacedPodWithHttpMessagesAsync #773
Comments
see #533 |
The strange thing is that this is the first time I have encountered such exceptions, and we have been working like this for half a year already. Another strange thing is that it stopped throwing exceptions after some time. If you look into the code, you will see that Orleans handles the exception in a loop, so I guess it has now stabilized and works as expected... |
In the case of #533, we noticed that it was caused by the LB kicking idle connections. Retrying may be the solution if it is caused by the network layer. |
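For readers landing here: the retry idea above could look roughly like the sketch below. It is only an illustration, not code from Orleans or from this library. `WatchRetry` and the `watchOnce` delegate are hypothetical names; `watchOnce` stands for whatever opens a single watch and returns when the stream ends. The assumption that a reset shows up as an `IOException` or `HttpRequestException` is based on how .NET commonly reports it, so adjust the filter to what you actually see in your logs.

```csharp
using System;
using System.IO;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

public static class WatchRetry
{
    // Runs `watchOnce` (hypothetical: opens one watch, returns when the stream ends)
    // in a loop and reconnects with exponential backoff after network-level failures.
    // Cancelling `ct` ends the loop; a cancellation during the backoff delay
    // propagates to the caller as an OperationCanceledException.
    public static async Task RunAsync(
        Func<CancellationToken, Task> watchOnce,
        TimeSpan maxBackoff,
        CancellationToken ct)
    {
        var backoff = TimeSpan.FromSeconds(1);
        while (!ct.IsCancellationRequested)
        {
            try
            {
                await watchOnce(ct);
                backoff = TimeSpan.FromSeconds(1); // clean close: reset the backoff
            }
            catch (Exception ex) when (ex is IOException || ex is HttpRequestException)
            {
                // Assumption: "Connection reset by peer" surfaces as one of these types.
                await Task.Delay(backoff, ct);
                backoff = TimeSpan.FromTicks(Math.Min(backoff.Ticks * 2, maxBackoff.Ticks));
            }
        }
    }
}
```

Any other exception type is deliberately not swallowed, so real bugs still surface.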
@tg123 |
HTTP/2 is the default if your server supports it (HTTPS). |
I found that the workaround from #533 (comment) has been merged, and there is a property, KubernetesClientConfiguration.TcpKeepAlive, to control it, but it only works on .NET 5. I opened PR #777 to support .NET 6; please take a look. |
There is no good way to change TCP keepalive at the container level; it seems to be a Linux kernel-level setting, see https://stackoverflow.com/questions/69302681/setting-tcp-keepalive-on-a-container. So I have to keep the application-level monitoring logic that resets the watch after N minutes without new data. I have to do this because I am running the workload in AKS with a basic load balancer (BLB). The BLB silently drops idle connections (the standard load balancer, SLB, sends an RST to both client and server, which is the "Connection reset by peer" error), and the watch hangs forever after the connection is dropped. |
I am wondering whether SocketsHttpHandler.KeepAlivePingDelay over HTTP/2 can send keepalives correctly without changing the kernel setting. The documentation does not mention it: https://docs.microsoft.com/en-us/dotnet/api/system.net.http.socketshttphandler.keepalivepingdelay?view=net-6.0. |
Is it OK to update the code in https://github.com/kubernetes-client/csharp/blob/master/src/KubernetesClient/Kubernetes.ConfigInit.cs#L195 to the following now?
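The application-level monitor described above (reset the watch after N minutes with no new data) might be sketched like this. It is not the commenter's actual code. `IdleWatchdog`, `watchOnce` and `touch` are hypothetical names, and the sketch assumes the watch code calls `touch` once per received event and honours the cancellation token it is given.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class IdleWatchdog
{
    // Runs `watchOnce` repeatedly. The delegate receives a `touch` callback that it
    // should invoke for every event; if `touch` is not called for `idleLimit`,
    // the current attempt is cancelled and a fresh watch is started.
    public static async Task RunAsync(
        Func<Action, CancellationToken, Task> watchOnce,
        TimeSpan idleLimit,
        CancellationToken ct)
    {
        while (!ct.IsCancellationRequested)
        {
            using var attempt = CancellationTokenSource.CreateLinkedTokenSource(ct);
            attempt.CancelAfter(idleLimit); // arm the idle timer
            Action touch = () =>
            {
                // Each event re-arms the timer; ignore late callbacks after disposal.
                try { attempt.CancelAfter(idleLimit); }
                catch (ObjectDisposedException) { }
            };
            try
            {
                await watchOnce(touch, attempt.Token);
            }
            catch (OperationCanceledException) when (!ct.IsCancellationRequested)
            {
                // Idle limit hit: the connection may have been dropped silently,
                // so loop around and open a new watch.
            }
        }
    }
}
```

The idle limit has to be longer than the longest quiet period you expect from a healthy watch, otherwise the loop restarts watches that were fine.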
|
As for KeepAlivePingDelay, 3 minutes is a bit long: the idle timeout of the AKS basic load balancer is 4 minutes, and I am afraid that does not leave enough margin. Can you recommend a value for KeepAlivePingDelay? |
Let's test 3 minutes first. |
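To make the trade-off concrete, here is a minimal sketch of deriving the ping delay from the load balancer's idle timeout. It is not the change made in the PR. `KeepAliveSketch` and `CreateHandler` are hypothetical names, and pinging at half the idle window with a 30-second ping timeout is a rule of thumb assumed for illustration, not a value recommended in this thread. The `SocketsHttpHandler` properties used exist in .NET 5 and later and control HTTP/2 PING frames, which are sent by the HTTP stack and not by the kernel's TCP keepalive.

```csharp
using System;
using System.Net.Http;

public static class KeepAliveSketch
{
    // Builds a handler whose HTTP/2 keepalive ping fires well inside the
    // load balancer's idle window (assumption: half the window is enough margin).
    public static SocketsHttpHandler CreateHandler(TimeSpan lbIdleTimeout)
    {
        return new SocketsHttpHandler
        {
            // Send a PING after the connection has been quiet for this long.
            KeepAlivePingDelay = TimeSpan.FromTicks(lbIdleTimeout.Ticks / 2),
            // Treat the connection as dead if the PING is not answered in time.
            KeepAlivePingTimeout = TimeSpan.FromSeconds(30),
            // Ping even when no request is in flight.
            KeepAlivePingPolicy = HttpKeepAlivePingPolicy.Always,
        };
    }
}
```

With a 4-minute idle timeout this yields a 2-minute ping delay. `KeepAlivePingDelay` must be at least 1 second, so very short idle timeouts would need a floor.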
PR submitted, please take a look. |
Version 7.0.13 fixed "Connection reset by peer" in my test. @AntonPetrov83, you can give it a try. The .NET version should be 5.0+. |
Thanks for the PR and verification. |
Hi!
I have an AKS (Azure Kubernetes Service) cluster, and recently when I deployed my app I started receiving an exception:
Later it stopped responding like that. Is it a transient error? Or should I investigate?
P.S. My app is based on Microsoft.Orleans, and there is an Orleans.Hosting.Kubernetes extension which uses the ListNamespacedPodWithHttpMessagesAsync API.