
v20240828 very easily hits "io closed" #967

Open
frank-pv opened this issue Aug 30, 2024 · 26 comments

Comments

@frank-pv

frank-pv commented Aug 30, 2024

Problem description: I have been using the author's kcptun to accelerate traffic for a long time, and I am very grateful for the author's contribution to open source. My skills are limited, so all I can do is report the problem.

1. After upgrading to the latest version, the server frequently hits "io close"; the old version does not. Concretely, after two iperf speed tests (saturating the bandwidth), it immediately becomes impossible to initiate any connection (the flow is cut off).
2. The configuration is the one the author provided in #923.
3. I suspect a commit from July 27 introduced it, and the problem has persisted since; link: 4193bb6

  1. Check -key xxx at least three times. — consistent
  2. Make sure -nocomp, -datashard, -parityshard, -key, -crypt, -smuxver, -QPP, -QPPCount match on both ends. — consistent
  3. On the server, is the forwarding target --target set correctly? — verified
  4. On the client, are you connecting to the client's listening port? — verified
  5. If unsure about item 3, try telnet target port on the server.
  6. Is UDP blocked by a firewall, or is there a UDP packet-rate limit? — direct connection
  7. Are both ends on the same version? — consistent
  8. Is it the latest version? —
  9. What operating systems are the two ends running? — Ubuntu — centos, Rocky — openwrt
  10. What do the two ends log?
    Server side
    2024/08/30 11:36:06 remote address: 192.168.198.235:52815
    2024/08/30 11:36:06 smux version: 2 on connection: [::]:39832 -> 192.168.198.235:52815
    2024/08/30 11:36:20 remote address: 192.168.198.235:54825
    2024/08/30 11:36:20 smux version: 2 on connection: [::]:39842 -> 192.168.198.235:54825
    2024/08/30 11:36:20 stream opened in: 192.168.198.235:54825(3) out: 127.0.0.1:3389
    2024/08/30 11:36:20 stream opened in: 192.168.198.235:54825(5) out: 127.0.0.1:3389
    2024/08/30 11:36:20 stream opened in: 192.168.198.235:54825(7) out: 127.0.0.1:3389
    2024/08/30 11:36:23 stream opened in: 192.168.198.235:54825(9) out: 127.0.0.1:3389
    2024/08/30 11:36:23 stream opened in: 192.168.198.235:54825(11) out: 127.0.0.1:3389
    2024/08/30 11:36:23 stream opened in: 192.168.198.235:54825(13) out: 127.0.0.1:3389
    2024/08/30 11:36:26 stream opened in: 192.168.198.235:54825(15) out: 127.0.0.1:3389
    2024/08/30 11:36:26 stream opened in: 192.168.198.235:54825(17) out: 127.0.0.1:3389
    2024/08/30 11:36:26 stream opened in: 192.168.198.235:54825(19) out: 127.0.0.1:3389
    2024/08/30 11:36:29 stream opened in: 192.168.198.235:54825(21) out: 127.0.0.1:3389
    2024/08/30 11:36:36 io: read/write on closed pipe
    2024/08/30 11:36:53 stream opened in: 192.168.198.235:54825(23) out: 127.0.0.1:3389
    2024/08/30 11:36:53 stream opened in: 192.168.198.235:54825(25) out: 127.0.0.1:3389

Client
2024/08/30 11:36:06 remote address: 192.168.198.235:52815
2024/08/30 11:36:06 smux version: 2 on connection: [::]:39832 -> 192.168.198.235:52815
2024/08/30 11:36:20 remote address: 192.168.198.235:54825
2024/08/30 11:36:20 smux version: 2 on connection: [::]:39842 -> 192.168.198.235:54825
2024/08/30 11:36:20 stream opened in: 192.168.198.235:54825(3) out: 127.0.0.1:3389
2024/08/30 11:36:20 stream opened in: 192.168.198.235:54825(5) out: 127.0.0.1:3389
2024/08/30 11:36:20 stream opened in: 192.168.198.235:54825(7) out: 127.0.0.1:3389
2024/08/30 11:36:23 stream opened in: 192.168.198.235:54825(9) out: 127.0.0.1:3389
2024/08/30 11:36:23 stream opened in: 192.168.198.235:54825(11) out: 127.0.0.1:3389
2024/08/30 11:36:23 stream opened in: 192.168.198.235:54825(13) out: 127.0.0.1:3389
2024/08/30 11:36:26 stream opened in: 192.168.198.235:54825(15) out: 127.0.0.1:3389
2024/08/30 11:36:26 stream opened in: 192.168.198.235:54825(17) out: 127.0.0.1:3389
2024/08/30 11:36:26 stream opened in: 192.168.198.235:54825(19) out: 127.0.0.1:3389
2024/08/30 11:36:29 stream opened in: 192.168.198.235:54825(21) out: 127.0.0.1:3389
2024/08/30 11:36:36 io: read/write on closed pipe
2024/08/30 11:36:53 stream opened in: 192.168.198.235:54825(23) out: 127.0.0.1:3389
2024/08/30 11:36:53 stream opened in: 192.168.198.235:54825(25) out: 127.0.0.1:3389

Configs attached:
server:
{ "smuxver": 2, "listen": "[::]:39810-39900", "target": "127.0.0.1:3389", "key": "123456789", "crypt": "aes", "mode": "fast", "mtu": 1400, "sndwnd": 2048, "rcvwnd": 2048, "datashard": 10, "parityshard": 0, "dscp": 46, "nocomp": true, "acknodelay": false, "nodelay": 1, "interval": 20, "resend": 2, "nc": 1, "sockbuf": 16777217, "smuxbuf": 16777217, "streambuf":4194304, "keepalive": 10, "pprof":false, "quiet":false, "tcp":false, "log": "/tmp/kcptun.log" }

client:
{ "smuxver": 2, "localaddr": "127.0.0.1:60002", "remoteaddr": "192.168.199.7:39810-39900", "key": "123456789", "crypt": "aes", "mode": "fast", "mtu": 1400, "sndwnd": 256, "rcvwnd": 2048, "datashard": 10, "parityshard": 0, "dscp": 46, "nocomp": true, "acknodelay": false, "nodelay": 1, "interval": 20, "resend": 2, "nc": 1, "conn": 1, "sockbuf": 16777217, "smuxbuf": 16777217, "streambuf":4194304, "keepalive": 10, "autoexpire":600, "quiet":true, "tcp":false, "log": "/tmp/kcptun.log" }

Screenshot of the behavior:
image

@xtaci
Owner

xtaci commented Aug 30, 2024

Does the data get transferred intact?

@xtaci
Owner

xtaci commented Aug 30, 2024

For example, actually transfer a large file; any problems?

@xtaci
Owner

xtaci commented Aug 30, 2024

Does it show up after roughly 30 s?

@frank-pv
Author

frank-pv commented Aug 30, 2024

Does it show up after roughly 30 s?

1. The data transfers are complete (possibly thanks to the TCP integrity checking inside the tunnel)
2. The problem does not appear after 30 s; just 2-3 traffic bursts are enough to trigger the bug, after which no TCP request can be initiated at all
3. Downloading a file through the tunnel via a Shadowsocks layer completes intact

@xtaci
Owner

xtaci commented Aug 30, 2024

image

Which parameters reproduce it? I cannot reproduce it here.

@frank-pv
Author

image

Which parameters reproduce it? I cannot reproduce it here.

Server:
[root@kvm-199-7 kcp]# uname -a
Linux kvm-199-7 5.14.0-427.22.1.el9_4.x86_64 #1 SMP PREEMPT_DYNAMIC Wed Jun 19 17:35:04 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
[root@kvm-199-7 kcp]# cat kcp.conf
{ "smuxver": 2, "listen": "[::]:39810-39900", "target": "127.0.0.1:3389", "key": "123456789", "crypt": "aes", "mode": "fast", "mtu": 1400, "sndwnd": 2048, "rcvwnd": 2048, "datashard": 10, "parityshard": 0, "dscp": 46, "nocomp": true, "acknodelay": false, "nodelay": 1, "interval": 20, "resend": 2, "nc": 1, "sockbuf": 16777217, "smuxbuf": 16777217, "streambuf":4194304, "keepalive": 10, "pprof":false, "quiet":false, "tcp":false, "log": "/tmp/kcptun.log" }
Client:
Linux ubuntu-virtual-machine 6.5.0-41-generic #41~22.04.2-Ubuntu SMP PREEMPT_DYNAMIC Mon Jun 3 11:32:55 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
root@ubuntu-virtual-machine:/home/ubuntu/kcp# cat kcp.conf
{ "smuxver": 2, "localaddr": "127.0.0.1:60002", "remoteaddr": "192.168.199.7:39810-39900", "key": "123456789", "crypt": "aes", "mode": "fast", "mtu": 1400, "sndwnd": 256, "rcvwnd": 2048, "datashard": 10, "parityshard": 0, "dscp": 46, "nocomp": true, "acknodelay": false, "nodelay": 1, "interval": 20, "resend": 2, "nc": 1, "conn": 1, "sockbuf": 16777217, "smuxbuf": 16777217, "streambuf":4194304, "keepalive": 10, "autoexpire":600, "quiet":true, "tcp":false, "log": "/tmp/kcptun.log" }

Test video attached:
https://github.com/user-attachments/assets/9a515513-7e67-4fe3-897d-af8ba8e2e73f

@frank-pv
Author

config.zip
These are the config files

@xtaci
Owner

xtaci commented Aug 30, 2024

image

@frank-pv
Author

image

It is indeed odd: on the same machine there really is no problem, but across different machines the problem reproduces again.

@xtaci
Owner

xtaci commented Aug 30, 2024

Could it be getting RST by some internal firewall?

@frank-pv
Author

Could it be getting RST by some internal firewall?

That cause can be ruled out, since the connection is direct.

@xtaci
Owner

xtaci commented Aug 30, 2024

Hard to judge, then. Consider testing with other tools; it does not have to be iperf3.

@xtaci
Owner

xtaci commented Aug 30, 2024

image

With client: WSL Ubuntu and server: FreeBSD, I also see no flow interruption.

@xtaci
Owner

xtaci commented Aug 30, 2024

Let me look into it some more.

@frank-pv
Author

Hard to judge, then. Consider testing with other tools; it does not have to be iperf3.

Yes. This problem has bothered me for a long time; please take another look when you have time. For now I have reverted to the previous version.
https://github.com/user-attachments/assets/994cc98d-6234-4c73-9051-70cc349984c8

@xtaci
Owner

xtaci commented Aug 31, 2024

You can try my latest commit; build it yourself.

@xtaci
Owner

xtaci commented Aug 31, 2024

The problem is roughly this:

  1. After a Ctrl+C interrupt, the pending data left in streambuf is still queued for sending, and the peer keeps echoing it.
  2. So the second connection appears stuck: the previous connection's packets are still piled up in the send queue (this release changed closeWait to wait 30 seconds before initiating the close, mainly to handle connections that do a HALF_CLOSE), so SMUX's stream SYN is not sent and processed in time, and the connection hangs while opening.

@xtaci
Owner

xtaci commented Aug 31, 2024

You can try the following (on the latest version, changing only std/copy.go in kcptun):

  1. Change the closeWait time to 0 or 1.
  2. In the startup parameters, increase the smux buffer size to prevent buildup.
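Suggestion 2 maps to the smuxbuf field already present in the JSON configs earlier in this thread; an enlarged value might look like the fragment below (the number is illustrative, not a recommendation):

```json
{
  "smuxbuf": 33554432
}
```

This fragment would be merged into the existing client and server configs; the right size depends on available memory and bandwidth.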

@xtaci
Owner

xtaci commented Aug 31, 2024

So my next step is the following:

Distinguish the server-side copy from the client-side copy.

The client-side copy needs to close immediately, without waiting.

@xtaci
Owner

xtaci commented Aug 31, 2024

142ac6b

@xtaci
Owner

xtaci commented Aug 31, 2024

How to put it: this problem can be mitigated, but because the streams are multiplexed over a single link, head-of-line blocking necessarily exists; that is, Ctrl+C cannot cancel data already queued in the kcp send queue.

https://zh.wikipedia.org/wiki/%E9%98%9F%E5%A4%B4%E9%98%BB%E5%A1%9E

@frank-pv
Author

You can try the following (on the latest version, changing only std/copy.go in kcptun):

  1. Change the closeWait time to 0 or 1.
  2. In the startup parameters, increase the smux buffer size to prevent buildup.

Verified: with the latest commit and closeWait set to 0, the problem is avoided.

image

@frank-pv
Author

So my next step is the following:

Distinguish the server-side copy from the client-side copy.

The client-side copy needs to close immediately, without waiting.

I tested a bit; the buildup seems to happen on the server side, not the client side.

@xtaci
Owner

xtaci commented Aug 31, 2024

Right, that is exactly the problem: after Ctrl+C, the data sitting in the server's kcp queue cannot be canceled. You can lower the per-stream buffer to alleviate it.

@xtaci
Owner

xtaci commented Aug 31, 2024

The main issue is that some unusual servers use TCP HALF CLOSE: they close the sending side but keep the receiving side open. In that case a window (30 s) must be reserved to keep receiving. Of course, for your particular application you can disable closeWait, or it could become a parameter in the startup config. Let me think about it.
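The TCP half-close pattern being accommodated here can be shown with standard library calls (a self-contained sketch, not kcptun code): the client calls CloseWrite to signal it is done sending, yet still reads the server's reply, so the other side must keep the connection alive long enough to deliver it.

```go
package main

import (
	"fmt"
	"io"
	"net"
)

// halfCloseDemo starts a TCP echo-style server that reads the full request
// and then replies; the client half-closes its write side (CloseWrite)
// before reading the reply. It returns the reply the client received.
func halfCloseDemo() (string, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", err
	}
	defer ln.Close()

	go func() {
		c, err := ln.Accept()
		if err != nil {
			return
		}
		defer c.Close()
		req, _ := io.ReadAll(c) // sees EOF at the client's CloseWrite
		c.Write(append([]byte("echo:"), req...))
	}()

	c, err := net.Dial("tcp", ln.Addr().String())
	if err != nil {
		return "", err
	}
	defer c.Close()
	c.Write([]byte("ping"))
	c.(*net.TCPConn).CloseWrite() // half-close: done sending, still reading
	reply, err := io.ReadAll(c)
	return string(reply), err
}

func main() {
	r, err := halfCloseDemo()
	fmt.Println(r, err)
}
```

If the server closed the whole connection as soon as its read saw EOF, the reply would be lost; that is the case the closeWait grace period exists for.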

@xtaci
Owner

xtaci commented Aug 31, 2024

3ec90cd
@frank-pv parameterized it
image
