Is it correct to continue to maintain the connection pool when a connection encounters an error? #116
Comments
This is similar to the open issue #105. I think we should either just close the client, or distinguish the errors encountered in handlePackage() more granularly to decide whether to reconnect.
In Getty, a session represents one network connection, and a client is really a network connection pool that maintains a certain number of sessions; that number is set by the user. In the initial version of the Getty client (before 2018), each client started its own goroutine that polled the number of sessions in its pool and opened a new connection to the server whenever the count was below the user-configured number.

When a client loses its connection to the server, the server may have been taken offline, may have exited unexpectedly, or may be hung. If the upper-layer user determines that the peer server is really gone (for example, on receiving a server-offline notification from the registry), it calls client.Close() to close the connection pool. If the upper-layer user does not call this interface, the client assumes the peer address is still valid and keeps trying to reconnect in order to maintain the pool.

In summary, from the closing of an old session to the creation of a new one, the reconnect flow in the initial Getty client was:
1. The old session closes its network receiving goroutine.
2. The old session's network sending goroutine detects that the receiving goroutine has exited, stops sending, reclaims resources, and marks the current session invalid.
3. The client's polling goroutine detects the invalid session and removes it from the session pool.
4. When the client's polling goroutine finds that the number of valid sessions is below the number configured by the upper-layer user, and the user has not closed the pool through client.Close(), it calls the connect interface to open a new connection.

This way of periodically polling the validity of every session in a client's session pool can be called active connection. Its obvious drawback is that every client needs its own goroutine. One further optimization would be a single global goroutine that periodically checks the session pools of all clients, so that each client no longer needs one. But since 2016 I had been thinking about one question: could the session pool be maintained differently, dropping the polling mechanism entirely and using no dedicated goroutine at all?

In May 2018, on a walk after lunch, I went through the reconnect logic of the Getty client once more and suddenly thought of another approach: in step 2, the network sending goroutine can be "recycled" by adding one more piece of logic after it marks the current session invalid:
1. If the current session is maintained by a client (a session may also be used by a server);
2. and the number of sessions in its session pool is below the session number configured by the upper-layer user;
3. and the upper-layer user has not yet invalidated the session pool through client.Close() (that is, the session pool is still valid, or in other words the peer server is still valid);
4. then, with all three conditions met, the network sending goroutine performs the reconnect itself.
5. Once the new session is established and added to the client's session pool, the network sending goroutine has done its job and exits.

I call this approach lazy reconnect; in the last stage of its life, the network sending goroutine would be better called the network reconnect goroutine. With lazy reconnect, steps 3 and 4 of the reconnect flow above are folded into step 2, so the client no longer needs an extra goroutine that polls to maintain its session pool.

The overall lazy reconnect flow is shown in the flowchart above. If you are interested in the corresponding code path, please follow the link given in "Reference 13"; it is easy to work out from there.

The above is from Chapter 3 of "Go 语言网络库 getty 的那些事" ("Getty: The Story of a Go Network Library"). Please read through it first; if you think the mechanism is unreasonable, we can continue the discussion in this issue.
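Hello, thanks for the reply. Triggering the reconnect logic just before a session becomes invalid is a perfectly fine design, and it saves a goroutine. However, if the connection has a problem, the client will keep reconnecting. As I understand it, this behavior is unreasonable and puts pressure on both the client and the server. Could invalid sessions be distinguished more finely here, so that reconnection stops once the connection has a problem?
Of course. Could you think about which errors would let us decide unambiguously that reConnect should no longer be executed?
My wording in the reply above, that the connection hits an "error", was not quite accurate: at the transport layer it is hard to distinguish upper-layer application errors. What I meant is this: when the server actively closes the connection (the server sends a FIN), does that mean the client should stop reconnecting to maintain this connection pool?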
What happened:
When the dubbo provider reports a connection error, such as exceeding the upper limit of provider accepts, reConnect() will cause the client to continuously reconnect, causing the client's CPU and memory to continue to increase.
Here is the call chain:
What you expected to happen:
I think that when there is a problem with the TCP connection, maintenance of the connection pool should be temporarily stopped.
How to reproduce it (as minimally and precisely as possible):
In the dubbo settings, setting getty-session-param's connection-number to less than the provider's accepts causes this problem.
Anything else we need to know?: