Large number of failures in ab benchmark tests #383
Comments
In addition to making Tempesta restore connections to back end servers faster, there are back end server configuration options that help to keep the connections active for longer periods of time. While it's not recommended to set these options to very high values for traditional HTTP server operation, in our case it's justified. Nginx: there's no way to specify an unlimited number of requests per keep-alive connection (the keepalive_requests directive). There's a recommendation from one of Apache …
Use a smaller initial retry interval for faster reconnects. (#383)
Actually, we need to try the next server if the scheduled server is dead; see Nginx's proxy_next_upstream_tries. However, we shouldn't retry all the servers: if we have a request which crashes an upstream server, the whole back end server farm would fail one by one.
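A minimal sketch of that bounded retry idea, assuming hypothetical helper names (sched_next_srv(), srv_is_alive(), fwd_to_srv() and send_error_response() are illustrative, not Tempesta's actual API): forward the request to the scheduled server and, on failure, try at most a fixed number of other servers rather than the whole farm.

```c
#include <stdbool.h>

/* Illustrative sketch only; all names below are hypothetical, not Tempesta's API. */
struct request;
struct server;

struct server *sched_next_srv(struct request *req);
bool srv_is_alive(struct server *srv);
int fwd_to_srv(struct server *srv, struct request *req);
int send_error_response(struct request *req, int status);

#define MAX_FWD_TRIES	3	/* analogous to Nginx's proxy_next_upstream_tries */

int
forward_with_retries(struct request *req)
{
	for (int tries = 0; tries < MAX_FWD_TRIES; tries++) {
		struct server *srv = sched_next_srv(req);

		if (!srv)
			break;			/* no more servers to try */
		if (!srv_is_alive(srv))
			continue;		/* skip dead servers; still counts as a try */
		if (!fwd_to_srv(srv, req))
			return 0;		/* forwarded successfully */
		/*
		 * Forwarding failed: try the next scheduled server, but never
		 * walk the whole farm in case the request itself kills servers.
		 */
	}
	return send_error_response(req, 502);	/* all tries exhausted */
}
```

Capping the number of tries keeps one poisonous request from taking down every upstream in turn, which is exactly the failure mode the comment warns about.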
There also must be one more new configuration option: a length limit and a timeout for the message queue. To avoid a bufferbloat problem we have to evict requests that are too old from the head of the queue and send a 504 error response to the client, and likewise send an error response to the client if the queue is full even though none of the requests have timed out yet.
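A minimal sketch of that eviction policy, under the assumption of hypothetical names (the queue_* helpers and the QUEUE_MAX_LEN / QUEUE_TIMEOUT_MS knobs are illustrative, not existing Tempesta options): stale requests are dropped from the head with a 504, and a full queue rejects the newcomer.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative sketch; all names are hypothetical, not real Tempesta options. */
#define QUEUE_MAX_LEN		1000	/* proposed queue length limit option */
#define QUEUE_TIMEOUT_MS	5000	/* proposed queue timeout option */

struct request;

/* Assumed helpers for a FIFO of requests waiting for a back end connection. */
struct request *queue_head(void);
struct request *queue_pop_head(void);
size_t queue_len(void);
void queue_push_tail(struct request *req);
uint64_t req_enqueue_time_ms(struct request *req);
uint64_t now_ms(void);
void send_error_response(struct request *req, int status);

/*
 * Enqueue a request to be forwarded once a back end connection is available.
 * Requests that waited too long are evicted from the head with a 504 so the
 * queue can't turn into a bufferbloat-style backlog; a full queue rejects
 * the new request instead.
 */
void
queue_request(struct request *req)
{
	/* Evict requests that have waited longer than the queue timeout. */
	while (queue_len()) {
		struct request *old = queue_head();

		if (now_ms() - req_enqueue_time_ms(old) < QUEUE_TIMEOUT_MS)
			break;
		send_error_response(queue_pop_head(), 504);
	}

	/* Queue is still full and nothing is stale: reject the new request.
	 * The status code here is a placeholder; the comment above only says
	 * "error response". */
	if (queue_len() >= QUEUE_MAX_LEN) {
		send_error_response(req, 503);
		return;
	}

	queue_push_tail(req);
}
```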
Note that Nginx provides …
When running Tempesta under benchmark tests such as Apache's ab utility, the result is a very large number of failures. All of those failures are non-2xx responses. Tempesta generates error responses on internal errors, but in this case the error in question is a 404 that is generated when a back end server is not available.

The issue closely correlates with how fast Tempesta restores connections to back end servers when those connections are closed. The current timeouts for re-establishing connections with back end servers are too long to work well under high load. A different reconnect timeout algorithm is needed: one that allows multiple reconnect attempts in a short time frame, and only after that gradually increases the delay between attempts.
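A minimal sketch of such a reconnect schedule, with hypothetical names and constants (next_reconnect_delay_ms() and the concrete intervals are assumptions chosen for the example, not the algorithm Tempesta actually adopted): the first few attempts happen almost immediately, after which the delay grows exponentially up to a cap.

```c
/*
 * Illustrative sketch; the function name and the concrete intervals are
 * assumptions for the example, not Tempesta's actual implementation.
 */
#define RECONN_FAST_TRIES	3	/* attempts made with (almost) no delay */
#define RECONN_BASE_MS		100	/* delay before the first slow attempt */
#define RECONN_MAX_MS		10000	/* cap on the delay between attempts */

/* Delay before reconnect attempt number @attempt (0-based) to a back end. */
static unsigned long
next_reconnect_delay_ms(unsigned int attempt)
{
	unsigned int shift;

	/* Retry several times in a short time frame right after the failure. */
	if (attempt < RECONN_FAST_TRIES)
		return 0;

	/* Then back off exponentially: 100ms, 200ms, 400ms, ... up to the cap. */
	shift = attempt - RECONN_FAST_TRIES;
	if (shift >= 7 || (RECONN_BASE_MS << shift) > RECONN_MAX_MS)
		return RECONN_MAX_MS;
	return RECONN_BASE_MS << shift;
}
```

The connection code would feed its per-server attempt counter into this function when scheduling the next connect attempt, and reset the counter once a connection is successfully established.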