Large number of failures in ab benchmark tests #383

Closed
keshonok opened this issue Dec 28, 2015 · 5 comments
Comments

@keshonok
Contributor

When Tempesta is run under benchmark tools such as Apache's ab utility, the result is a very large number of failures. All of those failures are non-2xx responses. Tempesta generates error responses on internal errors, but in this case the error in question is the 404 that is generated when a back end server is not available.

The issue closely correlates with how fast Tempesta restores connections to back end servers when those connections are closed. Current timeouts for re-establishing the connections with back end servers are too long to work well under high load. A different reconnect timeout algorithm is needed that would allow multiple reconnect attempts in a short time frame, and only after that would gradually increase the delay between attempts.
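As an illustration of the kind of reconnect schedule described above, here is a minimal Python sketch, not Tempesta's actual implementation. The function name `reconnect_delays` and all parameter names and default values are assumptions chosen for the example: a few attempts at a short fixed interval, followed by a geometrically growing delay with an upper bound.

```python
from itertools import islice


def reconnect_delays(fast_attempts=5, fast_delay_ms=10, factor=2, max_delay_ms=1000):
    """Yield the delay (in ms) to wait before each successive reconnect attempt.

    All names and default values here are hypothetical, for illustration only.
    """
    # Phase 1: several quick attempts at a short, fixed interval.
    for _ in range(fast_attempts):
        yield fast_delay_ms
    # Phase 2: grow the delay geometrically, never exceeding the cap.
    delay = fast_delay_ms
    while True:
        delay = min(delay * factor, max_delay_ms)
        yield delay


# First eight delays for a small configuration.
schedule = list(islice(reconnect_delays(3, 10, 2, 100), 8))
print(schedule)
```

The short first phase covers the common case under load, where the back end closes a connection but is immediately ready to accept a new one; the second phase avoids hammering a server that is really down.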

@krizhanovsky krizhanovsky added this to the 0.5.0 Web Server milestone Dec 28, 2015
@keshonok
Contributor Author

In addition to making Tempesta restore connections to back end servers faster, there are back end server configuration options that help to keep the connections active for longer periods of time. While it's not recommended to set these options to very high values for traditional HTTP server operation, in our case it's justified.

Nginx

There's no way to specify an unlimited number of requests. There's a recommendation from one of the Nginx developers.

Apache

keshonok added a commit that referenced this issue Jan 25, 2016
Use a smaller initial retry interval for faster reconnects. (#383)
@krizhanovsky
Contributor

Actually, we need to try the next server if the scheduled server is dead; see Nginx's proxy_next_upstream_tries. However, we shouldn't retry on all the servers: if a request crashes an upstream server, then the whole back end server farm would fail one by one.
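A rough Python sketch of bounded failover, under stated assumptions: `forward`, `send`, and `max_tries` are hypothetical names, and `send(server, request)` is assumed to raise `ConnectionError` when a server is unavailable. The point is only that the number of servers a single request may touch is capped.

```python
def forward(request, servers, send, max_tries=2):
    """Try at most `max_tries` servers in order; return the first response.

    `send` is a hypothetical callable that raises ConnectionError
    when the chosen server cannot be reached.
    Returns None if every attempted server failed.
    """
    tried = 0
    for server in servers:
        if tried >= max_tries:
            break  # stop early so one bad request cannot visit the whole farm
        tried += 1
        try:
            return send(server, request)
        except ConnectionError:
            continue  # this server is dead, fall through to the next one
    return None  # the caller is expected to generate an error response
```

With three servers and `max_tries=2`, a request that kills every server it reaches takes down at most two of them instead of all three.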

@krizhanovsky
Contributor

krizhanovsky commented Sep 27, 2016

There also must be one more new configuration option: a length limit and a timeout for the message queue. To avoid the bufferbloat problem, we have to evict requests that are too old from the queue head, sending a 504 error response to the client, and also send an error response to the client if the queue is full and none of the queued requests have timed out.
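The queue policy described above could be sketched as follows. This is an illustrative Python model, not the Tempesta code; the class name `ForwardQueue` and the parameters `max_len` and `timeout` are assumptions. Timestamps are passed in explicitly to keep the example deterministic.

```python
from collections import deque


class ForwardQueue:
    """Bounded FIFO of pending requests with a per-request age limit.

    Hypothetical model: `max_len` is the length limit, `timeout` the
    maximum time (in seconds) a request may wait in the queue.
    """

    def __init__(self, max_len, timeout):
        self.max_len = max_len
        self.timeout = timeout
        self.items = deque()  # entries are (enqueue_time, request)

    def evict_expired(self, now):
        """Pop requests older than `timeout` from the head.

        The caller would answer each returned request with a 504.
        """
        expired = []
        while self.items and now - self.items[0][0] > self.timeout:
            expired.append(self.items.popleft()[1])
        return expired

    def push(self, request, now):
        """Return (accepted, expired).

        `accepted` is False when the queue is still full after eviction,
        in which case the caller sends an error response for `request`.
        """
        expired = self.evict_expired(now)
        if len(self.items) >= self.max_len:
            return False, expired
        self.items.append((now, request))
        return True, expired
```

Because the queue is FIFO, the oldest request is always at the head, so eviction only ever needs to inspect the head entry.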

@krizhanovsky
Contributor

Note that Nginx provides the non_idempotent option for proxy_next_upstream, and we should do the same.
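For context, Nginx by default does not pass a request with a non-idempotent method (POST, LOCK, PATCH) to the next server once it has been sent to an upstream, and non_idempotent lifts that restriction. A hedged Python sketch of an equivalent check follows; `may_retry` and `retry_non_idempotent` are hypothetical names, not an existing Tempesta option.

```python
# Methods Nginx treats as non-idempotent for proxy_next_upstream.
NON_IDEMPOTENT_METHODS = {"POST", "LOCK", "PATCH"}


def may_retry(method, already_sent, retry_non_idempotent=False):
    """Decide whether a request may be re-sent to another server.

    `retry_non_idempotent` models an opt-in switch similar in spirit
    to Nginx's non_idempotent; the name is an assumption.
    """
    if not already_sent:
        return True  # nothing reached a back end yet, so a retry is safe
    if method.upper() in NON_IDEMPOTENT_METHODS:
        return retry_non_idempotent
    return True
```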

@keshonok
Contributor Author

keshonok commented Mar 3, 2017

Implemented in #660 (merge commit c40924b).
