-
Notifications
You must be signed in to change notification settings - Fork 1.1k
Set number of http and memcache connections to be equal #687
Conversation
- Use MaxIdleCons parameter in memcache library. Ensures that this many connections are available to use. - Set the number of http connections and memcache connections to be equal in the warmer. This is required because the number http fetches cannot exceed the number of memcache connections. If they did, it would return a connection error to memcache. - Create a separate memecache client for warming and for a user reading images from the registry. Fixes #675
4221195
to
387332c
Compare
We mitigated the problem we were having ourselves by giving memcache operations a generous timeout. The rationale is that we care less about doing these quickly than we care about succeeding at operations. If memcached is under load, it's better to treat that as backpressure, and slow down, than to repeatedly fail (though quickly). However: there is merit in this PR I think, since we can still try to make things run quickly when they can. It seems reasonable to me that letting the client keep a larger number of idle connections around, as the only user of the memcached in question, is an improvement. I'm less convinced that we understand how the fetching and storing perform (i.e., the argument that the request burst number and the max idle connections should be the same), or that this is the end of the story. So I think some more digging is warranted here. |
There are a few improvements in this PR.
So, I think the issues you were talking about are orthogonal to this PR. The issues you are talking about refer to the total timeout for getting a set of manifests for a single image. Currently we're at a 10s timeout. If it takes longer than that, all the other requests will get cancelled. Also see #713. |
I thought at first
Did we get evidence that that was the specific problem we're having? I remember seeing both connection timeouts and I/O timeouts in logs (more so the latter, when stressing it locally).
There's also the automator checking for new versions, though that's only once every minute (unless people decide they need it to be more often). I think it's a reasonable change to separate the memcache clients.
I don't think things are quite as absolute as you've described here: there are always memcached connections available, since the client will just create them. But I suppose we could expect less churn of connections if we usually have as many connections idle as there are operations to do.
Nope, I was definitely talking about memcache operations timing out. |
The evidence was that without the
The client will create them, but it creates them too slowly. Imagine you make 1000 HTTP requests. This causes 1000 new memcache connections. Memcache is very slow to create new connections. See documentation here: https://github.com/memcached/memcached/wiki/Performance#max-clients And there are other complaints around the internet. Their recommendation is:
This is what maxidlecons does.
My misunderstanding. The current default is 1s. This can be altered at the command line. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some requests for renaming /redescribing things in light of what MaxIdleConns
does. (Please also rewrite the commit message)
cmd/fluxd/main.go
Outdated
defaultMemcacheConnections = 10 | ||
// The number of connections chosen for memcache and remote GETs must match. | ||
// If they do not match then the remote either creates too many memcache connections | ||
// or not enough to be efficient. |
This comment was marked as abuse.
This comment was marked as abuse.
Sorry, something went wrong.
This comment was marked as abuse.
This comment was marked as abuse.
Sorry, something went wrong.
cmd/fluxd/main.go
Outdated
@@ -78,11 +79,11 @@ func main() { | |||
memcachedHostname = fs.String("memcached-hostname", "", "Hostname for memcached service to use when caching chunks. If empty, no memcached will be used.") | |||
memcachedTimeout = fs.Duration("memcached-timeout", time.Second, "Maximum time to wait before giving up on memcached requests.") | |||
memcachedService = fs.String("memcached-service", "memcached", "SRV service used to discover memcache servers.") | |||
memcachedConnections = fs.Int("memcached-connections", defaultMemcacheConnections, "maximum number of connections to memcache") | |||
memcachedConnections = fs.Int("memcached-connections", defaultMemcacheConnections, "maximum number of registry connections to memcache") |
This comment was marked as abuse.
This comment was marked as abuse.
Sorry, something went wrong.
This comment was marked as abuse.
This comment was marked as abuse.
Sorry, something went wrong.
cmd/fluxd/main.go
Outdated
Host: *memcachedHostname, | ||
Service: *memcachedService, | ||
Timeout: *memcachedTimeout, | ||
UpdateInterval: 1 * time.Minute, | ||
Logger: log.NewContext(logger).With("component", "memcached"), | ||
MaxConnections: *memcachedConnections, |
This comment was marked as abuse.
This comment was marked as abuse.
Sorry, something went wrong.
This comment was marked as abuse.
This comment was marked as abuse.
Sorry, something went wrong.
cmd/fluxd/main.go
Outdated
Timeout: *memcachedTimeout, | ||
UpdateInterval: 1 * time.Minute, | ||
Logger: log.NewContext(logger).With("component", "memcached"), | ||
MaxConnections: *registryBurst, |
This comment was marked as abuse.
This comment was marked as abuse.
Sorry, something went wrong.
This comment was marked as abuse.
This comment was marked as abuse.
Sorry, something went wrong.
Sure, but we're not doing 1000 HTTP requests, we're globally rate limiting them to about a hundred requests at a time. I do take your point about connection churn -- having more idle connections around is going to help with that. I just don't want to oversell it. Or rather, I don't want to be claiming that the memcache client was being limited to a number of connections -- that's not the case, and therefore not something that's being alleviated here. |
Sure thing. But the point stands. We were getting timeout errors because memcache couldn't create connections quickly enough. I'll fix the review comments. |
- Remove `memcached-connections` parameter - Reworded comments - Renamed variables
Fixes #675