
DNS resolution of memcached service fails #1591

Closed
dilshad18 opened this issue Dec 7, 2018 · 6 comments

@dilshad18

The following error is happening in a Flux installation inside a minikube instance:

component=memcached err="error updating memcache servers: lookup 172-17-0-3.flux-memcached.flux.svc.cluster.local. on 10.96.0.10:53: no such host"

This is a rather fresh installation and only a few files have been applied. We update a single file, Flux tries to apply that update, and it fails.

@squaremo
Member

It looks like it either can't reach the DNS server, or doesn't get a result back. The hostname there is a bit odd -- should it include the hyphenated IP address like that? I would expect just flux-memcached.flux.svc.cluster.local.

@squaremo changed the title from "Getting following error" to "DNS resolution of memcached service fails" on Dec 11, 2018
@johnraz
Contributor

johnraz commented Dec 13, 2018

For what it's worth, I experience the exact same behavior on minikube with Flux deployed via the Helm chart.

kubectl version
Client Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.3", GitCommit:"2bba0127d85d5a46ab4b778548be28623b32d0b0", GitTreeState:"clean", BuildDate:"2018-05-21T09:17:39Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.7", GitCommit:"0c38c362511b20a098d7cd855f1314dad92c2780", GitTreeState:"clean", BuildDate:"2018-08-20T09:56:31Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}

Flux component versions:

quay.io/weaveworks/flux:1.8.1
quay.io/weaveworks/helm-operator:0.5.1
memcached:1.4.25

All pods are green and running.

The logs:

ts=2018-12-13T16:16:24.981315793Z caller=memcached.go:153 component=memcached err="error updating memcache servers: lookup 3735393134393938.flux-memcached.flux.svc.cluster.local. on 10.96.0.10:53: no such host"
ts=2018-12-13T16:17:24.981598994Z caller=memcached.go:153 component=memcached err="error updating memcache servers: lookup 172-17-0-16.flux-memcached.flux.svc.cluster.local. on 10.96.0.10:53: no such host"

I'll try to read a bit more about how the service discovery works in memcache and see if I can come up with an explanation...

@johnraz
Contributor

johnraz commented Dec 13, 2018

I can reach the memcache container from the fluxd container by hitting the hostname passed to fluxd:

ps aux gives (truncated by me):

fluxd --ssh-keygen-dir=/var/fluxd/keygen --k8s-secret-name=flux-git-deploy --memcached-hostname=flux-memcached ...

Telnet session from the fluxd container gives (again truncated by me):

/home/flux # telnet flux-memcached 11211
stats
STAT pid 1
STAT uptime 63144
STAT time 1544719316
...

@johnraz
Contributor

johnraz commented Dec 13, 2018

Digging some more shows that the cache seems to be used:

From a memcache telnet session, dumping an item gives:

stats cachedump 32 0
ITEM registryrepov3|quay.io/weaveworks/helm-operator [98187 b; 1545931042 s]
END

So I would say the memcache server list is properly provisioned with the valid hostname and some "ghost" hostnames are trying to get in and are rejected because they can't resolve...

It most likely fails here:
https://github.com/weaveworks/flux/blob/113e1280a27a4cec80465d1d0d0c69b696839f80/registry/cache/memcached/memcached.go#L164

Or there:
https://github.com/bradfitz/gomemcache/blob/1952afaa557dc08e8e0d89eafab110fb501c1a2b/memcache/selector.go#L59-L90

How they get there is a mystery to me so far...

@squaremo do you have any clue how those weird records could get there?

Should we add a note to the FAQ to let people know that this is "ok" and doesn't break the cache?

@dilshad18 could you check in your own setup whether you have something in memcache? (If you are not used to memcache, I followed this blog post and it helped me get used to it.)

@squaremo
Member

So I would say the memcache server list is properly provisioned with the valid hostname and some "ghost" hostnames are trying to get in and are rejected because they can't resolve...

Yes, that sounds like a good diagnosis. Filling in some details, after looking in the Kubernetes docs re service discovery:

The way it's set up in the example deployment (and chart, and Weave Cloud config, ...) is that memcached has a headless service. According to https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/#services (also see https://github.com/kubernetes/dns/blob/master/docs/specification.md), in DNS there will be:

  • An A record for memcached.namespace.svc.cluster.local with the IP of each ready pod;
  • An A record for <pod-hostname>.memcached.namespace.svc.cluster.local for each pod, where <pod-hostname> is a generated name;
  • An SRV record, with the auto-generated hostname and the port, for each pod in the service.

With the arguments given in the example deployment of fluxd, it'll query for the SRV records, then the memcache client code (linked above) will query the IP of each host mentioned. So where it's failing is in that second bit -- it can't resolve some or all of the hosts it got from the SRV records.
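
A rough Go sketch of that two-step lookup (the service and port names are assumptions based on this thread; this is not the actual fluxd code):

package main

import (
	"fmt"
	"net"
)

func main() {
	// Step 1: SRV lookup against the headless service. The service/port
	// names here are assumptions; this resolves something like
	// _memcached._tcp.flux-memcached.flux.svc.cluster.local.
	_, srvs, err := net.LookupSRV("memcached", "tcp", "flux-memcached.flux.svc.cluster.local")
	if err != nil {
		fmt.Println("SRV lookup failed:", err)
		return
	}

	// Step 2: resolve each per-pod target from the SRV records to an IP.
	// An SRV record whose A record has already gone away fails here with
	// "no such host" -- the error seen in the logs above.
	for _, srv := range srvs {
		ips, err := net.LookupHost(srv.Target)
		if err != nil {
			fmt.Printf("lookup %s failed: %v\n", srv.Target, err)
			continue
		}
		fmt.Printf("%s:%d -> %v\n", srv.Target, srv.Port, ips)
	}
}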

But the question remains: how did those unresolvable endpoints get there in the first place? That I don't know :-(

@squaremo
Member

BTW it is entirely fine to give the memcached service a clusterIP (i.e., don't set it to None), and not supply the memcached-service argument -- this will make it just resolve the service address, rather than go through SRV records etc.
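
In that case the client only needs a plain A-record lookup of the service name, roughly (service name assumed from this thread; illustrative Go, not the actual client code):

package main

import (
	"fmt"
	"net"
)

func main() {
	// With a normal (non-headless) service there is a single A record for
	// the service name, pointing at the cluster IP; kube-proxy then load
	// balances to the memcached pods, so per-pod hostnames never come up.
	ips, err := net.LookupHost("flux-memcached.flux.svc.cluster.local")
	if err != nil {
		fmt.Println("lookup failed:", err)
		return
	}
	fmt.Println("flux-memcached resolves to:", ips)
}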
