
Add cluster.peers refresh #383

Merged (5 commits into thanos-io:master, Jun 26, 2018)

Conversation

@povilasv (Member) commented Jun 21, 2018

Changes

REF #372

Adds a job that runs periodically and refreshes the peer state (re-queries the DNS server to get peer IP addresses).

The issue is that if DNS has no records at startup, "islands" of single Thanos members form and DNS is never re-queried.

An example case is running Thanos in Kubernetes with liveness probes configured and restarting all Thanos pods at once. On startup no IPs are added to DNS because the probes have not passed yet, and once membership.Join() happens DNS is never refreshed again.
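For context, a minimal sketch of what such a periodic refresh loop can look like (the joiner interface and runRefresh name are assumptions for illustration, not the code in this PR):

package cluster

import (
	"context"
	"log"
	"net"
	"time"
)

// joiner is a minimal stand-in for the cluster peer: it knows the configured
// peer names (e.g. "thanos-peers.sys-prom:10900") and how to join addresses.
type joiner interface {
	KnownPeers() []string
	Join(addrs []string) (int, error)
}

// runRefresh periodically re-resolves the configured peer names and re-joins,
// so members that were missing from DNS at startup are eventually picked up.
func runRefresh(ctx context.Context, j joiner, interval time.Duration) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			var resolved []string
			for _, peer := range j.KnownPeers() {
				host, port, err := net.SplitHostPort(peer)
				if err != nil {
					continue
				}
				ips, err := net.LookupIP(host)
				if err != nil {
					continue // DNS may still be empty; try again on the next tick
				}
				for _, ip := range ips {
					resolved = append(resolved, net.JoinHostPort(ip.String(), port))
				}
			}
			if len(resolved) == 0 {
				continue
			}
			if _, err := j.Join(resolved); err != nil {
				log.Printf("cluster refresh failed: %v", err)
			}
		}
	}
}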

Verification

This log shows that the refresh actually works (before=2 after=4): the Thanos store gateways were added and the cluster formed. In this case I explicitly set the refresh to happen after 5 minutes to check that gossip does not heal itself on its own; it only healed after the refresh ran.

level=info ts=2018-06-21T13:10:11.071198311Z caller=query.go:243 component=query msg="Listening for StoreAPI gRPC" address=0.0.0.0:10901
level=debug ts=2018-06-21T13:10:11.073078374Z caller=delegate.go:82 component=cluster received=NotifyJoin node=01CGH7EWVH2R31D7GYC6D1VM4Y addr=10.2.1.30:10900
level=debug ts=2018-06-21T13:10:11.087558248Z caller=delegate.go:82 component=cluster received=NotifyJoin node=01CGH715BMW9QB4H0GSZ51D6EA addr=10.2.2.26:10900
level=debug ts=2018-06-21T13:10:11.087692504Z caller=delegate.go:82 component=cluster received=NotifyJoin node=01CGH7C6135494HE709G5S2ZS8 addr=10.2.6.59:10900
level=debug ts=2018-06-21T13:10:11.087728965Z caller=delegate.go:82 component=cluster received=NotifyJoin node=01CGH7EFGG7WP4MB5QXDGSD35H addr=10.2.1.29:10900
level=debug ts=2018-06-21T13:10:11.087767742Z caller=delegate.go:82 component=cluster received=NotifyJoin node=01CGH7BYMCDXTKM552H0WVH2YN addr=10.2.6.58:10900
level=debug ts=2018-06-21T13:10:11.087828991Z caller=delegate.go:82 component=cluster received=NotifyJoin node=01CGH71H60AMEMHJG1CFGF9RGF addr=10.2.2.27:10900
level=debug ts=2018-06-21T13:10:11.087857058Z caller=delegate.go:82 component=cluster received=NotifyJoin node=01CGH714RX1Z06QVCZF9R2ZEKT addr=10.2.1.27:10900
level=debug ts=2018-06-21T13:10:11.096549794Z caller=cluster.go:194 component=cluster msg="joined cluster" peerType=query
level=debug ts=2018-06-21T13:10:11.932515848Z caller=delegate.go:88 component=cluster received=NotifyLeave node=01CGH7C6135494HE709G5S2ZS8 addr=10.2.6.59:10900
level=debug ts=2018-06-21T13:10:11.932747085Z caller=delegate.go:88 component=cluster received=NotifyLeave node=01CGH7EFGG7WP4MB5QXDGSD35H addr=10.2.1.29:10900
level=debug ts=2018-06-21T13:10:11.932794649Z caller=delegate.go:82 component=cluster received=NotifyJoin node=01CGH7EXA0FJYDYFPP1F60AQNV addr=10.2.2.29:10900
level=debug ts=2018-06-21T13:10:11.932863761Z caller=delegate.go:88 component=cluster received=NotifyLeave node=01CGH714RX1Z06QVCZF9R2ZEKT addr=10.2.1.27:10900
level=debug ts=2018-06-21T13:10:11.932891322Z caller=delegate.go:88 component=cluster received=NotifyLeave node=01CGH7BYMCDXTKM552H0WVH2YN addr=10.2.6.58:10900
level=debug ts=2018-06-21T13:10:11.932923465Z caller=delegate.go:88 component=cluster received=NotifyLeave node=01CGH71H60AMEMHJG1CFGF9RGF addr=10.2.2.27:10900
level=debug ts=2018-06-21T13:10:12.074395071Z caller=delegate.go:88 component=cluster received=NotifyLeave node=01CGH715BMW9QB4H0GSZ51D6EA addr=10.2.2.26:10900
level=debug ts=2018-06-21T13:15:11.128426912Z caller=delegate.go:82 component=cluster received=NotifyJoin node=01CGH7H1VPWW2NVXWY9PBYGZPX addr=10.2.2.31:10900
level=debug ts=2018-06-21T13:15:11.129666993Z caller=delegate.go:82 component=cluster received=NotifyJoin node=01CGH7EZFRWTC4TG2NQBWVVHY1 addr=10.2.6.61:10900
level=debug ts=2018-06-21T13:15:11.129704941Z caller=delegate.go:82 component=cluster received=NotifyJoin node=01CGH7J1WFWKBV7PQ7NBEHAR8D addr=10.2.2.32:10900
level=debug ts=2018-06-21T13:15:11.132168008Z caller=delegate.go:82 component=cluster received=NotifyJoin node=01CGH7FAH9JGANEMS9DW6TTN91 addr=10.2.1.32:10900
level=debug ts=2018-06-21T13:15:11.132245256Z caller=cluster.go:249 component=cluster msg="refreshed cluster" before=2 after=4
level=info ts=2018-06-21T13:15:16.068759342Z caller=storeset.go:223 component=storeset msg="adding new store to query storeset" address=10.2.1.32:10901
level=info ts=2018-06-21T13:15:16.0687967Z caller=storeset.go:223 component=storeset msg="adding new store to query storeset" address=10.2.2.32:10901
level=info ts=2018-06-21T13:15:16.068810515Z caller=storeset.go:223 component=storeset msg="adding new store to query storeset" address=10.2.2.31:10901
level=info ts=2018-06-21T13:15:16.068820291Z caller=storeset.go:223 component=storeset msg="adding new store to query storeset" address=10.2.6.61:10901

@domgreen (Contributor)

How does this impact query? I think query currently updates its own members periodically based on the Source and Store labels ... does this mean there will be two places updating the known members for a node?
My two cents would be to look at consolidating the approach so that we can reuse it between all nodes that need to know the member list has been updated.
@Bplotka thoughts?

@povilasv (Member, Author)

@domgreen it does (https://github.com/improbable-eng/thanos/blob/master/cmd/thanos/query.go#L183): that code looks for members already connected via gossip and tries to open a gRPC connection to those nodes (at least as I understand it).

So once the DNS refresh happens, new members join via gossip; later the Thanos Query store refresh runs and opens gRPC connections to the newly added Thanos Store API components.
Which means it all starts to work :)

@bwplotka (Member) left a comment:

This change makes sense; I just do not immediately know whether it is safe to explicitly rejoin multiple times with the memberlist library. I need to double-check, or maybe you know the answer?

Another way of improving this is to set "waitIfEmpty" to true (or expose it as a flag) in cluster.New for components other than store alone. (: What do you think?

}

prev := p.mlist.NumMembers()
curr, err := p.mlist.Join(p.knownPeers)
bwplotka (Member):

It is not trivial here: basically you join a member that has already joined. I need to double-check the consequences, but I am somewhat worried that:

  • the peer is not closed before you create a new one (join);
  • even if the above is fine, it will still cause a bit of disruption.

Have you looked closely at what the behaviour of this is in memberlist?

povilasv (Member, Author):

@Bplotka I've looked through the code and it won't close the peer. But you are right about the disruption: Join calls pushPullNode (https://github.com/hashicorp/memberlist/blob/master/memberlist.go#L225), which is a rather expensive operation, per the docs:

pushPull is invoked periodically to randomly perform a complete state
exchange. Used to ensure a high level of convergence, but is also
reasonably expensive as the entire state of this node is exchanged
with the other node.

I'll change it so that we only join new members, by resolving DNS and looping through the current members.
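Roughly, that approach could look like the following sketch (the refresh helper name is an assumption and the Peer struct is trimmed to the one field the sketch needs; the actual diff is discussed in the review comments below):

package cluster

import (
	"net"
	"strconv"

	"github.com/hashicorp/memberlist"
)

// Peer is trimmed down to the single field this sketch needs.
type Peer struct {
	mlist *memberlist.Memberlist
}

// refresh joins only the resolved peers that are not already cluster members,
// avoiding the expensive push/pull state exchange with nodes we already know.
func (p *Peer) refresh(resolvedPeers []string) error {
	currMembers := map[string]struct{}{}
	for _, m := range p.mlist.Members() {
		addr := net.JoinHostPort(m.Addr.String(), strconv.Itoa(int(m.Port)))
		currMembers[addr] = struct{}{}
	}
	for _, peer := range resolvedPeers {
		if _, ok := currMembers[peer]; ok {
			continue // already connected, nothing to do
		}
		if _, err := p.mlist.Join([]string{peer}); err != nil {
			return err
		}
	}
	return nil
}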

@bwplotka (Member)

@domgreen So the issue is possible here: we join the gossip cluster based on knownPeers, which come from some DNS domain. Let's say all components start simultaneously. At startup, query resolves N components, but we could have N+10 components in the system that are already running; Kubernetes just has not caught up yet, so they are not resolvable. On the other hand, the non-resolvable components cannot see query either, because it is not resolvable yet itself (it will be in some time). It's a tricky thing :/

@povilasv force-pushed the peers-refresh branch 2 times, most recently from 42b6b7e to 4713645, on June 22, 2018 06:05
@povilasv (Member, Author)

@Bplotka here is an example log with these changes applied.
I've set the refresh interval to 3 minutes, so on startup the node appears to be alone; after the refresh, gossip forms and it starts behaving normally. On the next refresh run no membership.Join is called:

➜   k logs -f thanos-query-c85b76d9f-dbj47
level=info ts=2018-06-22T09:09:39.681387733Z caller=flags.go:53 msg="StoreAPI address that will be propagated through gossip" address=10.2.1.66:10901
level=info ts=2018-06-22T09:09:39.693989192Z caller=flags.go:68 msg="QueryAPI address that will be propagated through gossip" address=10.2.1.66:10902
level=debug ts=2018-06-22T09:09:39.703978957Z caller=cluster.go:132 component=cluster msg="resolved peers to following addresses" peers=10.2.1.63:10900,10.2.8.5:10900
level=info ts=2018-06-22T09:09:39.706408749Z caller=query.go:251 msg="starting query node"
level=info ts=2018-06-22T09:09:39.706561502Z caller=query.go:225 msg="Listening for query and metrics" address=0.0.0.0:10902
level=info ts=2018-06-22T09:09:39.709499013Z caller=query.go:243 component=query msg="Listening for StoreAPI gRPC" address=0.0.0.0:10901
level=debug ts=2018-06-22T09:09:39.713822518Z caller=delegate.go:82 component=cluster received=NotifyJoin node=01CGKC36QRP61SW52AQ8M8M3N4 addr=10.2.1.66:10900
level=debug ts=2018-06-22T09:09:39.722744928Z caller=delegate.go:82 component=cluster received=NotifyJoin node=01CGKBK3EMNVBFJ84WATDM920B addr=10.2.8.6:10900
level=debug ts=2018-06-22T09:09:39.72280357Z caller=delegate.go:82 component=cluster received=NotifyJoin node=01CGKBZD5BZ4FEKQ8646WG4BAN addr=10.2.1.64:10900
level=debug ts=2018-06-22T09:09:39.722829056Z caller=delegate.go:82 component=cluster received=NotifyJoin node=01CGKBZC0VT08RH0JN2ZDXB5DM addr=10.2.1.63:10900
level=debug ts=2018-06-22T09:09:39.722869932Z caller=delegate.go:82 component=cluster received=NotifyJoin node=01CGKBJX97A93A0EHT0RX21S0D addr=10.2.8.5:10900
level=debug ts=2018-06-22T09:09:39.722892899Z caller=delegate.go:82 component=cluster received=NotifyJoin node=01CGKC1G5RD82XB8XBGVVVVA7Y addr=10.2.6.85:10900
level=debug ts=2018-06-22T09:09:39.723118726Z caller=cluster.go:196 component=cluster msg="joined cluster" peerType=query
level=debug ts=2018-06-22T09:09:40.714673241Z caller=delegate.go:88 component=cluster received=NotifyLeave node=01CGKBK3EMNVBFJ84WATDM920B addr=10.2.8.6:10900
level=debug ts=2018-06-22T09:09:40.714731624Z caller=delegate.go:88 component=cluster received=NotifyLeave node=01CGKC1G5RD82XB8XBGVVVVA7Y addr=10.2.6.85:10900
level=debug ts=2018-06-22T09:09:40.714758104Z caller=delegate.go:88 component=cluster received=NotifyLeave node=01CGKBJX97A93A0EHT0RX21S0D addr=10.2.8.5:10900
level=debug ts=2018-06-22T09:09:40.714787163Z caller=delegate.go:88 component=cluster received=NotifyLeave node=01CGKBZD5BZ4FEKQ8646WG4BAN addr=10.2.1.64:10900
level=debug ts=2018-06-22T09:09:40.768824912Z caller=delegate.go:88 component=cluster received=NotifyLeave node=01CGKBZC0VT08RH0JN2ZDXB5DM addr=10.2.1.63:10900
level=warn ts=2018-06-22T09:09:49.723234701Z caller=cluster.go:296 component=cluster NumMembers=1 msg="I appear to be alone in the cluster"
level=warn ts=2018-06-22T09:09:59.723258953Z caller=cluster.go:296 component=cluster NumMembers=1 msg="I appear to be alone in the cluster"
level=warn ts=2018-06-22T09:10:09.723264424Z caller=cluster.go:296 component=cluster NumMembers=1 msg="I appear to be alone in the cluster"
level=warn ts=2018-06-22T09:10:19.723240848Z caller=cluster.go:296 component=cluster NumMembers=1 msg="I appear to be alone in the cluster"
level=warn ts=2018-06-22T09:10:29.723266288Z caller=cluster.go:296 component=cluster NumMembers=1 msg="I appear to be alone in the cluster"
level=warn ts=2018-06-22T09:10:39.723266412Z caller=cluster.go:296 component=cluster NumMembers=1 msg="I appear to be alone in the cluster"
level=warn ts=2018-06-22T09:10:49.723243504Z caller=cluster.go:296 component=cluster NumMembers=1 msg="I appear to be alone in the cluster"
level=warn ts=2018-06-22T09:10:59.72326009Z caller=cluster.go:296 component=cluster NumMembers=1 msg="I appear to be alone in the cluster"
level=warn ts=2018-06-22T09:11:09.723246232Z caller=cluster.go:296 component=cluster NumMembers=1 msg="I appear to be alone in the cluster"
level=warn ts=2018-06-22T09:11:19.723248675Z caller=cluster.go:296 component=cluster NumMembers=1 msg="I appear to be alone in the cluster"
level=warn ts=2018-06-22T09:11:29.723242498Z caller=cluster.go:296 component=cluster NumMembers=1 msg="I appear to be alone in the cluster"
level=warn ts=2018-06-22T09:11:39.723242063Z caller=cluster.go:296 component=cluster NumMembers=1 msg="I appear to be alone in the cluster"
level=debug ts=2018-06-22T09:11:39.726770252Z caller=delegate.go:82 component=cluster received=NotifyJoin node=01CGKC39QEH34SQNYZHJQVPRE7 addr=10.2.8.8:10900
level=debug ts=2018-06-22T09:11:39.726853485Z caller=cluster.go:271 component=cluster msg="refresh cluster peer joined" peer=10.2.8.8:10900 before=1 after=1
level=debug ts=2018-06-22T09:11:39.72846869Z caller=delegate.go:82 component=cluster received=NotifyJoin node=01CGKC39MK934CQN8B9QDNM65D addr=10.2.6.86:10900
level=debug ts=2018-06-22T09:11:39.728517753Z caller=cluster.go:271 component=cluster msg="refresh cluster peer joined" peer=10.2.6.86:10900 before=1 after=1
level=debug ts=2018-06-22T09:11:39.728538211Z caller=cluster.go:275 component=cluster msg="refresh cluster done" peers=thanos-peers.sys-prom:10900 resolvedPeers=10.2.8.8:10900,10.2.6.86:10900
level=info ts=2018-06-22T09:11:44.708529468Z caller=storeset.go:223 component=storeset msg="adding new store to query storeset" address=10.2.6.86:10901
level=info ts=2018-06-22T09:11:44.708581044Z caller=storeset.go:223 component=storeset msg="adding new store to query storeset" address=10.2.8.8:10901
level=debug ts=2018-06-22T09:11:46.714948936Z caller=delegate.go:82 component=cluster received=NotifyJoin node=01CGKC382CYK4Z4QRE1GM5XHNE addr=10.2.8.7:10900
level=debug ts=2018-06-22T09:11:50.665135963Z caller=delegate.go:88 component=cluster received=NotifyLeave node=01CGKC39QEH34SQNYZHJQVPRE7 addr=10.2.8.8:10900
level=info ts=2018-06-22T09:11:54.706695316Z caller=storeset.go:203 component=storeset msg="store unhealthy or does not exists, closed gRPC client and removed from storeset" address=10.2.8.8:10901
level=debug ts=2018-06-22T09:12:07.714762977Z caller=delegate.go:82 component=cluster received=NotifyJoin node=01CGKC395XF28MJWCEMH46SQM1 addr=10.2.1.67:10900
level=info ts=2018-06-22T09:12:14.707419168Z caller=storeset.go:223 component=storeset msg="adding new store to query storeset" address=10.2.1.67:10901
level=debug ts=2018-06-22T09:12:15.715074145Z caller=delegate.go:82 component=cluster received=NotifyJoin node=01CGKC475EEVV2VHEHSFDZCR1F addr=10.2.8.9:10900
level=info ts=2018-06-22T09:12:19.708183515Z caller=storeset.go:223 component=storeset msg="adding new store to query storeset" address=10.2.8.9:10901
level=debug ts=2018-06-22T09:12:44.714820952Z caller=delegate.go:82 component=cluster received=NotifyJoin node=01CGKC4ZKF49GNWFKE1JEC3WW0 addr=10.2.1.68:10900
level=info ts=2018-06-22T09:12:49.707618687Z caller=storeset.go:223 component=storeset msg="adding new store to query storeset" address=10.2.1.68:10901
level=debug ts=2018-06-22T09:13:39.725316973Z caller=cluster.go:275 component=cluster msg="refresh cluster done" peers=thanos-peers.sys-prom:10900 resolvedPeers=10.2.8.9:10900,10.2.1.67:10900,10.2.1.68:10900,10.2.6.86:10900
level=debug ts=2018-06-22T09:15:39.725314973Z caller=cluster.go:275 component=cluster msg="refresh cluster done" peers=thanos-peers.sys-prom:10900 resolvedPeers=10.2.1.67:10900,10.2.1.68:10900,10.2.6.86:10900,10.2.8.9:10900

@bwplotka (Member) left a comment:

Minor nit, but looks nice.

}

if !isPeerFound {
curr, err := p.mlist.Join([]string{peer})
bwplotka (Member):

Why not loop through resolvedPeers and currMembers to find the not-yet-connected addresses and just do p.mlist.Join(notConnected)? Might be easier to grasp. What do you think?

povilasv (Member, Author):

Yeah, makes sense.
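For illustration, the suggested simplification could replace the per-peer Join with a single call, roughly like this (a sketch reusing resolvedPeers and currMembers from the refresh sketch earlier in this conversation; not the merged code):

// Collect the not-yet-connected addresses first, then join them in one call,
// as the reviewer suggests above (variable names are assumptions).
var notConnected []string
for _, peer := range resolvedPeers {
	if _, ok := currMembers[peer]; !ok {
		notConnected = append(notConnected, peer)
	}
}
if len(notConnected) > 0 {
	if _, err := p.mlist.Join(notConnected); err != nil {
		return err
	}
}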

@@ -43,6 +45,7 @@ type Peer struct {
const (
DefaultPushPullInterval = 5 * time.Second
DefaultGossipInterval = 5 * time.Second
DefaultRefreshInterval = 30 * time.Second
bwplotka (Member):

Maybe a minute for it? Or something a bit longer? Not sure.

povilasv (Member, Author):

Yeah, a minute should be fine. We are acquiring a lock (https://github.com/improbable-eng/thanos/pull/383/files#diff-1a365fc2cc3058f4efcd7c1457c216c2R239), so less frequent runs make sense. And it shouldn't increase startup time, as it takes a while for every component to join the cluster anyway.

@brancz (Member) commented Jun 25, 2018

Somewhat related: there have been discussions upstream in Prometheus/Alertmanager about reusing the Prometheus service discovery in Alertmanager for discovering memberlist peers. If that happens, we should reuse the same piece here.

@bwplotka (Member)

Yea, similar flow, I am up for it @brancz 👍 Can you link me to the PRs when there are some?

@bwplotka merged commit fa26827 into thanos-io:master on Jun 26, 2018
@bwplotka (Member)

Thanks @povilasv !

zexuanhuang pushed a commit to zexuanhuang/thanos that referenced this pull request on Jun 27, 2018

* Add cluster.peers refresh
* Fix test
* Change to list current members and join only new ones
* Fixes after review
@povilasv deleted the peers-refresh branch on June 27, 2018 04:24
fpetkovski added a commit to fpetkovski/thanos that referenced this pull request Oct 31, 2024