
Add cluster.peers refresh #383

Merged (5 commits into thanos-io:master, Jun 26, 2018)

Conversation

@povilasv (Member) commented Jun 21, 2018

Changes

REF #372

Adds a job that runs periodically and refreshes the peer state (re-queries the DNS server to get peer IP addresses).

The issue is that if DNS has no records at startup, "islands" of single Thanos members form and DNS is never re-queried.

An example case is running Thanos in Kubernetes with liveness probes configured and restarting all Thanos pods at once. On startup no IPs are added to DNS because the probes have not passed yet, and once membership.Join() happens DNS is never refreshed again.
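For context, a minimal sketch of what such a periodic refresh loop can look like (the joiner interface and runRefresh name are assumptions for illustration, not the code in this PR):

package cluster

import (
	"context"
	"log"
	"net"
	"time"
)

// joiner is a minimal stand-in for the cluster peer: it knows the configured
// peer names (e.g. "thanos-peers.sys-prom:10900") and how to join addresses.
type joiner interface {
	KnownPeers() []string
	Join(addrs []string) (int, error)
}

// runRefresh periodically re-resolves the configured peer names and re-joins,
// so members that were missing from DNS at startup are eventually picked up.
func runRefresh(ctx context.Context, j joiner, interval time.Duration) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			var resolved []string
			for _, peer := range j.KnownPeers() {
				host, port, err := net.SplitHostPort(peer)
				if err != nil {
					continue
				}
				ips, err := net.LookupIP(host)
				if err != nil {
					continue // DNS may still be empty; try again on the next tick
				}
				for _, ip := range ips {
					resolved = append(resolved, net.JoinHostPort(ip.String(), port))
				}
			}
			if len(resolved) == 0 {
				continue
			}
			if _, err := j.Join(resolved); err != nil {
				log.Printf("cluster refresh failed: %v", err)
			}
		}
	}
}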

Verification

This log shows that the refresh actually works (before=2 after=4): the Thanos store gateways were added and the cluster formed. In this case I explicitly set the refresh to happen after 5 minutes to check that gossip does not heal itself on its own; it only healed after the refresh ran.

level=info ts=2018-06-21T13:10:11.071198311Z caller=query.go:243 component=query msg="Listening for StoreAPI gRPC" address=0.0.0.0:10901
level=debug ts=2018-06-21T13:10:11.073078374Z caller=delegate.go:82 component=cluster received=NotifyJoin node=01CGH7EWVH2R31D7GYC6D1VM4Y addr=10.2.1.30:10900
level=debug ts=2018-06-21T13:10:11.087558248Z caller=delegate.go:82 component=cluster received=NotifyJoin node=01CGH715BMW9QB4H0GSZ51D6EA addr=10.2.2.26:10900
level=debug ts=2018-06-21T13:10:11.087692504Z caller=delegate.go:82 component=cluster received=NotifyJoin node=01CGH7C6135494HE709G5S2ZS8 addr=10.2.6.59:10900
level=debug ts=2018-06-21T13:10:11.087728965Z caller=delegate.go:82 component=cluster received=NotifyJoin node=01CGH7EFGG7WP4MB5QXDGSD35H addr=10.2.1.29:10900
level=debug ts=2018-06-21T13:10:11.087767742Z caller=delegate.go:82 component=cluster received=NotifyJoin node=01CGH7BYMCDXTKM552H0WVH2YN addr=10.2.6.58:10900
level=debug ts=2018-06-21T13:10:11.087828991Z caller=delegate.go:82 component=cluster received=NotifyJoin node=01CGH71H60AMEMHJG1CFGF9RGF addr=10.2.2.27:10900
level=debug ts=2018-06-21T13:10:11.087857058Z caller=delegate.go:82 component=cluster received=NotifyJoin node=01CGH714RX1Z06QVCZF9R2ZEKT addr=10.2.1.27:10900
level=debug ts=2018-06-21T13:10:11.096549794Z caller=cluster.go:194 component=cluster msg="joined cluster" peerType=query
level=debug ts=2018-06-21T13:10:11.932515848Z caller=delegate.go:88 component=cluster received=NotifyLeave node=01CGH7C6135494HE709G5S2ZS8 addr=10.2.6.59:10900
level=debug ts=2018-06-21T13:10:11.932747085Z caller=delegate.go:88 component=cluster received=NotifyLeave node=01CGH7EFGG7WP4MB5QXDGSD35H addr=10.2.1.29:10900
level=debug ts=2018-06-21T13:10:11.932794649Z caller=delegate.go:82 component=cluster received=NotifyJoin node=01CGH7EXA0FJYDYFPP1F60AQNV addr=10.2.2.29:10900
level=debug ts=2018-06-21T13:10:11.932863761Z caller=delegate.go:88 component=cluster received=NotifyLeave node=01CGH714RX1Z06QVCZF9R2ZEKT addr=10.2.1.27:10900
level=debug ts=2018-06-21T13:10:11.932891322Z caller=delegate.go:88 component=cluster received=NotifyLeave node=01CGH7BYMCDXTKM552H0WVH2YN addr=10.2.6.58:10900
level=debug ts=2018-06-21T13:10:11.932923465Z caller=delegate.go:88 component=cluster received=NotifyLeave node=01CGH71H60AMEMHJG1CFGF9RGF addr=10.2.2.27:10900
level=debug ts=2018-06-21T13:10:12.074395071Z caller=delegate.go:88 component=cluster received=NotifyLeave node=01CGH715BMW9QB4H0GSZ51D6EA addr=10.2.2.26:10900
level=debug ts=2018-06-21T13:15:11.128426912Z caller=delegate.go:82 component=cluster received=NotifyJoin node=01CGH7H1VPWW2NVXWY9PBYGZPX addr=10.2.2.31:10900
level=debug ts=2018-06-21T13:15:11.129666993Z caller=delegate.go:82 component=cluster received=NotifyJoin node=01CGH7EZFRWTC4TG2NQBWVVHY1 addr=10.2.6.61:10900
level=debug ts=2018-06-21T13:15:11.129704941Z caller=delegate.go:82 component=cluster received=NotifyJoin node=01CGH7J1WFWKBV7PQ7NBEHAR8D addr=10.2.2.32:10900
level=debug ts=2018-06-21T13:15:11.132168008Z caller=delegate.go:82 component=cluster received=NotifyJoin node=01CGH7FAH9JGANEMS9DW6TTN91 addr=10.2.1.32:10900
level=debug ts=2018-06-21T13:15:11.132245256Z caller=cluster.go:249 component=cluster msg="refreshed cluster" before=2 after=4
level=info ts=2018-06-21T13:15:16.068759342Z caller=storeset.go:223 component=storeset msg="adding new store to query storeset" address=10.2.1.32:10901
level=info ts=2018-06-21T13:15:16.0687967Z caller=storeset.go:223 component=storeset msg="adding new store to query storeset" address=10.2.2.32:10901
level=info ts=2018-06-21T13:15:16.068810515Z caller=storeset.go:223 component=storeset msg="adding new store to query storeset" address=10.2.2.31:10901
level=info ts=2018-06-21T13:15:16.068820291Z caller=storeset.go:223 component=storeset msg="adding new store to query storeset" address=10.2.6.61:10901

@domgreen (Contributor)

How does this impact query? I think query currently updates its own members periodically based on the Source and Store labels ... does this mean there will be two places updating the known members for a node?
My two cents would be to look at consolidating the approach so that we can reuse it between all nodes that need to know the member list has been updated.
@Bplotka thoughts?

@povilasv (Member, Author)

@domgreen it does (https://github.com/improbable-eng/thanos/blob/master/cmd/thanos/query.go#L183): that code looks for members already connected via gossip and tries to open a gRPC connection to those nodes (at least as I understand it).

So once the DNS refresh happens, new members join via gossip; later the Thanos Query store refresh runs and opens gRPC connections to the newly added Thanos Store API components.
Which means it all starts to work :)

@bwplotka (Member) left a comment:

This change makes sense; I just do not immediately know whether it is safe to explicitly rejoin multiple times with the memberlist library. I need to double-check, or maybe you know the answer?

Another way of improving this is to set "waitIfEmpty" to true (or expose it as a flag) in cluster.New for components other than store alone. (: What do you think?

}

prev := p.mlist.NumMembers()
curr, err := p.mlist.Join(p.knownPeers)
bwplotka (Member):

It is not trivial here: basically you join a member that has already joined. I need to double-check the consequences, but I am somewhat worried that:

  • the peer is not closed before you create a new one (join);
  • even if the above is fine, it will still cause a bit of disruption.

Have you looked closely at what the behaviour of this is in memberlist?

povilasv (Member, Author):

@Bplotka I've looked through the code and it won't close the peer. But you are right about the disruption: Join calls pushPullNode (https://github.com/hashicorp/memberlist/blob/master/memberlist.go#L225), which is a rather expensive operation, per the docs:

pushPull is invoked periodically to randomly perform a complete state
exchange. Used to ensure a high level of convergence, but is also
reasonably expensive as the entire state of this node is exchanged
with the other node.

I'll change it so that we only join new members, by resolving DNS and looping through the current members.
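Roughly, that approach could look like the following sketch (the refresh helper name is an assumption and the Peer struct is trimmed to the one field the sketch needs; the actual diff is discussed in the review comments below):

package cluster

import (
	"net"
	"strconv"

	"github.com/hashicorp/memberlist"
)

// Peer is trimmed down to the single field this sketch needs.
type Peer struct {
	mlist *memberlist.Memberlist
}

// refresh joins only the resolved peers that are not already cluster members,
// avoiding the expensive push/pull state exchange with nodes we already know.
func (p *Peer) refresh(resolvedPeers []string) error {
	currMembers := map[string]struct{}{}
	for _, m := range p.mlist.Members() {
		addr := net.JoinHostPort(m.Addr.String(), strconv.Itoa(int(m.Port)))
		currMembers[addr] = struct{}{}
	}
	for _, peer := range resolvedPeers {
		if _, ok := currMembers[peer]; ok {
			continue // already connected, nothing to do
		}
		if _, err := p.mlist.Join([]string{peer}); err != nil {
			return err
		}
	}
	return nil
}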

@bwplotka (Member)

@domgreen So the issue is possible here: we join the gossip cluster based on knownPeers, which come from some DNS domain. Let's say all components start simultaneously. At startup, query resolves N components, but we could have N+10 components in the system that are already running; Kubernetes just has not caught up yet, so they are not resolvable. On the other hand, the non-resolvable components cannot see query either, because it is not resolvable yet itself (it will be in some time). It's a tricky thing :/

@povilasv force-pushed the peers-refresh branch 2 times, most recently from 42b6b7e to 4713645, on June 22, 2018 06:05
@povilasv (Member, Author)

@Bplotka here is an example log with these changes applied.
I've set the refresh interval to 3 minutes, so on startup the node appears to be alone; after the refresh, gossip forms and it starts behaving normally. On the next refresh run no membership.Join is called:

➜   k logs -f thanos-query-c85b76d9f-dbj47
level=info ts=2018-06-22T09:09:39.681387733Z caller=flags.go:53 msg="StoreAPI address that will be propagated through gossip" address=10.2.1.66:10901
level=info ts=2018-06-22T09:09:39.693989192Z caller=flags.go:68 msg="QueryAPI address that will be propagated through gossip" address=10.2.1.66:10902
level=debug ts=2018-06-22T09:09:39.703978957Z caller=cluster.go:132 component=cluster msg="resolved peers to following addresses" peers=10.2.1.63:10900,10.2.8.5:10900
level=info ts=2018-06-22T09:09:39.706408749Z caller=query.go:251 msg="starting query node"
level=info ts=2018-06-22T09:09:39.706561502Z caller=query.go:225 msg="Listening for query and metrics" address=0.0.0.0:10902
level=info ts=2018-06-22T09:09:39.709499013Z caller=query.go:243 component=query msg="Listening for StoreAPI gRPC" address=0.0.0.0:10901
level=debug ts=2018-06-22T09:09:39.713822518Z caller=delegate.go:82 component=cluster received=NotifyJoin node=01CGKC36QRP61SW52AQ8M8M3N4 addr=10.2.1.66:10900
level=debug ts=2018-06-22T09:09:39.722744928Z caller=delegate.go:82 component=cluster received=NotifyJoin node=01CGKBK3EMNVBFJ84WATDM920B addr=10.2.8.6:10900
level=debug ts=2018-06-22T09:09:39.72280357Z caller=delegate.go:82 component=cluster received=NotifyJoin node=01CGKBZD5BZ4FEKQ8646WG4BAN addr=10.2.1.64:10900
level=debug ts=2018-06-22T09:09:39.722829056Z caller=delegate.go:82 component=cluster received=NotifyJoin node=01CGKBZC0VT08RH0JN2ZDXB5DM addr=10.2.1.63:10900
level=debug ts=2018-06-22T09:09:39.722869932Z caller=delegate.go:82 component=cluster received=NotifyJoin node=01CGKBJX97A93A0EHT0RX21S0D addr=10.2.8.5:10900
level=debug ts=2018-06-22T09:09:39.722892899Z caller=delegate.go:82 component=cluster received=NotifyJoin node=01CGKC1G5RD82XB8XBGVVVVA7Y addr=10.2.6.85:10900
level=debug ts=2018-06-22T09:09:39.723118726Z caller=cluster.go:196 component=cluster msg="joined cluster" peerType=query
level=debug ts=2018-06-22T09:09:40.714673241Z caller=delegate.go:88 component=cluster received=NotifyLeave node=01CGKBK3EMNVBFJ84WATDM920B addr=10.2.8.6:10900
level=debug ts=2018-06-22T09:09:40.714731624Z caller=delegate.go:88 component=cluster received=NotifyLeave node=01CGKC1G5RD82XB8XBGVVVVA7Y addr=10.2.6.85:10900
level=debug ts=2018-06-22T09:09:40.714758104Z caller=delegate.go:88 component=cluster received=NotifyLeave node=01CGKBJX97A93A0EHT0RX21S0D addr=10.2.8.5:10900
level=debug ts=2018-06-22T09:09:40.714787163Z caller=delegate.go:88 component=cluster received=NotifyLeave node=01CGKBZD5BZ4FEKQ8646WG4BAN addr=10.2.1.64:10900
level=debug ts=2018-06-22T09:09:40.768824912Z caller=delegate.go:88 component=cluster received=NotifyLeave node=01CGKBZC0VT08RH0JN2ZDXB5DM addr=10.2.1.63:10900
level=warn ts=2018-06-22T09:09:49.723234701Z caller=cluster.go:296 component=cluster NumMembers=1 msg="I appear to be alone in the cluster"
level=warn ts=2018-06-22T09:09:59.723258953Z caller=cluster.go:296 component=cluster NumMembers=1 msg="I appear to be alone in the cluster"
level=warn ts=2018-06-22T09:10:09.723264424Z caller=cluster.go:296 component=cluster NumMembers=1 msg="I appear to be alone in the cluster"
level=warn ts=2018-06-22T09:10:19.723240848Z caller=cluster.go:296 component=cluster NumMembers=1 msg="I appear to be alone in the cluster"
level=warn ts=2018-06-22T09:10:29.723266288Z caller=cluster.go:296 component=cluster NumMembers=1 msg="I appear to be alone in the cluster"
level=warn ts=2018-06-22T09:10:39.723266412Z caller=cluster.go:296 component=cluster NumMembers=1 msg="I appear to be alone in the cluster"
level=warn ts=2018-06-22T09:10:49.723243504Z caller=cluster.go:296 component=cluster NumMembers=1 msg="I appear to be alone in the cluster"
level=warn ts=2018-06-22T09:10:59.72326009Z caller=cluster.go:296 component=cluster NumMembers=1 msg="I appear to be alone in the cluster"
level=warn ts=2018-06-22T09:11:09.723246232Z caller=cluster.go:296 component=cluster NumMembers=1 msg="I appear to be alone in the cluster"
level=warn ts=2018-06-22T09:11:19.723248675Z caller=cluster.go:296 component=cluster NumMembers=1 msg="I appear to be alone in the cluster"
level=warn ts=2018-06-22T09:11:29.723242498Z caller=cluster.go:296 component=cluster NumMembers=1 msg="I appear to be alone in the cluster"
level=warn ts=2018-06-22T09:11:39.723242063Z caller=cluster.go:296 component=cluster NumMembers=1 msg="I appear to be alone in the cluster"
level=debug ts=2018-06-22T09:11:39.726770252Z caller=delegate.go:82 component=cluster received=NotifyJoin node=01CGKC39QEH34SQNYZHJQVPRE7 addr=10.2.8.8:10900
level=debug ts=2018-06-22T09:11:39.726853485Z caller=cluster.go:271 component=cluster msg="refresh cluster peer joined" peer=10.2.8.8:10900 before=1 after=1
level=debug ts=2018-06-22T09:11:39.72846869Z caller=delegate.go:82 component=cluster received=NotifyJoin node=01CGKC39MK934CQN8B9QDNM65D addr=10.2.6.86:10900
level=debug ts=2018-06-22T09:11:39.728517753Z caller=cluster.go:271 component=cluster msg="refresh cluster peer joined" peer=10.2.6.86:10900 before=1 after=1
level=debug ts=2018-06-22T09:11:39.728538211Z caller=cluster.go:275 component=cluster msg="refresh cluster done" peers=thanos-peers.sys-prom:10900 resolvedPeers=10.2.8.8:10900,10.2.6.86:10900
level=info ts=2018-06-22T09:11:44.708529468Z caller=storeset.go:223 component=storeset msg="adding new store to query storeset" address=10.2.6.86:10901
level=info ts=2018-06-22T09:11:44.708581044Z caller=storeset.go:223 component=storeset msg="adding new store to query storeset" address=10.2.8.8:10901
level=debug ts=2018-06-22T09:11:46.714948936Z caller=delegate.go:82 component=cluster received=NotifyJoin node=01CGKC382CYK4Z4QRE1GM5XHNE addr=10.2.8.7:10900
level=debug ts=2018-06-22T09:11:50.665135963Z caller=delegate.go:88 component=cluster received=NotifyLeave node=01CGKC39QEH34SQNYZHJQVPRE7 addr=10.2.8.8:10900
level=info ts=2018-06-22T09:11:54.706695316Z caller=storeset.go:203 component=storeset msg="store unhealthy or does not exists, closed gRPC client and removed from storeset" address=10.2.8.8:10901
level=debug ts=2018-06-22T09:12:07.714762977Z caller=delegate.go:82 component=cluster received=NotifyJoin node=01CGKC395XF28MJWCEMH46SQM1 addr=10.2.1.67:10900
level=info ts=2018-06-22T09:12:14.707419168Z caller=storeset.go:223 component=storeset msg="adding new store to query storeset" address=10.2.1.67:10901
level=debug ts=2018-06-22T09:12:15.715074145Z caller=delegate.go:82 component=cluster received=NotifyJoin node=01CGKC475EEVV2VHEHSFDZCR1F addr=10.2.8.9:10900
level=info ts=2018-06-22T09:12:19.708183515Z caller=storeset.go:223 component=storeset msg="adding new store to query storeset" address=10.2.8.9:10901
level=debug ts=2018-06-22T09:12:44.714820952Z caller=delegate.go:82 component=cluster received=NotifyJoin node=01CGKC4ZKF49GNWFKE1JEC3WW0 addr=10.2.1.68:10900
level=info ts=2018-06-22T09:12:49.707618687Z caller=storeset.go:223 component=storeset msg="adding new store to query storeset" address=10.2.1.68:10901
level=debug ts=2018-06-22T09:13:39.725316973Z caller=cluster.go:275 component=cluster msg="refresh cluster done" peers=thanos-peers.sys-prom:10900 resolvedPeers=10.2.8.9:10900,10.2.1.67:10900,10.2.1.68:10900,10.2.6.86:10900
level=debug ts=2018-06-22T09:15:39.725314973Z caller=cluster.go:275 component=cluster msg="refresh cluster done" peers=thanos-peers.sys-prom:10900 resolvedPeers=10.2.1.67:10900,10.2.1.68:10900,10.2.6.86:10900,10.2.8.9:10900

@bwplotka (Member) left a comment:

Minor nit, but looks nice.

}

if !isPeerFound {
curr, err := p.mlist.Join([]string{peer})
bwplotka (Member):

Why not loop through resolvedPeers and currMembers to find the not-yet-connected addresses and just do p.mlist.Join(notConnected)? Might be easier to grasp. What do you think?

povilasv (Member, Author):

Yeah, makes sense.
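For illustration, the suggested simplification could replace the per-peer Join with a single call, roughly like this (a sketch reusing resolvedPeers and currMembers from the refresh sketch earlier in this conversation; not the merged code):

// Collect the not-yet-connected addresses first, then join them in one call,
// as the reviewer suggests above (variable names are assumptions).
var notConnected []string
for _, peer := range resolvedPeers {
	if _, ok := currMembers[peer]; !ok {
		notConnected = append(notConnected, peer)
	}
}
if len(notConnected) > 0 {
	if _, err := p.mlist.Join(notConnected); err != nil {
		return err
	}
}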

@@ -43,6 +45,7 @@ type Peer struct {
const (
DefaultPushPullInterval = 5 * time.Second
DefaultGossipInterval = 5 * time.Second
DefaultRefreshInterval = 30 * time.Second
bwplotka (Member):

Maybe a minute for it? Or something a bit longer? Not sure.

povilasv (Member, Author):

Yeah, a minute should be fine. We are acquiring a lock (https://github.com/improbable-eng/thanos/pull/383/files#diff-1a365fc2cc3058f4efcd7c1457c216c2R239), so less frequent runs make sense. And it shouldn't increase startup time, as it takes a while for every component to join the cluster anyway.

@brancz (Member) commented Jun 25, 2018

Somewhat related: there have been discussions upstream in Prometheus/Alertmanager about reusing the Prometheus service discovery in Alertmanager for discovering memberlist peers. If that happens, we should reuse the same piece here.

@bwplotka (Member)

Yea, similar flow, I am up for it @brancz 👍 Can you link me to the PRs when there are some?

@bwplotka merged commit fa26827 into thanos-io:master on Jun 26, 2018
@bwplotka (Member)

Thanks @povilasv !

zexuanhuang pushed a commit to zexuanhuang/thanos that referenced this pull request on Jun 27, 2018

* Add cluster.peers refresh
* Fix test
* Change to list current members and join only new ones
* Fixes after review
@povilasv deleted the peers-refresh branch on June 27, 2018 04:24
fpetkovski added a commit to fpetkovski/thanos that referenced this pull request Oct 31, 2024