sidecar: clustering does not join properly - race condition ? #373

Closed
cwolfinger opened this issue Jun 11, 2018 · 1 comment

@cwolfinger

thanos, version v0.0.1 (branch: HEAD, revision: 2c63665)
build user: root@0af42dc4266a
build date: 20180602-20:00:24
go version: go1.10.2

What happened
Started a single Thanos sidecar and two query nodes. The cluster did not set up until the Thanos sidecar was restarted (a retry sketch follows the first log below).

level=info ts=2018-06-11T17:05:54.3869957Z caller=flags.go:51 msg="StoreAPI address that will be propagated through gossip" address=10.1.2.84:10901
level=debug ts=2018-06-11T17:05:54.5421866Z caller=cluster.go:128 component=cluster msg="resolved peers to following addresses" peers=thanos-peers-hi-res.default.svc.cluster.local:10900
level=info ts=2018-06-11T17:05:54.5480826Z caller=sidecar.go:232 msg="No GCS or S3 bucket was configured, uploads will be disabled"
level=info ts=2018-06-11T17:05:54.5482496Z caller=sidecar.go:269 msg="starting sidecar" peer=
level=info ts=2018-06-11T17:05:54.5507041Z caller=main.go:226 msg="Listening for metrics" address=0.0.0.0:10902
level=info ts=2018-06-11T17:05:54.5521783Z caller=sidecar.go:214 component=store msg="Listening for StoreAPI gRPC" address=0.0.0.0:10901
level=info ts=2018-06-11T17:05:54.5506985Z caller=reloader.go:77 component=reloader msg="started watching config file for changes" in=/etc/prometheus/prometheus.yml.tmpl out=/etc/prometheus-shared/prometheus.yml
level=error ts=2018-06-11T17:05:54.5620611Z caller=runutil.go:43 component=reloader msg="function failed. Retrying in next tick" err="trigger reload: reload request failed: Post http://127.0.0.1:9090/-/reload: dial tcp 127.0.0.1:9090: connect: connection refused"
level=warn ts=2018-06-11T17:05:54.5665254Z caller=sidecar.go:130 msg="failed to fetch initial external labels. Is Prometheus running? Retrying" err="request config against http://127.0.0.1:9090/api/v1/status/config: Get http://127.0.0.1:9090/api/v1/status/config: dial tcp 127.0.0.1:9090: connect: connection refused"
level=debug ts=2018-06-11T17:05:56.6062056Z caller=delegate.go:82 component=cluster received=NotifyJoin node=01CFQWZAVFMYR6VJXYQD9A44C7 addr=10.1.2.84:10900
level=debug ts=2018-06-11T17:05:56.6310602Z caller=cluster.go:190 component=cluster msg="joined cluster" peers=0 peerType=source
level=info ts=2018-06-11T17:05:59.59841Z caller=reloader.go:188 component=reloader msg="Prometheus reload triggered" cfg_in=/etc/prometheus/prometheus.yml.tmpl cfg_out=/etc/prometheus-shared/prometheus.yml rule_dir=
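
In the first run above, the sidecar "resolved" only the headless Service name itself (thanos-peers-hi-res.default.svc.cluster.local:10900) rather than any peer IPs, and then joined the gossip cluster with peers=0. The following is a minimal sketch of one possible mitigation, not the actual Thanos code (resolvePeersWithRetry and the addresses are hypothetical): keep re-resolving the peer DNS name until it yields at least one address before joining.

```go
// Hypothetical sketch, not the Thanos implementation: keep re-resolving the
// headless peer Service until DNS returns at least one address, instead of
// joining the gossip cluster with an empty peer list.
package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

// resolvePeersWithRetry re-resolves host:port until the lookup yields at
// least one IP or ctx expires. All names here are illustrative.
func resolvePeersWithRetry(ctx context.Context, addr string, interval time.Duration) ([]string, error) {
	host, port, err := net.SplitHostPort(addr)
	if err != nil {
		return nil, err
	}
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		ips, err := net.DefaultResolver.LookupIPAddr(ctx, host)
		if err == nil && len(ips) > 0 {
			peers := make([]string, 0, len(ips))
			for _, ip := range ips {
				peers = append(peers, net.JoinHostPort(ip.IP.String(), port))
			}
			return peers, nil
		}
		select {
		case <-ctx.Done():
			return nil, fmt.Errorf("resolving %q: %v", addr, ctx.Err())
		case <-ticker.C:
			// The Service may have no Endpoints yet while the peer pods start; retry.
		}
	}
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Minute)
	defer cancel()

	peers, err := resolvePeersWithRetry(ctx, "thanos-peers-hi-res.default.svc.cluster.local:10900", 5*time.Second)
	if err != nil {
		fmt.Println("giving up:", err)
		return
	}
	fmt.Println("joining cluster with peers:", peers)
}
```

With something along these lines, the sidecar would wait until the query pods' Endpoints exist instead of committing to an empty peer list after the first lookup.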

On restart of the Thanos sidecar, the cluster formed properly:

level=info ts=2018-06-11T17:11:43.5448783Z caller=flags.go:51 msg="StoreAPI address that will be propagated through gossip" address=10.1.2.84:10901
level=debug ts=2018-06-11T17:11:43.5589156Z caller=cluster.go:128 component=cluster msg="resolved peers to following addresses" peers=10.1.2.85:10900,10.1.2.86:10900
level=info ts=2018-06-11T17:11:43.5594216Z caller=sidecar.go:232 msg="No GCS or S3 bucket was configured, uploads will be disabled"
level=info ts=2018-06-11T17:11:43.5596431Z caller=sidecar.go:269 msg="starting sidecar" peer=
level=info ts=2018-06-11T17:11:43.5600137Z caller=main.go:226 msg="Listening for metrics" address=0.0.0.0:10902
level=info ts=2018-06-11T17:11:43.5600899Z caller=reloader.go:77 component=reloader msg="started watching config file for changes" in=/etc/prometheus/prometheus.yml.tmpl out=/etc/prometheus-shared/prometheus.yml
level=info ts=2018-06-11T17:11:43.5600343Z caller=sidecar.go:214 component=store msg="Listening for StoreAPI gRPC" address=0.0.0.0:10901
level=info ts=2018-06-11T17:11:43.5973595Z caller=reloader.go:188 component=reloader msg="Prometheus reload triggered" cfg_in=/etc/prometheus/prometheus.yml.tmpl cfg_out=/etc/prometheus-shared/prometheus.yml rule_dir=
level=debug ts=2018-06-11T17:11:43.6295098Z caller=delegate.go:82 component=cluster received=NotifyJoin node=01CFQX9ZP6SXDF624765BPVWPX addr=10.1.2.84:10900
level=debug ts=2018-06-11T17:11:43.6725086Z caller=delegate.go:82 component=cluster received=NotifyJoin node=01CFQWZB4Y3RC90RDKSVCX97MB addr=10.1.2.86:10900
level=debug ts=2018-06-11T17:11:43.6743459Z caller=delegate.go:82 component=cluster received=NotifyJoin node=01CFQWZAV9E841Z7X8S49TQ8KA addr=10.1.2.85:10900
level=debug ts=2018-06-11T17:11:43.6744618Z caller=cluster.go:190 component=cluster msg="joined cluster" peers=2 peerType=source

What you expected to happen
The cluster should form regardless of the startup order of the sidecar and query nodes.

How to reproduce it (as minimally and precisely as possible):
Hard to reproduce; it appears to be a race condition in the startup ordering of the K8s resources.

Full logs to relevant components
See the logs in the previous section.

Anything else we need to know
No

Environment:

@bwplotka added the bug label on Jun 11, 2018
@bwplotka (Member)

Potential fix landed #383

fpetkovski added a commit to fpetkovski/thanos that referenced this issue on Oct 17, 2024: Optimize label usage for stringlabels