Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Querier/Store v0.8.0 regression: Thanos now considers identically externally labeled Store and Sidecar instances as having duplicate label sets. #1632

Closed
jstewart612 opened this issue Oct 11, 2019 · 8 comments · Fixed by #1636

Comments

@jstewart612
Copy link

jstewart612 commented Oct 11, 2019

Thanos, Prometheus and Golang version used: 0.8.0, 2.13.0, 1.13.1

Object Storage Provider: Azure

What happened: 0.7.0 I used the "replica" label with values "a" and b". I used --query.replica-label=replica in my Thanos Query nodes. Now, if I put the Store and Sidecar as stores in the same Thanos Query instance, one or the other gets dropped for labels as not unique.

What you expected to happen: Uniqueness of labels only be enforced for the same Thanos type (store versus store, sidecar versus sidecar).

How to reproduce it (as minimally and precisely as possible):

  1. Apply replica = "a" external_label to one Prometheus instance
  2. Apply replica = "b" to another Prometheus instance
  3. Add sidecars to above nodes
  4. Tell a Thanos Query instance to point to those sidecars.

Full logs to relevant components:
{"address":"thanos-query-app-swarm-sidecars:10903","caller":"storeset.go:261","component":"storeset","duplicates":2,"extLset":"{monitor=\"apps-app-swarm\",region=\"useast2\",replica=\"a\"},{monitor=\"apps-app-swarm\",region=\"useast2\",replica=\"b\"},{monitor=\"infra-app-swarm\",region=\"useast2\",replica=\"a\"},{monitor=\"infra-app-swarm\",region=\"useast2\",replica=\"b\"},{monitor=\"services-app-swarm\",region=\"useast2\",replica=\"a\"},{monitor=\"services-app-swarm\",region=\"useast2\",replica=\"b\"}","level":"warn","msg":"dropping store, external labels are not unique","ts":"2019-10-11T06:33:29.563785804Z"}

Anything else we need to know: Again this worked in 0.7.0. I did notice in the UI that the label sets for my Stores used to be blank, and now they are not. If I had to guess, the "glitch" got "fixed" there and caused a collision, but I am not seeing the "new way" to ensure that my given Thanos Store for a given Thanos Sidecar "replica" remains uniquely labelled.

@jojohappy jojohappy self-assigned this Oct 11, 2019
@jojohappy
Copy link
Member

Thanks for report. It seems to be a bug.

@IKSIN
Copy link
Contributor

IKSIN commented Oct 11, 2019

Bug reproduced on query v0.7.0 and store v0.8.0, and not reproduced on query v0.8.0 and store v0.7.0

@jojohappy
Copy link
Member

@IKSIN Yes, because store v0.7.0 will not expose advertised labels. Will fix ASAP.

@bwplotka
Copy link
Member

bwplotka commented Oct 11, 2019

Thanks for reporting!

It looks like we missed quite a simple scenario.

What we changed is v0.8.0 store is advertising labels: It takes all blocks it has and advertise them all. If you have just a simple case of one store and just one Prometheus sidecar - there is overlap in advertise labels logic and querier blocks one. We are working on a fix to Querier and we will release patch release 0.8.1

We can fix Querier for v0.8.1 but still unfortunately, Querier v0.7.0 or below with the newest store v0.8.0 will cause such issues. Restricting the order of upgrading is not nice, let's see what we can do about it.

@bwplotka bwplotka changed the title Thanos now considers identically externally labeled Store and Sidecar instances as having duplicate label sets, didn't used to. Qurier/Store v0.8.0 regression: Thanos now considers identically externally labeled Store and Sidecar instances as having duplicate label sets. Oct 11, 2019
@bwplotka bwplotka changed the title Qurier/Store v0.8.0 regression: Thanos now considers identically externally labeled Store and Sidecar instances as having duplicate label sets. Querier/Store v0.8.0 regression: Thanos now considers identically externally labeled Store and Sidecar instances as having duplicate label sets. Oct 11, 2019
@bwplotka bwplotka assigned bwplotka and unassigned jojohappy Oct 11, 2019
@bwplotka
Copy link
Member

I am happy to take it @jojohappy as it's 1am your time.

bwplotka added a commit that referenced this issue Oct 11, 2019
…locked for older Queriers.

Fixes #1632

Signed-off-by: Bartek Plotka <[email protected]>
bwplotka added a commit that referenced this issue Oct 13, 2019
… blocked for older Queriers.

Changes:
* Cleaned up the code a bit.
* Ensured proper locks.
* thanos_store_nodes_grpc_connections is not per store type and external labels. thanos_store_node_info is marked as deprecated.
* Added (optional, but enabled by default) compatibility label to store GW Info method with flag to disable it. Store only adds it if there is any labelset to advertise.
* Disabled any strict deduplication in external labels detection; just warning!
* More tests; compatibility tests for query pre 0.8.0 and new store GW.
* Remove compatibility label if spotted.

Fixes #1632

Signed-off-by: Bartek Plotka <[email protected]>
@bwplotka
Copy link
Member

Fix is in review: #1636

I built quay.io/thanos/thanos:v0.8.1-test-d042b5b image with the fix. Can you @IKSIN @jstewart612 try it out and let us know if all works as expected? Any order of upgrade should work with this:

  • store 0.8.1 querier 0.7.0
  • store 0.8.1 querier 0.8.1
  • store 0.7.0 querier 0.8.1
    etc.

bwplotka added a commit that referenced this issue Oct 13, 2019
… blocked for older Queriers.

Changes:
* Cleaned up the code a bit.
* Ensured proper locks.
* thanos_store_nodes_grpc_connections is not per store type and external labels. thanos_store_node_info is marked as deprecated.
* Added (optional, but enabled by default) compatibility label to store GW Info method with flag to disable it. Store only adds it if there is any labelset to advertise.
* Disabled any strict deduplication in external labels detection; just warning!
* More tests; compatibility tests for query pre 0.8.0 and new store GW.
* Remove compatibility label if spotted.

Fixes #1632

Signed-off-by: Bartek Plotka <[email protected]>
bwplotka added a commit that referenced this issue Oct 14, 2019
… blocked for older Queriers. (#1636)

Changes:
* Cleaned up the code a bit.
* Ensured proper locks.
* thanos_store_nodes_grpc_connections is not per store type and external labels. thanos_store_node_info is marked as deprecated.
* Added (optional, but enabled by default) compatibility label to store GW Info method with flag to disable it. Store only adds it if there is any labelset to advertise.
* Disabled any strict deduplication in external labels detection; just warning!
* More tests; compatibility tests for query pre 0.8.0 and new store GW.
* Remove compatibility label if spotted.

Fixes #1632

Signed-off-by: Bartek Plotka <[email protected]>
bwplotka added a commit that referenced this issue Oct 14, 2019
… blocked for older Queriers.

Changes:
* Cleaned up the code a bit.
* Ensured proper locks.
* thanos_store_nodes_grpc_connections is not per store type and external labels. thanos_store_node_info is marked as deprecated.
* Added (optional, but enabled by default) compatibility label to store GW Info method with flag to disable it. Store only adds it if there is any labelset to advertise.
* Disabled any strict deduplication in external labels detection; just warning!
* More tests; compatibility tests for query pre 0.8.0 and new store GW.
* Remove compatibility label if spotted.

Fixes #1632

Signed-off-by: Bartek Plotka <[email protected]>
bwplotka added a commit that referenced this issue Oct 14, 2019
… blocked for older Queriers.

Changes:
* Cleaned up the code a bit.
* Ensured proper locks.
* thanos_store_nodes_grpc_connections is not per store type and external labels. thanos_store_node_info is marked as deprecated.
* Added (optional, but enabled by default) compatibility label to store GW Info method with flag to disable it. Store only adds it if there is any labelset to advertise.
* Disabled any strict deduplication in external labels detection; just warning!
* More tests; compatibility tests for query pre 0.8.0 and new store GW.
* Remove compatibility label if spotted.

Fixes #1632

Signed-off-by: Bartek Plotka <[email protected]>
bwplotka added a commit that referenced this issue Oct 14, 2019
…ase 0.8.1 (#1640)

* Pinned `x/sys` to certain version that is buildable for windows platform. (#1626)

Signed-off-by: Bartek Plotka <[email protected]>

* Post release update. (#1628)

Signed-off-by: Bartek Plotka <[email protected]>

* Removed Querier duplicate labels checks & made sure store will be not blocked for older Queriers.

Changes:
* Cleaned up the code a bit.
* Ensured proper locks.
* thanos_store_nodes_grpc_connections is not per store type and external labels. thanos_store_node_info is marked as deprecated.
* Added (optional, but enabled by default) compatibility label to store GW Info method with flag to disable it. Store only adds it if there is any labelset to advertise.
* Disabled any strict deduplication in external labels detection; just warning!
* More tests; compatibility tests for query pre 0.8.0 and new store GW.
* Remove compatibility label if spotted.

Fixes #1632

Signed-off-by: Bartek Plotka <[email protected]>

* Changelog fixes (#1634)

Signed-off-by: Bartek Plotka <[email protected]>
@bwplotka
Copy link
Member

Fixed version is released https://github.com/thanos-io/thanos/releases/tag/v0.8.1 🎉 Thanks all for help in this 🎖️

@jstewart612
Copy link
Author

Working great tyvm!

GiedriusS pushed a commit that referenced this issue Oct 28, 2019
… blocked for older Queriers. (#1636)

Changes:
* Cleaned up the code a bit.
* Ensured proper locks.
* thanos_store_nodes_grpc_connections is not per store type and external labels. thanos_store_node_info is marked as deprecated.
* Added (optional, but enabled by default) compatibility label to store GW Info method with flag to disable it. Store only adds it if there is any labelset to advertise.
* Disabled any strict deduplication in external labels detection; just warning!
* More tests; compatibility tests for query pre 0.8.0 and new store GW.
* Remove compatibility label if spotted.

Fixes #1632

Signed-off-by: Bartek Plotka <[email protected]>
Signed-off-by: Giedrius Statkevičius <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants