[Compactor] panic: unexpected seriesToChunkEncoder lack of iterations #6775

Closed
piotrhryszko-img opened this issue Oct 5, 2023 · 12 comments · Fixed by #7334

Comments

@piotrhryszko-img

piotrhryszko-img commented Oct 5, 2023

Thanos, Prometheus and Golang version used:

thanos, version 0.31.0 (branch: HEAD, revision: 50c464132c265eef64254a9fd063b1e2419e09b7)
  build user:       root@63f5f37ee4e8
  build date:       20230323-10:13:38
  go version:       go1.19.7
  platform:         linux/amd64

Object Storage Provider: S3

What happened:
Thanos compact throws panic: unexpected seriesToChunkEncoder lack of iterations and exits
What you expected to happen:

How to reproduce it (as minimally and precisely as possible):

Full logs to relevant components:

Logs

panic: unexpected seriesToChunkEncoder lack of iterations

goroutine 50 [running]:
github.com/prometheus/prometheus/storage.(*compactChunkIterator).Next(0xc000b56bd0)
	/go/pkg/mod/github.com/prometheus/[email protected]/storage/merge.go:753 +0x88c
github.com/prometheus/prometheus/tsdb.(*LeveledCompactor).populateBlock(0xc00091f260, {0xc00061a120, 0x2, 0x69?}, 0xc0003128f0, {0x2b54960, 0xc000562580}, {0x2b4dc80, 0xc000f63310})
	/go/pkg/mod/github.com/prometheus/[email protected]/tsdb/compact.go:771 +0x1488
github.com/prometheus/prometheus/tsdb.(*LeveledCompactor).write(0xc00091f260, {0xc000bc62c0, 0x37}, 0xc0003128f0, {0xc00061a120, 0x2, 0x2})
	/go/pkg/mod/github.com/prometheus/[email protected]/tsdb/compact.go:597 +0x64d
github.com/prometheus/prometheus/tsdb.(*LeveledCompactor).Compact(0xc00091f260, {0xc000bc62c0, 0x37}, {0xc0000b9fe0, 0x2, 0x4057e40?}, {0x0, 0x0, 0xc0008ea000?})
	/go/pkg/mod/github.com/prometheus/[email protected]/tsdb/compact.go:438 +0x225
github.com/thanos-io/thanos/pkg/compact.(*Group).compact.func3({0x2b5a648?, 0xc00113a960?})
	/app/pkg/compact/compact.go:1075 +0x4a
github.com/thanos-io/thanos/pkg/tracing.DoInSpanWithErr({0x2b5a648?, 0xc00113a960?}, {0x25ef5d1?, 0x2?}, 0xc000e91b48, {0x0?, 0xc000a6a240?, 0x0?})
	/app/pkg/tracing/tracing.go:82 +0xd0
github.com/thanos-io/thanos/pkg/compact.(*Group).compact(0xc001585680, {0x2b5a648, 0xc00113a960}, {0xc000bc62c0, 0x37}, {0x2b42f20, 0xc0005f9bc0}, {0x2b4db40, 0xc00091f260})
	/app/pkg/compact/compact.go:1074 +0xcab
github.com/thanos-io/thanos/pkg/compact.(*Group).Compact.func2({0x2b5a648?, 0xc00113a960?})
	/app/pkg/compact/compact.go:775 +0x65
github.com/thanos-io/thanos/pkg/tracing.DoInSpanWithErr({0x2b5a5a0?, 0xc000666000?}, {0x25fcb34?, 0x9?}, 0xc000e91e30, {0xc000cc2d80?, 0x43cba7?, 0xc000e91d80?})
	/app/pkg/tracing/tracing.go:82 +0xd0
github.com/thanos-io/thanos/pkg/compact.(*Group).Compact(0xc001585680, {0x2b5a5a0, 0xc000666000}, {0xc00089b8a0, 0x1b}, {0x2b42f20, 0xc0005f9bc0}, {0x2b4db40, 0xc00091f260})
	/app/pkg/compact/compact.go:774 +0x35c
github.com/thanos-io/thanos/pkg/compact.(*BucketCompactor).Compact.func2()
	/app/pkg/compact/compact.go:1250 +0x165
created by github.com/thanos-io/thanos/pkg/compact.(*BucketCompactor).Compact
	/app/pkg/compact/compact.go:1247 +0x935

Anything else we need to know:

        - args:
            - compact
            - --log.level=info
            - --log.format=logfmt
            - --http-address=0.0.0.0:10902
            - --objstore.config-file=/etc/config/object-store.yaml
            - --data-dir=/var/thanos/compact
            - --consistency-delay=30m
            - --retention.resolution-raw=30d
            - --retention.resolution-5m=180d
            - --retention.resolution-1h=1y
            - --compact.concurrency=1
            - --wait
            - --deduplication.replica-label=__replica__
@piotrhryszko-img
Author

I also tried with vertical compaction enabled in another environment and still see the same panic:

        - args:
            - compact
            - --log.level=info
            - --log.format=logfmt
            - --http-address=0.0.0.0:10902
            - --objstore.config-file=/etc/config/object-store.yaml
            - --data-dir=/var/thanos/compact
            - --consistency-delay=30m
            - --retention.resolution-raw=30d
            - --retention.resolution-5m=180d
            - --retention.resolution-1h=1y
            - --compact.concurrency=1
            - --wait
            - --deduplication.replica-label=__replica__
            - --compact.enable-vertical-compaction
            - --delete-delay=0

@GiedriusS
Member

Is this the same with the newest main version? Could you please try it? 0.31.0 is old :/

@piotrhryszko-img
Author

Hi @GiedriusS, upgrading to the latest version didn't resolve the issue:

thanos, version 0.32.4 (branch: HEAD, revision: fcd5683e3049924ae26a680e166ae6f27a344896)
  build user:       root@afb5016d2fc4
  build date:       20231002-07:45:12
  go version:       go1.20.8
  platform:         linux/amd64
  tags:             netgo

Following suggestions on Slack, we added a deduplication function, since in our case applications are scraped by multiple Prometheus instances. This stopped the errors from happening. However, it also seems to have caused issues with compaction now, as it's been stuck on a single block for more than 3 days now. The current configuration is below:

        - args:
            - compact
            - --log.level=debug
            - --log.format=logfmt
            - --http-address=0.0.0.0:10902
            - --objstore.config-file=/etc/config/object-store.yaml
            - --data-dir=/var/thanos/compact
            - --consistency-delay=30m
            - --retention.resolution-raw=30d
            - --retention.resolution-5m=180d
            - --retention.resolution-1h=1y
            - --compact.concurrency=1
            - --wait
            - --deduplication.replica-label=__replica__
            - --deduplication.func=penalty
            - --compact.enable-vertical-compaction
            - --delete-delay=168h

@yeya24
Contributor

yeya24 commented Oct 23, 2023

> However, it also seems to have caused issues with compaction now, as it's been stuck on a single block for more than 3 days now.

Why is the block stuck? Did you see any errors?

@vCra

vCra commented Nov 17, 2023

Hey - I've also seen a similar error on 0.32.4

{"caller":"compact.go:708","level":"info","msg":"Found overlapping blocks during compaction","ts":"2023-11-17T22:56:51.255652657Z","ulid":"01HFFR0H1PS6EWAP1ARPPZ4ZG8"}
panic: unexpected seriesToChunkEncoder lack of iterations

goroutine 289 [running]:
github.com/prometheus/prometheus/storage.(*compactChunkIterator).Next(0xc000274b40)
	/go/pkg/mod/github.com/prometheus/[email protected]/storage/merge.go:753 +0x870
github.com/prometheus/prometheus/tsdb.DefaultBlockPopulator.PopulateBlock({}, {0x2d0f3a8, 0xc000789440}, 0xc0008c1500, {0x2cf1be0, 0xc0006ae0c0}, {0x2d00380, 0xc0000d9cc0}, 0xc000012448?, {0xc00143c040, ...}, ...)
	/go/pkg/mod/github.com/prometheus/[email protected]/tsdb/compact.go:781 +0x1472
github.com/prometheus/prometheus/tsdb.(*LeveledCompactor).write(0xc0006c3860, {0xc00106c0f0, 0x29}, 0xc000806bb0, {0x2cfa620, 0x431d070}, {0xc00143c040, 0x2, 0x2})
	/go/pkg/mod/github.com/prometheus/[email protected]/tsdb/compact.go:601 +0x6db
github.com/prometheus/prometheus/tsdb.(*LeveledCompactor).CompactWithBlockPopulator(0xc0006c3860, {0xc00106c0f0, 0x29}, {0xc00081a340, 0x2, 0x2d28040?}, {0x0, 0x0, 0xc0001ec380?}, {0x2cfa620, ...})
	/go/pkg/mod/github.com/prometheus/[email protected]/tsdb/compact.go:442 +0x6bb
github.com/thanos-io/thanos/pkg/compact.(*Group).compact.func3({0x2d0f3a8, 0xc001c22420})
	/app/pkg/compact/compact.go:1137 +0x125
github.com/thanos-io/thanos/pkg/tracing.DoInSpanWithErr({0x2d0f3a8?, 0xc001476270?}, {0x277957c?, 0x2?}, 0xc0010a5aa0, {0x0?, 0xc000ebc500?, 0x1?})
	/app/pkg/tracing/tracing.go:82 +0xd0
github.com/thanos-io/thanos/pkg/compact.(*Group).compact(0xc000bbc8c0, {0x2d0f3a8, 0xc001476270}, {0xc00106c0f0, 0x29}, {0x2cf4280, 0xc000789770}, {0x2d07640, 0xc0006c3860}, {0x2cfa920, ...}, ...)
	/app/pkg/compact/compact.go:1132 +0x10ad
github.com/thanos-io/thanos/pkg/compact.(*Group).Compact.func2({0x2d0f3a8?, 0xc001476270?})
	/app/pkg/compact/compact.go:830 +0xd7
github.com/thanos-io/thanos/pkg/tracing.DoInSpanWithErr({0x2d0f300?, 0xc0008186e0?}, {0x2787486?, 0x9?}, 0xc0010a5e10, {0xc0000c60d0?, 0x40e227?, 0x58?})
	/app/pkg/tracing/tracing.go:82 +0xd0
github.com/thanos-io/thanos/pkg/compact.(*Group).Compact(0xc000bbc8c0, {0x2d0f300, 0xc0008186e0}, {0xc0002662a0, 0xd}, {0x2cf4280, 0xc000789770}, {0x2d07640, 0xc0006c3860}, {0x2cfa920, ...}, ...)
	/app/pkg/compact/compact.go:829 +0x3cc
github.com/thanos-io/thanos/pkg/compact.(*BucketCompactor).Compact.func2()
	/app/pkg/compact/compact.go:1373 +0x18a
created by github.com/thanos-io/thanos/pkg/compact.(*BucketCompactor).Compact
	/app/pkg/compact/compact.go:1370 +0x90a

When searching for 01HFFR0H1PS6EWAP1ARPPZ4ZG8 in bucket web, nothing shows up. I also can't see a directory with that name within the object bucket.
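
A side note on the missing ULID: the first 10 characters of a ULID encode its creation time in milliseconds, so decoding them shows when the ID was generated. The sketch below is plain Python with no Thanos dependency, and the helper names are made up for illustration. For the ULID above it gives a time a few seconds before the `ts` of the log line, which would be consistent with the ID having been generated for the compaction output rather than naming a block already in the bucket. That is an inference from the timestamps, not something confirmed in this thread.

```python
from datetime import datetime, timedelta, timezone

# Crockford base32 alphabet used by ULIDs (it omits I, L, O and U).
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def ulid_timestamp_ms(ulid: str) -> int:
    """Return the Unix time in milliseconds stored in the first 10 characters of a ULID."""
    if len(ulid) != 26:
        raise ValueError("a ULID has exactly 26 characters")
    value = 0
    for ch in ulid[:10].upper():
        # str.index raises ValueError on characters outside the alphabet.
        value = value * 32 + CROCKFORD.index(ch)
    return value


def ulid_time(ulid: str) -> datetime:
    """Return the ULID creation time as a timezone-aware UTC datetime."""
    ms = ulid_timestamp_ms(ulid)
    # Integer arithmetic avoids float rounding on the millisecond part.
    return datetime.fromtimestamp(ms // 1000, tz=timezone.utc) + timedelta(milliseconds=ms % 1000)


if __name__ == "__main__":
    print(ulid_time("01HFFR0H1PS6EWAP1ARPPZ4ZG8").isoformat())
```

Comparing the decoded time with the compactor log timestamps is a quick way to tell a freshly generated ID from the ID of an older block.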

@yeya24
Contributor

yeya24 commented Nov 18, 2023

Hi, thanks for all the bug reports. I wonder if it is possible for someone to share the problematic block, since I don't have a good way to reproduce this issue locally. Please let me know. You can reach out to me on Slack.

@bison

bison commented Feb 28, 2024

Seeing this panic on v0.34.0 as well. I also don't see the ULID from the logs in the actual bucket, and running thanos tools bucket verify --log.level=debug --issues=overlapped_blocks against the bucket doesn't show anything.

Would be happy to provide data if I knew how to find the correct blocks.

@vCra

vCra commented Feb 28, 2024

Hey @bison I think I narrowed this down to thanos trying to do vertical compaction on already compacted blocks - this could be the case if you've not previously had vertical compaction enabled

If you want to try a hacky fix, you can try disabling compaction for all the blocks created before you enabled vertical compaction

(That's presuming we have the same issue - it could be something different)

In the compactor logs, look at the lines just before the crash - they should show it starting to compact several blocks. You'll need to mark these as no-compact, and you might need to repeat this many times to cover all the blocks that have already been compacted
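
Since the marking may have to be repeated for many blocks, here is a rough sketch that builds one `thanos tools bucket mark` command per block ID. It assumes the `--id`, `--marker`, `--details` and `--objstore.config-file` flags of that subcommand. The function name, the default config path (copied from the args earlier in this issue), the details text and the block IDs are placeholders, so check `thanos tools bucket mark --help` for your version before running anything.

```python
import shlex


def no_compact_commands(block_ids,
                        objstore_config="/etc/config/object-store.yaml",
                        details="workaround for issue 6775"):
    """Build one `thanos tools bucket mark` argv list per block ID.

    Nothing is executed here; the caller decides whether to print or run them.
    """
    commands = []
    for block_id in block_ids:
        commands.append([
            "thanos", "tools", "bucket", "mark",
            f"--id={block_id}",
            "--marker=no-compact-mark.json",
            f"--details={details}",
            f"--objstore.config-file={objstore_config}",
        ])
    return commands


if __name__ == "__main__":
    # Placeholder IDs: replace them with the ULIDs logged just before the crash.
    for argv in no_compact_commands(["<BLOCK_ULID_1>", "<BLOCK_ULID_2>"]):
        print(shlex.join(argv))
```

Printing the commands instead of running them leaves room to review the list first, which seems safer when the block selection is a guess.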

@yeya24
Contributor

yeya24 commented Feb 28, 2024

Hi @vCra, thanks for the investigation.

> I think I narrowed this down to thanos trying to do vertical compaction on already compacted blocks - this could be the case if you've not previously had vertical compaction enabled

That's interesting to know. How did you figure this out? Ideally it shouldn't matter to the compactor whether blocks are already compacted or not, so it shouldn't panic. Maybe we're missing something.

@bison

bison commented Feb 29, 2024

@vCra wow thanks, that's exactly what's happening. Just upgraded this stack and vertical compaction got enabled where it wasn't before. Now the first time the compactor encounters two previously compacted blocks at 5m resolution, it panics. If I mark the same blocks (and all other similar blocks) with no-compact, then compaction completes.

Edit: Actually I guess it's any previously compacted block. I originally thought it was only at that resolution for some reason.
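
For anyone hunting for candidate blocks, a minimal sketch follows. It assumes "previously compacted" can be read as a `compaction.level` above 1 in each block's `meta.json`, which is an interpretation and not something confirmed here. The function names and sample metadata are hypothetical, and `load_metas` expects a local copy of the bucket laid out as `<root>/<ulid>/meta.json`.

```python
import json
from pathlib import Path


def previously_compacted(meta: dict) -> bool:
    """True when meta.json reports a compaction level above 1."""
    return meta.get("compaction", {}).get("level", 1) > 1


def candidate_block_ids(metas):
    """Return the ULIDs of blocks that look previously compacted."""
    return [meta["ulid"] for meta in metas if previously_compacted(meta)]


def load_metas(root):
    """Read every <root>/<ulid>/meta.json from a local copy of the bucket."""
    return [json.loads(path.read_text()) for path in sorted(Path(root).glob("*/meta.json"))]


if __name__ == "__main__":
    # Hypothetical metadata, trimmed to the fields used above.
    sample = [
        {"ulid": "BLOCK_A", "compaction": {"level": 1}},
        {"ulid": "BLOCK_B", "compaction": {"level": 3}},
    ]
    print(candidate_block_ids(sample))
```

The resulting list is only a starting point; comparing it with the blocks named in the compactor logs before a crash would help confirm the pattern.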

@vCra

vCra commented Feb 29, 2024

> How did you figure this out?

I'm only guessing that this is the issue - compactor kept crashing, and I noticed that we were managing to vertically compact all the new blocks without issue, but the old blocks were not getting vertically compacted - in bucketweb it was quite clear.
The issue was that no downsampling was happening - the count of downsample-todo kept on slowly increasing.
Looking at the logs was how we solved it - we thought it could be one or two corrupted blocks, so I kept marking all these blocks as don't compact - we had a large backlog so it took a while, but I slowly started to see a pattern that it was only the old blocks that were having an issue.

Looking at bucket-web, we still have the old blocks, but just not vertically compacted - we don't care too much, as we won't use this data too frequently (10 is with vertical compaction)

[Screenshot: 2024-02-29 at 23:51:06]

The discussion in https://cloud-native.slack.com/archives/CK5RSSC10/p1681966324787459 helped too

@GiedriusS
Member

I spotted this in prod. Looking into it 👁️

GiedriusS added a commit that referenced this issue Apr 30, 2024
For #6775, it would be useful
to know the exact block IDs to aid debugging.

Signed-off-by: Giedrius Statkevičius <[email protected]>
GiedriusS added a commit that referenced this issue May 3, 2024
Adding a minimal test case for issue #6775 - reproduces the panic in the
compactor.

Signed-off-by: Giedrius Statkevičius <[email protected]>
saswatamcode added a commit that referenced this issue May 28, 2024
* compact: recover from panics (#7318)

For #6775, it would be useful
to know the exact block IDs to aid debugging.

Signed-off-by: Giedrius Statkevičius <[email protected]>

* Sidecar: wait for prometheus on startup (#7323)

Signed-off-by: Michael Hoffmann <[email protected]>

* Receive: fix serverAsClient.Series goroutines leak (#6948)

* fix serverAsClient goroutines leak

Signed-off-by: Thibault Mange <[email protected]>

* fix lint

Signed-off-by: Thibault Mange <[email protected]>

* update changelog

Signed-off-by: Thibault Mange <[email protected]>

* delete invalid comment

Signed-off-by: Thibault Mange <[email protected]>

* remove temp dev test

Signed-off-by: Thibault Mange <[email protected]>

* remove timer channel drain

Signed-off-by: Thibault Mange <[email protected]>

---------

Signed-off-by: Thibault Mange <[email protected]>

* Receive: fix stats (#7373)

If we account stats for remote write and local writes we will count them
twice since the remote write will be counted locally again by the remote
receiver instance.

Signed-off-by: Michael Hoffmann <[email protected]>

* *: Ensure objstore flag values are masked & disable debug/pprof/cmdline (#7382)

* *: Ensure objstore flag values are masked & disable debug/pprof/cmdline

Signed-off-by: Saswata Mukherjee <[email protected]>

* small fix

Signed-off-by: Saswata Mukherjee <[email protected]>

---------

Signed-off-by: Saswata Mukherjee <[email protected]>

* Query: dont pass query hints to avoid triggering pushdown (#7392)

If we have a new querier it will create query hints even without the
pushdown feature being present anymore. Old sidecars will then trigger
query pushdown which leads to broken max,min,max_over_time and
min_over_time.

Signed-off-by: Michael Hoffmann <[email protected]>

* Cut patch release v0.35.1

Signed-off-by: Saswata Mukherjee <[email protected]>

---------

Signed-off-by: Giedrius Statkevičius <[email protected]>
Signed-off-by: Michael Hoffmann <[email protected]>
Signed-off-by: Thibault Mange <[email protected]>
Signed-off-by: Saswata Mukherjee <[email protected]>
Co-authored-by: Giedrius Statkevičius <[email protected]>
Co-authored-by: Michael Hoffmann <[email protected]>
Co-authored-by: Thibault Mange <[email protected]>