.*: v0.19.0-rc.1 - Huge usage of memory to upload blocks to S3 compared to 0.18.0 #3917

Closed
ahurtaud opened this issue Mar 11, 2021 · 15 comments


@ahurtaud
Contributor

Thanos, Prometheus and Golang version used:
v0.19.0-rc.1

Object Storage Provider:
S3-capable API. (Scality)

What happened:
While validating 0.19.0-rc.1, we saw many Thanos sidecars get OOMKilled while uploading blocks.
Attaching two pprof allocation profiles, one for v0.19 and one for v0.18.
profile-0.19.pdf
minio-go Client putObjectMultipartStreamFromReadAt: 2052.85MB (46.65%)

profile-0.18.pdf

We are seeing the same issue with the compactor, and with thanos store I believe (but these have higher memory limits).

What you expected to happen:
Everything was working fine with 0.18. I saw the minio-go lib got updated; maybe there is a bug there?

How to reproduce it (as minimally and precisely as possible):
Hard to tell; I don't know if this affects every S3 object storage. But memory consumption is high during upload (on a big Prometheus instance).

@yeya24
Contributor

yeya24 commented Mar 11, 2021

The latest release adds hash calculation for each file when uploading the TSDB block. I am not sure if it will cause the memory issue.
@ahurtaud Maybe you can try to set hash-func flag to empty string and see the memory usage? https://github.com/thanos-io/thanos/blob/main/cmd/thanos/config.go#L144

If OOMs still happen, then it might be a problem with the minio SDK.

@GiedriusS
Member

GiedriusS commented Mar 12, 2021

What options do you have on Thanos Sidecar? 🤔 What about Store & Compact? Could you upload profiles from them as well? The minio-go function indeed seems to be the difference here.

@GiedriusS
Member

It's probably related to minio/mc#3376 & minio/minio-go#1357

@ahurtaud
Contributor Author

I am looking at it today.
@yeya24 : hash-func default value is the empty string :/
@GiedriusS : nothing fancy I think.

    Args:
      sidecar
      --log.level=info
      --tsdb.path=/var/prometheus
      --prometheus.url=http://localhost:9090
      --reloader.config-file=/etc/prometheus/config/prometheus.yml.tmpl
      --reloader.config-envsubst-file=/etc/prometheus/shared/prometheus.yml
      --objstore.config-file=/etc/prometheus/s3config/bucket.yml
      --reloader.rule-dir=/etc/prometheus/rules

I've also checked the issue you linked. It looks like minio is not planning a fix for now. I believe it is fine to release Thanos as is and see whether it impacts Amazon S3 as well.

Will keep you posted about Store and Compact.

@ahurtaud
Contributor Author

OK, so it is definitely a memory increase during uploads. The store component is not affected; only the shipper is impacted (so, I believe, sidecar, compact, receive and ruler).

I've attached the profile for the compactor during upload. It is less visible there, since I guess the compactor uses a lot of page cache for downsampling, so all my memory shows as "used". We can still see minio-go putObjectMultipartStreamFromReadAt running high.
profile-compact-019.pdf
This memory is part of RSS (subject to kubernetes OOM kill).

[Screenshot: big Grafana dashboard, Screen Shot 2021-03-15 at 17 33 45]

So I believe the minio-go issue linked above by @GiedriusS is the root cause.

I think we will increase the memory limits for the components that do uploads.
Do you think it would be worth looking at this in the minio-go project?

@GiedriusS
Member

GiedriusS commented Mar 15, 2021

I'm not sure if there is a way to avoid this behaviour from the caller:

- For size smaller than 5MiB PutObject automatically does a single atomic Put operation.
- For size larger than 5MiB PutObject automatically does a resumable multipart Put operation.
- For size input as -1 PutObject does a multipart Put operation until input stream reaches EOF.
  Maximum object size that can be uploaded through this operation will be 5TiB.

@harshavardhana sorry for pinging you here on an unrelated project but maybe it would be possible to control this new behaviour introduced in minio/minio-go#1357 with a flag or something? In our case, we are sure that the files we are trying to upload are immutable due to how TSDB works so that would help out a lot. 🤗

Another option is to revert the minio-go version upgrade ;/
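
The PutObject size rules quoted above can be sketched as a small decision function. This is only an illustration of the documented behaviour, not minio-go's own code: `chooseUploadMode` and the mode names are hypothetical, and treating any negative size like -1 and exactly 5MiB as multipart are assumptions, since the quoted text only says "smaller" and "larger".

```go
package main

import "fmt"

// minPartSize is the 5MiB threshold from the quoted PutObject description.
const minPartSize = 5 * 1024 * 1024

// chooseUploadMode is a hypothetical helper that mirrors the quoted rules.
// Assumption: any negative size is treated like -1 (unknown length).
func chooseUploadMode(size int64) string {
	switch {
	case size < 0:
		return "multipart-stream-until-EOF"
	case size < minPartSize:
		return "single-put"
	default:
		return "multipart"
	}
}

func main() {
	for _, size := range []int64{-1, 1024, 100 * 1024 * 1024} {
		fmt.Printf("size=%d -> %s\n", size, chooseUploadMode(size))
	}
}
```

TSDB chunk and index files are usually well above 5MiB, so under these rules block uploads would take the multipart path, which is where the extra memory shows up in the profiles.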

@harshavardhana

@GiedriusS you can set the PartSize for each request to reduce the memory usage overall

NumThreads: 2, // upload two parts in parallel 
PartSize: 10 * humanize.MiByte, // use 10MiB per part size instead of default 128MiB

@harshavardhana

#3935 something like this will reduce the memory usage by orders of magnitude.

@bwplotka
Member

Thanks all for reporting and drilling down. I think we narrowed down to the root cause: minio/minio-go#1357

I think it's quite odd that the minio client by default uses 0.5GB of memory for multipart uploads (4 workers by default). We would probably prefer to stream such a large part through the HTTP request rather than hold that much memory for it 🤔 I wonder if we can improve this on the minio side first.

Also, it looks to me like we need at least a local benchmark for UploadFile with the minio client 🤗
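
The 0.5GB figure can be reproduced with simple arithmetic from the numbers mentioned in this thread (128MiB default part size, 4 workers by default). The sketch below assumes each worker holds one full part in memory at a time; `estimateBufferBytes` is a hypothetical helper, not part of minio-go.

```go
package main

import "fmt"

const mib = 1024 * 1024

// estimateBufferBytes assumes every in-flight part is buffered fully in memory,
// so peak buffer usage is roughly partSize * workers.
func estimateBufferBytes(partSize uint64, workers uint) uint64 {
	return partSize * uint64(workers)
}

func main() {
	defaults := estimateBufferBytes(128*mib, 4) // defaults mentioned in the thread
	tuned := estimateBufferBytes(10*mib, 2)     // PartSize/NumThreads suggested in the thread

	fmt.Printf("defaults: %d MiB\n", defaults/mib)
	fmt.Printf("tuned:    %d MiB\n", tuned/mib)
}
```

Under that assumption the defaults come to 512 MiB per upload, while the suggested 10MiB parts with 2 threads come to 20 MiB.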

@bwplotka
Member

Let's check, I think I might have some ideas to help, commenting here: minio/mc#3376 (comment)

BTW thanks @harshavardhana for jumping in and helping! 🤗

@bwplotka
Member

Looking into it; this is the last thing before v0.19.0-rc.2. Will try to make some benchmarks.

@bwplotka
Member

Performed some benchmarks:

	// Uploading 100MB with:
	// PartSize: 1024 * 1024 * 100 = BenchmarkUpload-12    	      84	 416025119 ns/op	421129826 B/op	     946 allocs/op  (400MB allocated for single op)
	// PartSize: 1024 * 1024 * 25  = BenchmarkUpload-12    	     100	 273648888 ns/op	106639830 B/op	    1576 allocs/op (100MB overallocated).
	// PartSize: 1024 * 1024 * 8   = BenchmarkUpload-12    	      97	 254052232 ns/op	35392395 B/op	    3078 allocs/op (Still 33MB allocated [8x4])

As expected, @harshavardhana. I wonder how it behaved before this change for the race? 🤔

EDIT:

I reverted the race fix locally on my branch and the result is:

// PartSize: 1024 * 1024 * 100 = BenchmarkUpload-12    	      88	 282906636 ns/op	 1671249 B/op	     907 allocs/op
// PartSize: 1024 * 1024 * 8   = BenchmarkUpload-12    	     100	 300660587 ns/op	 2195440 B/op	    3044 allocs/op

The difference is significant.
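
To put a number on that difference, the B/op values from the two runs above can be compared directly. This is just arithmetic on the reported results; the `result` type and `ratio` helper are hypothetical names for illustration.

```go
package main

import "fmt"

const mib = 1024 * 1024

// result holds the B/op figures reported in the benchmark runs above.
type result struct {
	partSizeMiB int
	withFix     float64 // B/op with the minio-go race fix in place
	withoutFix  float64 // B/op with the race fix reverted locally
}

var results = []result{
	{partSizeMiB: 100, withFix: 421129826, withoutFix: 1671249},
	{partSizeMiB: 8, withFix: 35392395, withoutFix: 2195440},
}

// ratio reports how many times more memory is allocated per upload with the fix.
func ratio(r result) float64 {
	return r.withFix / r.withoutFix
}

func main() {
	for _, r := range results {
		fmt.Printf("PartSize %3d MiB: %.1f MiB/op vs %.1f MiB/op (%.0fx)\n",
			r.partSizeMiB, r.withFix/mib, r.withoutFix/mib, ratio(r))
	}
}
```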

Code:

// Copyright (c) The Thanos Authors.
// Licensed under the Apache License 2.0.

package s3_test

import (
	"bytes"
	"context"
	"strings"
	"testing"

	"github.com/cortexproject/cortex/integration/e2e"
	e2edb "github.com/cortexproject/cortex/integration/e2e/db"
	"github.com/go-kit/kit/log"
	"github.com/thanos-io/thanos/pkg/objstore/s3"
	"github.com/thanos-io/thanos/test/e2e/e2ethanos"

	"github.com/thanos-io/thanos/pkg/testutil"
)

// Regression benchmark for https://github.com/thanos-io/thanos/issues/3917.
func BenchmarkUpload(b *testing.B) {
	b.ReportAllocs()
	ctx := context.Background()

	s, err := e2e.NewScenario("e2e_bench_mino_client")
	testutil.Ok(b, err)
	b.Cleanup(e2ethanos.CleanScenario(b, s))

	const bucket = "test"
	m := e2edb.NewMinio(8080, bucket)
	testutil.Ok(b, s.StartAndWaitReady(m))

	bkt, err := s3.NewBucketWithConfig(log.NewNopLogger(), s3.Config{
		Bucket:    bucket,
		AccessKey: e2edb.MinioAccessKey,
		SecretKey: e2edb.MinioSecretKey,
		Endpoint:  m.HTTPEndpoint(),
		Insecure:  true,
	}, "test-feed")
	testutil.Ok(b, err)

	buf := bytes.Buffer{}
	buf.Grow(1024 * 1024 * 100) // 100MiB.
	word := "abcdefghij"
	for i := 0; i < buf.Cap()/len(word); i++ {
		_, _ = buf.WriteString(word)
	}
	str := buf.String()

	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		testutil.Ok(b, bkt.Upload(ctx, "test", strings.NewReader(str)))
	}
}

@bwplotka
Member

I am considering using a fork for the release and fixing this later on master. Any objections?

@bwplotka
Member

Discussing minio/mc#3376 for long term approach with super helpful @harshavardhana

@ahurtaud
Contributor Author

That is fine for me, also we should rename and re-label the issue properly (no 0.19 tag then, and affecting only the components doing S3 uploads: sidecar, compact, receive, ruler?).
Feel free to rename to something more fitting the issue.

@bwplotka changed the title from "sidecar (+ compact and store): v0.19.0-rc.1 - Huge usage of memory to upload blocks to S3 compared to 0.18.0" to ".*: v0.19.0-rc.1 - Huge usage of memory to upload blocks to S3 compared to 0.18.0" on Mar 24, 2021
bwplotka added a commit that referenced this issue Mar 24, 2021
…eed.

Fixes: #3917

Long term fix: #3967

Signed-off-by: Bartlomiej Plotka <[email protected]>
bwplotka added a commit that referenced this issue Mar 24, 2021
…eed. (#3968)

Fixes: #3917

Long term fix: #3967

Signed-off-by: Bartlomiej Plotka <[email protected]>
bwplotka added a commit that referenced this issue Mar 26, 2021
…eed. (#3968)

Fixes: #3917

Long term fix: #3967

Signed-off-by: Bartlomiej Plotka <[email protected]>
# Conflicts:
#	go.sum
kakkoyun pushed a commit that referenced this issue Mar 26, 2021
* tools: Fix partial and empty matchers in rewrite (#3891)

* Fix partial and empty matchers match

Signed-off-by: yeya24 <[email protected]>

* add testcase for non-equal matchers

Signed-off-by: yeya24 <[email protected]>

* v0.19.0 patch: Added receive benchmark; Fixed Receiver excessive mem usage introduced in 0.17 (#3943)

* Added receive benchmark, baseline.

```
goos: linux
goarch: amd64
pkg: github.com/thanos-io/thanos/pkg/receive
BenchmarkHandlerReceiveHTTP
BenchmarkHandlerReceiveHTTP/typical_labels_under_1KB,_500_of_them.
BenchmarkHandlerReceiveHTTP/typical_labels_under_1KB,_500_of_them./OK
BenchmarkHandlerReceiveHTTP/typical_labels_under_1KB,_500_of_them./OK-12      	   22260	   1550152 ns/op	 1380340 B/op	    6093 allocs/op
BenchmarkHandlerReceiveHTTP/typical_labels_under_1KB,_500_of_them./conflict_errors
BenchmarkHandlerReceiveHTTP/typical_labels_under_1KB,_500_of_them./conflict_errors-12         	    6619	   6430408 ns/op	 4522487 B/op	   26118 allocs/op
BenchmarkHandlerReceiveHTTP/typical_labels_under_1KB,_5000_of_them.
BenchmarkHandlerReceiveHTTP/typical_labels_under_1KB,_5000_of_them./OK
BenchmarkHandlerReceiveHTTP/typical_labels_under_1KB,_5000_of_them./OK-12                     	    2695	  17208794 ns/op	15072963 B/op	   60441 allocs/op
BenchmarkHandlerReceiveHTTP/typical_labels_under_1KB,_5000_of_them./conflict_errors
BenchmarkHandlerReceiveHTTP/typical_labels_under_1KB,_5000_of_them./conflict_errors-12        	     474	  72533286 ns/op	46396932 B/op	  260141 allocs/op
BenchmarkHandlerReceiveHTTP/extremely_large_label_value_10MB,_10_of_them
BenchmarkHandlerReceiveHTTP/extremely_large_label_value_10MB,_10_of_them/OK
BenchmarkHandlerReceiveHTTP/extremely_large_label_value_10MB,_10_of_them/OK-12                	     270	 137050518 ns/op	226595379 B/op	     132 allocs/op
BenchmarkHandlerReceiveHTTP/extremely_large_label_value_10MB,_10_of_them/conflict_errors
BenchmarkHandlerReceiveHTTP/extremely_large_label_value_10MB,_10_of_them/conflict_errors-12   	      21	1616025443 ns/op	698724321 B/op	     408 allocs/op
PASS

Process finished with exit code 0
```


Signed-off-by: Bartlomiej Plotka <[email protected]>

* Copy labels.

```
GOROOT=/home/bwplotka/.gvm/gos/go1.15 #gosetup
GOPATH=/home/bwplotka/Repos/thanosgopath #gosetup
/home/bwplotka/.gvm/gos/go1.15/bin/go test -c -o /tmp/___BenchmarkHandlerReceiveHTTP_in_github_com_thanos_io_thanos_pkg_receive github.com/thanos-io/thanos/pkg/receive #gosetup
/tmp/___BenchmarkHandlerReceiveHTTP_in_github_com_thanos_io_thanos_pkg_receive -test.v -test.bench ^\QBenchmarkHandlerReceiveHTTP\E$ -test.run ^$ -test.benchmem -test.benchtime=30s
goos: linux
goarch: amd64
pkg: github.com/thanos-io/thanos/pkg/receive
BenchmarkHandlerReceiveHTTP
BenchmarkHandlerReceiveHTTP/typical_labels_under_1KB,_500_of_them
BenchmarkHandlerReceiveHTTP/typical_labels_under_1KB,_500_of_them/OK
BenchmarkHandlerReceiveHTTP/typical_labels_under_1KB,_500_of_them/OK-12      	   25887	   1537262 ns/op	 1380023 B/op	    6092 allocs/op
BenchmarkHandlerReceiveHTTP/typical_labels_under_1KB,_500_of_them/conflict_errors
BenchmarkHandlerReceiveHTTP/typical_labels_under_1KB,_500_of_them/conflict_errors-12         	    4237	   7547968 ns/op	 4522583 B/op	   26118 allocs/op
BenchmarkHandlerReceiveHTTP/typical_labels_under_1KB,_5000_of_them
BenchmarkHandlerReceiveHTTP/typical_labels_under_1KB,_5000_of_them/OK
BenchmarkHandlerReceiveHTTP/typical_labels_under_1KB,_5000_of_them/OK-12                     	    2205	  16513380 ns/op	15071092 B/op	   60420 allocs/op
BenchmarkHandlerReceiveHTTP/typical_labels_under_1KB,_5000_of_them/conflict_errors
BenchmarkHandlerReceiveHTTP/typical_labels_under_1KB,_5000_of_them/conflict_errors-12        	     525	  67278233 ns/op	46396645 B/op	  260141 allocs/op
BenchmarkHandlerReceiveHTTP/extremely_large_label_value_10MB,_10_of_them
BenchmarkHandlerReceiveHTTP/extremely_large_label_value_10MB,_10_of_them/OK
BenchmarkHandlerReceiveHTTP/extremely_large_label_value_10MB,_10_of_them/OK-12               	     285	 148049189 ns/op	226596168 B/op	     132 allocs/op
BenchmarkHandlerReceiveHTTP/extremely_large_label_value_10MB,_10_of_them/conflict_errors
BenchmarkHandlerReceiveHTTP/extremely_large_label_value_10MB,_10_of_them/conflict_errors-12  	      20	1731361499 ns/op	698722550 B/op	     401 allocs/op
PASS

Process finished with exit code 0

```

Signed-off-by: Bartlomiej Plotka <[email protected]>

* Addded bench.,

Signed-off-by: Bartlomiej Plotka <[email protected]>

* Fix.

Signed-off-by: Bartlomiej Plotka <[email protected]>

* Improved API.

Signed-off-by: Bartlomiej Plotka <[email protected]>

* Changelog.

Signed-off-by: Bartlomiej Plotka <[email protected]>

* Addressed Lucas comments.

Signed-off-by: Bartlomiej Plotka <[email protected]>
# Conflicts:
#	CHANGELOG.md

* compact: clean up directories thoroughly (#3869)

* compact: clean up directories properly

I couldn't stop thinking about this code for some reason and I have
figured that I had missed one case in #3031. We need to also clean up
the directories in compaction groups. A compaction could fail leaving
some new directory with a random ULID on the disk. Before attempting to
do another compaction loop, we need to remove it as well because the
compaction process always produces a new, unique directory.

Signed-off-by: Giedrius Statkevičius <[email protected]>

* Remove all non expected dirs.

Signed-off-by: Bartlomiej Plotka <[email protected]>

* Addressed comment.

Signed-off-by: Bartlomiej Plotka <[email protected]>

Co-authored-by: Bartlomiej Plotka <[email protected]>

* Added benchmark, Moved minio-deps to fork without race fix we don't need. (#3968)

Fixes: #3917

Long term fix: #3967

Signed-off-by: Bartlomiej Plotka <[email protected]>
# Conflicts:
#	go.sum

* Changelog fix.

Signed-off-by: Bartlomiej Plotka <[email protected]>

* Build fix.

Signed-off-by: Bartlomiej Plotka <[email protected]>

Co-authored-by: Ben Ye <[email protected]>
Co-authored-by: Giedrius Statkevičius <[email protected]>
bwplotka added a commit that referenced this issue Apr 9, 2021
* Cut v0.19.0-rc.0 (#3860)

Signed-off-by: Bartlomiej Plotka <[email protected]>

* Fixed missed changelog comments. (#3865)

Missed from: #3860

Signed-off-by: Bartlomiej Plotka <[email protected]>

* rename Metadata API to MetricMetadata API (#3877)

Signed-off-by: yeya24 <[email protected]>

* Fix parseStep bug introduced in PR #3740 (#3887) (#3889)

* Fix bug introduced in PR #3740

Signed-off-by: Hitanshu Mehta <[email protected]>

* Minor change in test

Signed-off-by: Hitanshu Mehta <[email protected]>

Co-authored-by: Hitanshu Mehta <[email protected]>

* Tools: rewrite delete can delete series only if it matches all matchers (#3886) (#3890)

* rewrite delete should delete a series only if it matches all matchers in a deletion request

Signed-off-by: yeya24 <[email protected]>

* add test case

Signed-off-by: yeya24 <[email protected]>

Co-authored-by: Ben Ye <[email protected]>

* cmd/thanos/receive.go: Receive client infers TLS (#3899)

Currently, the thanos Receiver infers whether it should use TLS for
the gRPC clients that forwards time series to other receivers based on
whether the remote-write HTTP server uses TLS. This is not correct, as
the HTTP server may use TLS without the gRPC server using TLS. This
commit fixes the inference.

Longer-term, this will be taken care of by the receive/router split.

Signed-off-by: Lucas Servén Marín <[email protected]>

* Cut v0.19.0-rc.1 (#3900)

Signed-off-by: Bartlomiej Plotka <[email protected]>

* tools: Fix partial and empty matchers in rewrite (#3891)

* Fix partial and empty matchers match

Signed-off-by: yeya24 <[email protected]>

* add testcase for non-equal matchers

Signed-off-by: yeya24 <[email protected]>

* v0.19.0 patch: Added receive benchmark; Fixed Receiver excessive mem usage introduced in 0.17 (#3943)

* Added receive benchmark, baseline.

```
goos: linux
goarch: amd64
pkg: github.com/thanos-io/thanos/pkg/receive
BenchmarkHandlerReceiveHTTP
BenchmarkHandlerReceiveHTTP/typical_labels_under_1KB,_500_of_them.
BenchmarkHandlerReceiveHTTP/typical_labels_under_1KB,_500_of_them./OK
BenchmarkHandlerReceiveHTTP/typical_labels_under_1KB,_500_of_them./OK-12      	   22260	   1550152 ns/op	 1380340 B/op	    6093 allocs/op
BenchmarkHandlerReceiveHTTP/typical_labels_under_1KB,_500_of_them./conflict_errors
BenchmarkHandlerReceiveHTTP/typical_labels_under_1KB,_500_of_them./conflict_errors-12         	    6619	   6430408 ns/op	 4522487 B/op	   26118 allocs/op
BenchmarkHandlerReceiveHTTP/typical_labels_under_1KB,_5000_of_them.
BenchmarkHandlerReceiveHTTP/typical_labels_under_1KB,_5000_of_them./OK
BenchmarkHandlerReceiveHTTP/typical_labels_under_1KB,_5000_of_them./OK-12                     	    2695	  17208794 ns/op	15072963 B/op	   60441 allocs/op
BenchmarkHandlerReceiveHTTP/typical_labels_under_1KB,_5000_of_them./conflict_errors
BenchmarkHandlerReceiveHTTP/typical_labels_under_1KB,_5000_of_them./conflict_errors-12        	     474	  72533286 ns/op	46396932 B/op	  260141 allocs/op
BenchmarkHandlerReceiveHTTP/extremely_large_label_value_10MB,_10_of_them
BenchmarkHandlerReceiveHTTP/extremely_large_label_value_10MB,_10_of_them/OK
BenchmarkHandlerReceiveHTTP/extremely_large_label_value_10MB,_10_of_them/OK-12                	     270	 137050518 ns/op	226595379 B/op	     132 allocs/op
BenchmarkHandlerReceiveHTTP/extremely_large_label_value_10MB,_10_of_them/conflict_errors
BenchmarkHandlerReceiveHTTP/extremely_large_label_value_10MB,_10_of_them/conflict_errors-12   	      21	1616025443 ns/op	698724321 B/op	     408 allocs/op
PASS

Process finished with exit code 0
```


Signed-off-by: Bartlomiej Plotka <[email protected]>

* Copy labels.

```
GOROOT=/home/bwplotka/.gvm/gos/go1.15 #gosetup
GOPATH=/home/bwplotka/Repos/thanosgopath #gosetup
/home/bwplotka/.gvm/gos/go1.15/bin/go test -c -o /tmp/___BenchmarkHandlerReceiveHTTP_in_github_com_thanos_io_thanos_pkg_receive github.com/thanos-io/thanos/pkg/receive #gosetup
/tmp/___BenchmarkHandlerReceiveHTTP_in_github_com_thanos_io_thanos_pkg_receive -test.v -test.bench ^\QBenchmarkHandlerReceiveHTTP\E$ -test.run ^$ -test.benchmem -test.benchtime=30s
goos: linux
goarch: amd64
pkg: github.com/thanos-io/thanos/pkg/receive
BenchmarkHandlerReceiveHTTP
BenchmarkHandlerReceiveHTTP/typical_labels_under_1KB,_500_of_them
BenchmarkHandlerReceiveHTTP/typical_labels_under_1KB,_500_of_them/OK
BenchmarkHandlerReceiveHTTP/typical_labels_under_1KB,_500_of_them/OK-12      	   25887	   1537262 ns/op	 1380023 B/op	    6092 allocs/op
BenchmarkHandlerReceiveHTTP/typical_labels_under_1KB,_500_of_them/conflict_errors
BenchmarkHandlerReceiveHTTP/typical_labels_under_1KB,_500_of_them/conflict_errors-12         	    4237	   7547968 ns/op	 4522583 B/op	   26118 allocs/op
BenchmarkHandlerReceiveHTTP/typical_labels_under_1KB,_5000_of_them
BenchmarkHandlerReceiveHTTP/typical_labels_under_1KB,_5000_of_them/OK
BenchmarkHandlerReceiveHTTP/typical_labels_under_1KB,_5000_of_them/OK-12                     	    2205	  16513380 ns/op	15071092 B/op	   60420 allocs/op
BenchmarkHandlerReceiveHTTP/typical_labels_under_1KB,_5000_of_them/conflict_errors
BenchmarkHandlerReceiveHTTP/typical_labels_under_1KB,_5000_of_them/conflict_errors-12        	     525	  67278233 ns/op	46396645 B/op	  260141 allocs/op
BenchmarkHandlerReceiveHTTP/extremely_large_label_value_10MB,_10_of_them
BenchmarkHandlerReceiveHTTP/extremely_large_label_value_10MB,_10_of_them/OK
BenchmarkHandlerReceiveHTTP/extremely_large_label_value_10MB,_10_of_them/OK-12               	     285	 148049189 ns/op	226596168 B/op	     132 allocs/op
BenchmarkHandlerReceiveHTTP/extremely_large_label_value_10MB,_10_of_them/conflict_errors
BenchmarkHandlerReceiveHTTP/extremely_large_label_value_10MB,_10_of_them/conflict_errors-12  	      20	1731361499 ns/op	698722550 B/op	     401 allocs/op
PASS

Process finished with exit code 0

```

Signed-off-by: Bartlomiej Plotka <[email protected]>

* Addded bench.,

Signed-off-by: Bartlomiej Plotka <[email protected]>

* Fix.

Signed-off-by: Bartlomiej Plotka <[email protected]>

* Improved API.

Signed-off-by: Bartlomiej Plotka <[email protected]>

* Changelog.

Signed-off-by: Bartlomiej Plotka <[email protected]>

* Addressed Lucas comments.

Signed-off-by: Bartlomiej Plotka <[email protected]>

* compact: clean up directories thoroughly (#3869)

* compact: clean up directories properly

I couldn't stop thinking about this code for some reason and I have
figured that I had missed one case in #3031. We need to also clean up
the directories in compaction groups. A compaction could fail leaving
some new directory with a random ULID on the disk. Before attempting to
do another compaction loop, we need to remove it as well because the
compaction process always produces a new, unique directory.

Signed-off-by: Giedrius Statkevičius <[email protected]>

* Remove all non expected dirs.

Signed-off-by: Bartlomiej Plotka <[email protected]>

* Addressed comment.

Signed-off-by: Bartlomiej Plotka <[email protected]>

Co-authored-by: Bartlomiej Plotka <[email protected]>

* Added benchmark, Moved minio-deps to fork without race fix we don't need. (#3968)

Fixes: #3917

Long term fix: #3967

Signed-off-by: Bartlomiej Plotka <[email protected]>

* Cut v0.19.0-rc.2 (#3969)

Signed-off-by: Bartlomiej Plotka <[email protected]>

* pkg/rules: fix deduplication of equal alerts with different labels (#3960) (#3999)

Currently, if an alerting rule having the same name with different
severity labels is being returned from different replicas then they
are being treated as separate alerts.

Given the following alerts a1,a2 with severities s1,s2 returned from
replicas r1,2:

a1[s1,r1]
a1[s2,r1]
a1[s1,r2]
a1[s2,r2]

Then, currently, the algorithm deduplicates to:

a1[s1]
a1[s2]
a1[s1]
a1[s2]

Instead of the intendet result:

a1[s1]
a1[s2]

This fixes it by removing replica labels before sorting labels for
deduplication.
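
The idea described above can be sketched in a few lines. This is a simplified illustration, not the actual pkg/rules code: the `alert` type, the `key` and `dedup` helpers, and the label names `severity` and `replica` are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// alert is a hypothetical, simplified stand-in for an alerting rule result.
type alert struct {
	name   string
	labels map[string]string
}

// key builds a deduplication key from the alert name and its labels.
// The replica label is dropped first, then the remaining labels are sorted.
func key(a alert, replicaLabel string) string {
	pairs := make([]string, 0, len(a.labels))
	for k, v := range a.labels {
		if k == replicaLabel {
			continue
		}
		pairs = append(pairs, k+"="+v)
	}
	sort.Strings(pairs)
	return a.name + "{" + strings.Join(pairs, ",") + "}"
}

// dedup keeps the first alert seen for every key.
func dedup(alerts []alert, replicaLabel string) []string {
	seen := map[string]bool{}
	var out []string
	for _, a := range alerts {
		k := key(a, replicaLabel)
		if !seen[k] {
			seen[k] = true
			out = append(out, k)
		}
	}
	return out
}

func main() {
	alerts := []alert{
		{"a1", map[string]string{"severity": "s1", "replica": "r1"}},
		{"a1", map[string]string{"severity": "s2", "replica": "r1"}},
		{"a1", map[string]string{"severity": "s1", "replica": "r2"}},
		{"a1", map[string]string{"severity": "s2", "replica": "r2"}},
	}
	fmt.Println(dedup(alerts, "replica"))
}
```

With the replica label removed before the key is built, the four inputs collapse to the two intended alerts, one per severity.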

Signed-off-by: Sergiusz Urbaniak <[email protected]>
# Conflicts:
#	CHANGELOG.md

Co-authored-by: Sergiusz Urbaniak <[email protected]>

* Cut v0.19.0 (#3998)

Signed-off-by: Bartlomiej Plotka <[email protected]>

* Fixed VERSION file.

Signed-off-by: Bartlomiej Plotka <[email protected]>

Co-authored-by: Ben Ye <[email protected]>
Co-authored-by: Hitanshu Mehta <[email protected]>
Co-authored-by: Lucas Servén Marín <[email protected]>
Co-authored-by: Giedrius Statkevičius <[email protected]>
Co-authored-by: Sergiusz Urbaniak <[email protected]>
openshift-merge-robot pushed a commit to stolostron/thanos that referenced this issue Apr 12, 2021
* CHANGELOG.md: add v0.16.0 link (thanos-io#3697)

This small commit fixes the changelog entry for release v0.16.0 so that
the section title includes a link to the actual release.

Signed-off-by: Lucas Servén Marín <[email protected]>

* Delete deletion-mark.json at last when deleting a block (thanos-io#3661)

* Delete deletion-mark.json at last when deleting a block

Signed-off-by: Marco Pracucci <[email protected]>

* Fixed linter

Signed-off-by: Marco Pracucci <[email protected]>

* fixed broken links (thanos-io#3645)

Signed-off-by: Namanl2001 <[email protected]>

* chore(tutorials): fix some broken links in 2-lts (thanos-io#3702)

Signed-off-by: Bradley <[email protected]>

* Fix race condition in BinaryReader.LookupSymbol() (thanos-io#3705)

* Fix race condition in BinaryReader.LookupSymbol()

Signed-off-by: Marco Pracucci <[email protected]>

* Fixed BinaryReader receiver and added CHANGELOG entry

Signed-off-by: Marco Pracucci <[email protected]>

* ui: make old bucket viewer UI work with vanilla blocks (thanos-io#3700)

* ui: make old UI work with vanilla blocks

Make the old bucket viewer UI work with vanilla Prometheus blocks by
checking whether the `Thanos` part exists before formatting the HTML.

Signed-off-by: Giedrius Statkevičius <[email protected]>

* CHANGELOG: update

Signed-off-by: Giedrius Statkevičius <[email protected]>

* ui/bucket: fix according to @squat's suggestions

Signed-off-by: Giedrius Statkevičius <[email protected]>

* ui: update bindata after latest changes

Signed-off-by: Giedrius Statkevičius <[email protected]>

* Query-frontend CMD: use detailed text description

Signed-off-by: dmaiocchi <[email protected]>
Co-authored-by: Bartlomiej Plotka <[email protected]>

* Upgrade Cortex and remove SwiftConfig (thanos-io#3708)

Signed-off-by: Marco Pracucci <[email protected]>

* chore(deps): upgrading hugo (v0.80.0) fixes thanos-io#3653 (thanos-io#3714)

Signed-off-by: Bradley <[email protected]>

* store: Make more S3 http.Transport settings configurable. (thanos-io#3657)

The http.Transport is not auto-tuning and one size does not seem to fit
all cases. In order to respond to a query a store gateway might need to
fetch a large (thousands) of postings, series and chunks.

While the number of idle connections has been increased recently it can
still be too low or expose bursty (opening, closing) behavior. Allow to
tune most of the http.Transport parameters. I considered embedding the
full Transport but there is already a Transport member.

Signed-off-by: Holger Hans Peter Freyther <[email protected]>

* added definiton of chunk and referenced it in the docs (thanos-io#3629)

* Update design.md

Update store.md

Update design.md

Update storage.md

Update troubleshooting.md

Signed-off-by: Biswajit Ghosh <[email protected]>

* Update design.md

Update store.md

Update design.md

Update troubleshooting.md

Update storage.md

Signed-off-by: Biswajit Ghosh <[email protected]>

* Upgraded to newest bingo. (thanos-io#3718)

Signed-off-by: Bartlomiej Plotka <[email protected]>

* Update help messages with a consistent use of capitals and periods. (thanos-io#3727)

* Update help messages with a consistent use of capitals and periods.

Signed-off-by: Matt Whitney <[email protected]>

* Update documentation to reflect message changes.

Signed-off-by: Matt Whitney <[email protected]>

* Run `make docs` to correct the documentation rather than manual edits.

Signed-off-by: Matt Whitney <[email protected]>

* docs: update examples/dashboards/dashboards.md (thanos-io#3733)

Signed-off-by: Mert Acikportali <[email protected]>

* pkg/errutil: correct the multierror file suffix (thanos-io#3736)

This commit changes the file suffix from .go.go to simply .go.

Signed-off-by: Lucas Servén Marín <[email protected]>

* Fixed website and added new step for release process (tmp). (thanos-io#3738)

Signed-off-by: Bartlomiej Plotka <[email protected]>

* api: Added Proposal for Discovery Endpoint idea (InfoAPI) (thanos-io#3703)

* docs/proposals/210701_endpoint_discovery.md: Add initial proposal

Signed-off-by: Lili Cosic <[email protected]>

* Update docs/proposals/210701_endpoint_discovery.md

Co-authored-by: Bartlomiej Plotka <[email protected]>

Co-authored-by: Bartlomiej Plotka <[email protected]>

* Introduce "block-meta-fetch-concurrency" flag for compact and store (thanos-io#3752)

* Introduce "block-meta-fetch-concurrency" flag for compact and store

Furthermore consolidate the fetcher concurrency by moving the
fetcherConcurrency const to the fetcher code of the block package.

Signed-off-by: Johannes Frey <[email protected]>

* Document flags

Signed-off-by: Johannes Frey <[email protected]>

* Add `--query-range.request-downsampled` flag to Query Frontend (thanos-io#2641) (thanos-io#3723)

* Add `--query-range.request-downsampled` flag to Query Frontend (thanos-io#2641)

Signed-off-by: Vladimir Kononov <[email protected]>

* Apply suggestions from code review

Co-authored-by: Bartlomiej Plotka <[email protected]>
Signed-off-by: Vladimir Kononov <[email protected]>

Co-authored-by: Bartlomiej Plotka <[email protected]>

* Pinned to newer busybox on quay. (thanos-io#3762)

Signed-off-by: Bartlomiej Plotka <[email protected]>

* Fix panic on concurrent index-header lazy reader usage and unload (thanos-io#3760)

* Fix panic on concurrent index-header lazy reader usage and unload

Signed-off-by: Marco Pracucci <[email protected]>

* Addressed review comments

Signed-off-by: Marco Pracucci <[email protected]>

* Addressed review comments

Signed-off-by: Marco Pracucci <[email protected]>

* Fixed Planner.Plan() description (thanos-io#3767)

Signed-off-by: Marco Pracucci <[email protected]>

* Fixed panic on concurrent index-header lazy load / unload (thanos-io#3759)

Signed-off-by: Marco Pracucci <[email protected]>

* Pad compaction planner size check (thanos-io#3773)

Pad the index size check in the compaction planner by 15% to avoid
situations where the sum of index bytes bloats to be larger than the
sum of the original index sizes.

* thanos-io#3724
* thanos-io#3750
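
The padding described above can be sketched as follows. This is only an illustration: `exceedsIndexLimit` and `paddingPercent` are hypothetical names, and the real planner code may apply the padding differently.

```go
package main

import "fmt"

// paddingPercent mirrors the 15% mentioned in the commit message.
const paddingPercent = 15

// exceedsIndexLimit (hypothetical helper) sums the source index sizes, pads
// the sum by paddingPercent, and reports whether the padded value is over limit.
func exceedsIndexLimit(indexSizes []int64, limit int64) bool {
	var total int64
	for _, s := range indexSizes {
		total += s
	}
	padded := total + total*paddingPercent/100
	return padded > limit
}

func main() {
	fmt.Println(exceedsIndexLimit([]int64{50, 40}, 100)) // true: 90 is padded to 103
	fmt.Println(exceedsIndexLimit([]int64{40, 40}, 100)) // false: 80 is padded to 92
}
```

Padding the estimate makes the check conservative: a group whose raw sum is just under the limit is rejected, in case the compacted index turns out larger than its inputs.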

Signed-off-by: Ben Kochie <[email protected]>

* pkg/server/http: Allow passing http.ServeMux with a server option (thanos-io#3769)

Signed-off-by: Matthias Loibl <[email protected]>

* Update Katacoda Official Link (thanos-io#3768)

Signed-off-by: soniasingla <[email protected]>

* Nominate kakkoyun for the v0.20.0 release (thanos-io#3778)

Signed-off-by: Kemal Akkoyun <[email protected]>

* Fix typo in query component documentation for store api link (thanos-io#3675)

* Fix wrong store api link

Signed-off-by: Junyoung, Sung <[email protected]>

* Fix store api link to absolute

Signed-off-by: Junyoung, Sung <[email protected]>

* e2e: Resolved Flaky tests (thanos-io#3777)

* added retry function

Signed-off-by: Abhishek357 <[email protected]>

* fixed CI test

Signed-off-by: Abhishek357 <[email protected]>

* restructured code

Signed-off-by: Abhishek357 <[email protected]>

* Add an exempt label to ignore issue that has a PR

Signed-off-by: Kemal Akkoyun <[email protected]>

* in memory cache for caching bucket (thanos-io#3579)

* in memory cache for caching bucket

Signed-off-by: Sudharshann D <[email protected]>

* addressing comments

Signed-off-by: Sudharshann D <[email protected]>

* Introduce "allow overlapping blocks" flag for Thanos receiver (thanos-io#3792)

* add tsdb.allow-overlapping-blocks flag to receiver

Signed-off-by: Max Chandler <[email protected]>

* make format

Signed-off-by: Max Chandler <[email protected]>

* cleanup whitespace

Signed-off-by: Max Chandler <[email protected]>

* cleanup whitespace

Signed-off-by: Max Chandler <[email protected]>

* make docs

Signed-off-by: Max Chandler <[email protected]>

* fix changelog conflict

Signed-off-by: Max Chandler <[email protected]>

Co-authored-by: Matt Lawrence <[email protected]>

* Reduce memory allocations in bucketBlock.readChunkRange() (thanos-io#3796)

Signed-off-by: Marco Pracucci <[email protected]>

* update prometheus version

Signed-off-by: Mauro Stettler <[email protected]>

* update cortex

Signed-off-by: Mauro Stettler <[email protected]>

* add new property to LabelValuesRequest

Signed-off-by: Mauro Stettler <[email protected]>

* set label matchers in LabelValueRequest

Signed-off-by: Mauro Stettler <[email protected]>

* make docs

Signed-off-by: Mauro Stettler <[email protected]>

* Allow to customise S3 SSE on a per-request basis (thanos-io#3783)

* Allow to customise S3 SSE on a per-request basis

Signed-off-by: Marco Pracucci <[email protected]>

* Addressed review comments

Signed-off-by: Marco Pracucci <[email protected]>

* Refactoring: allow to pass BytesPool to store.NewBucketStore() (thanos-io#3801)

Signed-off-by: Marco Pracucci <[email protected]>

* Allow to customise the partitioner used by the BucketStore (thanos-io#3802)

Signed-off-by: Marco Pracucci <[email protected]>

* Fix button display when there are no panels (thanos-io#3694)

Signed-off-by: Namanl2001 <[email protected]>

* fix tests

Signed-off-by: Mauro Stettler <[email protected]>

* Allow downstream projects to customise the Partitioner (thanos-io#3808)

Signed-off-by: Marco Pracucci <[email protected]>

* Makefile: Fix the command to find React source files (thanos-io#3805)

Signed-off-by: Hitanshu Mehta <[email protected]>

* Merge release 0.18 (thanos-io#3809)

* CHANGELOG.md: release v0.18.0

Signed-off-by: Lucas Servén Marín <[email protected]>

* VERSION,tutorials: bump versions

Signed-off-by: Lucas Servén Marín <[email protected]>

* Revert "VERSION,tutorials: bump versions"

This reverts commit 5a27c70.
The "thanos:" prefix for the container images was accidentally removed.

Signed-off-by: Lucas Servén Marín <[email protected]>

* VERSION,tutorials: bump version for all images

Signed-off-by: Lucas Servén Marín <[email protected]>

* CHANGELOG: bump release date

Signed-off-by: Lucas Servén Marín <[email protected]>

* pkg/rules/proxy: fix hotlooping when receiving client errors

Currently, if we receive an error from the underlying client stream,
we continue with trying to receive additional data.
This causes a hotloop as we will receive the same error again.
This fixes it by returning in the error case and adds a unit test for the proxy logic.
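
A minimal sketch of the pattern, not the actual `pkg/rules/proxy` code: `stream`, `failingStream` and `drain` are hypothetical stand-ins for the gRPC client stream and the receive loop.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// stream is a hypothetical stand-in for a gRPC client stream.
type stream interface {
	Recv() (string, error)
}

// failingStream returns the same error on every Recv call,
// the way a broken client stream would.
type failingStream struct{ calls int }

func (s *failingStream) Recv() (string, error) {
	s.calls++
	return "", errors.New("client stream broken")
}

// drain reads until io.EOF. Any other error is returned immediately
// instead of being retried, which is what prevents the hot loop.
func drain(s stream) ([]string, error) {
	var out []string
	for {
		msg, err := s.Recv()
		if errors.Is(err, io.EOF) {
			return out, nil
		}
		if err != nil {
			return out, fmt.Errorf("receive from stream: %w", err)
		}
		out = append(out, msg)
	}
}

func main() {
	s := &failingStream{}
	_, err := drain(s)
	fmt.Println(err, s.calls)
}
```

With a `continue` in place of the second `return`, the loop would call `Recv` again, get the same error, and spin forever.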

Fixes thanos-io#3717

Signed-off-by: Sergiusz Urbaniak <[email protected]>

* CHANGELOG.md: fix changelog to incorporate new fix (thanos-io#3737)

This commit fixes the order of the changelog to properly reflect a
recent change.

Signed-off-by: Lucas Servén Marín <[email protected]>

* Fixed website and added new step for release process (tmp). (thanos-io#3738)

Signed-off-by: Bartlomiej Plotka <[email protected]>

* CHANGELOG.md: update v0.18.0 release date

Signed-off-by: Lucas Servén Marín <[email protected]>

Co-authored-by: Sergiusz Urbaniak <[email protected]>
Co-authored-by: Bartlomiej Plotka <[email protected]>

* reloader: try to fix test flakiness (thanos-io#3798)

This is an attempt at fixing the `TestReloader_DirectoriesApply` test
flakiness. I call this an attempt because I cannot reproduce it locally.
However, I have noticed that during runs where this fails the logs look
like this:

```
--- FAIL: TestReloader_DirectoriesApply (3.04s)
    reloader_test.go:256: Performing step number 0
    reloader_test.go:256: Performing step number 1
    reloader_test.go:256: Performing step number 2
    reloader_test.go:256: Performing step number 3
    reloader_test.go:256: Performing step number 4
    reloader_test.go:256: Performing step number 6
    reloader_test.go:343: reloader_test.go:343:

        	exp: 5

        	got: 6
```

It immediately jumps to another value. This gave me a hint and I think
that this is happening because potentially `i` can be written to/read
from by multiple goroutines. On very resource constrained systems like
the CircleCI runners, it could just happen that the `if` doesn't do what
it is supposed to.

Try to fix this problem by protecting the whole HTTP handler with a
mutex. Since this is only a test and not a performance critical path, I
think this is a reasonable change to do.

Add extra check for reload failures.
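
As a rough illustration of protecting a test handler with a mutex (the `countingHandler` type below is hypothetical and much simpler than the handler in `reloader_test.go`):

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync"
)

// countingHandler is a hypothetical test handler that counts reload requests.
// The mutex guards the whole handler body, so concurrent requests cannot
// interleave their reads and writes of the counter.
type countingHandler struct {
	mu      sync.Mutex
	reloads int
}

func (h *countingHandler) ServeHTTP(w http.ResponseWriter, _ *http.Request) {
	h.mu.Lock()
	defer h.mu.Unlock()
	h.reloads++
	w.WriteHeader(http.StatusOK)
}

// Reloads returns the counter under the same lock.
func (h *countingHandler) Reloads() int {
	h.mu.Lock()
	defer h.mu.Unlock()
	return h.reloads
}

// hitConcurrently calls the handler from n goroutines and waits for all of them.
func hitConcurrently(h http.Handler, n int) {
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			req := httptest.NewRequest(http.MethodPost, "/-/reload", nil)
			h.ServeHTTP(httptest.NewRecorder(), req)
		}()
	}
	wg.Wait()
}

func main() {
	h := &countingHandler{}
	hitConcurrently(h, 50)
	fmt.Println(h.Reloads()) // 50
}
```

Without the lock, `go test -race` would be expected to flag the unsynchronised counter, and the observed value could skip steps as in the log above.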

Signed-off-by: Giedrius Statkevičius <[email protected]>

* addressing PR comments

Signed-off-by: Mauro Stettler <[email protected]>

* Block Viewer: Move overlapping blocks to separate rows (thanos-io#3729)

* Block Viewer: overlapping blocks issue resolved

Signed-off-by: Namanl2001 <[email protected]>

* small changes and make assets

Signed-off-by: Namanl2001 <[email protected]>

* Add flag to set default step

Signed-off-by: Hitanshu Mehta <[email protected]>

* Add support to use default step in classic ui

Signed-off-by: Hitanshu Mehta <[email protected]>

* Add support to use default step in new ui

Signed-off-by: Hitanshu Mehta <[email protected]>

* Lint fixes

Signed-off-by: Hitanshu Mehta <[email protected]>

* change ParseStep function to be consistent with ui

Signed-off-by: Hitanshu Mehta <[email protected]>

* Improve description of query.default-step flag

Signed-off-by: Hitanshu Mehta <[email protected]>

* Minor fixes

Signed-off-by: Hitanshu Mehta <[email protected]>

* Use default value when flag is undefined

Signed-off-by: Hitanshu Mehta <[email protected]>

* minor fixes

Signed-off-by: Hitanshu Mehta <[email protected]>

* Minor fixes

Signed-off-by: Hitanshu Mehta <[email protected]>

* Update changelog

Signed-off-by: Hitanshu Mehta <[email protected]>

* Fix typo

Signed-off-by: Hitanshu Mehta <[email protected]>

* Receive: Improve handling of empty time series from clients (thanos-io#3815)

* exit early if a request has no timeseries data

Signed-off-by: Matt Lawrence <[email protected]>
Co-authored-by: Max Chandler <[email protected]>

* update changelog

Signed-off-by: Matt Lawrence <[email protected]>

* update changelog

Signed-off-by: Matt Lawrence <[email protected]>

* Move constant to right side of comparison

Co-authored-by: Lucas Servén Marín <[email protected]>
Signed-off-by: Matt Lawrence <[email protected]>

Co-authored-by: Max Chandler <[email protected]>
Co-authored-by: Lucas Servén Marín <[email protected]>

* Reduced allocated memory by chunks reader in the store gateway at query time (thanos-io#3814)

* Reduced allocated memory by chunks reader in the store gateway at query time

Signed-off-by: Marco Pracucci <[email protected]>

* Fixed linter issues

Signed-off-by: Marco Pracucci <[email protected]>

* Fixed linter (hopefully)

Signed-off-by: Marco Pracucci <[email protected]>

* Renamed function

Signed-off-by: Marco Pracucci <[email protected]>

* Updated comment

Signed-off-by: Marco Pracucci <[email protected]>

* Updated code comment

Signed-off-by: Marco Pracucci <[email protected]>

* promclient: fix error's message (thanos-io#3824)

Use the provided `method` in the error messages. I got scared reading
the logs that the requests are still being sent using GET instead of
POST which I had specified.

Signed-off-by: Giedrius Statkevičius <[email protected]>

* pkg/ui/react-app: update dependencies (thanos-io#3818)

Fix lodash security issue.

Signed-off-by: Simon Pasquier <[email protected]>

* Add objstore.List() recursive support (thanos-io#3823)

* Add objstore.List() recursive support

Signed-off-by: Marco Pracucci <[email protected]>

* Fixed CachingBucket

Signed-off-by: Marco Pracucci <[email protected]>

* Fix Cortex compilation issue

Signed-off-by: Marco Pracucci <[email protected]>

* Tidy go.mod/sum

Signed-off-by: Marco Pracucci <[email protected]>

* Fix linter

Signed-off-by: Marco Pracucci <[email protected]>

* Fixed TestMetricBucket_Close

Signed-off-by: Marco Pracucci <[email protected]>

* Upgraded Cortex

Signed-off-by: Marco Pracucci <[email protected]>

* Inlined struct initialisation

Signed-off-by: Marco Pracucci <[email protected]>

* Upgrade Cortex again

Signed-off-by: Marco Pracucci <[email protected]>

* Upgrade Cortex (thanos-io#3828)

Signed-off-by: Marco Pracucci <[email protected]>

* Truncated S3 "get object" response is reported as error (thanos-io#3795)

* Add test to ensure S3 truncated response is an error

Signed-off-by: Marco Pracucci <[email protected]>

* Added missing copyright

Signed-off-by: Marco Pracucci <[email protected]>

* Upgraded Minio

Signed-off-by: Marco Pracucci <[email protected]>

* Upgraded Minio again

Signed-off-by: Marco Pracucci <[email protected]>

* Implement federated metric metadata API (thanos-io#3686)

* support federated metadata API

Signed-off-by: Ben Ye <[email protected]>

* update comments

Signed-off-by: yeya24 <[email protected]>

* use parseInt

Signed-off-by: yeya24 <[email protected]>

* address Prem's comments

Signed-off-by: yeya24 <[email protected]>

* update proto comment

Signed-off-by: yeya24 <[email protected]>

* add changelog

Signed-off-by: yeya24 <[email protected]>

* Fixed TestBucketStore_ManyParts_e2e (thanos-io#3841)

https://app.circleci.com/pipelines/github/thanos-io/thanos/5120/workflows/efd2a21d-13b7-4035-99e3-cb1af8023694/jobs/13809

This failure is only visible to members of the Thanos team proposing a PR, due
to the extra tests run against bucket providers.

Signed-off-by: Bartlomiej Plotka <[email protected]>

* Reinforced errutil and fixed an ugly bug which caused an empty []error to be treated as an error (thanos-io#3836)

The previous multi-error implementation could cause a very ugly bug: returning an empty multi-error
that the API should treat as success, not as an error, but which is used as a non-nil error if
.Err() is not invoked.

Once we merge this, we can do a cleaner solution that slightly changes the nesting behaviour: thanos-io#3833

There were 9 places where we had this bug in the handler, because the MultiError lib allowed it.
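
The underlying Go behaviour can be reproduced with a small sketch. The `multiError` type here is a hypothetical simplification, not the Thanos `errutil` implementation:

```go
package main

import (
	"fmt"
	"strings"
)

// multiError is a hypothetical multi-error type: a slice of errors
// that itself satisfies the error interface.
type multiError []error

func (m multiError) Error() string {
	msgs := make([]string, 0, len(m))
	for _, e := range m {
		msgs = append(msgs, e.Error())
	}
	return strings.Join(msgs, "; ")
}

// Err returns nil when no errors were collected.
func (m multiError) Err() error {
	if len(m) == 0 {
		return nil
	}
	return m
}

// buggy returns the collection directly. The resulting interface value
// carries a concrete type, so it compares as non-nil even though it is empty.
func buggy() error {
	var errs multiError
	return errs
}

// fixed goes through Err(), which collapses an empty collection to nil.
func fixed() error {
	var errs multiError
	return errs.Err()
}

func main() {
	fmt.Println(buggy() != nil) // true: empty, yet treated as an error
	fmt.Println(fixed() != nil) // false
}
```

One way to rule this out at compile time is to not let the collection type implement `error` at all, so that the only way to obtain an `error` is through `Err()`.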

Signed-off-by: Bartlomiej Plotka <[email protected]>

* Export metadata fetcher metrics (thanos-io#3660)

Signed-off-by: Marco Pracucci <[email protected]>

* store: Cleaned up API for test/benchmark purposes. (thanos-io#3650)

Signed-off-by: Bartlomiej Plotka <[email protected]>

* Allow Cortex to fully reuse FetcherMetrics (thanos-io#3842)

Signed-off-by: Marco Pracucci <[email protected]>

* downsample: ensure consistent order (thanos-io#3843)

Ensure that we have a consistent order of blocks that we are going to
downsample. `range` over maps doesn't enforce any particular order on
purpose.

This is needed for https://github.com/thanos-io/thanos/pull/3031/files.
ATM in that PR before downsampling we delete all directories which do
not match blocks ULIDs in the remote object storage. Ideally, we should
only keep around the files of a block which we are about to downsample.

It is impossible to do that properly ATM if during another iteration
we'd start from a different block. Thus, let's have a consistent order.
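
For illustration, one common way to get a stable iteration order over a Go map is to collect and sort the keys first. The map layout and `sortedBlockIDs` are assumptions; the actual downsampling code may sort on a different key.

```go
package main

import (
	"fmt"
	"sort"
)

// sortedBlockIDs (hypothetical helper) returns the map keys in lexicographic
// order, so every run visits the blocks in the same sequence.
func sortedBlockIDs(blocks map[string]int64) []string {
	ids := make([]string, 0, len(blocks))
	for id := range blocks {
		ids = append(ids, id)
	}
	sort.Strings(ids)
	return ids
}

func main() {
	// Keys stand in for block ULIDs; values are placeholder sizes.
	blocks := map[string]int64{"01C": 3, "01A": 1, "01B": 2}
	for _, id := range sortedBlockIDs(blocks) {
		fmt.Println(id, blocks[id])
	}
}
```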

Signed-off-by: Giedrius Statkevičius <[email protected]>

* [Fix] Replace unsupported %v macro in logs (thanos-io#3847)

Signed-off-by: Stéphane Brillant <[email protected]>

* Update ui npm dependencies to latest versions (thanos-io#3813)

* Update ui npm deps to latest

Signed-off-by: Saswata Mukherjee <[email protected]>

* Fix linting errors

Signed-off-by: Saswata Mukherjee <[email protected]>

* Change camelcase lint rule

Signed-off-by: Saswata Mukherjee <[email protected]>

* Fix CI configs and scripts to work with main branch (thanos-io#3848)

Signed-off-by: Prem Saraswat <[email protected]>

* block: precalculate hashes if enabled and use them during compaction (downloading) (thanos-io#3031)

* block: precalculate hashes if enabled and use them during compaction

Added the possibility to ignore certain directories in
objstore.{Download,DownloadDir}. Do not download files which have the
same hash as in remote object storage. Wire up `--hash-func` so that
writers could specify what hash function to use when uploading. There is
no performance impact if no hash function has been explicitly specified.
Clean up the removal of files logic in Thanos Compact to ensure we do
not remove something that exists on disk already.

Tested manually + new tests cover all of this more or less.

Signed-off-by: Giedrius Statkevičius <[email protected]>

* block: expose GatherFileStats and use it

Signed-off-by: Giedrius Statkevičius <[email protected]>

* Revert "block: expose GatherFileStats and use it"

This reverts commit 259c70b.

Signed-off-by: Giedrius Statkevičius <[email protected]>

* block: do not calc hash for dirs, add locks

Signed-off-by: Giedrius Statkevičius <[email protected]>

* docs/tools: update

Signed-off-by: Giedrius Statkevičius <[email protected]>

* shipper: pass s.hashFunc

Signed-off-by: Giedrius Statkevičius <[email protected]>

* Fix according to Bartek's comments

Signed-off-by: Giedrius Statkevičius <[email protected]>

* compact: clean up comment

Signed-off-by: Giedrius Statkevičius <[email protected]>

* block: close with log on error

Signed-off-by: Giedrius Statkevičius <[email protected]>

* *: remove unused FNs

Signed-off-by: Giedrius Statkevičius <[email protected]>

* compact: add e2e test for new hash functionality

Signed-off-by: Giedrius Statkevičius <[email protected]>

* Fix according to Bartek's comments

Signed-off-by: Giedrius Statkevičius <[email protected]>

* mixin: Upgrade jsonnet tooling (thanos-io#3855)

* Upgrade jsonnet tooling

Signed-off-by: Kemal Akkoyun <[email protected]>

* Fix bingo version issue

Signed-off-by: Kemal Akkoyun <[email protected]>

* Fix unintended removal issue

Signed-off-by: Kemal Akkoyun <[email protected]>

* Update reference of master to main in docs (thanos-io#3849)

* Update reference of master to main in docs

Signed-off-by: Prem Saraswat <[email protected]>

* Update image tag in tutorials/kubernetes-helm

Signed-off-by: Prem Saraswat <[email protected]>

* Cut v0.19.0-rc.0 (thanos-io#3860)

Signed-off-by: Bartlomiej Plotka <[email protected]>

* Fixed missed changelog comments. (thanos-io#3865)

Missed from: thanos-io#3860

Signed-off-by: Bartlomiej Plotka <[email protected]>

* rename Metadata API to MetricMetadata API (thanos-io#3877)

Signed-off-by: yeya24 <[email protected]>

* Fix parseStep bug introduced in PR thanos-io#3740 (thanos-io#3887) (thanos-io#3889)

* Fix bug introduced in PR thanos-io#3740

Signed-off-by: Hitanshu Mehta <[email protected]>

* Minor change in test

Signed-off-by: Hitanshu Mehta <[email protected]>

Co-authored-by: Hitanshu Mehta <[email protected]>

* Tools: rewrite delete can delete series only if it matches all matchers (thanos-io#3886) (thanos-io#3890)

* rewrite delete should delete a series only if it matches all matchers in a deletion request

Signed-off-by: yeya24 <[email protected]>

* add test case

Signed-off-by: yeya24 <[email protected]>

Co-authored-by: Ben Ye <[email protected]>

* cmd/thanos/receive.go: Receive client infers TLS (thanos-io#3899)

Currently, the thanos Receiver infers whether it should use TLS for
the gRPC clients that forward time series to other receivers based on
whether the remote-write HTTP server uses TLS. This is not correct, as
the HTTP server may use TLS without the gRPC server using TLS. This
commit fixes the inference.

Longer-term, this will be taken care of by the receive/router split.

Signed-off-by: Lucas Servén Marín <[email protected]>

* Cut v0.19.0-rc.1 (thanos-io#3900)

Signed-off-by: Bartlomiej Plotka <[email protected]>

* tools: Fix partial and empty matchers in rewrite (thanos-io#3891)

* Fix partial and empty matchers match

Signed-off-by: yeya24 <[email protected]>

* add testcase for non-equal matchers

Signed-off-by: yeya24 <[email protected]>

* v0.19.0 patch: Added receive benchmark; Fixed Receiver excessive mem usage introduced in 0.17 (thanos-io#3943)

* Added receive benchmark, baseline.

```
goos: linux
goarch: amd64
pkg: github.com/thanos-io/thanos/pkg/receive
BenchmarkHandlerReceiveHTTP
BenchmarkHandlerReceiveHTTP/typical_labels_under_1KB,_500_of_them.
BenchmarkHandlerReceiveHTTP/typical_labels_under_1KB,_500_of_them./OK
BenchmarkHandlerReceiveHTTP/typical_labels_under_1KB,_500_of_them./OK-12      	   22260	   1550152 ns/op	 1380340 B/op	    6093 allocs/op
BenchmarkHandlerReceiveHTTP/typical_labels_under_1KB,_500_of_them./conflict_errors
BenchmarkHandlerReceiveHTTP/typical_labels_under_1KB,_500_of_them./conflict_errors-12         	    6619	   6430408 ns/op	 4522487 B/op	   26118 allocs/op
BenchmarkHandlerReceiveHTTP/typical_labels_under_1KB,_5000_of_them.
BenchmarkHandlerReceiveHTTP/typical_labels_under_1KB,_5000_of_them./OK
BenchmarkHandlerReceiveHTTP/typical_labels_under_1KB,_5000_of_them./OK-12                     	    2695	  17208794 ns/op	15072963 B/op	   60441 allocs/op
BenchmarkHandlerReceiveHTTP/typical_labels_under_1KB,_5000_of_them./conflict_errors
BenchmarkHandlerReceiveHTTP/typical_labels_under_1KB,_5000_of_them./conflict_errors-12        	     474	  72533286 ns/op	46396932 B/op	  260141 allocs/op
BenchmarkHandlerReceiveHTTP/extremely_large_label_value_10MB,_10_of_them
BenchmarkHandlerReceiveHTTP/extremely_large_label_value_10MB,_10_of_them/OK
BenchmarkHandlerReceiveHTTP/extremely_large_label_value_10MB,_10_of_them/OK-12                	     270	 137050518 ns/op	226595379 B/op	     132 allocs/op
BenchmarkHandlerReceiveHTTP/extremely_large_label_value_10MB,_10_of_them/conflict_errors
BenchmarkHandlerReceiveHTTP/extremely_large_label_value_10MB,_10_of_them/conflict_errors-12   	      21	1616025443 ns/op	698724321 B/op	     408 allocs/op
PASS

Process finished with exit code 0
```


Signed-off-by: Bartlomiej Plotka <[email protected]>

* Copy labels.

```
GOROOT=/home/bwplotka/.gvm/gos/go1.15 #gosetup
GOPATH=/home/bwplotka/Repos/thanosgopath #gosetup
/home/bwplotka/.gvm/gos/go1.15/bin/go test -c -o /tmp/___BenchmarkHandlerReceiveHTTP_in_github_com_thanos_io_thanos_pkg_receive github.com/thanos-io/thanos/pkg/receive #gosetup
/tmp/___BenchmarkHandlerReceiveHTTP_in_github_com_thanos_io_thanos_pkg_receive -test.v -test.bench ^\QBenchmarkHandlerReceiveHTTP\E$ -test.run ^$ -test.benchmem -test.benchtime=30s
goos: linux
goarch: amd64
pkg: github.com/thanos-io/thanos/pkg/receive
BenchmarkHandlerReceiveHTTP
BenchmarkHandlerReceiveHTTP/typical_labels_under_1KB,_500_of_them
BenchmarkHandlerReceiveHTTP/typical_labels_under_1KB,_500_of_them/OK
BenchmarkHandlerReceiveHTTP/typical_labels_under_1KB,_500_of_them/OK-12      	   25887	   1537262 ns/op	 1380023 B/op	    6092 allocs/op
BenchmarkHandlerReceiveHTTP/typical_labels_under_1KB,_500_of_them/conflict_errors
BenchmarkHandlerReceiveHTTP/typical_labels_under_1KB,_500_of_them/conflict_errors-12         	    4237	   7547968 ns/op	 4522583 B/op	   26118 allocs/op
BenchmarkHandlerReceiveHTTP/typical_labels_under_1KB,_5000_of_them
BenchmarkHandlerReceiveHTTP/typical_labels_under_1KB,_5000_of_them/OK
BenchmarkHandlerReceiveHTTP/typical_labels_under_1KB,_5000_of_them/OK-12                     	    2205	  16513380 ns/op	15071092 B/op	   60420 allocs/op
BenchmarkHandlerReceiveHTTP/typical_labels_under_1KB,_5000_of_them/conflict_errors
BenchmarkHandlerReceiveHTTP/typical_labels_under_1KB,_5000_of_them/conflict_errors-12        	     525	  67278233 ns/op	46396645 B/op	  260141 allocs/op
BenchmarkHandlerReceiveHTTP/extremely_large_label_value_10MB,_10_of_them
BenchmarkHandlerReceiveHTTP/extremely_large_label_value_10MB,_10_of_them/OK
BenchmarkHandlerReceiveHTTP/extremely_large_label_value_10MB,_10_of_them/OK-12               	     285	 148049189 ns/op	226596168 B/op	     132 allocs/op
BenchmarkHandlerReceiveHTTP/extremely_large_label_value_10MB,_10_of_them/conflict_errors
BenchmarkHandlerReceiveHTTP/extremely_large_label_value_10MB,_10_of_them/conflict_errors-12  	      20	1731361499 ns/op	698722550 B/op	     401 allocs/op
PASS

Process finished with exit code 0

```

Signed-off-by: Bartlomiej Plotka <[email protected]>

* Added bench.

Signed-off-by: Bartlomiej Plotka <[email protected]>

* Fix.

Signed-off-by: Bartlomiej Plotka <[email protected]>

* Improved API.

Signed-off-by: Bartlomiej Plotka <[email protected]>

* Changelog.

Signed-off-by: Bartlomiej Plotka <[email protected]>

* Addressed Lucas comments.

Signed-off-by: Bartlomiej Plotka <[email protected]>

* compact: clean up directories thoroughly (thanos-io#3869)

* compact: clean up directories properly

I couldn't stop thinking about this code for some reason and I have
figured that I had missed one case in thanos-io#3031. We need to also clean up
the directories in compaction groups. A compaction could fail leaving
some new directory with a random ULID on the disk. Before attempting to
do another compaction loop, we need to remove it as well because the
compaction process always produces a new, unique directory.

Signed-off-by: Giedrius Statkevičius <[email protected]>

* Remove all non expected dirs.

Signed-off-by: Bartlomiej Plotka <[email protected]>

* Addressed comment.

Signed-off-by: Bartlomiej Plotka <[email protected]>

Co-authored-by: Bartlomiej Plotka <[email protected]>

* Added benchmark, Moved minio-deps to fork without race fix we don't need. (thanos-io#3968)

Fixes: thanos-io#3917

Long term fix: thanos-io#3967

Signed-off-by: Bartlomiej Plotka <[email protected]>

* Cut v0.19.0-rc.2 (thanos-io#3969)

Signed-off-by: Bartlomiej Plotka <[email protected]>

* pkg/rules: fix deduplication of equal alerts with different labels (thanos-io#3960) (thanos-io#3999)

Currently, if an alerting rule having the same name with different
severity labels is being returned from different replicas then they
are being treated as separate alerts.

Given the following alerts a1,a2 with severities s1,s2 returned from
replicas r1,r2:

a1[s1,r1]
a1[s2,r1]
a1[s1,r2]
a1[s2,r2]

Then, currently, the algorithm deduplicates to:

a1[s1]
a1[s2]
a1[s1]
a1[s2]

Instead of the intended result:

a1[s1]
a1[s2]

This fixes it by removing replica labels before sorting labels for
deduplication.
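
The idea can be sketched as follows. The `alert` type, the `key` function and the `replica` label name are illustrative assumptions and do not reflect the actual `pkg/rules` data structures.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// alert is a hypothetical, simplified alert: a name plus a label set.
type alert struct {
	name   string
	labels map[string]string
}

// key builds a deduplication key from the alert name and all non-replica
// labels, visited in sorted order so equal label sets yield equal keys.
func key(a alert, replicaLabels map[string]struct{}) string {
	names := make([]string, 0, len(a.labels))
	for n := range a.labels {
		if _, isReplica := replicaLabels[n]; !isReplica {
			names = append(names, n)
		}
	}
	sort.Strings(names)

	var b strings.Builder
	b.WriteString(a.name)
	for _, n := range names {
		b.WriteString("\x00" + n + "\x00" + a.labels[n])
	}
	return b.String()
}

// dedup keeps the first alert seen for each key.
func dedup(alerts []alert, replicaLabels map[string]struct{}) []alert {
	seen := map[string]struct{}{}
	var out []alert
	for _, a := range alerts {
		k := key(a, replicaLabels)
		if _, ok := seen[k]; ok {
			continue
		}
		seen[k] = struct{}{}
		out = append(out, a)
	}
	return out
}

func main() {
	alerts := []alert{
		{"a1", map[string]string{"severity": "s1", "replica": "r1"}},
		{"a1", map[string]string{"severity": "s2", "replica": "r1"}},
		{"a1", map[string]string{"severity": "s1", "replica": "r2"}},
		{"a1", map[string]string{"severity": "s2", "replica": "r2"}},
	}
	replica := map[string]struct{}{"replica": {}}
	fmt.Println(len(dedup(alerts, replica))) // 2
}
```

If the replica label were left in the key, all four alerts would have distinct keys and none would be dropped, which matches the faulty output shown above.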

Signed-off-by: Sergiusz Urbaniak <[email protected]>
# Conflicts:
#	CHANGELOG.md

Co-authored-by: Sergiusz Urbaniak <[email protected]>

* Cut v0.19.0 (thanos-io#3998)

Signed-off-by: Bartlomiej Plotka <[email protected]>

* Fixed VERSION file.

Signed-off-by: Bartlomiej Plotka <[email protected]>

Co-authored-by: Lucas Servén Marín <[email protected]>
Co-authored-by: Marco Pracucci <[email protected]>
Co-authored-by: Naman Lakhwani <[email protected]>
Co-authored-by: Bradley <[email protected]>
Co-authored-by: Giedrius Statkevičius <[email protected]>
Co-authored-by: dmaiocchi <[email protected]>
Co-authored-by: Bartlomiej Plotka <[email protected]>
Co-authored-by: Ben Ye <[email protected]>
Co-authored-by: Holger Freyther <[email protected]>
Co-authored-by: Biswajit Ghosh <[email protected]>
Co-authored-by: Matt W <[email protected]>
Co-authored-by: Mert Açıkportalı <[email protected]>
Co-authored-by: Lili Cosic <[email protected]>
Co-authored-by: Johannes Frey <[email protected]>
Co-authored-by: Vladimir Kononov <[email protected]>
Co-authored-by: Ben Kochie <[email protected]>
Co-authored-by: Matthias Loibl <[email protected]>
Co-authored-by: Sonia Singla <[email protected]>
Co-authored-by: Kemal Akkoyun <[email protected]>
Co-authored-by: Junyoung, Sung <[email protected]>
Co-authored-by: Abhishek Singh Chauhan <[email protected]>
Co-authored-by: Sudhar287 <[email protected]>
Co-authored-by: Max Chandler <[email protected]>
Co-authored-by: Matt Lawrence <[email protected]>
Co-authored-by: Mauro Stettler <[email protected]>
Co-authored-by: Hitanshu Mehta <[email protected]>
Co-authored-by: Sergiusz Urbaniak <[email protected]>
Co-authored-by: Hitanshu Mehta <[email protected]>
Co-authored-by: Matt Lawrence <[email protected]>
Co-authored-by: Max Chandler <[email protected]>
Co-authored-by: Simon Pasquier <[email protected]>
Co-authored-by: Stephane Brillant <[email protected]>
Co-authored-by: Saswata Mukherjee <[email protected]>
Co-authored-by: Prem Saraswat <[email protected]>
brancz pushed a commit to brancz/objstore that referenced this issue Jan 28, 2022
brancz pushed a commit to brancz/objstore that referenced this issue Jan 28, 2022
* tools: Fix partial and empty matchers in rewrite (#3891)

* Fix partial and empty matchers match

Signed-off-by: yeya24 <[email protected]>

* add testcase for non-equal matchers

Signed-off-by: yeya24 <[email protected]>

* v0.19.0 patch: Added receive benchmark; Fixed Receiver excessive mem usage introduced in 0.17 (#3943)

* Added receive benchmark, baseline.

```
goos: linux
goarch: amd64
pkg: github.com/thanos-io/thanos/pkg/receive
BenchmarkHandlerReceiveHTTP
BenchmarkHandlerReceiveHTTP/typical_labels_under_1KB,_500_of_them.
BenchmarkHandlerReceiveHTTP/typical_labels_under_1KB,_500_of_them./OK
BenchmarkHandlerReceiveHTTP/typical_labels_under_1KB,_500_of_them./OK-12      	   22260	   1550152 ns/op	 1380340 B/op	    6093 allocs/op
BenchmarkHandlerReceiveHTTP/typical_labels_under_1KB,_500_of_them./conflict_errors
BenchmarkHandlerReceiveHTTP/typical_labels_under_1KB,_500_of_them./conflict_errors-12         	    6619	   6430408 ns/op	 4522487 B/op	   26118 allocs/op
BenchmarkHandlerReceiveHTTP/typical_labels_under_1KB,_5000_of_them.
BenchmarkHandlerReceiveHTTP/typical_labels_under_1KB,_5000_of_them./OK
BenchmarkHandlerReceiveHTTP/typical_labels_under_1KB,_5000_of_them./OK-12                     	    2695	  17208794 ns/op	15072963 B/op	   60441 allocs/op
BenchmarkHandlerReceiveHTTP/typical_labels_under_1KB,_5000_of_them./conflict_errors
BenchmarkHandlerReceiveHTTP/typical_labels_under_1KB,_5000_of_them./conflict_errors-12        	     474	  72533286 ns/op	46396932 B/op	  260141 allocs/op
BenchmarkHandlerReceiveHTTP/extremely_large_label_value_10MB,_10_of_them
BenchmarkHandlerReceiveHTTP/extremely_large_label_value_10MB,_10_of_them/OK
BenchmarkHandlerReceiveHTTP/extremely_large_label_value_10MB,_10_of_them/OK-12                	     270	 137050518 ns/op	226595379 B/op	     132 allocs/op
BenchmarkHandlerReceiveHTTP/extremely_large_label_value_10MB,_10_of_them/conflict_errors
BenchmarkHandlerReceiveHTTP/extremely_large_label_value_10MB,_10_of_them/conflict_errors-12   	      21	1616025443 ns/op	698724321 B/op	     408 allocs/op
PASS

Process finished with exit code 0
```


Signed-off-by: Bartlomiej Plotka <[email protected]>

* Copy labels.

```
GOROOT=/home/bwplotka/.gvm/gos/go1.15 #gosetup
GOPATH=/home/bwplotka/Repos/thanosgopath #gosetup
/home/bwplotka/.gvm/gos/go1.15/bin/go test -c -o /tmp/___BenchmarkHandlerReceiveHTTP_in_github_com_thanos_io_thanos_pkg_receive github.com/thanos-io/thanos/pkg/receive #gosetup
/tmp/___BenchmarkHandlerReceiveHTTP_in_github_com_thanos_io_thanos_pkg_receive -test.v -test.bench ^\QBenchmarkHandlerReceiveHTTP\E$ -test.run ^$ -test.benchmem -test.benchtime=30s
goos: linux
goarch: amd64
pkg: github.com/thanos-io/thanos/pkg/receive
BenchmarkHandlerReceiveHTTP
BenchmarkHandlerReceiveHTTP/typical_labels_under_1KB,_500_of_them
BenchmarkHandlerReceiveHTTP/typical_labels_under_1KB,_500_of_them/OK
BenchmarkHandlerReceiveHTTP/typical_labels_under_1KB,_500_of_them/OK-12      	   25887	   1537262 ns/op	 1380023 B/op	    6092 allocs/op
BenchmarkHandlerReceiveHTTP/typical_labels_under_1KB,_500_of_them/conflict_errors
BenchmarkHandlerReceiveHTTP/typical_labels_under_1KB,_500_of_them/conflict_errors-12         	    4237	   7547968 ns/op	 4522583 B/op	   26118 allocs/op
BenchmarkHandlerReceiveHTTP/typical_labels_under_1KB,_5000_of_them
BenchmarkHandlerReceiveHTTP/typical_labels_under_1KB,_5000_of_them/OK
BenchmarkHandlerReceiveHTTP/typical_labels_under_1KB,_5000_of_them/OK-12                     	    2205	  16513380 ns/op	15071092 B/op	   60420 allocs/op
BenchmarkHandlerReceiveHTTP/typical_labels_under_1KB,_5000_of_them/conflict_errors
BenchmarkHandlerReceiveHTTP/typical_labels_under_1KB,_5000_of_them/conflict_errors-12        	     525	  67278233 ns/op	46396645 B/op	  260141 allocs/op
BenchmarkHandlerReceiveHTTP/extremely_large_label_value_10MB,_10_of_them
BenchmarkHandlerReceiveHTTP/extremely_large_label_value_10MB,_10_of_them/OK
BenchmarkHandlerReceiveHTTP/extremely_large_label_value_10MB,_10_of_them/OK-12               	     285	 148049189 ns/op	226596168 B/op	     132 allocs/op
BenchmarkHandlerReceiveHTTP/extremely_large_label_value_10MB,_10_of_them/conflict_errors
BenchmarkHandlerReceiveHTTP/extremely_large_label_value_10MB,_10_of_them/conflict_errors-12  	      20	1731361499 ns/op	698722550 B/op	     401 allocs/op
PASS

Process finished with exit code 0

```

Signed-off-by: Bartlomiej Plotka <[email protected]>

* Added bench.

Signed-off-by: Bartlomiej Plotka <[email protected]>

* Fix.

Signed-off-by: Bartlomiej Plotka <[email protected]>

* Improved API.

Signed-off-by: Bartlomiej Plotka <[email protected]>

* Changelog.

Signed-off-by: Bartlomiej Plotka <[email protected]>

* Addressed Lucas comments.

Signed-off-by: Bartlomiej Plotka <[email protected]>
# Conflicts:
#	CHANGELOG.md

* compact: clean up directories thoroughly (#3869)

* compact: clean up directories properly

I couldn't stop thinking about this code for some reason and I have
figured that I had missed one case in #3031. We need to also clean up
the directories in compaction groups. A compaction could fail leaving
some new directory with a random ULID on the disk. Before attempting to
do another compaction loop, we need to remove it as well because the
compaction process always produces a new, unique directory.

Signed-off-by: Giedrius Statkevičius <[email protected]>

* Remove all non expected dirs.

Signed-off-by: Bartlomiej Plotka <[email protected]>

* Addressed comment.

Signed-off-by: Bartlomiej Plotka <[email protected]>

Co-authored-by: Bartlomiej Plotka <[email protected]>

* Added benchmark, Moved minio-deps to fork without race fix we don't need. (#3968)

Fixes: thanos-io/thanos#3917

Long term fix: thanos-io/thanos#3967

Signed-off-by: Bartlomiej Plotka <[email protected]>
# Conflicts:
#	go.sum

* Changelog fix.

Signed-off-by: Bartlomiej Plotka <[email protected]>

* Build fix.

Signed-off-by: Bartlomiej Plotka <[email protected]>

Co-authored-by: Ben Ye <[email protected]>
Co-authored-by: Giedrius Statkevičius <[email protected]>