shipper: Be strict about upload order unless it's specified so & cut v0.13.0-rc.2 #2765
Conversation
lgtm, just wondering though .. what's the use case of not uploading oldest to newest? do we really need a flag?
I mentioned this in flag help. Why I created an option to still enable it:
Those arguments are not strong, I agree, and I am happy to reconsider this... Maybe moving this flag to Also technically we should do rc.3 with this, not a full release. Thoughts?
looks pretty good! Just some language nits
CHANGELOG.md
Outdated
@@ -26,6 +26,7 @@ We use *breaking* word for marking changes that are not backward compatible (rel
- [#2416](https://github.com/thanos-io/thanos/pull/2416) Bucket: Fixed issue #2416 bug in `inspect --sort-by` doesn't work correctly in all cases.
- [#2719](https://github.com/thanos-io/thanos/pull/2719) Query: `irate` and `resets` use now counter downsampling aggregations.
- [#2705](https://github.com/thanos-io/thanos/pull/2705) minio-go: Added support for `af-south-1` and `eu-south-1` regions.
- [#2753](https://github.com/thanos-io/thanos/issues/2753) Sidecar,Receive,Rule: Fixed cause for compactor overlapping blocks in upload error cases.
nit on comment formatting: we should have spaces between component names
cmd/thanos/receive.go
Outdated
@@ -89,6 +89,12 @@ func registerReceive(m map[string]setupFunc, app *kingpin.Application) {

walCompression := cmd.Flag("tsdb.wal-compression", "Compress the tsdb WAL.").Default("true").Bool()

allowOutOfOrderUpload := cmd.Flag("shipper.allow-out-of-order-uploads",
"If true shipper will skip failed block uploads in given iteration and retry later. This means that some newer blocks might uploaded sooner than older."+
Let's keep these as full sentences to make it more readable.
"If true shipper will skip failed block uploads in given iteration and retry later. This means that some newer blocks might uploaded sooner than older."+
"If true, shipper will skip failed block uploads in the given iteration and retry later. This means that some newer blocks might be uploaded sooner than older blocks."+
cmd/thanos/receive.go
Outdated
@@ -89,6 +89,12 @@ func registerReceive(m map[string]setupFunc, app *kingpin.Application) {

walCompression := cmd.Flag("tsdb.wal-compression", "Compress the tsdb WAL.").Default("true").Bool()

allowOutOfOrderUpload := cmd.Flag("shipper.allow-out-of-order-uploads",
"If true shipper will skip failed block uploads in given iteration and retry later. This means that some newer blocks might uploaded sooner than older."+
"This will trigger compaction without those blocks, as a resulted create 'valid overlap situation'. Set it to true if you have vertical compaction enabled and wish to upload blocks as soon as possible without caring"+
"This will trigger compaction without those blocks, as a resulted create 'valid overlap situation'. Set it to true if you have vertical compaction enabled and wish to upload blocks as soon as possible without caring"+
"This will trigger compaction without those blocks and as a result will create a 'valid overlap situation'. Set it to true if you have vertical compaction enabled and wish to upload blocks as soon as possible without caring"+
just for clarification, what is a 'valid overlap situation'?
I changed to just an overlap situation
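To make the term concrete, here is a minimal sketch of what "overlap" means for two block time ranges. It assumes blocks cover half-open ranges `[MinTime, MaxTime)`; the `timeRange` type and `overlaps` function are hypothetical names for illustration and are not part of the Thanos code base.

```go
package main

// timeRange is a hypothetical stand-in for the time span of a block.
// Assumption: the range is half-open, [MinTime, MaxTime).
type timeRange struct {
	MinTime int64
	MaxTime int64
}

// overlaps reports whether two half-open ranges share at least one instant.
func overlaps(a, b timeRange) bool {
	return a.MinTime < b.MaxTime && b.MinTime < a.MaxTime
}
```

Under these assumptions, a block compacted over hours `[0, 6)` without a late block covering `[2, 4)` would overlap with that block once it is finally uploaded, while adjacent blocks `[0, 2)` and `[2, 4)` would not overlap.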
docs/operating/troubleshooting.md
Outdated
@@ -29,13 +28,15 @@ Checking producers log for such ULID, and checking meta.json (e.g if sample stat

### Reasons

- You are running Thanos (sidecar, ruler or receive) older than 0.13.0. During transient upload errors there was possibility to have overlaps caused by compactor not being aware of all blocks See: [this](https://github.com/thanos-io/thanos/issues/2753)
- You are running Thanos (sidecar, ruler or receive) older than 0.13.0. During transient upload errors there was possibility to have overlaps caused by compactor not being aware of all blocks See: [this](https://github.com/thanos-io/thanos/issues/2753)
- You are running Thanos (sidecar, ruler or receive) older than 0.13.0. During transient upload errors there is a possibility to have overlaps caused by the compactor not being aware of all blocks See: [this](https://github.com/thanos-io/thanos/issues/2753)
}
}

if err := s.upload(ctx, m); err != nil {
level.Error(s.logger).Log("msg", "shipping failed", "block", m.ULID, "err", err)
if !s.allowOutOfOrderUploads {
👍
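The behaviour discussed in this hunk can be sketched roughly as follows. This is a simplified illustration, not the actual shipper code: `block`, `syncBlocks`, and `errTransient` are hypothetical names, and the input is assumed to be already sorted oldest first.

```go
package main

import "fmt"

// block is a hypothetical stand-in for a block's metadata.
type block struct {
	ID      string
	MinTime int64
}

// errTransient simulates a transient object storage failure in examples.
var errTransient = fmt.Errorf("transient upload error")

// syncBlocks tries to upload blocks in the given order (assumed oldest first).
// With allowOutOfOrder=false it stops at the first failure, so no newer block
// is uploaded before an older one. With allowOutOfOrder=true it skips failed
// blocks and continues, leaving them for a later retry.
func syncBlocks(blocks []block, upload func(block) error, allowOutOfOrder bool) ([]string, error) {
	var uploaded []string
	failed := 0
	for _, b := range blocks {
		if err := upload(b); err != nil {
			if !allowOutOfOrder {
				return uploaded, fmt.Errorf("upload of block %s failed, stopping to preserve order: %w", b.ID, err)
			}
			failed++
			continue
		}
		uploaded = append(uploaded, b.ID)
	}
	if failed > 0 {
		return uploaded, fmt.Errorf("%d block uploads failed and will be retried", failed)
	}
	return uploaded, nil
}
```

In this sketch, if the second of three blocks fails, strict mode uploads only the first block, while the out-of-order mode uploads the first and third and reports the failure for a retry.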
var metas []*metadata.Meta
// blockMetasFromOldest returns the block meta of each block found in dir
// sorted by minTime asc.
func (s *Shipper) blockMetasFromOldest() (metas []*metadata.Meta, _ error) {
nice, I think this is a good simplification
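For readers following along, sorting by minTime ascending could look like the sketch below. The `blockMeta` type and `sortFromOldest` function are hypothetical and only illustrate the ordering; the real function in the PR reads metas from the directory and its body is not shown in this thread.

```go
package main

import "sort"

// blockMeta is a hypothetical, trimmed-down stand-in for a block's meta.json.
type blockMeta struct {
	ULID    string
	MinTime int64
	MaxTime int64
}

// sortFromOldest orders metas in place by MinTime ascending, so that the
// oldest block comes first and is uploaded first.
func sortFromOldest(metas []blockMeta) {
	sort.Slice(metas, func(i, j int) bool {
		return metas[i].MinTime < metas[j].MinTime
	})
}
```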
Yes I think rc.3 is the right thing to do here. I think a hidden flag for now sounds good.
Signed-off-by: Bartlomiej Plotka <[email protected]>
openshift/master
* upstream/release-0.13:
  Cut release v0.13.0
  shipper: Be strict about upload order unless it's specified so & cut v0.13.0-rc.2 (thanos-io#2765)
  Cut 0.13.0 release. (thanos-io#2762)
  Cut release 0.13.0-rc.1 (thanos-io#2720)
  Store: `irate` and `resets` use now counter downsampling aggregations. (thanos-io#2719)
  deps: Updated minio-go dependency to v6.0.56 to add two region endpoints (thanos-io#2705) (thanos-io#2718)
  store/proxy: Deduplicate chunks on StoreAPI level. Recommend chunk sorting for StoreAPI + Optimized iter chunk dedup. (thanos-io#2710) (thanos-io#2711)
  Allow using multiple memcached clients at the same time. (thanos-io#2648) (thanos-io#2698)
  Updated Prometheus as little as possible to include Isolation fix. (thanos-io#2697)
  Release fix attempt2. Fixed test job. (thanos-io#2650)
  Fixed promu build to build in compatible directory that crossbuild understands.
  Cut v0.13.0-rc.0 (thanos-io#2628)
This is actually quite a real cause of potential overlaps in a Thanos system, so we are fixing it before 0.13.
Fixes #2753
Thanks @gburek-fastly for all pointers, it helped us to narrow this down 💪
Signed-off-by: Bartlomiej Plotka [email protected]