-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
compact: accept malformed index #953
compact: accept malformed index #953
Conversation
Handle cases where we detect that postings have labels listed incorrectly due to Prometheus Isuue thanos-io#5372. With a command line option set these specific errors can be ignored as they happen with Prometheus 2.8.0 and lesser versions.
The VerifyIndex() function explicitly states testing of the invariant ordering, so rather than adding additional parameters to change its behavior when --debug.accept-malformed-index is set, we skip the verification step on the newly compacted TSDB block. This allows compaction to happen as normal when out of order labels are present in the index.
Circleci failes with: Not sure what this is, but I don't believe this relates to my changes... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally I like this, but let's address @povilasv comments before merge.
Sorry for this CI fail it's a flakiness. I restarted build.
* Use fields instead of function parameters * use the same variable name everywhere
Can we remove WIP from title and commit? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, just one nit ((:
pkg/compact/compact.go
Outdated
@@ -908,7 +918,7 @@ func (c *BucketCompactor) Compact(ctx context.Context) error { | |||
|
|||
level.Info(c.logger).Log("msg", "start of compaction") | |||
|
|||
groups, err := c.sy.Groups() | |||
groups, err := c.sy.Groups(c.acceptMalformedIndex) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we put to group field as well? Essentially if it is option like this then it suggests that we change this dynamically, which we don't. Also bool argument in function is discouraged ):
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked at that this morning, which would mean I need to modify the Syncer to support this flag rather than BucketCompactor. We'd pass in the command line option here:
https://github.com/jjneely/thanos/blob/jjneely/accept-malformed-index/cmd/thanos/compact.go#L170
rather than here:
https://github.com/jjneely/thanos/blob/jjneely/accept-malformed-index/cmd/thanos/compact.go#L200
That seemed like not a straight forward path. Then I found that compaction Groups were created by the Syncer rather than the BlockCompactor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, changed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks (:
Up for a rerun of CircleCI? The failure seems new and unrelated most likely.... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Amazing job! Just two minor nits about a comment and a log message, and we can merge this.
pkg/block/index.go
Outdated
OutOfOrderLabels int | ||
} | ||
|
||
// PrometheusIssue5372Err returns an error if statsd indicates postings with out |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The wording seems a bit off here. Perhaps: statsd indicates
-> stats indicate
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed in the most recent patch
return stats, errors.Errorf("out-of-order label set %s for series %d", lset, id) | ||
stats.OutOfOrderLabels++ | ||
level.Warn(logger).Log("msg", | ||
"out-of-order label set: known bug in Prometheus 2.8.0 and below", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps we should suggest users here to enable --debug.accept-malformed-index
here? I mean we would only tell users about a known bug now but wouldn't tell them how to fix it so that's weird 😄 we could be more user-friendly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I addressed this in pkg/compact/compact.go
where the error is generated that would otherwise halt execution of the compact component. Rather than here were it just becomes noise when the CLI option is enabled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, that's even better 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thank you a lot for your work on this and for everyone's reviews :)
This PR attempts to address #888
Changes
Verification
I've run this code with and without the added CLI option against my Prometheus TSDB blocks known to exhibit this condition. The condition is flagged appropriately. New compacted blocks are built.
WIP Status