-
Notifications
You must be signed in to change notification settings - Fork 569
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix MimirIngesterHasNotShippedBlocks alert #5396
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
00eca56
to
c95a23f
Compare
Thanks @krajorama for your review. However, after your review I changed the metric name to keep it consistent with the other shipper metrics (so having |
...metamonitoring-values-generated/mimir-distributed/templates/metamonitoring/mixin-alerts.yaml
Outdated
Show resolved
Hide resolved
pkg/ingester/shipper.go
Outdated
Help: "Total number of TSDB block upload failures", | ||
}), | ||
lastSuccessfulUploadTime: promauto.With(reg).NewGauge(prometheus.GaugeOpts{ | ||
Name: "cortex_ingester_shipper_last_successful_upload_time", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_timestamp_seconds
is recommended suffix for metric like this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed in e266481
Signed-off-by: Marco Pracucci <[email protected]>
…_ingester_shipper_last_successful_upload_timestamp_seconds Signed-off-by: Marco Pracucci <[email protected]>
c95a23f
to
e266481
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
The backport to
To backport manually, run these commands in your terminal: # Fetch latest updates from GitHub
git fetch
# Create a new branch
git switch --create backport-5396-to-r245 origin/r245
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x 7d9b3d42468c6f39aad236dbb7093895580400e6
# Push it to GitHub
git push --set-upstream origin backport-5396-to-r245
git switch main
# Remove the local backport branch
git branch -D backport-5396-to-r245 Then, create a pull request where the |
* Fix MimirIngesterHasNotShippedBlocks alert Signed-off-by: Marco Pracucci <[email protected]> * Renamed cortex_ingester_shipper_last_successful_upload_time to cortex_ingester_shipper_last_successful_upload_timestamp_seconds Signed-off-by: Marco Pracucci <[email protected]> --------- Signed-off-by: Marco Pracucci <[email protected]> (cherry picked from commit 7d9b3d4)
* Fix MimirIngesterHasNotShippedBlocks alert * Renamed cortex_ingester_shipper_last_successful_upload_time to cortex_ingester_shipper_last_successful_upload_timestamp_seconds --------- Signed-off-by: Marco Pracucci <[email protected]> (cherry picked from commit 7d9b3d4) Co-authored-by: Marco Pracucci <[email protected]>
What this PR does
Earlier today we've got an issue for which I was expecting
MimirIngesterHasNotShippedBlocks
to fire, but it didn't. Turns out the reason is that the metricthanos_shipper_last_successful_upload_time
is not exported anymore (it's tracked in the code, but not mapped intsdbMetrics
). This PR fixes it.User-facing changes:
cortex_ingester_shipper_last_successful_upload_timestamp_seconds
cortex_ingester_shipper_dir_syncs_total
because useless (never used in our mixing)cortex_ingester_shipper_dir_sync_failures_total
because it was never incremented (never used in our mixin)Note to reviewers:
metrics
struct has been renamed toshipperMetrics
and is now shared across all shipper instances. The metrics tracked are safe to be shared because they're either counters (safe) orcortex_ingester_shipper_last_successful_upload_timestamp_seconds
(gauge) which when updated is always set to current time (so we effectively track the "last successful upload time").Which issue(s) this PR fixes or relates to
N/A
Checklist
CHANGELOG.md
updated - the order of entries should be[CHANGE]
,[FEATURE]
,[ENHANCEMENT]
,[BUGFIX]