add pg_sync_standby_nominal_actual and pg_replication_blocked_transactions #109

Merged

Conversation

TLINDEN
Collaborator

@TLINDEN TLINDEN commented Jan 16, 2025

Description

Add two more monitoring metrics to the postgreslet deployment's queries.yaml:

  • pg_sync_standby_nominal_actual: shows whether the nominal and actual sync standbys, if any, match; becomes critical when nominal != actual
  • pg_replication_blocked_transactions: shows the number of transactions blocked by one or more defective sync standbys
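
For illustration only, here is a rough sketch of what two such entries could look like. It is not the content of this PR: it assumes queries.yaml follows the custom-query layout of the Prometheus postgres_exporter, and the SQL, column names and descriptions are hypothetical. The first query is deliberately simplified. It only compares "synchronous replication is configured" with "at least one sync standby is connected" and does not parse the standby list in synchronous_standby_names.

```yaml
# Hypothetical sketch, not the queries merged in this PR.
# Assumes the postgres_exporter custom-query format.
pg_sync_standby_nominal_actual:
  query: |
    SELECT
      -- nominal: 1 if synchronous replication is configured at all
      CASE WHEN current_setting('synchronous_standby_names') <> ''
           THEN 1 ELSE 0 END AS nominal,
      -- actual: 1 if at least one standby currently streams synchronously
      CASE WHEN EXISTS (
             SELECT 1 FROM pg_stat_replication
             WHERE sync_state IN ('sync', 'quorum'))
           THEN 1 ELSE 0 END AS actual
  metrics:
    - nominal:
        usage: "GAUGE"
        description: "1 if synchronous standbys are configured, else 0"
    - actual:
        usage: "GAUGE"
        description: "1 if a synchronous standby is connected, else 0"

pg_replication_blocked_transactions:
  query: |
    -- backends waiting for a sync standby to confirm their commit
    SELECT count(*) AS waiting
    FROM pg_stat_activity
    WHERE wait_event = 'SyncRep'
  metrics:
    - waiting:
        usage: "GAUGE"
        description: "Transactions waiting for synchronous replication"
```

With a layout like this, an alert would fire when the nominal and actual values differ or when the waiting count stays above zero. The monitoring role needs sufficient privileges (for example membership in pg_monitor) to see other sessions in pg_stat_activity and pg_stat_replication.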

These two additional metrics address the following problem: every once in a while a sync standby in another DC fails (for whatever reason). If someone then issues a write transaction on the primary (e.g. INSERT, UPDATE or DELETE), the statement hangs, either forever or until some timeout, because the primary waits for the sync standby to confirm the commit. On a high-load instance, the applications using the database get stuck after some time, and in the end the whole system comes to a standstill.

However, we currently have no means of monitoring such events. From the perspective of our current monitoring, everything on the primary looks fine. With these two new metrics we can now raise a critical alert when a sync standby is unresponsive and is thereby blocking transactions on the primary.

@TLINDEN TLINDEN requested a review from a team as a code owner January 16, 2025 11:10
@eberlep eberlep changed the base branch from master to backup-exporter-sidecar January 16, 2025 11:53
@eberlep eberlep merged commit 4452044 into backup-exporter-sidecar Jan 16, 2025
1 check passed
@eberlep eberlep deleted the feature/postgres-add-query-check-sync-standby branch January 16, 2025 12:08
2 participants