Point In Time Recovery #551
base: main
Conversation
# Conflicts:
#	src/upgrade.py
Since we are only collecting binlogs from the leader unit, we need to add handling for when Juju elects a new leader -> what should happen here? Should the new leader unit start collecting binlogs instead?
Furthermore, while thinking about the above use case, we should also handle the scaling scenario -> what happens if the leader unit is scaled down?
Also, I would really prefer that we add an integration test for this scenario (where the leader unit is scaled down, after which the PITR is performed) once we determine how to handle it.
Left some comments and I'll try to test it.
Hi @Zvirovyi, please take a look at the failing tests. Ping me to authorize the test run.
This is handled by binding the s3-credentials-changed handler to the leader-elected event.
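A minimal sketch of that binding idea, using a toy event hub rather than the real charm framework (the hub class, event names, and handler are illustrative assumptions): routing both events to the same handler means a newly elected leader re-runs the S3/binlog setup without any extra leadership-change logic.

```python
from typing import Callable, Dict, List


class EventHub:
    """Toy stand-in for the charm framework's event dispatch (illustration only)."""

    def __init__(self) -> None:
        self._observers: Dict[str, List[Callable[[], None]]] = {}

    def observe(self, event: str, handler: Callable[[], None]) -> None:
        self._observers.setdefault(event, []).append(handler)

    def emit(self, event: str) -> None:
        for handler in self._observers.get(event, []):
            handler()


runs: List[str] = []


def reconfigure_binlog_collector() -> None:
    # In the real charm this would (re)start the binlog upload job on the
    # current leader; here we only record that the handler fired.
    runs.append("binlog-collector")


hub = EventHub()
# Binding the same handler to both events: a freshly elected leader picks up
# binlog collection automatically.
hub.observe("s3_credentials_changed", reconfigure_binlog_collector)
hub.observe("leader_elected", reconfigure_binlog_collector)

hub.emit("leader_elected")
print(runs)
```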
Same as above; I don't see a difference here.
Would it be a VM-specific integration test? If so, should we still add it?
# Conflicts:
#	lib/charms/mysql/v0/mysql.py
# Conflicts:
#	lib/charms/mysql/v0/s3_helpers.py
# Conflicts:
#	lib/charms/mysql/v0/mysql.py
#	src/charm.py
#	src/upgrade.py
PR looks great! I will help resolve the unit test failures so that we can see the integration test results.
Codecov Report
Attention: Patch coverage is
Additional details and impacted files:

@@            Coverage Diff             @@
##             main     #551      +/-   ##
==========================================
- Coverage   66.25%   64.11%    -2.15%
==========================================
  Files          17       20       +3
  Lines        3180     4489    +1309
  Branches      424      742     +318
==========================================
+ Hits         2107     2878     +771
- Misses        935     1370     +435
- Partials      138      241     +103

View full report in Codecov by Sentry.
Important!
This PR relies on the latest version of the charmed-mysql-snap and canonical/charmed-mysql-snap#63.
Overview
MySQL stores binary transaction logs (binlogs). This PR adds a service job that uploads these logs to the S3 bucket, plus the ability to use them later for point-in-time recovery via a new restore-to-time parameter of the restore action. The parameter accepts either a MySQL timestamp or the keyword latest (to replay all available transaction logs).

Also, a new application blocked status is introduced, Another cluster S3 repository, to signal to the user that the configured S3 repository is claimed by another cluster; while blocked, the binlog collection job is disabled and creating new backups is restricted (these are the only workload limitations). This is crucial to keep the stored binary logs safe from other clusters. The check uses @@GLOBAL.group_replication_group_name.

After a restore, the cluster's group replication is reinitialized, so it effectively becomes a new, different cluster. In that case, the Another cluster S3 repository message is changed to Move restored cluster to another S3 repository to indicate this event more clearly to the user.

Both block messages disappear when the S3 configuration is removed or changed to an empty repository.
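The repository-ownership check described above can be sketched as follows. The function name and the idea of reading a stored group name from the S3 repository's metadata are assumptions for illustration, not the charm's actual API; only the comparison against @@GLOBAL.group_replication_group_name comes from the description.

```python
from typing import Optional


def repository_claimed_by_another_cluster(
    stored_group_name: Optional[str], local_group_name: str
) -> bool:
    """Compare the group name recorded in the S3 repository against this
    cluster's @@GLOBAL.group_replication_group_name (hypothetical helper)."""
    return stored_group_name is not None and stored_group_name != local_group_name


# An empty repository (no recorded group name) is free to claim.
print(repository_claimed_by_another_cluster(None, "uuid-a"))      # False
print(repository_claimed_by_another_cluster("uuid-a", "uuid-a"))  # False
print(repository_claimed_by_another_cluster("uuid-b", "uuid-a"))  # True
```

After a restore reinitializes group replication, the local group name changes, so this same check is what turns the status into the "Move restored cluster to another S3 repository" message.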
Usage example
juju run mysql/leader restore backup-id=2024-11-20T17:08:24Z restore-to-time="2024-11-20 17:10:01"
juju run mysql/leader restore backup-id=2024-11-20T17:08:24Z restore-to-time="latest"
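The two accepted forms of restore-to-time shown above (a MySQL-style timestamp or the keyword latest) could be validated with a sketch like this; the function name and exact format string are hypothetical, not taken from the charm code.

```python
from datetime import datetime
from typing import Union


def parse_restore_to_time(value: str) -> Union[str, datetime]:
    """Return 'latest' unchanged; otherwise parse a MySQL-style timestamp
    (YYYY-MM-DD HH:MM:SS). Raises ValueError for anything else."""
    if value == "latest":
        return value
    return datetime.strptime(value, "%Y-%m-%d %H:%M:%S")


print(parse_restore_to_time("latest"))                  # latest
print(parse_restore_to_time("2024-11-20 17:10:01"))
```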
Key notes