-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
VReplication Copy Phase: Parallelize Bulk Inserts #10828
Conversation
Review ChecklistHello reviewers! 👋 Please follow this checklist when reviewing this Pull Request. General
Bug fixes
Non-trivial changes
New/Existing features
Backward compatibility
|
With @mattlord help I have been benchmarking the performance of this PR alongside #10788. Both PRs improve performance between 10-200%, depending on factors like CPU load on the data source. Because this PR came first, already has undergone some code review, and is simpler than #10788, I'm going to add more work to this PR.
|
8d13a17
to
b6d6298
Compare
a877a99
to
a7da4da
Compare
2a41fd9
to
9d90fe6
Compare
ce597c0
to
1a7bbb5
Compare
4facf7b
to
fb29459
Compare
3aea5b5
to
5d8bde8
Compare
Have squashed all the commits. Was too annoying to try to fix the commit that was missing DCO with rebase. Previous commit history preserved over here just in case. |
…ovements This PR adds a new experimental flag `--vreplication-parallel-insert-workers=[integer]` that enables parallel bulk inserts during the copy phase of VReplication. It is not set by default. This PR also adds three new VReplication stats: - `VReplicationTableCopyTimings` is similar to `PhaseTimings{Copy}`, but provides per-table-per-stream observability and updates throughout the copy phase, instead of at the end of the copy phase. - `VReplicationTableCopyRowCounts` counts rows copied per-table-per-stream throughout the copy phase. - `VReplicationBulkInsertConcurrency` gauges the configured number of concurrent insertion workers. Co-authored-by: Rohit Nayak <[email protected]> Signed-off-by: Max Englander <[email protected]>
5d8bde8
to
41a4233
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once again, great work on this! I only had some minor nits/questions/suggestions. We can huddle on those and get this merged. ❤️
bps.TableCopyRowCounts = stats.NewCountersWithSingleLabel("", "", "Table", "") | ||
bps.TableCopyTimings = stats.NewTimings("", "", "Table") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have a sample of what this looks like in the /debug/vars
output?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's a sample:
"VReplicationTableCopyRowCounts": {"src.-.copy.1.data": 8388608},
func (vr *vreplicator) readSettings(ctx context.Context) (settings binlogplayer.VRSettings, numTablesToCopy int64, err error) { | ||
settings, err = binlogplayer.ReadVRSettings(vr.dbClient, vr.id) | ||
// Same as readSettings, but stores some of the results on this vr. | ||
func (vr *vreplicator) loadSettings(ctx context.Context, dbClient *vdbClient) (settings binlogplayer.VRSettings, numTablesToCopy int64, err error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see the need for these two function changes, what am I missing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want to be able to re-use logic in vreplicator.go
for setting things like setting and unsettingsql_mode
and disabling FK checks. I want to use that logic in each new parallel MySQL conn that's created in vcopier.go
. The current vreplicator.go
code makes that a little difficult because it changes state in vreplicator
.
To address that I move the logic I wanted to re-use into a stateless fn readSettings
, and move the stateful stuff into loadSettings
.
Signed-off-by: Max Englander <[email protected]>
Description
This PR adds a new experimental
vttablet
flag--vreplication-parallel-insert-workers=[integer]
that enables parallel bulk inserts during the copy phase of VReplication. It is disabled (1) by default.This PR also adds new VReplication stats:
VReplicationTableCopyTimings
: similar toPhaseTimings{Copy}
, but provides per-table-per-stream observability and updates throughout the copy phase, instead of at the end of the copy phase.VReplicationTableCopyRowCounts
: counts rows copied per-table-per-stream throughout the copy phase.Approach
When parallel inserts are not enabled, VReplication behaves the same way as before this PR, with the exception that the new stats are used.
When setting
--vreplication-parallel-insert-workers
> 1 the bulk inserts — eachexecuted as a single transaction from the vstream packet contents — may happen in parallel and
out-of-order, but the commits of those transactions are still serialized in order.
Other aspects of the VReplication copy-phase logic are preserved:
_vt.copy_state
always follow their corresponding inserts from within the vstream packet.commit
for the vstream packet always follows the corresponding write to_vt.copy_state
._vt.copy_state
andcommit
steps (steps 2 and 3 above) for PK1 will both precede the_vt.copy_state
write and commit steps of PK2.Other phases (catchup, fast-forward, "running") are not changed by this PR.
Checklist
Impacted Areas in Vitess
Components that this PR will affect: