Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

VReplication Copy Phase: Parallelize Bulk Inserts #10828

Merged
merged 2 commits into from
Oct 26, 2022

Conversation

rohit-nayak-ps
Copy link
Contributor

@rohit-nayak-ps rohit-nayak-ps commented Jul 25, 2022

Description

This PR adds a new experimental vttablet flag --vreplication-parallel-insert-workers=[integer] that enables parallel bulk inserts during the copy phase of VReplication. It is disabled (1) by default.

This PR also adds new VReplication stats:

  • VReplicationTableCopyTimings: similar to PhaseTimings{Copy}, but provides per-table-per-stream observability and updates throughout the copy phase, instead of at the end of the copy phase.
  • VReplicationTableCopyRowCounts: counts rows copied per-table-per-stream throughout the copy phase.

Approach

When parallel inserts are not enabled, VReplication behaves the same way as before this PR, with the exception that the new stats are used.

When setting --vreplication-parallel-insert-workers > 1 the bulk inserts — each
executed as a single transaction from the vstream packet contents — may happen in parallel and
out-of-order, but the commits of those transactions are still serialized in order.

Other aspects of the VReplication copy-phase logic are preserved:

  1. All statements executed when processing a vstream packet occur within a single MySQL transaction.
  2. Writes to _vt.copy_state always follow their corresponding inserts from within the vstream packet.
  3. The final commit for the vstream packet always follows the corresponding write to _vt.copy_state.
  4. The vstream packets are committed in the order seen in the stream. So for any PK1 and PK2, the write to _vt.copy_state and commit steps (steps 2 and 3 above) for PK1 will both precede the _vt.copy_state write and commit steps of PK2.

Other phases (catchup, fast-forward, "running") are not changed by this PR.

Checklist

Impacted Areas in Vitess

Components that this PR will affect:

  • Query Serving
  • VReplication
  • Cluster Management
  • Build/CI
  • VTAdmin

@vitess-bot
Copy link
Contributor

vitess-bot bot commented Jul 25, 2022

Review Checklist

Hello reviewers! 👋 Please follow this checklist when reviewing this Pull Request.

General

  • Ensure that the Pull Request has a descriptive title.
  • If this is a change that users need to know about, please apply the release notes (needs details) label so that merging is blocked unless the summary release notes document is included.
  • If a new flag is being introduced, review whether it is really needed. The flag names should be clear and intuitive (as far as possible), and the flag's help should be descriptive. Additionally, flag names should use dashes (-) as word separators rather than underscores (_).
  • If a workflow is added or modified, each items in Jobs should be named in order to mark it as required. If the workflow should be required, the GitHub Admin should be notified.

Bug fixes

  • There should be at least one unit or end-to-end test.
  • The Pull Request description should either include a link to an issue that describes the bug OR an actual description of the bug and how to reproduce, along with a description of the fix.

Non-trivial changes

  • There should be some code comments as to why things are implemented the way they are.

New/Existing features

  • Should be documented, either by modifying the existing documentation or creating new documentation.
  • New features should have a link to a feature request issue or an RFC that documents the use cases, corner cases and test cases.

Backward compatibility

  • Protobuf changes should be wire-compatible.
  • Changes to _vt tables and RPCs need to be backward compatible.
  • vtctl command output order should be stable and awk-able.

@maxenglander
Copy link
Collaborator

maxenglander commented Aug 2, 2022

With @mattlord help I have been benchmarking the performance of this PR alongside #10788. Both PRs improve performance between 10-200%, depending on factors like CPU load on the data source.

Because this PR came first, already has undergone some code review, and is simpler than #10788, I'm going to add more work to this PR.

  • Completely avoid async path when experimental flag not present
  • Get E2E tests to pass without experimental flag
  • Get unit tests to pass without experimental flag
  • Get E2E tests to pass with experimental flag
  • Get unit tests to pass with experimental flag
  • Add more unit tests with experimental flag
  • Add more E2E tests with experimental flag

@maxenglander maxenglander force-pushed the rn-vrep-concurrent-copy branch 5 times, most recently from 8d13a17 to b6d6298 Compare August 8, 2022 20:19
@maxenglander maxenglander force-pushed the rn-vrep-concurrent-copy branch 4 times, most recently from a877a99 to a7da4da Compare August 15, 2022 23:08
@maxenglander maxenglander force-pushed the rn-vrep-concurrent-copy branch 3 times, most recently from 2a41fd9 to 9d90fe6 Compare August 16, 2022 05:23
@mattlord mattlord added Type: Enhancement Logical improvement (somewhere between a bug and feature) Component: VReplication Benchmark me Add label to PR to run benchmarks labels Aug 16, 2022
@maxenglander maxenglander force-pushed the rn-vrep-concurrent-copy branch 3 times, most recently from ce597c0 to 1a7bbb5 Compare August 18, 2022 05:33
@mattlord mattlord changed the title Do Not Merge: VReplication Copy Phase: Parallelize bulk insert to check for performance improvements VReplication Copy Phase: Parallelize bulk insert to check for performance improvements Aug 18, 2022
@mattlord mattlord marked this pull request as ready for review August 18, 2022 15:01
@mattlord mattlord self-requested a review as a code owner August 18, 2022 15:01
@maxenglander maxenglander force-pushed the rn-vrep-concurrent-copy branch from 4facf7b to fb29459 Compare September 8, 2022 16:33
@mattlord mattlord added the release notes (needs details) This PR needs to be listed in the release notes in a dedicated section (deprecation notice, etc...) label Sep 19, 2022
@rohit-nayak-ps rohit-nayak-ps removed the release notes (needs details) This PR needs to be listed in the release notes in a dedicated section (deprecation notice, etc...) label Oct 9, 2022
@maxenglander maxenglander force-pushed the rn-vrep-concurrent-copy branch from 3aea5b5 to 5d8bde8 Compare October 20, 2022 20:26
@maxenglander
Copy link
Collaborator

maxenglander commented Oct 20, 2022

Have squashed all the commits. Was too annoying to try to fix the commit that was missing DCO with rebase. Previous commit history preserved over here just in case.

…ovements

This PR adds a new experimental flag
`--vreplication-parallel-insert-workers=[integer]` that enables parallel bulk
inserts during the copy phase of VReplication. It is not set by default.

This PR also adds three new VReplication stats:
 - `VReplicationTableCopyTimings` is similar to `PhaseTimings{Copy}`, but
   provides per-table-per-stream observability and updates throughout the copy
   phase, instead of at the end of the copy phase.
 - `VReplicationTableCopyRowCounts` counts rows copied per-table-per-stream
   throughout the copy phase.
 - `VReplicationBulkInsertConcurrency` gauges the configured number of concurrent
   insertion workers.

Co-authored-by: Rohit Nayak <[email protected]>
Signed-off-by: Max Englander <[email protected]>
@maxenglander maxenglander force-pushed the rn-vrep-concurrent-copy branch from 5d8bde8 to 41a4233 Compare October 20, 2022 21:24
Copy link
Contributor

@mattlord mattlord left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once again, great work on this! I only had some minor nits/questions/suggestions. We can huddle on those and get this merged. ❤️

doc/releasenotes/16_0_0_summary.md Outdated Show resolved Hide resolved
go/test/endtoend/vreplication/vreplication_test.go Outdated Show resolved Hide resolved
go/test/endtoend/vreplication/vreplication_test.go Outdated Show resolved Hide resolved
Comment on lines +166 to +167
bps.TableCopyRowCounts = stats.NewCountersWithSingleLabel("", "", "Table", "")
bps.TableCopyTimings = stats.NewTimings("", "", "Table")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have a sample of what this looks like in the /debug/vars output?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's a sample:

"VReplicationTableCopyRowCounts": {"src.-.copy.1.data": 8388608},

go/vt/vttablet/tabletmanager/vreplication/vcopier.go Outdated Show resolved Hide resolved
go/vt/vttablet/tabletmanager/vreplication/vcopier.go Outdated Show resolved Hide resolved
go/vt/vttablet/tabletmanager/vreplication/vcopier.go Outdated Show resolved Hide resolved
go/vt/vttablet/tabletmanager/vreplication/vcopier.go Outdated Show resolved Hide resolved
func (vr *vreplicator) readSettings(ctx context.Context) (settings binlogplayer.VRSettings, numTablesToCopy int64, err error) {
settings, err = binlogplayer.ReadVRSettings(vr.dbClient, vr.id)
// Same as readSettings, but stores some of the results on this vr.
func (vr *vreplicator) loadSettings(ctx context.Context, dbClient *vdbClient) (settings binlogplayer.VRSettings, numTablesToCopy int64, err error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see the need for these two function changes, what am I missing?

Copy link
Collaborator

@maxenglander maxenglander Oct 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to be able to re-use logic in vreplicator.go for setting things like setting and unsettingsql_mode and disabling FK checks. I want to use that logic in each new parallel MySQL conn that's created in vcopier.go. The current vreplicator.go code makes that a little difficult because it changes state in vreplicator.

To address that I move the logic I wanted to re-use into a stateless fn readSettings, and move the stateful stuff into loadSettings.

Signed-off-by: Max Englander <[email protected]>
@maxenglander maxenglander requested a review from mattlord October 25, 2022 19:10
@mattlord mattlord changed the title VReplication Copy Phase: Parallelize bulk insert to check for performance improvements VReplication Copy Phase: Parallelize Bulk Inserts Oct 26, 2022
@mattlord mattlord merged commit c5eb7a6 into vitessio:main Oct 26, 2022
@mattlord mattlord deleted the rn-vrep-concurrent-copy branch October 26, 2022 14:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Benchmark me Add label to PR to run benchmarks Component: VReplication Type: Enhancement Logical improvement (somewhere between a bug and feature)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants