EXPERIMENTAL breaking down the vreplication flow #8044

Closed

shlomi-noach

DO NOT MERGE

Description

This is an ongoing experiment in analyzing VReplication behavior on large data and under load. Specifically, we're looking to break down massive operations into smaller parts, and to be able to either parallelize or simplify some code. The objectives are:

  • reduce overall vreplication runtime
  • reduce overall vreplication impact

Related Issue(s)

Writeup coming.

Checklist

  • Tests were added or are not required
  • Documentation was added or is not required

Deployment Notes


shlomi-noach commented May 5, 2021

This PR intentionally mixes multiple approaches, and intentionally prints out tons of log information.

Signed-off-by: Shlomi Noach <[email protected]>
…lace into' are compatible with expected result

Signed-off-by: Shlomi Noach <[email protected]>
@shlomi-noach

I am having a really hard time fixing the tests. Some of the tests expect a specific sequence of queries to run on the database, and the whole point of this PR is that we change the sequence of queries in an unpredictable way. This is one of the cases in the Vitess test framework where we should change from "test the queries" to "test the eventual data". But I find it overwhelming and I'm not sure where to begin. I feel like I'm stuck.
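To make the "test the eventual data" idea concrete, here is a minimal sketch of an order-insensitive result comparison. It is not taken from the Vitess test framework: the `Row` type and the `canonical` and `sameData` helpers are hypothetical names introduced only for illustration, and rows are simplified to slices of strings.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Row is a hypothetical, simplified representation of one result row.
type Row []string

// canonical turns a result set into a sorted list of row fingerprints,
// so that two result sets can be compared regardless of row order.
func canonical(rows []Row) []string {
	out := make([]string, 0, len(rows))
	for _, r := range rows {
		// Join columns with a separator unlikely to appear in test data.
		out = append(out, strings.Join(r, "\x1f"))
	}
	sort.Strings(out)
	return out
}

// sameData reports whether both result sets contain the same rows,
// ignoring the order in which the rows were written or read.
func sameData(want, got []Row) bool {
	w, g := canonical(want), canonical(got)
	if len(w) != len(g) {
		return false
	}
	for i := range w {
		if w[i] != g[i] {
			return false
		}
	}
	return true
}

func main() {
	want := []Row{{"1", "a"}, {"2", "b"}}
	got := []Row{{"2", "b"}, {"1", "a"}} // same rows, different order
	fmt.Println(sameData(want, got))
}
```

A test built this way would assert on what ends up in the target table after the workflow settles, so it would not care whether the rows arrived through one large statement or several smaller ones.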

…h consistent snapshot, but table locks not acquired. In this setup we return a GTID which is >= row-select time, as opposed to <= row-select time in streamWithoutSnapshot

Signed-off-by: Shlomi Noach <[email protected]>

shlomi-noach commented May 23, 2021

EDIT: SOLVED


One test that is failing on a logical issue is vreplication_basic:

May 23 11:40:39 --- FAIL: TestBasicVreplicationWorkflow (191.64s)
May 23 11:40:39     --- PASS: TestBasicVreplicationWorkflow/insertInitialData (0.14s)
May 23 11:40:39     --- FAIL: TestBasicVreplicationWorkflow/materializeRollup (10.55s)
May 23 11:40:39         --- PASS: TestBasicVreplicationWorkflow/materializeRollup/materialize (0.51s)
May 23 11:40:39     --- FAIL: TestBasicVreplicationWorkflow/shardCustomer (31.32s)
May 23 11:40:39     --- FAIL: TestBasicVreplicationWorkflow/validateRollupReplicates (1.01s)
May 23 11:40:39     --- FAIL: TestBasicVreplicationWorkflow/shardOrders (10.13s)
May 23 11:40:39     --- FAIL: TestBasicVreplicationWorkflow/shardMerchant (32.47s)
May 23 11:40:39     --- FAIL: TestBasicVreplicationWorkflow/materializeProduct (10.12s)
May 23 11:40:39         --- PASS: TestBasicVreplicationWorkflow/materializeProduct/materialize (0.07s)
May 23 11:40:39     --- FAIL: TestBasicVreplicationWorkflow/materializeMerchantOrders (10.13s)
May 23 11:40:39         --- PASS: TestBasicVreplicationWorkflow/materializeMerchantOrders/materialize (0.08s)
May 23 11:40:39     --- FAIL: TestBasicVreplicationWorkflow/materializeSales (10.11s)
May 23 11:40:39         --- PASS: TestBasicVreplicationWorkflow/materializeSales/materialize (0.08s)
May 23 11:40:39     --- FAIL: TestBasicVreplicationWorkflow/materializeMerchantSales (10.12s)
May 23 11:40:39         --- PASS: TestBasicVreplicationWorkflow/materializeMerchantSales/materialize (0.08s)
May 23 11:40:39     --- FAIL: TestBasicVreplicationWorkflow/reshardMerchant2to3SplitMerge (40.86s)
May 23 11:40:39         --- FAIL: TestBasicVreplicationWorkflow/reshardMerchant2to3SplitMerge/reshard (40.84s)
May 23 11:40:39     --- FAIL: TestBasicVreplicationWorkflow/reshardMerchant3to1Merge (11.44s)
May 23 11:40:39         --- FAIL: TestBasicVreplicationWorkflow/reshardMerchant3to1Merge/reshard (11.43s)
May 23 11:40:39 FAIL
May 23 11:40:39 FAIL	vitess.io/vitess/go/test/endtoend/vreplication	191.665s

Specifically:

May 23 11:38:22     helper.go:75: 
May 23 11:38:22         	Error Trace:	helper.go:75
May 23 11:38:22         	            				vreplication_test.go:366
May 23 11:38:22         	Error:      	Not equal: 
May 23 11:38:22         	            	expected: "[[INT64(1)]]"
May 23 11:38:22         	            	actual  : "[[INT64(0)]]"
May 23 11:38:22         	            	
May 23 11:38:22         	            	Diff:
May 23 11:38:22         	            	--- Expected
May 23 11:38:22         	            	+++ Actual
May 23 11:38:22         	            	@@ -1 +1 @@
May 23 11:38:22         	            	-[[INT64(1)]]
May 23 11:38:22         	            	+[[INT64(0)]]
May 23 11:38:22         	Test:       	TestBasicVreplicationWorkflow/validateRollupReplicates

where it seems an extra insert is written to a table, and that insert is then not detected by vreplication. While the test that fails is materialization, I don't think this has to do with materialization. I am unable to understand why this test fails. I'm unable to un-produce it. Been banging my head at this for days now.

Signed-off-by: Shlomi Noach <[email protected]>
@shlomi-noach

It was, after all, related to GROUP BY.

…old-style, single query SELECT with consistent snapshot and precise GTID pos, or multiple limited SELECTs with estimated GTID pos

Signed-off-by: Shlomi Noach <[email protected]>
Signed-off-by: Shlomi Noach <[email protected]>
@@ -63,7 +63,7 @@ func TestExternalConnectorCopy(t *testing.T) {

 	expectDBClientAndVreplicationQueries(t, []string{
 		"begin",
-		"insert into tab1(id,val) values (1,'a'), (2,'b')",
+		"* into tab1(id,val) values (1,'a'), (2,'b')",

* is a new "magic" prefix (like / is a magic prefix) that indicates "this is expected to be a substring in the actual result"
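A minimal sketch of how such prefix handling could look, for readers unfamiliar with the convention. The `matchExpectation` function is a hypothetical name and not the actual test helper; treating the `/` prefix as "the rest is a regular expression" is an assumption here, since the comment above only says that `/` is also a magic prefix.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// matchExpectation is a hypothetical helper, not the real Vitess test code.
// It compares one expected query string against the query actually seen.
func matchExpectation(expected, actual string) (bool, error) {
	switch {
	case strings.HasPrefix(expected, "*"):
		// "*" prefix: the remainder must appear somewhere in the actual query.
		return strings.Contains(actual, expected[1:]), nil
	case strings.HasPrefix(expected, "/"):
		// "/" prefix: ASSUMPTION, the remainder is treated as a regular expression.
		return regexp.MatchString(expected[1:], actual)
	default:
		// No prefix: the queries must match exactly.
		return expected == actual, nil
	}
}

func main() {
	expected := "* into tab1(id,val) values (1,'a'), (2,'b')"
	for _, actual := range []string{
		"insert into tab1(id,val) values (1,'a'), (2,'b')",
		"replace into tab1(id,val) values (1,'a'), (2,'b')",
	} {
		ok, err := matchExpectation(expected, actual)
		fmt.Println(ok, err)
	}
}
```

With a substring expectation, the same test line accepts both the `insert into` and the `replace into` form of the statement, which is what this PR needs while the choice of verb is still in flux.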

@@ -527,6 +527,9 @@ func (tpb *tablePlanBuilder) generateInsertStatement() *sqlparser.ParsedQuery {
 func (tpb *tablePlanBuilder) generateInsertPart(buf *sqlparser.TrackedBuffer) *sqlparser.ParsedQuery {
 	if tpb.onInsert == insertIgnore {
 		buf.Myprintf("insert ignore into %v(", tpb.name)
+	} else if tpb.onInsert == insertNormal {
+		// the condition (tpb.onInsert == insertNormal) is true when there is no GROUP BY
+		buf.Myprintf("replace into %v(", tpb.name)

Still WIP: replace into is not valid in:

  • Materialize, where GROUP BY is found
  • Online DDL, where a UNIQUE KEY is added
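
To illustrate the GROUP BY case, here is a minimal in-memory sketch. It assumes the materialized target holds an aggregate such as `count(*)` that is built up incrementally, one source row at a time; the `rollup` type, its methods, and the sample data are all hypothetical and only model the two write semantics.

```go
package main

import "fmt"

// rollup is a hypothetical in-memory stand-in for a materialized table
// defined by something like: select kind, count(*) as cnt ... group by kind.
type rollup map[string]int

// applyReplace models "replace into rollup(kind, cnt) values (?, 1)":
// any existing row for the key is discarded and a fresh row takes its place.
func (r rollup) applyReplace(kind string) { r[kind] = 1 }

// applyOnDup models "insert ... on duplicate key update cnt = cnt + 1":
// an existing row for the key is kept and its aggregate is incremented.
func (r rollup) applyOnDup(kind string) { r[kind]++ }

func main() {
	source := []string{"bike", "bike", "car"} // three source rows, two kinds

	replaced, accumulated := rollup{}, rollup{}
	for _, kind := range source {
		replaced.applyReplace(kind)
		accumulated.applyOnDup(kind)
	}

	// With replace semantics the second "bike" row wipes out the first.
	fmt.Println("replace:", replaced["bike"], "accumulate:", accumulated["bike"])
}
```

The UNIQUE KEY case fails for a related reason: in MySQL, REPLACE deletes any existing row that conflicts on the primary key or on any unique index before inserting, so a conflict on a newly added unique key would silently remove a different row instead of raising an error.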

@shlomi-noach

I believe we will not be going down this particular path. I have thoughts on how to parallelize VReplication using the existing logic (with consistent snapshot).
This PR is educational and instructive, but otherwise a dead end. Closing it.

@shlomi-noach shlomi-noach deleted the vreplication-shorter-snapshots branch August 29, 2021 06:05