-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
workload/tpcc: optimize statements using batching and vectorized prepared statements #53983
base: master
Are you sure you want to change the base?
workload/tpcc: optimize statements using batching and vectorized prepared statements #53983
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis, @nvanbenschoten, and @RaduBerinde)
pkg/sql/opt/xform/testdata/external/tpcc, line 231 at r6 (raw file):
s_quantity = CASE (s_i_id, s_w_id) WHEN (6823, 0) THEN 26
This is another one that is kind of ridiculous. I believe it can be expressed nicely using something like UPDATE SET s_quantity = q FROM VALUES (...) AS v(i,w,q) WHERE s_i_id = i AND s_w_id = w
Nice speedup. I assume this is on top of the speedup we got when we removed the unnecessary FK indexes? |
These were pretty outdated and the test was running slow, as a result. Release justification: testing only
Up to this point, we had run `fixtures load` (RESTORE) on gce and `fixtures import` (IMPORT) on aws and azure. This was confusing, inconsistent, and made the process of updating the TPC-C workload more effort than strictly necessary. Now that IMPORT is stable and just as fast as, if not faster than, RESTORE, let's use that everywhere. Release justification: testing only
Addresses a TODO. This query has been supported by the optimizer since at least 526b4e5, which is when it was rewritten in the optimizer's test suite. Release justification: testing only
This commit rewrites a few statements in the New-Order transaction to use array placeholders and unnest. This trick helped out a lot with TPC-E because it reduced the number of statements per transaction. In this case, we were already running these as single statements by generating custom SQL, but doing so prevented us from using prepared statements. Release justification: testing only
We saw with TPC-E that the best way to optimize a CPU-bound workload is to issue fewer statements. Even if these statements are larger, less back and forth with the client can dramatically reduce the load on CRDB. To that end, this commit replaces 20 small SELECT statements at the beginning of the Delivery txn with a single SELECT statement. This new statement was tricky to get right because it was critical for the limits on each of its new_order scans to be pushed down to minimize contention. That means that we can't do something like perform a lookup join on a static `VALUES (1, 2, 3, ...)`. Release justification: testing only
The TPC-C spec references "deferred execution" of Delivery txns (section 2.7.2):“The Delivery transaction must be executed in deferred mode... by queuing the transaction for deferred execution". This commit moves our implementation in this direction. While it doesn't introduce a real queue, it provides mutual exclusion between Delivery txns for the same warehouse. This trick is discussed in http://www.vldb.org/pvldb/vol13/p629-huang.pdf, which mentions that: "Delivery transactions for the same warehouse always conflict, so there is no point in trying to execute them in parallel on different cores". This change had a very positive impact when running a single warehouse without wait times on an in-memory version of CockroachDB. Release justification: testing only
47c38ad
to
36709df
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume this is on top of the speedup we got when we removed the unnecessary FK indexes?
Yes, this is on top of the other speedups we got in this release.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis and @RaduBerinde)
pkg/sql/opt/xform/testdata/external/tpcc, line 231 at r6 (raw file):
Previously, RaduBerinde wrote…
This is another one that is kind of ridiculous. I believe it can be expressed nicely using something like
UPDATE SET s_quantity = q FROM VALUES (...) AS v(i,w,q) WHERE s_i_id = i AND s_w_id = w
Good idea! Done.
The stats-quality tests need to be updated. You can run with rewrite-actual-stats but you need a running node with TPC-H loaded (SF1 I believe). |
Do you think we could pull out "use fixtures import for tpcc, regardless of cloud" in a separate PR so we can merge that ASAP? |
Pulled the first two commits into #54880. |
This PR contains a series of optimizations to TPC-C that I'm hoping to land before our next large-scale TPC-C test. The optimizations build off some of the insights gained while optimizing TPC-E, like using array placeholders and
unnest
to support "vectorized" prepared statements and to batch small statements into larger statements whenever possible. It also implements spec-compliant mutual exclusion for Delivery transactions operating on the same warehouse, which is mostly useless with wait times enabled, but was critical to keep tail latencies under control in a test like the one discussed in #30624 (comment).See the individual commit messages for more detail.
When combined with #30624, this gives the following speedup:
Release justification: testing only