Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

workload/tpcc: optimize statements using batching and vectorized prepared statements #53983

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

nvanbenschoten
Copy link
Member

@nvanbenschoten nvanbenschoten commented Sep 4, 2020

This PR contains a series of optimizations to TPC-C that I'm hoping to land before our next large-scale TPC-C test. The optimizations build off some of the insights gained while optimizing TPC-E, like using array placeholders and unnest to support "vectorized" prepared statements and to batch small statements into larger statements whenever possible. It also implements spec-compliant mutual exclusion for Delivery transactions operating on the same warehouse, which is mostly useless with wait times enabled, but was critical to keep tail latencies under control in a test like the one discussed in #30624 (comment).

See the individual commit messages for more detail.

When combined with #30624, this gives the following speedup:

name                     old warehouses  new warehouses  delta
tpccbench/nodes=3/cpu=4        753 ± 3%        809 ± 2%  +7.44%  (p=0.000 n=9+7)

Release justification: testing only

@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Member

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis, @nvanbenschoten, and @RaduBerinde)


pkg/sql/opt/xform/testdata/external/tpcc, line 231 at r6 (raw file):

  s_quantity
    = CASE (s_i_id, s_w_id)
    WHEN (6823, 0) THEN 26

This is another one that is kind of ridiculous. I believe it can be expressed nicely using something like UPDATE SET s_quantity = q FROM VALUES (...) AS v(i,w,q) WHERE s_i_id = i AND s_w_id = w

@andy-kimball
Copy link
Contributor

Nice speedup. I assume this is on top of the speedup we got when we removed the unnecessary FK indexes?

These were pretty outdated and the test was running slow, as a result.

Release justification: testing only
Up to this point, we had run `fixtures load` (RESTORE) on gce and
`fixtures import` (IMPORT) on aws and azure. This was confusing,
inconsistent, and made the process of updating the TPC-C workload more
effort than strictly necessary. Now that IMPORT is stable and just as
fast as, if not faster than, RESTORE, let's use that everywhere.

Release justification: testing only
Addresses a TODO.

This query has been supported by the optimizer since at least 526b4e5,
which is when it was rewritten in the optimizer's test suite.

Release justification: testing only
This commit rewrites a few statements in the New-Order transaction
to use array placeholders and unnest. This trick helped out a lot
with TPC-E because it reduced the number of statements per transaction.
In this case, we were already running these as single statements by
generating custom SQL, but doing so prevented us from using prepared
statements.

Release justification: testing only
We saw with TPC-E that the best way to optimize a CPU-bound workload
is to issue fewer statements. Even if these statements are larger,
less back and forth with the client can dramatically reduce the
load on CRDB. To that end, this commit replaces 20 small SELECT
statements at the beginning of the Delivery txn with a single
SELECT statement.

This new statement was tricky to get right because it was critical for
the limits on each of its new_order scans to be pushed down to minimize
contention. That means that we can't do something like perform a lookup
join on a static `VALUES (1, 2, 3, ...)`.

Release justification: testing only
The TPC-C spec references "deferred execution" of Delivery txns (section
2.7.2):“The Delivery transaction must be executed in deferred mode... by
queuing the transaction for deferred execution". This commit moves our
implementation in this direction. While it doesn't introduce a real
queue, it provides mutual exclusion between Delivery txns for the same
warehouse. This trick is discussed in
http://www.vldb.org/pvldb/vol13/p629-huang.pdf, which mentions that:
"Delivery transactions for the same warehouse always conflict, so there
is no point in trying to execute them in parallel on different cores".

This change had a very positive impact when running a single warehouse
without wait times on an in-memory version of CockroachDB.

Release justification: testing only
@nvanbenschoten nvanbenschoten force-pushed the nvanbenschoten/tpccPrep branch from 47c38ad to 36709df Compare September 5, 2020 03:25
Copy link
Member Author

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this is on top of the speedup we got when we removed the unnecessary FK indexes?

Yes, this is on top of the other speedups we got in this release.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis and @RaduBerinde)


pkg/sql/opt/xform/testdata/external/tpcc, line 231 at r6 (raw file):

Previously, RaduBerinde wrote…

This is another one that is kind of ridiculous. I believe it can be expressed nicely using something like UPDATE SET s_quantity = q FROM VALUES (...) AS v(i,w,q) WHERE s_i_id = i AND s_w_id = w

Good idea! Done.

@RaduBerinde
Copy link
Member

The stats-quality tests need to be updated. You can run with rewrite-actual-stats but you need a running node with TPC-H loaded (SF1 I believe).

@RaduBerinde
Copy link
Member

Do you think we could pull out "use fixtures import for tpcc, regardless of cloud" in a separate PR so we can merge that ASAP?

@nvanbenschoten
Copy link
Member Author

Pulled the first two commits into #54880.

@rafiss rafiss added the X-noremind Bots won't notify about PRs with X-noremind label Feb 4, 2021
@jordanlewis jordanlewis removed their request for review December 15, 2021 15:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
X-noremind Bots won't notify about PRs with X-noremind
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants