Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: improve precision of index spans to scan for multi-column-family single-row fetches #18168

Closed
jordanlewis opened this issue Sep 1, 2017 · 13 comments
Assignees
Labels
A-sql-optimizer SQL logical planning and optimizations. C-performance Perf of queries or internals. Solution not expected to change functional behavior.

Comments

@jordanlewis
Copy link
Member

jordanlewis commented Sep 1, 2017

Suppose we have a table and a query:

CREATE TABLE a (a INT PRIMARY KEY, b INT, c INT, FAMILY(a), FAMILY(b), FAMILY(c);
INSERT INTO a VALUES(1,2,3);

SELECT b FROM a WHERE a=1;

Currently, this query will cause a scan from /a/primary/1 -> /a/primary/2, even though ideally we'd only need to scan the key at /a/primary/1/b.

We have enough information to improve this scan, since we know that c is not a needed column, a is the primary index and therefore unique, and we have an equality constraint on a.

We should consider improving the algorithm in spanFromLogicalSpan to incorporate all of this information into its final output span.

cc @danhhz @andreimatei @petermattis based on our earlier conversation.

@jordanlewis jordanlewis self-assigned this Sep 1, 2017
@jordanlewis jordanlewis added this to the 1.2 milestone Sep 1, 2017
@jordanlewis jordanlewis modified the milestones: 2.0, 2.1 Feb 22, 2018
@jordanlewis jordanlewis added the C-performance Perf of queries or internals. Solution not expected to change functional behavior. label Feb 22, 2018
@jordanlewis jordanlewis added the A-sql-execution Relating to SQL execution. label Apr 25, 2018
@jordanlewis jordanlewis added A-sql-optimizer SQL logical planning and optimizations. and removed A-sql-execution Relating to SQL execution. labels Sep 5, 2018
@jordanlewis jordanlewis modified the milestones: 2.1, 2.2 Sep 5, 2018
@nvanbenschoten
Copy link
Member

I now suspect that this is the reason why multiple column families didn't provide a noticeable improvement in YCSB workload A. If an UPDATE statement only needed to scan the primary key and the columns that it was updating, then splitting every column in YCSB into a separate column family could reduce contention by 10x. I think this means we could scale to 10 times the throughput in the zipfian distribution.

@jordanlewis
Copy link
Member Author

The code that needs to be updated is spansFromConstraint in opt_index_selection.go. That code path needs to get passed the set of needed columns from ConstructScan and determine just the precise set of column families that are required to pick up all the needed columns.

One complication is that spanFromConstraintSpan currently always returns exactly 1 span. If the required set of column families isn't adjacent within a row, it will need to return more than 1 span. I think the RowFetcher will be okay with seeing more than 1 span per row, based on the results of the investigation from #29374, but that will have to be verified.

@andreimatei
Copy link
Contributor

I've filed #30656 to track similar improvements to be done even when col fams are not used (and all columns are part of a single col fam).

@solongordon
Copy link
Contributor

Regarding the complication @jordanlewis mentioned, I'm thinking for a first pass we can just handle the case where only a single column family is needed. Does that seem reasonable, and is it sufficient for TPCC?

@jordanlewis
Copy link
Member Author

Unfortunately I don't think it's sufficient for TPCC, which would have tables that need 3 columns families total - one with the mutable columns from newOrder, one with the immutable columns, and one with the mutable columns from payment.

@jordanlewis
Copy link
Member Author

Actually, I'm not entirely sure if the row fetcher would need to itself emit spans over more than 1 column family, though. @nvanbenschoten I think you thought this through in more detail, didn't you?

@nvanbenschoten
Copy link
Member

Yes, TPC-C and YCSB will both require us to emit spans over more than 1 column family for the proposed optimization to work correctly. In both cases, there will be a static column family that all requests will read and a set of dynamic column families, out of which any given request will only read and update one.

@jordanlewis
Copy link
Member Author

But we're talking about an entire transaction, right? The row fetcher only has to do tight spans for each query in isolation, and it looks like each individual read-only query in payment.go and new_order.go on the shared tables (customer, district, warehouse) only touches static columns.

In new order,

SELECT c_discount, c_last, c_credit FROM customer WHERE c_w_id=$1 and c_d_id=$2 and c_id=$3 is all static
SELECT w_tax FROM warehouse WHERE w_id... is static

In payment,

SELECT c_id FROM customer WHERE c_w_id, c_d_id, c_last = ... order by c_first is static

And the rest of the queries are either mutations or queries on tables that aren't shared across those 2 txns.

So actually I think it's fine to only do this special case for 1 CF for now, and we'll still get a win for TPCC.

@nvanbenschoten
Copy link
Member

The problem isn't just the read-only queries, it's also the read-write queries. If one of the UPDATE statements touches a column family that it doesn't need to read or update, then we've already lost. For instance, newOrder's first statement UPDATE district SET d_next_o_id = d_next_o_id + 1 WHERE d_w_id = ... needs to read the static column family and one of the dynamic column families. If the limitation means that it wont be able to avoid touching the other dynamic column family (which contains d_ytd) then we're not going to avoid any contention.

@nvanbenschoten
Copy link
Member

Oh and FWIW, this issue touches the following TODO:

// TODO(radu/knz): currently the optimization always loads the
// entire row from KV and only skips unnecessary decodes to
// Datum. Investigate whether performance is to be gained (e.g. for
// tables with wide rows) by reading only certain columns from KV
// using point lookups instead of a single range lookup for the
// entire row.
valNeededForCol util.FastIntSet

solongordon added a commit to solongordon/cockroach that referenced this issue Sep 27, 2018
For tables with multiple column families, point lookups will now only
scan column families which contain the needed columns. Previously we
would scan the entire row. This optimization allows for faster lookups
and, perhaps more importantly, reduces contention between operations on
the same row but disjoint column families.

Fixes cockroachdb#18168

Release note: None
@nvanbenschoten
Copy link
Member

Unfortunately, my last post which I deleted was premature. I realized shortly after that we weren't correctly scanning at least one non-nullable column family when fetching the rows to update, so we weren't actually performing any updates at all because all fields in the dataset start as NULL. In other words, this is what the plan looked like.

root@:26257/defaultdb> EXPLAIN UPDATE ycsb SET field4 = 's' WHERE ycsb_key = 's';                                                                                                             tree         | field |    description
+---------------------+-------+-------------------+
  count               |       |
   └── update         |       |
        │             | table | ycsb
        │             | set   | field4
        └── render    |       |
             └── scan |       |
                      | table | ycsb@primary
                      | spans | /"s"/5/1-/"s"/5/2
(8 rows)

With that issue fixed, the plan correctly looks like:

root@:26257/ycsb> EXPLAIN UPDATE usertable SET field4 = 's' WHERE ycsb_key = 's';
         tree         | field |           description
+---------------------+-------+---------------------------------+
  count               |       |
   └── update         |       |
        │             | table | usertable
        │             | set   | field4
        └── render    |       |
             └── scan |       |
                      | table | usertable@primary
                      | spans | /"s"/0-/"s"/1 /"s"/5/1-/"s"/5/2
(8 rows)

I ran a modified YCSB workload again and the theory still panned out, but it certainly wasn't the 5x throughput improvement we saw in the other test.

image

This is YCSB's workload A with a zipfian load distribution. I ran it on a three-node GCE cluster with n1-highcpu-16 machines and the nobarrier SSD mount option. The concurrency started at 1 thread and doubled every minute up to 1024 threads. We can see that the peak throughput is ~50% higher with column families than without. In one sence, this is disapointing compared to the crazy speedup we saw before. In another, 50% is still an awesome win and clearly justifies the urgency of this change! It also makes me wonder whether contention is still the primary bottleneck in YCSB.

I also ran a second test where I used the old broken fix for this issue but adjusted the YCSB schema to use non-nullable columns so that we wouldn't miss rows. That resulted in very similar performance to what we see here. That indicates that the extra span in the point lookup has a negligible cost, or at least a cost that is offset by the extra data size required for non-nullable columns in this schema.

@nvanbenschoten
Copy link
Member

It is also interesting that these new measurements hit a scalability cliff at the same place as the test that wasn't performing any updates at all. That lends further credence to the idea that there's a new dominant bottleneck that's replaced contention once we reach a concurrency of around 64 threads.

@nvanbenschoten
Copy link
Member

nvanbenschoten commented Oct 1, 2018

I wonder if we're hitting the same concurrency cliff that we saw in #26178. workload/ycsb uses prepared statements that are scoped to the entire connection pool, so it's possible.

solongordon added a commit to solongordon/cockroach that referenced this issue Oct 2, 2018
For tables with multiple column families, point lookups will now only
scan column families which contain the needed columns. Previously we
would scan the entire row. This optimization allows for faster lookups
and, perhaps more importantly, reduces contention between operations on
the same row but disjoint column families.

Fixes cockroachdb#18168

Release note: None
solongordon added a commit to solongordon/cockroach that referenced this issue Oct 2, 2018
For tables with multiple column families, point lookups will now only
scan column families which contain the needed columns. Previously we
would scan the entire row. This optimization allows for faster lookups
and, perhaps more importantly, reduces contention between operations on
the same row but disjoint column families.

Fixes cockroachdb#18168

Release note: None
@petermattis petermattis removed this from the 2.2 milestone Oct 5, 2018
solongordon added a commit to solongordon/cockroach that referenced this issue Oct 9, 2018
For tables with multiple column families, point lookups will now only
scan column families which contain the needed columns. Previously we
would scan the entire row. This optimization allows for faster lookups
and, perhaps more importantly, reduces contention between operations on
the same row but disjoint column families.

Fixes cockroachdb#18168

Release note: None
solongordon added a commit to solongordon/cockroach that referenced this issue Oct 9, 2018
For tables with multiple column families, point lookups will now only
scan column families which contain the needed columns. Previously we
would scan the entire row. This optimization allows for faster lookups
and, perhaps more importantly, reduces contention between operations on
the same row but disjoint column families.

Fixes cockroachdb#18168

Release note: None
craig bot pushed a commit that referenced this issue Oct 9, 2018
30744: sql: optimize point lookups on column families r=solongordon a=solongordon

For tables with multiple column families, point lookups will now only
scan column families which contain the needed columns. Previously we
would scan the entire row. This optimization allows for faster lookups
and, perhaps more importantly, reduces contention between operations on
the same row but disjoint column families.

Fixes #18168

Release note: None

Co-authored-by: Solon Gordon <[email protected]>
@craig craig bot closed this as completed in #30744 Oct 9, 2018
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Nov 29, 2018
This change adds a new `--families` flag to the ycsb workload. Now
that cockroachdb#18168 is addressed, this significantly reduces the contention
present in the workload by avoiding conflicts on updates to different
columns in the same table.

Release note: None
craig bot pushed a commit that referenced this issue Nov 29, 2018
32704: workload/ycsb: add flag to use column families r=nvanbenschoten a=nvanbenschoten

This change adds a new `--families` flag to the ycsb workload. Now
that #18168 is addressed, this significantly reduces the contention
present in the workload by avoiding conflicts on updates to different
columns in the same table.

I just confirmed that this still provides a huge speedup. On a 24 cpu machine:
```
workload run ycsb --init --workload='A' --concurrency=128 --duration=1m


--families=false gives:

_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
   60.0s        0         103089         1718.0     13.9      0.6      1.8    604.0   1476.4  read
   60.0s        0         102947         1715.6     59.5      5.2     11.5   2281.7   8321.5  update

_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__result
   60.0s        0         206036         3433.6     36.7      3.3      8.9   1342.2   8321.5


--families=true gives:

_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
   60.0s        0         333477         5557.8      9.2      0.6      6.0    302.0   1275.1  read
   60.0s        0         332366         5539.3     13.7      6.8     17.8     54.5   4831.8  update

_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__result
   60.0s        0         665843        11097.1     11.5      3.9     16.3    268.4   4831.8
```

cc. @robert-s-lee @drewdeally 

Release note: None

Co-authored-by: Nathan VanBenschoten <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-sql-optimizer SQL logical planning and optimizations. C-performance Perf of queries or internals. Solution not expected to change functional behavior.
Projects
None yet
Development

No branches or pull requests

7 participants