Skip to content

Schema Design

Alex Yang edited this page Aug 26, 2016 · 3 revisions

Helpful things to keep in mind when using the MyRocks storage engine.

Reverse Column Families (for descending scans)

For workloads where descending scans are needed often, e.g. ORDER BY DESC LIMIT 100, utilizing reverse column families will increase performance. To understand why, we need to take a look at how keys are stored within RocksDB.

RocksDB keys are memcomparable with each other, see a general description of the format here: https://github.com/facebook/mysql-5.6/wiki/MyRocks-record-format#memcomparable-format

There are two types of memcmp which we do in RocksDB storage engine: regular and reverse. By default, we use a direct comparator which is effectively the same as memcmp. However, if the column family name starts with "rev:", we internally use a "reverse" comparator (return -memcmp(a,b));

This reverse comparison makes it more efficient to iterate ranges in reverse. One reason is because RocksDB uses prefix compression inside blocks of SST files. (see more information here: https://github.com/facebook/mysql-5.6/wiki/MyRocks-advantages-over-InnoDB#prefix-key-encoding). This somewhat slows a typical reverse iterator. By reversing the ordering of keys, we make reverse iteration faster (for instance, on queries such as select * from t1 desc limit 100;)

It is important to note that due to the nature of column families either using a regular or "reverse" comparator, either ORDER BY ASC or ORDER BY DESC will be slower for each column family.

An example usage of a reverse column family. Here foo is a reverse column family, while bar is a regular column family.

create table t1 (
  pk int primary key,
  a int not null,
  b int not null,
  key(a) comment 'rev:foo',
  key(b) comment 'bar'
) engine=rocksdb;

# fill table here...

select a from t1 order by a limit 5;

Avoid non-covering Secondary Index Scan

Both InnoDB and MyRocks use a clustered index and secondary index lookup/scan format. (Read more here https://dev.mysql.com/doc/refman/5.7/en/innodb-index-types.html)

In MyRocks, the clustered index is typically the primary key for the table (a hidden primary key is auto-generated if one is not specified explicitly). This means that lookups by primary key are single step reads, e.g. SELECT * FROM t1 WHERE pk = 1.

Secondary keys in MyRocks contain both the memcomparable format of the secondary key followed by the primary key. Again, more information on key format can be found here: https://github.com/facebook/mysql-5.6/wiki/MyRocks-record-format#memcomparable-format. Because of this format, if queries using the secondary key touch only the secondary key + primary key fields, there is no need to read the clustered index (no extra lookup), as all the information needed is contained within the key itself already. This is also known as the covering index/keyread optimization.

The important insight is when designing the secondary keys of a table, to maximize index lookup/scan efficiency, you should try to avoid non-covering secondary index scans as they will be more expensive (extra lookup to get the rest of the fields from PK).

Currently index only scans are not supported for certain collations, see https://github.com/facebook/mysql-5.6/issues/28 for progress on supporting these collations.

Clone this wiki locally