Improve event store indexes #43

basz · 2017-01-17T12:15:05Z

referring #42

some indexed need to be changed (ix_unique_event?)

Than I think ix_unique_event must be reordered
Or the index must be set on the json path instead..

prolic · 2017-03-01T10:47:22Z

@basz Can you verify the problems with the newest master?

basz · 2017-03-01T13:35:20Z

I still seem to have lowering throughput after a while... I don't understand why exactly... The thing is that replaying events has stable performance. That tells me it isn't a problem with the read models (v6 style).

By the way I don't see any difference in the schema, but that might be because I update regularly to master...

oqq · 2017-03-03T07:59:05Z

@basz can you provide a small example gist?
That could me help a lot to improve the indexes.

basz · 2017-03-06T09:38:53Z

would this help? two aggregates one with ~60000 events one with just 1.

https://gist.github.com/basz/f1b6e6c7262e8fabacacb1c05cb35a70

oqq · 2017-03-10T19:12:25Z

Are you using the current implementation from MySqlSingleStreamStrategy?

If so please test this query:

SELECT * FROM _7794a224f85453058cf637d52e7c4487568a8271
    -> WHERE metadata->"$._aggregate_version" > 0 AND metadata->"$._aggregate_id" = '503113e6-820f-5523-b18f-e71a56db31eb' AND metadata->"$._aggregate_type" = 'HF\\Model\\Domain\\Dossier\\Aggregate\\Dossier' AND `no` >= 1
    -> ORDER BY `no` ASC

@prolic the main reason for that issue is the lack of indexes.
Maybe it makes sense that we are always provide an index /aggregate_id, aggregate_type, version/ (or in other order) and sort the metadata query always in that order, since this would the most used metadata query I guess. The MySqlSingleStreamStrategy stays on unique key, all other could simply use a not unique index.

This would improve performance for such queries against tables with much data.
Currently no index is used at all.

prolic · 2017-03-11T07:02:28Z

I created a test-script for this and am testing now.

oqq · 2017-03-11T07:46:07Z

Please provide Explain outputs with used indexes and queries for your tests.

prolic · 2017-03-11T08:07:17Z

I have something else to do now, will come back to it later this evening.

resolves #43

prolic · 2017-03-11T16:11:21Z

Here is the reordered MySqlSingleStreamStrategy: https://gist.github.com/prolic/389820fcd30c821fcce5da3e1e9c6f89

Here are the test-scripts:
MySql: https://gist.github.com/prolic/533b82fb97b671ac5f1a0d7b5017cb27
Postgres: https://gist.github.com/prolic/2000e71a4d0aa2d6adfcfa5633d45a74

Usage:

MySql

./vendor/bin/phpunit -c phpunit.xml.pdo_mysql --group=prepare tests/MyTest.php
./vendor/bin/phpunit -c phpunit.xml.pdo_mysql --group=run tests/MyTest.php

Postgres

./vendor/bin/phpunit -c phpunit.xml.pdo_pgsql --group=prepare tests/PoTest.php
./vendor/bin/phpunit -c phpunit.xml.pdo_pgsql --group=run tests/PoTest.php

MySQL Explain says

using MySqlSingleStreamStrategy

EXPLAIN SELECT * FROM _b3daa77b4c04a9551b8781d03191fe098f325e67 WHERE metadata->"$._aggregate_type" = 'user1' AND metadata->"$._aggregate_id" = 'ai_1' AND `no` >= 1 ORDER BY `no` ASC LIMIT 10000;
+----+-------------+-------------------------------------------+------------+-------+---------------+---------+---------+------+-------+----------+-------------+
| id | select_type | table                                     | partitions | type  | possible_keys | key     | key_len | ref  | rows  | filtered | Extra       |
+----+-------------+-------------------------------------------+------------+-------+---------------+---------+---------+------+-------+----------+-------------+
|  1 | SIMPLE      | _b3daa77b4c04a9551b8781d03191fe098f325e67 | NULL       | range | PRIMARY       | PRIMARY | 8       | NULL | 43600 |     1.00 | Using where |
+----+-------------+-------------------------------------------+------------+-------+---------------+---------+---------+------+-------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

using reordered index on MySqlSingleStreamStrategy

EXPLAIN SELECT * FROM _a1881c06eec96db9901c7bbfe41c42a3f08e9cb4
    -> WHERE metadata->"$._aggregate_type" = 'user2' AND metadata->"$._aggregate_id" = 'ai_1' AND `no` >= 1
    -> ORDER BY `no` ASC
    -> LIMIT 10000;
+----+-------------+-------------------------------------------+------------+------+-------------------------+-----------------+---------+-------------+------+----------+---------------------------------------+
| id | select_type | table                                     | partitions | type | possible_keys           | key             | key_len | ref         | rows | filtered | Extra                                 |
+----+-------------+-------------------------------------------+------------+------+-------------------------+-----------------+---------+-------------+------+----------+---------------------------------------+
|  1 | SIMPLE      | _a1881c06eec96db9901c7bbfe41c42a3f08e9cb4 | NULL       | ref  | PRIMARY,ix_unique_event | ix_unique_event | 560     | const,const | 2000 |    50.00 | Using index condition; Using filesort |
+----+-------------+-------------------------------------------+------------+------+-------------------------+-----------------+---------+-------------+------+----------+---------------------------------------+
1 row in set, 1 warning (0.00 sec)

Postgres Explain says

using PostgresSingleStreamStrategy

Limit  (cost=5271.28..5276.30 rows=2010 width=216)
   ->  Sort  (cost=5271.28..5276.30 rows=2010 width=216)
         Sort Key: no
         ->  Seq Scan on _b3daa77b4c04a9551b8781d03191fe098f325e67  (cost=0.00..5161.00 rows=2010 width=216)
               Filter: ((no >= 1) AND ((metadata ->> '_aggregate_type'::text) = 'user1'::text) AND ((metadata ->> '_aggregate_id'::text) = 'ai_1'::text))
(5 rows)

using reordered index on PostgresSingleStreamStrategy

Limit  (cost=2815.02..2819.88 rows=1946 width=216)
   ->  Sort  (cost=2815.02..2819.88 rows=1946 width=216)
         Sort Key: no
         ->  Index Scan using _a1881c06eec96db9901c7bbfe41c42a3f08e9cb4_expr_expr1_expr2_idx on _a1881c06eec96db9901c7bbfe41c42a3f08e9cb4  (cost=0.42..2708.70 rows=1946 width=216)
               Index Cond: (((metadata ->> '_aggregate_type'::text) = 'user2'::text) AND ((metadata ->> '_aggregate_id'::text) = 'ai_1'::text))
               Filter: (no >= 1)
(6 rows)

Test-Results

using MySqlSingleStreamStrategy

ai_1: 0.077847003936768
ai_2: 0.06538200378418
ai_20: 0.078543186187744
ai_33: 0.077946901321411
ai_59: 0.066684007644653

using reordered index on MySqlSingleStreamStrategy

ai_1: 0.014132976531982
ai_2: 0.00044894218444824
ai_20: 0.013552904129028
ai_33: 0.015939950942993
ai_59: 0.0004270076751709

using MySqlAggregateStreamStrategy

ai_1: 0.010020017623901
ai_2: 0.00036787986755371
ai_20: 0.010578870773315
ai_33: 0.010411024093628

using PostgresSingleStreamStrategy

ai_1: 0.049942016601562
ai_2: 0.0040481090545654
ai_20: 0.047064065933228
ai_33: 0.047213077545166
ai_59: 0.0043928623199463

using reordered index on PostgresSingleStreamStrategy

ai_1: 0.0057878494262695
ai_2: 0.00040006637573242
ai_20: 0.0052559375762939
ai_33: 0.0060200691223145
ai_59: 0.00037407875061035

using PostgresAggregateStreamStrategy

ai_1: 0.0055849552154541
ai_2: 0.0003809928894043
ai_20: 0.0049259662628174
ai_33: 0.0048758983612061

Conclusion

The SingleStreamStrategy needs reordering of the indexes!
The AggregateStreamStrategy is still the best!
Postgres is way faster then MySql!

PR

see here: #66

oqq · 2017-03-11T17:10:24Z

TL;RL: reordering the index is not enough.

Problem here is that an reorder of index would not help. That helps in this use case, since the query creates a where condition in that order.

If you have a query with WHERE type = 'fuu' AND id = 'bar' AND version = '1' ORDER BY no, than MySQL uses the index that is ordered like this: (type, id, version). If you add additional the no to that index, than 'Using filesort' would gone away and this query would be much faster.

But if anyone run this query with e.g. WHERE id = 'bar' AND type = 'fuu' AND version = '1' MySQL does not use this index anymore.

What we need is a sort in MetadataMatcher. The order of the index is regardless, but the query must match the existing index.

So what I suggest is:
Add an index in that order how a user would potentially use the MetadataMatcher and sort all values in Query to match that index. Maybe it would be necessary to add some value to the query.

So, if we had an index like this: (type, id, version, no) (the preferred one I guess), than the query should always be: WHERE type = AND id = AND version = [no =] ORDER BY no. If the id and version is not provided in metadata, it is are likely that a query like WHERE type = 'fuu' AND id != '00000-00000-00000' AND version != 0 ORDER BY no is much faster then WHERE type = 'fuu' ORDER BY no`'.

oqq · 2017-03-11T17:31:12Z

Oh, this could be easily done by enriching the where condition with:

$where = array_merge(['type' => "!=''", 'id' => "!=''", 'version' => '!=0'], $where);

So the right order for present values are correct and not provided value would be filled up.

How about to add something like a enrichAndSortQuery() method to the strategy? (bad method name, sure)

prolic · 2017-03-12T06:28:43Z

Interessting: The insert with the provided test-scripts above are much faster on MySQL than with Postgres, while Postgres reads much faster than MySQL. Could be that this is due to the default configuration of the databases. Any ideas what's going on here?

prolic · 2017-03-12T06:40:17Z

When I add no to the index, the performance is still the same and Using filesort is still part of the MySQL explain - @oqq wanna modify my test-script to get it running even faster with other indexes?

About the other stuff:
While the order of the metadata matchers make a difference on which indexes are used, it does not makes sense to order the metadata matchers within the MetadataMatcher class. I think it's more reasonable to create the MetadataMatcher carefully with what you have as indexes configured. The provided default persistance strategies are SimpleStreamStrategy (useful for not-event-sourced-systems or stream-to-stream-projections), SingleStreamStrategy (useful for prooph/event-sourcing and prooph/micro), AggregateStreamStrategy (useful for prooph/event-sourcing and prooph/micro).

The only problem that we have currently is with SingleStreamStrategy, so let's talk about this one now. Using prooph/event-sourcing for your aggregates, you don't create the MetadataMatcher yourself, it's part of the event-sourcing component. So there is no need for reordering, we simply confirm that the same orders are used in code. (Same for micro, we should add some documentation around this topic perhaps.) If you bypass event-sourcing component or access the event-store directly (for whatever reasons) and want to pass custom MetadataMatchers, you are responsible for the correct order of matches (and even chose a custom PersistanceStrategy with custom indexes, to improve performance) - so I think this is only a documentation issue.

@oqq If you are able to remove the Using filesort from MySQL explain, and get some more performance, let's do that.

prolic · 2017-03-12T06:53:45Z

My Postgres results are ready (with added no to the index):

First, without no:

Limit  (cost=2909.97..2915.04 rows=2029 width=216)
   ->  Sort  (cost=2909.97..2915.04 rows=2029 width=216)
         Sort Key: no
         ->  Index Scan using _b3daa77b4c04a9551b8781d03191fe098f325e67_expr_expr1_expr2_idx on _b3daa77b4c04a9551b8781d03191fe098f325e67  (cost=0.42..2798.51 rows=2029 width=216)
               Index Cond: (((metadata ->> '_aggregate_type'::text) = 'user1'::text) AND ((metadata ->> '_aggregate_id'::text) = 'ai_1'::text))
               Filter: (no >= 1)
(6 rows)

timing:

ai_1: 0.0066189765930176
ai_2: 0.00038790702819824
ai_20: 0.0055079460144043
ai_33: 0.0051159858703613
ai_59: 0.00033402442932129

With added no:

Limit  (cost=2908.55..2913.57 rows=2010 width=216)
   ->  Sort  (cost=2908.55..2913.57 rows=2010 width=216)
         Sort Key: no
         ->  Index Scan using _a1881c06eec96db9901c7bbfe41c42a3f08e9c_expr_expr1_expr2_no_idx on _a1881c06eec96db9901c7bbfe41c42a3f08e9cb4  (cost=0.42..2798.27 rows=2010 width=216)
               Index Cond: (((metadata ->> '_aggregate_type'::text) = 'user2'::text) AND ((metadata ->> '_aggregate_id'::text) = 'ai_1'::text) AND (no >= 1))
(5 rows)

timing:

ai_1: 0.005335807800293
ai_2: 0.00040507316589355
ai_20: 0.005220890045166
ai_33: 0.0051581859588623
ai_59: 0.00038909912109375

So we see: Using Postgres adding no to the index increases performance - but we have to create a new index for this, because the existing is a unique contraint index.

Sadly, the same change doesn't make a difference on MySQL - I don't know why. Ideas?

prolic · 2017-03-12T07:01:06Z

Test-preparation (inserts)

using Postgres:

Time: 52.11 seconds

using MySQL

Time: 1.21 minutes

So Postgres is also faster writing - I said something different before - might have been a missing warmup - sorry for the confusion.

oqq · 2017-03-12T09:34:24Z

Your query is lacking the version in query. If your index is (type, id, version, no) than MySQL can use no in that index for sorting.

If the where clause is like

WHERE type = ? AND id = ? AND version = ? [AND no = ?] ORDER BY no

So MySQL could read all entries directly from the index in that order and no additional sorting is necessary.

if you run a query like this

WHERE type = ? AND id = ? [AND no = ?] ORDER BY no

than the no part of that index would be ignored. MySQL could not know about this part in the index, cuz the lack of version. Such an index is not stored as a set of values. It is more a joint value like type-id-version-no. So only type-id of that index is used for searching and an additional sorting after searching is needed.

I am not sure how Postgres works. But I am pretty sure that RDBs works quite similar in that case. Also I am not able to read Postgres EXPLAIN for now. But I know Using filesort in MySQL is really bad and could omitted with better indexes like my description.

How I read this line i guess thats the same as Using filesort in MySQL and should be avoided.

Sort (cost=2909.97..2915.04 rows=2029 width=216)

TL;RL

use an index (type, id, version, no)
query with WHERE type = ? AND id = ? AND version != 0 ORDER BY no
OR
use an index (type, id, no)
query with WHERE type = ? AND id = ? ORDER BY no
OR
use an index (type, no)
query with WHERE type = ? ORDER BY no
AND
display your results here ;)

I will use your test script for my self today. For now I guess an index (type, id, no) is the best choice if we ensure the query begins always with type and id in where clause.

prolic · 2017-03-12T10:00:32Z

I don't understand, what is your use case for type - no or type - id - no ? Using event sourcing component no such query is ever executed. Am 12.03.2017 17:34 schrieb "Eric Braun" <[email protected]>: Your query is lacking the version in query. If your index is (type, id, version, no) than MySQL can use no in that index for sorting. If the where clause is like WHERE type = ? AND id = ? AND version = ? [AND no = ?] ORDER BY no So MySQL could read all entries directly from the index in that order and no additional sorting is necessary. if you run a query like this WHERE type = ? AND id = ? [AND no = ?] ORDER BY no than the no part of that index would be ignored. MySQL could not know about this part in the index, cuz the lack of version. Such an index is not stored as a set of values. It is more a joint value like type-id-version-no. So only type-id of that index is used for searching and an additional sorting after searching is needed. I am not sure how Postgres works. But I am pretty sure that RDBs works quite similar in that case. Also I am also not able to read Postgres EXPLAIN for now. But I know Using filesort in MySQL is really bad and could omitted with better indexes like my description. How I read this line: Sort (cost=2909.97..2915.04 rows=2029 width=216) I guess thats the same as Using filesort in MySQL and should be avoided. *TL;RL* - use an index (type, id, version, no) - query with WHERE type = ? AND id = ? AND version != 0 ORDER BY no OR - use an index (type, id, no) - query with WHERE type = ? AND id = ? ORDER BY no OR - use an index (type, no) - query with WHERE type = ? ORDER BY no AND - display your results here ;) I will use your test script for my self today. — You are receiving this because you were assigned. Reply to this email directly, view it on GitHub <#43 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AAYEvD55eWazYuLIS3ritDkwHVaiAWIXks5rk7wggaJpZM4Llkxh> .

oqq · 2017-03-12T10:14:12Z

When i load an aggregate root, than the query would be

WHERE type = ? AND id = ? ORDER BY no

right?

If you remove no from index, than MySQL searches all relevant tuples from the index (type, id), get the values from storage and has to sort this set after.

If you have no involved in the index, MySQL searches all tuples in index in the correct order. So no additional sorting is needed.

I will provide a test case to demonstrate that.

prolic · 2017-03-12T10:16:51Z

I see, we can change event-sourcing to always add version to the query, that should resolve the issue.

oqq · 2017-03-12T10:34:26Z

Not necessary.
We need always a unique key for (type, id, version) without no in single stream strategy.
If if we add an additional index (type, id, no) to every strategy, the most queries would run faster.

OR

we sort by version instead of no if type and id is in metadata matcher.

Maybe we should collect all probably queries. Than I can provide an index strategy with smallest storage food print but good performance for all.

prolic · 2017-03-12T10:44:02Z

we currently use this:

type, id, version, no (using snapshots)

and

type, id, no (without snapshots)

we can add version >= 1 to the second query (without snapshots) and everything should work just fine.
adding another index for this use-case may have an impact on write performance.

No other queries are executed at all - when a user has a special use case to have additional event store queries, he needs a custom persistence strategy.

oqq · 2017-03-12T10:50:34Z

If you add no to the unique key in single stream strategy, than the uniqueness would be broken, right?

prolic · 2017-03-12T10:52:10Z

no the no will be added to another index, not the unique one :)

oqq · 2017-03-12T10:52:51Z

What I see here:
We could not add no to single stream strategy. So we have to add a new index. Also on other strategies we have to add a new index.

Let this index be small as possible. So maybe version could be avoided if not really needed.

prolic · 2017-03-12T10:54:07Z

version is a requirement when snapshots are used. (I got snapshot with version 5, so fetch only events where version > 5)

oqq · 2017-03-12T10:55:05Z

(type, id, no) could also be used if the query is WHERE type = ? AND id = ? and version > 5 ORDER BY no

Not perfectly, but better than currently.

prolic · 2017-03-12T10:59:10Z

I just tested with another index like

INDEX `ix_query_event` (`aggregate_type`, `aggregate_id`, `aggregate_version`, `no`)

and explain gives:

EXPLAIN SELECT * FROM _a1881c06eec96db9901c7bbfe41c42a3f08e9cb4
WHERE `no` >= 1 AND metadata->"$._aggregate_type" = 'user2' AND metadata->"$._aggregate_id" = 'ai_1' AND metadata->"$._aggregate_version" >= 1
ORDER BY `no` ASC
LIMIT 10000;

+----+-------------+-------------------------------------------+------------+-------+----------------------------------------+-----------------+---------+------+------+----------+---------------------------------------+
| id | select_type | table                                     | partitions | type  | possible_keys                          | key             | key_len | ref  | rows | filtered | Extra                                 |
+----+-------------+-------------------------------------------+------------+-------+----------------------------------------+-----------------+---------+------+------+----------+---------------------------------------+
|  1 | SIMPLE      | _a1881c06eec96db9901c7bbfe41c42a3f08e9cb4 | NULL       | range | PRIMARY,ix_unique_event,ix_query_event | ix_unique_event | 564     | NULL | 2000 |    50.00 | Using index condition; Using filesort |
+----+-------------+-------------------------------------------+------------+-------+----------------------------------------+-----------------+---------+------+------+----------+---------------------------------------+
1 row in set, 1 warning (0.00 sec)

oqq · 2017-03-12T11:00:12Z

Your query starts with no, but has to start with type for that index.

prolic · 2017-03-12T11:00:32Z

ups, wrong order, again:

EXPLAIN SELECT * FROM _a1881c06eec96db9901c7bbfe41c42a3f08e9cb4
WHERE metadata->"$._aggregate_type" = 'user2' AND metadata->"$._aggregate_id" = 'ai_1' AND metadata->"$._aggregate_version" >= 1 AND `no` >= 1
ORDER BY `no` ASC
LIMIT 10000;

+----+-------------+-------------------------------------------+------------+-------+----------------------------------------+-----------------+---------+------+------+----------+---------------------------------------+
| id | select_type | table                                     | partitions | type  | possible_keys                          | key             | key_len | ref  | rows | filtered | Extra                                 |
+----+-------------+-------------------------------------------+------------+-------+----------------------------------------+-----------------+---------+------+------+----------+---------------------------------------+
|  1 | SIMPLE      | _a1881c06eec96db9901c7bbfe41c42a3f08e9cb4 | NULL       | range | PRIMARY,ix_unique_event,ix_query_event | ix_unique_event | 564     | NULL | 2000 |    50.00 | Using index condition; Using filesort |
+----+-------------+-------------------------------------------+------------+-------+----------------------------------------+-----------------+---------+------+------+----------+---------------------------------------+
1 row in set, 1 warning (0.00 sec)

oqq · 2017-03-12T11:01:53Z

I got this:

EXPLAIN SELECT * FROM _a1881c06eec96db9901c7bbfe41c42a3f08e9cb4
WHERE metadata->"$._aggregate_type" = 'user2' AND metadata->"$._aggregate_id" = 'ai_1' AND `no` >= 1
ORDER BY `no` ASC

id	select_type	table	partitions	type	possible_keys	key	key_len	ref	rows	filtered	Extra
1	SIMPLE	_a1881c06eec96db9901c7bbfe41c42a3f08e9cb4	NULL	range	PRIMARY,ix_unique_event	ix_unique_event	568	NULL	2000	100.00	Using index condition

prolic · 2017-03-12T11:02:47Z

Difference is, I added aggregate_version already

prolic · 2017-03-12T11:04:25Z

create table ..... 
UNIQUE KEY `ix_unique_event` (`aggregate_type`, `aggregate_id`, `aggregate_version`),
INDEX `ix_query_event` (`aggregate_type`, `aggregate_id`, `aggregate_version`, `no`)
....

oqq · 2017-03-12T11:07:34Z

MySQL uses ix_unique_event in that case. Either MySQL hat decided this is the better one, or (more likely) dont know the other one is better.

Please test with an index hint https://dev.mysql.com/doc/refman/5.7/en/index-hints.html

prolic · 2017-03-12T11:11:20Z

Add a query hint would complicate the code a lot, not sure from where to get this, as indexes are only in the persistence strategy.

However for testing:

EXPLAIN SELECT * FROM _a1881c06eec96db9901c7bbfe41c42a3f08e9cb4 use INDEX (ix_query_event) WHERE metadata->"$._aggregate_type" = 'user2' AND metadata->"$._aggregate_id" = 'ai_1' AND metadata->"$._aggregate_version" >= 1 AND `no` >= 1 ORDER BY `no` ASC LIMIT 10000;
+----+-------------+-------------------------------------------+------------+-------+----------------+----------------+---------+------+------+----------+---------------------------------------+
| id | select_type | table                                     | partitions | type  | possible_keys  | key            | key_len | ref  | rows | filtered | Extra                                 |
+----+-------------+-------------------------------------------+------------+-------+----------------+----------------+---------+------+------+----------+---------------------------------------+
|  1 | SIMPLE      | _a1881c06eec96db9901c7bbfe41c42a3f08e9cb4 | NULL       | range | ix_query_event | ix_query_event | 572     | NULL | 2000 |    33.33 | Using index condition; Using filesort |
+----+-------------+-------------------------------------------+------------+-------+----------------+----------------+---------+------+------+----------+---------------------------------------+
1 row in set, 1 warning (0.00 sec)

prolic · 2017-03-12T11:11:59Z

It's still using filesort

oqq · 2017-03-12T11:12:01Z

I have tested this without query hint and got the right index usages. Anything in your code is wrong. ;)

prolic · 2017-03-12T11:13:01Z

My table:

CREATE TABLE `_a1881c06eec96db9901c7bbfe41c42a3f08e9cb4` (
  `no` bigint(20) NOT NULL AUTO_INCREMENT,
  `event_id` char(36) COLLATE utf8_bin NOT NULL,
  `event_name` varchar(100) COLLATE utf8_bin NOT NULL,
  `payload` json NOT NULL,
  `metadata` json NOT NULL,
  `created_at` datetime(6) NOT NULL,
  `aggregate_version` int(11) unsigned GENERATED ALWAYS AS (json_extract(`metadata`,'$._aggregate_version')) STORED NOT NULL,
  `aggregate_id` char(36) COLLATE utf8_bin GENERATED ALWAYS AS (json_unquote(json_extract(`metadata`,'$._aggregate_id'))) STORED NOT NULL,
  `aggregate_type` varchar(150) COLLATE utf8_bin GENERATED ALWAYS AS (json_unquote(json_extract(`metadata`,'$._aggregate_type'))) STORED NOT NULL,
  PRIMARY KEY (`no`),
  UNIQUE KEY `ix_event_id` (`event_id`),
  UNIQUE KEY `ix_unique_event` (`aggregate_type`,`aggregate_id`,`aggregate_version`),
  KEY `ix_query_event` (`aggregate_type`,`aggregate_id`,`aggregate_version`,`no`)
) ENGINE=InnoDB AUTO_INCREMENT=96002 DEFAULT CHARSET=utf8 COLLATE=utf8_bin

query is above

prolic · 2017-03-12T11:13:32Z

MySQL: 5.7.17-0ubuntu0.16.04.1

oqq · 2017-03-12T11:14:56Z

Please give me some minutes. I will setup an own test script.

prolic · 2017-03-12T11:22:31Z

testing on postgres:

Limit  (cost=2897.16..2898.89 rows=691 width=216)
   ->  Sort  (cost=2897.16..2898.89 rows=691 width=216)
         Sort Key: no
         ->  Index Scan using _a1881c06eec96db9901c7bbfe41c42a3f08e9cb4_expr_expr1_expr2_idx on _a1881c06eec96db9901c7bbfe41c42a3f08e9cb4  (cost=0.42..2864.57 rows=691 width=216)
               Index Cond: (((metadata ->> '_aggregate_type'::text) = 'user2'::text) AND ((metadata ->> '_aggregate_id'::text) = 'ai_1'::text))
               Filter: ((no >= 1) AND (((metadata ->> '_aggregate_version'::text))::integer >= 1))
(6 rows)

having index like:

CREATE INDEX on $tableName
((metadata->>'_aggregate_type'), (metadata->>'_aggregate_id'), (metadata->>'_aggregate_version'), no);

seems also not to take the new index

query is

EXPLAIN SELECT * FROM _a1881c06eec96db9901c7bbfe41c42a3f08e9cb4
WHERE metadata->>'_aggregate_type' = 'user2' AND metadata->>'_aggregate_id' = 'ai_1' AND CAST(metadata->>'_aggregate_version' as int) >= 1 AND no >= 1
ORDER BY no ASC
LIMIT 10000;

prolic · 2017-03-12T11:23:01Z

back online later

oqq · 2017-03-12T12:21:15Z

Here are my results:

Table

CREATE TABLE `_a1881c06eec96db9901c7bbfe41c42a3f08e9cb4` (
  `no` bigint(20) NOT NULL AUTO_INCREMENT,
  `event_id` char(36) COLLATE utf8_bin NOT NULL,
  `event_name` varchar(100) COLLATE utf8_bin NOT NULL,
  `payload` json NOT NULL,
  `metadata` json NOT NULL,
  `created_at` datetime(6) NOT NULL,
  `aggregate_version` int(11) unsigned GENERATED ALWAYS AS (json_extract(`metadata`,'$._aggregate_version')) STORED NOT NULL,
  `aggregate_id` char(36) COLLATE utf8_bin GENERATED ALWAYS AS (json_unquote(json_extract(`metadata`,'$._aggregate_id'))) STORED NOT NULL,
  `aggregate_type` varchar(150) COLLATE utf8_bin GENERATED ALWAYS AS (json_unquote(json_extract(`metadata`,'$._aggregate_type'))) STORED NOT NULL,
  PRIMARY KEY (`no`),
  UNIQUE KEY `ix_event_id` (`event_id`),
  UNIQUE KEY `ix_unique_event` (`aggregate_type`,`aggregate_id`,`aggregate_version`),
  KEY `ix_query_aggregate` (`aggregate_type`,`aggregate_id`,`no`),
  KEY `ix_query_aggregate2` (`aggregate_type`,`aggregate_id`,`aggregate_version`,`no`)
) ENGINE=InnoDB AUTO_INCREMENT=96002 DEFAULT CHARSET=utf8 COLLATE=utf8_bin;

with two indexes. One with version and one without.
I have used your test script to fill up the data and have ~100k values in that table.

Query without version; index without version

EXPLAIN SELECT * FROM _a1881c06eec96db9901c7bbfe41c42a3f08e9cb4 USE INDEX (ix_query_aggregate)
WHERE metadata->"$._aggregate_type" = 'user2' AND metadata->"$._aggregate_id" = 'ai_1' AND `no` >= 1
ORDER BY `no` ASC
LIMIT 10000;

id	select_type	table	partitions	type	possible_keys	key	key_len	ref	rows	filtered	Extra
1	SIMPLE	_a1881c06eec96db9901c7bbfe41c42a3f08e9cb4	NULL	range	ix_query_aggregate	ix_query_aggregate	568	NULL	2000	100.00	Using index condition

Query without version; index with version

EXPLAIN SELECT * FROM _a1881c06eec96db9901c7bbfe41c42a3f08e9cb4 USE INDEX (ix_query_aggregate2)
WHERE metadata->"$._aggregate_type" = 'user2' AND metadata->"$._aggregate_id" = 'ai_1' AND `no` >= 1
ORDER BY `no` ASC
LIMIT 10000;

1	SIMPLE	_a1881c06eec96db9901c7bbfe41c42a3f08e9cb4	NULL	ref	ix_query_aggregate2	ix_query_aggregate2	560	const,const	2000	33.33	Using index condition; Using filesort

Query with version; index _without version

EXPLAIN SELECT * FROM _a1881c06eec96db9901c7bbfe41c42a3f08e9cb4 USE INDEX (ix_query_aggregate)
WHERE metadata->"$._aggregate_type" = 'user2' AND metadata->"$._aggregate_id" = 'ai_1' AND metadata->"$._aggregate_version" >= 1 AND `no` >= 1
ORDER BY `no` ASC
LIMIT 10000;

id	select_type	table	partitions	type	possible_keys	key	key_len	ref	rows	filtered	Extra
1	SIMPLE	_a1881c06eec96db9901c7bbfe41c42a3f08e9cb4	NULL	range	ix_query_aggregate	ix_query_aggregate	568	NULL	2000	100.00	Using index condition; Using where

Query with version; index with version

EXPLAIN SELECT * FROM _a1881c06eec96db9901c7bbfe41c42a3f08e9cb4 USE INDEX (ix_query_aggregate2)
WHERE metadata->"$._aggregate_type" = 'user2' AND metadata->"$._aggregate_id" = 'ai_1' AND metadata->"$._aggregate_version" >= 1 AND `no` >= 1
ORDER BY `no` ASC
LIMIT 10000;

id	select_type	table	partitions	type	possible_keys	key	key_len	ref	rows	filtered	Extra
1	SIMPLE	_a1881c06eec96db9901c7bbfe41c42a3f08e9cb4	NULL	range	ix_query_aggregate2	ix_query_aggregate2	572	NULL	2000	33.33	Using index condition; Using filesort

All queries without `AND no >= 1`

The type changes from "range" to "ref"

Conclusion

The index without version is the better one in all cases. (type, id, no)
The index with version would always using filesort. Not sure why, but thats what happens.
removing AND no >= 1 from query has better performance (not much). So if the load method is executed with count = null, we should remove that from query.
If the aggregate_version does not matter, aggregate_version >= 1 should be removed from query. If it does matter, it could be added and would not conclude much on performance.
Metadata queries should always be sorted with type and id in first place.
We need to investigate why the right index is not used without use index

prolic · 2017-03-12T12:37:38Z

thanks @oqq - am playing with it a little now

prolic · 2017-03-12T13:25:44Z

Postgres

EXPLAIN SELECT * FROM _a1881c06eec96db9901c7bbfe41c42a3f08e9cb4
WHERE metadata->>'_aggregate_type' = 'user2' AND metadata->>'_aggregate_id' = 'ai_1' AND no >= 1
ORDER BY no ASC
LIMIT 10000;

Limit  (cost=0.42..2734.22 rows=1955 width=216)
   ->  Index Scan using _a1881c06eec96db9901c7bbfe41c42a3f08e9cb4_expr_expr1_no_idx on _a1881c06eec96db9901c7bbfe41c42a3f08e9cb4  (cost=0.42..2734.22 rows=1955 width=216)
         Index Cond: (((metadata ->> '_aggregate_type'::text) = 'user2'::text) AND ((metadata ->> '_aggregate_id'::text) = 'ai_1'::text) AND (no >= 1))
(3 rows)

MySQL

EXPLAIN SELECT * FROM _a1881c06eec96db9901c7bbfe41c42a3f08e9cb4 USE INDEX(ix_query_aggregate)
WHERE metadata->"$._aggregate_type" = 'user2' AND metadata->"$._aggregate_id" = 'ai_1' AND `no` >= 1
ORDER BY `no` ASC
LIMIT 10000;

+----+-------------+-------------------------------------------+------------+-------+--------------------+--------------------+---------+------+------+----------+-----------------------+
| id | select_type | table                                     | partitions | type  | possible_keys      | key                | key_len | ref  | rows | filtered | Extra                 |
+----+-------------+-------------------------------------------+------------+-------+--------------------+--------------------+---------+------+------+----------+-----------------------+
|  1 | SIMPLE      | _a1881c06eec96db9901c7bbfe41c42a3f08e9cb4 | NULL       | range | ix_query_aggregate | ix_query_aggregate | 568     | NULL | 2000 |   100.00 | Using index condition |
+----+-------------+-------------------------------------------+------------+-------+--------------------+--------------------+---------+------+------+----------+-----------------------+
1 row in set, 1 warning (0.00 sec)

prooph/event-sourcing#57 is closed
#66 is updated

prolic · 2017-03-12T13:46:53Z

Btw: Postgres is able to get the correct index himself - way better :)

oqq · 2017-03-12T14:58:06Z

That looks much better now. 👍

oqq mentioned this issue Jan 23, 2017

Performance degrading #45

Closed

prolic assigned basz Mar 1, 2017

prolic self-assigned this Mar 11, 2017

prolic added the bug label Mar 11, 2017

prolic added a commit that referenced this issue Mar 11, 2017

Improve event store indexes

6c9fee8

resolves #43

prolic mentioned this issue Mar 11, 2017

Improve event store indexes #66

Merged

prolic mentioned this issue Mar 12, 2017

always add version to query to use indexes prooph/event-sourcing#57

Closed

prolic closed this as completed in #66 Mar 12, 2017

prolic mentioned this issue Apr 9, 2019

Remove forced index from MySqlSingleStreamStrategy #191

Closed

codeliner mentioned this issue Apr 15, 2020

Add indexed queries to MySqlEventStore #225

Closed

Improve event store indexes #43

Improve event store indexes #43

Comments

basz commented Jan 17, 2017

prolic commented Mar 1, 2017

basz commented Mar 1, 2017

oqq commented Mar 3, 2017

basz commented Mar 6, 2017

oqq commented Mar 10, 2017

prolic commented Mar 11, 2017

oqq commented Mar 11, 2017

prolic commented Mar 11, 2017

prolic commented Mar 11, 2017

Usage:

MySql

Postgres

MySQL Explain says

using MySqlSingleStreamStrategy

using reordered index on MySqlSingleStreamStrategy

Postgres Explain says

using PostgresSingleStreamStrategy

using reordered index on PostgresSingleStreamStrategy

Test-Results

using MySqlSingleStreamStrategy

using reordered index on MySqlSingleStreamStrategy

using MySqlAggregateStreamStrategy

using PostgresSingleStreamStrategy

using reordered index on PostgresSingleStreamStrategy

using PostgresAggregateStreamStrategy

Conclusion

PR

oqq commented Mar 11, 2017 • edited Loading

oqq commented Mar 11, 2017

prolic commented Mar 12, 2017

prolic commented Mar 12, 2017

prolic commented Mar 12, 2017

prolic commented Mar 12, 2017

Test-preparation (inserts)

using Postgres:

using MySQL

oqq commented Mar 12, 2017 • edited Loading

prolic commented Mar 12, 2017 via email

oqq commented Mar 12, 2017

prolic commented Mar 12, 2017

oqq commented Mar 12, 2017

prolic commented Mar 12, 2017

oqq commented Mar 12, 2017

prolic commented Mar 12, 2017

oqq commented Mar 12, 2017 • edited Loading

prolic commented Mar 12, 2017

oqq commented Mar 12, 2017 • edited Loading

prolic commented Mar 12, 2017

oqq commented Mar 12, 2017 • edited Loading

prolic commented Mar 12, 2017

oqq commented Mar 12, 2017

prolic commented Mar 12, 2017

prolic commented Mar 12, 2017

oqq commented Mar 12, 2017

prolic commented Mar 12, 2017

prolic commented Mar 12, 2017

oqq commented Mar 12, 2017

prolic commented Mar 12, 2017

prolic commented Mar 12, 2017

oqq commented Mar 12, 2017

prolic commented Mar 12, 2017

prolic commented Mar 12, 2017

oqq commented Mar 12, 2017 • edited Loading

Table

Query without version; index without version

Query without version; index with version

Query with version; index _without version

Query with version; index with version

All queries without AND no >= 1

Conclusion

prolic commented Mar 12, 2017

prolic commented Mar 12, 2017

Postgres

MySQL

prolic commented Mar 12, 2017

oqq commented Mar 12, 2017

oqq commented Mar 11, 2017 •

edited

Loading

oqq commented Mar 12, 2017 •

edited

Loading

oqq commented Mar 12, 2017 •

edited

Loading

oqq commented Mar 12, 2017 •

edited

Loading

oqq commented Mar 12, 2017 •

edited

Loading

oqq commented Mar 12, 2017 •

edited

Loading

All queries without `AND no >= 1`