Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve event store indexes #43

Closed
basz opened this issue Jan 17, 2017 · 46 comments · Fixed by #66
Closed

Improve event store indexes #43

basz opened this issue Jan 17, 2017 · 46 comments · Fixed by #66
Assignees
Labels

Comments

@basz
Copy link
Member

basz commented Jan 17, 2017

referring #42

some indexed need to be changed (ix_unique_event?)

Than I think ix_unique_event must be reordered
Or the index must be set on the json path instead..

@oqq

@prolic
Copy link
Member

prolic commented Mar 1, 2017

@basz Can you verify the problems with the newest master?

@basz
Copy link
Member Author

basz commented Mar 1, 2017

I still seem to have lowering throughput after a while... I don't understand why exactly... The thing is that replaying events has stable performance. That tells me it isn't a problem with the read models (v6 style).

By the way I don't see any difference in the schema, but that might be because I update regularly to master...

@oqq
Copy link
Member

oqq commented Mar 3, 2017

@basz can you provide a small example gist?
That could me help a lot to improve the indexes.

@basz
Copy link
Member Author

basz commented Mar 6, 2017

would this help? two aggregates one with ~60000 events one with just 1.

https://gist.github.com/basz/f1b6e6c7262e8fabacacb1c05cb35a70

@oqq
Copy link
Member

oqq commented Mar 10, 2017

Are you using the current implementation from MySqlSingleStreamStrategy?

If so please test this query:

SELECT * FROM _7794a224f85453058cf637d52e7c4487568a8271
    -> WHERE metadata->"$._aggregate_version" > 0 AND metadata->"$._aggregate_id" = '503113e6-820f-5523-b18f-e71a56db31eb' AND metadata->"$._aggregate_type" = 'HF\\Model\\Domain\\Dossier\\Aggregate\\Dossier' AND `no` >= 1
    -> ORDER BY `no` ASC

@prolic the main reason for that issue is the lack of indexes.
Maybe it makes sense that we are always provide an index /aggregate_id, aggregate_type, version/ (or in other order) and sort the metadata query always in that order, since this would the most used metadata query I guess. The MySqlSingleStreamStrategy stays on unique key, all other could simply use a not unique index.

This would improve performance for such queries against tables with much data.
Currently no index is used at all.

@prolic prolic self-assigned this Mar 11, 2017
@prolic prolic added the bug label Mar 11, 2017
@prolic
Copy link
Member

prolic commented Mar 11, 2017

I created a test-script for this and am testing now.

@oqq
Copy link
Member

oqq commented Mar 11, 2017

Please provide Explain outputs with used indexes and queries for your tests.

@prolic
Copy link
Member

prolic commented Mar 11, 2017

I have something else to do now, will come back to it later this evening.

prolic added a commit that referenced this issue Mar 11, 2017
@prolic
Copy link
Member

prolic commented Mar 11, 2017

Here is the reordered MySqlSingleStreamStrategy: https://gist.github.com/prolic/389820fcd30c821fcce5da3e1e9c6f89

Here are the test-scripts:
MySql: https://gist.github.com/prolic/533b82fb97b671ac5f1a0d7b5017cb27
Postgres: https://gist.github.com/prolic/2000e71a4d0aa2d6adfcfa5633d45a74

Usage:

MySql

./vendor/bin/phpunit -c phpunit.xml.pdo_mysql --group=prepare tests/MyTest.php
./vendor/bin/phpunit -c phpunit.xml.pdo_mysql --group=run tests/MyTest.php

Postgres

./vendor/bin/phpunit -c phpunit.xml.pdo_pgsql --group=prepare tests/PoTest.php
./vendor/bin/phpunit -c phpunit.xml.pdo_pgsql --group=run tests/PoTest.php

MySQL Explain says

using MySqlSingleStreamStrategy

EXPLAIN SELECT * FROM _b3daa77b4c04a9551b8781d03191fe098f325e67 WHERE metadata->"$._aggregate_type" = 'user1' AND metadata->"$._aggregate_id" = 'ai_1' AND `no` >= 1 ORDER BY `no` ASC LIMIT 10000;
+----+-------------+-------------------------------------------+------------+-------+---------------+---------+---------+------+-------+----------+-------------+
| id | select_type | table                                     | partitions | type  | possible_keys | key     | key_len | ref  | rows  | filtered | Extra       |
+----+-------------+-------------------------------------------+------------+-------+---------------+---------+---------+------+-------+----------+-------------+
|  1 | SIMPLE      | _b3daa77b4c04a9551b8781d03191fe098f325e67 | NULL       | range | PRIMARY       | PRIMARY | 8       | NULL | 43600 |     1.00 | Using where |
+----+-------------+-------------------------------------------+------------+-------+---------------+---------+---------+------+-------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

using reordered index on MySqlSingleStreamStrategy

EXPLAIN SELECT * FROM _a1881c06eec96db9901c7bbfe41c42a3f08e9cb4
    -> WHERE metadata->"$._aggregate_type" = 'user2' AND metadata->"$._aggregate_id" = 'ai_1' AND `no` >= 1
    -> ORDER BY `no` ASC
    -> LIMIT 10000;
+----+-------------+-------------------------------------------+------------+------+-------------------------+-----------------+---------+-------------+------+----------+---------------------------------------+
| id | select_type | table                                     | partitions | type | possible_keys           | key             | key_len | ref         | rows | filtered | Extra                                 |
+----+-------------+-------------------------------------------+------------+------+-------------------------+-----------------+---------+-------------+------+----------+---------------------------------------+
|  1 | SIMPLE      | _a1881c06eec96db9901c7bbfe41c42a3f08e9cb4 | NULL       | ref  | PRIMARY,ix_unique_event | ix_unique_event | 560     | const,const | 2000 |    50.00 | Using index condition; Using filesort |
+----+-------------+-------------------------------------------+------------+------+-------------------------+-----------------+---------+-------------+------+----------+---------------------------------------+
1 row in set, 1 warning (0.00 sec)

Postgres Explain says

using PostgresSingleStreamStrategy

Limit  (cost=5271.28..5276.30 rows=2010 width=216)
   ->  Sort  (cost=5271.28..5276.30 rows=2010 width=216)
         Sort Key: no
         ->  Seq Scan on _b3daa77b4c04a9551b8781d03191fe098f325e67  (cost=0.00..5161.00 rows=2010 width=216)
               Filter: ((no >= 1) AND ((metadata ->> '_aggregate_type'::text) = 'user1'::text) AND ((metadata ->> '_aggregate_id'::text) = 'ai_1'::text))
(5 rows)

using reordered index on PostgresSingleStreamStrategy

Limit  (cost=2815.02..2819.88 rows=1946 width=216)
   ->  Sort  (cost=2815.02..2819.88 rows=1946 width=216)
         Sort Key: no
         ->  Index Scan using _a1881c06eec96db9901c7bbfe41c42a3f08e9cb4_expr_expr1_expr2_idx on _a1881c06eec96db9901c7bbfe41c42a3f08e9cb4  (cost=0.42..2708.70 rows=1946 width=216)
               Index Cond: (((metadata ->> '_aggregate_type'::text) = 'user2'::text) AND ((metadata ->> '_aggregate_id'::text) = 'ai_1'::text))
               Filter: (no >= 1)
(6 rows)

Test-Results

using MySqlSingleStreamStrategy

ai_1: 0.077847003936768
ai_2: 0.06538200378418
ai_20: 0.078543186187744
ai_33: 0.077946901321411
ai_59: 0.066684007644653

using reordered index on MySqlSingleStreamStrategy

ai_1: 0.014132976531982
ai_2: 0.00044894218444824
ai_20: 0.013552904129028
ai_33: 0.015939950942993
ai_59: 0.0004270076751709

using MySqlAggregateStreamStrategy

ai_1: 0.010020017623901
ai_2: 0.00036787986755371
ai_20: 0.010578870773315
ai_33: 0.010411024093628

using PostgresSingleStreamStrategy

ai_1: 0.049942016601562
ai_2: 0.0040481090545654
ai_20: 0.047064065933228
ai_33: 0.047213077545166
ai_59: 0.0043928623199463

using reordered index on PostgresSingleStreamStrategy

ai_1: 0.0057878494262695
ai_2: 0.00040006637573242
ai_20: 0.0052559375762939
ai_33: 0.0060200691223145
ai_59: 0.00037407875061035

using PostgresAggregateStreamStrategy

ai_1: 0.0055849552154541
ai_2: 0.0003809928894043
ai_20: 0.0049259662628174
ai_33: 0.0048758983612061

Conclusion

The SingleStreamStrategy needs reordering of the indexes!
The AggregateStreamStrategy is still the best!
Postgres is way faster then MySql!

PR

see here: #66

@oqq
Copy link
Member

oqq commented Mar 11, 2017

TL;RL: reordering the index is not enough.

Problem here is that an reorder of index would not help. That helps in this use case, since the query creates a where condition in that order.

If you have a query with WHERE type = 'fuu' AND id = 'bar' AND version = '1' ORDER BY no, than MySQL uses the index that is ordered like this: (type, id, version). If you add additional the no to that index, than 'Using filesort' would gone away and this query would be much faster.

But if anyone run this query with e.g. WHERE id = 'bar' AND type = 'fuu' AND version = '1' MySQL does not use this index anymore.

What we need is a sort in MetadataMatcher. The order of the index is regardless, but the query must match the existing index.

So what I suggest is:
Add an index in that order how a user would potentially use the MetadataMatcher and sort all values in Query to match that index. Maybe it would be necessary to add some value to the query.

So, if we had an index like this: (type, id, version, no) (the preferred one I guess), than the query should always be: WHERE type = AND id = AND version = [no =] ORDER BY no. If the id and version is not provided in metadata, it is are likely that a query like WHERE type = 'fuu' AND id != '00000-00000-00000' AND version != 0 ORDER BY no is much faster then WHERE type = 'fuu' ORDER BY no`'.

@oqq
Copy link
Member

oqq commented Mar 11, 2017

Oh, this could be easily done by enriching the where condition with:

$where = array_merge(['type' => "!=''", 'id' => "!=''", 'version' => '!=0'], $where);

So the right order for present values are correct and not provided value would be filled up.

How about to add something like a enrichAndSortQuery() method to the strategy? (bad method name, sure)

@prolic
Copy link
Member

prolic commented Mar 12, 2017

Interessting: The insert with the provided test-scripts above are much faster on MySQL than with Postgres, while Postgres reads much faster than MySQL. Could be that this is due to the default configuration of the databases. Any ideas what's going on here?

@prolic
Copy link
Member

prolic commented Mar 12, 2017

When I add no to the index, the performance is still the same and Using filesort is still part of the MySQL explain - @oqq wanna modify my test-script to get it running even faster with other indexes?

About the other stuff:
While the order of the metadata matchers make a difference on which indexes are used, it does not makes sense to order the metadata matchers within the MetadataMatcher class. I think it's more reasonable to create the MetadataMatcher carefully with what you have as indexes configured. The provided default persistance strategies are SimpleStreamStrategy (useful for not-event-sourced-systems or stream-to-stream-projections), SingleStreamStrategy (useful for prooph/event-sourcing and prooph/micro), AggregateStreamStrategy (useful for prooph/event-sourcing and prooph/micro).

The only problem that we have currently is with SingleStreamStrategy, so let's talk about this one now. Using prooph/event-sourcing for your aggregates, you don't create the MetadataMatcher yourself, it's part of the event-sourcing component. So there is no need for reordering, we simply confirm that the same orders are used in code. (Same for micro, we should add some documentation around this topic perhaps.) If you bypass event-sourcing component or access the event-store directly (for whatever reasons) and want to pass custom MetadataMatchers, you are responsible for the correct order of matches (and even chose a custom PersistanceStrategy with custom indexes, to improve performance) - so I think this is only a documentation issue.

@oqq If you are able to remove the Using filesort from MySQL explain, and get some more performance, let's do that.

@prolic
Copy link
Member

prolic commented Mar 12, 2017

My Postgres results are ready (with added no to the index):

First, without no:

Limit  (cost=2909.97..2915.04 rows=2029 width=216)
   ->  Sort  (cost=2909.97..2915.04 rows=2029 width=216)
         Sort Key: no
         ->  Index Scan using _b3daa77b4c04a9551b8781d03191fe098f325e67_expr_expr1_expr2_idx on _b3daa77b4c04a9551b8781d03191fe098f325e67  (cost=0.42..2798.51 rows=2029 width=216)
               Index Cond: (((metadata ->> '_aggregate_type'::text) = 'user1'::text) AND ((metadata ->> '_aggregate_id'::text) = 'ai_1'::text))
               Filter: (no >= 1)
(6 rows)

timing:

ai_1: 0.0066189765930176
ai_2: 0.00038790702819824
ai_20: 0.0055079460144043
ai_33: 0.0051159858703613
ai_59: 0.00033402442932129

With added no:

Limit  (cost=2908.55..2913.57 rows=2010 width=216)
   ->  Sort  (cost=2908.55..2913.57 rows=2010 width=216)
         Sort Key: no
         ->  Index Scan using _a1881c06eec96db9901c7bbfe41c42a3f08e9c_expr_expr1_expr2_no_idx on _a1881c06eec96db9901c7bbfe41c42a3f08e9cb4  (cost=0.42..2798.27 rows=2010 width=216)
               Index Cond: (((metadata ->> '_aggregate_type'::text) = 'user2'::text) AND ((metadata ->> '_aggregate_id'::text) = 'ai_1'::text) AND (no >= 1))
(5 rows)

timing:

ai_1: 0.005335807800293
ai_2: 0.00040507316589355
ai_20: 0.005220890045166
ai_33: 0.0051581859588623
ai_59: 0.00038909912109375

So we see: Using Postgres adding no to the index increases performance - but we have to create a new index for this, because the existing is a unique contraint index.

Sadly, the same change doesn't make a difference on MySQL - I don't know why. Ideas?

@prolic
Copy link
Member

prolic commented Mar 12, 2017

Test-preparation (inserts)

using Postgres:

Time: 52.11 seconds

using MySQL

Time: 1.21 minutes

So Postgres is also faster writing - I said something different before - might have been a missing warmup - sorry for the confusion.

@oqq
Copy link
Member

oqq commented Mar 12, 2017

Your query is lacking the version in query. If your index is (type, id, version, no) than MySQL can use no in that index for sorting.

If the where clause is like

WHERE type = ? AND id = ? AND version = ? [AND no = ?] ORDER BY no

So MySQL could read all entries directly from the index in that order and no additional sorting is necessary.

if you run a query like this

WHERE type = ? AND id = ? [AND no = ?] ORDER BY no

than the no part of that index would be ignored. MySQL could not know about this part in the index, cuz the lack of version. Such an index is not stored as a set of values. It is more a joint value like type-id-version-no. So only type-id of that index is used for searching and an additional sorting after searching is needed.

I am not sure how Postgres works. But I am pretty sure that RDBs works quite similar in that case. Also I am not able to read Postgres EXPLAIN for now. But I know Using filesort in MySQL is really bad and could omitted with better indexes like my description.

How I read this line i guess thats the same as Using filesort in MySQL and should be avoided.

Sort (cost=2909.97..2915.04 rows=2029 width=216)

TL;RL

  • use an index (type, id, version, no)
  • query with WHERE type = ? AND id = ? AND version != 0 ORDER BY no
    OR
  • use an index (type, id, no)
  • query with WHERE type = ? AND id = ? ORDER BY no
    OR
  • use an index (type, no)
  • query with WHERE type = ? ORDER BY no
    AND
  • display your results here ;)

I will use your test script for my self today. For now I guess an index (type, id, no) is the best choice if we ensure the query begins always with type and id in where clause.

@prolic
Copy link
Member

prolic commented Mar 12, 2017 via email

@oqq
Copy link
Member

oqq commented Mar 12, 2017

When i load an aggregate root, than the query would be

WHERE type = ? AND id = ? ORDER BY no

right?

If you remove no from index, than MySQL searches all relevant tuples from the index (type, id), get the values from storage and has to sort this set after.

If you have no involved in the index, MySQL searches all tuples in index in the correct order. So no additional sorting is needed.

I will provide a test case to demonstrate that.

@prolic
Copy link
Member

prolic commented Mar 12, 2017

I see, we can change event-sourcing to always add version to the query, that should resolve the issue.

@oqq
Copy link
Member

oqq commented Mar 12, 2017

Not necessary.
We need always a unique key for (type, id, version) without no in single stream strategy.
If if we add an additional index (type, id, no) to every strategy, the most queries would run faster.

OR

we sort by version instead of no if type and id is in metadata matcher.

Maybe we should collect all probably queries. Than I can provide an index strategy with smallest storage food print but good performance for all.

@prolic
Copy link
Member

prolic commented Mar 12, 2017

we currently use this:

type, id, version, no (using snapshots)

and

type, id, no (without snapshots)

we can add version >= 1 to the second query (without snapshots) and everything should work just fine.
adding another index for this use-case may have an impact on write performance.

No other queries are executed at all - when a user has a special use case to have additional event store queries, he needs a custom persistence strategy.

@oqq
Copy link
Member

oqq commented Mar 12, 2017

If you add no to the unique key in single stream strategy, than the uniqueness would be broken, right?

@prolic
Copy link
Member

prolic commented Mar 12, 2017

no the no will be added to another index, not the unique one :)

@oqq
Copy link
Member

oqq commented Mar 12, 2017

What I see here:
We could not add no to single stream strategy. So we have to add a new index. Also on other strategies we have to add a new index.

Let this index be small as possible. So maybe version could be avoided if not really needed.

@prolic
Copy link
Member

prolic commented Mar 12, 2017

version is a requirement when snapshots are used. (I got snapshot with version 5, so fetch only events where version > 5)

@oqq
Copy link
Member

oqq commented Mar 12, 2017

(type, id, no) could also be used if the query is WHERE type = ? AND id = ? and version > 5 ORDER BY no

Not perfectly, but better than currently.

@prolic
Copy link
Member

prolic commented Mar 12, 2017

I just tested with another index like

INDEX `ix_query_event` (`aggregate_type`, `aggregate_id`, `aggregate_version`, `no`)

and explain gives:

EXPLAIN SELECT * FROM _a1881c06eec96db9901c7bbfe41c42a3f08e9cb4
WHERE `no` >= 1 AND metadata->"$._aggregate_type" = 'user2' AND metadata->"$._aggregate_id" = 'ai_1' AND metadata->"$._aggregate_version" >= 1
ORDER BY `no` ASC
LIMIT 10000;

+----+-------------+-------------------------------------------+------------+-------+----------------------------------------+-----------------+---------+------+------+----------+---------------------------------------+
| id | select_type | table                                     | partitions | type  | possible_keys                          | key             | key_len | ref  | rows | filtered | Extra                                 |
+----+-------------+-------------------------------------------+------------+-------+----------------------------------------+-----------------+---------+------+------+----------+---------------------------------------+
|  1 | SIMPLE      | _a1881c06eec96db9901c7bbfe41c42a3f08e9cb4 | NULL       | range | PRIMARY,ix_unique_event,ix_query_event | ix_unique_event | 564     | NULL | 2000 |    50.00 | Using index condition; Using filesort |
+----+-------------+-------------------------------------------+------------+-------+----------------------------------------+-----------------+---------+------+------+----------+---------------------------------------+
1 row in set, 1 warning (0.00 sec)

@oqq
Copy link
Member

oqq commented Mar 12, 2017

Your query starts with no, but has to start with type for that index.

@prolic
Copy link
Member

prolic commented Mar 12, 2017

ups, wrong order, again:

EXPLAIN SELECT * FROM _a1881c06eec96db9901c7bbfe41c42a3f08e9cb4
WHERE metadata->"$._aggregate_type" = 'user2' AND metadata->"$._aggregate_id" = 'ai_1' AND metadata->"$._aggregate_version" >= 1 AND `no` >= 1
ORDER BY `no` ASC
LIMIT 10000;
+----+-------------+-------------------------------------------+------------+-------+----------------------------------------+-----------------+---------+------+------+----------+---------------------------------------+
| id | select_type | table                                     | partitions | type  | possible_keys                          | key             | key_len | ref  | rows | filtered | Extra                                 |
+----+-------------+-------------------------------------------+------------+-------+----------------------------------------+-----------------+---------+------+------+----------+---------------------------------------+
|  1 | SIMPLE      | _a1881c06eec96db9901c7bbfe41c42a3f08e9cb4 | NULL       | range | PRIMARY,ix_unique_event,ix_query_event | ix_unique_event | 564     | NULL | 2000 |    50.00 | Using index condition; Using filesort |
+----+-------------+-------------------------------------------+------------+-------+----------------------------------------+-----------------+---------+------+------+----------+---------------------------------------+
1 row in set, 1 warning (0.00 sec)

@oqq
Copy link
Member

oqq commented Mar 12, 2017

I got this:

EXPLAIN SELECT * FROM _a1881c06eec96db9901c7bbfe41c42a3f08e9cb4
WHERE metadata->"$._aggregate_type" = 'user2' AND metadata->"$._aggregate_id" = 'ai_1' AND `no` >= 1
ORDER BY `no` ASC
id	select_type	table	partitions	type	possible_keys	key	key_len	ref	rows	filtered	Extra
1	SIMPLE	_a1881c06eec96db9901c7bbfe41c42a3f08e9cb4	NULL	range	PRIMARY,ix_unique_event	ix_unique_event	568	NULL	2000	100.00	Using index condition

@prolic
Copy link
Member

prolic commented Mar 12, 2017

Difference is, I added aggregate_version already

@prolic
Copy link
Member

prolic commented Mar 12, 2017

create table ..... 
UNIQUE KEY `ix_unique_event` (`aggregate_type`, `aggregate_id`, `aggregate_version`),
INDEX `ix_query_event` (`aggregate_type`, `aggregate_id`, `aggregate_version`, `no`)
....

@oqq
Copy link
Member

oqq commented Mar 12, 2017

MySQL uses ix_unique_event in that case. Either MySQL hat decided this is the better one, or (more likely) dont know the other one is better.

Please test with an index hint https://dev.mysql.com/doc/refman/5.7/en/index-hints.html

@prolic
Copy link
Member

prolic commented Mar 12, 2017

Add a query hint would complicate the code a lot, not sure from where to get this, as indexes are only in the persistence strategy.

However for testing:

EXPLAIN SELECT * FROM _a1881c06eec96db9901c7bbfe41c42a3f08e9cb4 use INDEX (ix_query_event) WHERE metadata->"$._aggregate_type" = 'user2' AND metadata->"$._aggregate_id" = 'ai_1' AND metadata->"$._aggregate_version" >= 1 AND `no` >= 1 ORDER BY `no` ASC LIMIT 10000;
+----+-------------+-------------------------------------------+------------+-------+----------------+----------------+---------+------+------+----------+---------------------------------------+
| id | select_type | table                                     | partitions | type  | possible_keys  | key            | key_len | ref  | rows | filtered | Extra                                 |
+----+-------------+-------------------------------------------+------------+-------+----------------+----------------+---------+------+------+----------+---------------------------------------+
|  1 | SIMPLE      | _a1881c06eec96db9901c7bbfe41c42a3f08e9cb4 | NULL       | range | ix_query_event | ix_query_event | 572     | NULL | 2000 |    33.33 | Using index condition; Using filesort |
+----+-------------+-------------------------------------------+------------+-------+----------------+----------------+---------+------+------+----------+---------------------------------------+
1 row in set, 1 warning (0.00 sec)

@prolic
Copy link
Member

prolic commented Mar 12, 2017

It's still using filesort

@oqq
Copy link
Member

oqq commented Mar 12, 2017

I have tested this without query hint and got the right index usages. Anything in your code is wrong. ;)

@prolic
Copy link
Member

prolic commented Mar 12, 2017

My table:

CREATE TABLE `_a1881c06eec96db9901c7bbfe41c42a3f08e9cb4` (
  `no` bigint(20) NOT NULL AUTO_INCREMENT,
  `event_id` char(36) COLLATE utf8_bin NOT NULL,
  `event_name` varchar(100) COLLATE utf8_bin NOT NULL,
  `payload` json NOT NULL,
  `metadata` json NOT NULL,
  `created_at` datetime(6) NOT NULL,
  `aggregate_version` int(11) unsigned GENERATED ALWAYS AS (json_extract(`metadata`,'$._aggregate_version')) STORED NOT NULL,
  `aggregate_id` char(36) COLLATE utf8_bin GENERATED ALWAYS AS (json_unquote(json_extract(`metadata`,'$._aggregate_id'))) STORED NOT NULL,
  `aggregate_type` varchar(150) COLLATE utf8_bin GENERATED ALWAYS AS (json_unquote(json_extract(`metadata`,'$._aggregate_type'))) STORED NOT NULL,
  PRIMARY KEY (`no`),
  UNIQUE KEY `ix_event_id` (`event_id`),
  UNIQUE KEY `ix_unique_event` (`aggregate_type`,`aggregate_id`,`aggregate_version`),
  KEY `ix_query_event` (`aggregate_type`,`aggregate_id`,`aggregate_version`,`no`)
) ENGINE=InnoDB AUTO_INCREMENT=96002 DEFAULT CHARSET=utf8 COLLATE=utf8_bin

query is above

@prolic
Copy link
Member

prolic commented Mar 12, 2017

MySQL: 5.7.17-0ubuntu0.16.04.1

@oqq
Copy link
Member

oqq commented Mar 12, 2017

Please give me some minutes. I will setup an own test script.

@prolic
Copy link
Member

prolic commented Mar 12, 2017

testing on postgres:

Limit  (cost=2897.16..2898.89 rows=691 width=216)
   ->  Sort  (cost=2897.16..2898.89 rows=691 width=216)
         Sort Key: no
         ->  Index Scan using _a1881c06eec96db9901c7bbfe41c42a3f08e9cb4_expr_expr1_expr2_idx on _a1881c06eec96db9901c7bbfe41c42a3f08e9cb4  (cost=0.42..2864.57 rows=691 width=216)
               Index Cond: (((metadata ->> '_aggregate_type'::text) = 'user2'::text) AND ((metadata ->> '_aggregate_id'::text) = 'ai_1'::text))
               Filter: ((no >= 1) AND (((metadata ->> '_aggregate_version'::text))::integer >= 1))
(6 rows)

having index like:

CREATE INDEX on $tableName
((metadata->>'_aggregate_type'), (metadata->>'_aggregate_id'), (metadata->>'_aggregate_version'), no);

seems also not to take the new index

query is

EXPLAIN SELECT * FROM _a1881c06eec96db9901c7bbfe41c42a3f08e9cb4
WHERE metadata->>'_aggregate_type' = 'user2' AND metadata->>'_aggregate_id' = 'ai_1' AND CAST(metadata->>'_aggregate_version' as int) >= 1 AND no >= 1
ORDER BY no ASC
LIMIT 10000;

@prolic
Copy link
Member

prolic commented Mar 12, 2017

back online later

@oqq
Copy link
Member

oqq commented Mar 12, 2017

Here are my results:

Table

CREATE TABLE `_a1881c06eec96db9901c7bbfe41c42a3f08e9cb4` (
  `no` bigint(20) NOT NULL AUTO_INCREMENT,
  `event_id` char(36) COLLATE utf8_bin NOT NULL,
  `event_name` varchar(100) COLLATE utf8_bin NOT NULL,
  `payload` json NOT NULL,
  `metadata` json NOT NULL,
  `created_at` datetime(6) NOT NULL,
  `aggregate_version` int(11) unsigned GENERATED ALWAYS AS (json_extract(`metadata`,'$._aggregate_version')) STORED NOT NULL,
  `aggregate_id` char(36) COLLATE utf8_bin GENERATED ALWAYS AS (json_unquote(json_extract(`metadata`,'$._aggregate_id'))) STORED NOT NULL,
  `aggregate_type` varchar(150) COLLATE utf8_bin GENERATED ALWAYS AS (json_unquote(json_extract(`metadata`,'$._aggregate_type'))) STORED NOT NULL,
  PRIMARY KEY (`no`),
  UNIQUE KEY `ix_event_id` (`event_id`),
  UNIQUE KEY `ix_unique_event` (`aggregate_type`,`aggregate_id`,`aggregate_version`),
  KEY `ix_query_aggregate` (`aggregate_type`,`aggregate_id`,`no`),
  KEY `ix_query_aggregate2` (`aggregate_type`,`aggregate_id`,`aggregate_version`,`no`)
) ENGINE=InnoDB AUTO_INCREMENT=96002 DEFAULT CHARSET=utf8 COLLATE=utf8_bin;

with two indexes. One with version and one without.
I have used your test script to fill up the data and have ~100k values in that table.

Query without version; index without version

EXPLAIN SELECT * FROM _a1881c06eec96db9901c7bbfe41c42a3f08e9cb4 USE INDEX (ix_query_aggregate)
WHERE metadata->"$._aggregate_type" = 'user2' AND metadata->"$._aggregate_id" = 'ai_1' AND `no` >= 1
ORDER BY `no` ASC
LIMIT 10000;
id	select_type	table	partitions	type	possible_keys	key	key_len	ref	rows	filtered	Extra
1	SIMPLE	_a1881c06eec96db9901c7bbfe41c42a3f08e9cb4	NULL	range	ix_query_aggregate	ix_query_aggregate	568	NULL	2000	100.00	Using index condition

Query without version; index with version

EXPLAIN SELECT * FROM _a1881c06eec96db9901c7bbfe41c42a3f08e9cb4 USE INDEX (ix_query_aggregate2)
WHERE metadata->"$._aggregate_type" = 'user2' AND metadata->"$._aggregate_id" = 'ai_1' AND `no` >= 1
ORDER BY `no` ASC
LIMIT 10000;
1	SIMPLE	_a1881c06eec96db9901c7bbfe41c42a3f08e9cb4	NULL	ref	ix_query_aggregate2	ix_query_aggregate2	560	const,const	2000	33.33	Using index condition; Using filesort

Query with version; index _without version

EXPLAIN SELECT * FROM _a1881c06eec96db9901c7bbfe41c42a3f08e9cb4 USE INDEX (ix_query_aggregate)
WHERE metadata->"$._aggregate_type" = 'user2' AND metadata->"$._aggregate_id" = 'ai_1' AND metadata->"$._aggregate_version" >= 1 AND `no` >= 1
ORDER BY `no` ASC
LIMIT 10000;
id	select_type	table	partitions	type	possible_keys	key	key_len	ref	rows	filtered	Extra
1	SIMPLE	_a1881c06eec96db9901c7bbfe41c42a3f08e9cb4	NULL	range	ix_query_aggregate	ix_query_aggregate	568	NULL	2000	100.00	Using index condition; Using where

Query with version; index with version

EXPLAIN SELECT * FROM _a1881c06eec96db9901c7bbfe41c42a3f08e9cb4 USE INDEX (ix_query_aggregate2)
WHERE metadata->"$._aggregate_type" = 'user2' AND metadata->"$._aggregate_id" = 'ai_1' AND metadata->"$._aggregate_version" >= 1 AND `no` >= 1
ORDER BY `no` ASC
LIMIT 10000;
id	select_type	table	partitions	type	possible_keys	key	key_len	ref	rows	filtered	Extra
1	SIMPLE	_a1881c06eec96db9901c7bbfe41c42a3f08e9cb4	NULL	range	ix_query_aggregate2	ix_query_aggregate2	572	NULL	2000	33.33	Using index condition; Using filesort

All queries without AND no >= 1

The type changes from "range" to "ref"

Conclusion

  • The index without version is the better one in all cases. (type, id, no)
  • The index with version would always using filesort. Not sure why, but thats what happens.
  • removing AND no >= 1 from query has better performance (not much). So if the load method is executed with count = null, we should remove that from query.
  • If the aggregate_version does not matter, aggregate_version >= 1 should be removed from query. If it does matter, it could be added and would not conclude much on performance.
  • Metadata queries should always be sorted with type and id in first place.
  • We need to investigate why the right index is not used without use index

@prolic
Copy link
Member

prolic commented Mar 12, 2017

thanks @oqq - am playing with it a little now

@prolic
Copy link
Member

prolic commented Mar 12, 2017

Postgres

EXPLAIN SELECT * FROM _a1881c06eec96db9901c7bbfe41c42a3f08e9cb4
WHERE metadata->>'_aggregate_type' = 'user2' AND metadata->>'_aggregate_id' = 'ai_1' AND no >= 1
ORDER BY no ASC
LIMIT 10000;
Limit  (cost=0.42..2734.22 rows=1955 width=216)
   ->  Index Scan using _a1881c06eec96db9901c7bbfe41c42a3f08e9cb4_expr_expr1_no_idx on _a1881c06eec96db9901c7bbfe41c42a3f08e9cb4  (cost=0.42..2734.22 rows=1955 width=216)
         Index Cond: (((metadata ->> '_aggregate_type'::text) = 'user2'::text) AND ((metadata ->> '_aggregate_id'::text) = 'ai_1'::text) AND (no >= 1))
(3 rows)

MySQL

EXPLAIN SELECT * FROM _a1881c06eec96db9901c7bbfe41c42a3f08e9cb4 USE INDEX(ix_query_aggregate)
WHERE metadata->"$._aggregate_type" = 'user2' AND metadata->"$._aggregate_id" = 'ai_1' AND `no` >= 1
ORDER BY `no` ASC
LIMIT 10000;
+----+-------------+-------------------------------------------+------------+-------+--------------------+--------------------+---------+------+------+----------+-----------------------+
| id | select_type | table                                     | partitions | type  | possible_keys      | key                | key_len | ref  | rows | filtered | Extra                 |
+----+-------------+-------------------------------------------+------------+-------+--------------------+--------------------+---------+------+------+----------+-----------------------+
|  1 | SIMPLE      | _a1881c06eec96db9901c7bbfe41c42a3f08e9cb4 | NULL       | range | ix_query_aggregate | ix_query_aggregate | 568     | NULL | 2000 |   100.00 | Using index condition |
+----+-------------+-------------------------------------------+------------+-------+--------------------+--------------------+---------+------+------+----------+-----------------------+
1 row in set, 1 warning (0.00 sec)

prooph/event-sourcing#57 is closed
#66 is updated

@prolic
Copy link
Member

prolic commented Mar 12, 2017

Btw: Postgres is able to get the correct index himself - way better :)

@oqq
Copy link
Member

oqq commented Mar 12, 2017

That looks much better now. 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants