-
Notifications
You must be signed in to change notification settings - Fork 58
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve event store indexes #43
Comments
@basz Can you verify the problems with the newest master? |
I still seem to have lowering throughput after a while... I don't understand why exactly... The thing is that replaying events has stable performance. That tells me it isn't a problem with the read models (v6 style). By the way I don't see any difference in the schema, but that might be because I update regularly to master... |
@basz can you provide a small example gist? |
would this help? two aggregates one with ~60000 events one with just 1. https://gist.github.com/basz/f1b6e6c7262e8fabacacb1c05cb35a70 |
Are you using the current implementation from If so please test this query: SELECT * FROM _7794a224f85453058cf637d52e7c4487568a8271
-> WHERE metadata->"$._aggregate_version" > 0 AND metadata->"$._aggregate_id" = '503113e6-820f-5523-b18f-e71a56db31eb' AND metadata->"$._aggregate_type" = 'HF\\Model\\Domain\\Dossier\\Aggregate\\Dossier' AND `no` >= 1
-> ORDER BY `no` ASC @prolic the main reason for that issue is the lack of indexes. This would improve performance for such queries against tables with much data. |
I created a test-script for this and am testing now. |
Please provide |
I have something else to do now, will come back to it later this evening. |
Here is the reordered MySqlSingleStreamStrategy: https://gist.github.com/prolic/389820fcd30c821fcce5da3e1e9c6f89 Here are the test-scripts: Usage:MySql./vendor/bin/phpunit -c phpunit.xml.pdo_mysql --group=prepare tests/MyTest.php Postgres./vendor/bin/phpunit -c phpunit.xml.pdo_pgsql --group=prepare tests/PoTest.php MySQL Explain saysusing MySqlSingleStreamStrategy
using reordered index on MySqlSingleStreamStrategy
Postgres Explain saysusing PostgresSingleStreamStrategy
using reordered index on PostgresSingleStreamStrategy
Test-Resultsusing MySqlSingleStreamStrategyai_1: 0.077847003936768 using reordered index on MySqlSingleStreamStrategyai_1: 0.014132976531982 using MySqlAggregateStreamStrategyai_1: 0.010020017623901 using PostgresSingleStreamStrategyai_1: 0.049942016601562 using reordered index on PostgresSingleStreamStrategyai_1: 0.0057878494262695 using PostgresAggregateStreamStrategyai_1: 0.0055849552154541 ConclusionThe SingleStreamStrategy needs reordering of the indexes! PRsee here: #66 |
TL;RL: reordering the index is not enough. Problem here is that an reorder of index would not help. That helps in this use case, since the query creates a where condition in that order. If you have a query with But if anyone run this query with e.g. What we need is a sort in MetadataMatcher. The order of the index is regardless, but the query must match the existing index. So what I suggest is: So, if we had an index like this: |
Oh, this could be easily done by enriching the where condition with: $where = array_merge(['type' => "!=''", 'id' => "!=''", 'version' => '!=0'], $where); So the right order for present values are correct and not provided value would be filled up. How about to add something like a |
Interessting: The insert with the provided test-scripts above are much faster on MySQL than with Postgres, while Postgres reads much faster than MySQL. Could be that this is due to the default configuration of the databases. Any ideas what's going on here? |
When I add About the other stuff: The only problem that we have currently is with SingleStreamStrategy, so let's talk about this one now. Using prooph/event-sourcing for your aggregates, you don't create the MetadataMatcher yourself, it's part of the event-sourcing component. So there is no need for reordering, we simply confirm that the same orders are used in code. (Same for micro, we should add some documentation around this topic perhaps.) If you bypass event-sourcing component or access the event-store directly (for whatever reasons) and want to pass custom MetadataMatchers, you are responsible for the correct order of matches (and even chose a custom PersistanceStrategy with custom indexes, to improve performance) - so I think this is only a documentation issue. @oqq If you are able to remove the |
My Postgres results are ready (with added First, without
timing:
With added
timing:
So we see: Using Postgres adding Sadly, the same change doesn't make a difference on MySQL - I don't know why. Ideas? |
Test-preparation (inserts)using Postgres:Time: 52.11 seconds using MySQLTime: 1.21 minutes So Postgres is also faster writing - I said something different before - might have been a missing warmup - sorry for the confusion. |
Your query is lacking the If the where clause is like WHERE type = ? AND id = ? AND version = ? [AND no = ?] ORDER BY no So MySQL could read all entries directly from the index in that order and no additional sorting is necessary. if you run a query like this WHERE type = ? AND id = ? [AND no = ?] ORDER BY no than the I am not sure how Postgres works. But I am pretty sure that RDBs works quite similar in that case. Also I am not able to read Postgres How I read this line i guess thats the same as
TL;RL
I will use your test script for my self today. For now I guess an index |
I don't understand, what is your use case for type - no or type - id - no ?
Using event sourcing component no such query is ever executed.
Am 12.03.2017 17:34 schrieb "Eric Braun" <[email protected]>:
Your query is lacking the version in query. If your index is (type, id,
version, no) than MySQL can use no in that index for sorting.
If the where clause is like
WHERE type = ? AND id = ? AND version = ? [AND no = ?] ORDER BY no
So MySQL could read all entries directly from the index in that order and
no additional sorting is necessary.
if you run a query like this
WHERE type = ? AND id = ? [AND no = ?] ORDER BY no
than the no part of that index would be ignored. MySQL could not know about
this part in the index, cuz the lack of version. Such an index is not
stored as a set of values. It is more a joint value like type-id-version-no.
So only type-id of that index is used for searching and an additional
sorting after searching is needed.
I am not sure how Postgres works. But I am pretty sure that RDBs works
quite similar in that case. Also I am also not able to read Postgres EXPLAIN
for now. But I know Using filesort in MySQL is really bad and could omitted
with better indexes like my description.
How I read this line:
Sort (cost=2909.97..2915.04 rows=2029 width=216)
I guess thats the same as Using filesort in MySQL and should be avoided.
*TL;RL*
- use an index (type, id, version, no)
- query with WHERE type = ? AND id = ? AND version != 0 ORDER BY no
OR
- use an index (type, id, no)
- query with WHERE type = ? AND id = ? ORDER BY no
OR
- use an index (type, no)
- query with WHERE type = ? ORDER BY no
AND
- display your results here ;)
I will use your test script for my self today.
—
You are receiving this because you were assigned.
Reply to this email directly, view it on GitHub
<#43 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAYEvD55eWazYuLIS3ritDkwHVaiAWIXks5rk7wggaJpZM4Llkxh>
.
|
When i WHERE type = ? AND id = ? ORDER BY no right? If you remove If you have I will provide a test case to demonstrate that. |
I see, we can change event-sourcing to always add |
Not necessary. OR we sort by Maybe we should collect all probably queries. Than I can provide an index strategy with smallest storage food print but good performance for all. |
we currently use this: type, id, version, no (using snapshots) and type, id, no (without snapshots) we can add version >= 1 to the second query (without snapshots) and everything should work just fine. No other queries are executed at all - when a user has a special use case to have additional event store queries, he needs a custom persistence strategy. |
If you add |
no the |
What I see here: Let this index be small as possible. So maybe |
|
(type, id, no) could also be used if the query is Not perfectly, but better than currently. |
I just tested with another index like
and explain gives:
|
Your query starts with |
ups, wrong order, again:
|
I got this: EXPLAIN SELECT * FROM _a1881c06eec96db9901c7bbfe41c42a3f08e9cb4
WHERE metadata->"$._aggregate_type" = 'user2' AND metadata->"$._aggregate_id" = 'ai_1' AND `no` >= 1
ORDER BY `no` ASC
|
Difference is, I added aggregate_version already |
|
MySQL uses ix_unique_event in that case. Either MySQL hat decided this is the better one, or (more likely) dont know the other one is better. Please test with an index hint https://dev.mysql.com/doc/refman/5.7/en/index-hints.html |
Add a query hint would complicate the code a lot, not sure from where to get this, as indexes are only in the persistence strategy. However for testing:
|
It's still using filesort |
I have tested this without query hint and got the right index usages. Anything in your code is wrong. ;) |
My table:
query is above |
MySQL: 5.7.17-0ubuntu0.16.04.1 |
Please give me some minutes. I will setup an own test script. |
testing on postgres:
having index like:
seems also not to take the new index query is
|
back online later |
Here are my results: TableCREATE TABLE `_a1881c06eec96db9901c7bbfe41c42a3f08e9cb4` (
`no` bigint(20) NOT NULL AUTO_INCREMENT,
`event_id` char(36) COLLATE utf8_bin NOT NULL,
`event_name` varchar(100) COLLATE utf8_bin NOT NULL,
`payload` json NOT NULL,
`metadata` json NOT NULL,
`created_at` datetime(6) NOT NULL,
`aggregate_version` int(11) unsigned GENERATED ALWAYS AS (json_extract(`metadata`,'$._aggregate_version')) STORED NOT NULL,
`aggregate_id` char(36) COLLATE utf8_bin GENERATED ALWAYS AS (json_unquote(json_extract(`metadata`,'$._aggregate_id'))) STORED NOT NULL,
`aggregate_type` varchar(150) COLLATE utf8_bin GENERATED ALWAYS AS (json_unquote(json_extract(`metadata`,'$._aggregate_type'))) STORED NOT NULL,
PRIMARY KEY (`no`),
UNIQUE KEY `ix_event_id` (`event_id`),
UNIQUE KEY `ix_unique_event` (`aggregate_type`,`aggregate_id`,`aggregate_version`),
KEY `ix_query_aggregate` (`aggregate_type`,`aggregate_id`,`no`),
KEY `ix_query_aggregate2` (`aggregate_type`,`aggregate_id`,`aggregate_version`,`no`)
) ENGINE=InnoDB AUTO_INCREMENT=96002 DEFAULT CHARSET=utf8 COLLATE=utf8_bin; with two indexes. One with Query without version; index without versionEXPLAIN SELECT * FROM _a1881c06eec96db9901c7bbfe41c42a3f08e9cb4 USE INDEX (ix_query_aggregate)
WHERE metadata->"$._aggregate_type" = 'user2' AND metadata->"$._aggregate_id" = 'ai_1' AND `no` >= 1
ORDER BY `no` ASC
LIMIT 10000;
Query without version; index with versionEXPLAIN SELECT * FROM _a1881c06eec96db9901c7bbfe41c42a3f08e9cb4 USE INDEX (ix_query_aggregate2)
WHERE metadata->"$._aggregate_type" = 'user2' AND metadata->"$._aggregate_id" = 'ai_1' AND `no` >= 1
ORDER BY `no` ASC
LIMIT 10000;
Query with version; index _without versionEXPLAIN SELECT * FROM _a1881c06eec96db9901c7bbfe41c42a3f08e9cb4 USE INDEX (ix_query_aggregate)
WHERE metadata->"$._aggregate_type" = 'user2' AND metadata->"$._aggregate_id" = 'ai_1' AND metadata->"$._aggregate_version" >= 1 AND `no` >= 1
ORDER BY `no` ASC
LIMIT 10000;
Query with version; index with versionEXPLAIN SELECT * FROM _a1881c06eec96db9901c7bbfe41c42a3f08e9cb4 USE INDEX (ix_query_aggregate2)
WHERE metadata->"$._aggregate_type" = 'user2' AND metadata->"$._aggregate_id" = 'ai_1' AND metadata->"$._aggregate_version" >= 1 AND `no` >= 1
ORDER BY `no` ASC
LIMIT 10000;
All queries without
|
thanks @oqq - am playing with it a little now |
Postgres
MySQL
prooph/event-sourcing#57 is closed |
Btw: Postgres is able to get the correct index himself - way better :) |
That looks much better now. 👍 |
referring #42
some indexed need to be changed (ix_unique_event?)
@oqq
The text was updated successfully, but these errors were encountered: