-
Notifications
You must be signed in to change notification settings - Fork 361
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize Role model/dataset #3068
Optimize Role model/dataset #3068
Conversation
54cdc4f
to
13bd96b
Compare
The Role model is not backed by a dedicated table, but a dataset that is a UNION of eight individual roles tables. When filtering for roles, this was done by adding WHERE conditions to the overall dataset, which resulted in slow query executions. A more efficient approach is to move the WHERE conditions into the individual datasets per role. This commit customizes the Role model/dataset. The creation of the UNIONed dataset is moved into a dedicated class (RoleDataset) that supports building a dataset with filters (i.e. WHERE conditions) per role (i.e. individual table). The 'where' method of the Role dataset is overridden and uses Dataset.from ([1]) to change the dataset's source. It replaces the source with a new dataset built with filters per role. As 'where' calls can be chained, the filters need to be kept in the dataset instance (cache) and the source dataset needs to be built again on each 'where' invocation with all the filters, i.e. previous and new WHERE conditions. Furthermore fully qualified identifiers (table name and column name) have to be adapted; as the WHERE conditions are applied on the individual roles tables, no table name prefix is required and thus removed from the identifiers. Some filters can now be handled in a more efficient way. When filtering for a specific role type, the corresponding dataset gets a 'WHERE 1 = 1', whereas all the other datasets get a 'WHERE 1 = 0', meaning they never return any rows. Something similar is done when filtering for organizations or spaces; the opposite role types get a 'WHERE 1 = 0' as they don't have the requested 'id' field. As there might be filters that still should be applied to the overall dataset, 'where' invocations with a block (aka. virtual row block) are treated as before, i.e. applied at the end. Drawbacks: - The current implementation of the custom Role model/dataset is rather strict with regards to allowed filter expressions. This means that if some functionality is added (e.g. new or different filter), the custom Role model/dataset needs to be adapted. This was done by intention to ensure that every code path is tested. - To limit the number of different filter expressions, invocations of 'first(cond)' and 'find(cond)' on the custom Role model/dataset have been replaced with 'where(cond).first' which is semantically identical. [1] https://sequel.jeremyevans.net/rdoc/classes/Sequel/Dataset.html#method-i-from
13bd96b
to
638479b
Compare
An SQL statement taken from the unit tests (three filters are applied on each individual roles table) and the complex "visibility" filter is applied at the end:
|
This looks good to me and it is two orders of magnitude faster in an environment we have that is modeling modest user scale. Is there any reason it is still a draft? |
The main reason for keeping this as a draft was that I wanted to perform more performance testing. As you run such tests in your environment, it should be good to be merged. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looked good to @evanfarrar and I. In our scenario, we saw a query for roles drop to about 0.002 seconds from over 2 seconds. And a query for /v3/organizations/:guid/users drop to 0.0009 seconds from 0.01 seconds.
The drawbacks seem like reasonable compromises to me, as the first will force us to think about this if we want a new filter. The later seemed reasonable since otherwise you'd have to write your own first
method
The optimizations done in PR cloudfoundry#3068 [1] make the dedicated org and space role classes superfluous. [1] cloudfoundry#3068
The optimizations done in PR cloudfoundry#3068 [1] make the dedicated org and space role classes superfluous. [1] cloudfoundry#3068 Co-authored-by: Josh Russett <[email protected]>
The
Role
model is not backed by a dedicated table, but a dataset that is a UNION of eight individual roles tables. When filtering for roles, this was done by adding WHERE conditions to the overall dataset, which resulted in slow query executions. A more efficient approach is to move the WHERE conditions into the individual datasets per role.This commit customizes the
Role
model/dataset. The creation of the UNIONed dataset is moved into a dedicated class (RoleDataset
) that supports building a dataset with filters (i.e. WHERE conditions) per role (i.e. individual table). Thewhere
method of theRole
dataset is overridden and usesDataset.from
([1]) to change the dataset's source. It replaces the source with a new dataset built with filters per role.As
where
calls can be chained, the filters need to be kept in the dataset instance (cache
) and the source dataset needs to be built again on eachwhere
invocation with all the filters, i.e. previous and new WHERE conditions.Furthermore fully qualified identifiers (table name and column name) have to be adapted; as the WHERE conditions are applied on the individual roles tables, no table name prefix is required and thus removed from the identifiers.
Some filters can now be handled in a more efficient way. When filtering for a specific role type, the corresponding dataset gets a
WHERE 1 = 1
, whereas all the other datasets get aWHERE 1 = 0
, meaning they never return any rows. Something similar is done when filtering for organizations or spaces; the opposite role types get aWHERE 1 = 0
as they don't have the requestedid
field.As there might be filters that still should be applied to the overall dataset,
where
invocations with a block (aka. virtual row block) are treated as before, i.e. applied at the end.Drawbacks:
Role
model/dataset is rather strict with regards to allowed filter expressions. This means that if some functionality is added (e.g. new or different filter), the customRole
model/dataset needs to be adapted. This was done by intention to ensure that every code path is tested.first(cond)
andfind(cond)
on the customRole
model/dataset have been replaced withwhere(cond).first
which is semantically identical.[1] https://sequel.jeremyevans.net/rdoc/classes/Sequel/Dataset.html#method-i-from
I have reviewed the contributing guide
I have viewed, signed, and submitted the Contributor License Agreement
I have made this pull request to the
main
branchI have run all the unit tests using
bundle exec rake
I have run CF Acceptance Tests