Optimize Role model/dataset #3068

philippthun · 2022-11-18T09:09:17Z

The Role model is not backed by a dedicated table, but a dataset that is a UNION of eight individual roles tables. When filtering for roles, this was done by adding WHERE conditions to the overall dataset, which resulted in slow query executions. A more efficient approach is to move the WHERE conditions into the individual datasets per role.

This commit customizes the Role model/dataset. The creation of the UNIONed dataset is moved into a dedicated class (RoleDataset) that supports building a dataset with filters (i.e. WHERE conditions) per role (i.e. individual table). The where method of the Role dataset is overridden and uses Dataset.from ([1]) to change the dataset's source. It replaces the source with a new dataset built with filters per role.

As where calls can be chained, the filters need to be kept in the dataset instance (cache) and the source dataset needs to be built again on each where invocation with all the filters, i.e. previous and new WHERE conditions.

Furthermore fully qualified identifiers (table name and column name) have to be adapted; as the WHERE conditions are applied on the individual roles tables, no table name prefix is required and thus removed from the identifiers.

Some filters can now be handled in a more efficient way. When filtering for a specific role type, the corresponding dataset gets a WHERE 1 = 1, whereas all the other datasets get a WHERE 1 = 0, meaning they never return any rows. Something similar is done when filtering for organizations or spaces; the opposite role types get a WHERE 1 = 0 as they don't have the requested id field.

As there might be filters that still should be applied to the overall dataset, where invocations with a block (aka. virtual row block) are treated as before, i.e. applied at the end.

Drawbacks:

The current implementation of the custom Role model/dataset is rather strict with regards to allowed filter expressions. This means that if some functionality is added (e.g. new or different filter), the custom Role model/dataset needs to be adapted. This was done by intention to ensure that every code path is tested.
To limit the number of different filter expressions, invocations of first(cond) and find(cond) on the custom Role model/dataset have been replaced with where(cond).first which is semantically identical.

[1] https://sequel.jeremyevans.net/rdoc/classes/Sequel/Dataset.html#method-i-from

I have reviewed the contributing guide
I have viewed, signed, and submitted the Contributor License Agreement
I have made this pull request to the main branch
I have run all the unit tests using bundle exec rake
I have run CF Acceptance Tests

The Role model is not backed by a dedicated table, but a dataset that is a UNION of eight individual roles tables. When filtering for roles, this was done by adding WHERE conditions to the overall dataset, which resulted in slow query executions. A more efficient approach is to move the WHERE conditions into the individual datasets per role. This commit customizes the Role model/dataset. The creation of the UNIONed dataset is moved into a dedicated class (RoleDataset) that supports building a dataset with filters (i.e. WHERE conditions) per role (i.e. individual table). The 'where' method of the Role dataset is overridden and uses Dataset.from ([1]) to change the dataset's source. It replaces the source with a new dataset built with filters per role. As 'where' calls can be chained, the filters need to be kept in the dataset instance (cache) and the source dataset needs to be built again on each 'where' invocation with all the filters, i.e. previous and new WHERE conditions. Furthermore fully qualified identifiers (table name and column name) have to be adapted; as the WHERE conditions are applied on the individual roles tables, no table name prefix is required and thus removed from the identifiers. Some filters can now be handled in a more efficient way. When filtering for a specific role type, the corresponding dataset gets a 'WHERE 1 = 1', whereas all the other datasets get a 'WHERE 1 = 0', meaning they never return any rows. Something similar is done when filtering for organizations or spaces; the opposite role types get a 'WHERE 1 = 0' as they don't have the requested 'id' field. As there might be filters that still should be applied to the overall dataset, 'where' invocations with a block (aka. virtual row block) are treated as before, i.e. applied at the end. Drawbacks: - The current implementation of the custom Role model/dataset is rather strict with regards to allowed filter expressions. This means that if some functionality is added (e.g. new or different filter), the custom Role model/dataset needs to be adapted. This was done by intention to ensure that every code path is tested. - To limit the number of different filter expressions, invocations of 'first(cond)' and 'find(cond)' on the custom Role model/dataset have been replaced with 'where(cond).first' which is semantically identical. [1] https://sequel.jeremyevans.net/rdoc/classes/Sequel/Dataset.html#method-i-from

philippthun · 2022-11-18T11:22:03Z

An SQL statement taken from the unit tests (three filters are applied on each individual roles table) and the complex "visibility" filter is applied at the end:

SELECT *, count(*) OVER () AS `pagination_total_results`
FROM (SELECT 'organization_auditor' AS `type`,
             `role_guid`            AS `guid`,
             `organization_id`,
             -1                     AS `space_id`,
             `user_id`,
             `created_at`,
             `updated_at`
      FROM `organizations_auditors`
      WHERE ((`user_id` IN (SELECT `id` FROM `users` WHERE (`guid` IN ('other-user-guid')))) AND
             (`created_at` < '2028-05-26 18:47:01') AND
             (`role_guid` IN ('f1faa0ec-69d4-4d95-9403-7da9ef3984b8', '5e094f74-5027-4d6b-8fbc-86e29449cccf')))
      UNION ALL
      SELECT 'organization_manager' AS `type`,
             `role_guid`            AS `guid`,
             `organization_id`,
             -1                     AS `space_id`,
             `user_id`,
             `created_at`,
             `updated_at`
      FROM `organizations_managers`
      WHERE ((`user_id` IN (SELECT `id` FROM `users` WHERE (`guid` IN ('other-user-guid')))) AND
             (`created_at` < '2028-05-26 18:47:01') AND
             (`role_guid` IN ('f1faa0ec-69d4-4d95-9403-7da9ef3984b8', '5e094f74-5027-4d6b-8fbc-86e29449cccf')))
      UNION ALL
      SELECT 'organization_billing_manager' AS `type`,
             `role_guid`                    AS `guid`,
             `organization_id`,
             -1                             AS `space_id`,
             `user_id`,
             `created_at`,
             `updated_at`
      FROM `organizations_billing_managers`
      WHERE ((`user_id` IN (SELECT `id` FROM `users` WHERE (`guid` IN ('other-user-guid')))) AND
             (`created_at` < '2028-05-26 18:47:01') AND
             (`role_guid` IN ('f1faa0ec-69d4-4d95-9403-7da9ef3984b8', '5e094f74-5027-4d6b-8fbc-86e29449cccf')))
      UNION ALL
      SELECT 'organization_user' AS `type`,
             `role_guid`         AS `guid`,
             `organization_id`,
             -1                  AS `space_id`,
             `user_id`,
             `created_at`,
             `updated_at`
      FROM `organizations_users`
      WHERE ((`user_id` IN (SELECT `id` FROM `users` WHERE (`guid` IN ('other-user-guid')))) AND
             (`created_at` < '2028-05-26 18:47:01') AND
             (`role_guid` IN ('f1faa0ec-69d4-4d95-9403-7da9ef3984b8', '5e094f74-5027-4d6b-8fbc-86e29449cccf')))
      UNION ALL
      SELECT 'space_auditor' AS `type`,
             `role_guid`     AS `guid`,
             -1              AS `organization_id`,
             `space_id`,
             `user_id`,
             `created_at`,
             `updated_at`
      FROM `spaces_auditors`
      WHERE ((`user_id` IN (SELECT `id` FROM `users` WHERE (`guid` IN ('other-user-guid')))) AND
             (`created_at` < '2028-05-26 18:47:01') AND
             (`role_guid` IN ('f1faa0ec-69d4-4d95-9403-7da9ef3984b8', '5e094f74-5027-4d6b-8fbc-86e29449cccf')))
      UNION ALL
      SELECT 'space_manager' AS `type`,
             `role_guid`     AS `guid`,
             -1              AS `organization_id`,
             `space_id`,
             `user_id`,
             `created_at`,
             `updated_at`
      FROM `spaces_managers`
      WHERE ((`user_id` IN (SELECT `id` FROM `users` WHERE (`guid` IN ('other-user-guid')))) AND
             (`created_at` < '2028-05-26 18:47:01') AND
             (`role_guid` IN ('f1faa0ec-69d4-4d95-9403-7da9ef3984b8', '5e094f74-5027-4d6b-8fbc-86e29449cccf')))
      UNION ALL
      SELECT 'space_developer' AS `type`,
             `role_guid`       AS `guid`,
             -1                AS `organization_id`,
             `space_id`,
             `user_id`,
             `created_at`,
             `updated_at`
      FROM `spaces_developers`
      WHERE ((`user_id` IN (SELECT `id` FROM `users` WHERE (`guid` IN ('other-user-guid')))) AND
             (`created_at` < '2028-05-26 18:47:01') AND
             (`role_guid` IN ('f1faa0ec-69d4-4d95-9403-7da9ef3984b8', '5e094f74-5027-4d6b-8fbc-86e29449cccf')))
      UNION ALL
      SELECT 'space_supporter' AS `type`,
             `role_guid`       AS `guid`,
             -1                AS `organization_id`,
             `space_id`,
             `user_id`,
             `created_at`,
             `updated_at`
      FROM `spaces_supporters`
      WHERE ((`user_id` IN (SELECT `id` FROM `users` WHERE (`guid` IN ('other-user-guid')))) AND
             (`created_at` < '2028-05-26 18:47:01') AND
             (`role_guid` IN ('f1faa0ec-69d4-4d95-9403-7da9ef3984b8', '5e094f74-5027-4d6b-8fbc-86e29449cccf')))) AS `t1`
WHERE ((`space_id` IN (SELECT `space_id`
                       FROM `spaces_developers`
                       WHERE (`user_id` = 4)
                       UNION
                       SELECT `space_id`
                       FROM `spaces_managers`
                       WHERE (`user_id` = 4)
                       UNION
                       SELECT `space_id`
                       FROM `spaces_auditors`
                       WHERE (`user_id` = 4)
                       UNION
                       SELECT `space_id`
                       FROM `spaces_supporters`
                       WHERE (`user_id` = 4)
                       UNION
                       SELECT `spaces`.`id`
                       FROM `spaces`
                                INNER JOIN (SELECT `organization_id`
                                            FROM `organizations_managers`
                                            WHERE (`user_id` = 4)) AS `t1`
                                           ON (`t1`.`organization_id` = `spaces`.`organization_id`))) OR
       (`organization_id` IN (2)))
ORDER BY `t1`.`created_at` DESC, `t1`.`guid` ASC LIMIT 50
OFFSET 0

evanfarrar · 2023-01-20T18:08:14Z

This looks good to me and it is two orders of magnitude faster in an environment we have that is modeling modest user scale. Is there any reason it is still a draft?

philippthun · 2023-01-24T08:25:13Z

The main reason for keeping this as a draft was that I wanted to perform more performance testing. As you run such tests in your environment, it should be good to be merged.

moleske

This looked good to @evanfarrar and I. In our scenario, we saw a query for roles drop to about 0.002 seconds from over 2 seconds. And a query for /v3/organizations/:guid/users drop to 0.0009 seconds from 0.01 seconds.

The drawbacks seem like reasonable compromises to me, as the first will force us to think about this if we want a new filter. The later seemed reasonable since otherwise you'd have to write your own first method

The optimizations done in PR cloudfoundry#3068 [1] make the dedicated org and space role classes superfluous. [1] cloudfoundry#3068

The optimizations done in PR cloudfoundry#3068 [1] make the dedicated org and space role classes superfluous. [1] cloudfoundry#3068 Co-authored-by: Josh Russett <[email protected]>

cf-gitbot added the unscheduled label Nov 18, 2022

philippthun force-pushed the more-specific-role-queries branch 2 times, most recently from 54cdc4f to 13bd96b Compare November 18, 2022 10:55

philippthun changed the title ~~wip~~ Optimize Role model/dataset Nov 18, 2022

philippthun force-pushed the more-specific-role-queries branch from 13bd96b to 638479b Compare November 18, 2022 11:02

philippthun mentioned this pull request Nov 18, 2022

slow /v3/roles call. cloudfoundry/capi-release#278

Closed

philippthun mentioned this pull request Jan 20, 2023

WIP/RFC: parameterize roles by org #3153

Closed

5 tasks

philippthun marked this pull request as ready for review January 24, 2023 08:21

moleske approved these changes Jan 25, 2023

View reviewed changes

moleske merged commit 2258522 into cloudfoundry:main Jan 25, 2023

cf-gitbot removed the unscheduled label Jan 25, 2023

philippthun mentioned this pull request Feb 7, 2023

Add tests for /v3/roles cloudfoundry/cf-performance-tests#30

Merged

philippthun mentioned this pull request Apr 24, 2023

GET v3 user space roles is slow #1974

Closed

philippthun mentioned this pull request Sep 14, 2023

Remove dedicated org and space role classes #3439

Merged

5 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Optimize Role model/dataset #3068

Optimize Role model/dataset #3068

philippthun commented Nov 18, 2022 •

edited

Loading

philippthun commented Nov 18, 2022

evanfarrar commented Jan 20, 2023 •

edited

Loading

philippthun commented Jan 24, 2023

moleske left a comment

Optimize Role model/dataset #3068

Optimize Role model/dataset #3068

Conversation

philippthun commented Nov 18, 2022 • edited Loading

philippthun commented Nov 18, 2022

evanfarrar commented Jan 20, 2023 • edited Loading

philippthun commented Jan 24, 2023

moleske left a comment

Choose a reason for hiding this comment

philippthun commented Nov 18, 2022 •

edited

Loading

evanfarrar commented Jan 20, 2023 •

edited

Loading