Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inefficient queries in sync_role_definitions #25040

Closed
3 tasks done
giftig opened this issue Aug 21, 2023 · 4 comments
Closed
3 tasks done

Inefficient queries in sync_role_definitions #25040

giftig opened this issue Aug 21, 2023 · 4 comments
Labels
good first issue Good first issues for new contributors

Comments

@giftig
Copy link
Contributor

giftig commented Aug 21, 2023

On init, the security manager syncs role definitions for Admin, Alpha, Gamma, etc. by invoking set_role with each role, passing in a function which filters PermissionView instances and determines if they should be in that role. set_role then fetches all PermissionView instances from the database and invokes the filter function.

The filter function checks a load of rules about viewmenu and permission name, so ultimately we need to look at pvm.permission.name and pvm.view_menu.name for every PermissionView we retrieved from the database: that's where this function is very inefficient. Because the relations with Permission and ViewMenu are lazy, it's executing another 2 SELECT queries each for each pvm to get those two names, which should have been eagerly loaded when getting all the instances. Additionally, since all PermissionView instances are fetched for every invocation of set_role, these queries are all run another 5x each.

So the query count being executed by this sync task is approximately:

5 * 2 * (pvm_count)

There's also a ViewMenu for every dataset, so pvm_count can easily be 10000+.

As a result of that, we're seeing this task hammer the database and take upward of 20 mins to execute. If we update it to eagerly load the relations it's likely we could reduce this to a single query and make it ~100x faster.

How to reproduce the bug

  1. Set SQLALCHEMY_ECHO = True
  2. Run the sync_role_definitions task, either from the flask shell or as part of the superset init process
  3. Observe a large number of individual SELECT queries being performed.

Expected results

Should be faster and use fewer queries

Actual results

Slow, lots of queries.

Environment

(please complete the following information):

  • browser type and version: N/A
  • superset version: 2.1.0, but also checked the function in master
  • python version: N/A
  • node.js version: N/A
  • any feature flags active: N/A

Checklist

Make sure to follow these steps before submitting your issue - thank you!

  • I have checked the superset logs for python stacktraces and included it here as text if there are any.
  • I have reproduced the issue with at least the latest released version of superset.
  • I have checked the issue tracker for the same issue and I haven't found one similar.

Additional context

Here's a sample of some of the queries observed:

2023-08-21 13:13:04,272 INFO sqlalchemy.engine.Engine SELECT ab_view_menu.id AS ab_view_menu_id, ab_view_menu.name AS ab_view_menu_name 
FROM ab_view_menu 
WHERE ab_view_menu.id = %(pk_1)s

clearly showing it's looking up entries individually.

I have a pretty good idea how to fix this so I will raise a PR if I get time, but it's quite a simple fix so perhaps someone who wants to contribute a small change to the project could take a look at it as well. The inefficiency doesn't have enormous impact because it happens during init, but it does nevertheless slow down the init process a lot when you have many datasets, so it would be good to fix.

@eschutho eschutho added the good first issue Good first issues for new contributors label Aug 22, 2023
@IEdiong
Copy link

IEdiong commented Sep 28, 2023

Hi @eschutho, can this issue be assigned to me.

@giftig
Copy link
Contributor Author

giftig commented Sep 28, 2023

@IEdiong there's already a PR raised by @suicide11 for this issue; looks like github didn't link it properly because of the formatting of the commit message.

#25340

@IEdiong
Copy link

IEdiong commented Sep 28, 2023

Ok

rusackas pushed a commit that referenced this issue Oct 11, 2023
@giftig
Copy link
Contributor Author

giftig commented Nov 6, 2023

Closing as the associated PR was merged.

@giftig giftig closed this as completed Nov 6, 2023
cccs-rc pushed a commit to CybercentreCanada/superset that referenced this issue Mar 6, 2024
sfirke pushed a commit to sfirke/superset that referenced this issue Mar 22, 2024
vinothkumar66 pushed a commit to vinothkumar66/superset that referenced this issue Nov 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue Good first issues for new contributors
Projects
None yet
Development

No branches or pull requests

3 participants