perf: memoize db_engine_spec in database #14638

Merged
villebro merged 2 commits into apache:master from villebro/memoize-db-engine-spec
May 14, 2021


Member

@villebro villebro commented May 14, 2021

SUMMARY

A recent PR, #14547, introduced a performance regression that made dataset metadata fetching very slow for datasets with many columns. I originally thought the type regexes were the problem, but on closer inspection it turned out that merely referencing self.table.database.db_engine_spec in a TableColumn instance cost ~6 ms on my local machine. Multiplied by 1000 columns, that comes to ~6000 ms. To get around this I added memoization to the semi-expensive regex, and also memoized Database.db_engine_spec. This should also speed up query rendering a bit, as there was similar logic there.
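As a rough illustration of the idea (not the code from this PR), the sketch below caches an expensive property so that 1000 columns sharing one database trigger a single lookup. The class names, the 6 ms sleep, and the use of the standard library's `functools.cached_property` are assumptions made for the example; the actual change may use a different memoization helper.

```python
import time
from functools import cached_property


class EngineSpec:
    """Hypothetical stand-in for a database engine spec."""


class Database:
    """Hypothetical stand-in; not Superset's actual Database model."""

    def __init__(self) -> None:
        self.lookups = 0

    @cached_property
    def db_engine_spec(self) -> EngineSpec:
        # Simulate a lookup costing roughly 6 ms. With cached_property the
        # body runs once per instance; later reads return the stored value.
        self.lookups += 1
        time.sleep(0.006)
        return EngineSpec()


class Column:
    """Hypothetical column that delegates to its database."""

    def __init__(self, database: Database) -> None:
        self.database = database

    @property
    def db_engine_spec(self) -> EngineSpec:
        return self.database.db_engine_spec


database = Database()
columns = [Column(database) for _ in range(1000)]
specs = [column.db_engine_spec for column in columns]
```

One caveat of this approach: the cached value is never refreshed, so it only suits values that stay fixed for the lifetime of the instance.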

BEFORE #14547 (pre-regression)

For the World Bank dataset (328 columns), fetching the data took slightly less than 180 ms on my local machine before the regression (including the unnecessary 20 ms redirect):
[screenshot]

CURRENT (master)

For the same dataset, retrieval of data now takes ~10s!
[screenshot]

AFTER

Retrieval is now slightly quicker than it was originally (and there is no longer a redirect):
[screenshot]

TEST PLAN

ADDITIONAL INFORMATION

  • Has associated issue:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

@villebro villebro force-pushed the villebro/memoize-db-engine-spec branch from 083d3b5 to 1ff8ab7 Compare May 14, 2021 07:46

codecov bot commented May 14, 2021

Codecov Report

Merging #14638 (06f90f6) into master (e4d2424) will decrease coverage by 0.09%.
The diff coverage is 93.10%.


@@            Coverage Diff             @@
##           master   #14638      +/-   ##
==========================================
- Coverage   77.47%   77.38%   -0.10%     
==========================================
  Files         958      958              
  Lines       48486    48480       -6     
  Branches     5679     5683       +4     
==========================================
- Hits        37565    37514      -51     
- Misses      10721    10766      +45     
  Partials      200      200              
| Flag | Coverage Δ |
|---|---|
| hive | 80.94% <92.85%> (-0.03%) ⬇️ |
| javascript | 72.52% <100.00%> (+<0.01%) ⬆️ |
| mysql | 81.21% <92.85%> (-0.03%) ⬇️ |
| postgres | 81.23% <92.85%> (-0.03%) ⬇️ |
| presto | ? |
| python | 81.60% <92.85%> (-0.19%) ⬇️ |
| sqlite | 80.85% <92.85%> (-0.03%) ⬇️ |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ |
|---|---|
| superset/connectors/sqla/models.py | 88.64% <92.00%> (-1.62%) ⬇️ |
| ...-frontend/src/datasource/ChangeDatasourceModal.tsx | 90.90% <100.00%> (+3.89%) ⬆️ |
| superset/db_engine_specs/base.py | 88.45% <100.00%> (+0.02%) ⬆️ |
| superset/models/core.py | 89.40% <100.00%> (+0.27%) ⬆️ |
| superset/db_engine_specs/presto.py | 84.42% <0.00%> (-5.90%) ⬇️ |
| superset/connectors/base/models.py | 88.03% <0.00%> (-2.66%) ⬇️ |
| superset-frontend/src/components/Tabs/Tabs.tsx | 96.55% <0.00%> (-0.33%) ⬇️ |
| ...components/DashboardBuilder/DashboardContainer.tsx | 100.00% <0.00%> (ø) |

... and 1 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update e4d2424...06f90f6.

@villebro villebro force-pushed the villebro/memoize-db-engine-spec branch from 1ff8ab7 to 5e28bb6 Compare May 14, 2021 08:08
@villebro villebro requested a review from dpgaspar May 14, 2021 08:14
@villebro villebro force-pushed the villebro/memoize-db-engine-spec branch from 5e28bb6 to 2f38ef5 Compare May 14, 2021 08:19
@kgabryje
Member

Can we remove the cypress timeout overrides introduced in the other PR?

@villebro villebro changed the title from "perf: memoize db_engine_spec in sqla table classes" to "perf: memoize db_engine_spec in database" May 14, 2021
@villebro
Member Author

> Can we remove the cypress timeout overrides introduced in the other PR?

Good idea; I'll do that. I've also been looking into introducing perf tests on the backend to catch regressions like this more easily (that will be a follow-up PR).
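A backend perf test of the kind mentioned above could look roughly like the following sketch. Everything here is hypothetical: `build_payload` stands in for whatever code path serializes dataset columns, and the one-second budget is a placeholder that would need calibrating against the CI hardware.

```python
import time
from typing import Callable, Dict, List


def best_elapsed_seconds(func: Callable[[], object], repeats: int = 5) -> float:
    """Return the fastest of `repeats` timed runs of `func`."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    # Taking the minimum reduces noise from other processes on the machine.
    return min(timings)


def build_payload(column_count: int) -> List[Dict[str, str]]:
    # Hypothetical workload standing in for serializing dataset columns.
    return [
        {"column_name": f"col_{i}", "type": "VARCHAR"}
        for i in range(column_count)
    ]


# Hypothetical budget; deliberately generous so the check is not flaky.
BUDGET_SECONDS = 1.0

elapsed = best_elapsed_seconds(lambda: build_payload(1000))
assert elapsed < BUDGET_SECONDS, f"payload took {elapsed:.3f}s"
```

Wall-clock assertions tend to be noisy, so a generous budget that only catches order-of-magnitude regressions (such as 180 ms growing to 10 s) is probably the safer choice.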

        return self.table.db_engine_spec

    @property
    def type_generic(self) -> Optional[utils.GenericDataType]:
Member


Consider renaming this to get_generic_type; that gives a heads-up that it does some computation.

Member Author


This is added to the column payload in the dataset request to complement the existing type field, so I think we need to keep it as a property.
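To illustrate why a property fits here, the sketch below assumes the payload is built by reading a list of field names off the column with `getattr`; that mechanism, the `Column` class, and the type mapping are all hypothetical and not taken from the Superset codebase.

```python
from enum import Enum
from typing import Any, Dict, Optional, Sequence


class GenericDataType(Enum):
    """Hypothetical generic type categories for this sketch."""

    NUMERIC = "numeric"
    STRING = "string"


class Column:
    """Hypothetical column model."""

    def __init__(self, column_name: str, type_: str) -> None:
        self.column_name = column_name
        self.type = type_

    @property
    def type_generic(self) -> Optional[GenericDataType]:
        # Derived from the raw type on access; the mapping is illustrative.
        upper = self.type.upper()
        if upper.startswith(("INT", "FLOAT", "DECIMAL")):
            return GenericDataType.NUMERIC
        if upper.startswith(("VARCHAR", "TEXT")):
            return GenericDataType.STRING
        return None


def to_payload(column: Column, fields: Sequence[str]) -> Dict[str, Any]:
    # getattr works the same for plain attributes and properties; a
    # get_generic_type() method would come back as a bound method instead.
    return {field: getattr(column, field) for field in fields}


payload = to_payload(
    Column("population", "INTEGER"),
    ["column_name", "type", "type_generic"],
)
```

Under that assumption, `type_generic` sits next to `type` in the payload without any special handling in the serializer.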

@villebro villebro merged commit 97c9e37 into apache:master May 14, 2021
@villebro villebro deleted the villebro/memoize-db-engine-spec branch May 14, 2021 09:49
cccs-RyanS pushed a commit to CybercentreCanada/superset that referenced this pull request Dec 17, 2021
* perf: memoize db_engine_spec in sqla table classes

* remove extended cypress timeouts
QAlexBall pushed a commit to QAlexBall/superset that referenced this pull request Dec 29, 2021
* perf: memoize db_engine_spec in sqla table classes

* remove extended cypress timeouts
cccs-rc pushed a commit to CybercentreCanada/superset that referenced this pull request Mar 6, 2024
* perf: memoize db_engine_spec in sqla table classes

* remove extended cypress timeouts
@mistercrunch mistercrunch added 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 1.3.0 labels Mar 12, 2024
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels preset-io size/M 🚢 1.3.0