-
Notifications
You must be signed in to change notification settings - Fork 14.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
perf: memoize db_engine_spec in database #14638
perf: memoize db_engine_spec in database #14638
Conversation
083d3b5
to
1ff8ab7
Compare
Codecov Report
@@ Coverage Diff @@
## master #14638 +/- ##
==========================================
- Coverage 77.47% 77.38% -0.10%
==========================================
Files 958 958
Lines 48486 48480 -6
Branches 5679 5683 +4
==========================================
- Hits 37565 37514 -51
- Misses 10721 10766 +45
Partials 200 200
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
1ff8ab7
to
5e28bb6
Compare
5e28bb6
to
2f38ef5
Compare
Can we remove the cypress timeout overrides introduced in the other PR? |
Good idea; I'll do that. I've also been looking into introducing perf tests on the backend to identify these easier (will be a follow-up PR). |
return self.table.db_engine_spec | ||
|
||
@property | ||
def type_generic(self) -> Optional[utils.GenericDataType]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
consider renaming to get_generic_type
gives an head warning that it will do some computation around it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is added to the column payload in the dataset request to complement the existing type
field, so I think we need to keep it as a property.
* perf: memoize db_engine_spec in sqla table classes * remove extended cypress timeouts
* perf: memoize db_engine_spec in sqla table classes * remove extended cypress timeouts
* perf: memoize db_engine_spec in sqla table classes * remove extended cypress timeouts
SUMMARY
A recent PR #14547 introduced a performance regression causing dataset metadata fetching to become very slow for datasets with large numbers of columns. I originally thought the type regexes were the problem, but when researching the problem more closely it turns out that just referencing
self.table.database.db_engine_spec
in aTableColumn
instance cost ~6ms on my local machine. Multiply that by 1000 columns ~= 6000 ms. To get around this I added memoization to the semi-expensive regex, but also added memoizing forDatabase.db_engine_spec
. This should also speed up query rendering a bit, as there was similar logic there.BEFORE #14547 (pre-regression)
For the World Bank dataset (328 cols), fetching the data took slightly less than 180ms before on my local machine (including the unnecessary 20 ms redirect):
CURRENT (master)
For the same dataset, retrieval of data now takes ~10s!
AFTER
Retrieval is now slightly quicker than originally (including no redirect):
TEST PLAN
ADDITIONAL INFORMATION