Creating a Time-series Line Chart with high cardinality always times out due to inefficiencies in the `pandas_postprocessing.pivot` module.
The example below may seem slightly constructed, but I think it's likely that a Superset user will come across this at some point: they want to build a time-series chart and inadvertently create high-cardinality groupings without setting a series limit. Currently, they are confronted with a timeout and are none the wiser. With a minor optimization we can instead show them the data they requested, and they can make a decision from there.
Imo, a better solution than a simple performance fix would be for Superset to make the decision and apply a series limit for the user, but I figure that would be more of a feature request :)
Executing a pivot with `drop_missing_columns=False` and lots of resulting columns can increase the postprocessing time by seconds or even minutes for large datasets.
The main culprit is the `df.drop(...)` operation in the for loop. We can refactor this slightly, without any change to the results, and push the postprocessing time down to seconds instead of minutes for large datasets (millions of columns).
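To illustrate the kind of refactor described above, here is a minimal sketch, not the actual Superset code. The helper names `drop_in_loop` and `drop_once` and the `keep` set are hypothetical; the assumption is that the slow path calls `df.drop(...)` once per unwanted column, which builds a new DataFrame each time, while the fast path collects the unwanted columns first and drops them in a single call.

```python
import pandas as pd


def drop_in_loop(df: pd.DataFrame, keep: set) -> pd.DataFrame:
    """Slow pattern (assumed): one df.drop call per unwanted column."""
    # df.columns is evaluated once, so reassigning df inside the loop is safe.
    for col in df.columns:
        if str(col) not in keep:
            # Each call returns a new DataFrame without `col`.
            df = df.drop(col, axis=1)
    return df


def drop_once(df: pd.DataFrame, keep: set) -> pd.DataFrame:
    """Faster pattern: collect unwanted columns, then drop them in one call."""
    to_drop = [col for col in df.columns if str(col) not in keep]
    return df.drop(columns=to_drop)


# Tiny illustrative frame; real pivots may have a very large number of columns.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})
keep = {"a", "c"}

slow = drop_in_loop(df, keep)
fast = drop_once(df, keep)
```

Both helpers return the same frame, so swapping one for the other should not change the chart data; only the number of intermediate DataFrames differs.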
Fixes apache#23464
This is likely fixed by now, and is pretty out of date if not. If people are still encountering this in current versions (3.x) please open a new Issue or a PR to address the problem.
How to reproduce the bug
Dataset: `cleaned_sales_data`
Dimensions: `contact_first_name`, `contact_last_name`, `phone`
Chart type: Time-series Line Chart
Expected results
Chart should load within a few seconds
Actual results
Chart will time out or take a very long time
Environment
Superset version: 2.0.1, 2.1.0rc3, and latest master@7ef06b0a6
Python version: 3.8.13
Additional context
I will open a PR shortly and link to this issue.