Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bugfix] Remove prequery properties from query_obj #7896

Merged
merged 6 commits into from
Jul 23, 2019
Merged

[Bugfix] Remove prequery properties from query_obj #7896

merged 6 commits into from
Jul 23, 2019

Conversation

villebro
Copy link
Member

@villebro villebro commented Jul 18, 2019

CATEGORY

Choose one

  • Bug Fix
  • Refactor

SUMMARY

TL;DR:

There is a bug in how the prequeries property of query_obj is constructed in sqla/models.py, which causes cache misses if there are more than one filter selected in FilterBox. This PR fixes that bug without introducing any (user facing) functional changes.

Long explanation:

When non-join based prequeries were introduced in #4163, the properties is_prequery and prequeries were introduced into query_obj. This was done to be able to produce "Top N" filtering for engines that don't support joins, namely Druid (and later Pinot). However, prequeries are not actually needed in the query_obj (and later cache_dict), as they are only an intermediary construct that can be deduced from the query_obj. Furthermore, prequeries were added to query_obj in sqla_models.py, making the calls non-idempotent and non-deterministic. Ideally query_obj should never be mutated after it is constructed in viz.py, as all the information necessary for the query should already be known at that time. This caused a caching bug in Filterbox, which runs as many queries as there are filters, and erroneously interpreted the previous queries as prequeries, causing bloated prequeries properties in the query_obj.

When Pinot support was introduced in #6719, two named tuples (SqlaQuery and QueryStrExtended) were introduced in sqla/models.py to facilitate returning more context for how a query that produces the resulting DataFrame were produced. In the current codebase, this is the correct placeholder for prequeries. This PR removes both is_prequery and prequeries from query_obj, and adds prequeries into the SqlaQuery and QueryStrExtended objects, enabling the functionality introduced in #4163.

Functional changes:

  • Removed prequeries and is_prequery from query_obj.
  • Moved prequery related handling from query_obj to SqlaQuery and QueryStrExtended, ensuring that query_obj stays unmutated during processing in sqla/models.py.

Other refactoring:

  • Renamed property inner_joins in BaseEngineSpec to allows_joins (currently not necessary to distinguish between inner/outer/full joins). Also harmonized other similarly named properties, such as allows_subqueries and allows_column_aliases.
  • Added typing to SqlaQueryand QueryStrExtended.
  • Moved some query formatting from core.py to sqla/models.py.

TEST PLAN

  • CI
  • Tested locally by setting allows_joins to both True and False on sqlite
  • Verified that no more cache misses occur when running Filterbox with more than one filter.

ADDITIONAL INFORMATION

REVIEWERS

@smokemonster99

@codecov-io
Copy link

codecov-io commented Jul 18, 2019

Codecov Report

Merging #7896 into master will increase coverage by 8.88%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #7896      +/-   ##
==========================================
+ Coverage   56.95%   65.83%   +8.88%     
==========================================
  Files         354      461     +107     
  Lines       10563    22210   +11647     
  Branches     2425     2425              
==========================================
+ Hits         6016    14622    +8606     
- Misses       4426     7467    +3041     
  Partials      121      121
Impacted Files Coverage Δ
superset/viz.py 71.77% <ø> (ø)
superset/connectors/druid/models.py 82.2% <ø> (ø)
superset/common/query_object.py 27.77% <ø> (ø)
superset/connectors/sqla/models.py 83.5% <100%> (ø)
superset/db_engine_specs/druid.py 100% <100%> (ø)
superset/db_engine_specs/base.py 87.98% <100%> (ø)
superset/db_engine_specs/gsheets.py 100% <100%> (ø)
superset/models/core.py 82.95% <100%> (ø)
superset/db_engine_specs/pinot.py 68.96% <100%> (ø)
superset/views/core.py 71.93% <100%> (ø)
... and 107 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4568b2a...b2e03e1. Read the comment docs.

Copy link

@smokemonster99 smokemonster99 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested in my dev environment on v33.0 and it looks good. LGTM

@villebro
Copy link
Member Author

@smokemonster99 looking more closely this bug rquires delicate fixing, as the logic causing it is quite intricate. Will probably take a few days + a thorough round of review. In the meantime this WIP hack or your fix should be ok for your use case.

@villebro villebro changed the title [WIP] Fix Filterbox cache misses [Bugfix] Remove prequery properties from query_obj Jul 20, 2019
@villebro
Copy link
Member Author

@smokemonster99 this is now ready; Can you test this in your env to make sure everything works as intended?

@mistercrunch
Copy link
Member

mistercrunch commented Jul 20, 2019

Thanks for digging deep on this one. This LGTM, but would like if @betodealmeida @agrawaldevesh could take a look.

@villebro
Copy link
Member Author

Yeah, this was some serious rabbit hole stuff 😅

@smokemonster99
Copy link

@villebro I will be traveling for meetings next week, I’ll see if I can get to it mid or late next week if you haven’t merged by then.

@villebro
Copy link
Member Author

@smokemonster99 no prob and thanks for bringing this to our attention, good catch!

)
QueryStringExtended = namedtuple("QueryStringExtended", ["sql", "labels_expected"])

class SqlaQuery(NamedTuple):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh wow, the new NamedTuple are so much better, always hated the way the old ones were delcared...

Copy link
Member

@mistercrunch mistercrunch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did a pass and it LGTM, feel free to merge AFAIC

@villebro
Copy link
Member Author

Thanks for the review @mistercrunch ; I think this turned out to be less invasive than I had originally thought, so will go ahead and merge.

@villebro villebro merged commit 07a76f8 into apache:master Jul 23, 2019
alex-mark pushed a commit to alex-mark/incubator-superset that referenced this pull request Jul 29, 2019
* Create query_obj for every filter

* Deprecate is_prequery and prequeries from query_obj

* Fix tests

* Fix typos and remove redundant ; from sql

* Add typing to namedtuples and move all query str logic to one place

* Fix unit test
@mistercrunch mistercrunch added 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 0.34.0 labels Feb 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels size/L 🚢 0.34.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Filter Box Caching Incorrectly for Multi-Query Use Case
4 participants