-
Notifications
You must be signed in to change notification settings - Fork 14.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Bugfix] Remove prequery properties from query_obj #7896
Conversation
Codecov Report
@@ Coverage Diff @@
## master #7896 +/- ##
==========================================
+ Coverage 56.95% 65.83% +8.88%
==========================================
Files 354 461 +107
Lines 10563 22210 +11647
Branches 2425 2425
==========================================
+ Hits 6016 14622 +8606
- Misses 4426 7467 +3041
Partials 121 121
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tested in my dev environment on v33.0 and it looks good. LGTM
@smokemonster99 looking more closely this bug rquires delicate fixing, as the logic causing it is quite intricate. Will probably take a few days + a thorough round of review. In the meantime this WIP hack or your fix should be ok for your use case. |
@smokemonster99 this is now ready; Can you test this in your env to make sure everything works as intended? |
Thanks for digging deep on this one. This LGTM, but would like if @betodealmeida @agrawaldevesh could take a look. |
Yeah, this was some serious rabbit hole stuff 😅 |
@villebro I will be traveling for meetings next week, I’ll see if I can get to it mid or late next week if you haven’t merged by then. |
@smokemonster99 no prob and thanks for bringing this to our attention, good catch! |
) | ||
QueryStringExtended = namedtuple("QueryStringExtended", ["sql", "labels_expected"]) | ||
|
||
class SqlaQuery(NamedTuple): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh wow, the new NamedTuple
are so much better, always hated the way the old ones were delcared...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did a pass and it LGTM, feel free to merge AFAIC
Thanks for the review @mistercrunch ; I think this turned out to be less invasive than I had originally thought, so will go ahead and merge. |
* Create query_obj for every filter * Deprecate is_prequery and prequeries from query_obj * Fix tests * Fix typos and remove redundant ; from sql * Add typing to namedtuples and move all query str logic to one place * Fix unit test
CATEGORY
Choose one
SUMMARY
TL;DR:
There is a bug in how the
prequeries
property ofquery_obj
is constructed insqla/models.py
, which causes cache misses if there are more than one filter selected in FilterBox. This PR fixes that bug without introducing any (user facing) functional changes.Long explanation:
When non-join based prequeries were introduced in #4163, the properties
is_prequery
andprequeries
were introduced intoquery_obj
. This was done to be able to produce "Top N" filtering for engines that don't support joins, namely Druid (and later Pinot). However, prequeries are not actually needed in thequery_obj
(and latercache_dict
), as they are only an intermediary construct that can be deduced from thequery_obj
. Furthermore, prequeries were added toquery_obj
insqla_models.py
, making the calls non-idempotent and non-deterministic. Ideallyquery_obj
should never be mutated after it is constructed inviz.py
, as all the information necessary for the query should already be known at that time. This caused a caching bug in Filterbox, which runs as many queries as there are filters, and erroneously interpreted the previous queries as prequeries, causing bloatedprequeries
properties in thequery_obj
.When Pinot support was introduced in #6719, two named tuples (
SqlaQuery
andQueryStrExtended
) were introduced insqla/models.py
to facilitate returning more context for how a query that produces the resultingDataFrame
were produced. In the current codebase, this is the correct placeholder for prequeries. This PR removes bothis_prequery
andprequeries
fromquery_obj
, and addsprequeries
into theSqlaQuery
andQueryStrExtended
objects, enabling the functionality introduced in #4163.Functional changes:
prequeries
andis_prequery
fromquery_obj
.query_obj
toSqlaQuery
andQueryStrExtended
, ensuring thatquery_obj
stays unmutated during processing insqla/models.py
.Other refactoring:
inner_joins
inBaseEngineSpec
toallows_joins
(currently not necessary to distinguish between inner/outer/full joins). Also harmonized other similarly named properties, such asallows_subqueries
andallows_column_aliases
.SqlaQuery
andQueryStrExtended
.core.py
tosqla/models.py
.TEST PLAN
allows_joins
to bothTrue
andFalse
onsqlite
ADDITIONAL INFORMATION
REVIEWERS
@smokemonster99