-
Notifications
You must be signed in to change notification settings - Fork 14.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(query): order by adhoc metrics should trigger group by #13434
Conversation
Lots of great improvements here (I'll add review comments later). It seems the root problem here is that the selected column isn't being added to the groupby clause, not the aggregate expression being missing from the select. Also, one thing worthwhile considering is not all db engines require having the order by aggregate expression in the selects:
While I don't see a lot of use cases where omitting the aggregate in the select, it's not strictly always necessary, and could save unnecessary network bandwidth when not needed. So instead of always adding it to the selects, we might want to introduce a parameter in |
Thanks for the headsup, @villebro ! Another thing I noticed is that it's not always correct to assume "columns" with no "metrics" requires no aggregation. E.g. when table chart selects Therefore we can not really deprecate |
Yes, I've been planning on that - I was thinking along the lines of an |
I'm trying to imagine what would be a useful case in exposing Note that in some data engines (e.g. Presto), there are performance implications in using I'm OK with either way, but would hope we can keep the interface simple and avoid the possibility of getting the same results with different query configs. |
I was actually contemplating sending the |
I'm not super passionate about adding distinct support, so if there's any risk of it causing added complexity or getting in the way I'm ok with just supporting group bys. Same goes for adding vs removing order bys from the select - I'm fine adding them in the select, and agree that it's good to have them around for validation purposes (the real optimizations are probably elsewhere anyways, like using binary protocols, streaming etc). But in general I'm in the camp of "if a feature can easily be added, why not", and tend to prefer to avoid adding limitations or forcing features if there are no clear motivations for doing so. |
I just removed the logic of adding order by only metrics to select expressions and it still works at least for Postgres. We can decide if we want to expose these hidden columns later when there are clearer actual feature requests. |
Codecov Report
@@ Coverage Diff @@
## master #13434 +/- ##
==========================================
- Coverage 77.41% 72.63% -4.79%
==========================================
Files 918 918
Lines 46673 46674 +1
Branches 5720 5720
==========================================
- Hits 36132 33901 -2231
- Misses 10405 12557 +2152
- Partials 136 216 +80
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I think there is still a way to fully deprecate groupby
. We just assume group by
queries whenever metrics
is not None
. An empty metrics
array should also trigger group by.
This could be a somewhat dangerous change and would require more testing.
superset/utils/core.py
Outdated
if item_key not in seen: | ||
seen.add(item_key) | ||
result.append(item) | ||
return result |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function is not used by this PR anymore, but I'm keeping it here for convenience.
else metric["label"] # type: ignore | ||
for metric in metrics | ||
self.metrics = metrics and [ | ||
x |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use x
in case some may wonder whether metric
is also from the parent context (e.g. an __init__
argument).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I'd personally prefer metric_
here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I always prefer x
in list comprehensions and lambda functions because it's simpler and reads more "local".
8608c6e
to
332ba4d
Compare
Presto tests seem to be failing because of this error: |
2753c8d
to
4bb6545
Compare
) * fix(query): properly select adhoc metrics in orderby * Throw error when sql is empty * Allow `metrics` to be None * Always use alias in orderby for metrics * Bump table chart version and migrate histogram to typescript * Fix Histogram without groupby * Fix Presto birth names test * Raw records mode should not aggregate
SUMMARY
Update SQL query generator to properly handle adhoc metrics in order by. The idea is collect adhoc metrics in
orderby
before deciding whether to apply groupby. Closes #13465 .Also assume it is a
GROUP BY
query unlessmetrics
to set toNone
. Depends on: apache-superset/superset-ui#995 (released in@superset-ui/core
v0.17.18
)BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
Before
After
Correctly generates query for group by and sort by without metrics.
TEST PLAN
Unit tests and a new Cypress test
ADDITIONAL INFORMATION