Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Explore]Ensure ad-hoc metrics are not named the same as an existing column #13426

Closed
3 tasks
john-bodley opened this issue Mar 3, 2021 · 4 comments · Fixed by #13739
Closed
3 tasks

[Explore]Ensure ad-hoc metrics are not named the same as an existing column #13426

john-bodley opened this issue Mar 3, 2021 · 4 comments · Fixed by #13739
Labels
#bug:regression Bugs that are identified as regessions explore:metrics Related to metrics of Explore

Comments

@john-bodley
Copy link
Member

We should prevent ad-hoc metrics from being named the same as existing columns in the datasource as depending on the engine the generated SQL could result in a projection issue given it's not evident which column is being referred to.

Expected results

Ensure all metric names are unique to prevent such an issue.

Actual results

Presto queries of the form (generated using an ad-hoc query with the metric named cost),

SELECT
    SUM(cost) AS "cost"
FROM
    ...
ORDER BY
    SUM(cost)

result in the database error Invalid reference to output projection attribute from ORDER BY aggregation.

Screenshots

If applicable, add screenshots to help explain your problem.

How to reproduce the bug

  1. Go to explore
  2. Create an ad-hoc metric which is named the same as an existing column in the datasource.
  3. Add the same ad-hoc metric to the SORT BY.
  4. Click RUN.
  5. See the error.

Environment

(please complete the following information):

  • superset version: master
  • python version: 3.7
  • node.js version: node -v

Checklist

Make sure to follow these steps before submitting your issue - thank you!

  • I have checked the superset logs for python stacktraces and included it here as text if there are any.
  • I have reproduced the issue with at least the latest released version of superset.
  • I have checked the issue tracker for the same issue and I haven't found one similar.
@john-bodley john-bodley added the #bug Bug report label Mar 3, 2021
@john-bodley john-bodley added the #bug:regression Bugs that are identified as regessions label Mar 3, 2021
@zuzana-vej zuzana-vej added explore:metrics Related to metrics of Explore and removed #bug Bug report labels Mar 3, 2021
@ktmud
Copy link
Member

ktmud commented Mar 3, 2021

Maybe the SQL query generator should prefix all adhoc metric/sort by with table names instead:

SELECT
    SUM(tbl.cost) AS "cost"
FROM tbl
ORDER BY
    SUM(tbl.cost)

@junlincc junlincc changed the title Ensure ad-hoc metrics are not named the same as an existing column [Explore]Ensure ad-hoc metrics are not named the same as an existing column Mar 3, 2021
@villebro
Copy link
Member

villebro commented Mar 3, 2021

Good idea @ktmud , should not be difficult to implement

@john-bodley
Copy link
Member Author

john-bodley commented Mar 3, 2021

I’m not sure I agree, i.e., one shouldn’t need to include a table alias in order for these queries to work. Additionally it might not be apparent to the user from the UI what they’re actually doing given there are multiple “columns” with the same name. Furthermore each engine has a perspective on how this should be treated (and how the ORDER BY clause should be composed, i.e., in this case it may have wanted the alias as opposed to the expression in the clause).

Finally this used to work and thus it seems there’s been a regression somewhere (possibly related to the other linked issue) and thus I think we should first try to track that down.

I guess the TL;DR from me is we should strive for defining concise SQL and let the various SQLAlchemy dialects deal with the SQL nuances.

@ktmud
Copy link
Member

ktmud commented Mar 11, 2021

@john-bodley AdhocMetrics are always created from the datasource columns so I think from a user's perspective, it is expected that the aggregation should apply only to known columns of the underlying table (be it virtual or physical)---not another ad-hoc metric or derived column.

A table name prefix will be totally transparent to the UI and users would not need to do anything extra. It is needed to create a concise & correct SQL query because in this case we obviously didn't give SQLAlchemy enough hints to generate unambiguous SQL queries for all data engines.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
#bug:regression Bugs that are identified as regessions explore:metrics Related to metrics of Explore
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants