[WIP] fix: ignore aliases on where clauses #9854

villebro · 2020-05-20T07:07:20Z

SUMMARY

On BigQuery, some WHERE clauses on nested fields of RECORD-type columns were failing because aliases were being added to every column in the WHERE clause. This violates the BigQuery spec and is, more generally, inconsistent with the ANSI SQL spec:

You cannot reference column aliases from the SELECT list in the WHERE clause.

Source: https://cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax#where_clause

Standard SQL disallows references to column aliases in a WHERE clause. This restriction is imposed because when the WHERE clause is evaluated, the column value may not yet have been determined.

Source: https://dev.mysql.com/doc/refman/8.0/en/problems-with-alias.html

While the BigQuery SQLAlchemy dialect should ignore the aliases automatically (see the linked issue and PR on pybigquery), this PR disables aliasing in WHERE clauses for all engines, which fixes the immediate problem for BigQuery. I propose merging this ASAP to fix the observed BigQuery bug, and recommend adding unit tests for it later, as we add proper tests for the officially supported database engines.

Links to related pybigquery issues/PRs:

BEFORE

Note the alias being referenced in the WHERE clause

AFTER

Note the original field name being referenced in the WHERE clause, with the alias being referenced by the GROUP BY clause as per the ANSI SQL spec.
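
As a rough illustration of the AFTER shape described above (not the code in this PR), the sketch below builds a query string in plain Python. The helper `build_query`, the table `my_dataset.events`, the nested field `user.country` and the alias `country` are all hypothetical names chosen for the example. The alias appears only in the SELECT list and the GROUP BY clause, while the WHERE clause repeats the original field expression.

```python
from typing import Optional


def build_query(
    table: str, expression: str, alias: str, filter_value: Optional[str] = None
) -> str:
    """Build a grouped count query (hypothetical helper, illustration only).

    The alias is used in SELECT and GROUP BY; the WHERE clause uses the
    original expression, since aliases from the SELECT list cannot be
    referenced there.
    """
    lines = [
        f"SELECT {expression} AS {alias}, COUNT(*) AS cnt",
        f"FROM {table}",
    ]
    if filter_value is not None:
        # Naive quoting to keep the sketch short; real code should use
        # bound parameters instead of string interpolation.
        escaped = filter_value.replace("'", "''")
        lines.append(f"WHERE {expression} = '{escaped}'")
    lines.append(f"GROUP BY {alias}")
    return "\n".join(lines)


# Hypothetical table and nested RECORD field names.
sql = build_query("my_dataset.events", "user.country", "country", "FI")
print(sql)
```

The failing BEFORE shape would correspond to emitting the alias (`country`) in the WHERE line instead of the original expression.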

TEST PLAN

CI

ADDITIONAL INFORMATION

Has associated issue: closes Filter generated alias cause bigquery error: 400 Unrecognized name #9836
Changes UI
Requires DB Migration.
Confirm DB Migration upgrade and downgrade tested.
Introduces new feature or API
Removes existing feature or API

FYI: @esilver

villebro · 2020-05-20T07:09:32Z

superset/connectors/sqla/models.py

-            if df is not None and not df.empty:
+            if df is not None:


During testing I noticed that column names were not being renamed back to their original format if the dataframe was empty.

Somewhat unrelated but I thought we banished df being None.

codecov-commenter · 2020-05-20T07:13:36Z

Codecov Report

Merging #9854 into master will increase coverage by 0.12%.
The diff coverage is 88.23%.

@@            Coverage Diff             @@
##           master    #9854      +/-   ##
==========================================
+ Coverage   71.02%   71.14%   +0.12%     
==========================================
  Files         583      588       +5     
  Lines       30634    30794     +160     
  Branches     3165     3238      +73     
==========================================
+ Hits        21758    21909     +151     
- Misses       8765     8774       +9     
  Partials      111      111

Flag	Coverage Δ
#cypress	`54.05% <ø> (+0.55%)`	⬆️
#javascript	`59.30% <ø> (-0.01%)`	⬇️
#python	`71.27% <88.23%> (+<0.01%)`	⬆️

Impacted Files	Coverage Δ
superset/connectors/sqla/models.py	`88.63% <88.23%> (+0.03%)`	⬆️
...explore/components/AdhocMetricEditPopoverTitle.jsx	`84.61% <0.00%> (-8.72%)`	⬇️
...ponents/AdhocFilterEditPopoverSimpleTabContent.jsx	`86.50% <0.00%> (-7.20%)`	⬇️
.../src/explore/components/AdhocFilterEditPopover.jsx	`74.46% <0.00%> (-4.26%)`	⬇️
...rontend/src/visualizations/FilterBox/FilterBox.jsx	`70.83% <0.00%> (-3.34%)`	⬇️
.../src/dashboard/components/gridComponents/Chart.jsx	`87.64% <0.00%> (-2.25%)`	⬇️
.../src/explore/components/controls/SelectControl.jsx	`93.33% <0.00%> (-1.97%)`	⬇️
...erset-frontend/src/components/ListView/Filters.tsx	`88.88% <0.00%> (-1.68%)`	⬇️
...t-frontend/src/dashboard/actions/dashboardState.js	`58.00% <0.00%> (-1.34%)`	⬇️
superset-frontend/src/explore/controls.jsx	`80.95% <0.00%> (-1.20%)`	⬇️
... and 43 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c117e22...46ae952. Read the comment docs.

john-bodley · 2020-05-20T13:57:50Z

superset/connectors/sqla/models.py

@@ -167,15 +167,26 @@ def is_temporal(self) -> bool:
            self.type, utils.DbColumnType.TEMPORAL
        )

-    def get_sqla_col(self, label: Optional[str] = None) -> Column:
-        label = label or self.column_name
+    def get_sqla_col(


I’m kind of surprised that SQLAlchemy isn’t dealing with the labels correctly. Are we not constructing the SQL query in the appropriate manner?

Normally the dialect should do this, but apparently the BQ dialect isn't handling this properly (it does it correctly for all other operators except LIKE).

@villebro do you think there is merit in trying to fix the BigQuery DB-API instead, i.e., overriding the visit_label method? Note I've had to do this for a custom dialect.

This would ensure that i) this would be fixed for all BigQuery use cases (and not just those emanating from Superset), and ii) reduce the need for custom Superset overrides.

I thought about it, but laziness got the best of me. But why not, let me poke at it real quick and see if it can be fixed there.

I've played with this in the past by forcing render_label_as_label=None when calling the super(). I think this may be the issue.

Thanks for the pointers @john-bodley. I just forked it and will hack on it this weekend.

john-bodley · 2020-05-20T13:59:01Z

superset/connectors/sqla/models.py

-            if df is not None and not df.empty:
+            if df is not None:


Somewhat unrelated but I thought we banished df being None.

villebro · 2020-06-10T06:26:54Z

@esilver I will be closing this for now, as this is a bug in pybigquery, hence it should be fixed there. I haven't gotten around to crafting a fix yet, but will try to find the time over the coming months.

esilver · 2020-06-24T02:06:32Z

makes sense, thanks!

fix: ignore aliases on where clauses

46ae952

superset-github-bot bot added the preset-io label May 20, 2020

pull-request-size bot added the size/M label May 20, 2020

villebro requested a review from john-bodley May 20, 2020 07:07

villebro commented May 20, 2020


john-bodley reviewed May 20, 2020


villebro changed the title from "fix: ignore aliases on where clauses" to "[WIP] fix: ignore aliases on where clauses" May 21, 2020

villebro closed this Jun 10, 2020
