-
Notifications
You must be signed in to change notification settings - Fork 14.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] fix: ignore aliases on where clauses #9854
Conversation
if df is not None and not df.empty: | ||
if df is not None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
During testing I noticed that column names were not being renamed back to their original format if the dataframe was empty.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Somewhat unrelated but I thought we banished df
being None
.
Codecov Report
@@ Coverage Diff @@
## master #9854 +/- ##
==========================================
+ Coverage 71.02% 71.14% +0.12%
==========================================
Files 583 588 +5
Lines 30634 30794 +160
Branches 3165 3238 +73
==========================================
+ Hits 21758 21909 +151
- Misses 8765 8774 +9
Partials 111 111
Continue to review full report at Codecov.
|
@@ -167,15 +167,26 @@ def is_temporal(self) -> bool: | |||
self.type, utils.DbColumnType.TEMPORAL | |||
) | |||
|
|||
def get_sqla_col(self, label: Optional[str] = None) -> Column: | |||
label = label or self.column_name | |||
def get_sqla_col( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’m kind of surprised that SQLAlchemy isn’t dealing with the labels correctly. Are we not constructing the SQL query in the appropriate manner?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Normally the dialect should do this, but apparently the BQ dialect isn't handling this properly (it does it correctly for all other operators except LIKE).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@villebro do you think there is merit in trying to fix the BigQuery DB-API instead, i.e., overriding the visit_label
method? Note I've had to do this for a custom dialect.
This would ensure that i) this would be fixed for all BigQuery use cases (and not just those emanating from Superset), and ii) reduce the need for custom Superset overrides.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about it, but laziness got the best of me. But why not, let me poke at it real quick and see if it can be fixed there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've played with this in the past by forcing render_label_as_label=None
when calling the super()
. I think this may be the issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the pointers @john-bodley . I just forked it, will hack on it this weekend.
if df is not None and not df.empty: | ||
if df is not None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Somewhat unrelated but I thought we banished df
being None
.
@esilver I will be closing this for now, as this is a bug in |
makes sense, thanks! |
SUMMARY
On BigQuery, some WHERE clauses on nested fields of RECORD type columns were failing due to aliases being added to all columns in the WHERE clause. This is in violation of the BigQuery spec, but also more generally inconsistent with the ANSI SQL spec:
Source: https://cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax#where_clause
Source: https://dev.mysql.com/doc/refman/8.0/en/problems-with-alias.html
While the BigQuery SqlAlchemy dialect should ignore the aliases automatically (see the linked issue and PR on
pybigquery
), this PR disables aliasing in WHERE clauses for all engines, and fixes the immediate problem for BigQuery. I propose getting this merged ASAP to fix the observed BigQuery bug, but recommend adding unit tests later for this as we add proper tests for officially supported database engines.Links to related
pybigquery
issues/PRs:BEFORE
Note the alias being referenced in the WHERE clause
AFTER
Note the original field name being referenced in the WHERE clause, with the alias being referenced by the GROUPBY clause as per ANSI SQL spec.
TEST PLAN
CI
ADDITIONAL INFORMATION
FYI: @esilver