Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(optimizer): optimize_projections rule bug #12184

Closed
wants to merge 2 commits into from

Conversation

JasonLi-cn
Copy link
Contributor

Which issue does this PR close?

Closes #12183

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added the optimizer Optimizer rules label Aug 27, 2024
@jayzhan211
Copy link
Contributor

The plan looks like aliasing to the same name after optimize_projections

    Projection: test.a AS test.a
      TableScan: test projection=[a]

I wonder could we even remove the alias here, since the name is the same

@JasonLi-cn
Copy link
Contributor Author

The plan looks like aliasing to the same name after optimize_projections

    Projection: test.a AS test.a
      TableScan: test projection=[a]

I wonder could we even remove the alias here, since the name is the same

Removing the Alias will cause a schema change. 🤔

@jayzhan211
Copy link
Contributor

jayzhan211 commented Aug 27, 2024

The plan looks like aliasing to the same name after optimize_projections

    Projection: test.a AS test.a
      TableScan: test projection=[a]

I wonder could we even remove the alias here, since the name is the same

Removing the Alias will cause a schema change. 🤔

That is why I think the root cause might be the incorrect schema we have.

SELECT "test.a" FROM (SELECT a AS "test.a" FROM test)

The expected schema should be something like.

DFSchema { inner: Schema { fields: [Field { name: "test.a", data_type: Int32, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }], metadata: {} }, field_qualifiers: [Some(Bare { table: "test" })], functional_dependencies: FunctionalDependencies { deps: [] } }.

a is aliased to "test.a". The column should be test."test.a"

Upd:

query I
SELECT a FROM (SELECT a FROM test)
----
1

This is the schema for the query without alias

schema: DFSchema { inner: Schema { fields: [Field { name: "a", data_type: Int32, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }], metadata: {} }, field_qualifiers: [Some(Bare { table: "test" })], functional_dependencies: FunctionalDependencies { deps: [] } }
query I
SELECT b FROM (SELECT a as b FROM test)
----
1

This is the schema with alias

schema: DFSchema { inner: Schema { fields: [Field { name: "b", data_type: Int32, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }], metadata: {} }, field_qualifiers: [None], functional_dependencies: FunctionalDependencies { deps: [] } }

field_qualifiers is empty for alias case. Does that mean we didn't preserve field_qualifiers for alias. 🤔

field_qualifiers: [Some(Bare { table: "test" })]

@JasonLi-cn
Copy link
Contributor Author

The plan looks like aliasing to the same name after optimize_projections

    Projection: test.a AS test.a
      TableScan: test projection=[a]

I wonder could we even remove the alias here, since the name is the same

Removing the Alias will cause a schema change. 🤔

That is why I think the root cause might be the incorrect schema we have.

SELECT "test.a" FROM (SELECT a AS "test.a" FROM test)

The expected schema should be something like.

DFSchema { inner: Schema { fields: [Field { name: "test.a", data_type: Int32, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }], metadata: {} }, field_qualifiers: [Some(Bare { table: "test" })], functional_dependencies: FunctionalDependencies { deps: [] } }.

a is aliased to "test.a". The column should be test."test.a"

Upd:

query I
SELECT a FROM (SELECT a FROM test)
----
1

This is the schema for the query without alias

schema: DFSchema { inner: Schema { fields: [Field { name: "a", data_type: Int32, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }], metadata: {} }, field_qualifiers: [Some(Bare { table: "test" })], functional_dependencies: FunctionalDependencies { deps: [] } }
query I
SELECT b FROM (SELECT a as b FROM test)
----
1

This is the schema with alias

schema: DFSchema { inner: Schema { fields: [Field { name: "b", data_type: Int32, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }], metadata: {} }, field_qualifiers: [None], functional_dependencies: FunctionalDependencies { deps: [] } }

field_qualifiers is empty for alias case. Does that mean we didn't preserve field_qualifiers for alias. 🤔

field_qualifiers: [Some(Bare { table: "test" })]

Yes. Possibly because the b field itself is not part of the test table? 🤔
If we make the following changes to the SQL, the b field has field_qualifiers: [Some(Bare { table: "t0" })]

query I
SELECT b FROM (SELECT a as b FROM test) t0
----
1
schema: DFSchema { inner: Schema { fields: [Field { name: "b", data_type: Int32, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }], metadata: {} }, field_qualifiers: [Some(Bare { table: "t0" })], functional_dependencies: FunctionalDependencies { deps: [] } }

@findepi
Copy link
Member

findepi commented Aug 27, 2024

I wonder could we even remove the alias here, since the name is the same

due to my subjective recency bias, #1468 seems related.
would it be easier if output schema (column names) were explicit?

@jayzhan211
Copy link
Contributor

I wonder could we even remove the alias here, since the name is the same

due to my subjective recency bias, #1468 seems related. would it be easier if output schema (column names) were explicit?

Not pretty sure, I think they are unrelated 🤔

I think the issue here is that we lost the qualifier for alias in subquery case

@jonahgao
Copy link
Member

jonahgao commented Aug 27, 2024

It looks like we need to save the qualifiers in the NamePreserver 🤔, otherwise, we won't be able to distinguish between the unqualified column "test.a" and the qualified column test.a. Their schema_names are the same.

@jayzhan211
Copy link
Contributor

Should we get Column's relation for Alias in to_field?

            Expr::Alias(Alias { expr, name, .. }) => {
                let relation = match expr.as_ref() {
                    Expr::Column(c) => c.relation.to_owned(),
                    _ => None
                };

                let (data_type, nullable) = self.data_type_and_nullable(input_schema)?;
                Ok((
                    relation,
                    Field::new(name, data_type, nullable)
                        .with_metadata(self.metadata(input_schema)?)
                        .into(),
                ))
            }

@jonahgao
Copy link
Member

Should we get Column's relation for Alias in to_field?

I think we should directly use the relation from the Alias itself, as that is the purpose of an Alias expression. After an expression is rewritten, by adding an Alias we can ensure that the result of to_field remains the same, including the qualifier and name, thus maintaining the schema unchanged.

@alamb
Copy link
Contributor

alamb commented Sep 5, 2024

Here is an alternate implementation proposed by @jonahgao : #12341

@JasonLi-cn
Copy link
Contributor Author

Here is an alternate implementation proposed by @jonahgao : #12341

thanks @jonahgao @jayzhan211 @alamb

@JasonLi-cn JasonLi-cn closed this Sep 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
optimizer Optimizer rules
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Bug triggered by special aliases in nested queries
5 participants