Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: preserve qualifiers when rewriting expressions #12341

Merged
merged 3 commits into from
Sep 6, 2024
Merged
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
15 changes: 14 additions & 1 deletion datafusion/expr/src/expr.rs
Original file line number Diff line number Diff line change
Expand Up @@ -227,7 +227,7 @@ use sqlparser::ast::{
pub enum Expr {
/// An expression with a specific name.
Alias(Alias),
/// A named reference to a qualified filed in a schema.
/// A named reference to a qualified field in a schema.
Column(Column),
/// A named reference to a variable in a registry.
ScalarVariable(DataType, Vec<String>),
Expand Down Expand Up @@ -1115,6 +1115,19 @@ impl Expr {
SchemaDisplay(self)
}

/// Returns the qualifier and the schema name of this expression.
///
/// Used when the expression forms the output field of a certain plan.
/// The result is the field's qualifier and field name in the plan's
/// output schema. We can use this qualified name to reference the field.
pub fn qualified_name(&self) -> (Option<TableReference>, String) {
match self {
Expr::Column(Column { relation, name }) => (relation.clone(), name.clone()),
Expr::Alias(Alias { relation, name, .. }) => (relation.clone(), name.clone()),
_ => (None, self.schema_name().to_string()),
}
}

/// Returns a full and complete string representation of this expression.
#[deprecated(note = "use format! instead")]
pub fn canonical_name(&self) -> String {
Expand Down
14 changes: 9 additions & 5 deletions datafusion/expr/src/expr_rewriter/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -303,9 +303,11 @@ pub struct NamePreserver {
use_alias: bool,
}

type QualifiedName = (Option<TableReference>, String);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would personally suggest creating an enum to make this more explicit (rather than two level of options)-- perhaps something like

pub enum SavedName {
  ///  name is not preserved
  None, 
  /// qualified name is preserved
  Saved {
    relation: QualifiedName,
    name: String,
  }
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.


/// If the name of an expression is remembered, it will be preserved when
/// rewriting the expression
pub struct SavedName(Option<String>);
pub struct SavedName(Option<QualifiedName>);

impl NamePreserver {
/// Create a new NamePreserver for rewriting the `expr` that is part of the specified plan
Expand All @@ -326,7 +328,7 @@ impl NamePreserver {

pub fn save(&self, expr: &Expr) -> Result<SavedName> {
let original_name = if self.use_alias {
Some(expr.name_for_alias()?)
Some(expr.qualified_name())
} else {
None
};
Expand All @@ -336,12 +338,14 @@ impl NamePreserver {
}

impl SavedName {
/// Ensures the name of the rewritten expression is preserved
/// Ensures the qualified name of the rewritten expression is preserved
pub fn restore(self, expr: Expr) -> Result<Expr> {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The return value no longer needs Result, but removing it breaks many optimization rules. Maybe we can handle it in the next PR.

let Self(original_name) = self;
match original_name {
Some(name) => expr.alias_if_changed(name),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I seems like we could potentially also remove Expr::alias_if_changed which doesn't properly account for qualifiers 🤔

The only other use of it seems to be in

pub fn rewrite_preserving_name<R>(expr: Expr, rewriter: &mut R) -> Result<Expr>
which itself is only used in tests 🤔

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

alias_if_changed is being used by substrait. Maybe we can review this later.

rewrite_preserving_name has been removed.

None => Ok(expr),
Some(old_name) if expr.qualified_name() != old_name => {
Ok(expr.alias_qualified(old_name.0, old_name.1))
}
_ => Ok(expr),
}
}
}
Expand Down
35 changes: 6 additions & 29 deletions datafusion/expr/src/expr_schema.rs
Original file line number Diff line number Diff line change
Expand Up @@ -462,35 +462,12 @@ impl ExprSchemable for Expr {
&self,
input_schema: &dyn ExprSchema,
) -> Result<(Option<TableReference>, Arc<Field>)> {
match self {
Expr::Column(c) => {
let (data_type, nullable) = self.data_type_and_nullable(input_schema)?;
Ok((
c.relation.clone(),
Field::new(&c.name, data_type, nullable)
.with_metadata(self.metadata(input_schema)?)
.into(),
))
}
Expr::Alias(Alias { relation, name, .. }) => {
let (data_type, nullable) = self.data_type_and_nullable(input_schema)?;
Ok((
relation.clone(),
Field::new(name, data_type, nullable)
.with_metadata(self.metadata(input_schema)?)
.into(),
))
}
_ => {
let (data_type, nullable) = self.data_type_and_nullable(input_schema)?;
Ok((
None,
Field::new(self.schema_name().to_string(), data_type, nullable)
.with_metadata(self.metadata(input_schema)?)
.into(),
))
}
}
let (relation, schema_name) = self.qualified_name();
let (data_type, nullable) = self.data_type_and_nullable(input_schema)?;
let field = Field::new(schema_name, data_type, nullable)
.with_metadata(self.metadata(input_schema)?)
.into();
Ok((relation, field))
}

/// Wraps this expression in a cast to a target [arrow::datatypes::DataType].
Expand Down
15 changes: 15 additions & 0 deletions datafusion/sqllogictest/test_files/select.slt
Original file line number Diff line number Diff line change
Expand Up @@ -1685,6 +1685,9 @@ SELECT i + i FROM test WHERE i > 2;
----
6

statement ok
DROP TABLE test;

query error DataFusion error: Arrow error: Parser error: Error parsing timestamp from 'I AM NOT A TIMESTAMP': error parsing date
SELECT to_timestamp('I AM NOT A TIMESTAMP');

Expand Down Expand Up @@ -1741,3 +1744,15 @@ select a from t;

statement ok
set datafusion.optimizer.max_passes=3;

# Test issue: https://github.com/apache/datafusion/issues/12183
statement ok
CREATE TABLE test(a BIGINT) AS VALUES (1);

query I
SELECT "test.a" FROM (SELECT a AS "test.a" FROM test)
----
1

statement ok
DROP TABLE test;