-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement qualified expression alias #8008
Comments
To make this more concrete, here's a test (e.g. in #[test]
fn plan_schema_changed() -> Result<()> {
let original_plan = test_table_scan()?;
// Transformed plan, that has columns renamed
let new_plan = LogicalPlanBuilder::from(original_plan.clone())
.aggregate(vec![col("a")], vec![max(col("b")), min(col("c"))])?
.build()?;
// Attempt to rename columns to original
let projection_plan = LogicalPlanBuilder::from(new_plan)
.project(vec![
col("a"),
col("MAX(test.b)").alias("b"),
col("MIN(test.c)").alias("c"),
])?
.build()?;
// This will fail since the qualifiers are stripped when building a schema from any expression other than `Expr::Column`
assert_eq!(
original_plan.schema(),
projection_plan.schema(),
);
Ok(())
} This fails with:
Note that above the functional dependencies are also different, but that is not what is being asserted in-between optimization runs. |
Somehow, Projection doesn't seem to have this problem. Is the problem in your example that the original schema is Shouldn't it be something more like this (I can't remember if you have to explicitly build qualified identifiers specially): // Attempt to rename columns to original
let projection_plan = LogicalPlanBuilder::from(new_plan)
.project(vec![
col("a"),
col("MAX(test.b)").alias("test.b"),
col("MIN(test.c)").alias("test.c"),
])?
.build()?; |
Yeah I think this problem only surfaces when an initial plan gets transformed into a combination of other plans (one of which is an aggregation) during optimizations, such as in the case of
Not really, because the name of the alias always gets crammed into the
In fact that's why I opened this issue, because I see no way of building qualified fields via expressions currently.
diff --git a/datafusion/expr/src/expr_schema.rs b/datafusion/expr/src/expr_schema.rs
index 025b74eb5..61d1ffe53 100644
--- a/datafusion/expr/src/expr_schema.rs
+++ b/datafusion/expr/src/expr_schema.rs
@@ -295,6 +295,13 @@ impl ExprSchemable for Expr {
self.nullable(input_schema)?,
)
.with_metadata(self.metadata(input_schema)?)),
+ Expr::QualifiedAlias(qa) => Ok(DFField::new(
+ qa.relation.clone(),
+ &qa.name,
+ self.get_type(input_schema)?,
+ self.nullable(input_schema)?,
+ )
+ .with_metadata(self.metadata(input_schema)?)),
_ => Ok(DFField::new_unqualified(
&self.display_name()?,
self.get_type(input_schema)?, and so the correct projection would look something like: let projection_plan = LogicalPlanBuilder::from(new_plan)
.project(
new_plan
.schema()
.fields()
.iter()
.zip(original_plan.schema().fields().iter())
.map(|(new_field, old_field)| {
col(new_field.name())
.alias_qualified(old_field.qualifier(), old_field.name())
}),
)?
.build()?; |
I think one reason causing this issue is that the qualifier is sometimes present in the expr name (or column name), and sometimes is not:
In the previous example:
The first column's name is From this point, maybe |
Actually the qualifier is never inferred from an expression name, as demonstrated in
In case of projecting
Yeah, having both might be confusing (was inspired by |
I think the reasoning in this ticket makes sense to me. Thank you for raising it @gruuya I do agree with @waynexia that adding a Adding such a field would, of course, as @gruuya says, force us to go through many locations in the code to add handling of the new Perhaps @jackwener or @liukun4515 or @ozankabak have some additional thoughts to share |
There is also a mailing list thread about this: https://lists.apache.org/thread/9bvwjsxrt0o5tqfyf47wfw81z0kjgp7o. I am copying what @jdye64 said here so the discussion is in the same place:
|
BTW here is a proposal for removing |
Mailing List Thread: https://lists.apache.org/thread/9bvwjsxrt0o5tqfyf47wfw81z0kjgp7o
Is your feature request related to a problem or challenge?
The problem I'm encountering is related to #7981. Namely:
exprlist_to_fields
to generate the initial schema, however this function will always result in a unqualified field forExpr::Alias
unlike forExpr::Column
, hence the schemas can never matchDescribe the solution you'd like
Introduce a new enum variant along the lines of:
or alternatively extend the existing alias expression to accommodate for an optional relation:
This would allow for 1-1 mapping between fields and column aliases.
Describe alternatives you've considered
Everything else seems like a hack:
DFSchema::strip_qualifiers
when building the schema in plan constructors; this will result in conflicts for joinsExprSchemable::to_field
try to parse the qualifer and name from an alias name; seems error prone, confusing and potentially problematic for column names with dotsAdditional context
It seems like previously this could have been done with
Projection::try_new_with_schema
, but it looks like the overriding schema isn't being propagated through the optimizers after #7919.The text was updated successfully, but these errors were encountered: