-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
include input fields as output for Substrait consumer #12225
Conversation
FYI @Blizzara |
I don't feel this is right - unless I misunderstood something. This tries to fix the problem at the root level of the plan, so that we'd produce the columns the plan asked for, and it works in this test case where there is only a single project in the plan, but I don't think it would work for any more complicated plan. I think the proper fix would be around
exprs list (and also names I guess) before adding the new expressions. Or something along those lines
|
Thanks for the suggrestions! I'll take a try and adding more test cases. In the begining I was going to directly change the initisal expressions in |
b5dab19
to
4a8a158
Compare
substrait-java has a similar issue with mapping Substrait Projects, which include the input fields in the output, with Calcite Projects which do not. As @Blizzara suggests, a good fix for this is to include all of the input fields as expression in the Project. The following substrait-java code is what handles this https://github.com/substrait-io/substrait-java/blob/c8c31ecd3da047a4de0c33b21f3e960df628bda0/isthmus/src/main/java/io/substrait/isthmus/SubstraitRelNodeConverter.java#L155-L171 |
I think that's what I did in the updated code, just some other points I mentioned in the review |
@@ -214,6 +214,9 @@ pub async fn from_substrait_plan( | |||
Ok(from_substrait_rel(ctx, rel, &extensions).await?) | |||
}, | |||
plan_rel::RelType::Root(root) => { | |||
if !root.names.is_empty() { | |||
extensions = extensions.with_projection_names(root.names.clone()); | |||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the point of storing the output names in the extension like this? They shouldn't be needed anywhere else but here, and it looks like the code below already applies them to the generated plan.
@@ -439,6 +458,27 @@ pub async fn from_substrait_rel( | |||
} | |||
names.insert(new_name); | |||
} | |||
let schema = input.schema(); | |||
if let (Some(extensions_names), true) = | |||
(extensions.names.as_ref(), p.common.is_some()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using the final output names here doesn't really make sense. Those are intended to apply names to the results of the Substrait plan.
Given a Substrait plan like:
Project[a + b]
NamedScan[a, b, c, d, e]
What the DataFusion Project needs to include is all of the fields of the input relation like:
DataFusionProject[a, b, c, d, e, a + b]
DataFusionTable[a, b, c, d, e]
4a8a158
to
0b0659f
Compare
@@ -412,6 +428,10 @@ pub async fn from_substrait_rel( | |||
); | |||
let mut names: HashSet<String> = HashSet::new(); | |||
let mut exprs: Vec<Expr> = vec![]; | |||
input.schema().iter().for_each(|(qualifier, field)| { | |||
exprs.push(col(Column::from((qualifier, field)))) | |||
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this seems like the right thing - but then to get tests passing again I think we'll also need to add support for the remap
https://substrait.io/relations/basics/#emit-output-ordering, both here and in producer...
I am trying to clean up the review queue, so marking this PR as draft as the CI tests are failing. Let me know if that wasn't right |
I have some work in progress around remaps for #12347 which I suspect will overlap with this. @Lordworms I'd be happy to pull your changes into my work if that makes sense to you. |
Sure, that would be fine, Thanks a lot |
I believe this can be closed following #13127 |
Thanks @vbarua -- @Lordworms please let us know if this isn't right to close |
Which issue does this PR close?
Closes #12204
Rationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?