feat(rust): eliminate ProjectionExprs and handle CSE by stacking extra columns #16682
Nice, this cleans it up. We don't have to create a DAG. For example:
input_q = ...
schema_input = input_q.schema
schema_out = schema_input.with_columns(hstack_operation)
q = input_q.with_columns(cse_expressions)  # adds the CSE expressions at the end
q = q.with_columns(hstack_operation)  # performs the with_columns operation
q.simple_project(schema_out)  # selects the proper columns and thus prunes the CSE columns
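To make the three steps above concrete, here is a minimal sketch in plain Python. It does not use the Polars API: the `Frame` type, the helper functions, and the temporary column name `__cse_0` are all hypothetical stand-ins chosen for illustration, and the expressions are assumed to share the subexpression `a + b`.

```python
from typing import Callable, Dict, List

Frame = Dict[str, List[int]]  # toy stand-in for a DataFrame: column name -> values
Expr = Callable[[Frame], List[int]]


def with_columns(frame: Frame, exprs: Dict[str, Expr]) -> Frame:
    """Append (or overwrite) columns; every expression sees the input frame only."""
    out = dict(frame)
    for name, expr in exprs.items():
        out[name] = expr(frame)
    return out


def simple_project(frame: Frame, names: List[str]) -> Frame:
    """Select existing columns by name, without evaluating any expression."""
    return {name: frame[name] for name in names}


input_q: Frame = {"a": [1, 2, 3], "b": [10, 20, 30]}

# Shared subexpression `a + b`, materialised once under a temporary name.
cse_expressions = {"__cse_0": lambda f: [x + y for x, y in zip(f["a"], f["b"])]}

# The user's with_columns, rewritten to read the temporary column.
hstack_operation = {
    "double": lambda f: [v * 2 for v in f["__cse_0"]],
    "minus_one": lambda f: [v - 1 for v in f["__cse_0"]],
}

# Output schema is computed from the input, so it never mentions the CSE column.
schema_out = list(input_q) + list(hstack_operation)

q = with_columns(input_q, cse_expressions)  # stack the CSE column at the end
q = with_columns(q, hstack_operation)       # run the real with_columns
result = simple_project(q, schema_out)      # prune the temporary column
```

Because `schema_out` is derived from the input schema rather than from the intermediate frame, the final projection drops `__cse_0` without any special handling.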
For the streaming engine, the operators `HStackOperator` and `ProjectionOperator` should also be updated and stop expanding the columns. The default engine's projection should also stop expanding, as we can have CSEs with different lengths in a single `select`.
CodSpeed Performance Report
Merging #16682 will not alter performance.
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #16682      +/-   ##
==========================================
- Coverage   81.50%   81.46%   -0.05%
==========================================
  Files        1415     1415
  Lines      186600   186767     +167
  Branches     3014     3023       +9
==========================================
+ Hits       152097   152158      +61
- Misses      33973    34078     +105
- Partials      530      531       +1
☔ View full report in Codecov by Sentry.
Thanks, I think I handled them all.
I'm not happy about leaving
@pytest.mark.skip(reason="activate once fixed")
It appears this one works now?
@@ -48,12 +48,7 @@ def test_cse_expr_selection_streaming(monkeypatch: Any, capfd: Any) -> None:
    )
    assert_frame_equal(result, expected)

    err = capfd.readouterr().err
    assert "df -> projection[cse] -> ordered_sink" in err
    assert "df -> hstack[cse] -> ordered_sink" in err
We're no longer checking that CSE is performed, because it is no longer reported by the executor.
Looks great @wence-. Thanks a lot.
Rather than special-casing CSE expressions, treat them as "just another HStack". That way they are more transparent to the executor. The one wrinkle is that CSE might introduce columns that are not the same length as the dataframe we are hstacking against, so signal to the executor (via `ProjectionOptions`) that it should not broadcast mismatching length-1 columns that were introduced by CSE.
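The length mismatch is easiest to see with a shared aggregation. The sketch below is a toy model in plain Python, not the Polars executor: `hstack`, its `should_broadcast` parameter, and the column name `__cse_0` are names assumed here for illustration. It shows how broadcasting a length-1 CSE column to the frame height would change the result of a `select` that should return a single row.

```python
from typing import Dict, List

Frame = Dict[str, List[int]]  # toy stand-in for a DataFrame


def hstack(frame: Frame, new_cols: Frame, should_broadcast: bool = True) -> Frame:
    """Append columns; optionally stretch length-1 columns to the frame height."""
    height = len(next(iter(frame.values()))) if frame else 0
    out = dict(frame)
    for name, values in new_cols.items():
        if should_broadcast and len(values) == 1 and height > 1:
            values = values * height  # repeat the single value for every row
        out[name] = values
    return out


def select_outputs(stacked: Frame) -> Frame:
    """Model of select(a.sum() * 2, a.sum() + 1), reading the shared CSE column."""
    tmp = stacked["__cse_0"]
    return {"twice": [v * 2 for v in tmp], "plus_one": [v + 1 for v in tmp]}


df: Frame = {"a": [1, 2, 3]}
cse: Frame = {"__cse_0": [sum(df["a"])]}  # shared `a.sum()`, a length-1 column

# Broadcasting the CSE column inflates the aggregated outputs to three rows.
broadcast = select_outputs(hstack(df, cse, should_broadcast=True))

# Leaving it at length 1 keeps the single-row result the user asked for.
kept = select_outputs(hstack(df, cse, should_broadcast=False))
```

In this model the only difference between the two runs is the flag passed to `hstack`, which mirrors the idea of carrying the decision in the projection options rather than special-casing CSE in the executor.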