Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(rust): eliminate ProjectionExprs and handle CSE by stacking extra columns #16682

Merged
merged 5 commits into from
Jun 4, 2024

Conversation

wence-
Copy link
Collaborator

@wence- wence- commented Jun 3, 2024

Rather than special-casing CSE expressions, treat them as "just another HStack". That way they are more transparent to the executor. The single nit is that CSE might introduce columns that are not the same length as the dataframe we are hstacking against, so pass through (via ProjectionOptions) that the executor should not broadcast mismatching length-1 columns if they have appeared via CSE.

@github-actions github-actions bot added enhancement New feature or an improvement of an existing feature rust Related to Rust Polars labels Jun 3, 2024
@wence- wence- force-pushed the wence/fea/remove-projectionexprs branch from 308a08c to 10f2867 Compare June 3, 2024 12:08
@ritchie46
Copy link
Member

ritchie46 commented Jun 3, 2024

Nice, this cleans it up and exposes CSE to the streaming engine. 👍

We don't have to create a dag. For HStack we should do something like this (in pseudo code):

input_q = ...

schema_input = input_q.schema
schema_out = schema_input.with_columns(hstack_operation)

q = input_q.with_columns(cse_expressions) # adds cse expressions add the end

q = q.with_columns(hstack_operation)  # does with columns operation

q.simple_project(schema_out) # selects the proper columns and thuse prunes the cse columns

@wence- wence- force-pushed the wence/fea/remove-projectionexprs branch from 10f2867 to 50a1456 Compare June 3, 2024 14:43
Copy link
Member

@ritchie46 ritchie46 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the streaming engine, the operators HStackOperator and ProjectionOperator should also be updated and stop expanding the columns.

The default engine's projection should also stop expanding. As we can have cse's with different lengths in a single select.

crates/polars-lazy/src/physical_plan/executors/stack.rs Outdated Show resolved Hide resolved
Copy link

codspeed-hq bot commented Jun 3, 2024

CodSpeed Performance Report

Merging #16682 will not alter performance

Comparing wence-:wence/fea/remove-projectionexprs (35d23cd) with main (9912af0)

Summary

✅ 37 untouched benchmarks

Copy link

codecov bot commented Jun 3, 2024

Codecov Report

Attention: Patch coverage is 90.90909% with 13 lines in your changes missing coverage. Please review.

Project coverage is 81.46%. Comparing base (534b655) to head (35d23cd).
Report is 35 commits behind head on main.

Files Patch % Lines
...s/polars-lazy/src/physical_plan/executors/stack.rs 62.50% 6 Missing ⚠️
py-polars/src/lazyframe/visitor/nodes.rs 0.00% 2 Missing ⚠️
...ars-lazy/src/physical_plan/executors/projection.rs 75.00% 1 Missing ⚠️
...zy/src/physical_plan/executors/projection_utils.rs 87.50% 1 Missing ⚠️
crates/polars-lazy/src/physical_plan/planner/lp.rs 80.00% 1 Missing ⚠️
.../polars-pipe/src/executors/operators/projection.rs 90.00% 1 Missing ⚠️
py-polars/src/lazyframe/visit.rs 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #16682      +/-   ##
==========================================
- Coverage   81.50%   81.46%   -0.05%     
==========================================
  Files        1415     1415              
  Lines      186600   186767     +167     
  Branches     3014     3023       +9     
==========================================
+ Hits       152097   152158      +61     
- Misses      33973    34078     +105     
- Partials      530      531       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@wence-
Copy link
Collaborator Author

wence- commented Jun 3, 2024

For the streaming engine, the operators HStackOperator and ProjectionOperator should also be updated and stop expanding the columns.

The default engine's projection should also stop expanding. As we can have cse's with different lengths in a single select.

Thanks, I think I handled them all.

@wence-
Copy link
Collaborator Author

wence- commented Jun 3, 2024

I'm not happy about leaving should_broadcast as a public field in ProjectionOptions, but I am a little unsure of the appropriate idiom to hide it (even though it is used across crates).



@pytest.mark.skip(reason="activate once fixed")
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It appears this one works now?

@@ -48,12 +48,7 @@ def test_cse_expr_selection_streaming(monkeypatch: Any, capfd: Any) -> None:
)
assert_frame_equal(result, expected)

err = capfd.readouterr().err
assert "df -> projection[cse] -> ordered_sink" in err
assert "df -> hstack[cse] -> ordered_sink" in err
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're no longer checking that cse is performed, because it's not reported in the executor any more.

py-polars/tests/unit/test_cse.py Outdated Show resolved Hide resolved
py-polars/tests/unit/test_cse.py Outdated Show resolved Hide resolved
@wence- wence- force-pushed the wence/fea/remove-projectionexprs branch 2 times, most recently from 147d2c6 to 3ae7f2a Compare June 4, 2024 08:48
@wence- wence- marked this pull request as ready for review June 4, 2024 08:49
@wence- wence- force-pushed the wence/fea/remove-projectionexprs branch from 3ae7f2a to 35d23cd Compare June 4, 2024 08:49
@ritchie46
Copy link
Member

Looks great @wence-. Thanks a lot.

@ritchie46 ritchie46 merged commit 0161efd into pola-rs:main Jun 4, 2024
27 checks passed
@wence- wence- deleted the wence/fea/remove-projectionexprs branch June 4, 2024 11:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or an improvement of an existing feature rust Related to Rust Polars
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants