Row hash aggregation loads the whole aggregation state into memory before sending a single batch downstream; the resulting record batch has more rows than the configured batch size
Describe the bug
Problematic part of the code: https://github.com/milenkovicm/arrow-datafusion/blob/17f069df4227b837cf2741a545c39a8b68d5fd76/datafusion/core/src/physical_plan/aggregates/row_hash.rs#L438
There, an iterator without any limit is created and the whole state is cloned, which doubles the memory needed for the aggregation state.
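To make the memory cost concrete, here is a minimal Rust sketch. It assumes a plain `Vec` of a hypothetical `GroupState` type stands in for the row-format aggregation state (the real operator in `row_hash.rs` uses different types). It contrasts cloning the whole state before iterating with moving the state out via `std::mem::take`, which hands over the existing buffer instead of allocating a second copy.

```rust
use std::mem;

/// Hypothetical stand-in for one group's accumulator state
/// (not a DataFusion type).
#[derive(Debug, Clone, PartialEq)]
pub struct GroupState {
    pub key: u64,
    pub sum: i64,
}

/// Pattern described in the issue: clone the entire state, then iterate over
/// the copy with no limit. The caller still owns `state`, so two full copies
/// of every group coexist until the original is dropped.
pub fn emit_by_cloning(state: &[GroupState]) -> Vec<GroupState> {
    state.to_vec().into_iter().collect()
}

/// Alternative: move the state out instead of cloning it. `mem::take` leaves
/// an empty Vec behind and transfers the existing buffer, so no second copy
/// of the groups is allocated.
pub fn emit_by_moving(state: &mut Vec<GroupState>) -> Vec<GroupState> {
    mem::take(state)
}
```

After `emit_by_moving`, the operator's state is empty, which only makes sense once all input has been aggregated. This sketch addresses the doubled memory, not the batch size.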
The function `poll_next` creates a single batch: https://github.com/milenkovicm/arrow-datafusion/blob/17f069df4227b837cf2741a545c39a8b68d5fd76/datafusion/core/src/physical_plan/aggregates/row_hash.rs#L146
To Reproduce
Run an aggregation.
Expected behavior
The aggregation output should be split into record batches of at most the configured batch size.
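Below is a minimal sketch of the expected behavior. It assumes the same hypothetical `GroupState` stand-in and a hypothetical `ChunkedEmitter` type (neither is DataFusion API). The emitter keeps a cursor into the state and yields at most `batch_size` groups per call, copying only the current chunk rather than the whole state. In the real operator, each chunk would be converted into a `RecordBatch` inside `poll_next`; that step is omitted here.

```rust
/// Hypothetical stand-in for one group's accumulator state
/// (not a DataFusion type).
#[derive(Debug, Clone, PartialEq)]
pub struct GroupState {
    pub key: u64,
    pub sum: i64,
}

/// Hypothetical emitter: owns the aggregation state plus a cursor, and hands
/// out at most `batch_size` groups per call instead of one batch for all.
pub struct ChunkedEmitter {
    state: Vec<GroupState>,
    offset: usize,
    batch_size: usize,
}

impl ChunkedEmitter {
    pub fn new(state: Vec<GroupState>, batch_size: usize) -> Self {
        assert!(batch_size > 0, "batch_size must be positive");
        Self { state, offset: 0, batch_size }
    }
}

impl Iterator for ChunkedEmitter {
    type Item = Vec<GroupState>;

    fn next(&mut self) -> Option<Self::Item> {
        if self.offset >= self.state.len() {
            return None; // every group has already been emitted
        }
        // Copy only the rows of the current chunk; the full state is never
        // cloned as a whole.
        let end = (self.offset + self.batch_size).min(self.state.len());
        let chunk = self.state[self.offset..end].to_vec();
        self.offset = end;
        Some(chunk)
    }
}
```

For example, 10 groups with `batch_size = 4` yield chunks of 4, 4 and 2 groups.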
Additional context