to_batches can return wrong batch_size with filter #3220
Yes, this is by design at the moment. Recombining the batches can be expensive and so it should be an opt-in feature. It should probably have a few choices:
Datafusion already has a
However the
Maybe this is a doc problem :)
Oh. I read your original comment backwards. I thought your concern was that the batch size was too small :) I agree this seems wrong. I will investigate.
So this is arising from our use of CoalesceBatchesExec (https://docs.rs/datafusion/latest/datafusion/physical_plan/coalesce_batches/struct.CoalesceBatchesExec.html). We use it when we are doing late materialization (which only happens when filtering) to ensure that take is not run too often. We do set the … Longer term we should probably just have an …
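To see how coalescing can produce a batch larger than requested, here is a simplified, hypothetical model in plain Python. It assumes (this is not confirmed in the thread) that the coalescer buffers the small batches coming out of the filter and, once the buffered row count reaches the target, emits the whole buffer as one batch; it also assumes the target equals the requested batch_size. The function name `coalesce_sizes` and the input sizes are made up for illustration and do not come from Lance or DataFusion.

```python
from typing import Iterable, List


def coalesce_sizes(input_sizes: Iterable[int], target: int) -> List[int]:
    """Hypothetical model of a batch coalescer, working on row counts only.

    Incoming batch sizes are buffered; once the buffered row count reaches
    `target`, the whole buffer is emitted as one batch. Because the buffer is
    flushed in full instead of being cut at `target`, an output batch can be
    larger than `target`.
    """
    output: List[int] = []
    buffered = 0
    for rows in input_sizes:
        buffered += rows
        if buffered >= target:
            output.append(buffered)  # may overshoot the target
            buffered = 0
    if buffered:
        output.append(buffered)  # trailing partial batch
    return output


# Filtered batches of 3, 4 and 4 rows with a target of 4:
print(coalesce_sizes([3, 4, 4], 4))  # [7, 4]
```

Under this model the first output batch has 7 rows and the next has exactly 4, which has the same shape as the behaviour reported in this issue. A coalescer that cut its buffer at exactly `target` rows would instead produce batches of 4, 4 and 3 rows from the same input.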
I've proposed some updated wording in #3246
Expected a batch with 4 rows, but got one with 7 rows; the following batch has the correct size.
I'm building lance from source with 0.21.beta1.