Combine multiple selections into the same batch size in skip_records
#2358
Labels
enhancement
Any new improvement worthy of an entry in the changelog
parquet
Changes to the parquet crate
The skip records API added to the ArrayReader trait as part of #1998 does not provide a way to combine multiple selections into the same batch. This is unfortunate, as columnar query engines will often want consistently large RecordBatches so that any dispatch overheads can be amortised over many rows. Whilst a caller could concatenate batches together after the read, e.g. with DataFusion's CoalesceBatchesExec, it would be more efficient to do this directly on read and eliminate an additional copy.
Originally posted by @tustvold in #2197 (comment)
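To make the request concrete, here is a minimal sketch of the desired behaviour using plain slices rather than the parquet crate's actual types. `Selector` and `read_batches` are hypothetical names invented for illustration, not existing APIs, and the column is assumed to be an in-memory `&[i32]`. The point is only that rows from several selected runs are written into one output buffer until `batch_size` is reached, so no second concatenation pass is needed.

```rust
/// Hypothetical stand-in for one run of a row selection:
/// either skip `row_count` rows or read them.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Selector {
    row_count: usize,
    skip: bool,
}

/// Reads the selected rows of `column`, packing rows from multiple
/// selections into batches of exactly `batch_size` rows. Only the final
/// batch may be shorter. Panics if the selection runs past the column.
fn read_batches(column: &[i32], selection: &[Selector], batch_size: usize) -> Vec<Vec<i32>> {
    assert!(batch_size > 0, "batch_size must be non-zero");

    let mut batches = Vec::new();
    let mut current = Vec::with_capacity(batch_size);
    let mut offset = 0; // next unread row in `column`

    for sel in selection {
        if sel.skip {
            // Skipped rows advance the cursor but do not end the batch.
            offset += sel.row_count;
            continue;
        }

        let mut remaining = sel.row_count;
        while remaining > 0 {
            // Take as much of this run as still fits in the current batch.
            let take = remaining.min(batch_size - current.len());
            current.extend_from_slice(&column[offset..offset + take]);
            offset += take;
            remaining -= take;

            // Emit the batch only once it is full.
            if current.len() == batch_size {
                batches.push(std::mem::replace(
                    &mut current,
                    Vec::with_capacity(batch_size),
                ));
            }
        }
    }

    // Flush whatever is left as a final, possibly short, batch.
    if !current.is_empty() {
        batches.push(current);
    }
    batches
}
```

With a ten-row column, a selection of "read 2, skip 3, read 3, skip 1, read 1" and a batch size of 4, this yields one full batch drawn from two separate selections followed by a short trailing batch, instead of three small batches that would later need coalescing.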