Optimize DuckDBClient’s handling of small Parquet files #1614

Closed
mbostock opened this issue Aug 27, 2024 · 0 comments · Fixed by #1617
Labels
enhancement New feature or request

Comments

@mbostock (Member)

We currently use CREATE VIEW for Parquet files:

`CREATE VIEW '${name}' AS SELECT * FROM parquet_scan('${file.name}')`

Because views are not physically materialized, this can cause many range requests as DuckDB reads the file. Sometimes the overhead of these range requests outweighs the benefits of not loading the entire file.

Using the new file.size property (#1608), we could employ a heuristic that switches from CREATE VIEW to CREATE TABLE for “small” Parquet files. We might also want an option to explicitly request a materialized table instead of a view.
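
A minimal sketch of what such a heuristic could look like, not the actual DuckDBClient implementation. The function name `parquetStatement`, the `materialize` option, and the 10 MiB threshold are all hypothetical; the issue does not say what counts as “small”. It assumes `file` exposes `name` and a numeric `size` in bytes.

```javascript
// Hypothetical cutoff for "small" files; the issue leaves this unspecified.
const SMALL_PARQUET_BYTES = 10 * 1024 * 1024;

// Builds the SQL that registers a Parquet file under the given name.
// `materialize` is a hypothetical explicit hint: true forces a table,
// false forces a view, and undefined falls back to the size heuristic.
function parquetStatement(name, file, {materialize, threshold = SMALL_PARQUET_BYTES} = {}) {
  // If the size is unknown, keep the current behavior (a view).
  const isSmall = typeof file.size === "number" && file.size < threshold;
  const kind = (materialize ?? isSmall) ? "TABLE" : "VIEW";
  return `CREATE ${kind} '${name}' AS SELECT * FROM parquet_scan('${file.name}')`;
}
```

With these assumptions, a 1 KiB file would be loaded with CREATE TABLE, a 1 GiB file would remain a view, and passing `{materialize: true}` would force a table regardless of size.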

mbostock added the enhancement label on Aug 27, 2024