Load Data from Parquet file to PostgreSQL #1

Open
MahmoudHousam opened this issue Mar 20, 2024 · 0 comments
Labels
enhancement New feature or request

Comments

@MahmoudHousam
Owner

MahmoudHousam commented Mar 20, 2024

Parquet files are read in batches, loaded into a Pandas DataFrame, and then written to a PostgreSQL database. This process takes up to 3 minutes to move 3+ million rows from Parquet to PostgreSQL.

Recommended fix: use DuckDB and duckdb_fdw to read the Parquet file directly from the source and push it to the PostgreSQL database (under investigation).

Alternative: Use Airbyte Cloud

Resources:

MahmoudHousam added the enhancement (New feature or request) label Mar 20, 2024