Feature Request: improve parallel processing #66

Closed
tilo opened this issue Jul 28, 2015 · 4 comments

Comments

@tilo
Owner

tilo commented Jul 28, 2015

@xjlin0 commented on May 25

Smarter_csv is a great gem! It saved me tons of time with parallel processing.

One possible improvement I am hoping for is to let smarter_csv send out chunks before it has finished reading the entire file. Smarter_csv uses readline to read CSV files, smartly avoiding loading the entire file into memory. However, it seems that it cannot send chunks out before it has finished reading the entire file.

@tilo
Owner Author

tilo commented Jan 24, 2018

this looks related to issue #32

@tilo tilo added the v2.0 label Jan 24, 2018
@tilo
Owner Author

tilo commented Aug 10, 2018

@tbolender

tbolender commented Apr 30, 2021

I recently stumbled upon this gem when processing a CSV file of roughly 400 MB. Your gem helped me a lot in speeding up the process; thank you very much for this, @tilo!

However, it left me a bit helpless when it came to parallel processing. When studying the linked examples, such as https://xjlin0.github.io/tech/2015/05/25/faster-parsing-csv-with-parallel-processing/, I noticed that they assume the file is small enough to load completely into memory. That is neither feasible nor practical in my case.

For actual parallel processing of arbitrarily large files, I suggest some kind of Enumerable implementation on a per-entry or per-chunk basis. This would allow, e.g., usage with the lambda syntax of parallel or manual distribution over a worker infrastructure.

EDIT: If you have anything planned or sketched out already, I am happy to help.
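
To illustrate the idea of a chunk-based Enumerable, here is a minimal sketch. It is not smarter_csv's API: `csv_chunks` is a hypothetical helper built only on Ruby's standard `csv` library, and the hash-per-row output with symbol keys is an assumption chosen to resemble what smarter_csv returns. The point is only that chunks become available to the consumer while the file is still being read.

```ruby
require "csv"

# Hypothetical helper, NOT part of smarter_csv: returns an Enumerator that
# yields arrays of row hashes. The file is read row by row, so a chunk is
# handed to the consumer as soon as chunk_size rows have been parsed.
def csv_chunks(path, chunk_size: 100)
  Enumerator.new do |yielder|
    CSV.foreach(path, headers: true, header_converters: :symbol)
       .each_slice(chunk_size) do |rows|
      # Convert each CSV::Row into a plain Hash with symbol keys.
      yielder << rows.map(&:to_h)
    end
  end
end

# Example usage (the file name and handle_chunk are placeholders):
#   csv_chunks("large_file.csv", chunk_size: 1_000).each do |chunk|
#     handle_chunk(chunk) # e.g. enqueue the chunk for a background worker
#   end
```

Because the result is an ordinary Enumerator, it can be consumed with `each`, pulled one chunk at a time with `next`, or wrapped in a producer lambda for a parallel-processing library, without the whole file ever being held in memory.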

@tilo
Owner Author

tilo commented Mar 20, 2023

earmarked for 2.0

@tilo tilo closed this as completed Mar 20, 2023