Feature Request: improve parallel processing #66
This looks related to issue #32.
Needs to be addressed in 2.0. Related reading:
https://felipeelias.github.io/ruby/2017/01/02/fast-file-processing-ruby.html
https://dalibornasevic.com/posts/68-processing-large-csv-files-with-ruby
I recently stumbled upon this gem when processing a ~400MB CSV file. Your gem helped me a lot in speeding up the process, thank you @tilo! However, it left me a bit helpless when it came to parallel processing. When studying the linked examples like https://xjlin0.github.io/tech/2015/05/25/faster-parsing-csv-with-parallel-processing/, I noticed that they assume the file is small enough to be loaded completely into memory. That is neither feasible nor practical in my case.

For actual parallel processing of arbitrarily large files, I suggest some kind of Enumerable implementation on a per-entry or per-chunk basis, along the lines of the sketch below. This would, for example, allow usage with the lambda syntax of the parallel gem, or manual distribution over a worker infrastructure.

EDIT: If you have anything planned or sketched out already, I am happy to help.
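A minimal sketch of what such an interface might look like, assuming SmarterCSV's block form yields each chunk as soon as it is filled (`each_csv_chunk` is a hypothetical helper, not part of the gem):

```ruby
require 'smarter_csv'

# Hypothetical helper: wraps SmarterCSV's block form in a lazy Enumerator,
# so callers can pull chunks on demand instead of loading the whole file.
# Enumerator.new runs its block in a fiber, so SmarterCSV.process is
# suspended after each `yielder << chunk` until the next chunk is requested.
def each_csv_chunk(filename, chunk_size: 1_000)
  Enumerator.new do |yielder|
    SmarterCSV.process(filename, chunk_size: chunk_size) do |chunk|
      yielder << chunk
    end
  end
end

# Usage: only one chunk (an array of row hashes) is in memory at a time.
each_csv_chunk('large.csv').each do |chunk|
  # process the chunk here
end
```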
Earmarked for 2.0.
@xjlin0 commented on May 25
smarter_csv is a great gem! It saves me tons of time through parallel processing.

One possible improvement I am hoping for is to let smarter_csv send out chunks before it has finished reading the entire file. smarter_csv uses readline to read CSV files, smartly avoiding loading the entire file into memory. However, it seems it cannot send chunks out before the whole file has been read.
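For reference, one way to approximate this streaming behavior, under the assumption that SmarterCSV's block form yields chunks while parsing is still in progress: a reader thread feeds a bounded queue, and the parallel gem's lambda syntax pulls chunks off it as workers become free. `import_row` is a hypothetical placeholder for the real per-row work:

```ruby
require 'smarter_csv'
require 'parallel'

queue = SizedQueue.new(10)  # bounded, so the reader can't outrun the workers

# Reader thread: pushes each chunk onto the queue as it is parsed.
# Assumes SmarterCSV.process yields chunks while reading, not only at EOF.
reader = Thread.new do
  SmarterCSV.process('large.csv', chunk_size: 1_000) do |chunk|
    queue << chunk
  end
  queue << Parallel::Stop  # tells Parallel there are no more items
end

# Workers: the lambda is called in the parent process to fetch the next
# chunk; Parallel distributes the chunks across 4 worker processes.
Parallel.each(-> { queue.pop }, in_processes: 4) do |chunk|
  chunk.each { |row| import_row(row) }  # `import_row` is hypothetical
end

reader.join
```

The `SizedQueue` keeps memory bounded: if the workers fall behind, the reader thread blocks instead of buffering the whole file.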