's5cmd cat' sub command is not using concurrent connections #245
Comments
I'm surprised that awscli can achieve better throughput than s5cmd on a similar execution.
We are also facing this exact same behavior.
AWSCLI achieves better-than-single-stream (but worse-than-fully-parallel) throughput to stdout, with slightly higher initial latency and significant RAM usage, by filling a decently sized ring buffer in parallel and only cycling in new chunks when the earliest chunk completes. My understanding from the last time I looked was that s5cmd's cat didn't do this. Anecdotally, it's definitely possible to get better throughput than their Python implementation for the same RAM cost, but the RAM cost is not exactly small.
Thanks for the pointers @fiendish. I thought about the same but haven't had the time to read the source code. We can use the same approach. If anyone wants to contribute, we'd be very happy to review.
I've implemented something for this in Python before as an experiment, but unfortunately I don't know golang, so I can't easily help here. If anyone is thinking of doing this without wanting to contemplate the method too hard: my dead-simple approach in Python was a slightly modified concurrent.futures.Executor.map that only allowed at most N results resident in RAM at a time (instead of the standard executor, which limits in-flight threads but doesn't bound result storage). Then it was just a matter of choosing a desired N and a desired read size per thread, and the threads were bog-standard range read requests.
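For anyone picking this up, here is a minimal, untested Go sketch of that scheme, assuming aws-sdk-go v1; the function name catObject, the 50 MB chunk size, and the bound of 20 are all placeholders. A buffered channel of per-chunk result channels plays the role of the bounded map (and, equivalently, of the awscli ring buffer described above): producers block once maxPending chunks are outstanding, and a single consumer writes chunks to stdout strictly in order.

```go
package main

import (
	"fmt"
	"io"
	"os"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
)

const (
	chunkSize  = 50 * 1024 * 1024 // bytes per range request (placeholder value)
	maxPending = 20               // ~how many chunks may be in flight / resident in RAM
)

type result struct {
	data []byte
	err  error
}

// catObject streams bucket/key to w using parallel range GETs while never
// holding more than roughly maxPending chunks in memory: the buffered
// 'pending' channel is the bound, like a map() whose result storage is capped.
func catObject(svc *s3.S3, bucket, key string, size int64, w io.Writer) error {
	pending := make(chan chan result, maxPending)

	go func() {
		defer close(pending)
		for off := int64(0); off < size; off += chunkSize {
			end := off + chunkSize - 1
			if end >= size {
				end = size - 1
			}
			ch := make(chan result, 1)
			pending <- ch // blocks while maxPending chunks are outstanding
			go func(off, end int64, ch chan result) {
				out, err := svc.GetObject(&s3.GetObjectInput{
					Bucket: aws.String(bucket),
					Key:    aws.String(key),
					Range:  aws.String(fmt.Sprintf("bytes=%d-%d", off, end)),
				})
				if err != nil {
					ch <- result{err: err}
					return
				}
				defer out.Body.Close()
				data, err := io.ReadAll(out.Body)
				ch <- result{data: data, err: err}
			}(off, end, ch)
		}
	}()

	// Drain strictly in order so the output is sequential.
	for ch := range pending {
		r := <-ch
		if r.err != nil {
			return r.err
		}
		if _, err := w.Write(r.data); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	bucket, key := os.Args[1], os.Args[2]
	svc := s3.New(session.Must(session.NewSession()))
	head, err := svc.HeadObject(&s3.HeadObjectInput{
		Bucket: aws.String(bucket),
		Key:    aws.String(key),
	})
	if err != nil {
		panic(err)
	}
	if err := catObject(svc, bucket, key, *head.ContentLength, os.Stdout); err != nil {
		panic(err)
	}
}
```

Peak RAM is then roughly maxPending * chunkSize, so the two knobs trade memory for throughput exactly as described above.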
I download a load of compressed files and pipe them directly into the decoder. The lack of parallel downloads when outputting to stdout hurts speed very noticeably, up to the point where it is faster to download the 3-times-larger uncompressed version of the data.
I did a quick prototype in Rust ( https://github.com/VeaaC/s3get ) that just uses X threads, keeps results in a sorted binary tree to be written out by another thread, and limits the amount of pending data to 2*X blocks. This works very well; I mostly saturate the 2.5 Gbit connection. I am not very experienced in Go, so I cannot port such an approach myself, but I imagine it should not be much longer or more difficult.
This PR adds a new io.WriterAt adapter for non-seekable writers. It uses an internal linked list to order the incoming chunks. The implementation is independent of aws-sdk-go's download manager, and because of that it currently cannot bound its memory usage; doing so would have required writing a custom download manager to replace aws-sdk-go's, which seemed infeasible. The new implementation is about 25% faster than the old one for a 9.4 GB file with partSize=50MB and concurrency=20, at the cost of significantly higher memory usage: on average it uses 0.9 GB of memory, and a peak of 2.1 GB was observed. Obviously, memory usage and performance depend on the partSize/concurrency configuration and the link. Resolves #245. Co-authored-by: İbrahim Güngör <[email protected]>
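The shape of such an adapter, as a minimal sketch rather than the PR's actual code (the type name sequentialWriterAt is made up here): out-of-order chunks are parked in an offset-sorted linked list and flushed the moment they become contiguous with the bytes already written.

```go
package s3cat

import (
	"container/list"
	"io"
	"sync"
)

// chunk is one WriteAt call captured together with its offset.
type chunk struct {
	off  int64
	data []byte
}

// sequentialWriterAt adapts a non-seekable io.Writer (e.g. os.Stdout) to
// io.WriterAt. As the PR describes, parked data is not bounded: a slow
// leading chunk lets later chunks pile up in memory.
type sequentialWriterAt struct {
	mu      sync.Mutex // WriteAt is called concurrently by the downloader
	w       io.Writer
	next    int64 // next offset that may be written to w
	pending *list.List
}

func newSequentialWriterAt(w io.Writer) *sequentialWriterAt {
	return &sequentialWriterAt{w: w, pending: list.New()}
}

func (s *sequentialWriterAt) WriteAt(p []byte, off int64) (int, error) {
	s.mu.Lock()
	defer s.mu.Unlock()

	// Park a copy of the chunk, keeping the list sorted by offset.
	buf := append([]byte(nil), p...)
	var e *list.Element
	for e = s.pending.Front(); e != nil; e = e.Next() {
		if e.Value.(*chunk).off > off {
			break
		}
	}
	if e != nil {
		s.pending.InsertBefore(&chunk{off, buf}, e)
	} else {
		s.pending.PushBack(&chunk{off, buf})
	}

	// Flush every chunk that is now contiguous (assumes disjoint chunks,
	// which is what a part-based downloader produces).
	for f := s.pending.Front(); f != nil; f = s.pending.Front() {
		c := f.Value.(*chunk)
		if c.off != s.next {
			break
		}
		if _, err := s.w.Write(c.data); err != nil {
			return 0, err
		}
		s.next += int64(len(c.data))
		s.pending.Remove(f)
	}
	return len(p), nil
}
```

An adapter like this can be handed straight to aws-sdk-go's s3manager.Downloader.Download in place of a file, which is presumably how the PR wires it into the existing machinery: the SDK keeps downloading parts concurrently while the non-seekable destination still sees strictly sequential bytes.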
The 's5cmd cat' subcommand is not using concurrent connections the way 's5cmd cp' does.
My use-case is downloading a 427 GB tarball from S3 and extracting it on the fly:
time s5cmd cat s3://bucket/file.tar.zst - | pzstd -d | tar -xv -C /
Example EC2 instance type: c5d.9xlarge with 36 vCPUs, 72 GB RAM, 900 GB local SSD
When comparing just the download part, four variants were measured: the aws cli streaming to stdout, 's5cmd cat', 's5cmd cp' writing to disk (without decompression), and 's5cmd cp' with higher concurrency and larger parts.
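The four invocations would look roughly like the following; the bucket, target paths, and the -c/-p values are placeholders ('s5cmd cp' takes '--concurrency/-c' and '--part-size/-p', the latter in MB):

```sh
# aws cli streaming to stdout
time aws s3 cp s3://bucket/file.tar.zst - >/dev/null

# 's5cmd cat' streaming to stdout
time s5cmd cat s3://bucket/file.tar.zst >/dev/null

# 's5cmd cp' writing to local disk, no decompression
time s5cmd cp s3://bucket/file.tar.zst /mnt/local-ssd/

# 's5cmd cp' with higher concurrency and larger parts
time s5cmd cp -c 32 -p 200 s3://bucket/file.tar.zst /mnt/local-ssd/
```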