
's5cmd cat' sub command is not using concurrent connections #245

Closed
kaazoo opened this issue Nov 2, 2020 · 7 comments · Fixed by #593

kaazoo commented Nov 2, 2020

's5cmd cat' sub command is not using concurrent connections, like 's5cmd cp' does.

My use-case is downloading a 427 GB tarball from S3 and extracting it on the fly:
time s5cmd cat s3://bucket/file.tar.zst - | pzstd -d | tar -xv -C /

Example EC2 instance type: c5d.9xlarge with 36 CPU cores, 72 GB RAM, 900 GB local SSD

Comparing just the download part with the AWS CLI:

# time aws s3 cp s3://bucket/file.tar.zst - | cat >/dev/null
real    37m56.415s
user    22m50.195s
sys     19m8.677s
(around 192 MB/s)

With 's5cmd cat':

# time s5cmd cat s3://bucket/file.tar.zst >/dev/null
Still running. Only around 85 MB/s on a single S3 connection, according to netstat.

With 's5cmd cp' and writing to disk (without decompression):

time s5cmd cp s3://bucket/file.tar.zst /file.tar.zst
real    23m58.230s
user    7m56.734s
sys     22m40.482s
(around 304 MB/s)

With higher concurrency and larger parts:

# time s5cmd cp -c 36 -p 600 s3://bucket/file.tar.zst /file.tar.zst
real    10m3.064s
user    6m53.378s
sys     41m30.392s
(around 729 MB/s)

igungor commented Dec 16, 2020

The cat command uses stdout as the output "file". stdout is not a seekable writer: we can use multiple connections for the download, but we can't write from multiple threads, because the bytes must reach stdout in order.

I'm surprised that awscli can achieve better throughput than s5cmd on a similar execution.

@thecooltechguy

We are seeing this exact same behavior: aws s3 cp ... - provides better throughput than s5cmd cat. Does s5cmd support concurrent copying to stdout?


fiendish commented Jul 20, 2021

I'm surprised that awscli can achieve better throughput than s5cmd on a similar execution.

AWSCLI achieves better-than-single-stream (but worse-than-fully-parallel) throughput to stdout, at the cost of slightly higher initial latency and significant RAM usage: it fills a decently sized ring buffer in parallel and only cycles in a new chunk when the earliest chunk completes. My understanding from the last time I looked was that s5cmd's cat didn't do this. Anecdotally, it's definitely possible to beat their Python implementation's throughput at the same RAM cost, but the RAM cost is not exactly small.
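The ring-buffer scheme described above can be sketched in Go (s5cmd's implementation language). This is a toy model, not awscli or s5cmd code: fetchChunk fabricates bytes where a real implementation would issue a ranged GET, and all names and parameters are illustrative. At most `ring` chunks are resident at once, and output is emitted strictly in order, as a sequential stdout writer requires:

```go
package main

import (
	"fmt"
	"sync"
)

// fetchChunk stands in for a ranged S3 GET; here it just fabricates bytes.
func fetchChunk(i int) []byte {
	return []byte(fmt.Sprintf("chunk-%02d;", i))
}

type slot struct {
	data  []byte
	ready bool
}

// ringDownload fetches nChunks chunks with `workers` goroutines while keeping
// at most `ring` chunks resident, and returns them concatenated in order.
func ringDownload(nChunks, workers, ring int) []byte {
	slots := make([]slot, ring)
	var mu sync.Mutex
	cond := sync.NewCond(&mu)
	next, drained := 0, 0 // next chunk to claim / chunks already emitted

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for {
				mu.Lock()
				// Claim a chunk only when its ring slot is guaranteed free,
				// i.e. fewer than `ring` chunks are outstanding.
				for next < nChunks && next >= drained+ring {
					cond.Wait()
				}
				if next >= nChunks {
					mu.Unlock()
					return
				}
				i := next
				next++
				mu.Unlock()
				data := fetchChunk(i) // network I/O happens outside the lock
				mu.Lock()
				slots[i%ring] = slot{data: data, ready: true}
				cond.Broadcast()
				mu.Unlock()
			}
		}()
	}

	// Drain strictly in order; freeing slot i%ring lets chunk i+ring start.
	var out []byte
	mu.Lock()
	for i := 0; i < nChunks; i++ {
		for !slots[i%ring].ready {
			cond.Wait()
		}
		out = append(out, slots[i%ring].data...)
		slots[i%ring] = slot{}
		drained++
		cond.Broadcast()
	}
	mu.Unlock()
	wg.Wait()
	return out
}

func main() {
	fmt.Println(string(ringDownload(8, 4, 3)))
}
```

A slow early chunk stalls the drain (and eventually the workers), which is the "higher initial latency, bounded RAM" trade-off described above.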


igungor commented Jul 29, 2021

Thanks for the pointers @fiendish. I had the same idea, but haven't had the time to read the source code.

We can use the same approach. If anyone wants to contribute, we'd be very happy to review.


fiendish commented Jul 29, 2021

I've implemented something like this in Python before as an experiment, but unfortunately I don't know Go, so I can't easily help here. For anyone thinking of doing this without wanting to contemplate the method too hard: my dead-simple approach in Python was a slightly modified concurrent.futures.Executor.map that allows at most N results resident in RAM at a time (the standard executor limits in-flight threads but doesn't bound result storage). Then it was just a matter of picking the desired N and the read size per thread; the threads themselves issued bog-standard range-read requests.


VeaaC commented Feb 21, 2022

I download a lot of compressed files and pipe them directly into the decoder. The lack of parallel downloads when writing to stdout hurts speed very noticeably, to the point where it is faster to download the uncompressed version of the data, which is three times larger.


VeaaC commented Mar 1, 2022

I did a quick prototype in Rust ( https://github.com/VeaaC/s3get ) that uses X threads, keeps finished blocks in a sorted binary tree to be written out by a separate thread, and limits the amount of pending data to 2*X blocks. This works very well; I can mostly saturate a 2.5 GBit connection.

I am not very experienced in Go, so I cannot port the approach myself, but I imagine it should not be much longer or more difficult.
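That design maps to Go fairly directly. Go's standard library has no BTreeMap, so this sketch (illustrative names and parameters, not a port of s3get) parks finished blocks in an index-keyed map and lets a single writer flush the next expected index, with workers backing off once roughly 2*workers blocks are buffered:

```go
package main

import (
	"fmt"
	"sync"
)

// download fetches nBlocks blocks with `workers` goroutines. Finished blocks
// land in an index-keyed map (standing in for Rust's BTreeMap); the calling
// goroutine writes them out strictly in order.
func download(nBlocks, workers int, fetch func(int) []byte, write func([]byte)) {
	pending := make(map[int][]byte)
	var mu sync.Mutex
	cond := sync.NewCond(&mu)
	next := 0   // next block index to hand to a worker
	expect := 0 // next block index the writer must emit

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for {
				mu.Lock()
				// Stop claiming new blocks while ~2*workers are buffered.
				for len(pending) >= 2*workers {
					cond.Wait()
				}
				if next >= nBlocks {
					mu.Unlock()
					return
				}
				i := next
				next++
				mu.Unlock()
				data := fetch(i) // network I/O outside the lock
				mu.Lock()
				pending[i] = data
				cond.Broadcast()
				mu.Unlock()
			}
		}()
	}

	// Writer: flush every block that has become the next expected one.
	mu.Lock()
	for expect < nBlocks {
		data, ok := pending[expect]
		if !ok {
			cond.Wait()
			continue
		}
		delete(pending, expect)
		expect++
		cond.Broadcast()
		mu.Unlock()
		write(data) // write outside the lock
		mu.Lock()
	}
	mu.Unlock()
	wg.Wait()
}

func main() {
	var out []byte
	download(10, 4, func(i int) []byte {
		return []byte{byte('a' + i)}
	}, func(b []byte) { out = append(out, b...) })
	fmt.Println(string(out))
}
```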

@igungor igungor added this to the v2.2.0 milestone Jun 19, 2023
@ilkinulas ilkinulas added this to s5cmd Jul 3, 2023
@denizsurmeli denizsurmeli moved this to Todo in s5cmd Jul 10, 2023
@denizsurmeli denizsurmeli moved this from Todo to In Progress in s5cmd Jul 17, 2023
@denizsurmeli denizsurmeli moved this from In Progress to Review in s5cmd Jul 21, 2023
igungor added a commit that referenced this issue Jul 27, 2023
This PR adds a new io.WriterAt adapter for non-seekable writers. It uses an internal linked list to order the incoming chunks. The implementation is independent of aws-sdk-go's download manager, and because of that it currently cannot bound its memory usage. Limiting memory usage would have required writing a custom manager to replace aws-sdk-go's implementation, which seemed infeasible.

The new implementation is about 25% faster than the old one for a 9.4 GB file with partSize=50MB and concurrency=20, at the cost of significantly higher memory usage: 0.9 GB on average, with a peak of 2.1 GB observed. Naturally, both memory usage and performance depend on the partSize/concurrency configuration and the link.

Resolves #245

Co-authored-by: İbrahim Güngör <[email protected]>
@github-project-automation github-project-automation bot moved this from Review to Done in s5cmd Jul 27, 2023
ahmethakanbesel pushed a commit to ahmethakanbesel/s5cmd that referenced this issue Jul 28, 2023