
's5cmd cat' sub command is not using concurrent connections #245

Closed
kaazoo opened this issue Nov 2, 2020 · 7 comments · Fixed by #593

kaazoo commented Nov 2, 2020

's5cmd cat' sub command is not using concurrent connections, like 's5cmd cp' does.

My use-case is downloading a 427 GB tarball from S3 and extracting it on the fly:
time s5cmd cat s3://bucket/file.tar.zst - | pzstd -d | tar -xv -C /

Example EC2 instance type: c5d.9xlarge with 36 CPU cores, 72 GB RAM, 900 GB local SSD

Comparing just the download part with the AWS CLI:

# time aws s3 cp s3://bucket/file.tar.zst - | cat >/dev/null
real    37m56.415s
user    22m50.195s
sys     19m8.677s
(around 192 MB/s)

With 's5cmd cat':

# time s5cmd cat s3://bucket/file.tar.zst >/dev/null
Still running. Only around 85 MB/s on a single S3 connection, according to netstat.

With 's5cmd cp' and writing to disk (without decompression):

time s5cmd cp s3://bucket/file.tar.zst /file.tar.zst
real    23m58.230s
user    7m56.734s
sys     22m40.482s
(around 304 MB/s)

With higher concurrency and larger parts:

# time s5cmd cp -c 36 -p 600 s3://bucket/file.tar.zst /file.tar.zst
real    10m3.064s
user    6m53.378s
sys     41m30.392s
(around 729 MB/s)

igungor commented Dec 16, 2020

The cat command uses stdout as the output "file". stdout is not a seekable writer: we can use multiple connections for the download, but we can't write from multiple threads, because the bytes must reach stdout in order.

I'm surprised that awscli can achieve better throughput than s5cmd on a similar execution.

@thecooltechguy

We are seeing this exact same behavior: aws s3 cp ... - provides better throughput than s5cmd cat. Does s5cmd support concurrent copying to stdout?


fiendish commented Jul 20, 2021

I'm surprised that awscli can achieve better throughput than s5cmd on a similar execution.

AWSCLI achieves better-than-single-stream (but worse-than-fully-parallel) throughput to stdout, at the cost of slightly higher initial latency and significant RAM usage: it fills a decently sized ring buffer in parallel and only cycles in a new chunk when the earliest chunk completes. My understanding from the last time I looked was that s5cmd's cat didn't do this. Anecdotally, it's definitely possible to beat their Python implementation's throughput at the same RAM cost, but the RAM cost is not exactly small.
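The ring-buffer scheme described above can be sketched in Go (s5cmd's implementation language). This is a toy model, not awscli or s5cmd code: fetchChunk fabricates bytes where a real implementation would issue a ranged GET, and all names and parameters are illustrative. At most `ring` chunks are resident at once, and output is emitted strictly in order, as a sequential stdout writer requires:

```go
package main

import (
	"fmt"
	"sync"
)

// fetchChunk stands in for a ranged S3 GET; here it just fabricates bytes.
func fetchChunk(i int) []byte {
	return []byte(fmt.Sprintf("chunk-%02d;", i))
}

type slot struct {
	data  []byte
	ready bool
}

// ringDownload fetches nChunks chunks with `workers` goroutines while keeping
// at most `ring` chunks resident, and returns them concatenated in order.
func ringDownload(nChunks, workers, ring int) []byte {
	slots := make([]slot, ring)
	var mu sync.Mutex
	cond := sync.NewCond(&mu)
	next, drained := 0, 0 // next chunk to claim / chunks already emitted

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for {
				mu.Lock()
				// Claim a chunk only when its ring slot is guaranteed free,
				// i.e. fewer than `ring` chunks are outstanding.
				for next < nChunks && next >= drained+ring {
					cond.Wait()
				}
				if next >= nChunks {
					mu.Unlock()
					return
				}
				i := next
				next++
				mu.Unlock()
				data := fetchChunk(i) // network I/O happens outside the lock
				mu.Lock()
				slots[i%ring] = slot{data: data, ready: true}
				cond.Broadcast()
				mu.Unlock()
			}
		}()
	}

	// Drain strictly in order; freeing slot i%ring lets chunk i+ring start.
	var out []byte
	mu.Lock()
	for i := 0; i < nChunks; i++ {
		for !slots[i%ring].ready {
			cond.Wait()
		}
		out = append(out, slots[i%ring].data...)
		slots[i%ring] = slot{}
		drained++
		cond.Broadcast()
	}
	mu.Unlock()
	wg.Wait()
	return out
}

func main() {
	fmt.Println(string(ringDownload(8, 4, 3)))
}
```

A slow early chunk stalls the drain (and eventually the workers), which is the "higher initial latency, bounded RAM" trade-off described above.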


igungor commented Jul 29, 2021

Thanks for the pointers @fiendish. I had the same idea, but haven't had the time to read the source code.

We can use the same approach. If anyone wants to contribute, we'd be very happy to review.


fiendish commented Jul 29, 2021

I've implemented something like this in Python before as an experiment, but unfortunately I don't know Go, so I can't easily help here. For anyone thinking of doing this without wanting to contemplate the method too hard: my dead-simple approach in Python was a slightly modified concurrent.futures.Executor.map that allows at most N results resident in RAM at a time (the standard executor limits in-flight threads but doesn't bound result storage). Then it was just a matter of picking the desired N and the read size per thread; the threads themselves issued bog-standard range-read requests.


VeaaC commented Feb 21, 2022

I download a lot of compressed files and pipe them directly into the decoder. The lack of parallel downloads when writing to stdout hurts speed very noticeably, to the point where it is faster to download the uncompressed version of the data, which is three times larger.


VeaaC commented Mar 1, 2022

I did a quick prototype in Rust ( https://github.com/VeaaC/s3get ) that uses X threads, keeps finished blocks in a sorted binary tree to be written out by a separate thread, and limits the amount of pending data to 2*X blocks. This works very well; I can mostly saturate a 2.5 GBit connection.

I am not very experienced in Go, so I cannot port the approach myself, but I imagine it should not be much longer or more difficult.
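That design maps to Go fairly directly. Go's standard library has no BTreeMap, so this sketch (illustrative names and parameters, not a port of s3get) parks finished blocks in an index-keyed map and lets a single writer flush the next expected index, with workers backing off once roughly 2*workers blocks are buffered:

```go
package main

import (
	"fmt"
	"sync"
)

// download fetches nBlocks blocks with `workers` goroutines. Finished blocks
// land in an index-keyed map (standing in for Rust's BTreeMap); the calling
// goroutine writes them out strictly in order.
func download(nBlocks, workers int, fetch func(int) []byte, write func([]byte)) {
	pending := make(map[int][]byte)
	var mu sync.Mutex
	cond := sync.NewCond(&mu)
	next := 0   // next block index to hand to a worker
	expect := 0 // next block index the writer must emit

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for {
				mu.Lock()
				// Stop claiming new blocks while ~2*workers are buffered.
				for len(pending) >= 2*workers {
					cond.Wait()
				}
				if next >= nBlocks {
					mu.Unlock()
					return
				}
				i := next
				next++
				mu.Unlock()
				data := fetch(i) // network I/O outside the lock
				mu.Lock()
				pending[i] = data
				cond.Broadcast()
				mu.Unlock()
			}
		}()
	}

	// Writer: flush every block that has become the next expected one.
	mu.Lock()
	for expect < nBlocks {
		data, ok := pending[expect]
		if !ok {
			cond.Wait()
			continue
		}
		delete(pending, expect)
		expect++
		cond.Broadcast()
		mu.Unlock()
		write(data) // write outside the lock
		mu.Lock()
	}
	mu.Unlock()
	wg.Wait()
}

func main() {
	var out []byte
	download(10, 4, func(i int) []byte {
		return []byte{byte('a' + i)}
	}, func(b []byte) { out = append(out, b...) })
	fmt.Println(string(out))
}
```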

@igungor igungor added this to the v2.2.0 milestone Jun 19, 2023
@ilkinulas ilkinulas added this to s5cmd Jul 3, 2023
@denizsurmeli denizsurmeli moved this to Todo in s5cmd Jul 10, 2023
@denizsurmeli denizsurmeli moved this from Todo to In Progress in s5cmd Jul 17, 2023
@denizsurmeli denizsurmeli moved this from In Progress to Review in s5cmd Jul 21, 2023
igungor added a commit that referenced this issue Jul 27, 2023
This PR adds a new io.WriterAt adapter for non-seekable writers. It uses an internal linked list to order the incoming chunks. The implementation is independent of aws-sdk-go's download manager, and because of that it currently cannot bound its memory usage. Limiting memory usage would have required writing a custom manager to replace aws-sdk-go's implementation, which seemed infeasible.

The new implementation is about 25% faster than the old one for a 9.4 GB file with partSize=50MB and concurrency=20, at the cost of significantly higher memory usage: 0.9 GB on average, with a peak of 2.1 GB observed. Naturally, both memory usage and performance depend on the partSize/concurrency configuration and the link.

Resolves #245

Co-authored-by: İbrahim Güngör <[email protected]>
@github-project-automation github-project-automation bot moved this from Review to Done in s5cmd Jul 27, 2023
ahmethakanbesel pushed a commit to ahmethakanbesel/s5cmd that referenced this issue Jul 28, 2023