Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Uploads From MC Client Are Killed #3376

Closed
capriciousduck opened this issue Aug 24, 2020 · 29 comments
Closed

Uploads From MC Client Are Killed #3376

capriciousduck opened this issue Aug 24, 2020 · 29 comments

Comments

@capriciousduck
Copy link

Expected behavior

The client should upload the file

Actual behavior

The client stops the upload with a "Killed" output.

Here is the issue I created earlier: here

And here is the debug created when uploading the file:

`mc: GET /movies/?location= HTTP/1.1
Host: minio.cleverapps.io
User-Agent: MinIO (linux; amd64) minio-go/v7.0.5 mc/2020-08-20T00:23:01Z
Authorization: AWS4-HMAC-SHA256 Credential=capriciousduck/20200824/us-east-1/s3/aws4_request, SignedHeaders=host;x-amz-content-sha256;x-amz-date, Signature=REDACTED
X-Amz-Content-Sha256: UNSIGNED-PAYLOAD
X-Amz-Date: 20200824T154923Z
Accept-Encoding: gzip

mc: HTTP/1.1 200 OK
Content-Length: 128
Accept-Ranges: bytes
Content-Security-Policy: block-all-mixed-content
Content-Type: application/xml
Date: Mon, 24 Aug 2020 15:49:23 GMT
Server: MinIO/RELEASE.2020-08-18T19-41-00Z
Sozu-Id: 1820f1e6-026a-427e-960e-110c66c9d1e7
Vary: Origin
X-Amz-Request-Id: 162E3EFA8A9FCE51
X-Xss-Protection: 1; mode=block

mc: Response Time: 141.98146ms

mc: HEAD /movies/ HTTP/1.1
Host: minio.cleverapps.io
User-Agent: MinIO (linux; amd64) minio-go/v7.0.5 mc/2020-08-20T00:23:01Z
Authorization: AWS4-HMAC-SHA256 Credential=capriciousduck/20200824/us-east-1/s3/aws4_request, SignedHeaders=host;x-amz-content-sha256;x-amz-date, Signature=REDACTED
X-Amz-Content-Sha256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
X-Amz-Date: 20200824T154923Z

mc: HTTP/1.1 200 OK
Accept-Ranges: bytes
Content-Security-Policy: block-all-mixed-content
Date: Mon, 24 Aug 2020 15:49:23 GMT
Server: MinIO/RELEASE.2020-08-18T19-41-00Z
Sozu-Id: 1820f1e6-026a-427e-960e-110c66c9d1e7
Vary: Origin
X-Amz-Request-Id: 162E3EFA8AF3A52B
X-Xss-Protection: 1; mode=block
Content-Length: 0

mc: Response Time: 4.144945ms

mc: HEAD /movies/ HTTP/1.1
Host: minio.cleverapps.io
User-Agent: MinIO (linux; amd64) minio-go/v7.0.5 mc/2020-08-20T00:23:01Z
Authorization: AWS4-HMAC-SHA256 Credential=capriciousduck/20200824/us-east-1/s3/aws4_request, SignedHeaders=host;x-amz-content-sha256;x-amz-date, Signature=REDACTED
X-Amz-Content-Sha256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
X-Amz-Date: 20200824T154923Z

mc: HTTP/1.1 200 OK
Accept-Ranges: bytes
Content-Security-Policy: block-all-mixed-content
Date: Mon, 24 Aug 2020 15:49:23 GMT
Server: MinIO/RELEASE.2020-08-18T19-41-00Z
Sozu-Id: b2b9cd51-0156-4c65-acc5-bfe5bc25715d
Vary: Origin
X-Amz-Request-Id: 162E3EFA8B41E64A
X-Xss-Protection: 1; mode=block
Content-Length: 0

mc: Response Time: 3.915626ms

0 B / ? ┃░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░▓┃mc: HEAD /movies/ HTTP/1.1
Host: minio.cleverapps.io
User-Agent: MinIO (linux; amd64) minio-go/v7.0.5 mc/2020-08-20T00:23:01Z
Authorization: AWS4-HMAC-SHA256 Credential=capriciousduck/20200824/us-east-1/s3/aws4_request, SignedHeaders=host;x-amz-content-sha256;x-amz-date, Signature=REDACTED
X-Amz-Content-Sha256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
X-Amz-Date: 20200824T154923Z

mc: HTTP/1.1 200 OK
Accept-Ranges: bytes
Content-Security-Policy: block-all-mixed-content
Date: Mon, 24 Aug 2020 15:49:23 GMT
Server: MinIO/RELEASE.2020-08-18T19-41-00Z
Sozu-Id: f3dfa6ec-df7a-45ef-bccf-681caca2f90a
Vary: Origin
X-Amz-Request-Id: 162E3EFA8BE7440F
X-Xss-Protection: 1; mode=block
Content-Length: 0

mc: Response Time: 4.594106ms

mc: GET /movies/?object-lock= HTTP/1.1
Host: minio.cleverapps.io
User-Agent: MinIO (linux; amd64) minio-go/v7.0.5 mc/2020-08-20T00:23:01Z
Authorization: AWS4-HMAC-SHA256 Credential=capriciousduck/20200824/us-east-1/s3/aws4_request, SignedHeaders=host;x-amz-content-sha256;x-amz-date, Signature=REDACTED
X-Amz-Content-Sha256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
X-Amz-Date: 20200824T154923Z
Accept-Encoding: gzip

mc: HTTP/1.1 404 Not Found
Content-Length: 330
Accept-Ranges: bytes
Content-Security-Policy: block-all-mixed-content
Content-Type: application/xml
Date: Mon, 24 Aug 2020 15:49:23 GMT
Server: MinIO/RELEASE.2020-08-18T19-41-00Z
Sozu-Id: 853f3437-e49f-4775-9e36-7e143ddf2e6c
Vary: Origin
X-Amz-Request-Id: 162E3EFA8C38CE7F
X-Xss-Protection: 1; mode=block

ObjectLockConfigurationNotFoundErrorObject Lock configuration does not exist for this bucketmovies/movies/162E3EFA8C38CE7F1f5c4a5e-43c3-416f-92c5-0fddc0b2c28amc: Response Time: 4.243974ms

mc: POST /movies/The.One.and.Only.Ivan.2020.1080p.DSNP.WEBRip.DDP5.1.x264-CM.mkv?uploads= HTTP/1.1
Host: minio.cleverapps.io
User-Agent: MinIO (linux; amd64) minio-go/v7.0.5 mc/2020-08-20T00:23:01Z
Content-Length: 0
Authorization: AWS4-HMAC-SHA256 Credential=capriciousduck/20200824/us-east-1/s3/aws4_request, SignedHeaders=content-type;host;x-amz-content-sha256;x-amz-date, Signature=REDACTED
Content-Type: video/x-matroska
X-Amz-Content-Sha256: UNSIGNED-PAYLOAD
X-Amz-Date: 20200824T154923Z
Accept-Encoding: gzip

mc: HTTP/1.1 200 OK
Content-Length: 304
Accept-Ranges: bytes
Content-Security-Policy: block-all-mixed-content
Content-Type: application/xml
Date: Mon, 24 Aug 2020 15:49:23 GMT
Server: MinIO/RELEASE.2020-08-18T19-41-00Z
Sozu-Id: 0b9f5d17-7178-4470-907a-f060d45e50fc
Vary: Origin
X-Amz-Request-Id: 162E3EFA8C88E2B1
X-Xss-Protection: 1; mode=block

mc: Response Time: 4.883374ms`

@timhughes
Copy link

Check the exit code. When copying a lot of files, mc can use a lot of memory. If this is the case then the linux out-of-memory killer may be causing this.

[root@aws-cli-5c67c9bfb-vvt4d /]# mc mirror rook-ceph-rgw-my-store-a/orderbookevents-a rook-ceph-rgw-my-store-b/orderbookevents-b
...2020-01-08-00-20-00-000-execution.raw:  98.67 GiB / 98.67 GiB ┃▓▓▓▓▓▓▓┃ 297.81 MiB/s
Killed
[root@aws-cli-5c67c9bfb-vvt4d /]# echo $?
137

I am running mine in kubernetes so I needed to look at the node logs to get the error

Sep 21 16:54:07 lddata-live-kubeworker03 kernel: Out of memory: Kill process 25008 (mc) score 1151 or sacrifice child
Sep 21 16:54:07 lddata-live-kubeworker03 kernel: Killed process 25011 (mc), UID 0, total-vm:9874000kB, anon-rss:8634100kB, file-rss:0kB, shmem-rss:0kB
Sep 21 16:54:07 lddata-live-kubeworker03 kernel: rados_async invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=1000
Sep 21 16:54:07 lddata-live-kubeworker03 kernel: rados_async cpuset=docker-ec3ef33f5359ee048f1bf4ca32a94db4b8008a12ffcc1e99e42665021a31d50d.scope mems_allowed=0
Sep 21 16:54:07 lddata-live-kubeworker03 kernel: CPU: 5 PID: 29051 Comm: rados_async Kdump: loaded Tainted: G               ------------ T 3.10.0-1062.9.1.el7.x86_64 #1
Sep 21 16:54:07 lddata-live-kubeworker03 kernel: Hardware name: Red Hat KVM, BIOS 1.11.1-4.module_el8.2.0+320+13f867d7 04/01/2014
Sep 21 16:54:07 lddata-live-kubeworker03 kernel: Call Trace:
Sep 21 16:54:07 lddata-live-kubeworker03 kernel:  [<ffffffffaf37ac23>] dump_stack+0x19/0x1b
Sep 21 16:54:07 lddata-live-kubeworker03 kernel:  [<ffffffffaf375ce9>] dump_header+0x90/0x229
Sep 21 16:54:07 lddata-live-kubeworker03 kernel:  [<ffffffffaed06142>] ? ktime_get_ts64+0x52/0xf0
Sep 21 16:54:07 lddata-live-kubeworker03 kernel:  [<ffffffffaedc1714>] oom_kill_process+0x254/0x3e0
Sep 21 16:54:07 lddata-live-kubeworker03 kernel:  [<ffffffffaed32e71>] ? cpuset_mems_allowed_intersects+0x21/0x30
Sep 21 16:54:07 lddata-live-kubeworker03 kernel:  [<ffffffffaedc11bd>] ? oom_unkillable_task+0xcd/0x120
Sep 21 16:54:07 lddata-live-kubeworker03 kernel:  [<ffffffffaedc1266>] ? find_lock_task_mm+0x56/0xc0
Sep 21 16:54:07 lddata-live-kubeworker03 kernel:  [<ffffffffaedc1f66>] out_of_memory+0x4b6/0x4f0
Sep 21 16:54:07 lddata-live-kubeworker03 kernel:  [<ffffffffaedc8a6f>] __alloc_pages_nodemask+0xacf/0xbe0
Sep 21 16:54:07 lddata-live-kubeworker03 kernel:  [<ffffffffaee16b28>] alloc_pages_current+0x98/0x110
Sep 21 16:54:07 lddata-live-kubeworker03 kernel:  [<ffffffffaedbd617>] __page_cache_alloc+0x97/0xb0
Sep 21 16:54:07 lddata-live-kubeworker03 kernel:  [<ffffffffaedc01d8>] filemap_fault+0x298/0x490
Sep 21 16:54:07 lddata-live-kubeworker03 kernel:  [<ffffffffc02f155e>] __xfs_filemap_fault+0x7e/0x1d0 [xfs]
Sep 21 16:54:07 lddata-live-kubeworker03 kernel:  [<ffffffffaecd4ffe>] ? finish_task_switch+0x4e/0x1c0
Sep 21 16:54:07 lddata-live-kubeworker03 kernel:  [<ffffffffc02f175c>] xfs_filemap_fault+0x2c/0x30 [xfs]
Sep 21 16:54:07 lddata-live-kubeworker03 kernel:  [<ffffffffaedec15a>] __do_fault.isra.61+0x8a/0x100
Sep 21 16:54:07 lddata-live-kubeworker03 kernel:  [<ffffffffaedec70c>] do_read_fault.isra.63+0x4c/0x1b0
Sep 21 16:54:07 lddata-live-kubeworker03 kernel:  [<ffffffffaedf11ba>] handle_pte_fault+0x22a/0xe20
Sep 21 16:54:07 lddata-live-kubeworker03 kernel:  [<ffffffffaecc9e30>] ? hrtimer_get_res+0x50/0x50
Sep 21 16:54:07 lddata-live-kubeworker03 kernel:  [<ffffffffaedf3ecd>] handle_mm_fault+0x39d/0x9b0
Sep 21 16:54:07 lddata-live-kubeworker03 kernel:  [<ffffffffaf388653>] __do_page_fault+0x213/0x500
Sep 21 16:54:07 lddata-live-kubeworker03 kernel:  [<ffffffffaf388a26>] trace_do_page_fault+0x56/0x150
Sep 21 16:54:07 lddata-live-kubeworker03 kernel:  [<ffffffffaf387fa2>] do_async_page_fault+0x22/0xf0
Sep 21 16:54:07 lddata-live-kubeworker03 kernel:  [<ffffffffaf3847a8>] async_page_fault+0x28/0x30
Sep 21 16:54:07 lddata-live-kubeworker03 kernel: Mem-Info:
Sep 21 16:54:07 lddata-live-kubeworker03 kernel: active_anon:13829373 inactive_anon:12475 isolated_anon:0
                                                           active_file:17 inactive_file:533 isolated_file:12
                                                           unevictable:0 dirty:0 writeback:0 unstable:0
                                                           slab_reclaimable:23949 slab_unreclaimable:38242
                                                           mapped:9831 shmem:14077 pagetables:36281 bounce:0
                                                           free:75345 free_pcp:34 free_cma:0

I get this when copying lots ( > many thousand ) files and my guess is that mc is creating a list in memory of every file it sees including some metadata. This causes it to eventually use all the memory in my node instance and then the ever helpful oom-killer comes along and kills the process.

@gileri
Copy link

gileri commented Sep 27, 2020

Mirroring 153 local files (6.5GB) to S3 does use over 2GB of RAM, getting it OOM-killed on a 2GB VM. I'm not familiar with the internals of mc but that seems too high.

Edit : This behavior is not exhibited during local-to-local copy (no S3 usage). It happens on both AWS S3 and Scaleway Object Storage remotes. On a less memory-limited machine mirroring the same payload reaches 6GB memory usage on 60-63 threads.

@harshavardhana harshavardhana self-assigned this Sep 27, 2020
@gileri
Copy link

gileri commented Sep 28, 2020

Here is a mem profile top :

% go tool pprof -lines ./mc/mc mem.pprof
(pprof) top
Showing nodes accounting for 7GB, 99.66% of 7.02GB total
Dropped 298 nodes (cum <= 0.04GB)
      flat  flat%   sum%        cum   cum%
       7GB 99.66% 99.66%        7GB 99.66%  github.com/minio/minio-go/v7.Client.putObjectMultipartStreamFromReadAt /home/twix/.go/pkg/mod/github.com/minio/minio-go/[email protected]/api-put-object-streaming.go:152
         0     0% 99.66%     7.02GB 99.94%  github.com/minio/mc/cmd.(*ParallelManager).addWorker.func1 /home/twix/tmp/mc_memory/mc/cmd/parallel-manager.go:83
         0     0% 99.66%     7.02GB 99.94%  github.com/minio/mc/cmd.(*S3Client).Put /home/twix/tmp/mc_memory/mc/cmd/client-s3.go:1008
         0     0% 99.66%     7.02GB 99.94%  github.com/minio/mc/cmd.(*mirrorJob).doMirror /home/twix/tmp/mc_memory/mc/cmd/mirror-main.go:363
         0     0% 99.66%     7.02GB 99.94%  github.com/minio/mc/cmd.(*mirrorJob).startMirror.func1 /home/twix/tmp/mc_memory/mc/cmd/mirror-main.go:584
         0     0% 99.66%     7.02GB 99.94%  github.com/minio/mc/cmd.putTargetStream /home/twix/tmp/mc_memory/mc/cmd/common-methods.go:322
         0     0% 99.66%     7.02GB 99.94%  github.com/minio/mc/cmd.uploadSourceToTargetURL /home/twix/tmp/mc_memory/mc/cmd/common-methods.go:551
         0     0% 99.66%     7.02GB 99.94%  github.com/minio/minio-go/v7.Client.PutObject /home/twix/.go/pkg/mod/github.com/minio/minio-go/[email protected]/api-put-object.go:212
         0     0% 99.66%     7.02GB 99.94%  github.com/minio/minio-go/v7.Client.putObjectCommon /home/twix/.go/pkg/mod/github.com/minio/minio-go/[email protected]/api-put-object.go:246
         0     0% 99.66%     7.02GB 99.94%  github.com/minio/minio-go/v7.Client.putObjectMultipartStream /home/twix/.go/pkg/mod/github.com/minio/minio-go/[email protected]/api-put-object-streaming.go:49

https://github.com/minio/minio-go/blob/master/api-put-object-streaming.go#L152 seems incriminated.
This code was introduced by minio/minio-go#1357, which "avoid data race by copying the buffer".

@jmmaloney4
Copy link

I am suffering from a similar issue. In Kubernetes, if I remove resource limits mc will consume every scrap of memory it can get its hands on, and kswapd grinds the machine to a halt, however if I limit the memory to a gig or two, no progress seems to be made, although it doesn't appear to get immediately killed by k8s, however seems to die after a couple hours of no progress.

@harshavardhana
Copy link
Member

Will investigate this is a change that was necessary for content-md5 calculation in minio-go. Will see if there is a cleaner way to fix this.

@harshavardhana
Copy link
Member

https://github.com/minio/minio-go/blob/master/api-put-object-streaming.go#L152 seems incriminated.
This code was introduced by minio/minio-go#1357, which "avoid data race by copying the buffer".

This code is needed because there is no way to seek and read from multiple offsets in parallel on POSIX - *os.File is not locked. Introducing a mutex would simply serialize the entire thing.

Using independent memory is the only way to upload content in parallel

@stale
Copy link

stale bot commented Dec 28, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Dec 28, 2020
@furkanmustafa
Copy link

I've hit the same issue, tried to mirror large video files using mc mirror. suddenly I've run out of memory, and thankfully mc got killed by the kernel.

Not sure how should I mirror(rsync?) my files to minio.

Thinking of looping files in my own shell script, and then using streaming upload by piping the file to mc.

@stale stale bot removed the stale label Dec 31, 2020
@furkanmustafa
Copy link

furkanmustafa commented Dec 31, 2020

mc mirror --disable-multipart

this workaround solved the memory issue for now.

@joshuafc
Copy link

joshuafc commented Jan 22, 2021

mc mirror --disable-multipart

this workaround solved the memory issue for now.

It also speed up my mc mirror. I am trying upload millions of small files.

@j3mdamas
Copy link

Thanks @furkanmustafa, it worked like a charm!

@bwplotka
Copy link

bwplotka commented Mar 15, 2021

@harshavardhana do you mind if we can rename title of this issue "Increased memory utilization for multi part upload" 🤗 or is there better issue to discuss this?

This code is needed because there is no way to seek and read from multiple offsets in parallel on POSIX - *os.File is not locked. Introducing a mutex would simply serialize the entire thing.
Using independent memory is the only way to upload content in parallel

Thanks for the explanation! I have a couple of ideas. But before I am trying to understand the issue minio/minio-go#1355, what was race about? Because if it's about book-keeping of offset we can just open file multiple times or calculate offsets manually, WDYT? 🤗

@harshavardhana
Copy link
Member

Thanks for the explanation! I have a couple of ideas. But before I am trying to understand the issue minio/minio-go#1355, what was race about? Because if it's about book-keeping of offset we can just open file multiple times or calculate offsets manually, WDYT?

The race is concurrent Seek's on *os.File are not possible and is not thread safe - leading to potentially incorrect content on the destination

@bwplotka
Copy link

I understand that, you have essentially one "offset" pointer. But we don't need to use Seek, we could just implement our own offset checking or just... open same file 4 times. I think it's better than allocating 0,5GB for single upload 🤗

@harshavardhana
Copy link
Member

I understand that, you have essentially one "offset" pointer. But we don't need to use Seek, we could just implement our own offset checking or just... open same file 4 times. I think it's better than allocating 0,5GB for single upload

Correct but the API is not designed to take 4 different *os.File @bwplotka this is only possible if the entire path name is provided to minio-go client so that it can open as many fd's as necessary internally.

For now setting a proper PartSize() to ensure that for most common use-cases you don't have to use 128MiB * 4 buffers locally @bwplotka

@bwplotka
Copy link

bwplotka commented Mar 15, 2021

Ah yea, we have only a reader.

For now setting a proper PartSize() to ensure that for most common use-cases you don't have to use 128MiB * 4 buffers locally @bwplotka

I would love to avoid this, I don't think that makes sense. PartSize should control how I interact with HTTP protocol and minio target. It should not suddenly control how much is allocated. In the same way when I upload 6GB of file without multi-part I don't want 6GB of mem to be allocated, right? I think this is an major in-efficiency that we need to remove. You can already see how many issues and project were impacted. Let's see how we can improve this.

I think we can rely on ReadAt logic, no? ReadAt is using pread which is thread safe:

The pread() and pwrite() system calls are especially useful in
multithreaded applications. They allow multiple threads to
perform I/O on the same file descriptor without being affected by
changes to the file offset by other threads.

If I am not wrong, we should be able to avoid using Seek just fine, no? We can just readAt offset we want into smaller fixed size buffer.

Alternatively I would check how AWS SDK is doing it. Maybe something we can be inspired by?

@harshavardhana
Copy link
Member

If I am not wrong, we should be able to avoid using Seek just fine, no? We can just readAt offset we want into smaller fixed size buffer.

ReadAt we had other issues as well in the past @bwplotka - that's why we had re-implemented to be a much simpler implementation (predictable) although using more memory at the same time.

Alternatively I would check how AWS SDK is doing it. Maybe something we can be inspired by?

AWS SDK is also buffering as they all have to due to S3 API requirements.

I would love to avoid this, I don't think that makes sense. PartSize should control how I interact with HTTP protocol and minio target. It should not suddenly control how much is allocated. In the same way when I upload 6GB of file without multi-part I don't want 6GB of mem to be allocated, right? I think this is an major in-efficiency that we need to remove. You can already see how many issues and project were impacted. Let's see how we can improve this.

Allocation is necessary no matter what @bwplotka because content-md5 calculation or sha256 needs to be calculated prior to uploading. If you re-seek your fd if you use *os.File and it is very tricky to manage these offsets cleanly in a safe way. Usage of ReadAt() makes it tricky because then you need to reflect upon an io.Seeker() to seek back anyways and that is not safe.

We also need to handle transparent retries - this is why all of this becomes even more tricky. A transparent retry requires re-seek to the beginning of the required offset.

@bwplotka
Copy link

bwplotka commented Mar 15, 2021

You can re-seek virtually though within a buffer.

Yes, looks like we have many things in place here (((: It escalated quickly to quite complex implementation, potentially for a good reason. Thanks for pointing out.

This brings the question: What is the negative impact of reducing multi part size to e.g 16MB? 🤗

@harshavardhana
Copy link
Member

This brings the question: What is the negative impact of reducing multi part size to e.g 16MB?

There is really no impact, in-fact I have a PR to change that as the default value for mc - in benchmarks, I see that 32MiB to be the best possible value - but 16MiB is perhaps the best of both worlds for memory point of view.

Users who have more leeway can use a higher buffer if needed. I suspect most of your files to be less than this value in terms of the full size of the object - or are you uploading with -1 as the content-length? @bwplotka

@bwplotka
Copy link

There is really no impact, in-fact I have a PR to change that as the default value for mc - in benchmarks, I see that 32MiB to be the best possible value - but 16MiB is perhaps the best of both worlds for memory point of view.

Thanks! Let's update default then if we can! 🤗

Users who have more leeway can use a higher buffer if needed. I suspect most of your files to be less than this value in terms of the full size of the object - or are you uploading with -1 as the content-length? @bwplotka

Yea, it depends if we can or cannot guess the size. Sometimes we cannot, so we use -1. GCS API is amazing because it deduces multipart or not from the reader directly - no need to pass full size. No idea how they do it 🙃 Magic. Probably prefetching to check.

But our files can have a lot, sometimes around 50GB. Then question is, at what size we should start doing multipart... always? 32MB if we have 16MB Part size? 🤗 We can definitely experiment and benchmark, that would be the best

@harshavardhana
Copy link
Member

harshavardhana commented Mar 16, 2021

But our files can have a lot, sometimes around 50GB. Then question is, at what size we should start doing multipart... always? 32MB if we have 16MB Part size? We can definitely experiment and benchmark, that would be the best

NOTE: you need to remember that at max you can upload upto 10,000 parts in S3 multipart API so you should set the value such that it caters to the maximum object size. Internally in minio-go we default to 128MiB but the idea is that you should calculate the optimal part-size based on the content-length available.

So essentially your best partSize is always

partSize := objectSize / 10000

So anyone uploading an object that is like 128GiB has to spend 128MiB worth of memory per multipart upload, as there is no way around it. This type of cacophony API is why sometimes we love GCS API where it takes an undefined upload to be uploaded without much memory requirement on the client-side.

Then there are other ways such as you can write your own uploader using https://github.com/minio/minio-go/blob/master/core.go where you have full control over all the underlying details but that puts some burden on your end where you can fully optimize the memory requirement.

@bwplotka
Copy link

Made some benchmarks and indeed it can be significant for our use case: thanos-io/thanos#3917 (comment)

Is there a way for upload logic to accept some kind of reader that locks things? Or ability to tell minio that we don't use multiple Uploads at once? (I think that is the only race that happens, no?)

NOTE: you need to remember that at max you can upload upto 10,000 parts in S3 multipart API so you should set the value such that it caters to the maximum object size. Internally in minio-go we default to 128MiB but the idea is that you should calculate the optimal part-size based on the content-length available.
So essentially your best partSize is always

Thanks! This is useful, but I wonder if this is always good to set 10 000 parts. e.g for 1MB file, it sounds like inefficiency to send 10kb parts only, no? Since this is blocking Thanos release, I am considering reverting race fix as sounds like we are not affected and figure long term solution later ): bwplotka/minio-go@f9927e5

I still have high hopes for some Reader implementation that will lock e.g ReadAt. It will make things slower, but do the work . It will be marginally slower than network call, no?

@harshavardhana
Copy link
Member

Thanks! This is useful, but I wonder if this is always good to set 10 000 parts. e.g for 1MB file, it sounds like inefficiency to send 10kb parts only, no? Since this is blocking Thanos release, I am considering reverting race fix as sounds like we are not affected and figure long term solution later ): bwplotka/minio-go@f9927e5

The minimum per part size is 5MiB @bwplotka only the last part can be < 5MiB

Is there a way for upload logic to accept some kind of reader that locks things? Or ability to tell minio that we don't use multiple Uploads at once? (I think that is the only race that happens, no?)

you can pass a ReadAt() which blocks uploads one by one or you change the concurrency default from '4' to '1'

https://pkg.go.dev/github.com/minio/minio-go/v7#PutObjectOptions

opts.NumThreads=1

@stale
Copy link

stale bot commented Jun 22, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.

@bwplotka
Copy link

I think this issue still exists. Can we reopen?

We have 4 workers (we don't change NumThreads) in our code and using "apparently" racy code which has proven to cause zero issues on production for anybody in Thanos. We use a fork with a reverted version bwplotka/minio-go@f9927e5

So either:

  1. There is a race we just did not hit the issue and we need to find better optimization of this code

OR

  1. The issue is unexisting and the current copying is not worth huge performance regressions. Our tests in the worst case shows 1.5 MB of mem allocated per upload vs 400 MB in the latest minio-go version and almost twice slower operation without our revert.

@harshavardhana
Copy link
Member

harshavardhana commented Jun 16, 2022

2. The issue is unexisting and the current copying is not worth huge performance regressions. Our tests in the worst case shows 1.5 MB of mem allocated per upload vs 400 MB in the latest minio-go version and almost twice slower operation without our revert.

From what I looked you are passing stream without the length - if we can't know the length we have to use a higher memory to cater to AWS S3 multipart requirement.

If the size is provided we know what the optimal part size is, if you want to control that further, you can set whatever part size you want.

@bwplotka
Copy link

bwplotka commented Jun 16, 2022 via email

@harshavardhana
Copy link
Member

harshavardhana commented Jun 16, 2022

We don't always know, but can double check! Still, I would love to revisit if copying is necessary.

If you cannot know you can still set a partSize that you are okay with - you can't have an unending stream - maximum allowed is 5TiB which requires a 500MiB buffer per part to upload to satisfy 10,000 part upper limit.

We need to calculate sha256sum, md5sum - we can't re-seek back on the stream either what else can the client do? other than buffering it? - If a Seeker or ReaderAt can be implemented then that is the best-case scenario for minio-go.

We can't use sync.Pool is because it's not useful - it is only useful if the given stream is always going to use the same partSize, otherwise if each stream has a different length which we don't know and have to dynamically choose the partSize - using - sync.Pool will simply be useless for memory re-usability.

Treating this as some kind of memory leak is naive, if you want to control the memory properly you have PartSize configurable - you should use that to control what you want to do with memory and how much leverage you have on the system where this is running.

Also, this is simply an allocated memory, not an active memory - it is given back to runtime when minio-go is done with it - unfortunately Go's GC might not kick in that soon.

The forked code you have @bwplotka is wrong - you are prone to races.

@harshavardhana
Copy link
Member

harshavardhana commented Jun 16, 2022

However, to not bike shed this further, I will explore ways to reduce memory usage @bwplotka would you mind writing a sample code to reproduce this in https://github.com/minio/minio-go/issues

@minio minio locked as resolved and limited conversation to collaborators Jun 16, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

9 participants