OOM when using S3TransferManager.downloadFile() #5744

Open
fivetran-aakashtiwari opened this issue Dec 6, 2024 · 2 comments
Labels
bug This issue is a bug. p2 This is a standard priority issue

Comments

@fivetran-aakashtiwari

fivetran-aakashtiwari commented Dec 6, 2024

Describe the bug

When we download many files concurrently (around 50 files of roughly 500 MB each), we run into an OOM error. Even after reducing the maxConcurrency of the S3 client to as low as 5, we still see the issue.
Heap dump:
[Screenshot 2024-12-06 at 12 45 17 PM]

Regression Issue

  • Select this option if this issue appears to be a regression.

Expected Behavior

S3TransferManager should work regardless of how many files are downloaded or how large they are.

Current Behavior

Some threads fail with an OutOfMemoryError, and as a result some files are not fully downloaded:

Exception in thread "AwsEventLoop 3" Exception in thread "AwsEventLoop 1" Exception in thread "AwsEventLoop 7" java.lang.OutOfMemoryError: Java heap space
java.lang.OutOfMemoryError: Java heap space
java.lang.OutOfMemoryError: Java heap space
09:10:49.954 [AwsEventLoop 5] WARN software.amazon.awssdk.transfer.s3.progress.LoggingTransferListener - Transfer failed.
software.amazon.awssdk.core.exception.SdkClientException: Failed to send the request: OutOfMemoryError has been raised from JVM.
	at software.amazon.awssdk.core.exception.SdkClientException$BuilderImpl.build(SdkClientException.java:111)
	at software.amazon.awssdk.core.exception.SdkClientException.create(SdkClientException.java:47)
	at software.amazon.awssdk.services.s3.internal.crt.S3CrtResponseHandlerAdapter.handleError(S3CrtResponseHandlerAdapter.java:165)
	at software.amazon.awssdk.services.s3.internal.crt.S3CrtResponseHandlerAdapter.onFinished(S3CrtResponseHandlerAdapter.java:129)
	at software.amazon.awssdk.crt.s3.S3MetaRequestResponseHandlerNativeAdapter.onFinished(S3MetaRequestResponseHandlerNativeAdapter.java:25)
Exception in thread "AwsEventLoop 1" java.lang.OutOfMemoryError: Java heap space
09:10:50.038 [Thread-2] INFO software.amazon.awssdk.transfer.s3.progress.LoggingTransferListener - |===                 | 15.0%
09:10:50.218 [Thread-31] INFO software.amazon.awssdk.transfer.s3.progress.LoggingTransferListener - |====                | 20.0%
09:10:50.298 [Thread-14] INFO software.amazon.awssdk.transfer.s3.progress.LoggingTransferListener - |===                 | 15.0%
09:10:50.348 [Thread-34] INFO software.amazon.awssdk.transfer.s3.progress.LoggingTransferListener - |=                   | 5.0%
09:10:50.368 [Thread-30] INFO software.amazon.awssdk.transfer.s3.progress.LoggingTransferListener - |===                 | 15.0%
09:10:50.388 [Thread-17] INFO software.amazon.awssdk.transfer.s3.progress.LoggingTransferListener - |====================| 100.0%
09:10:50.408 [Thread-36] INFO software.amazon.awssdk.transfer.s3.progress.LoggingTransferListener - |===                 | 15.0%
09:10:50.408 [Thread-28] INFO software.amazon.awssdk.transfer.s3.progress.LoggingTransferListener - |===                 | 15.0%
09:10:50.478 [Thread-23] INFO software.amazon.awssdk.transfer.s3.progress.LoggingTransferListener - |===                 | 15.0%
09:10:50.498 [Thread-19] INFO software.amazon.awssdk.transfer.s3.progress.LoggingTransferListener - |===                 | 15.0%
09:10:50.498 [Thread-39] INFO software.amazon.awssdk.transfer.s3.progress.LoggingTransferListener - |===                 | 15.0%
09:10:50.518 [Thread-20] INFO software.amazon.awssdk.transfer.s3.progress.LoggingTransferListener - |====================| 100.0%
09:10:50.528 [Thread-35] INFO software.amazon.awssdk.transfer.s3.progress.LoggingTransferListener - |===                 | 15.0%
09:10:50.625 [sdk-async-response-0-0] INFO software.amazon.awssdk.transfer.s3.progress.LoggingTransferListener - Transfer complete!
09:10:50.702 [AwsEventLoop 5] WARN software.amazon.awssdk.transfer.s3.progress.LoggingTransferListener - Transfer failed.
software.amazon.awssdk.core.exception.SdkClientException: Failed to send the request: OutOfMemoryError has been raised from JVM.
	at software.amazon.awssdk.core.exception.SdkClientException$BuilderImpl.build(SdkClientException.java:111)
	at software.amazon.awssdk.core.exception.SdkClientException.create(SdkClientException.java:47)
	at software.amazon.awssdk.services.s3.internal.crt.S3CrtResponseHandlerAdapter.handleError(S3CrtResponseHandlerAdapter.java:165)
	at software.amazon.awssdk.services.s3.internal.crt.S3CrtResponseHandlerAdapter.onFinished(S3CrtResponseHandlerAdapter.java:129)
	at software.amazon.awssdk.crt.s3.S3MetaRequestResponseHandlerNativeAdapter.onFinished(S3MetaRequestResponseHandlerNativeAdapter.java:25)
09:10:51.398 [Thread-9] INFO software.amazon.awssdk.transfer.s3.progress.LoggingTransferListener - |====                | 20.0%
09:10:51.478 [Thread-34] INFO software.amazon.awssdk.transfer.s3.progress.LoggingTransferListener - |====                | 20.0%
09:10:51.508 [Thread-25] INFO software.amazon.awssdk.transfer.s3.progress.LoggingTransferListener - |=================== | 95.0%
09:10:51.518 [Thread-5] INFO software.amazon.awssdk.transfer.s3.progress.LoggingTransferListener - |===                 | 15.0%
09:10:51.548 [Thread-30] INFO software.amazon.awssdk.transfer.s3.progress.LoggingTransferListener - |====                | 20.0%
09:10:51.548 [Thread-36] INFO software.amazon.awssdk.transfer.s3.progress.LoggingTransferListener - |=================   | 85.0%
09:10:51.568 [Thread-13] INFO software.amazon.awssdk.transfer.s3.progress.LoggingTransferListener - |====                | 20.0%
09:10:51.568 [Thread-8] INFO software.amazon.awssdk.transfer.s3.progress.LoggingTransferListener - |=====               | 25.0%
09:10:51.598 [Thread-23] INFO software.amazon.awssdk.transfer.s3.progress.LoggingTransferListener - |====                | 20.0%
09:10:51.628 [Thread-19] INFO software.amazon.awssdk.transfer.s3.progress.LoggingTransferListener - |====                | 20.0%
09:10:51.648 [Thread-21] INFO software.amazon.awssdk.transfer.s3.progress.LoggingTransferListener - |====                | 20.0%

I have also attached the full stack trace: StackTrace.txt

Reproduction Steps

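// Imports used by this snippet
import static java.lang.System.currentTimeMillis;

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;
import software.amazon.awssdk.auth.credentials.StaticCredentialsProvider;
import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.s3.S3AsyncClient;
import software.amazon.awssdk.transfer.s3.S3TransferManager;
import software.amazon.awssdk.transfer.s3.model.CompletedFileDownload;
import software.amazon.awssdk.transfer.s3.model.DownloadFileRequest;
import software.amazon.awssdk.transfer.s3.progress.LoggingTransferListener;

// Setting up the client (getCreds(), extractObjectLocation() and s3BucketName
// are defined elsewhere in our class)
this.s3AsyncClient =
        S3AsyncClient.crtBuilder()
                .credentialsProvider(StaticCredentialsProvider.create(getCreds()))
                .region(Region.of("us-east-1"))
                .build();
this.s3TransferManager = S3TransferManager.builder().s3Client(s3AsyncClient).build();

// Downloading multiple files concurrently
protected void downloadWithRetry(String s3Uri, File downloadDir, AtomicLong syncCallDuration, int numFiles) {
    List<CompletableFuture<CompletedFileDownload>> futures = new ArrayList<>();
    String s3Object = extractObjectLocation(s3Uri);
    System.out.println("Downloading " + s3Object + ", s3 uri: " + s3Uri);
    String[] tokens = s3Object.split("/");
    String fileName = tokens[tokens.length - 1];
    for (int i = 0; i < numFiles; i++) {
        File file = new File(downloadDir, String.valueOf(i) + fileName);
        try (FileOutputStream fos = new FileOutputStream(file)) {
            DownloadFileRequest downloadFileRequest =
                    DownloadFileRequest.builder()
                            .getObjectRequest(b -> b.bucket(s3BucketName).key(s3Object))
                            .addTransferListener(LoggingTransferListener.create())
                            .destination(file)
                            .build();
            // downloadFile() returns immediately; the download continues asynchronously.
            // fos is not passed to the request: the transfer manager writes to `file` itself.
            var completableFuture = s3TransferManager.downloadFile(downloadFileRequest).completionFuture();

            // force syncing file
            long start = currentTimeMillis();
            fos.getFD().sync();
            syncCallDuration.addAndGet(currentTimeMillis() - start);
            futures.add(completableFuture);
        } catch (IOException e) {
            System.out.println("SEVERE: Failed in download and retry: error: " + e.getMessage());
            throw new UncheckedIOException(e);
        } catch (Exception e) {
            System.out.println("Identified exception during download of file: " + e.getMessage());
            throw e;
        }
    }
    // Wait for every download; failures are logged, not rethrown
    AtomicInteger num = new AtomicInteger();
    futures.forEach(
            future -> {
                try {
                    future.join();
                } catch (CompletionException e) {
                    System.out.println("CompletionException for file " + num.get() + ": " + e.getMessage());
                } catch (Exception e) {
                    System.out.println("Exception in future.join() for file " + num.get() + ": " + e.getMessage());
                }
                num.getAndIncrement();
            });
}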

I have also attached a link to the dummy file we used to reproduce the issue.

Possible Solution

No response

Additional Information/Context

Initially we ran without any configuration and hit the issue, so I set maxConcurrency on the client. Even when I reduced it to as low as 5, we still got the OOM error. Downloads work fine when concurrency is reduced to 1 or 2, but then performance drops significantly. This is how we set maxConcurrency:

this.s3AsyncClient =
        S3AsyncClient.crtBuilder()
                .maxConcurrency(5)
                .credentialsProvider(StaticCredentialsProvider.create(getCreds()))
                .region(Region.of("us-east-1"))
                .build();

We then reduced minimumPartSizeInBytes from the default 8 MB to 1 MB. With that setting we no longer got the OOM error, but performance dropped. This is how we set minimumPartSizeInBytes:

this.s3AsyncClient =
        S3AsyncClient.crtBuilder()
                .minimumPartSizeInBytes(1024L * 1024) // 1 MB
                .credentialsProvider(StaticCredentialsProvider.create(getCreds()))
                .region(Region.of("us-east-1"))
                .build();

In the heap dump snapshot, most of the memory, approximately 99%, is consumed by byte arrays (Byte[]). We suspect that during a multipart download of a large file, the file is split into many smaller parts that are all held in memory, leading to excessive memory usage. That would explain why we no longer get the issue after reducing the minimum part size from 8 MB to 1 MB.
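The part-size hypothesis can be made concrete with some back-of-the-envelope arithmetic. The sketch below assumes a simplified buffering model that is not taken from the CRT client's documentation: each in-flight part is held as one array of part-size bytes, so buffered bytes ≈ in-flight parts × part size. The class name, method names, and the figure of 100 in-flight parts are hypothetical, chosen only for illustration.

```java
// Back-of-the-envelope arithmetic for the multipart-download hypothesis.
// ASSUMPTION (not from SDK docs): each in-flight part is buffered as one
// partSizeBytes-sized array, and nothing else contributes to heap usage.
final class PartBufferEstimate {
    static final long MB = 1024L * 1024L;

    // Number of ranged GETs needed for one object (ceiling division).
    static long partsPerObject(long objectSizeBytes, long partSizeBytes) {
        return (objectSizeBytes + partSizeBytes - 1) / partSizeBytes;
    }

    // Bytes held in memory if `inFlightParts` parts are buffered at the same time.
    static long bufferedBytes(long inFlightParts, long partSizeBytes) {
        return inFlightParts * partSizeBytes;
    }

    public static void main(String[] args) {
        long objectSize = 500 * MB;
        System.out.println("parts per 500 MB object @ 8 MB: " + partsPerObject(objectSize, 8 * MB));
        System.out.println("parts per 500 MB object @ 1 MB: " + partsPerObject(objectSize, MB));
        // Upper bound if all 50 objects were fully buffered at once
        System.out.println("50 objects fully buffered (MB): " + 50 * objectSize / MB);
        // Hypothetical: the same 100 in-flight parts at two part sizes
        System.out.println("100 parts @ 8 MB (MB): " + bufferedBytes(100, 8 * MB) / MB);
        System.out.println("100 parts @ 1 MB (MB): " + bufferedBytes(100, MB) / MB);
    }
}
```

Under this model, fully buffering an object costs its whole size whatever the part size, so the fact that 1 MB parts avoid the OOM is more consistent with memory scaling with the number of parts buffered concurrently times the part size (an 8x reduction for the same part count) than with whole files being held in memory.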

AWS Java SDK version used

2.25.64

JDK version used

openjdk version "17.0.13" 2024-10-15 LTS OpenJDK Runtime Environment Corretto-17.0.13.11.1 (build 17.0.13+11-LTS) OpenJDK 64-Bit Server VM Corretto-17.0.13.11.1 (build 17.0.13+11-LTS, mixed mode, sharing)

Operating System and version

Operating System: Amazon Linux, EC2 instance: c7gd.2xlarge

@fivetran-aakashtiwari fivetran-aakashtiwari added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Dec 6, 2024
@debora-ito debora-ito self-assigned this Dec 18, 2024
@debora-ito debora-ito added investigating This issue is being investigated and/or work is in progress to resolve the issue. p2 This is a standard priority issue and removed needs-triage This issue or PR still needs to be triaged. labels Dec 18, 2024
@debora-ito
Member

@fivetran-aakashtiwari your support ticket was closed, so I'm copying here the additional information we need to investigate:

  1. I'd expect that reducing the maxConcurrency would help. When you lower the max concurrency to 5, does the heap space usage continue to grow steadily, or does memory get deallocated at some point?

  2. Can you provide the heap dump file?

  3. In the code sample, there's a FileOutputStream being used in the loop, but not by the DownloadFile request. How does calling fos.getFD().sync() affect memory usage? As a test, have you tried running the downloads with only a File as the DownloadFileRequest destination, without the FileOutputStream?
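One way to answer question 1 without taking a full heap dump is to sample heap usage while the downloads run. The sketch below uses the standard java.lang.management API (MemoryMXBean) to log used heap on a fixed schedule; a line that keeps rising without dropping back after garbage collection would suggest memory is being retained. The class name HeapSampler and the 5-second interval are hypothetical choices, and this measures only the Java heap, not native memory.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: periodically logs used/max Java heap so you can see whether
// usage keeps growing or drops back after garbage collection.
final class HeapSampler implements AutoCloseable {
    private static final long MB = 1024L * 1024L;
    private final MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "heap-sampler");
                t.setDaemon(true); // don't keep the JVM alive just for sampling
                return t;
            });

    // Currently used heap in bytes.
    long usedHeapBytes() {
        return memory.getHeapMemoryUsage().getUsed();
    }

    // Starts logging every `periodSeconds` seconds; call close() to stop.
    HeapSampler start(long periodSeconds) {
        scheduler.scheduleAtFixedRate(() -> {
            long used = usedHeapBytes();
            long max = memory.getHeapMemoryUsage().getMax(); // -1 if undefined
            System.out.printf("heap used: %d MB / max: %d MB%n", used / MB, max < 0 ? -1 : max / MB);
        }, 0, periodSeconds, TimeUnit.SECONDS);
        return this;
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }

    public static void main(String[] args) throws InterruptedException {
        try (HeapSampler sampler = new HeapSampler().start(5)) {
            Thread.sleep(1000); // in practice: start the downloads here and join them
        }
    }
}
```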

@debora-ito debora-ito added response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 10 days. and removed investigating This issue is being investigated and/or work is in progress to resolve the issue. labels Dec 31, 2024
@fivetran-aakashtiwari
Author

@debora-ito Sorry for the delay. I have answered all the questions on the support ticket, but I'm copying the answers here as well.

Ans. 1: From what I observed, memory continues to grow steadily.

Ans. 2: I have uploaded the heap dump here: https://drive.google.com/file/d/146skx72h9NVzQ68xm8d-2Lxu9lUhrDPq/view?usp=sharing

Ans. 3: I haven't tried this. As a workaround, instead of submitting requests for all the files at once, we now download them in batches of 10. With batching we don't get any errors, but we suspect it affects performance.
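The batching workaround from Ans. 3 can be sketched generically: start at most batchSize async tasks, wait for all of them, then start the next batch. The helper below is hypothetical and SDK-agnostic; in the reporter's code, `startTask` would be something like `key -> s3TransferManager.downloadFile(requestFor(key)).completionFuture()`, where `requestFor` is a hypothetical function that builds the DownloadFileRequest.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

// Hypothetical helper: runs async tasks in fixed-size batches, waiting for
// each batch to finish before starting the next one.
final class BatchedRunner {
    static <T, R> List<R> runInBatches(List<T> items, int batchSize,
                                       Function<T, CompletableFuture<R>> startTask) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<R> results = new ArrayList<>(items.size());
        for (int from = 0; from < items.size(); from += batchSize) {
            int to = Math.min(from + batchSize, items.size());
            List<CompletableFuture<R>> batch = new ArrayList<>(to - from);
            for (T item : items.subList(from, to)) {
                batch.add(startTask.apply(item)); // start up to batchSize tasks
            }
            // Block until every task in this batch completes; join() rethrows
            // a failure as CompletionException.
            CompletableFuture.allOf(batch.toArray(new CompletableFuture<?>[0])).join();
            for (CompletableFuture<R> f : batch) {
                results.add(f.join()); // results keep the input order
            }
        }
        return results;
    }

    public static void main(String[] args) {
        List<Integer> keys = List.of(1, 2, 3, 4, 5);
        List<Integer> doubled = runInBatches(keys, 2,
                k -> CompletableFuture.supplyAsync(() -> k * 2));
        System.out.println(doubled); // prints [2, 4, 6, 8, 10]
    }
}
```

One likely source of the suspected slowdown is that each batch waits on its slowest download, so throughput dips at every batch boundary. A sliding window (a different technique, for example a java.util.concurrent.Semaphore acquired before each download and released when its future completes) would keep up to 10 downloads in flight at all times instead.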

@github-actions github-actions bot removed the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 10 days. label Jan 2, 2025