S3: Data read has a different checksum than expected #953

Closed
ben-manes opened this issue Dec 14, 2018 · 55 comments
Labels: bug
@ben-manes

This might be fixed by 472f5bf, but I'm not sure. I'm also seeing errors where response.contentLength() differs from the actual size, so there might be a race condition again. This is using 2.1.4 + Netty.

Caused by: software.amazon.awssdk.core.exception.SdkClientException: Data read has a different checksum than expected.
        at software.amazon.awssdk.core.exception.SdkClientException$BuilderImpl.build(SdkClientException.java:97)
        at software.amazon.awssdk.core.exception.SdkClientException.create(SdkClientException.java:39)
        at software.amazon.awssdk.services.s3.internal.handlers.AsyncChecksumValidationInterceptor.validatePutObjectChecksum(AsyncChecksumValidationInterceptor.java:105)
        at software.amazon.awssdk.services.s3.internal.handlers.AsyncChecksumValidationInterceptor.afterUnmarshalling(AsyncChecksumValidationInterceptor.java:91)
        at software.amazon.awssdk.core.interceptor.ExecutionInterceptorChain.lambda$afterUnmarshalling$9(ExecutionInterceptorChain.java:152)
        at software.amazon.awssdk.core.interceptor.ExecutionInterceptorChain.reverseForEach(ExecutionInterceptorChain.java:210)
        at software.amazon.awssdk.core.interceptor.ExecutionInterceptorChain.afterUnmarshalling(ExecutionInterceptorChain.java:152)
        at software.amazon.awssdk.core.client.handler.BaseClientHandler.runAfterUnmarshallingInterceptors(BaseClientHandler.java:120)
        at software.amazon.awssdk.core.client.handler.BaseClientHandler.lambda$interceptorCalling$2(BaseClientHandler.java:133)
        at software.amazon.awssdk.core.client.handler.AttachHttpMetadataResponseHandler.handle(AttachHttpMetadataResponseHandler.java:40)
        at software.amazon.awssdk.core.client.handler.AttachHttpMetadataResponseHandler.handle(AttachHttpMetadataResponseHandler.java:28)
        at software.amazon.awssdk.core.internal.http.async.SyncResponseHandlerAdapter.lambda$prepare$0(SyncResponseHandlerAdapter.java:85)
@dylanmroweFS

Similar issue here, using 2.1.4. We're uploading files to buckets in multiple regions asynchronously. This happens after a few successful uploads.

java.util.concurrent.CompletionException: software.amazon.awssdk.core.exception.SdkClientException: Data read has a different checksum than expected.
	at software.amazon.awssdk.utils.CompletableFutureUtils.errorAsCompletionException(CompletableFutureUtils.java:61)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncExecutionFailureExceptionReportingStage.lambda$execute$0(AsyncExecutionFailureExceptionReportingStage.java:50)
	at java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:822)
	at java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:797)
	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
	at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryExecutor.retryErrorIfNeeded(AsyncRetryableStage.java:166)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryExecutor.retryIfNeeded(AsyncRetryableStage.java:118)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryExecutor.lambda$execute$0(AsyncRetryableStage.java:103)
	at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
	at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
	at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.MakeAsyncHttpRequestStage.lambda$executeHttpRequest$1(MakeAsyncHttpRequestStage.java:136)
	at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
	at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
	at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: software.amazon.awssdk.core.exception.SdkClientException: Data read has a different checksum than expected.
	at software.amazon.awssdk.core.exception.SdkClientException$BuilderImpl.build(SdkClientException.java:97)
	at software.amazon.awssdk.core.exception.SdkClientException.create(SdkClientException.java:39)
	at software.amazon.awssdk.services.s3.internal.handlers.AsyncChecksumValidationInterceptor.validatePutObjectChecksum(AsyncChecksumValidationInterceptor.java:105)
	at software.amazon.awssdk.services.s3.internal.handlers.AsyncChecksumValidationInterceptor.afterUnmarshalling(AsyncChecksumValidationInterceptor.java:91)
	at software.amazon.awssdk.core.interceptor.ExecutionInterceptorChain.lambda$afterUnmarshalling$9(ExecutionInterceptorChain.java:152)
	at software.amazon.awssdk.core.interceptor.ExecutionInterceptorChain.reverseForEach(ExecutionInterceptorChain.java:210)
	at software.amazon.awssdk.core.interceptor.ExecutionInterceptorChain.afterUnmarshalling(ExecutionInterceptorChain.java:152)
	at software.amazon.awssdk.core.client.handler.BaseClientHandler.runAfterUnmarshallingInterceptors(BaseClientHandler.java:120)
	at software.amazon.awssdk.core.client.handler.BaseClientHandler.lambda$interceptorCalling$2(BaseClientHandler.java:133)
	at software.amazon.awssdk.core.client.handler.AttachHttpMetadataResponseHandler.handle(AttachHttpMetadataResponseHandler.java:40)
	at software.amazon.awssdk.core.client.handler.AttachHttpMetadataResponseHandler.handle(AttachHttpMetadataResponseHandler.java:28)
	at software.amazon.awssdk.core.internal.http.async.SyncResponseHandlerAdapter.lambda$prepare$0(SyncResponseHandlerAdapter.java:85)
	at java.util.concurrent.CompletableFuture.uniCompose(CompletableFuture.java:952)
	at java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:926)
	at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442)
	... 1 common frames omitted

@zoewangg zoewangg added investigating This issue is being investigated and/or work is in progress to resolve the issue. bug This issue is a bug. labels Dec 17, 2018
@GJKrupa

GJKrupa commented Dec 26, 2018

I'm seeing something similar with 2.2.0. I have a basic integration test using LocalStack that produces this checksum error every time the test calls getObject asynchronously with ByteArrayAsyncResponseTransformer. As far as I can tell from debugging, the checksum validation runs before the response content is fully read (the test data is a simple UUID, but the exception is raised with only half the UUID in the transformer's underlying BAOS).

The failing test is in commit bfd3a767 of https://gitlab.com/gkrupa/s3web (S3BackendSpec).

@zoewangg
Contributor

zoewangg commented Jan 9, 2019

Related to #965. Fixed via #966 and the fix will be included in the next release.

@zoewangg zoewangg removed the investigating This issue is being investigated and/or work is in progress to resolve the issue. label Jan 9, 2019
@zoewangg
Contributor

The fix has been released in 2.3.0. Closing the issue.

@harrylaou

I'm using 2.3.1 and still seeing this:
[SdkClientException: Data read has a different checksum than expected. Was -1560488262, but expected 1701186097]

@GJKrupa

GJKrupa commented Jan 16, 2019

Same here. If I update the BOM version in the test I linked above, it still fails.

@harrylaou

@ben-manes is this going to be reopened, or is it fixed in the next release?

@vicpara

vicpara commented Jan 24, 2019

We also have the same problem. The workaround is pretty ugly and inefficient.

@ben-manes
Author

@harrylaou I switched back to v1 after fighting bugs daily with v2 for a few months; since then I've had no problems. Unfortunately AWS didn't dogfood this SDK before releasing it, and I don't have the time to continue beta testing. I'll switch over later once it matures. You might feel similarly. I'm not reopening this, but you can file a new issue if the AWS team doesn't reopen this one.

@millems
Contributor

millems commented Jan 24, 2019

Reopening, because it sounds like there are more cases where this can occur that haven't already been covered by previous fixes.

@spfink
Contributor

spfink commented Jan 24, 2019

@ben-manes, @vicpara, do either of you have an easily reproducible test case for this?

@millems
Contributor

millems commented Feb 8, 2019

We're unfortunately not able to reproduce this on our end. Is there some specific configuration you have on your object that may be causing this?

@vicpara

vicpara commented Feb 19, 2019

@millems We are using the library in Scala. Ideally we would like to call the method you have defined as:
CompletableFuture<ReturnT> getObject(Consumer<GetObjectRequest.Builder> getObjectRequest, AsyncResponseTransformer<GetObjectResponse, ReturnT> asyncResponseTransformer)

In Scala the call would be:
client.getObject(getObjectRequest(bucket, fileName), new ByteArrayAsyncResponseTransformer[GetObjectResponse]()).toTask

Also note that AsyncResponseTransformer.toBytes() wouldn't work because of limitations when calling Java static methods on generic classes from Scala; syntax like AsyncResponseTransformer[GetObjectResponse, ResponseBytes].toBytes() or AsyncResponseTransformer.toBytes(): GetObjectResponse doesn't compile.
Calling new ByteArrayAsyncResponseTransformer[GetObjectResponse]() should work if it were not for #953.

Since this doesn't work, we are using the following workaround:

val bucket = "bucket_one"
val fileName = "file1.txt"
val tempFile: File = File.createTempFile("temp-", fileName.flattenPath)
tempFile.deleteOnExit()
val localPath: Path = tempFile.toPath
client.getObject(getObjectRequest(bucket, fileName), localPath)

@dementiev

dementiev commented Mar 18, 2019

Still seeing this issue on version "software.amazon.awssdk:s3:2.5.10":

Code example:

... this.client = S3Client.create();

        client.getObject(
                GetObjectRequest.builder().bucket(bucket).key(key).build(),
                ResponseTransformer.toFile(download)
        );

Caused by: software.amazon.awssdk.core.exception.SdkClientException: Data read has a different checksum than expected. Was 1229635159, but expected 0

Any suggestions on how to fix?

@millems
Contributor

millems commented Mar 18, 2019

@dementiev
Please help answer a few questions to enable us to reproduce this issue:

  1. Does this happen sporadically or consistently?
  2. Can you share how you are creating the S3 client?
  3. Are there any 'unusual' characters in the bucket or key name (non-alphanumeric)?
  4. Are there any 'unusual' configurations on the bucket (non-defaults)?

@tim-fdc

tim-fdc commented Mar 25, 2019

Hi,
I'm also getting this error, but only with the async client.

Using version software.amazon.awssdk:aws-sdk-java:2.5.15.

The error happens consistently with the following client (Kotlin code):

class S3CredentialsProvider : AwsCredentialsProvider {
    override fun resolveCredentials(): AwsCredentials {
        val awsCredentialsProcessBuilder = AwsSessionCredentials.create(
                "<SECRET>",
                "<SECRET>",
                "<SECRET>")

        return awsCredentialsProcessBuilder
    }
}
 
   val s3Client = S3AsyncClient.builder()
            .credentialsProvider(S3CredentialsProvider())
            .region(Region.EU_WEST_1)
            .build()

Writing an object as follows:

    val toByteArray = "test".toByteArray()
    val asyncRequestBody = AsyncRequestBody.fromBytes(toByteArray)
    s3Client.putObject(
            PutObjectRequest.builder()
                    .bucket(config.bucket)
                    .key("test")
                    .build(),
            asyncRequestBody).await()

I'm using the Kotlin coroutines extension (await()) to convert the Java CompletableFuture into a coroutine. This works for listings. I am also using a session token; it seems to work without one.

With the non-async client

S3Client.builder()
            .region(Region.EU_WEST_1)
            .credentialsProvider(S3CredentialsProvider())
            .build()

it works.

Here is the exception:

Exception in thread "main" software.amazon.awssdk.core.exception.SdkClientException: Data read has a different checksum than expected.
	at software.amazon.awssdk.core.exception.SdkClientException$BuilderImpl.build(SdkClientException.java:97)
	at software.amazon.awssdk.core.exception.SdkClientException.create(SdkClientException.java:39)
	at software.amazon.awssdk.services.s3.checksums.ChecksumsEnabledValidator.validatePutObjectChecksum(ChecksumsEnabledValidator.java:134)
	at software.amazon.awssdk.services.s3.internal.handlers.AsyncChecksumValidationInterceptor.afterUnmarshalling(AsyncChecksumValidationInterceptor.java:86)
	at software.amazon.awssdk.core.interceptor.ExecutionInterceptorChain.lambda$afterUnmarshalling$9(ExecutionInterceptorChain.java:152)
	at software.amazon.awssdk.core.interceptor.ExecutionInterceptorChain.reverseForEach(ExecutionInterceptorChain.java:210)
	at software.amazon.awssdk.core.interceptor.ExecutionInterceptorChain.afterUnmarshalling(ExecutionInterceptorChain.java:152)
	at software.amazon.awssdk.core.client.handler.BaseClientHandler.runAfterUnmarshallingInterceptors(BaseClientHandler.java:138)
	at software.amazon.awssdk.core.client.handler.BaseClientHandler.lambda$interceptorCalling$2(BaseClientHandler.java:151)
	at software.amazon.awssdk.core.client.handler.AttachHttpMetadataResponseHandler.handle(AttachHttpMetadataResponseHandler.java:40)
	at software.amazon.awssdk.core.client.handler.AttachHttpMetadataResponseHandler.handle(AttachHttpMetadataResponseHandler.java:28)
	at software.amazon.awssdk.core.internal.http.async.AsyncResponseHandler.lambda$prepare$0(AsyncResponseHandler.java:88)
	at java.util.concurrent.CompletableFuture.uniCompose(CompletableFuture.java:952)
	at java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:926)
	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
	at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962)
	at software.amazon.awssdk.core.internal.http.async.AsyncResponseHandler$BaosSubscriber.onComplete(AsyncResponseHandler.java:129)
	at software.amazon.awssdk.http.nio.netty.internal.ResponseHandler$FullResponseContentPublisher$1.request(ResponseHandler.java:369)
	at software.amazon.awssdk.core.internal.http.async.AsyncResponseHandler$BaosSubscriber.onSubscribe(AsyncResponseHandler.java:108)
	at software.amazon.awssdk.http.nio.netty.internal.ResponseHandler$FullResponseContentPublisher.subscribe(ResponseHandler.java:360)
	at software.amazon.awssdk.core.internal.http.async.AsyncResponseHandler.onStream(AsyncResponseHandler.java:71)
	at software.amazon.awssdk.core.internal.http.async.AsyncAfterTransmissionInterceptorCallingResponseHandler.onStream(AsyncAfterTransmissionInterceptorCallingResponseHandler.java:86)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.MakeAsyncHttpRequestStage$ResponseHandler.onStream(MakeAsyncHttpRequestStage.java:249)
	at software.amazon.awssdk.http.nio.netty.internal.ResponseHandler.channelRead0(ResponseHandler.java:112)
	at software.amazon.awssdk.http.nio.netty.internal.ResponseHandler.channelRead0(ResponseHandler.java:65)
	at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
	at com.typesafe.netty.http.HttpStreamsHandler.channelRead(HttpStreamsHandler.java:129)
	at com.typesafe.netty.http.HttpStreamsClientHandler.channelRead(HttpStreamsClientHandler.java:148)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
	at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:241)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
	at software.amazon.awssdk.http.nio.netty.internal.FutureCancelHandler.channelRead0(FutureCancelHandler.java:42)
	at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
	at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
	at io.netty.channel.CombinedChannelDuplexHandler$DelegatingChannelHandlerContext.fireChannelRead(CombinedChannelDuplexHandler.java:438)
	at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:323)
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:297)
	at io.netty.channel.CombinedChannelDuplexHandler.channelRead(CombinedChannelDuplexHandler.java:253)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
	at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
	at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1436)
	at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1203)
	at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1247)
	at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:502)
	at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:441)
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:278)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
	at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1408)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:930)
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163)
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:677)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:612)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:529)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:491)
	at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:905)
	at java.lang.Thread.run(Thread.java:748)

Any ideas?

Thanks!
Tim

@millems
Contributor

millems commented Apr 16, 2019

@tim-fdc Do you have any special bucket configuration, or special characters in the bucket name?

The following works for me (Java runtime):

AwsSessionCredentials credentials = AwsSessionCredentials.create("<>",
                                                                 "<>",
                                                                 "<>");
try (S3AsyncClient client = S3AsyncClient.builder()
                                         .credentialsProvider(() -> credentials)
                                         .region(Region.EU_WEST_1)
                                         .build()) {
    client.createBucket(r -> r.bucket("tmp-millem")).join();
    client.putObject(r -> r.bucket("tmp-millem").key("test"), AsyncRequestBody.fromBytes("test".getBytes())).join();

    System.out.println(client.getObject(r -> r.bucket("tmp-millem").key("test"), AsyncResponseTransformer.toBytes())
                             .join()
                             .asUtf8String());

    client.deleteObject(r -> r.bucket("tmp-millem").key("test")).join();
    client.deleteBucket(r -> r.bucket("tmp-millem")).join();
}

derwasp pushed a commit to albumprinter/junit-synnefo that referenced this issue Apr 18, 2019
derwasp added a commit to albumprinter/junit-synnefo that referenced this issue Apr 21, 2019
 - Use async apis instead of sync apis for everything but the S3 download jobs because of aws/aws-sdk-java-v2#953
 - Actually make the `threads` setting work: now we can limit the amount of parallel executions with it
 - Download artifacts after each job has finished and not at the end of all the jobs
 - Moved the helper functions to separate files

Closes #6 #2
@millems millems assigned millems and unassigned millems Jul 28, 2020
@marc-christian-schulze

marc-christian-schulze commented Jul 30, 2020

@millems, I've provided a reproducer here:
https://github.com/marc-christian-schulze/kotlinx-reproducer-2109

It shows that the race condition appears when the onNext method is called from both a Netty I/O thread and a non-Netty I/O thread. Your example does not reproduce the race condition since it uses Flowable.fromIterable, which delivers the items only on the calling thread.

I've spent quite some time together with colleagues from the Kotlin project narrowing down the issue and finally ended up at OrderedWriteChannelHandlerContext::doInOrder:

private void doInOrder(Runnable task) {
        if (!channel().eventLoop().inEventLoop()) {
            task.run();
        } else {
            // If we're in the event loop, queue a task to perform the write, so that it occurs after writes that were scheduled
            // off of the event loop.
            channel().eventLoop().execute(task);
        }
    }

There the actual "magic" happens, depending on which thread performs the ctx.writeAndFlush invocation. Once threads (Netty I/O and non-Netty I/O) start invoking doInOrder in fast iterations, the network packets can be enqueued in the wrong order, since the two branches take different paths: one runs the write task inline on the calling thread, while the other schedules it onto the channel's event loop.
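
A toy illustration of that hazard (plain JDK code with illustrative names, nothing from the SDK): a write that is merely queued onto the event loop can be overtaken by a later write that runs inline on another thread.

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class WriteOrderingSketch {
    public static void main(String[] args) throws InterruptedException {
        // Stand-in for the channel's event loop: a single-threaded executor.
        ExecutorService eventLoop = Executors.newSingleThreadExecutor();
        // Stand-in for the order in which chunks reach the socket.
        List<String> wireOrder = Collections.synchronizedList(new ArrayList<>());

        eventLoop.execute(() -> wireOrder.add("chunk A")); // issued first, but only queued
        wireOrder.add("chunk B");                          // issued second, runs inline on this thread

        eventLoop.shutdown();
        eventLoop.awaitTermination(1, TimeUnit.SECONDS);
        System.out.println(wireOrder); // often prints [chunk B, chunk A]
    }
}

In the real client, routing every write through a single ordered path is what OrderedWriteChannelHandlerContext is meant to guarantee.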

@millems
Contributor

millems commented Jul 30, 2020

@marc-christian-schulze OrderedWriteChannelHandlerContext::doInOrder was written with the assumption that the task to be executed will always end up running on the event loop. When that assumption holds, the class behaves correctly.

I can't determine where this assumption is breaking down without a repro case, and I can't fix the issue without determining where that assumption is breaking down.

I'll give your repro case a try and see where that assumption is wrong. Thanks!

@millems
Contributor

millems commented Jul 31, 2020

As an update from today's investigation, I can reproduce the exception using @marc-christian-schulze's code, but cannot attribute it to the OrderedWriteChannelHandlerContext. Even after removing the OrderedWriteChannelHandlerContext, the test still fails.

I don't see anything obviously wrong with the publisher that would take some of the blame off the SDK, but that's difficult to tell at a glance. It might be the publisher and it might be the SDK; more research is still needed. That said, the publisher does look correct from a naive perspective (onNext is never invoked in parallel, it's invoked in the correct order, etc.).

The working theory should be that it's the SDK until proven otherwise...

@millems
Contributor

millems commented Jul 31, 2020

As a last update for today, I think the issue is that the OrderedWriteChannelHandlerContext isn't always being attached (e.g. on the second request using the same channel). The bug we're encountering is the bug that the OrderedWriteChannelHandlerContext is intended to fix. It's just not always being applied. Tomorrow, I'll see if I can fix the OrderedWriteChannelHandlerContext to always be attached, which will ideally fix the issue.

@millems
Contributor

millems commented Jul 31, 2020

Thanks to @marc-christian-schulze for the repro case, which I was able to fix with #1967. It looks like netty-reactive-streams can attach the same handler to a connection twice, so the OrderedWriteChannelHandlerContext wasn't always used for all streaming writes. This means that the race condition it is meant to fix could still occur sometimes. The fix ensures that the ordered-write handling is always applied.

millems added a commit that referenced this issue Jul 31, 2020
This fixes a cause of "S3: Data read has a different checksum than expected": #953
aws-sdk-java-automation added a commit that referenced this issue Sep 11, 2020
…27850d2c

Pull request: release <- staging/a7058983-8371-4514-a451-f9d527850d2c
@pavellikin

Hi. Unfortunately, I am still able to reproduce the issue on SDK version 2.15.11. I prepared a demo project to show the issue.

The project has three tests. The first uploads a ~100 kB array with checksum validation; this is the only green test. The other two tests upload a ~1 MB array, one with and one without checksum validation. In the test with checksum validation, I get the "Data read has a different checksum than expected" error. Without checksum validation, the bytes are uploaded to S3, but they differ from the original byte array.

It looks like the packets can still be reordered, as @marc-christian-schulze mentioned.

@millems
Copy link
Contributor

millems commented Oct 21, 2021

Self-contained repro:

S3AsyncClient s3 = S3AsyncClient.create();
String bucketKey = "some-bucket-name";

try {
    s3.createBucket(r -> r.bucket(bucketKey)).join();
} catch (Exception e) {
}

byte[] expectedOutputBytes = new byte[1_000_000];
ThreadLocalRandom.current().nextBytes(expectedOutputBytes);

for (int iteration = 1; iteration < 1000; iteration++) {
    List<byte[]> output = new ArrayList<>();
    ByteArrayInputStream bytesInput = new ByteArrayInputStream(expectedOutputBytes);
    while (true) {
        byte[] buffer = new byte[ThreadLocalRandom.current().nextInt(0, 1000)];
        int read = bytesInput.read(buffer);
        if (read < 0) {
            break;
        }
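        // Note: near the end of the stream, read() may fill only part of 'buffer', yet the whole
        // buffer is still published, so the total bytes published can exceed the declared contentLength.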

        output.add(buffer);
    }

    System.out.println("Uploading " + output.size() + " chunks");

    s3.putObject(r -> r.bucket(bucketKey).key(bucketKey).contentLength((long) expectedOutputBytes.length),
                 AsyncRequestBody.fromPublisher(Flux.fromIterable(output).map(ByteBuffer::wrap)))
      .join();

    System.out.println("Downloading " + output.size() + " chunks");

    ResponseBytes<GetObjectResponse> response =
        s3.getObject(r -> r.bucket(bucketKey).key(bucketKey),
                     AsyncResponseTransformer.toBytes()).join();

    assertThat(response.asByteArrayUnsafe()).isEqualTo(expectedOutputBytes);
}

@millems
Copy link
Contributor

millems commented Oct 22, 2021

Added a fix for my repro above with #2788. I'll keep looking into the repro from burnout171; theirs is a complicated repro that involves proxying requests to a mock S3.

@millems
Contributor

millems commented Oct 23, 2021

@burnout171's repro fails because it relies on the source publisher being a Netty service. That underlying service reuses the byte buffers passed to the SDK before the SDK reads them, so the data within the ByteBuffer is changed by the SDK caller before the SDK can read it.

This is an unusual case, so I suspect most customers will be fixed by #2788. I'm not even sure whether I want to fix @burnout171's use case, because it would require the SDK to defensively copy every byte provided by the customer. It's an extremely unusual use case, and working around it would potentially impact everyone.

Instead, I wonder whether it's possible to just warn when it happens. I'll think about it and report back. LMK if anyone here has an opinion.

@Bennett-Lynch
Contributor

Bennett-Lynch commented Oct 23, 2021

An opt-in configuration for the SDK to defensively copy all ByteBuffers might make sense, purely as a convenience option, but the general Netty guidance is to not expose pooled ByteBufs to components outside of your control, for the exact reasons above. Overall, I think we should advise users to either perform a deep-copy of pooled ByteBufs or disable ByteBuf pooling altogether.
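
A minimal sketch of that deep-copy advice, assuming the upstream is a Reactor Flux of pooled Netty ByteBufs; the bucket name, key, and data source here are illustrative, not SDK requirements:

import io.netty.buffer.ByteBuf;
import io.netty.buffer.ByteBufUtil;
import io.netty.buffer.PooledByteBufAllocator;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import reactor.core.publisher.Flux;
import software.amazon.awssdk.core.async.AsyncRequestBody;
import software.amazon.awssdk.services.s3.S3AsyncClient;

public class DeepCopyUploadSketch {
    public static void main(String[] args) {
        // Hypothetical upstream of pooled Netty buffers; a real source (e.g. another Netty
        // service) may reuse or release these buffers after handing them off.
        List<ByteBuf> pooledChunks = new ArrayList<>();
        long contentLength = 0;
        for (int i = 0; i < 4; i++) {
            ByteBuf buf = PooledByteBufAllocator.DEFAULT.buffer(16);
            buf.writeBytes(("chunk-" + i).getBytes(StandardCharsets.UTF_8));
            contentLength += buf.readableBytes();
            pooledChunks.add(buf);
        }
        long length = contentLength;

        // Deep-copy each pooled ByteBuf into an independent heap ByteBuffer before the SDK
        // sees it, then release the pooled buffer. ByteBufUtil.getBytes copies the readable bytes.
        Flux<ByteBuffer> safeBody = Flux.fromIterable(pooledChunks).map(buf -> {
            try {
                return ByteBuffer.wrap(ByteBufUtil.getBytes(buf));
            } finally {
                buf.release();
            }
        });

        try (S3AsyncClient s3 = S3AsyncClient.create()) {
            s3.putObject(r -> r.bucket("some-bucket-name").key("some-key").contentLength(length),
                         AsyncRequestBody.fromPublisher(safeBody))
              .join();
        }
    }
}

Copying via ByteBufUtil.getBytes and releasing the pooled buffer immediately means the SDK only ever sees heap buffers it owns, at the cost of one extra copy per chunk.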

millems added a commit that referenced this issue Oct 25, 2021
…length returned by the publisher. (#2788)

This fixes two potential bugs:
1. A source of "checksum mismatch" exceptions (#953) when the published data length exceeds the content-length.
2. A source of "hung futures" (#2576?) when the published data is shorter than the content-length.

This may not be the definitive fix for those two issues.
@millems
Contributor

millems commented Oct 25, 2021

I've updated the error message to indicate that internal byte buffers changing mid-request can be a cause of this, as it was in @burnout171's case. Copying the data or disabling pooling in the underlying data source is an option in that situation: #2798

This error message improvement, in addition to #2788, should put a lid on this issue. We have further error message improvements coming as well with #2796, but the primary cause (the requester specifying a shorter content-length than the data provided by the data source) should be fixed with tomorrow's release.

@millems millems closed this as completed Oct 25, 2021
@github-actions

⚠️COMMENT VISIBILITY WARNING⚠️

Comments on closed issues are hard for our team to see.
If you need more assistance, please open a new issue that references this one.
If you wish to keep having a conversation with other community members under this issue feel free to do so.
