Uncaught exception stops KinesisAsyncClient #3335
Comments
Hi @MediaMarco, apologies for taking this long to respond. We have other reports of this issue and some leads on what the cause might be, but as you said it's not easy to reproduce. We'll investigate. How often do you see the error? Does it happen intermittently, and at what rate?
Hi @debora-ito, it's been some time. I'll have to check the logs. We've built a workaround with a timeout around the whole call that retries if it is stalled.
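A minimal sketch of such a timeout-and-retry wrapper (the timeout value, retry count, and names here are assumptions for illustration, not the reporter's actual code):

import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import software.amazon.awssdk.services.kinesis.KinesisAsyncClient;
import software.amazon.awssdk.services.kinesis.model.GetRecordsRequest;
import software.amazon.awssdk.services.kinesis.model.GetRecordsResponse;

final class GetRecordsWithTimeout {
    private static final long CALL_TIMEOUT_SECONDS = 30; // hypothetical value
    private static final int MAX_ATTEMPTS = 3;           // hypothetical value

    static GetRecordsResponse getRecords(KinesisAsyncClient client, String shardIterator) {
        for (int attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
            CompletableFuture<GetRecordsResponse> future = client.getRecords(
                    GetRecordsRequest.builder().shardIterator(shardIterator).build());
            try {
                // orTimeout fails the future if the SDK never completes it,
                // so a stalled call cannot block the caller forever.
                return future.orTimeout(CALL_TIMEOUT_SECONDS, TimeUnit.SECONDS).join();
            } catch (CompletionException e) {
                if (e.getCause() instanceof TimeoutException && attempt < MAX_ATTEMPTS) {
                    continue; // the call stalled; retry
                }
                throw e;
            }
        }
        throw new IllegalStateException("unreachable: loop always returns or throws");
    }
}

Note that Kinesis shard iterators expire after about five minutes, so a production version would likely refresh the iterator before retrying.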
Hello folks! I'm having a similar issue, but with S3AsyncClient. When we do a get operation, the process stalls when we receive that message. Do you want me to create another issue? I'm using version 2.17.204.
@Maxranderson-itv can you reproduce it reliably? If so, can you share repro code and the stack trace with the error?
Here is the stacktrace. Let me see how I can reproduce it; I'll tell you as soon as possible.
Hi @MediaMarco and @Maxranderson-itv, can you share how you're configuring your SDK clients? @Maxranderson-itv, does your …
Sure, the config is below. For the …
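For reference, a typical S3AsyncClient setup with an explicitly tuned Netty HTTP client looks like this (every value below is an illustrative assumption, not the configuration from this thread):

import java.time.Duration;
import software.amazon.awssdk.http.nio.netty.NettyNioAsyncHttpClient;
import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.s3.S3AsyncClient;

final class S3ClientFactory {
    // Hypothetical configuration for illustration only.
    static S3AsyncClient buildClient() {
        return S3AsyncClient.builder()
                .region(Region.US_EAST_1)
                .httpClientBuilder(NettyNioAsyncHttpClient.builder()
                        .maxConcurrency(50)
                        .connectionTimeout(Duration.ofSeconds(10))
                        .readTimeout(Duration.ofSeconds(30)))
                .build();
    }
}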
@debora-ito, sorry for not giving you feedback earlier. Unfortunately, I still haven't had time to reproduce the issue, and it is not frequent in our production environment. We took the same approach to fix it, wrapping the code with a timeout.
@MediaMarco @Maxranderson-itv I was not able to precisely reproduce this bug (could not get …). A fix for it was released in …
Cool @dagnir! Thank you very much! I'll try to update the version and check if the issue disappears.
Marking this to auto-close soon. Let us know if you have any comments.
Describe the bug
We use the KinesisAsyncClient to read messages from Kinesis like this:
GetRecordsResponse response = kinesisClient.getRecords(GetRecordsRequest.builder().shardIterator(id).build()).join();
This worked fine for ages, but since SDK version 2.17.115 the reader sometimes stalls and the join() blocks forever.
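For context, a self-contained sketch of such a read loop (the stream and shard names are placeholder assumptions; a real reader would also throttle between calls):

import software.amazon.awssdk.services.kinesis.KinesisAsyncClient;
import software.amazon.awssdk.services.kinesis.model.GetRecordsRequest;
import software.amazon.awssdk.services.kinesis.model.GetRecordsResponse;
import software.amazon.awssdk.services.kinesis.model.GetShardIteratorRequest;
import software.amazon.awssdk.services.kinesis.model.ShardIteratorType;

public class ShardReader {
    public static void main(String[] args) {
        KinesisAsyncClient kinesisClient = KinesisAsyncClient.create();
        // Placeholder stream/shard names for illustration.
        String id = kinesisClient.getShardIterator(GetShardIteratorRequest.builder()
                .streamName("my-stream")
                .shardId("shardId-000000000000")
                .shardIteratorType(ShardIteratorType.TRIM_HORIZON)
                .build()).join().shardIterator();
        while (id != null) {
            // This join() is the call that occasionally never returns.
            GetRecordsResponse response = kinesisClient.getRecords(
                    GetRecordsRequest.builder().shardIterator(id).build()).join();
            response.records().forEach(r -> System.out.println(r.sequenceNumber()));
            id = response.nextShardIterator();
        }
    }
}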
It seems to have something to do with this change:
Expected Behavior
The CompletableFuture returned by GetRecords should complete exceptionally when the Netty I/O error occurs, so that the error can be handled gracefully in application code.
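With that behavior, the failure could be handled explicitly instead of hanging, for example (a sketch; handleIoFailure and process are hypothetical application callbacks):

kinesisClient.getRecords(GetRecordsRequest.builder().shardIterator(id).build())
        .whenComplete((response, error) -> {
            if (error != null) {
                handleIoFailure(error); // e.g. log, fetch a fresh shard iterator, retry
            } else {
                process(response);
            }
        });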
Current Behavior
kinesisClient.getRecords(GetRecordsRequest.builder().shardIterator(id).build()).join();
blocks forever.
When the blocking occurs, we see the following exception in our logs and nothing else:
Message:
An exceptionCaught() event was fired, and it reached at the tail of the pipeline. It usually means the last handler in the pipeline did not handle the exception.
Logger:
io.netty.channel.DefaultChannelPipeline
Thread:
aws-java-sdk-NettyEventLoop-3-7
Stack Trace:
software.amazon.awssdk.http.nio.netty.internal.FutureCancelledException: java.io.IOException: Response had content-length of 25 bytes, but only received 0 bytes before the connection was closed.
    at software.amazon.awssdk.http.nio.netty.internal.NettyRequestExecutor.lambda$null$3(NettyRequestExecutor.java:136)
    at io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98)
    at io.netty.util.concurrent.PromiseTask.run(PromiseTask.java:106)
    at io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:174)
    at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:167)
    at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:470)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:503)
    at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: java.io.IOException: Response had content-length of 25 bytes, but only received 0 bytes before the connection was closed.
    at software.amazon.awssdk.http.nio.netty.internal.ResponseHandler.validateResponseContentLength(ResponseHandler.java:163)
    at software.amazon.awssdk.http.nio.netty.internal.ResponseHandler.access$700(ResponseHandler.java:75)
    at software.amazon.awssdk.http.nio.netty.internal.ResponseHandler$PublisherAdapter$1.onComplete(ResponseHandler.java:369)
    at software.amazon.awssdk.http.nio.netty.internal.nrs.HandlerPublisher.complete(HandlerPublisher.java:447)
    at software.amazon.awssdk.http.nio.netty.internal.nrs.HandlerPublisher.channelInactive(HandlerPublisher.java:430)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:241)
    at io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:81)
    at io.netty.handler.timeout.IdleStateHandler.channelInactive(IdleStateHandler.java:277)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:241)
    at io.netty.channel.DefaultChannelPipeline$HeadContext.channelInactive(DefaultChannelPipeline.java:1405)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248)
    at io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:901)
    at io.netty.handler.codec.http2.AbstractHttp2StreamChannel$Http2ChannelUnsafe$2.run(AbstractHttp2StreamChannel.java:737)
    ... 7 common frames omitted
Reproduction Steps
Read a large volume of records from Kinesis using the KinesisAsyncClient's getRecords() method.
It's hard to reproduce; the issue occurs only when the content-length check fails.
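One way to make the content-length check fail deterministically (a hypothetical local harness, not part of the original report) is a raw socket server that advertises a body it never sends, with the SDK client pointed at it via endpointOverride. Over HTTP/1.1 this should surface the same IOException; the report above shows the error on an HTTP/2 stream, so whether the future also hangs in this setup is untested:

import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical harness: advertises 25 body bytes, sends none, then closes.
public class TruncatingServer {
    public static void main(String[] args) throws Exception {
        try (ServerSocket server = new ServerSocket(8080)) {
            while (true) {
                try (Socket socket = server.accept()) {
                    socket.getInputStream().read(new byte[4096]); // best-effort read of the request head
                    OutputStream out = socket.getOutputStream();
                    out.write(("HTTP/1.1 200 OK\r\n"
                            + "Content-Length: 25\r\n"
                            + "\r\n").getBytes(StandardCharsets.US_ASCII));
                    out.flush();
                } // closing here leaves 0 of 25 body bytes delivered
            }
        }
    }
}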
Possible Solution
Let the CompletableFuture complete exceptionally, as it does when other I/O errors occur.
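Conceptually (an illustration only, not the SDK's actual internals), the content-length validation would fail the pending response future rather than only raising a channel event:

import java.io.IOException;
import java.util.concurrent.CompletableFuture;

final class ContentLengthCheck {
    // Sketch: fail the caller-visible future on a short read.
    static void validateContentLength(CompletableFuture<?> responseFuture,
                                      long expectedBytes, long receivedBytes) {
        if (receivedBytes < expectedBytes) {
            responseFuture.completeExceptionally(new IOException(
                    "Response had content-length of " + expectedBytes
                    + " bytes, but only received " + receivedBytes
                    + " bytes before the connection was closed."));
        }
    }
}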
Additional Information/Context
No response
AWS Java SDK version used
2.17.233
JDK version used
openjdk version "18.0.1" 2022-04-19
OpenJDK Runtime Environment Temurin-18.0.1+10 (build 18.0.1+10)
OpenJDK 64-Bit Server VM Temurin-18.0.1+10 (build 18.0.1+10, mixed mode, sharing)
Operating System and version
Ubuntu 22.04.1 LTS