[BUG] EventHubConsumerClient.GetPartitionPropertiesAsync throws InvalidOperationException from AMQP library instead of retrying #46525

Closed
tomas-pajurek opened this issue Oct 10, 2024 · 6 comments
Labels: bug, Client, customer-reported, Event Hubs, issue-addressed

@tomas-pajurek
Contributor

Library name and version

Azure.Messaging.EventHubs 5.11.5

Describe the bug

When calling the GetPartitionPropertiesAsync method on EventHubConsumerClient, an InvalidOperationException from the Microsoft.Azure.Amqp NuGet package is thrown with the following message:

"Can't create session when the connection is closing"

I am fairly confident that our code does not close the EventHubConsumerClient or its EventHubConnection prematurely. Two things support this:

  1. Other methods called concurrently on the same EventHubConsumerClient, and other clients sharing the same EventHubConnection, do not throw at the same time.

  2. Historically, issues in the Azure Event Hubs and Service Bus SDKs have manifested in the same way.

In both cases, the conclusion was that the thrown InvalidOperationException should be treated as a transient exception. Both issues were resolved by catching this exception in the client SDK and wrapping it in a retriable client exception (EventHubsException, in this case). An example of such a fix: https://github.com/Azure/azure-sdk-for-net/pull/15984/files

Stack trace:

   at Microsoft.Azure.Amqp.AmqpConnection.AddSession(AmqpSession session, Nullable`1 channel)
   at Microsoft.Azure.Amqp.AmqpConnection.CreateSession(AmqpSessionSettings sessionSettings)
   at Azure.Messaging.EventHubs.Amqp.AmqpConnectionScope.CreateManagementLinkAsync(AmqpConnection connection, TimeSpan operationTimeout, TimeSpan linkTimeout, CancellationToken cancellationToken)
   at Azure.Messaging.EventHubs.Amqp.AmqpConnectionScope.OpenManagementLinkAsync(TimeSpan operationTimeout, TimeSpan linkTimeout, CancellationToken cancellationToken)
   at Microsoft.Azure.Amqp.FaultTolerantAmqpObject`1.OnCreateAsync(TimeSpan timeout, CancellationToken cancellationToken)
   at Microsoft.Azure.Amqp.Singleton`1.GetOrCreateAsync(TimeSpan timeout, CancellationToken cancellationToken)
   at Microsoft.Azure.Amqp.Singleton`1.GetOrCreateAsync(TimeSpan timeout, CancellationToken cancellationToken)
   at Azure.Messaging.EventHubs.Amqp.AmqpClient.GetPartitionPropertiesAsync(String partitionId, EventHubsRetryPolicy retryPolicy, CancellationToken cancellationToken)
   at Azure.Messaging.EventHubs.Amqp.AmqpClient.GetPartitionPropertiesAsync(String partitionId, EventHubsRetryPolicy retryPolicy, CancellationToken cancellationToken)
   at Azure.Messaging.EventHubs.EventHubConnection.GetPartitionPropertiesAsync(String partitionId, EventHubsRetryPolicy retryPolicy, CancellationToken cancellationToken)
   at Azure.Messaging.EventHubs.Consumer.EventHubConsumerClient.GetPartitionPropertiesAsync(String partitionId, CancellationToken cancellationToken)
   at [redacted]

Expected behavior

The InvalidOperationException should be wrapped in an EventHubsException { IsTransient = true }, which the existing retry policies within the client SDK would then retry.
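A minimal, language-neutral sketch of this translate-then-retry pattern, written in Python for illustration only. It is not the SDK's code: `InvalidOperationError`, `EventHubsException`, `open_management_link`, and `run_with_retry` are hypothetical stand-ins for the .NET exception types, the management-link creation path, and the SDK's retry policy, and the retry count and delay are arbitrary assumptions.

```python
import time


class InvalidOperationError(Exception):
    """Hypothetical stand-in for the AMQP library's InvalidOperationException."""


class EventHubsException(Exception):
    """Hypothetical stand-in for EventHubsException with an IsTransient flag."""

    def __init__(self, message, is_transient):
        super().__init__(message)
        self.is_transient = is_transient


def open_management_link(create_session):
    """Create a session, translating a 'connection is closing' failure into a transient error."""
    try:
        return create_session()
    except InvalidOperationError as ex:
        # The connection was closing when the session was requested; a later
        # attempt may succeed once the connection is re-established, so the
        # failure is reported as transient instead of surfacing to the caller.
        raise EventHubsException(str(ex), is_transient=True) from ex


def run_with_retry(operation, max_retries=3, delay_seconds=0.0):
    """Retry an operation on transient EventHubsException, up to max_retries extra attempts."""
    attempt = 0
    while True:
        try:
            return operation()
        except EventHubsException as ex:
            # Non-transient errors, or exhausted retries, propagate unchanged.
            if not ex.is_transient or attempt >= max_retries:
                raise
            attempt += 1
            time.sleep(delay_seconds)


# Usage: the first session attempt hits a closing connection; the second succeeds.
attempts = []


def flaky_create_session():
    attempts.append(1)
    if len(attempts) == 1:
        raise InvalidOperationError("Can't create session when the connection is closing")
    return "management-link"


result = run_with_retry(lambda: open_management_link(flaky_create_session))
```

The point of the sketch is that only the exception raised while creating the session is translated; once it is marked transient, the generic retry loop needs no knowledge of AMQP specifics.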

Actual behavior

The InvalidOperationException propagates out of the client SDK into user code without being retried.

Reproduction Steps

I am not able to reproduce the error deterministically, but it occurs every few days in our stream-processing service running in AKS, which periodically calls the problematic GetPartitionPropertiesAsync as well as PartitionReceiver.ReceiveBatchAsync (which never throws this exception).

Environment

  • .NET 8.
  • Azure Kubernetes Service.
  • Event Hubs Standard tier.
@github-actions bot added the Client, customer-reported, Event Hubs, needs-team-attention, and question labels Oct 10, 2024

Thank you for your feedback. Tagging and routing to the team member best able to assist.

@jsquire
Member

jsquire commented Oct 10, 2024

Hi @tomas-pajurek. Thanks for reaching out and we regret that you're experiencing difficulties. There is special handling in the Azure.Messaging.EventHubs library for this scenario which translates the InvalidOperationException into a transient EventHubsException so that it can be retried. (src)

It looks like we applied that logic to producer and consumer links and did not apply the same pattern to management links. This is confirmed as a bug; we'll get this fixed up.

@jsquire added the bug label and removed the question label Oct 10, 2024
@jsquire added this to the 2024-11 milestone Oct 10, 2024
@jsquire moved this to In Progress in Azure SDK for Event Hubs Oct 10, 2024
@jsquire
Member

jsquire commented Oct 10, 2024

fixed by #46544

@jsquire
Member

jsquire commented Oct 11, 2024

The fix has merged and will be included in our November release.

@jsquire added the issue-addressed label Oct 11, 2024

Hi @tomas-pajurek. Thank you for opening this issue and giving us the opportunity to assist. We believe that this has been addressed. If you feel that further discussion is needed, please add a comment with the text "/unresolve" to remove the "issue-addressed" label and continue the conversation.

@github-actions bot removed the needs-team-attention label Oct 11, 2024

Hi @tomas-pajurek, since you haven’t asked that we /unresolve the issue, we’ll close this out. If you believe further discussion is needed, please add a comment /unresolve to reopen the issue.

@github-project-automation bot moved this from In Progress to Done in Azure SDK for Event Hubs Oct 18, 2024
@github-actions bot locked and limited conversation to collaborators Jan 16, 2025