[BUG] System.InvalidOperationException at Azure.Messaging.EventHubs.EventProcessorClient+<OnInitializingPartitionAsync>d__60.MoveNext #12466
Comments
Hi @petrformanek. We're sorry that you're experiencing difficulties and appreciate you reaching out to let us know. This isn't one that we've seen before, but it looks like a race condition at first glance. The good news is that this would trigger the retry and recovery paths and should not have an impact on the overall health of the processor. In the worst case, I would expect to see a small delay in this partition being processed. I'd definitely like to take a deeper look at this, and would appreciate any additional context or insight that you're able to provide. Some things that may be helpful:
I'd also welcome any additional insight that you think may be relevant. Again, thank you for taking the time to report this and for your help in gathering information for the investigation.
Hi @jsquire. Thank you for the quick response. As you wrote, the processor health was OK: it was able to process events on the other partitions, because they were already initialized and had been processing events for some time. Answers to your points:
public virtual Task PartitionInitializingAsync(PartitionInitializingEventArgs arg)
{
    if (arg.CancellationToken.IsCancellationRequested)
    {
        return Task.CompletedTask;
    }

    arg.DefaultStartingPosition = EventPosition.Latest;
    _logger.LogInformation($"Initializing partition {arg.PartitionId}");
    return Task.CompletedTask;
}
Thanks, @petrformanek. I appreciate the additional context. We'll start investigating.
The focus of these changes is a set of small refactorings and tweaks, mostly unrelated to any core theme. Included are:
- Refactoring of the processor blob storage manager, mostly changes to formatting to help improve readability.
- Refactoring the initialization of starting position for partitions into a concurrent dictionary to resolve a potential race condition (Azure#12466)
- Fixing timing on some track one tests for test-infrastructure. (Azure#12874)
- Ignoring tests with intermittent failures (Azure#12929, Azure#12930)
- Enhancing log information for when the Event Processor starts processing a partition to contain the starting position used.
- Enhancing log information for the core service operations (send/receive) to group the start/error/end events together with an operation identifier and report the total number of retries used when interacting with the service.
Apologies for the delay and, again, thanks for calling attention to this. We introduced a race condition by not using a concurrent set where we should have for tracking the partitions being processed. This was fixed by #12928 and will be included in our next release. I'm going to close this out as resolved. Please feel free to reopen if you believe that further discussion is needed.
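For readers who want to see the general idea behind this kind of fix, here is a minimal sketch of tracking partitions in a ConcurrentDictionary so that concurrent initialization attempts cannot corrupt the collection. It is only an illustration under stated assumptions: the class name, the TryClaim and Release helpers, and the use of a long as the starting position are all hypothetical and are not taken from the EventProcessorClient source or from #12928.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

public static class PartitionTrackingSketch
{
    // Hypothetical stand-in for the processor's bookkeeping (partition id -> starting position).
    // This is NOT the library's actual field or type.
    private static readonly ConcurrentDictionary<string, long> StartingPositions =
        new ConcurrentDictionary<string, long>();

    // TryAdd is atomic: only one caller per partition id gets true, and no caller throws.
    public static bool TryClaim(string partitionId, long startingPosition) =>
        StartingPositions.TryAdd(partitionId, startingPosition);

    // Removes the partition from tracking so that it can be claimed again later.
    public static void Release(string partitionId) =>
        StartingPositions.TryRemove(partitionId, out _);

    public static async Task Main()
    {
        // Simulate many concurrent attempts to initialize the same four partitions.
        var attempts = Enumerable.Range(0, 1000)
            .Select(i => Task.Run(() => TryClaim((i % 4).ToString(), i)));

        bool[] results = await Task.WhenAll(attempts);

        // Report how many claims succeeded and how many partitions are tracked.
        Console.WriteLine($"claimed: {results.Count(won => won)}, tracked: {StartingPositions.Count}");
    }
}
```

Doing the same with a plain HashSet or Dictionary from several threads is not safe; unsynchronized concurrent writes can throw or leave the collection in an inconsistent state, which is the general class of problem described above.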
Description of the bug
Thread concurrency issue found in logs.
Expected behavior
No exceptions thrown.
Actual behavior (include Exception or Stack Trace)
To Reproduce
Hard to reproduce.
Environment: