-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ChangeFeedProcessorFix - IllegalArgumentException #34618
Conversation
API change check API changes are not detected in this pull request. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks @xinlian12
...java/com/azure/cosmos/implementation/changefeed/common/EqualPartitionsBalancingStrategy.java
Outdated
Show resolved
Hide resolved
...in/java/com/azure/cosmos/implementation/changefeed/epkversion/PartitionLoadBalancerImpl.java
Show resolved
Hide resolved
...in/java/com/azure/cosmos/implementation/changefeed/epkversion/PartitionLoadBalancerImpl.java
Show resolved
Hide resolved
...ain/java/com/azure/cosmos/implementation/changefeed/pkversion/PartitionLoadBalancerImpl.java
Show resolved
Hide resolved
...ain/java/com/azure/cosmos/implementation/changefeed/pkversion/PartitionLoadBalancerImpl.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wait, missing changelog entry :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM - thanks!
@xinlian12 per our discussion, I do not see the splitting case addressed here when we detect "maxScaleCount" > 0. |
|
...c/main/java/com/azure/cosmos/implementation/changefeed/epkversion/LeaseStoreManagerImpl.java
Outdated
Show resolved
Hide resolved
...azure-cosmos/src/main/java/com/azure/cosmos/implementation/changefeed/LeaseStoreManager.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should expand our E2E tests to include this maxScaleCount case where the negative partitionNeededForMe was an issue.
You can start with a feed container of at least 3 partitions and one instance of CFP (max scale count will be 3). Then you stop and start another instance with exact same name but smaller maxScaleCount. At the same time have another instance that will be started as well. At the end you want to check that leases are equally distributed and whatever documents are fed into the feed container are accounted for (see https://github.com/Azure/azure-sdk-for-java/blob/main/sdk/cosmos/azure-cosmos-tests/src/test/java/com/azure/cosmos/rx/changefeed/pkversion/IncrementalChangeFeedProcessorTest.java#L939).
Added tests.
|
/azp run java - cosmos - tests |
Azure Pipelines successfully started running 1 pipeline(s). |
/azp run java - cosmos - tests |
Azure Pipelines successfully started running 1 pipeline(s). |
/azp run java - cosmos - tests |
Azure Pipelines successfully started running 1 pipeline(s). |
/azp run java - cosmos - tests |
Azure Pipelines successfully started running 1 pipeline(s). |
/azp run java - cosmos - tests |
Azure Pipelines successfully started running 1 pipeline(s). |
/azp run java - cosmos - tests |
Azure Pipelines successfully started running 1 pipeline(s). |
/azp run java - cosmos - tests |
Azure Pipelines successfully started running 1 pipeline(s). |
/azp run java - cosmos - tests |
Azure Pipelines successfully started running 1 pipeline(s). |
/azp run java - cosmos - tests |
Azure Pipelines successfully started running 1 pipeline(s). |
Issue:
when customer configure ChangeFeedProcessor to use maxScaleCount, and when split happens, there is a possibility each changeFeedInstance will own more than maxScaleCount. An IllegalArgumentException can happen during load balancing stage when maxScaleCount is being configured. And then due to the exception, the configured AcquiredInterval will be ignored, which can cause request spike on the lease container.
Fix:
EqualPartitionsBalancingStrategyIf
- if it ever happens that the change feed processor instance owns more than maxScaleCount leases, no illegalArgumentException will be thrownPartitionLoadBalancerImpl
- theacquireInterval
will be used even for error casesPartitionLoadBalancerImpl
- skip child leases assignment if maxScaleCount > 0