Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ChangeFeedProcessorFix - IllegalArgumentException #34618

Merged
merged 18 commits into from
Apr 30, 2023

Conversation

xinlian12
Copy link
Member

@xinlian12 xinlian12 commented Apr 24, 2023

Issue:
when customer configure ChangeFeedProcessor to use maxScaleCount, and when split happens, there is a possibility each changeFeedInstance will own more than maxScaleCount. An IllegalArgumentException can happen during load balancing stage when maxScaleCount is being configured. And then due to the exception, the configured AcquiredInterval will be ignored, which can cause request spike on the lease container.

Fix:

  • EqualPartitionsBalancingStrategyIf - if it ever happens that the change feed processor instance owns more than maxScaleCount leases, no illegalArgumentException will be thrown
  • PartitionLoadBalancerImpl - the acquireInterval will be used even for error cases
  • PartitionLoadBalancerImpl - skip child leases assignment if maxScaleCount > 0

@azure-sdk
Copy link
Collaborator

API change check

API changes are not detected in this pull request.

Copy link
Member

@kushagraThapar kushagraThapar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @xinlian12

Copy link
Member

@kushagraThapar kushagraThapar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait, missing changelog entry :)

Copy link
Member

@FabianMeiswinkel FabianMeiswinkel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - thanks!

@milismsft
Copy link
Contributor

@xinlian12 per our discussion, I do not see the splitting case addressed here when we detect "maxScaleCount" > 0.

@xinlian12
Copy link
Member Author

@xinlian12 per our discussion, I do not see the splitting case addressed here when we detect "maxScaleCount" > 0.
no, it does not. That change will come in a second PR - to always honor the maxScaleCount

@xinlian12 xinlian12 closed this Apr 24, 2023
@xinlian12 xinlian12 reopened this Apr 25, 2023
Copy link
Contributor

@milismsft milismsft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should expand our E2E tests to include this maxScaleCount case where the negative partitionNeededForMe was an issue.
You can start with a feed container of at least 3 partitions and one instance of CFP (max scale count will be 3). Then you stop and start another instance with exact same name but smaller maxScaleCount. At the same time have another instance that will be started as well. At the end you want to check that leases are equally distributed and whatever documents are fed into the feed container are accounted for (see https://github.com/Azure/azure-sdk-for-java/blob/main/sdk/cosmos/azure-cosmos-tests/src/test/java/com/azure/cosmos/rx/changefeed/pkversion/IncrementalChangeFeedProcessorTest.java#L939).

@xinlian12
Copy link
Member Author

We should expand our E2E tests to include this maxScaleCount case where the negative partitionNeededForMe was an issue. You can start with a feed container of at least 3 partitions and one instance of CFP (max scale count will be 3). Then you stop and start another instance with exact same name but smaller maxScaleCount. At the same time have another instance that will be started as well. At the end you want to check that leases are equally distributed and whatever documents are fed into the feed container are accounted for (see https://github.com/Azure/azure-sdk-for-java/blob/main/sdk/cosmos/azure-cosmos-tests/src/test/java/com/azure/cosmos/rx/changefeed/pkversion/IncrementalChangeFeedProcessorTest.java#L939).

Added tests.
What has included in the test:

  1. Create a changeFeedProcessor with maxScaleCount = partitionCountBeforeSplit
  2. write items, and validate the change feed processor reads all documents
  3. trigger a split
  4. write items
  5. validate only partitionCountBeforeSplit leases have owner
  6. Start another change feed processor
  7. Validate all documents are processed

@xinlian12
Copy link
Member Author

/azp run java - cosmos - tests

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@xinlian12
Copy link
Member Author

/azp run java - cosmos - tests

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@xinlian12
Copy link
Member Author

/azp run java - cosmos - tests

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@xinlian12
Copy link
Member Author

/azp run java - cosmos - tests

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@xinlian12
Copy link
Member Author

/azp run java - cosmos - tests

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@xinlian12
Copy link
Member Author

/azp run java - cosmos - tests

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@xinlian12
Copy link
Member Author

/azp run java - cosmos - tests

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@xinlian12
Copy link
Member Author

/azp run java - cosmos - tests

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@xinlian12
Copy link
Member Author

/azp run java - cosmos - tests

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@xinlian12 xinlian12 merged commit 0a831e6 into Azure:main Apr 30, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants