Fixing a race condition in EnrichCoordinatorProxyAction that can leave an item stuck in its queue #90688
Pinging @elastic/es-data-management (Team:Data Management)
Hi @masseyke, I've created a changelog YAML for you.
This PR causes a few more loops in the code, but I don't think it will be a noticeable performance hit -- the additional loops are rare and fast. I ran the test ( |
@elasticmachine update branch
LGTM, fixed a small typo is all.
...enrich/src/main/java/org/elasticsearch/xpack/enrich/action/EnrichCoordinatorProxyAction.java
…ich/action/EnrichCoordinatorProxyAction.java Co-authored-by: James Baiera <[email protected]>
LGTM
* main: (150 commits) Remove ToXContent interface from ChunkedToXContent (elastic#90409) Remove extra SearchService constructor (elastic#90733) Update min version for the diagnosis yaml test (elastic#90731) Use the AggTestConfig object in testCase (elastic#90699) [DOCS] Add links to clear trained model deployment cache API (elastic#90727) Assert wildcards are not expanded as specified by request options (elastic#90641) [TEST] Fix exit snapshot restore exit condition (elastic#90696) [TEST] Change to atomic file contents save (elastic#90695) Update forbiddenapis to 3.4 (elastic#90624) [Tests] Don't use concurrent search in scripted field type tests (elastic#90712) [ML] Move scaling is possible check for starting trained model (elastic#90706) Add new base test case for chunked xcontent types (elastic#90707) Fix testRedNoBlockedIndicesAndRedAllRoleNodes (elastic#90671) Fix nullpointer in docs test setup (elastic#90660) Don't produce build logs artifact when in a composite build Fixing a race condition in EnrichCoordinatorProxyAction that can leave an item stuck in its queue (elastic#90688) docs: update fleet/agent pipeline docs (elastic#90659) [HealthAPI] Use plural consistently in resource types (elastic#90682) [Testing] Enable bwc and fix sorting for 500_date_range (elastic#90681) Add profiling and documentation for dfs phase (elastic#90536) ... # Conflicts: # x-pack/plugin/mapper-aggregate-metric/src/test/java/org/elasticsearch/xpack/aggregatemetric/mapper/AggregateDoubleMetricFieldMapperTests.java
There is a race condition in EnrichCoordinatorProxyAction that can result in an item being stuck in its queue even once all threads related to any `schedule()` calls have completed. The item will be flushed out on the next call to `schedule()`, but there is no guarantee if or when that will happen. This PR adds an additional check for orphaned items in the queue.

Here's what I believe is happening (I can only reproduce it in fewer than 1 in 10,000 tries so I don't have direct evidence):
(Note that there are actually more threads than just the 2 I mention since coordinateLookups() makes an async call back to itself)
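
For readers who want to see the shape of this kind of race, here is a minimal sketch. It is not the Elasticsearch implementation: `OrphanCheckSketch`, the single-permit `Semaphore`, the `beforeRelease` hook, and the `recheck` flag are all hypothetical names invented for illustration. The hook stands in for a second thread whose `schedule()` call lands after the first caller has drained the queue but before it has released its permit, so the interleaving can be replayed deterministically on one thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Semaphore;

// Hypothetical, simplified stand-in for a queue-draining coordinator.
// It is NOT the real EnrichCoordinatorProxyAction code.
class OrphanCheckSketch {
    final Queue<String> queue = new ConcurrentLinkedQueue<>();
    // One permit: at most one caller drains the queue at a time.
    final Semaphore permit = new Semaphore(1);
    // Only touched while holding the permit.
    final List<String> processed = new ArrayList<>();
    // Illustration hook: runs once, after draining but before the permit
    // is released, to simulate another thread's schedule() call arriving
    // in that window.
    Runnable beforeRelease = () -> {};

    void schedule(String item, boolean recheck) {
        queue.add(item);
        coordinate(recheck);
    }

    void coordinate(boolean recheck) {
        do {
            if (permit.tryAcquire() == false) {
                // Someone else holds the permit; we rely on them to drain.
                return;
            }
            try {
                String item;
                while ((item = queue.poll()) != null) {
                    processed.add(item);
                }
                Runnable hook = beforeRelease;
                beforeRelease = () -> {};
                hook.run(); // the "other thread" enqueues here and gives up
            } finally {
                permit.release();
            }
            // With recheck == true, look at the queue again after releasing
            // the permit so an item enqueued in the window is not orphaned.
        } while (recheck && queue.isEmpty() == false);
    }

    // Replays the interleaving and returns how many items are left stuck.
    static int stuckAfterRace(boolean recheck) {
        OrphanCheckSketch c = new OrphanCheckSketch();
        c.beforeRelease = () -> c.schedule("late", recheck);
        c.schedule("first", recheck);
        return c.queue.size();
    }

    public static void main(String[] args) {
        System.out.println("without recheck, stuck items: " + stuckAfterRace(false));
        System.out.println("with recheck, stuck items: " + stuckAfterRace(true));
    }
}
```

In this sketch the extra pass through the loop only happens when an item arrives in the narrow window between draining and releasing, which matches the observation in the comments above that the additional loops are rare and fast.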
Closes #90598