[CI] Incorrect watch count in watcher stats api in tests #52453
Comments
Pinging @elastic/es-core-features (:Core/Features/Watcher)
Add the watch to the trigger service after the index operation has succeeded, instead of adding it before the actual index operation has been performed at the shard level. This logic is simpler to reason about when a failure occurs during the execution of an index operation at the shard level. Relates to elastic#52453. I don't think this fixes it, but it makes it easier to debug.
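The ordering described in this commit message can be sketched as follows. This is only an illustration under stated assumptions: `TriggerRegistry`, `IndexResult`, and `postIndex` are hypothetical stand-ins, not the actual Watcher classes or listener signatures.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class PostIndexRegistration {

    /** Hypothetical stand-in for the trigger service: tracks which watch ids are scheduled. */
    static final class TriggerRegistry {
        private final Map<String, String> scheduled = new ConcurrentHashMap<>();

        void add(String watchId, String source) {
            scheduled.put(watchId, source);
        }

        int count() {
            return scheduled.size();
        }
    }

    /** Hypothetical outcome of a shard-level index operation. */
    static final class IndexResult {
        final boolean success;

        IndexResult(boolean success) {
            this.success = success;
        }
    }

    /**
     * Called after the shard-level index operation has completed.
     * The watch is registered only when the operation succeeded, so a
     * failed index operation never leaves a stale entry behind.
     */
    static void postIndex(TriggerRegistry registry, String watchId, String source, IndexResult result) {
        if (result.success) {
            registry.add(watchId, source);
        }
    }
}
```

With this ordering there is nothing to roll back when the index operation fails, which is what makes the failure case easier to reason about.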
Reported by @dakrone in #33326:
I was unable to reproduce this on the 7.x branch. Failure:
This (^) is another test that failed because incorrect stats counts are reported. I suspect the main cause of these failures is that watcher is not fully started on all shard instances that it serves watches from. More specifically the
I want to see how these tests respond to #52627. Otherwise I think we should investigate changing the watcher put and delete APIs to wait for the watch to be added to the trigger service before returning a response. Tests assume that this always happens, but that is not the case. In the meantime, specific tests can be muted.
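Until the APIs wait for the trigger service, a test can poll for the expected state instead of asserting it immediately after the put or delete call returns. A minimal sketch of such a polling helper, assuming a hypothetical `waitUntil` utility (this is not an existing Elasticsearch method; the condition passed in would be whatever check the test makes against the stats response):

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

public class WaitForCondition {

    /**
     * Polls the condition with a capped exponential back-off until it holds
     * or the timeout elapses. Returns true if the condition was observed to
     * hold, false on timeout or if the calling thread was interrupted.
     */
    static boolean waitUntil(BooleanSupplier condition, long timeout, TimeUnit unit) {
        final long deadline = System.nanoTime() + unit.toNanos(timeout);
        long sleepMillis = 1;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                // Preserve the interrupt flag for the caller and give up.
                Thread.currentThread().interrupt();
                return false;
            }
            sleepMillis = Math.min(sleepMillis * 2, 100);
        }
    }
}
```

A test would pass in a condition that re-reads the watcher stats and compares the reported watch count with the expected value, failing only if `waitUntil` returns false.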
Backport of #52627: add the watch to the trigger service after the index operation has succeeded, instead of before the index operation has been performed at the shard level. Relates to #52453.
There was a failure today related to #33326, which this issue appears to replace.
There have been no instances of these failures since May 11. (There are a couple of SSL failures in a FIPS container ... but that is not what this issue is about.)
SmokeTestWatcherTestSuiteIT Failure:
The failure matches recent failures reported in #32299. The fix in #51466 did not stop this test from failing.
The test has failed a few times now and needs to be re-investigated.
Build failures:
WatchAckTests.testAckAllActions failure:
Build log: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+7.x+multijob+fast+part2/3568/console
Build scan: https://gradle-enterprise.elastic.co/s/ua3yon2njbyja
Failure:
Reproduce with:
Can't reproduce locally.