Compatibility with segment replication #407
Request owners to add
-> First, we need to check whether we use
The only use of any Line 90 in 0132436
That REST API is tested here, but it doesn't rely on checking replicas: Lines 50 to 62 in 0132436
So I wouldn't expect any failures in the first two steps.
#407 (comment): "Before verifying this step, make sure this PR: opensearch-project/OpenSearch#8200 is merged. (will be merged soon)." @Rishikesh1159, PR #8200 is not merged yet, and today is 7/6. Can you please get this PR merged today? cc @anasalkouz
It looks like many of the tests in LockServiceIT exercise updates to the system index for creating or deleting locks, in particular a multi-threaded test. Because these locks must be replicated to be of any use, I expect this test would likely fail (at least randomly, some of the time), and the action needed on this issue is to change the lock-acquiring code in
However, the above strategy could have a performance impact, particularly for plugins that use the lock service (I'm thinking of real-time HCAD, @kaituo):
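To make the concern about concurrent lock acquisition concrete, here is a toy, in-memory model of the kind of race a multi-threaded lock test exercises. It is not the Job Scheduler `LockService`, which is Java and talks to the lock index. The compare-and-set on a per-lock sequence number is an assumption chosen for illustration, and all names (`ToyLockIndex`, `try_acquire`, `job-1`) are hypothetical. In this toy, the write is always checked against the authoritative state. A worker that read a stale version therefore fails to acquire rather than becoming a second owner, which is why consistent reads of the lock document matter.

```python
import threading


class ToyLockIndex:
    """Hypothetical in-memory stand-in for a lock index (not Job Scheduler code).

    Each lock document carries a sequence number; a write succeeds only if the
    caller's expected sequence number matches the current one (compare-and-set).
    """

    def __init__(self):
        self._mutex = threading.Lock()
        self._docs = {}  # lock_id -> (owner, seq_no)

    def get(self, lock_id):
        # Assumed to be a consistent read of the latest version of the document.
        with self._mutex:
            return self._docs.get(lock_id)

    def compare_and_set(self, lock_id, owner, expected_seq_no):
        with self._mutex:
            current = self._docs.get(lock_id)
            current_seq = current[1] if current else -1
            if current_seq != expected_seq_no:
                return False  # someone else wrote since we read
            self._docs[lock_id] = (owner, current_seq + 1)
            return True


def try_acquire(index, lock_id, owner):
    doc = index.get(lock_id)
    if doc is not None and doc[0] is not None:
        return False  # lock is already held
    expected = doc[1] if doc else -1
    return index.compare_and_set(lock_id, owner, expected)


index = ToyLockIndex()
winners = []
winners_mutex = threading.Lock()


def worker(name):
    if try_acquire(index, "job-1", name):
        with winners_mutex:
            winners.append(name)


threads = [threading.Thread(target=worker, args=(f"node-{i}",)) for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(winners))  # 1
```

However the threads interleave, exactly one worker wins. Workers that read before the first write carry a stale expected sequence number and are rejected, and workers that read afterwards see the lock as held.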
@saratvemulapalli / @joshpalis / @owaiskazi19, can you please look at comment #407 (comment) and share your feedback on whether this approach is the right way to go? @kaituo, please share your thoughts as well, since this would have a performance impact.
I agree with this approach, but I doubt the performance impact would be significant, since relatively few documents would be indexed into this index. A lock (document) is indexed before the start of a job execution and is the sole lock used for this job (and all subsequent executions of the job).
Sorry, a correction to the above comment: to enable segment replication when you build the OpenSearch core tarball locally and use it with the plugin, you need to change this in IndicesService to ReplicationType.SEGMENT. I incorrectly pointed you to a change in IndexMetadata, which is not needed. (Again, this is a hack to quickly build a tarball with segrep enabled. The correct way to enable segrep is still by passing
Hey @Rishikesh1159, thanks for clarifying. I would like the Job Scheduler lock index (a system index) to always use DOCUMENT replication, overriding any cluster setting, as discussed in opensearch-project/OpenSearch#8211 (comment). If so, I think this task boils down to:
I believe this meets the intent of this issue and also preserves the utility of a performance-sensitive "lock" index having replica consistency as soon as possible.
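As a sketch of what "always use DOCUMENT" could look like at index-creation time, the snippet below builds a create-index request body that pins the replication type at the index level. It assumes the OpenSearch index setting `index.replication.type` (values `DOCUMENT` / `SEGMENT`) and that the lock index is named `.opendistro-job-scheduler-lock`. Check both against the `LockService` source and the core docs for your version. The helper `lock_index_create_body` is hypothetical, and the mapping is omitted.

```python
import json

# Assumption: the Job Scheduler lock index name; confirm against LockService.
LOCK_INDEX = ".opendistro-job-scheduler-lock"


def lock_index_create_body(number_of_replicas: int = 1) -> dict:
    """Hypothetical helper: build a create-index body that pins the lock index
    to DOCUMENT replication instead of relying on the cluster-level default."""
    return {
        "settings": {
            "index": {
                # Index-level replication type, set explicitly for this index.
                "replication.type": "DOCUMENT",
                "number_of_replicas": number_of_replicas,
            }
        }
    }


body = lock_index_create_body()
print(f"PUT /{LOCK_INDEX}")
print(json.dumps(body, indent=2))
```

Setting the type explicitly on the index, rather than depending on whatever default the cluster applies to new indices, is the intent described in the comment above.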
Problem
Searching through code for all
None of the Job Scheduler code base uses
@dreamer-89 @Rishikesh1159, let me know if we missed anything else.
Next Steps
Refs
[1] https://github.com/search?q=repo%3Aopensearch-project%2Fjob-scheduler%20IndexRequest&type=code
I just checked with @mch2: because we use "Get by ID", we presently have the consistency we need, but as I understand from him, Get by ID may not yet provide those guarantees under SEGREP. I believe he'll post a comment here shortly with more details.
Comment posted in linked core issue: opensearch-project/OpenSearch#8211 (comment)
My plan to address this issue is:
I added an integration test that tries to acquire the same lock and ran it many times trying to make it fail under SegRep, without success (well, without failure, but you know what I mean). I still defensively added the DOCUMENT index config, just in case. So I think the action here is complete.
Completed in #417
Summary
With the 2.9.0 release, many enhancements are going into the segment replication feature[1][2] (which went GA in 2.7.0), so we need to ensure that plugins are compatible with the current state of this feature. Previously, we ran tests on plugin repos to verify this compatibility, but we want plugin owners to be aware of these changes so that any required updates can be made. With the 2.10.0 release, the remote store feature is going GA. It internally uses only the SEGMENT replication strategy, i.e. it forces all indices to use SEGMENT replication. So it is important to validate that plugins are compatible with segment replication.
What changed
1. Refresh policy behavior
2. Refresh lag on replicas
With segment replication, there is an inherent delay before documents become searchable on replica shard copies, because replicas copy data (segment) files from the primary rather than indexing the documents themselves. Thus, compared to document replication, replica shards take longer on average to become consistent with their primaries.
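The replica lag described above can be illustrated with a deliberately simplified toy model. It is not OpenSearch code and ignores translogs, checkpoints, and merges. Only the primary turns indexed documents into segments on refresh, and the replica can search a document only after those segment files have been copied over. The class and method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Shard:
    segments: list = field(default_factory=list)  # each segment is a list of doc ids

    def searchable(self) -> set:
        return {doc for seg in self.segments for doc in seg}


class SegRepModel:
    """Toy model of segment replication: only the primary indexes documents;
    the replica becomes searchable only after it copies the primary's segments."""

    def __init__(self):
        self.primary = Shard()
        self.replica = Shard()
        self._buffer = []  # indexed on the primary but not yet refreshed

    def index(self, doc_id):
        self._buffer.append(doc_id)

    def refresh_primary(self):
        # A refresh on the primary turns buffered docs into a new segment.
        if self._buffer:
            self.primary.segments.append(list(self._buffer))
            self._buffer.clear()

    def copy_segments_to_replica(self):
        # The replica catches up by copying segment files, not by re-indexing.
        self.replica.segments = [list(s) for s in self.primary.segments]


model = SegRepModel()
model.index("lock-1")
model.refresh_primary()
print("lock-1" in model.primary.searchable())  # True
print("lock-1" in model.replica.searchable())  # False: segments not copied yet
model.copy_segments_to_replica()
print("lock-1" in model.replica.searchable())  # True
```

The window between `refresh_primary` and `copy_segments_to_replica` is the extra lag a reader hitting a replica can observe. Under document replication, the replica would index the document itself and refresh on its own schedule.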
3. System/hidden indices support
With opensearch-project/OpenSearch#8200, system and hidden indices are now supported with the SEGMENT replication strategy. We need to ensure there are no bottlenecks that prevent system/hidden indices from using segment replication.
Next steps
With segment replication, strong reads are not guaranteed. Thus, if a plugin needs strong-read guarantees, especially given the changed refresh policy behavior and replica lag (points 1 and 2 above), its search requests need to be updated to target only primary shards. With opensearch-project/OpenSearch#7375, core now supports searching primary shards only. Please see the documentation for examples and details.
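A minimal sketch of a primary-only search request follows. It only builds the URL and does not send it. It assumes the search API's `preference` query parameter accepts `_primary` as the primary-shards-only value referred to above. Verify the exact value against the search and segment replication documentation for your OpenSearch version. The host, the index name, and the helper `primary_only_search_url` are hypothetical.

```python
from urllib.parse import quote, urlencode


def primary_only_search_url(host: str, index: str) -> str:
    """Hypothetical helper: build a _search URL restricted to primary shards.

    Assumption: `preference=_primary` is the primary-only search preference;
    confirm against the docs for your OpenSearch version.
    """
    query = urlencode({"preference": "_primary"})
    return f"{host.rstrip('/')}/{quote(index)}/_search?{query}"


print(primary_only_search_url("http://localhost:9200", "my-plugin-index"))
# http://localhost:9200/my-plugin-index/_search?preference=_primary
```

Routing reads to primaries trades replica read capacity for freshness, so it is worth applying only to the requests that actually need strong reads.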
Open questions
If you have any questions or issues, please post them in the core issue.
Reference
[1] Design
[2] Documentation