Lazy data stream rollover is not triggered when using reroute #112781
Comments
Pinging @elastic/es-data-management (Team:Data Management) |
Is this a more general case of lazy rollover only being triggered post ingest pipeline, and not specific to rerouting? We're also seeing issues related to upgrading from older versions of APM (e.g. 8.12.1) to 8.15.1, without any reroute processor involved. |
Hi @axw, we did not know you were doing version checks on the pipelines, so yes, that is definitely a side effect of the lazy rollover happening only upon a write to the index. The timing of the rollover is important, though, because if we roll over earlier we risk creating empty indices. We discussed possible approaches to solve this in a way that does not produce extra indices and we have the following proposal:
This way we have the following benefits:
The drawbacks:
|
@gmarouli thanks, sounds reasonable. Just to clarify, we don't do version checks in recent versions of our ingest pipeline - that only applies to versions before 8.13.0.
+1 that was also my first thought. Would it make sense to extend this approach to also update the marked data stream after executing the pipeline if there were no writes? |
What do you mean with this? |
@gmarouli sorry, that was very unclear, let me try again. If the data stream is marked for lazy rollover, do what you described where we resolve any settings (e.g. ingest pipeline) that may affect ingestion from the matching index template; then if there was a change in template, execute the rollover even if there were no writes to the data stream's backing index. That way we wouldn't need to do the template resolution on every write to the data stream, only once per lazy rollover. |
@axw thank you for the explanation, I get it now. You are right, that would address the potential latency, but we would be creating empty indices, which is something we want to avoid. Let's see what the impact is and whether it can be sustained until we have a more structural solution available. |
@gmarouli is this something that can be fixed for |
@simitt we're discussing some options to address this. Will post back here later today with our suggested approach and timeline. |
Ok @parkertimmins is going to work on this. Our initial thought is that we can get this done in a week or so, and we'll target |
I've been working on this ticket today, and have added a prototype change that re-resolves the default pipeline from templates if lazy rollover is set. This appears to work fine. Currently, it does the pipeline resolution for every index request within a bulk request. This will need to be optimized to do the resolution only once per index written to within the bulk request, which will add some complexity. I think finishing the feature itself will take another 2 days. So, including the time for functional and performance tests, I think 1 week is a decent estimate. |
If datastream rollover on write flag is set in cluster state, resolve pipelines from templates rather than from metadata. This fixes the following bug: when a pipeline reroutes every document to another index, and rollover is called with lazy=true (setting the rollover on write flag), changes to the pipeline do not go into effect, because the lack of writes means the data stream never rolls over and pipelines in metadata are not updated. The fix is to resolve pipelines from templates if the lazy rollover flag is set. To improve efficiency we only resolve pipelines once per index in the bulk request, caching the value, and reusing for other requests to the same index. Fixes: elastic#112781
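The resolution rule and the per-bulk-request cache described in this commit message can be sketched as a toy model. This is not the Elasticsearch implementation (which is Java, in `TransportAbstractBulkAction` and `IngestService`); `ClusterState`, `resolve_pipeline`, `process_bulk`, and all pipeline and data stream names below are hypothetical and exist only to illustrate the idea.

```python
class ClusterState:
    """Hypothetical stand-in for the cluster state consulted during a bulk request."""

    def __init__(self, streams):
        self.streams = streams
        self.template_lookups = 0  # counts the (expensive) template resolutions

    def rollover_on_write(self, index):
        return self.streams[index]["rollover_on_write"]

    def pipeline_from_templates(self, index):
        self.template_lookups += 1
        return self.streams[index]["template_pipeline"]

    def pipeline_from_metadata(self, index):
        return self.streams[index]["metadata_pipeline"]


def resolve_pipeline(index, state, cache):
    """Resolve the default pipeline for `index` at most once per bulk request."""
    if index in cache:
        return cache[index]
    if state.rollover_on_write(index):
        # A lazy rollover is pending, so index metadata may be stale: use templates.
        pipeline = state.pipeline_from_templates(index)
    else:
        pipeline = state.pipeline_from_metadata(index)
    cache[index] = pipeline
    return pipeline


def process_bulk(requests, state):
    cache = {}  # lives only for the duration of this bulk request
    return [(index, resolve_pipeline(index, state, cache)) for index, _doc in requests]


state = ClusterState({
    "logs-a": {"rollover_on_write": True, "template_pipeline": "a-v2", "metadata_pipeline": "a-v1"},
    "logs-b": {"rollover_on_write": False, "template_pipeline": "b-v2", "metadata_pipeline": "b-v1"},
})
bulk = [("logs-a", {"n": 1}), ("logs-a", {"n": 2}), ("logs-b", {"n": 3}), ("logs-a", {"n": 4})]
resolved = process_bulk(bulk, state)
```

In this sketch the three requests to `logs-a` share a single template lookup, while `logs-b`, which has no pending lazy rollover, keeps using the pipeline recorded in its metadata.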
If datastream rollover on write flag is set in cluster state, resolve pipelines from templates rather than from metadata. This fixes the following bug: when a pipeline reroutes every document to another index, and rollover is called with lazy=true (setting the rollover on write flag), changes to the pipeline do not go into effect, because the lack of writes means the data stream never rolls over and pipelines in metadata are not updated. The fix is to resolve pipelines from templates if the lazy rollover flag is set. To improve efficiency we only resolve pipelines once per index in the bulk request, caching the value, and reusing for other requests to the same index. Fixes: elastic#112781 (cherry picked from commit 6db39d1) # Conflicts: # server/src/main/java/org/elasticsearch/action/bulk/TransportAbstractBulkAction.java # server/src/main/java/org/elasticsearch/ingest/IngestService.java
… (#116131) * Resolve pipelines from template if lazy rollover write (#116031) If datastream rollover on write flag is set in cluster state, resolve pipelines from templates rather than from metadata. This fixes the following bug: when a pipeline reroutes every document to another index, and rollover is called with lazy=true (setting the rollover on write flag), changes to the pipeline do not go into effect, because the lack of writes means the data stream never rolls over and pipelines in metadata are not updated. The fix is to resolve pipelines from templates if the lazy rollover flag is set. To improve efficiency we only resolve pipelines once per index in the bulk request, caching the value, and reusing for other requests to the same index. Fixes: #112781 * Remute tests blocking merge * Remute tests blocking merge
#116132) * Resolve pipelines from template if lazy rollover write (#116031) If datastream rollover on write flag is set in cluster state, resolve pipelines from templates rather than from metadata. This fixes the following bug: when a pipeline reroutes every document to another index, and rollover is called with lazy=true (setting the rollover on write flag), changes to the pipeline do not go into effect, because the lack of writes means the data stream never rolls over and pipelines in metadata are not updated. The fix is to resolve pipelines from templates if the lazy rollover flag is set. To improve efficiency we only resolve pipelines once per index in the bulk request, caching the value, and reusing for other requests to the same index. Fixes: #112781 * Remute tests block merge * Remute tests block merge
…6137) If datastream rollover on write flag is set in cluster state, resolve pipelines from templates rather than from metadata. This fixes the following bug: when a pipeline reroutes every document to another index, and rollover is called with lazy=true (setting the rollover on write flag), changes to the pipeline do not go into effect, because the lack of writes means the data stream never rolls over and pipelines in metadata are not updated. The fix is to resolve pipelines from templates if the lazy rollover flag is set. To improve efficiency we only resolve pipelines once per index in the bulk request, caching the value, and reusing for other requests to the same index. Fixes: #112781 (cherry picked from commit 6db39d1)
As there are some concerns about a performance regression on this ticket, I've run the following benchmarks.
Benchmark Structure
Two separate benchmark tests were run, one with 2 data streams and one with 3 data streams, which we call 2-layer and 3-layer. The 2-layer test attempts to insert into data-stream-1, which reroutes to data-stream-2, where all docs are inserted. The 3-layer test attempts to insert into data-stream-1, which reroutes to data-stream-2, which reroutes to data-stream-3, where the docs are inserted.
The fix made in the ticket will behave differently on the 2-layer and 3-layer tests. In both cases, on the initial data stream, the pipeline will only be resolved from templates once per bulk request. This is because the resolved template is cached per data stream being inserted into, and in this test there is only one per request. On the other hand, in the 3-layer test, in the second reroute, every doc in a bulk request requires a separate pipeline resolution.
In both tests, the data streams being rerouted away from have the lazy rollover flag set. In both tests, the data streams with reroute pipelines each had 10 matching index templates. Each index template was composed of 10 component templates. The final data stream, which received the documents, had only a single matching index template.
The bulk request batch size was varied between 10, 100, 1000, and 10000 docs. The inserted dataset contains 3.2 million documents. All data stream indices had 1 primary and 0 replicas. This was run on a single-node cluster, on a single machine with 64 GB of RAM and 20 CPUs. The tests were run in Rally. Though this configuration is not typical of a production cluster, we expected a single-node cluster with 0 replicas to have the worst-case behavior for the tested change.
Results
The following plots show the throughput in docs/second of the test vs baseline code on the 2-layer and 3-layer benchmarks.
These are combined in the following plot, which shows the symmetric percent difference between test and baseline for both the 2-layer and 3-layer benchmarks.
As expected, the test version performs worse in most cases. On average we see a 2% decrease in throughput across all tests, and a maximum decrease in throughput of 10.7% for the 3-layer test with a batch size of 1000. Notably, the 3-layer test with a batch size of 10k has a decrease of only 2.9%. This is likely a result of the pipeline caching making up for the slowdown caused by template resolution. For this same reason, we see a 10% improvement over the baseline in the 2-layer test with 10k docs per batch.
Conclusion
This test was designed to show the worst-case scenario for the new feature. In most cases, overhead from other operations will mask the slowdown caused by this change. Given this, the average decrease in performance of 2% seems an acceptable trade-off for a necessary bug fix. |
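The comment above does not define "symmetric percent difference". A common definition, assumed here, divides the difference by the mean of the two values, so that swapping test and baseline only flips the sign. The function name and the throughput numbers below are hypothetical and are not taken from the benchmark results.

```python
def symmetric_percent_difference(test, baseline):
    """Difference relative to the mean of both values, in percent (assumed definition)."""
    mean = (test + baseline) / 2
    return 100.0 * (test - baseline) / mean


# Hypothetical throughputs in docs/second, for illustration only.
slower = symmetric_percent_difference(90.0, 100.0)
faster = symmetric_percent_difference(100.0, 90.0)
```

With this definition a negative value means the test build is slower than the baseline, and `slower` and `faster` differ only in sign.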
Elasticsearch Version
8.15.1
Installed Plugins
No response
Java Version
bundled
OS Version
N/A
Problem Description
Lazy rollover on a data stream is not triggered when writing a document that is rerouted to another data stream. This affects the apm-data plugin, where we perform a lazy rollover of matching data stream patterns when installing or updating index templates. The data stream never rolls over. See elastic/apm-server#14060 (comment)
Should a write that leads to a reroute also trigger the lazy rollover? I think so, otherwise the default pipeline will not change.
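The behaviour described above can be illustrated with a toy model. This is a simplified sketch, not Elasticsearch code: `DataStream`, `ingest`, the `reroutes` mapping, and the pipeline names are all hypothetical, and the model assumes that a lazy rollover is executed only when a document is actually written to the data stream.

```python
from dataclasses import dataclass, field


@dataclass
class DataStream:
    name: str
    template_pipeline: str           # default pipeline in the matching index template
    backing_pipeline: str            # default pipeline recorded in the write index metadata
    rollover_on_write: bool = False  # set by a lazy rollover request
    generation: int = 1
    docs: list = field(default_factory=list)

    def write(self, doc):
        # The lazy rollover only happens here, on an actual write.
        if self.rollover_on_write:
            self.generation += 1
            self.backing_pipeline = self.template_pipeline
            self.rollover_on_write = False
        self.docs.append(doc)


def ingest(doc, target, streams, reroutes):
    """Run the target's default pipeline as read from index metadata (the reported behaviour)."""
    stream = streams[target]
    pipeline = stream.backing_pipeline
    destination = reroutes.get(pipeline)
    if destination is not None:
        streams[destination].write(doc)  # rerouted: `target` itself is never written to
    else:
        stream.write(doc)
    return pipeline


streams = {
    "source": DataStream("source", template_pipeline="pipeline-v2",
                         backing_pipeline="pipeline-v1", rollover_on_write=True),
    "sink": DataStream("sink", template_pipeline="noop", backing_pipeline="noop"),
}
reroutes = {"pipeline-v1": "sink", "pipeline-v2": "sink"}
used = ingest({"message": "hello"}, "source", streams, reroutes)
```

In this model the document lands in `sink`, while `source` keeps its pending rollover flag and its old pipeline indefinitely, because no write ever reaches it.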
Steps to Reproduce
Logs (if relevant)
No response