Research data streams #3824
Related: elastic/elasticsearch#53100
Composable and Index Templates

Template inheritance will not be supported by index templates. Instead, index templates can make use of template composition. Multiple component templates can be created. Component templates do not define an index pattern and can consist of settings, mappings and aliases. Index templates can also directly define settings, mappings and aliases, and can additionally reference an array of component templates. If multiple index templates match an index, the template with the highest priority is applied. Index templates can contain a definition for lifecycle management. When using index templates without data streams, lifecycle management settings are defined the same way as for legacy templates. Legacy templates mark template inheritance as deprecated but support it until the end of 7.x.

Data Streams

A data stream contains a stream of data for time-based data sources. It keeps track of its underlying, generational indices. A new generation is created when the data stream rolls over. Only the latest index is the write index, and the data stream keeps track of it. Data streams only support append-only writes. Updates are not supported on data streams; they need to be applied directly to the underlying index. When using index templates with data streams, a data stream is automatically created on data ingestion if no rollover alias or index with the same name exists. The main change for ILM is that no rollover_alias needs to be created upfront: as soon as the template with the data_stream definition exists, a data stream will automatically be created. Deleting a data stream deletes all underlying indices.

APM Server

The current ILM setup requires setting up rollover aliases before ingesting data, which then automatically creates the index for data ingestion. When the rollover alias is missing, or gets deleted during data ingestion, an unmanaged index with the same name gets created.
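The two rules above (only the highest-priority matching index template applies, and that template is assembled from its component templates instead of being inherited) can be sketched as follows. This is a simplified model, not Elasticsearch's implementation: the template names (`apm-span`, `apm-fallback`, `apm-settings`, `apm-mappings`), patterns and priority values are hypothetical, the dict shapes only loosely mirror the bodies sent to `PUT _index_template/<name>` and `PUT _component_template/<name>`, and the recursive dict merge stands in for Elasticsearch's more careful, type-aware mapping merge.

```python
import copy
from fnmatch import fnmatch

# Hypothetical component templates: no index pattern, only a "template" section.
COMPONENT_TEMPLATES = {
    "apm-settings": {"template": {"settings": {"index": {"number_of_shards": 1, "codec": "best_compression"}}}},
    "apm-mappings": {"template": {"mappings": {"properties": {"@timestamp": {"type": "date"}}}}},
}

# Hypothetical index templates: a per-event template and a lower-priority fallback.
INDEX_TEMPLATES = {
    "apm-span": {
        "index_patterns": ["apm-span*"],
        "priority": 100,
        "composed_of": ["apm-settings", "apm-mappings"],
        "template": {"settings": {"index": {"number_of_shards": 2}}},
    },
    "apm-fallback": {
        "index_patterns": ["apm-*"],
        "priority": 10,
        "composed_of": ["apm-settings"],
    },
}


def deep_merge(base, override):
    """Recursively merge dicts; on conflicts the value from `override` wins."""
    result = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = deep_merge(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


def select_template(index_name, templates):
    """Pick the single matching index template with the highest priority (no inheritance).

    Elasticsearch rejects overlapping templates with equal priority; here ties
    are simply broken by template name.
    """
    matches = [
        (body.get("priority", 0), name)
        for name, body in templates.items()
        if any(fnmatch(index_name, pattern) for pattern in body["index_patterns"])
    ]
    return max(matches)[1] if matches else None


def effective_template(index_name, templates, components):
    """Merge the chosen template's components in order, then its own "template" section."""
    name = select_template(index_name, templates)
    if name is None:
        return None
    body = templates[name]
    merged = {}
    for component in body.get("composed_of", []):
        merged = deep_merge(merged, components[component]["template"])
    return deep_merge(merged, body.get("template", {}))
```

For example, `apm-span-000001` matches both templates, but only `apm-span` (priority 100) is applied: its shard count overrides the one from `apm-settings`, while the codec and the `@timestamp` mapping come from the component templates. An index such as `apm-onboarding-000001` only matches the fallback. This per-event plus low-priority fallback layout is also the shape described in the alternative scenario further down.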
This leads to ever-growing indices that cannot roll over, and to indexing errors in the APM Server, which then cannot ingest any more data. The big advantage of using data streams is that, as soon as an index template with a data stream definition exists, a data stream is automatically created on data ingestion. If the data stream gets manually deleted during ingestion, ES automatically creates a new one. If data ingestion starts while no index template with a data stream definition exists, an unmanaged index is created, and no data stream with the same name can be created afterwards. When APM Server is used to set up the templates, this should never happen: even when data is accepted by the APM Server during the setup process, the server only sends the data to ES after the setup has finished. Since template inheritance will not be supported from 8.0 on, we need to stop using template inheritance before the latest 7.x version of APM Server. APM Server currently makes use of legacy templates, leading to deprecation warnings from ES. An Elastic product should ideally not trigger deprecation warnings, which is another reason for switching to the new index templates early on. Switching to index templates and leveraging data streams by default would improve the current situation with ILM. We should provide an option to switch back to legacy …

The first required step is to remove template inheritance and create a self-contained index template per index. If we keep having the …

To avoid special casing for some index templates to not have the …

For 7.9, before releasing the customization of rollover aliases, change the implementation to only allow changing suffixes. This would be more aligned with the ingest management solution. If supporting only customizable suffixes clashes with users' use cases, we can revisit this.
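The ordering problem described at the start of this section (a data stream is only auto-created if nothing else already owns the name) can be modelled roughly as below. This is a hypothetical, heavily simplified model of where a write ends up, not Elasticsearch's actual code path; the cluster state is just three sets of names, the `apm-span` name is illustrative, and the `data_stream` marker is assumed here to be an empty object in the template body.

```python
from fnmatch import fnmatch


def matching_template(name, templates):
    """Return the highest-priority index template whose patterns match `name`, or None."""
    candidates = [
        t for t in templates if any(fnmatch(name, p) for p in t["index_patterns"])
    ]
    return max(candidates, key=lambda t: t.get("priority", 0), default=None)


def empty_cluster():
    """Hypothetical cluster state: names of existing data streams, aliases and indices."""
    return {"data_streams": set(), "aliases": set(), "indices": set()}


def on_write(name, cluster, templates):
    """Rough model of a write to `name`; mutates `cluster` when something is auto-created."""
    if name in cluster["data_streams"]:
        return "append to existing data stream"
    if name in cluster["aliases"]:
        # An existing (v1) rollover alias wins: no data stream is created.
        return "write via rollover alias"
    if name in cluster["indices"]:
        # An index with this name blocks data stream creation from now on.
        return "write to existing index"
    template = matching_template(name, templates)
    if template is not None and "data_stream" in template:
        cluster["data_streams"].add(name)
        return "create data stream"
    # No data stream template (yet): a plain, unmanaged index is auto-created.
    cluster["indices"].add(name)
    return "create unmanaged index"
```

With a template such as `{"index_patterns": ["apm-*"], "priority": 100, "data_stream": {}}` installed before the first write, `on_write("apm-span", ...)` creates a data stream, and removing the name from `data_streams` and writing again recreates it. If the first write happens before the template exists, an unmanaged index is created and later writes keep going to that index even after the template is installed, which is why the setup must finish before any data is sent.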
There are some problems when switching between legacy and index templates within the same version:

legacy -> index templates

For managed indices, if a rollover alias (v1) exists, then no data stream is created, although an index template with a data stream attribute exists and is generally applied to the index. This means that switching between legacy and index templates within the same version would lead to unexpected behavior. If the index template contains a lifecycle management setting for the rollover alias, the indices would be managed with rollover aliases. If the template does not contain this information, the index would be flagged as having issues when trying to roll over, as no data stream can be created, but no rollover alias is associated with it either. One solution could be to delete rollover aliases when switching from legacy to index templates, but that would be rather invasive and might destroy ongoing ingestion. Another solution could be to add a suffix to indices …

For unmanaged indices, switching seems to work just fine. The new index template would only be applied to newly created indices.

index -> legacy templates

For managed indices this would result in an error, as existing data streams are matched when trying to create an alias. Again, we could resolve this with different suffixes in the index names and patterns. For unmanaged indices this would also lead to errors, as index patterns match existing data streams. For unmanaged indices users can configure one …

Proposed solution
Switch to index templates with data streams by default for managed indices, and offer a deprecated setting to fall back to legacy templates for 7.x. Remove this setting in 8.0 and only support index templates and data streams from 8.0 on. Include changes to manage all indices, as this allows for less special casing in code and makes sense for users.

Alternative Scenario - minimal effort in 7.x, breaking change in 8.0

In 7.x, APM Server keeps setting up templates and rollover aliases as is, using legacy templates. From 8.0 on, it only supports managed indices with index templates. One index template per event is created with a medium priority, and one index template matching a fallback APM index is created with a low priority. Disadvantage: APM Server cannot switch to using ILM with data streams, so the current ILM issues are not resolved.

Update: APM Server 7.x needs to support ES 8.0, therefore we need to get rid of template inheritance within 7.x.

POC

TODO: add link to branches

Open Decisions
Beats Strategy

Needs to be clarified with Beats contributors. First conversations suggest keeping the effort on the Beats side minimal for these changes.

Resources

ES Composable and Index Templates
I have some comments, and some questions which you may have already answered. Sorry, this might take a little while to sink in for me.
Why do we need a fallback index in managed mode? We're in control of both the indices and what documents go into them, so can't we guarantee that there's always an appropriate index?
I think it makes sense for onboarding to be managed too, but I'm not sure about sourcemaps. Sourcemaps are not events. I think we might be better off separating sourcemap indexing from managed/unmanaged altogether. Since sourcemapping requires a direct connection to Elasticsearch, we could do this outside of libbeat. I would expect sourcemaps to be relatively low volume, so we could just have a plain old index; advanced users could modify that to set up ILM out of band if needed. WDYT? (Again, not sure about the fallback here.)
Why might it destroy ongoing ingestion? Could we do something like:
Disabling rather than deleting legacy templates would cater for customisation, in case users upgrade without setting the config to stay in legacy template mode. We could disable the legacy templates by changing the index_patterns to something non-matching. It's true that during that upgrade process other apm-server instances may fail to index (e.g. after deleting the alias, because auto_create_index was disabled), but it's generally expected that indexing may fail due to transient error conditions and need to be retried.
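A minimal sketch of the "disable instead of delete" idea: keep the legacy template body (and any user customisations in it) but point its `index_patterns` at a name no APM index will ever have. The placeholder pattern, the template body and the index names below are hypothetical; the resulting body would be written back with the legacy `PUT _template/<name>` API.

```python
import copy
from fnmatch import fnmatch

# Hypothetical placeholder; any pattern that no real index name matches would do.
NON_MATCHING_PATTERN = "apm-legacy-template-disabled"


def disable_legacy_template(body):
    """Return a copy of a legacy template body that no longer matches APM indices.

    Unlike deleting the template, settings and mappings (including any user
    customisations) are kept; the original body is left untouched.
    """
    disabled = copy.deepcopy(body)
    disabled["index_patterns"] = [NON_MATCHING_PATTERN]
    return disabled


def matches_any(index_name, body):
    """True if any of the template's index patterns matches `index_name`."""
    return any(fnmatch(index_name, p) for p in body["index_patterns"])
```

The trade-off is that disabled templates stay visible in the cluster, but nothing a user customised is lost if they later want to go back to legacy mode.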
Why would you go from index templates to legacy templates? I can understand continuing to support legacy templates, but I'm not sure we need to support migrating and then going back again. If people need to do that, I think they can do it manually.
Thanks for the review @axw, you are raising some really important questions here:
With managed indices, we don't need it. It was simply carried over from the time when we used unmanaged indices. When removing it, we should add a check that every event has one of the allowed ILM event types set before it is pushed to the queue, to avoid future regressions that could potentially lead to a full queue.
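A minimal sketch of that check, assuming the event type is read from a `processor.event` field and that the allowed set is a hypothetical list (the real set of managed event types would come from APM Server itself). Events without a managed index are rejected before they reach the queue, rather than being routed to a fallback index.

```python
# Hypothetical set of event types that have a managed index.
ALLOWED_EVENT_TYPES = frozenset({"transaction", "span", "error", "metric", "onboarding"})


class UnknownEventTypeError(ValueError):
    """Raised for events that no managed index would accept."""


def event_type_of(event):
    """Read the event type from a processor.event field (assumed event shape)."""
    return event.get("processor", {}).get("event")


def enqueue(queue, event):
    """Validate the event type up front, then append the event to the queue."""
    event_type = event_type_of(event)
    if event_type not in ALLOWED_EVENT_TYPES:
        raise UnknownEventTypeError(f"no managed index for event type {event_type!r}")
    queue.append(event)
```

Failing fast here surfaces a regression as an explicit error at the producer, instead of as events that can never be indexed piling up in the queue.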
Agreed that it is not necessary to also manage sourcemaps. In my POC I did add sourcemaps, with a rollover policy based only on size, not on age. We do not version sourcemaps, so in some edge cases people might run into issues with ever-growing sourcemap indices (although this is unlikely). While a sourcemap is not an event in the APM sense, we already treat it as one today by setting …
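A size-only rollover policy for sourcemaps, as mentioned above, could look roughly like the hypothetical ILM policy body below (as sent to `PUT _ilm/policy/<name>`); the `50gb` threshold is an assumed value, not the one from the POC. The helper functions are a simplified model of how rollover conditions are evaluated (any configured condition that is met triggers a rollover), showing that without `max_age` an old but small index never rolls over.

```python
# Hypothetical ILM policy for sourcemaps: roll over on size only, no max_age.
SOURCEMAP_POLICY = {
    "policy": {
        "phases": {
            "hot": {
                "actions": {
                    "rollover": {"max_size": "50gb"}  # threshold is an assumed value
                }
            }
        }
    }
}

# Binary byte units, longest suffix first so "gb" is matched before "b".
_UNITS = {"tb": 1024**4, "gb": 1024**3, "mb": 1024**2, "kb": 1024, "b": 1}


def parse_size(value):
    """Convert a size string such as '50gb' to bytes."""
    value = value.strip().lower()
    for unit, factor in _UNITS.items():
        if value.endswith(unit):
            return int(float(value[: -len(unit)]) * factor)
    raise ValueError(f"unsupported size: {value!r}")


def parse_days(value):
    """Convert an age such as '30d' to days (only the 'd' unit is handled in this sketch)."""
    if not value.endswith("d"):
        raise ValueError(f"unsupported age: {value!r}")
    return float(value[:-1])


def rollover_due(policy, size_bytes, age_days):
    """Rollover conditions are OR-ed: any condition that is met triggers a rollover."""
    conditions = policy["policy"]["phases"]["hot"]["actions"]["rollover"]
    if "max_size" in conditions and size_bytes >= parse_size(conditions["max_size"]):
        return True
    if "max_age" in conditions and age_days >= parse_days(conditions["max_age"]):
        return True
    return False
```

Because sourcemaps are expected to be low volume, a size-only condition keeps the number of sourcemap indices small while still bounding the size of each one.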
Yes, that could work. The reason I haven't picked up the idea of disabling
Because I'd like to enable index templates by default, in which case users who want to stick with legacy templates would go in this direction. But I agree that we do not necessarily need to support this out of the box. That is basically why I raised the last question:
Based on this feedback, and on feedback from our offline discussions, I'm going to take some time to work further on the POC.
from elastic/beats#17829 (comment):
Follow-up on the index template and data stream investigations; point (4) is specific to data streams.
Hello 👋
This can be, potentially, very problematic. Because:
With Elastic Agent we don't have …