Research data streams #3824

Closed
axw opened this issue May 27, 2020 · 7 comments

@axw
Member

axw commented May 27, 2020

No description provided.

@axw axw added this to the 7.9 milestone May 27, 2020
@axw axw assigned simitt and unassigned simitt May 27, 2020
@simitt
Contributor

simitt commented May 29, 2020

Related elastic/elasticsearch#53100

@simitt
Contributor

simitt commented Jun 10, 2020

Composable and Index Templates

Template inheritance will not be supported by index templates. Instead index templates can make use of template composition. Multiple component templates can be created. Component templates do not define an index pattern and can consist of settings, mappings and aliases. Index templates can also directly define settings, mappings and aliases, but can also contain an array of component templates. If multiple index templates would match an index, the template with the highest priority gets applied. Index templates can contain a definition for lifecycle management. When using index templates without data streams, lifecycle management settings are defined the same way as for legacy templates.
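
To make the composition model concrete, here is a minimal sketch of the request bodies for one component template and one index template that references it. The template names, the pattern, the priority value and the policy name are placeholders picked for illustration; nothing is sent to Elasticsearch.

```python
import json


def component_template(settings=None, mappings=None, aliases=None):
    """Body for PUT _component_template/<name>.

    A component template has no index pattern, only settings,
    mappings and aliases.
    """
    template = {}
    if settings:
        template["settings"] = settings
    if mappings:
        template["mappings"] = mappings
    if aliases:
        template["aliases"] = aliases
    return {"template": template}


def index_template(index_patterns, priority, composed_of=(), settings=None):
    """Body for PUT _index_template/<name>.

    The template can define settings directly and can also list
    component templates in `composed_of`.
    """
    body = {
        "index_patterns": list(index_patterns),
        "priority": priority,
        "composed_of": list(composed_of),
    }
    if settings:
        body["template"] = {"settings": settings}
    return body


# Placeholder names and values, for illustration only.
apm_mappings = component_template(
    mappings={"properties": {"@timestamp": {"type": "date"}}}
)
apm_transaction = index_template(
    index_patterns=["apm-7.9.0-transaction*"],
    priority=100,
    composed_of=["apm-mappings"],
    # Lifecycle management is referenced through an index setting.
    settings={"index.lifecycle.name": "apm-rollover-30-days"},
)

print(json.dumps(apm_transaction, indent=2))
```

The component template would be stored under the name listed in `composed_of` (here `apm-mappings`) before the index template is created.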

Legacy templates still support template inheritance until the end of 7.x, but it is marked as deprecated.
Beginning with 8.0, legacy templates will throw an error. Support for legacy templates will be removed in 9.0.

Data Streams

A data stream contains a stream of data for a time-based data source. It keeps track of its underlying, generational indices. A new generation is created when the data stream rolls over. Only the latest index is a write index, and the data stream keeps track of it. Data streams only support append-only writes. Updates through the data stream are not supported; they need to be applied directly to the underlying index. When using index templates with data streams, a data stream is created automatically on data ingestion if no rollover alias or index with the same name exists. The main change for ILM is that no rollover_alias needs to be created upfront. As soon as the template with the data_stream definition exists, a data stream will automatically be created.

Deleting a data stream deletes all underlying indices.
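
The behaviour described above can be pictured with a toy in-memory model. This is only a sketch of the concept, not Elasticsearch code; the class, its methods and the backing index naming scheme are made up for illustration.

```python
class DataStream:
    """Toy in-memory model of a data stream; not Elasticsearch code."""

    def __init__(self, name):
        self.name = name
        self.generation = 0
        self.indices = []  # backing indices, oldest first
        self.docs = {}     # backing index name -> list of documents
        self.rollover()    # create the first generation

    def rollover(self):
        """Create a new generation; it becomes the only write index."""
        self.generation += 1
        # The naming scheme is illustrative only.
        index = f".ds-{self.name}-{self.generation:06d}"
        self.indices.append(index)
        self.docs[index] = []
        return index

    @property
    def write_index(self):
        """Only the latest backing index accepts writes."""
        return self.indices[-1]

    def append(self, doc):
        """Append-only write: always lands in the latest generation."""
        self.docs[self.write_index].append(doc)

    def delete(self):
        """Deleting the stream deletes all backing indices."""
        self.indices.clear()
        self.docs.clear()


stream = DataStream("apm-transaction")
stream.append({"@timestamp": "2020-06-10T00:00:00Z"})
stream.rollover()
stream.append({"@timestamp": "2020-06-11T00:00:00Z"})
```

After the rollover, the first backing index still holds its document but no longer receives writes.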

APM Server

The current ILM setup requires rollover aliases to be set up before ingesting data, which then automatically creates the index for data ingestion. When the rollover alias is missing, or gets deleted during data ingestion, an unmanaged index with the same name gets created. This leads to ever-growing indices that cannot roll over, and to indexing errors in APM Server, which can then no longer ingest data. The big advantage of using data streams is that, as soon as an index template with a data stream definition exists, a data stream is created automatically on data ingestion. If the data stream gets manually deleted during ingestion, ES automatically creates a new one.

In case data ingestion starts while no index template with data stream definition exists, an unmanaged index is created, and no data stream with the same name can be created afterwards. When using APM Server for setting up templates, this should never happen. Even when data is accepted by APM Server during the setup process, the server only sends the data to ES after the setup has finished.
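
A rough sketch of the decision described in the last two paragraphs: what does the first write to a given name end up creating or hitting? The function and its argument shapes are hypothetical, and template priority is ignored for brevity.

```python
from fnmatch import fnmatch


def resolve_target(name, templates, existing):
    """Decide what a first write to `name` ends up with.

    templates: list of dicts with "index_patterns" and an optional
               "data_stream" key (priority is ignored in this sketch).
    existing:  dict mapping names that already exist to "alias",
               "index" or "data_stream".
    """
    if name in existing:
        # An existing rollover alias or index with the same name means
        # no data stream gets created for it.
        return existing[name]
    for template in templates:
        matches = any(fnmatch(name, p) for p in template["index_patterns"])
        if matches and "data_stream" in template:
            return "data_stream"
    # No matching template with a data stream definition: a plain,
    # unmanaged index is auto-created instead.
    return "index"


templates = [{"index_patterns": ["apm-*-transaction*"], "data_stream": {}}]
```

With the template in place, `resolve_target("apm-7.9.0-transaction", templates, {})` yields a data stream; with an empty template list the same call yields an unmanaged index, which is the situation that cannot be repaired afterwards.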

Since template inheritance will not be supported from 8.0 on, we need to stop using template inheritance before the last 7.x version of APM Server.

APM Server currently makes use of legacy templates, leading to deprecation warnings from ES. An Elastic product should ideally not create deprecation warnings, which is another reason for switching to the new index templates early on.

Switching to use index templates and leveraging data streams by default would improve the current situation with ILM. We should provide an option to switch back to legacy setup.template.legacy:false, in case users have some custom setup using legacy templates. Maybe it is good enough to not document the config option and only keep it in as a hidden option until 8.0.

The first required step is to remove template inheritance and create a self-contained index template per index. If we keep having the apm-{version} index as fallback index, we need to ensure to set up an index template with index pattern apm-{version}* that has lowest priority.
As soon as two index templates with same priority match, an error is raised when trying to ingest data.
Alternatively we could switch the fallback index to contain a suffix default and create a dedicated index template for it. This would only work for managed indices. For unmanaged indices we need to keep the generic apm-* template, as user defined indices can be set up.
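
To illustrate the priority rule for the fallback template, here is a small sketch that picks the matching template with the highest priority and reports a conflict when two matching templates share it. The template names, the 7.9.0 version string and the priority values are placeholders, and the selection logic only mirrors the behaviour described above rather than Elasticsearch's actual implementation.

```python
from fnmatch import fnmatch


class TemplateConflict(Exception):
    """Two matching index templates share the highest priority."""


def select_template(index_name, templates):
    """Return the name of the matching template with the highest priority.

    templates: dict of name -> {"index_patterns": [...], "priority": int}.
    Returns None if no template matches.
    """
    matching = [
        (body["priority"], name)
        for name, body in templates.items()
        if any(fnmatch(index_name, p) for p in body["index_patterns"])
    ]
    if not matching:
        return None
    matching.sort(reverse=True)
    if len(matching) > 1 and matching[0][0] == matching[1][0]:
        raise TemplateConflict(
            f"{matching[0][1]} and {matching[1][1]} share priority {matching[0][0]}"
        )
    return matching[0][1]


templates = {
    # Fallback template: broad pattern, lowest priority.
    "apm-fallback": {"index_patterns": ["apm-7.9.0*"], "priority": 0},
    # Event specific template: narrower pattern, higher priority.
    "apm-transaction": {
        "index_patterns": ["apm-7.9.0-transaction*"],
        "priority": 100,
    },
}
```

Both templates match a transaction index, but only the event specific one is applied; the fallback template only wins for names that no event specific pattern covers.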

To avoid special casing for some index templates to not have the data_stream option set, we could move all APM indices to managed indices. Currently sourcemap, onboarding and the fallback index are not managed.

For 7.9, before releasing the customization of rollover aliases, change the implementation to only allow changing suffixes. This would be more aligned with the ingest management solution. If supporting only customizable suffixes clashes with users' use cases, we can revisit this.

There are some problems when switching between legacy and index templates within the same version:

legacy -> index templates

For managed indices, if a rollover alias (v1) exists, then no data stream is created, although an index template with a data stream attribute exists and is generally applied to the index. This means that switching between legacy and index templates within the same version would lead to unexpected behavior. When the index template contains a lifecycle management setting for the rollover alias, it would lead to indices managed with rollover aliases. If the template does not contain this information, the index would be marked as having issues when trying to roll over, as no data stream can be created, but also no rollover alias is associated with it.

A solution could be to delete rollover aliases when switching from legacy to index templates, but that would be rather invasive and might destroy ongoing ingestion.

Another solution could be to add a suffix such as ds or legacy to the index names, to ensure different index names are used for different kinds of setup. This is a bit complex to implement with the current ILM implementation, but seems doable.
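
A minimal sketch of the suffix idea, assuming index names of the form apm-{version}-{event}. The function name, the setup keys and the exact position of the suffix are assumptions made for illustration, not a decided naming scheme.

```python
# Hypothetical mapping from the kind of setup to an index name suffix.
SUFFIXES = {"data_stream": "ds", "legacy": "legacy"}


def index_name(version, event, setup):
    """Build an index name that differs per kind of setup."""
    return f"apm-{version}-{event}-{SUFFIXES[setup]}"


legacy_name = index_name("7.9.0", "transaction", "legacy")
ds_name = index_name("7.9.0", "transaction", "data_stream")
```

Because the two names never collide, an existing rollover alias from the legacy setup would not block the creation of a data stream.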

For unmanaged indices switching seems to work just fine. The new index template would only be applied to newly created indices.

index -> legacy templates

For managed indices this would result in an error as existing data streams are matched when trying to create an alias. Again, we could resolve this with different suffixes in the index names and patterns.

For unmanaged indices this would also lead to errors as index patterns match existing data streams. For unmanaged indices users can configure one setup.template.pattern, we should probably not change that.

Proposed solution

Keep unmanaged indices unchanged (as far as possible), deprecate them in 7.x, and remove support in 8.0.
Adapt unmanaged indices to ensure no template inheritance is created with the event specific templates. This will be tricky without deleting event specific templates when switching from managed indices. (needs some more investigation)

Switch to index templates with data streams by default for managed indices and offer a deprecated setting to fall back to legacy templates for 7.x. Remove this setting in 8.0 and only support index templates and data streams from 8.0 on.

Include changes to manage all indices, as it allows for less special casing in code and makes sense for users.

Alternative Scenario - minimal effort in 7.x, breaking change in 8.0

In 7.x APM Server keeps setting up templates and rollover aliases as is, using legacy templates. From 8.0 on it only supports managed indices with index templates. One index template per event is created with a medium priority, and one index template matching a fallback APM index is created with a low priority.

Disadvantage: APM Server cannot switch to use ILM with data streams. Current ILM issues are not resolved.

Update: APM Server 7.x needs to support ES 8.0, therefore we need to get rid of template inheritance within 7.x.

POC

TODO: add link to branches

Open Decisions

  • Should standalone APM Server 8.x still be available or only via elastic agent setup?
    Assuming it should still be available, we need to remove template inheritance.
    -> Yes APM Server 8.x needs to be able to take care of setup
  • Change the default to index templates within 7.x? (To use data streams by default).
    Users that have either setup.template.enabled or apm-server.ilm.setup.enabled disabled could be impacted by this, as their
    own templates might not be applied (index templates take precedence over legacy templates), or restart on upgrade could break APM Server
    setup if conflicts are detected.
    If only APM Server takes care of setup, API details should not matter; if only the user takes care of it, they would not see any changes.
  • How much support do we want to offer for automatic cleanup of previous setups when switching between legacy and index templates?
    Might depend on what the default is.

Beats Strategy

Needs to be clarified with Beats contributors. First conversations suggest taking a minimal-effort approach to these changes in Beats.

Resources

ES Composable and Index Templates
Data Streams
Ingest Manager

@axw
Member Author

axw commented Jun 15, 2020

I have some comments, and some questions which you may have already answered. Sorry, this might take a little while to sink in for me.

If we keep having the apm-{version} index as fallback index, we need to ensure to set up an index template with index pattern apm-{version}* that has lowest priority.
...
Alternatively we could switch the fallback index to contain a suffix default and create a dedicated index template for it. This would only work for managed indices. For unmanaged indices we need to keep the generic apm-* template, as user defined indices can be set up.

Why do we need a fallback index in managed mode? We're in control of both the indices and what documents go into them, so can't we guarantee that there's always an appropriate index?

To avoid special casing for some index templates to not have the data_stream option set, we could move all APM indices to managed indices. Currently sourcemap, onboarding and the fallback index are not managed.

I think it makes sense for onboarding to be managed too, but not sure about sourcemap. Sourcemaps are not events. I think we might be better off separating sourcemap indexing from managed/unmanaged altogether. Since sourcemapping requires a direct connection to Elasticsearch, we could do this outside of libbeat. I would expect sourcemaps to be relatively low volume, so we could just have a plain old index; advanced users could modify that to set up ILM out of band if needed. WDYT?

(Again not sure about fallback here.)

For managed indices, if a rollover alias (v1) exists, then no datastream is created although an index template with a datastream attribute exists and is generally applied to the index. This means that switching between legacy and index templates within the same version would lead to unexpected behavior. When the index template contains a lifecycle management setting for the rollover alias it would lead to indices managed with rollover aliases, if the template does not contain this information, the index would be marked as having issues when trying to rollover, as no data stream can be created, but also no rollover alias is associated with it.

A solution could be to delete rollover aliases when switching from legacy to index templates, but that would be rather invasive and might destroy ongoing ingestion.

Why might it destroy ongoing ingestion? Could we do something like:

  1. Check for rollover alias
  2. Disable auto_create_index
  3. Create index template
  4. Delete rollover alias
  5. "Disable" legacy template

Disabling vs. deleting legacy templates would cater for customisation, in case users upgrade without setting the config to stay in legacy template mode. We could disable the legacy templates by changing the index_patterns to something non-matching.
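
As a rough sketch of the five steps above, the function below only builds the ordered list of requests and sends nothing. The function name and arguments are hypothetical, and the "disabled-" prefix is just one way to obtain a non-matching pattern. The legacy template body is copied with new index_patterns so that user customisations are kept.

```python
def migration_plan(alias, write_index, template_name, template_body,
                   legacy_name, legacy_body):
    """Return the ordered requests as (method, path, body) tuples.

    Step 1 is a check; its result decides whether to continue at all.
    """
    # Keep the legacy template content, but make its pattern non-matching.
    disabled_legacy = dict(legacy_body, index_patterns=["disabled-" + legacy_name])
    return [
        # 1. Check for the rollover alias.
        ("HEAD", f"/_alias/{alias}", None),
        # 2. Disable automatic index creation.
        ("PUT", "/_cluster/settings",
         {"persistent": {"action.auto_create_index": "false"}}),
        # 3. Create the index template.
        ("PUT", f"/_index_template/{template_name}", template_body),
        # 4. Delete the rollover alias from its write index.
        ("DELETE", f"/{write_index}/_alias/{alias}", None),
        # 5. "Disable" the legacy template instead of deleting it.
        ("PUT", f"/_template/{legacy_name}", disabled_legacy),
    ]


# Placeholder names and bodies, for illustration only.
legacy = {
    "index_patterns": ["apm-7.9.0-transaction*"],
    "settings": {"number_of_shards": 1},
}
plan = migration_plan(
    alias="apm-7.9.0-transaction",
    write_index="apm-7.9.0-transaction-000001",
    template_name="apm-transaction",
    template_body={
        "index_patterns": ["apm-7.9.0-transaction*"],
        "data_stream": {},
        "priority": 100,
    },
    legacy_name="apm-7.9.0-transaction",
    legacy_body=legacy,
)
for method, path, _ in plan:
    print(method, path)
```

Step 2 changes a cluster setting, so whoever runs the plan needs a role that is allowed to update cluster settings.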

It's true that during that upgrade process other apm-servers may fail to index (e.g. after deleting the alias, because auto_create_index was disabled), but it's generally expected that indexing may fail due to transient error conditions, and need to be retried.

index -> legacy templates
For managed indices this would result in an error as existing data streams are matched when trying to create an alias. Again, we could resolve this with different suffixes in the index names and patterns.

Why would you go from index templates to legacy templates? I can understand continuing to support legacy templates, but I'm not sure we need to support migrating and then going back again. If people need to do that, I think they can do it manually.

@simitt
Contributor

simitt commented Jun 16, 2020

Thanks for the review @axw , you are raising some really important questions here:

Why do we need a fallback index in managed mode?

With managed indices, we don't need it. It's simply moved over from the time we used unmanaged indices. When removing it, we should add a check that every event has one of the allowed ILM event types set before pushing the event to the queue, to avoid regressions in the future, potentially leading to a full queue.
While getting rid of the fallback index is nice and can simplify the code a bit, the main problem with the default index lies in the unmanaged indices though, and in how to avoid template inheritance with legacy templates for it.
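
The check mentioned above could look roughly like this. The set of allowed event types, the exception and the function names are assumptions for illustration and are not taken from the APM Server code base; only the processor.event field name comes from this discussion.

```python
# Assumed set of event types that have a dedicated managed index.
ALLOWED_EVENT_TYPES = frozenset(
    {"error", "metric", "profile", "span", "transaction"}
)


class UnknownEventType(ValueError):
    """Raised when an event cannot be routed to a managed index."""


def check_event_type(event):
    """Return the event type, or raise if no managed index exists for it."""
    event_type = event.get("processor", {}).get("event")
    if event_type not in ALLOWED_EVENT_TYPES:
        raise UnknownEventType(
            f"no managed index for processor.event={event_type!r}"
        )
    return event_type


def enqueue(queue, event):
    """Validate first, so unroutable events never reach the queue."""
    check_event_type(event)
    queue.append(event)


queue = []
enqueue(queue, {"processor": {"event": "transaction"}})
```

Rejecting the event before it is queued keeps a regression from filling the queue with documents that can never be indexed.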

I think it makes sense for onboarding to be managed too, but not sure about sourcemap.

Agreed that it is not necessary to also manage sourcemaps. In my POC I did add sourcemaps with a rollover policy only based on size and not on age. We are not versioning sourcemaps, therefore in some edge cases people might run into issues with ever growing sourcemap indices (although unlikely). While it is not an event in the APM sense, we are already today treating it as an event by setting processor.event: "sourcemap". When disabling auto_create_index for managed indices, we then need to ensure it is not disabled for sourcemaps (not a problem, just something to consider).
IMO whatever is easier to implement for sourcemaps is fine.
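
For reference, a rollover policy based only on size and not on age could be built roughly like this. It is a sketch of an ILM policy body, not the policy used in the POC, and the 50gb threshold is a placeholder.

```python
def size_only_rollover_policy(max_size):
    """Body for PUT _ilm/policy/<name> that rolls over on size only."""
    return {
        "policy": {
            "phases": {
                "hot": {
                    "actions": {
                        # No max_age condition: sourcemaps are low volume
                        # and only need protection against unbounded growth.
                        "rollover": {"max_size": max_size}
                    }
                }
            }
        }
    }


sourcemap_policy = size_only_rollover_policy("50gb")
```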

A solution could be to delete rollover aliases when switching from legacy to index templates, but that would be rather invasive and might destroy ongoing ingestion.

Why might it destroy ongoing ingestion? Could we do something like:

  1. Check for rollover alias
  2. Disable auto_create_index
  3. Create index template
  4. Delete rollover alias
  5. "Disable" legacy template

Yes, that could work. The reason I haven't picked up the idea of disabling auto_create_index yet is that it would require additional cluster: manage privileges for the apm-server user. Not saying it's not worth exploring, but if we can avoid it, that might make sense. There is no default user shipped for APM Server, therefore on upgrade, the existing apm-server user role would need to be changed manually.

Why would you go from index templates to legacy templates?

Because I'd like to enable index templates by default, in which case users would go this direction if they want to stick with legacy templates. But I agree, that we do not necessarily need to support this out-of-the-box. That is basically why I raised the last question:

How much support do we want to offer for automatic cleanup of previous setups when switching between legacy and index templates?

Based on this feedback and feedback during our offline discussions I'm going to take a bit of time to work further on the POC.

@simitt
Contributor

simitt commented Jun 16, 2020

from elastic/beats#17829 (comment):

For large deployments like ours, template inheritance is very common and needed. For example, we use Beats templates together with our own templates. Multiple templates match the same indices. Beats should reflect this. Maybe it could just allow the Beats template to be generated as a component, so other templates could include it as a component template.

@simitt
Contributor

simitt commented Jul 7, 2020

Follow up on index template and data stream investigations, point (4) is specific to data streams.

@jalvz
Contributor

jalvz commented Oct 27, 2020

Hello 👋

In case data ingestion starts while no index template with data stream definition exists, an unmanaged index is created, and no data stream with the same name can be created afterwards.

This can be, potentially, very problematic. Because:

When using APM Server for setting up templates, this should never happen.

With Elastic Agent we don't have apm-server setup. I can imagine a user downloading Elastic Agent, copy-pasting some apm config section into elastic-agent.yml (presumably from our own docs), and simply running the agent, at which point it is already too late to solve the problem.
