Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rethink template inheritance to allow for better validation #21105

Closed
jpountz opened this issue Oct 25, 2016 · 23 comments
Closed

Rethink template inheritance to allow for better validation #21105

jpountz opened this issue Oct 25, 2016 · 23 comments
Labels
:Data Management/Indices APIs APIs to create and manage indices and templates >enhancement Team:Data Management Meta label for data/management team

Comments

@jpountz
Copy link
Contributor

jpountz commented Oct 25, 2016

Validation of index templates has proven difficult to get right due to how multiple templates that might be invalid on their own can result in a valid template in the end, for instance if two templates match a given index name and one of them is providing analysis index settings while the other one is providing mappings that make use of these analyzers. (#20479)

The fact that we cannot validate index templates is not good, so I think we need to think about how we could validate them all the time without false positives. We were discussing this issue yesterday and the following options have been suggested:

  • Require that templates are valid on their own. In that case we might also want to remove the ability for several templates to match an index, as several templates could be merged into an invalid one (Template Ordering Results in Inheritance of Incompatible Setting #20988).
  • Make the inheritance of templates explicit: instead of merging templates together based on order, a template could refer to other templates that should be included in order to not have to repeat the same bits over multiple templates. And again, we would probably only allow one template to be applied per index.
@dadoonet
Copy link
Member

Make the inheritance of templates explicit

Definitely a good idea IMO. In such a case, you know exactly what you are doing as a end user.

@s1monw
Copy link
Contributor

s1monw commented Oct 25, 2016

I wonder if we really need to make it all that sophisticated. It might be enough for 99% of the usecases to just have a single _root_ template that you can inherit from and that's it. This means that the root is applied as part of each individual template independent of what other templates match for a specific index. When users update a specific template we can validate all effective template associated with each other. I also think we might be able to make this happen for 5.1?

@jpountz
Copy link
Contributor Author

jpountz commented Oct 25, 2016

I just discussed it with Simon since I was not clear how this would help. Since the perceived benefit of having multiple matching templates is to save some typing when putting them, the _root_ template would be applied when the template is put (rather than at index creation time) so that the template could be validated as it would be expected to be complete.

@acarstoiu
Copy link

acarstoiu commented Oct 28, 2016

What if the _root_ template also has it's own _root_?! The inheritance chain must be free from any constraints, except for non-circularity, of course. ;)

A second question is how on Earth are you going to enforce the single association to a brand new index that just happens to be created by a document insertion request? You have to resort to limitations on the declared template patterns like "only non-overlapping prefixes" and such. I personally don't like artificial limits.
Another idea is to still keep the order field in templates, only that it will indicate the priority for application on a new index. The highest priority of the matching templates (matched through free patterns) points to the winning template - this would be in line with what happens right now (the last merged template has the highest order).

@marshall007
Copy link

We just ran into this trying to upgrade our cluster to 2.4. We essentially have one "root" template containing all our custom filters and anaylzers that our other index templates inherit from. Would definitely be in favor of explicitly declaring template dependencies (from @jpountz initial comment). I think a single _root_ would be too restricting for our use case. We tend to have a single "root" template per "index type".

Probably worth mentioning this as a breaking change in the 2.4 migration docs, btw.

@mbarker
Copy link

mbarker commented Jan 4, 2017

I think explicit template dependencies could work. Requiring templates to be valid on their own would be a little difficult since we have a somewhat complex index/template strategy. We recently upgrade from ES 0.90 to 2.3.x and refactored our massive single template into many much smaller templates where we rely on inheritance to keep them concise and re-use shared parts. With 2.3.x we had to partition our indices by language, so we ended up with templates with specific responsibilities.

Settings
We have cluster specific defaults (# shards/replicas, allocation settings, etc), this matches *. Then we have overrides for some subsets of index languages, matching *-en for example, changes the # shards for example.

Analyzers/Filters
We have a set of custom analyzers/filters that are common across all of our mappings (again matching *), and then we override the default analyzer for each language index (again matching *-en for example).

Mappings
We broke down our mappings into multiple templates for the top level parts of our documents, so they are all matching * and they just define the document mapping and reference analyzers from the previous section.

Here's a simplified example of the templates: https://gist.github.com/mbarker/40bd82de268cd5d73d3788621c4b80df

@synhershko
Copy link
Contributor

(subscribing)

@evanvolgas
Copy link

@s1monw I'm not sure that a single root dependency would work either. Like @mbarker we have a relatively complex index/template strategy that works pretty well for us and we'd like to keep it. In particular we tend to define global settings like compression and whether or not _all is enabled at a default template with order 0. Our shared mappings (which correspond to what I think you have in mind) are usually defined at order 10, and this is where we put all the fields in common, default number of shards, etc for indices of a certain type. There are usually 3-4 of these shared mappings indices on any given cluster. Then, we have one or two additional templates at order 20 (and sometimes 30 as well) that specify the individual mappings and analyzers unique to a certain index. There are usually more like a dozen of these on any given cluster.

Inheritance is a certainly a feature we like and use regularly (albeit we've definitely been hit with a few gotchas, eg #20479) but I also agree that the explicit dependencies would be very easy to work with (and arguably a lot easier to write tests against too, which we also do). @acarstoiu brings up a good point about nested inheritance. For what it's worth, it would not be difficult at all for us to work without nested inheritance.

@jpountz
Copy link
Contributor Author

jpountz commented Mar 12, 2017

We had a long discussion about this issue yesterday. We want to be able to validate index templates at PUT time in all cases, yet template inheritance makes it hard. For instance if a root template is modified, all child templates would need to be validated again. Furthermore, clusters are expected to have a reasonable total number of templates, which means having some duplicate sections in multiple templates is not that much of an issue. In conclusion we decided to remove support for merging templates entirely in 6.0 and use the order parameter in order to know which template to use rather than how to merge multiple templates.

@acarstoiu
Copy link

If you:

  • enforce a single inheritance model among templates (with some new property)
  • choose a single template to apply when a new index gets created (by following a suggestion I made 4.5 months ago 👍)

what is the problem in validating a (sub)tree of templates when a parent template is changed?

You're sparring the machine from doing more computations at the expense of more manual work necessary to multiply and keep in sync the common parts - which is itself error-prone.
Is this your understanding of software evolution, folks?! Are you all really sure that in the newest Elasticseach version you want to remove a useful functionality (as proven by many comments here and in related issues) in order to give more work to humans?
@kimchy Help, please! :)

@evanvolgas
Copy link

@jpountz would explicit inheritance simplify the validation at all?

If not, I think it will be important to write a tool that generates explicit templates outside of ES (for example, in Bash/Python/Go/whatever) by using a hierarchy/import statements/something. I'd be happy to help out with said tool since, without inheritance, it's something a lot of us are going to need, myself included.

One clarification if you don't mind: "clusters are expected to have a reasonable total number of templates, which means having some duplicate sections in multiple templates is not that much of an issue." Do you have an upper bound on what you'd consider reasonable? Is this expectation documented?

My experience and yours seem quite different. I do expect this to be an issue for quite a few people. I actually do agree that there are good reasons to make the change you have in mind, and ways to work around it (additional tooling, mentioned above). Among the reasons this change makes sense, though, I'm personally not convinced "It shouldn't be much of an issue" is one of them.

@acarstoiu
Copy link

@jpountz is just a messenger here, so there's no reason to shoot him. Yet, at least 😆

@clintongormley
Copy link
Contributor

Another possibility that has been put forward for discussion is to specify the template name in an indexing request.

/cc @uboness

@acarstoiu
Copy link

That's a no go - it would add an extra thing to worry about in each and every indexing query.

@Battleroid
Copy link

Yeah, that's not ideal. However, if that were optional that would be pretty interesting.

@geekpete
Copy link
Member

Templates could also be validated/converted at upgrade time if we don't have to worry about merging. A migration plugin could also provide early warning about invalid templates ahead of time before an upgrade.

@PhilMarsh
Copy link

I don't see this in the Breaking Changes for 6.0 yet.
Is removing Index Template merging still the plan?

My company is still working on upgrading from 1.7 to 2.4.
My team in particular makes extensive use of template composition to support combinations of index types, 30+ languages, and field sets from 200+ vendors. So #20479 is a pretty big show-stopper for us.

It seems like the simplest approach for us is to copy ES's merge logic into a custom tool (already done) and generate fully-qualified templates.
This works, but it seems sub-optimal from a product support standpoint.
Until someone open-sources such a tool (I don't yet know if I will be allowed to), every consumer needs to write their own version, which seems about as error-prone as copy-pasting analyzers across hundreds of index templates.

If Index Templates really aren't going to be modular go forward, would it be possible to release an official tool that encapsulates the merge logic? Or perhaps a helper endpoint that we can send a list of templates and get back the combined version? I don't have any concrete API suggestions, but I'd be happy to discuss if there's interest.

@willemdh
Copy link

willemdh commented Aug 1, 2017

Template inheritance allows us to add aliasses to for example the default Beats index templates, like this:

PUT _template/filebeat-linux 
{
  "order": 1,
  "template": "filebeat-linux-*",
  "aliases" : {
    "a-filebeat-linux" : {}
  }
}

Please don't deprecate template inheritance completely or else we will have to manually update each beats template after each upgrade.

@evanvolgas
Copy link

We've started to migrate away from nested templates, but we're finding it really hard to get away from a default level 0 template that contains our dynamic mapping settings. If you guys do remove template inheritance, would there be any way to add an option to templates to import one other template that contains common dynamic mapping config? It would not be too hard for us to be explicit about the fields we expect in a template.... but we really don't want to have to repeat the dynamic mapping rules in 100 different places if we can avoid it.

@acarstoiu
Copy link

For those who want to migrate to 5.5.1, I have ported the patch I made for 2.4.1 (see this comment and the next one) in order to make the template validation lenient with respect to analyzers.

The port is here.

@astral303
Copy link

It seems unreasonable to take away convenience and DRY in the name of dubiously-useful index template validation. The end result is that I will have to code the template merging myself.

@pickypg pickypg added v7.0.0 and removed v6.0.0 labels Nov 7, 2017
@clintongormley clintongormley added :Data Management/Indices APIs APIs to create and manage indices and templates and removed :Index Templates labels Feb 13, 2018
@hub-cap hub-cap added :Data Management/Indices APIs APIs to create and manage indices and templates and removed :Data Management/Indices APIs APIs to create and manage indices and templates labels Mar 21, 2018
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-core-infra

@dakrone
Copy link
Member

dakrone commented Nov 6, 2020

Closing this as it's been resolved by composable index templates (#53101)

@dakrone dakrone closed this as completed Nov 6, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Data Management/Indices APIs APIs to create and manage indices and templates >enhancement Team:Data Management Meta label for data/management team
Projects
None yet
Development

No branches or pull requests