
cluster metadata consistency among filters. #4558

Closed
stevenzzzz opened this issue Sep 28, 2018 · 11 comments
Labels
design proposal Needs design doc/proposal before implementation

Comments

@stevenzzzz
Contributor

stevenzzzz commented Sep 28, 2018

Bug Template

Title: We need a mechanism to make sure a cluster's (i.e. a service's) metadata is consistent across filters.

Description:

For now, cluster metadata is hosted in ClusterInfo, and each filter accesses it via FactoryContext.clusterManager().get(cluster_name).info().metadata.
This is not thread-safe as the metadata may get updated between different filters' de/encodeHeaders() calls.
Consider the following scenario (a sketch of the access pattern follows the list):

  1. FilterA reads the metadata and gets the "version 1" metadata.

  2. CDS updates the cluster's metadata to "version 2".

  3. FilterB reads the metadata and gets the "version 2" metadata. Although clusterManager.get() returns a ThreadLocalCluster, it is the ThreadLocalCluster already updated by the preceding CDS operation.
    
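A minimal sketch of that access pattern (accessor names follow the description above; this is illustration only, not a complete filter):

```cpp
// Sketch only: accessor names as referenced in this issue
// (FactoryContext::clusterManager(), ThreadLocalCluster::info(), ClusterInfo::metadata()).
// Each filter resolves the cluster independently, so a CDS update that lands between
// the two decodeHeaders() invocations makes them observe different metadata versions.

// Inside FilterA::decodeHeaders():
const auto* cluster_a = factory_context_.clusterManager().get(cluster_name_);
if (cluster_a != nullptr) {
  const auto& metadata_a = cluster_a->info()->metadata(); // may be "version 1"
  // ... FilterA consumes metadata_a ...
}

// ... CDS pushes a cluster update; the thread-local cluster is swapped ...

// Inside FilterB::decodeHeaders(), later on the same request:
const auto* cluster_b = factory_context_.clusterManager().get(cluster_name_);
if (cluster_b != nullptr) {
  const auto& metadata_b = cluster_b->info()->metadata(); // may now be "version 2"
  // ... FilterB consumes metadata_b ...
}
```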

Proposal:

We should introduce a mechanism to make sure all filters see the same version of the metadata for the whole filter chain.
A place to latch this per-request metadata (or the ClusterInfo) would be the RouteEntry: instead of exposing only clusterName(), we can latch the ClusterInfo onto the RouteEntry.

This even works when we re-select the route and the cluster changes.
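A hypothetical shape of that proposal (clusterName() is the existing accessor; the clusterInfo() accessor is invented here for illustration and does not exist today):

```cpp
// Hypothetical interface shape for the proposal above, not existing Envoy API:
// the route entry latches a ClusterInfo snapshot when the route is resolved, so every
// filter that asks the route sees the same snapshot for the whole filter chain.
class RouteEntry {
public:
  virtual ~RouteEntry() = default;

  // Existing accessor: name only; each caller still re-resolves the cluster itself.
  virtual const std::string& clusterName() const = 0;

  // Proposed accessor: ClusterInfo captured once at route-resolution time.
  virtual Upstream::ClusterInfoConstSharedPtr clusterInfo() const = 0;
};
```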

@mattklein123 mattklein123 added question Questions that are neither investigations, bugs, nor enhancements design proposal Needs design doc/proposal before implementation and removed question Questions that are neither investigations, bugs, nor enhancements labels Sep 28, 2018
@mattklein123
Member

This is not thread-safe

Sorry to be pedantic but it is thread safe, it's just not consistent across filter invocations, right?

I agree that in general what you are saying makes sense to me. My concern is that someone might actually be expecting the alternate behavior which is/was immediate update. Can you come up with a more fleshed out API proposal/change that we can discuss?

@stevenzzzz
Contributor Author

Ah, right. Sorry for the inaccurate description.

If we ever use the clusterManager.clusters() interface and grab a cluster from the returned map, that might actually be thread-unsafe, since CDS operates on clusterManager.active_clusters_ directly as well. But with clusterManager.get() we indeed only have the consistency problem.

Our use case is that a protobuf carries config data used by different filters, and consistency is definitely a problem in that scenario.

I think that in VirtualHostImpl::getRouteFromEntries, instead of returning the current RouteEntryImpl, we can return a RouteRequestImpl (with a better name) that wraps the current RouteEntryImplBase and has a cluster_info_ member set to threadLocalCluster.info() in the constructor. To achieve that, RouteEntryImplBase needs to keep a pointer to the FactoryContext. A rough sketch:
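```cpp
// Rough sketch of the wrapper idea above ("RouteRequestImpl" per the comment; everything
// else is assumed). It wraps the shared RouteEntryImplBase and captures the thread-local
// cluster's ClusterInfo at construction time, so all later reads on this request see
// that one snapshot.
class RouteRequestImpl : public Router::RouteEntry { // delegating wrapper
public:
  RouteRequestImpl(const RouteEntryImplBase& inner, Upstream::ClusterManager& cm)
      : inner_(inner) {
    auto* tlc = cm.get(inner.clusterName());
    if (tlc != nullptr) {
      cluster_info_ = tlc->info(); // latched once, per request
    }
  }

  const std::string& clusterName() const override { return inner_.clusterName(); }
  Upstream::ClusterInfoConstSharedPtr clusterInfo() const { return cluster_info_; }
  // ... the remaining RouteEntry methods forward to inner_ ...

private:
  const RouteEntryImplBase& inner_;
  Upstream::ClusterInfoConstSharedPtr cluster_info_;
};
```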

@mattklein123
Member

I think that in VirtualHostImpl::getRouteFromEntries, instead of returning the current RouteEntryImpl, we can return a RouteRequestImpl (with a better name) that wraps the current RouteEntryImplBase and has a cluster_info_ member set to threadLocalCluster.info() in the constructor. To achieve that, RouteEntryImplBase needs to keep a pointer to the FactoryContext.

The issue is that I don't think we should require an allocation to be performed to return a new route entry that wraps the underlying data unless the user cares about this type of consistency. So it could either be opt-in via an API, or, I suspect there is some way we can do this with TLS such that the cluster info ends up being picked up by RDS and merged into the route table in a stable way.

@htuch
Member

htuch commented Oct 1, 2018

@mattklein123 are you asking to avoid the allocation overhead here for performance reasons (e.g. don't add unnecessary overhead to the null request path as it's Envoy's data plane fast path)?

Is there an example you can think of where someone would expect the inconsistent view of ClusterInfo and be making use of it?

@mattklein123
Member

@mattklein123 are you asking to avoid the allocation overhead here for performance reasons (e.g. don't add unnecessary overhead to the null request path as it's Envoy's data plane fast path)?

Yes. I think we can satisfy this case without adding extra overhead in the basic forwarding path.

Is there an example you can think of where someone would expect the inconsistent view of ClusterInfo and be making use of it?

I'm not sure, but it wouldn't surprise me if there was a case of people wanting the freshest data for some reason. I agree that in general it seems fine to change this behavior, and if someone wants the freshest data they can always call directly into the CM to get it. The key is really how to latch it in a performance neutral way. I see two options:

  1. Have some variant of route() that returns a latched cluster info, and allow grabbing it off the route (a rough sketch of this shape follows below). I'm not sure how this would play with route caching.
  2. When there are cluster info updates, raise an event that would cause RDS to recompute a static route table that embeds cluster info inside the route. This would take care of the perf issue and also utilize the existing eventually consistent nature of route updates.

I think I prefer option 2 as I think to the end user it will be simpler.
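For illustration, option 1 might look something like this (none of these names exist in Envoy; purely a sketch of the API shape):

```cpp
// Hypothetical illustration of option 1 above (names invented for the sketch):
// a route() variant that latches ClusterInfo at route-resolution time and hands both
// back together, so filters never re-resolve the cluster themselves.
struct LatchedRoute {
  Router::RouteConstSharedPtr route;
  Upstream::ClusterInfoConstSharedPtr cluster_info; // snapshot taken when the route was resolved
};

// e.g. an additional method on the stream filter callbacks:
//   virtual LatchedRoute routeWithClusterInfo() = 0;
```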

@stevenzzzz
Contributor Author

stevenzzzz commented Oct 1, 2018

Looping in @alyssawilk.

Although I feel strongly that the cluster config a request sees should be consistent across filter invocations, I can see the other side as well.

Your TLS idea is inspiring, but there are some details I'm not fully clear on:

It's possible to latch the ClusterInfo onto a RouteEntry when the route entries are created, but not for a route entry with route_specifier = "cluster_header". For that case we could either (1) latch all of the ThreadLocalCluster info onto these routes (a dirty hack), or (2) use the method I proposed above, which returns a per-request wrapper and latches the thread-local ClusterInfo onto that wrapper.

A minor problem with TLS is that there is an eventual-consistency gap between RDS and CDS (minor if the control plane is careful): RDS may temporarily reference an incorrect cluster config (say, a to-be-deleted cluster), and if we do the "latch the ClusterInfo when creating the RouteEntry" trick, the incorrect config cannot be corrected until the next RDS update happens.

P.S. We had discussed that we could probably fix this problem (for us, the requirement is consistency across the whole filter chain) by having the first filter latch the ClusterInfo/metadata onto the request info, with downstream filters reading requestInfo.perRequestState to get the metadata (a sketch follows). But we think this puts too many constraints on customers' implementations, and it would be better to solve this problem upstream.
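A sketch of that per-request-state workaround; the per-request state API is only referenced loosely here, and the method names are approximate:

```cpp
// Hypothetical snapshot object stored in per-request state (it would derive from the
// per-request state's object base class; exact type omitted here). It simply copies
// the cluster metadata proto so downstream filters read a stable snapshot.
struct LatchedClusterMetadata {
  explicit LatchedClusterMetadata(envoy::api::v2::core::Metadata m) : metadata(std::move(m)) {}
  const envoy::api::v2::core::Metadata metadata;
};

// First filter in the chain (pseudo-usage; setData/getData names are approximate):
//   callbacks_->requestInfo().perRequestState().setData(
//       "latched_cluster_metadata",
//       std::make_unique<LatchedClusterMetadata>(cluster->info()->metadata()));
//
// Any downstream filter (pseudo-usage):
//   const auto& latched = callbacks_->requestInfo().perRequestState()
//       .getData<LatchedClusterMetadata>("latched_cluster_metadata");
```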

@mattklein123
Member

A minor problem with TLS is that there is an eventual-consistency gap between RDS and CDS

This is actually a major problem in the sense that there is no guarantee that a cluster continues to exist between filter invocations. It's possible that you latch the cluster info when the request chain starts, but if a filter pauses, the cluster might be deleted before the request chain resumes. Filters need to handle this.
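In code, that caveat amounts to something like this (sketch; accessor names follow the earlier comments):

```cpp
// Even a latched lookup can come back empty if the cluster was removed while the
// filter chain was paused, so any consumer has to tolerate a null/missing ClusterInfo.
Upstream::ClusterInfoConstSharedPtr info;
auto* tlc = cluster_manager_.get(cluster_name_);
if (tlc != nullptr) {
  info = tlc->info(); // the shared_ptr keeps the snapshot alive even if the cluster is deleted later
}
if (info == nullptr) {
  // Cluster no longer exists (e.g. removed by CDS mid-request); fail or skip gracefully.
}
```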

But we think this puts too many constraints on customers' implementations, and it would be better to solve this problem upstream.

I'd be happy with an upstream solution, but I don't think there is any easy solution here that covers all cases, including cluster deletion mid-request. I agree with you that TLS does not work for the cluster_header case (and that doesn't even consider deletion).

The more I think about this, the more I wonder whether we could have filters opt in to cluster info caching if they take responsibility for the other lifetime issues. For example, perhaps the first filter tells the route cache to re-cache with embedded cluster info, which would allocate and replace the cache entry with a route that embeds the data, while the default otherwise looks it up each time?

@alyssawilk
Contributor

I'd really like it if refreshCachedRoute could optionally cache cluster info, either based on config or based on a filter telling it to, so filters can snag the "consistent" info for their current route. We have a bunch of filters already which latch the cluster info, and it's not clear to me whether any thought was put into the route and cluster possibly changing mid-request and the locally latched state going stale. I don't think we can convert them all, but maybe reach out to OWNERS?

@mattklein123
Member

I'd really like it if refreshCachedRoute could optionally cache cluster info, either based on config or based on a filter telling it to, so filters can snag the "consistent" info for their current route.

Yeah, I think that might be the way to go. What if we just have the HCM cache cluster info at the same time, and then expose it via a filter callback? That would be a nice cleanup and allow filters to just grab cluster info via callbacks instead of fetching each time themselves?

@stevenzzzz
Contributor Author

I'd really like it if refreshCachedRoute could optionally cache cluster info, either based on config or based on a filter telling it to, so filters can snag the "consistent" info for their current route.

Yeah, I think that might be the way to go. What if we just have the HCM cache cluster info at the same time, and then expose it via a filter callback? That would be a nice cleanup and allow filters to just grab cluster info via callbacks instead of fetching each time themselves?

This will do, and it also avoids the overhead.
I can work on it if everybody is happy with it.
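From a filter's point of view, the agreed direction would look roughly like this (a sketch; it assumes the HCM-cached cluster info ends up exposed on the stream filter callbacks as something like clusterInfo(), and MyFilter is a placeholder name):

```cpp
// Sketch: a filter grabs the cluster info that the HCM latched for this request via its
// callbacks instead of resolving the cluster itself on every invocation, so the whole
// filter chain sees one consistent snapshot.
Http::FilterHeadersStatus MyFilter::decodeHeaders(Http::HeaderMap&, bool) {
  Upstream::ClusterInfoConstSharedPtr cluster_info = decoder_callbacks_->clusterInfo();
  if (cluster_info != nullptr) {
    const auto& metadata = cluster_info->metadata(); // same snapshot for every filter in the chain
    // ... consume metadata ...
  }
  return Http::FilterHeadersStatus::Continue;
}
```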

@mattklein123
Member

This is fixed.
