Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generalizing config overrides via headers #1935

Closed
wants to merge 1 commit into from

Conversation

nmittler
Copy link

Istio currently supports overriding a few elements of configuration via http headers (https://envoyproxy.github.io/envoy/configuration/http_filters/router_filter.html#http-headers).

This PR is a work in progress. It proposes a more general approach for supporting these overrides in a way that is more transparent from the application code.

The basic idea is to convert the cluster configuration early on from protobuf to a new type ClusterConfig(Impl). ClusterInfo then has been made to extend OverridingClusterConfig which supports overriding config values with headers. All of the application code will start to use ClusterConfig, rather than the protobuf class so that the overridden values are always available.

Istio currently supports overriding a few elements of configuration via http headers (https://envoyproxy.github.io/envoy/configuration/http_filters/router_filter.html#http-headers).

This PR is a work in progress.  It proposes a more general approach for supporting these overrides in a way that is more transparent from the application code.

The basic idea is to convert the cluster configuration early on from protobuf to a new type ClusterConfig(Impl). ClusterInfo then has been made to extend OverridingClusterConfig which supports overriding config values with headers. All of the application code will start to use ClusterConfig, rather than the protobuf class so that the overridden values are always available.
@nmittler
Copy link
Author

FYI @mattklein123 @louiscryan

This PR is a work in progress with the eventual goal of supporting new http header overrides to control circuit breaking.

@mattklein123
Copy link
Member

This is a giant change. Before even starting to review, can you please provide a list of things that you want to override that are not currently overridable? It's not clear that a change of this magnitude is required.

@nmittler
Copy link
Author

@mattklein123 Yeah understandable. My immediate need is to change cluster.OutlierDetection.BaseEjectionTimeMS, but I suspect other circuit breaking params may need to be overridable eventually as well.

The issue I saw was that the Detector currently uses DetectorConfig which is a copy of the initial parameters, but it never has access to the headers as they're received. Rather than making the Detector header-aware, I thought a better approach would be to make the configuration overridable, so that the detector will just always have access to the latest config.

@mattklein123
Copy link
Member

@nmittler I don't think changing the outlier detection base ejection interval on a per request basis really makes sense, as this is a cluster global setting that is utilized in a deferred manner. IMO header overrides should be on clearly per request things.

Can we step back to the scenario you are trying to enable? Why do we think that this needs to be overridden by a header in the request flow? Instead, should we potentially have an admin endpoint that can alter runtime settings (runtime is generally already wired up to allow changing things).

@nmittler
Copy link
Author

@mattklein123 sure, yeah good idea ...

I'm trying to provide an adapter layer in Istio for Spring Boot applications using Hystrix for circuit breaking. The basic idea is to have Istio recognize @HystrixCommand annotations on service methods and to convert these parameters into a configuration update for the Envoy sidecar. We had considered a couple of ways this could be done:

  1. Go through Pilot/Config DB. This would have each service backend push out a circuit breaking route rule for the service. Each backend would be pushing duplicate configuration, so there would have to be some work to avoid the duplication. This seemed overly complex, however, since what the application really just wants to configure it's local Envoy instance.

  2. Add support in Envoy to allow overriding circuit breaker configuration via headers. This would allow the application to have direct control over their Envoy instance without concern for duplication. Changing the configuration on a cluster-wide basis would be exactly what we'd want in this case (e.g. from now on, any messages going to abc.com from this Envoy instance should use this circuit breaker config). Each message that includes the same headers used previously would simply be a no-op.

So Envoy seems to be the path of least resistance ... but the real purpose of this PR is to start the discussion.

@htuch
Copy link
Member

htuch commented Oct 25, 2017

Architecturally, (1) is quite a bit cleaner, we have a single source of truth for cluster config, it's the management server. Changing this on the data path seems significantly more complicated from an Envoy perspective and also breaks the separation between control/data paths. But, at the end of the day, it's just software.

@louiscryan
Copy link

louiscryan commented Oct 25, 2017 via email

@htuch
Copy link
Member

htuch commented Oct 25, 2017

@louiscryan Envoy has support for modifying per-request settings for point config items that relate to that specific request. It doesn't have support for persisting changes to config provided via header on the request path in the current implementation.

Why can't the app signal to Pilot that there needs to be a config change? Is this because the app is not aware of Pilot? If there is code being injected into the app runtime anyway (which seems to be the case, to add these headers), can't this code tell it to speak to Pilot for making cluster changes that are intended to persist?

@louiscryan
Copy link

louiscryan commented Oct 25, 2017 via email

@mattklein123
Copy link
Member

There are a few things here to unpack:

  1. Configuration that is per-request, per-cluster, global, etc. I have zero issues with per-request overrides. We have lots of them already. There are some things missing and we can add them easily.

  2. The configuration originally references in this PR (outlier ejection base eject time), is an async concept to the request flow, and is per cluster. It's extremely confusing that it might be set to different values by different requests.

If we are specifically talking about Java and annotations, could we not do the following:

  1. For per-request overrides, allow method annotations and generate Envoy headers.
  2. For global/cluster overrides, load/parse all annotations via reflection at app load time, make sure they are sane, and then issue modifications to the local envoy, either via admin endpoint, "runtime" modification, etc.?

@nmittler
Copy link
Author

nmittler commented Oct 25, 2017

@mattklein123,

  1. For global/cluster overrides, load/parse all annotations via reflection at app load time, make sure they are sane, and then issue modifications to the local envoy, either via admin endpoint, "runtime" modification, etc.?

Figuring this out at app load time will be non-trivial. All we have to go on are the Hystrix annotations of this form:

@Service
public class MyService {
  // This annotation indicates that all outbound calls from this method should have the
  // given circuit breaker configuration applied.
  @HystrixCommand(commandProperties = {
                @HystrixProperty(name = "circuitBreaker.sleepWindowInMilliseconds",
                                 value = "10000")
        })
  public void doSomething() {
     ...
    // At runtime, we make a request that should have the circuit breaker config applied.
    restTemplate.get(url, ...);
     ...
  }
}

It's not easy to determine which endpoint the method will talk to. We could try reading the bytecode for the method, but then we'll also have to read the bytecode of every method that the method calls recursively until we identify the request URL. A non-trivial approach and it's not clear how well it will work in practice.

The approach we've gone with in our initial prototyping has been to dynamically determine the URL by using an interceptor in Spring's RestTemplate that first checks the URL and applies the circuit breaker config if needed.

So it's a cluster-wide config that is applied dynamically when the first request is made to that cluster.

@mattklein123
Copy link
Member

@nmittler I think we probably need to see a list of all the annotations you are looking to support. And presumably you are trying to do this on top of existing code?

I don't think it's reasonable that we should put what I consider to be large hacks into Envoy to cover every possible use case. I think we can figure out a way to deal with this, but seeing a full list will be good.

For example, for global/cluster settings, you could potentially have an annotation handler that lazily figures out the intention and then calls Envoy admin/runtime endpoints to change settings.

@nmittler
Copy link
Author

@mattklein123

For example, for global/cluster settings, you could potentially have an annotation handler that lazily figures out the intention and then calls Envoy admin/runtime endpoints to change settings.

That seems reasonable. I'm new to Envoy ... could you point me to the appropriate API?

@mattklein123
Copy link
Member

That seems reasonable. I'm new to Envoy ... could you point me to the appropriate API?

Said API does not exist. :)

My recommendation here is to close this PR and open an issue with all of the annotations you want to support (or just link to a spreadsheet), and we can sort out a complete solution?

@nmittler
Copy link
Author

@mattklein123 sgtm, thanks!

@nmittler nmittler closed this Oct 26, 2017
@nmittler
Copy link
Author

@mattklein123 after thinking about it some more, I don't think that using an admin API will work. This would effectively make the application a secondary Pilot, where we're distributing configuration to Envoy. The problem is that the real Pilot will eventually see that it's config differs from that in Envoy and will overwrite it.

Using headers wouldn't suffer from the same problem because they would be internal overrides, which Pilot would never see.

If headers aren't an option, then it seems we have to go the long way around town: config DB->Pilot->Envoy.

@mattklein123
Copy link
Member

@mattklein123 after thinking about it some more, I don't think that using an admin API will work. This would effectively make the application a secondary Pilot, where we're distributing configuration to Envoy. The problem is that the real Pilot will eventually see that it's config differs from that in Envoy and will overwrite it.

It will work if it's "runtime" settings that are meant to be overrides and that Pilot does not modify. As I said before, I really need to see an entire list of annotations to be able to help you further.

@nmittler
Copy link
Author

It will work if it's "runtime" settings that are meant to be overrides and that Pilot does not modify. As I said before, I really need to see an entire list of annotations to be able to help you further.

Ah, that makes sense. I think to be safe, we'd want to be able to change any of Istio's circuit breaker params:

  • cluster.MaxRequestsPerConnection
  • cluster.CircuitBreaker.Default.MaxConnections
  • cluster.CircuitBreaker.Default.MaxRequests
  • cluster.CircuitBreaker.Default.MaxPendingRequests
  • cluster.OutlierDetection.MaxEjectionPercent
  • cluster.OutlierDetection.BaseEjectionTimeMS
  • cluster.OutlierDetection.ConsecutiveErrors
  • cluster.OutlierDetection.IntervalMS
  • cluster.OutlierDetection.MaxEjectionPercent

@mattklein123
Copy link
Member

@nmittler did we open an issue on this? If not can we please open? I would rather track work items there. I have a good idea of how we can get this done.

@nmittler nmittler mentioned this pull request Oct 30, 2017
@nmittler
Copy link
Author

@mattklein123 Yup, sure thing ... done: #1967

@trabetti
Copy link
Contributor

trabetti commented Dec 6, 2017

@nmittler @louiscryan we are working on issue #1244 - Adding ability for Envoy to output hystrix event stream for monitoring activity using hystrix dashboard.
The main challenge we have is how to address the differences between the two circuit breaking mechanisms, so that what user sees in Hystrix dashboard makes sense. I am interested to know how you addressed this. I would appreciate if you could take a look at the gdoc we linked there and give your opinion.

@nmittler
Copy link
Author

nmittler commented Dec 8, 2017

@trabetti My initial work on supporting Spring+Hystrix in Istio has been temporarily shelved, as it's blocked by #514.

This work is focused on getting circuit breaking working and is currently ignoring metrics altogether. Even without metrics, there was a significant mismatch between configuration parameters. My approach was to simply ignore all Hystrix command parameters except for circuitBreakerSleepWindowInMilliseconds, which was the only parameter that had an Istio counterpart (sleepWindow).

I took a look through the doc and it seems reasonable. I think the metrics dashboard will just have some holes in it for systems running on Istio.

jpsim pushed a commit that referenced this pull request Nov 28, 2022
This wrapper allows developers to quickly get off the ground without
having to install bazel or bazelisk manually.

Signed-off-by: Keith Smiley <[email protected]>
Signed-off-by: JP Simard <[email protected]>
jpsim pushed a commit that referenced this pull request Nov 29, 2022
This wrapper allows developers to quickly get off the ground without
having to install bazel or bazelisk manually.

Signed-off-by: Keith Smiley <[email protected]>
Signed-off-by: JP Simard <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants