-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Singleton HttpClient doesn't respect DNS changes #18348
Comments
Default is 60secs and you can set it manually to whatever you want. https://github.com/dotnet/corefx/blob/d0dc5fc099946adc1035b34a8b1f6042eddb0c75/src/System.Net.Http.WinHttpHandler/src/System/Net/Http/WinHttpHandler.cs#L76 |
I don't think that's the same property. That's the connection timeout, not the ConnectionLeaseTimeout. As documented in https://msdn.microsoft.com/en-us/library/system.net.servicepoint.connectionleasetimeout.aspx, it should be dropped periodically in some scenarios. But there's no apparent way to change the behavior on .NET Core. |
@onovotny Yeah that's a weird thing that seems to do an explicit connection:close after a certain time frame. There is no reason that couldn't be implemented as a piece of middleware if you really wanted to. It's just a HTTP header. |
@darrelmiller so that's getting out of my area, not sure... but if there's a documented workaround that addresses the original issue, that would be a huge benefit. |
@onovotny, there is no |
@onovotny The original issue, as I understand it is that, on the server side, someone wants to update a DNS entry so all future requests to host A go to new IP address. The best solution in my opinion is when that DNS entry is changed, the application on host A should be told to return a Connection:close header to the client on all future responses. That will force all clients to drop the connection and establish a new one. The solution is not to have a timeout value on the client so the client sends a Connection:close when that "lease" timeout expires. |
This jives well with the question in Stackoverflow Unit testing DNS failover, using a custom DNS resolver and may give further context around this issue, so I'll cross-link (I'll remove here if inappropriate). |
This is absolutely an issue. The proposed solution by @darrelmiller seems to demonstrate a lack of understanding of the use case. We use Azure Traffic Managers to evaluate the health of an instance. They work on top of DNS with short TTLs. A host doesn't magically know it is unhealthy and start issuing Connection:close headers. The traffic manager simply detects the unhealthy state and adjusts DNS resolution accordingly. By design this is COMPLETELY TRANSPARENT to clients and hosts... as it stands, an application with a static HttpClient resolving DNS (unknowingly) by a traffic manager will never receive a new host when the previously resolved host becomes unhealthy. As of netcore 1.1 I'm currently searching for a resolution to this exact problem. I don't see one, nativly anyway. |
@kudoz83 You are correct, a host doesn't magically know when a traffic manager has adjusted the DNS resolution, hence why I said
If you don't believe that it is practical to notify an origin server that it should return connection:close headers then you are left with just a few choices:
And regarding me not understanding the use case. The original issue was based on a blog post where the use case was related to switching between production and staging slots and had nothing to do with health monitoring. Although both scenarios could benefit from being able to receive notifications of DNS/slot switches. |
AFAIK there is not a way to notify a failed host that it is failed? Generally this would mean it is in some kind of error state. The original post is talking about failover scenarios. Azure Traffic Manager is explained here https://docs.microsoft.com/en-us/azure/traffic-manager/traffic-manager-monitoring It operates at the DNS level and is transparent to clients and hosts - by design. Azure instances communicating with one another via Traffic Manager for DNS resolution are in a pickle without being able to set a TTL on DNS refresh for an HttpClient. The way HttpClient currently operates appears to be at odds with Microsoft's own Azure service offerings - which is the entire point. Not to mention, any similar service (E.G. Amazon Route 53) works exactly the same way, and has the exact same short DNS TTL dependency. Obviously I found this thread because I'm impacted by the issue. I'm simply trying to clarify that there doesn't appear to be an ideal workaround currently other than just arbitrarily recreating HttpClients (at the cost of performance and complexity) for the sake of making sure DNS failover succeeds. |
If a "failed" host could not execute any code then we wouldn't need the HTTP status code 503 because an origin server would never be capable of returning it. There are scenarios where a failed host may not be capable of processing a notification, but in those cases it probably isn't going to hold an active TCP/IP connection open for very long either. When Ali originally brought this topic up on Twitter, before he wrote the blog post, he was concerted about the fact that a client would continue making requests and getting responses from a server that had been swapped out. I understand your situation is different and that you cannot rely on an origin server being able to reliably close a connection. What I'm not sure I understand is why for your situation, you can't use a HttpMessageHandler to detect 5XX responses, close the connection and retry the request. Or even simpler, just convince your web server to return connection:close whenever it returns a 5XX status code. It's also worth noting that the behavior you are concerned isn't in HttpClient. It is in the underlying HttpHandler or even below that, and there are a number of different implementations of HttpHandler in use. In my opinion, the current behavior is consistent with the way HTTP is designed to work. I agree there is a challenge when DNS settings are updated while connections are open, but this isn't a .net problem. This is a HTTP architecture problem. That doesn't mean .net couldn't do something to make this less of a problem. I curious as to what you believe the solution should be. Do you think something like the ConnectionLeaseTimeout should be re-implemented that simply drops a connection periodically regardless of whether there has been a failover or not? Or do you have a better solution? |
Sorry, I Shouldn't have come off as combative as I sounded yesterday. Was having a bad day. Honestly I'm not aware how ConnectionLeaseTimeout works beneath the covers. The desired functionality would be for HttpClient to be configurable so as to perform a new DNS lookup, prior to establishing new connections as opposed to caching the previous DNS lookup until the client is disposed. I don't believe this requires killing an existing connections... |
@kudoz83 No worries, we all have days like that :-) From what I understand ConnectionLeaseTimeout is literally a timer that periodically closes an idle connection from the client side. It is implemented in ServicePointManager, which doesn't exist in core. This is different than the "idle connection" timeout which only closes a connection once it has not be used for a certain period of time. There is likely a Win32 call that would allow flushing the DNS client cache. That would ensure new connections do a DNS lookup. I took a quick look for it but didn't find it, but I'm sure it is there somewhere. In .net core on Windows, the management of HTTP connections is actually left to the OS. The closest you get is the call to the Win32 API WinHttpOpen https://github.com/dotnet/corefx/blob/master/src/System.Net.Http.WinHttpHandler/src/System/Net/Http/WinHttpHandler.cs#L718 There are no public APIs that I can find to do anything with the _sessionHandle. To get the kind of behavior change you are looking for would require the OS folks to make changes. At least that's how I understand it. Wouldn't it be easier to get Azure Traffic Manager just to set a very low TTL on the DNS entries? If you aren't concerned about long lived connections, then a low TTL should minimize the chance of new connections being setup against a dead server. The upside of the way WinHttpHandler is built is that we get things like HTTP/2 for free with the OS update. The downside is we don't get a whole lot of control in managed code. |
It seems that |
I think there's some miscommunication here: DNS TTLs are completely unrelated to HTTP/TCP connection lifetimes. New HTTP connections do a DNS lookup and connect to the returned host address. Multiple HTTP requests can be made over the same single connection but once opened the HTTP connection (and the underlying TCP connection) does not do a DNS lookup again. This is correct behavior so even if you make 1000's of requests over a connection that lasts for days, the DNS resolution doesn't matter since the connection hasn't changed. If you make a new HTTP request over a new HTTP connection then you will see a DNS lookup for the new connection. This DNS refresh has nothing to do with Default settings are meant to have connections stay open indefinitely for efficiency. If the connected host becomes unavailable then the connection will be broken automatically, causing a new connection with a new DNS lookup, however DNS TTLs will not cause any new connections. If you need that, you'll have to code that into your app. If you want another DNS lookup to be done, you need to reset the connection on the client. You can force a max-lifetime of the TCP connection, send a Hope that helps, if I'm off on this then please let me know. |
Yes DNS TTLs are fairly unrelated, but not completely.
Yes, that is correct. But Windows has a local DNS cache that respects the TTL. If the TTL is high and the DNS server has changed the DNS record, then the new connection will be opened against the old IP address due to client caching of the DNS record. Flushing the client DNS cache is the only workaround to this that I am aware of.
Correct again. In a production/staging swap where the swapped out server is still responding, this will cause many future requests to go to the old server, which is bad. However, the most recent conversation has been about the scenario where a server fails and the Traffic Manager swaps to a new server. What happens to the existing connection to the failed server is unknown. If it is a severe hardware failure, then there is a good chance the connection will be dropped. That would force the client to re-establish a connection and that is where a low TTL is useful. If however, the server failure doesn't break the server and it simply returns 5XX responses then it would be useful for the server to return connection:close to ensure that future requests will be made on a new connection, hopefully to the new working server
Yes and no. On .Net core, the HttpClientHandler has been re-written and no-longer uses HttpWebRequest and therefore doesn't use ServicePointManager. Hence the reason for this issue. I took a closer look at IIS and there doesn't appear to be a way to get 5XX responses to come with connection:close headers. It would be necessary to implement a HttpModule to do that magic. |
As an FYI, here's a related post covering this issue and an interim solution. Beware of HttpClient |
While we have brought ServicePointManager and HttpWebRequest for platform parity, it doesn't support most of the functionality in .Net Core as the underlying stack resolves to either WinHTTP or Curl which do not expose the advanced control knobs. Using ServicePointManager to control HttpClient is actually a side-effect on using HttpWebRequest and the managed HTTP stack on .Net Framework. Since WinHttpHandler/CurlHandler are used on .Net Core, SPM has no control over HttpClient instances.
Here's WinHTTP team's answer:
To force clients to the next IP, the only viable solution is ensuring that the server rejects new connections and closes existing ones (already mentioned on this thread). Given other network appliance devices that are themselves caching DNS (e.g. home routers/APs) I wouldn't recommend DNS fail-over as a solution for low-latency, high availability technique. If that is required, my recommendation for a controlled fail-over would be having a dedicated LB (hardware or software) that knows how to properly drain-stop.
As mentioned above, if by unhealthy you mean that the server returns a HTTP error code and closes the TCP connection while you observe the DNS TTL expired, this is indeed a bug. Please describe your scenario in more detail:
|
If you connect to a machine and the TCP/HTTP connection continues to work, why should you at some point assume the server is no longer appropriate. |
@manigandham How? It is not disposable and the only relevant method I can see is |
.NET Core 2.0 brings back the You can go back to using the connection lease timeout to set a max timeframe for active connections before they get reset. Nothing exotic about closing the connection while With the connection lease timeout setting you can ensure this is done only after a request is sent and reset before the next one over the time limit. |
Actually, it doesn't bring back the functionality completely. The API surface came back. But it is basically a no-op since the HTTP stacks don't use that object model. |
I'd consider many of the participants in this thread to be something of a "dream team" in this subject matter, yet I'm still not sure if any consensus has been reached on what the best practices are given what we have today. It would be great to get a decisive answer from this group on a couple key points. Let's say I have a general purpose HTTP library targeting many platforms, not all of which support
(Not a hypothetical situation btw; I'm wrestling with these exact questions right now.) Thanks! |
I wouldn't even be invited to the dream team tryouts, but I do have one insight regarding your singleton As for the DNS issue, the consensus here seems to be that:
If this is not viable for you, Darrel Miller suggested some more options here: https://github.com/dotnet/corefx/issues/11224#issuecomment-270498159. |
@ohadschn Thanks, and hopefully I'm not getting too off topic, but is there any benefit at all to sharing the same HttpClient (or HttpClientHandler) between multiple hosts? I could be wrong but it seems to me that a new host means new DNS lookup, new connection, new socket. Can a socket in TIME_WAIT be reclaimed and re-purposed for a different host? I had assumed not, but we're definitely reaching the limits of my expertise. If it can, perhaps this is the benefit of using an HttpClient singleton even with multiple hosts? Regarding the DNS advice, I felt like that's where consensus was lacking most, and understandably so. Ensuring the server behaves correctly isn't viable if you don't have control of what's happening on that end. I could design something where instances are disposed periodically. The question is whether that's worth the trade-off in the majority of cases. I guess that's largely up to me to make a judgement call. :) |
To me the most obvious benefit to sharing the same Regarding DNS, you mentioned one possible approach when you don't control the server. You could also implement a mechanism that periodically queries the DNS and disposes instances only if an actual change occurred... |
While the given answers that specify what should happen on the server-side are accurate for an ideal world, in the real world developers often do not have control of the full server-side stack, that they are communicating with. In that regard I really like the concept of a ConnectionLeaseTimeout, as it is a concession to the realities of the daily life of John and Jane Doeveloper. public class ConnectionLeaseTimeoutHandler : DelegatingHandler
{
private static string GetConnectionKey(Uri requestUri) => $"{requestUri.Scheme}://{requestUri.Host}:{requestUri.Port}";
private readonly IClock clock;
private readonly Duration leaseTimeout;
private ImmutableDictionary<string, Instant> connectionLeases = ImmutableDictionary<string, Instant>.Empty;
public ConnectionLeaseTimeoutHandler(HttpMessageHandler innerHandler, IClock clock, Duration leaseTimeout)
: base(innerHandler)
{
this.clock = clock;
this.leaseTimeout = leaseTimeout;
}
protected override Task<HttpResponseMessage> SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
{
string key = GetConnectionKey(request.RequestUri);
Instant now = clock.GetCurrentInstant();
Instant leaseStart = ImmutableInterlocked.GetOrAdd(ref connectionLeases, key, now);
if (now - leaseStart > leaseTimeout)
{
request.Headers.ConnectionClose = true;
ImmutableInterlocked.TryRemove(ref connectionLeases, key, out leaseStart);
}
return base.SendAsync(request, cancellationToken);
}
} |
@snboisen I think that looks right but don't hold me to that. :) For me the bigger question is whether sending a
I respect Darrel's opinion and it makes this whole approach seem rather hackish. On the other hand, I agree that what's a developer to do who doesn't have control of what's happening on the server side? If it works sometimes, maybe it is the best option we have? It's certainly "cheaper" than periodically disposing HttpClients or periodically doing DSN lookups. |
@onovotny @karelz @stephentoub, Would you all concur that this is an accurate summary of how best to solve the original singleton HttpClient/DNS problem?
As I continue to learn things from this thread and others (such as |
You can always recycle the instance on regular basis (e.g. just create new instance and set it to static variable which others use). |
@karelz Thank you, I think this provides me a way forward. Perhaps I should just skip the platform sniffing and recycle instances everywhere, or is advisable to use the other approaches where available (to avoid overhead, etc.)? The tricky bit might be knowing when it's safe to dispose recycled instances (I need to worry about concurrency/pending requests), but I can take a peek at how |
@tmenier You do not want to recycle instances every |
@KallDrexx Right, I'm just talking about periodically recycling a singleton (or "pseudo"-singleton), such as every few minutes. I do believe it will involve managing a pool and some kind of delayed disposal of dead instances, once it's determined they have no pending requests. |
Ah sorry misunderstood what you meant by recycle. |
You should not need to dispose old ones. Let GC collect them. |
I’ve found that not disposing them can lead to slowdowns as you run out of http connections in high performance scenarios. |
@tmenier About using SocketsHttpHandler.PooledConnectionLifetime in In .NET Core 2.1, the creation of HttpClientHandler in completely encapsulated in https://github.com/dotnet/wcf/blob/master/src/System.Private.ServiceModel/src/System/ServiceModel/Channels/HttpChannelFactory.cs. Is there a way to do this when using the WCF client? |
@edavedian I'd suggest to ask on dotnet/wcf repo. cc @mconnew @Lxiamail |
Is there in .Net Core an HTTP client that works over sockets? |
Yes, SocketsHttpHandler, and as of 2.1 it's the default. |
Does I see that caching is only done in |
HttpClient does not do any caching. |
@spiritbob it's not clear what form of caching you're referring to. This issue is closed; if you would please open a new issue with questions we can help you find answers. |
@scalablecory I believe davidsh responded with what I had in mind. Thanks! |
Just finished reading this thread, and it seems like there was a use case that wasn't addressed. When using If not, what is the workaround to make I'm currently using |
As described in the following post: http://byterot.blogspot.co.uk/2016/07/singleton-httpclient-dns.html, once you start keeping a shared
HttpClient
instance to improve performance, you'll hit an issue where the client won't respect DNS record updates in case of failover scenarios.The underlying issue is that the default value of
ConnectionLeaseTimeout
is set to-1
, infinite. It'll only close on dispose of the client, which is very inefficient.The fix is to update the value on the service point like this:
Unfortunately, there's no way to do this with .NET Core today.
Either ServicePointManager should be brought over to .NET Core or similar equivalent functionality should be enabled some other way.
The text was updated successfully, but these errors were encountered: