Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Return 503 on empty backend #1748

Merged

Conversation

m3co-code
Copy link
Contributor

@m3co-code m3co-code commented Jun 14, 2017

This PR changes the response code Traefik sends to 503, given it has an empty backend.

This should resolve the issue #1688. In the issue it is furthermore mentioned that a 503 should also be sent when a backend is completely missing. At the moment though, a frontend is not loaded in the loadConfig method, given there is no dedicated backend for it. See here. This looks to me that this case is actually not intentioned to happen. @dtomcej and @timoreimann you had quite some discussions about this topic, WDYT about that case?

Fixes #1688

@m3co-code m3co-code force-pushed the 1688-respond-503-on-missing-or-empty-backend branch from 3cccc1d to 3577218 Compare June 14, 2017 10:12
@timoreimann
Copy link
Contributor

@marco-jantke If I look at the Kubernetes provider (which we kind of implicitly used as the canonical implementation for a proper 404/503 distinction), what we do there is start off with a non-empty backend as soon as we have a legitimate Ingress, remove the entire frontend if the Service turns out to be missing, and populate the backend servers if we have found proper Endpoints. With this approach, the Kubernetes provider produces nil backends only if the Ingress doesn't include any paths (because this loop will never enter its body), and otherwise creates backends that fit the specification.

So overall, this should work. nil backends not causing a frontend to be built (and, in turn, leading to a 404) sounds like a reasonable assumption since it's easy to end up with an error. Reaching a 503 should require some (minimum) effort with each provider.

@dtomcej WDYT?

@m3co-code m3co-code force-pushed the 1688-respond-503-on-missing-or-empty-backend branch from 3577218 to 5bb3e69 Compare June 14, 2017 13:59
@dtomcej
Copy link
Contributor

dtomcej commented Jun 14, 2017

I agree with the scenario @timoreimann proposed.

@m3co-code
Copy link
Contributor Author

Ok, when I understand it correctly, than this PR should actually deliver the needs we have for the 503 responses. 👍 for the quick responses, thanks a lot!

Copy link
Contributor

@timoreimann timoreimann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One test design suggestion, but otherwise :+1

},
},
}
},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add a parameterizable helper function or (maybe even better) apply the builder pattern to construct the frontends and backends for test purposes?

We have tests like the ones in provider/kubernetes/kubernetes_test.go which are using inline structs so extensively, it has become fairly hard to understand and extend the tests. I'd like to make sure we're not doing the same for newish tests.

Copy link
Contributor

@ldez ldez Jun 14, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the nice article recommendation. I applied the pattern there and I think the readability of the tests improved quite a lot. Furthermore I tried to keep the builder generic, so that they can easily be reused and extended by other tests in the future.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Big improvement indeed! Me ❤️ it too!

timoreimann
timoreimann previously approved these changes Jun 15, 2017
Copy link
Contributor

@timoreimann timoreimann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@djalal
Copy link
Contributor

djalal commented Jun 16, 2017

@timoreimann what about adding logs ?

an "empty backend" is not a nominal state, and should throw a WARNING, just like events "no route to host" or "connect timeout"

@timoreimann
Copy link
Contributor

@djalal the events preceeding a potential 503 should eventually be logged in each provider. The Kubernetes one does this already, others should follow. I think this should be the first priority as you likely want to have a logging notification before messages actually get routed incorrectly.

I saw you created a dedicated issue too, so I'll place the second part of my answer there. :-)

@emilevauge emilevauge dismissed timoreimann’s stale review June 21, 2017 09:01

We still have to complete the design review :)

@timoreimann
Copy link
Contributor

timoreimann commented Jun 22, 2017

@emilevauge what part of the design review is still missing from your perspective?

We seem to have discussed the topic to great length on Slack.

@m3co-code
Copy link
Contributor Author

Is there anything I can do to bring this PR forward?

@emilevauge
Copy link
Member

@marco-jantke Should we return 503 each time RebalancerErrorHandler is called? Does RebalancerErrorHandler really fit our needs here?

@m3co-code
Copy link
Contributor Author

m3co-code commented Jul 5, 2017

@emilevauge I checked the implementation of the load balancers in detail now.

The case I am describing holds true for the RoundRobin LB as well as for the Rebalancer LB implementation. They call the error handler you pass them basically in two cases:

  1. nextServer returns an error. This is because they can not find any valid server in the list they can route to. For that case it is correct to return the 503.
  2. when sticky sessions are enabled and the server in the cookie is not yet available anymore. For that case currently we also serve a 503, but before we served a 500 because of the default error handler implementation that was used.

I think for case number 2 even the former implementation is not complete. I can't see anything that would invalidate the cookie again at some point and clients with that cookie won't be able to connect to a new Server, when the server list of the Backend changes. AFAICS we should just remove the error handling in the case this happens and try to connect to a new Server in the backend list, for the following two places. When it can select a new Server it will set the cookie again, effectively overriding the old cookie and in case no Server could be found we want to still have our 503:

WDYT about it?

@emilevauge
Copy link
Member

emilevauge commented Jul 5, 2017

@marco-jantke I'm sorry but I don't agree to use RebalancerErrorHandler / ErrorHandler as a workaround to know if a backend is empty or not. We should not assume why an error is set in the errHandler by the load balancer. It could even change in the future.
Couldn't you use the number of servers in a backend directly?

@m3co-code
Copy link
Contributor Author

I get your point. I actually did not perceive using the LoadBalancers ErrorHandler as a workaround, as they are kind of the canonical place to keep track of the service place. Anyway I think we can just directly check when building the Backend whether it has no Servers and use an http.Handler that directly returns 503 without the need of even setting up LoadBalancers or the Forwarder. Does this sound like an approach you imagine? I would give it a shot, should be easy to verify as my tests should not change.

As additional point, WDYT about the "potential problem" I described above regarding sticky sessions and a change of the Server list?

I think for case number 2 even the former implementation is not complete. I can't see anything that would invalidate the cookie again at some point and clients with that cookie won't be able to connect to a new Server, when the server list of the Backend changes. AFAICS we should just remove the error handling in the case this happens and try to connect to a new Server in the backend list, for the following two places. When it can select a new Server it will set the cookie again, effectively overriding the old cookie and in case no Server could be found we want to still have our 503. Rebalancer LoadBalancer

I had no time yet to proof this point and I am not sure whether I will have time in the near future as sticky sessions are not on my list of required features, but maybe it is worth opening an issue for this to enable further investigation in the future or by others?

@emilevauge
Copy link
Member

Anyway I think we can just directly check when building the Backend whether it has no Servers and use an http.Handler that directly returns 503 without the need of even setting up LoadBalancers or the Forwarder. Does this sound like an approach you imagine? I would give it a shot, should be easy to verify as my tests should not change.

Indeed, that's a simple solution. The only issue is that healthchecks can add/remove servers from a lb dynamically. So this solution can't be that simple :)

On the sticky session thing, couldn't we just make the cookie expire in this case?

@timoreimann
Copy link
Contributor

timoreimann commented Jul 6, 2017

From what I can see, the sticky session implementation does reroute to a different server if the one associated with the cookie is knowingly gone.

Here's the flow how I see it:

  1. The ServeHTTP method is invoked.
  2. GetBackend is invoked with the list of currently known servers (which should also be the one that the health check logic manipulates).
  3. For a known cookie, GetBackend checks if the if the recorded server is still alive. This is the case if the URL from the cookie can be found within the list of servers (called haystack in the code).
  4. If the server cannot be found, we try another one unless we have depleted them all. Only in the latter case we seem to call the error handler.
  5. While we do update the sticky cookie with the new URL, we unfortunately call http.SetCookie(). Unlike the name of the function may imply, it only adds another cookie header (of which they can be multiple) but does not set/overwrite an existing one.

@m3co-code m3co-code force-pushed the 1688-respond-503-on-missing-or-empty-backend branch from 975dbf0 to b04da6d Compare July 7, 2017 08:47
@m3co-code
Copy link
Contributor Author

m3co-code commented Jul 7, 2017

@timoreimann thanks for the clarification on the cookie issue! I in fact misinterpreted the behaviour and apart of the already open issues there seems to nothing wrong. To clarify a bit more, it is right to use the Set-Cookie header in the Servers response. In fact there is no other way to set the cookie from the server-side AFAIK. In the issue #1744 its only the problem that multiple different paths are specified and therefore the cookie gets seemingly re-created. A cookie is considered unique and will be overwritten in case the cookie_name, domain and path parts are equal. Said this, I think further communication about the cookie issue should be moved to the dedicated issue then.

@timoreimann
Copy link
Contributor

@marco-jantke agreed to moving the cookie discussion to a different issue.

Final comment: SetCookie does not seem to overwrite identical (per your definition) cookies.

@emilevauge
Copy link
Member

@marco-jantke can we also agree to change the design, and directly use the number of servers in backend (being careful to health-checks) ?

@m3co-code
Copy link
Contributor Author

Yes, I had a look at the implementation and this sounds reasonable to me. So I would write a middleware that gets a healthcheck.LoadBalancer and on each request checks whether there are active Servers. If this is not the case it will directly serve a 503 response. Does this sound reasonable to you?

I will update this PR accordingly.

@m3co-code m3co-code force-pushed the 1688-respond-503-on-missing-or-empty-backend branch 3 times, most recently from ab87dad to fcf7761 Compare July 10, 2017 10:20
@m3co-code
Copy link
Contributor Author

@emilevauge @timoreimann PTAL again. I refactored the implementation now to a middleware that knows about the healthcheck.LoadBalancer implementation and responds with a 503, given a Backend is empty.

timoreimann
timoreimann previously approved these changes Jul 10, 2017
Copy link
Contributor

@timoreimann timoreimann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One nit-pick left.

I really like the middleware-approach. ❤️ Middleware all the things!

handler := NewEmptyBackendHandler(&healthCheckLoadBalancer{test.amountServer}, nextHandler)

recorder := httptest.NewRecorder()
req := httptest.NewRequest("GET", "http://localhost", nil)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/"GET"/http.MethodGet/

Copy link
Contributor Author

@m3co-code m3co-code Jul 10, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Always.. So thanks for reminding me constantly Timo. At some point I will also learn it, don't give up the hope :D 🐼

@timoreimann timoreimann dismissed their stale review July 10, 2017 10:36

Oops, missed that design review is still outstanding. Withdrawing my approval for formal reasons, even though I like the PR from both design and code review perspectives.

@emilevauge
Copy link
Member

Design LGTM
Thanks @marco-jantke :)

Copy link
Contributor

@timoreimann timoreimann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reinstating my code review LGTM. 👍

@m3co-code m3co-code force-pushed the 1688-respond-503-on-missing-or-empty-backend branch from 2ca3ff7 to f1cad9d Compare July 18, 2017 09:54
Copy link
Contributor

@ldez ldez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@dtomcej dtomcej left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

:shipit:

Copy link
Contributor

@vdemeester vdemeester left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🐯

@ldez ldez force-pushed the 1688-respond-503-on-missing-or-empty-backend branch from f1cad9d to 0c2a899 Compare July 19, 2017 17:11
@ldez ldez merged commit 074b31b into traefik:master Jul 19, 2017
@ldez ldez changed the title return 503 on empty backend Return 503 on empty backend Aug 14, 2017
@ldez ldez added this to the 1.4 milestone Oct 1, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/middleware kind/enhancement a new or improved feature.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Return 503 on missing or empty backend (with available frontend)
7 participants