Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

wildcard hostname match not possible #1637

Open
szuecs opened this issue Dec 7, 2020 · 16 comments
Open

wildcard hostname match not possible #1637

szuecs opened this issue Dec 7, 2020 · 16 comments

Comments

@szuecs
Copy link
Member

szuecs commented Dec 7, 2020

There are some issues specific to Kubernetes:

  1. ingress does not allow this setting, but we can weaken RouteGroup to support it
  2. You can't specify "*.example.org" because of the regexp match that enforces [a-z0-9]. in CRD
  3. "*.example.org" would need special handling, because it should be treated as Host(/[a-z0-9]+[.]example[.]org/)

The matching would need to be done carefully such that an ingress with www.example.org is not caught by a RouteGroup with *.example.org

An example how we could use Weight():

 ./bin/skipper -inline-routes='r0: Host(/www[.]example[.]org/) -> status(200) -> <shunt>; rstar: Host(/[a-z0-9]+[.]example[.]org/) && Weight(0.1) -> status(418) -> <shunt>;'
[APP]INFO[0000] Expose metrics in codahale format
[APP]INFO[0000] support listener on :9911
[APP]INFO[0000] proxy listener on :9090
[APP]INFO[0000] TLS settings not found, defaulting to HTTP
[APP]INFO[0000] route settings, reset, route: r0: Host(/www[.]example[.]org/) -> status(200) -> <shunt>
[APP]INFO[0000] route settings, reset, route: rstar: Host(/[a-z0-9]+[.]example[.]org/) && Weight(0.1) -> status(418) -> <shunt>
[APP]INFO[0000] route settings received
[APP]INFO[0000] route settings applied
127.0.0.1 - - [07/Dec/2020:17:26:57 +0100] "GET / HTTP/1.1" 200 0 "-" "curl/7.49.0" 0 www.example.org - -
127.0.0.1 - - [07/Dec/2020:17:27:02 +0100] "GET / HTTP/1.1" 418 0 "-" "curl/7.49.0" 0 foo.example.org - -
127.0.0.1 - - [07/Dec/2020:17:27:04 +0100] "GET / HTTP/1.1" 200 0 "-" "curl/7.49.0" 0 www.example.org - -
127.0.0.1 - - [07/Dec/2020:17:27:07 +0100] "GET / HTTP/1.1" 418 0 "-" "curl/7.49.0" 0 foo.example.org - -

Right now the dataclient kubernetes would create errors while creating a route for host with *.example.org:

[APP]time="2020-12-07T16:32:30Z" level=info msg="route settings, update, route: kube_rg__default__my_route_group_1__all__2_0: PathSubtree(\"/\") && Host(\"^(*[.]example[.]org|my-rg1[.]example[.]org && Weight(0.1) -> status(418) -> inlineContent(\"I'm a teapot\") -> <shunt>"
[APP]time="2020-12-07T16:32:30Z" level=info msg="route settings, update, route: kube_rg______example_org__catchall__0_0: Host(\"^(*[.]example[.]org)$\") -> <shunt>"
[APP]time="2020-12-07T16:32:30Z" level=info msg="route settings received"
[APP]time="2020-12-07T16:32:30Z" level=info msg="filterRoutes incoming=148 outgoing=148"
[APP]time="2020-12-07T16:32:30Z" level=error msg="kube_rg______example_org__catchall__0_0 [22]: error parsing regexp: missing argument to repetition operator: `*`"
[APP]time="2020-12-07T16:32:30Z" level=error msg="kube_rg__default__my_route_group_1__all__1_0_https_redirect [35]: error parsing regexp: missing argument to repetition operator: `*`"
...```
@szuecs
Copy link
Member Author

szuecs commented Dec 7, 2020

@NorthFury @aryszka can this be considered safe to reduce weight by adding Weight(0.1)?

@aryszka
Copy link
Contributor

aryszka commented Dec 8, 2020

We need to change for this what we allow as the hostname. OTOH, I would try to avoid the solution based on the Weight, because it can lead to unforeseen long term challenges.

Would an approach like the following help here?

www: Host("www[.]example[.]org") -> "docs1";
others: Host("(foo|bar|baz)[.]example[.]org") -> "docs2";

This would limit the matched hostnames to fixed set, but I believe that should be managaeble, unless we face a situation where dozens/hunderds of hostnames, or an unknown set of hostnames need to be handled. In that case the following trick can work:

specialCase: Host(".*[.]example[.]org") && Host("www[.]example[.]org") -> "docs1";
genericCase: Host(".*[.]example[.]org") -> "docs2";

Note that we don't use the Weight() predicate. The additional Host() header for the specialCase can be added as a predicate in the route group.

@szuecs
Copy link
Member Author

szuecs commented Dec 8, 2020

This Host("(foo|bar|baz)[.]example[.]org") does not work if the owner of the wildcard domain does not know the hostnames in advance.

What I wonder was that when I did not had the Weight() I IIRC got 404, now it just work without Weight().

% ./bin/skipper -inline-routes='r0: Host(/www[.]example[.]org/) -> status(200) -> <shunt>; rstar: Host(/[a-z0-9]+[.]example[.]org/) -> status(418) -> <shunt>;'
[APP]INFO[0000] Expose metrics in codahale format
[APP]INFO[0000] support listener on :9911
[APP]INFO[0000] proxy listener on :9090
[APP]INFO[0000] TLS settings not found, defaulting to HTTP
[APP]INFO[0000] route settings, reset, route: r0: Host(/www[.]example[.]org/) -> status(200) -> <shunt>
[APP]INFO[0000] route settings, reset, route: rstar: Host(/[a-z0-9]+[.]example[.]org/) -> status(418) -> <shunt>
[APP]INFO[0000] route settings received
[APP]INFO[0000] route settings applied
127.0.0.1 - - [08/Dec/2020:15:31:36 +0100] "GET / HTTP/1.1" 418 0 "-" "curl/7.49.0" 0 foo.example.org - -
127.0.0.1 - - [08/Dec/2020:15:31:46 +0100] "GET / HTTP/1.1" 200 0 "-" "curl/7.49.0" 0 www.example.org - -
127.0.0.1 - - [08/Dec/2020:15:31:48 +0100] "GET / HTTP/1.1" 200 0 "-" "curl/7.49.0" 0 www.example.org - -
127.0.0.1 - - [08/Dec/2020:15:31:48 +0100] "GET / HTTP/1.1" 200 0 "-" "curl/7.49.0" 0 www.example.org - -
127.0.0.1 - - [08/Dec/2020:15:31:51 +0100] "GET / HTTP/1.1" 418 0 "-" "curl/7.49.0" 0 foo.example.org - -
127.0.0.1 - - [08/Dec/2020:15:31:52 +0100] "GET / HTTP/1.1" 418 0 "-" "curl/7.49.0" 0 foo.example.org - -
^C

@aryszka
Copy link
Contributor

aryszka commented Dec 8, 2020

for that case, when the owner doesn't know the hostnames in advance, I recommend the second example described above. Of course it still needs changes to what we allow in the route groups.

The 404 might have been just a mistake resulting in an invalid route, I cannot imagine a different case.

..., now it just works without Weight().

Careful! It's still an undefined behavior, if you restart skipper a couple of times, you will probably have different results, receiving only 418s.

@szuecs
Copy link
Member Author

szuecs commented Dec 8, 2020

for that case, when the owner doesn't know the hostnames in advance, I recommend the second example described above. Of course it still needs changes to what we allow in the route groups.

I don't think that approach is a great one. :)

The 404 might have been just a mistake resulting in an invalid route, I cannot imagine a different case.

I consider this as, I was doing something wrong. I could have tricked myself.

..., now it just works without Weight().

Careful! It's still an undefined behavior, if you restart skipper a couple of times, you will probably have different results, receiving only 418s.

I think we should consider this as bug and we should fix it long term (v1 ?) with a routing tree change. Maybe the internal proposed "hostname before path match" should be also considered. So a basically 2 tree lookups (or one tree with both), first host, second path to reduce the predicate list to match.

@NorthFury
Copy link
Member

NorthFury commented Dec 11, 2020

There is this option of having high priority predicates. Following through with this idea in the current code setup, one could have a HostStrict predicate that needs to be an exact match. This would need to behave similarly to how the headersExact matcher works. Basically this can add a hostExact property to the leafMatcher and in the matchLeaf function it would have to be checked before matching hostRxs.

Notes

  • names are just suggestions. the hostExact property suggestion is to match the existing headersExact property in the leafMatcher.
  • as far as the new predicate name goes I don't have a strong opinion except that it's not similar to the Path and PathRegexp predicates and it can't be without a breaking change.

@AlexanderYastrebov
Copy link
Member

There is this option of having high priority predicates.

If it is about

skipper/proxy/proxy.go

Lines 814 to 823 in 4dcbf4b

func (p *Proxy) lookupRoute(ctx *context) (rt *routing.Route, params map[string]string) {
for _, prt := range p.priorityRoutes {
rt, params = prt.Match(ctx.request)
if rt != nil {
return rt, params
}
}
return ctx.routeLookup.Do(ctx.request)
}

that already seemed adhoc to me

The problem of wildcard domain match feels similar to wildcard path matching.

@NorthFury
Copy link
Member

NorthFury commented Dec 12, 2020

The problem of wildcard domain match feels similar to wildcard path matching.

I would argue that wildcard domain matching is much more limited in scope. I would choose not to increase the complexity because of it, at least not without having good use cases for it.

And to explain my suggestion, there is already a list of "high priority" predicates, although not marked as such. In fact, the current Host predicate is part of that "high priority" predicates list. Here is where you can find them:

type leafMatcher struct {
wildcardParamNames []string // in reverse order
hasFreeWildcardParam bool
exactPath string
method string
weight int
hostRxs []*regexp.Regexp
pathRxs []*regexp.Regexp
headersExact map[string]string
headersRegexp map[string][]*regexp.Regexp
predicates []Predicate
route *Route
}

And here is where those are used:

skipper/routing/matcher.go

Lines 447 to 473 in aa497f6

func matchLeaf(l *leafMatcher, req *http.Request, path, exactPath string) bool {
if l.exactPath != "" && l.exactPath != path {
return false
}
if l.method != "" && l.method != req.Method {
return false
}
if !matchRegexps(l.hostRxs, req.Host) {
return false
}
if !matchRegexps(l.pathRxs, exactPath) {
return false
}
if !matchHeaders(l.headersExact, l.headersRegexp, req.Header) {
return false
}
if !matchPredicates(l.predicates, req) {
return false
}
return true
}

@tkrop
Copy link
Member

tkrop commented Dec 22, 2020

@szuecs While wildcards are fine, we could solve the problem also more efficiently with a host-suffix support instead of wildcards.

@szuecs
Copy link
Member Author

szuecs commented Dec 22, 2020

@tkrop the normal way of suffix match won’t be possible for DNS (only one wildcard until the dot is possible) and I am not yet talking about implementation details only about what to achieve. We could perfectly do in skipper a suffix search and enforce via jsonSchema.

@szuecs
Copy link
Member Author

szuecs commented Jan 19, 2021

I think we should do the work around mentioned in #1637 (comment) to add in all cases the Host(".*[.]example[.]org") predicate to all routes with Host() to ensure we have 2x Host() to have the priority for non wildcard definitions.
What we need to clarify, too is:

@AlexanderYastrebov
Copy link
Member

@szuecs

can this be considered safe to reduce weight by adding Weight(0.1)

I think it should be other way around - use Weight (or True) to give preference to exact match. Note that Weight requires integer argument we just do not validate it strict enough and round 0.1 to 0:

func parseWeightPredicateArgs(args []interface{}) (int, error) {
if len(args) != 1 {
return 0, errInvalidWeightParams
}
if weight, ok := args[0].(float64); ok {
return int(weight), nil
}
if weight, ok := args[0].(int); ok {
return weight, nil
}
return 0, errInvalidWeightParams
}

E.g. (note that correct wildcard regexp should be more complex https://stackoverflow.com/questions/106179/regular-expression-to-match-dns-hostname-or-ip-address):

./bin/skipper -inline-routes='www: Host(/www[.]example[.]org/) && Weight(1) -> inlineContent("www") -> <shunt>; wildcard: Host(/[a-z-0-9]+[.]example[.]org/) -> inlineContent("wildcard") -> <shunt>;'

@szuecs
Copy link
Member Author

szuecs commented Nov 24, 2021

@AlexanderYastrebov I would rather use True(), but maybe we can also change the weight in the routing package.
Maybe @aryszka has a suggestion.

@AlexanderYastrebov
Copy link
Member

Related #284

@AlexanderYastrebov
Copy link
Member

AlexanderYastrebov commented Nov 17, 2022

Another idea is to allow RouteGroup without hosts and let users use Host(/[a-z0-9]+[.]example[.]org/) predicate:

apiVersion: zalando.org/v1
kind: RouteGroup
metadata:
  name: my-api
spec:
  hosts: []
  backends:
    - name: my-api-http
      serviceName: my-api-service
      servicePort: 8080
      type: service
  routes:
    - pathSubtree: /api
      predicates:
        - Host(/[a-z0-9]+[.]example[.]org/)
      backends:
        - backendName: my-api-service

The DNS entries for the hosts then might be created either via external-dns CRD or via another auxiliary RouteGroup that enumerates all allowed hosts:

apiVersion: zalando.org/v1
kind: RouteGroup
metadata:
  name: my-api-hosts
spec:
  hosts:
    - foo.example.org
    - bar.example.org
    - baz.example.org
  backends:
    - name: shunt
      type: shunt
  routes:
    - path: /whatever
      predicates:
        - False()
      backends:
        - backendName: shunt

Another usecase for the host-less RouteGroup could be definition of cluster-wide catch-all routes without Host predicate.

@szuecs
Copy link
Member Author

szuecs commented Nov 21, 2022

We could also create a spec.hostRegexp that is optional and one of spec.hosts and spec.hostRegexp should be defined

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants