Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

v2: Use r.URL.RawPath for rewrite #3596

Closed
go-d opened this issue Jul 23, 2020 · 8 comments · Fixed by #3918
Closed

v2: Use r.URL.RawPath for rewrite #3596

go-d opened this issue Jul 23, 2020 · 8 comments · Fixed by #3918
Labels
bug 🐞 Something isn't working
Milestone

Comments

@go-d
Copy link
Contributor

go-d commented Jul 23, 2020

When using e.g. uri replace paths get automatically unescaped. This might be undesirable. Something like this would fix it:

diff --git a/modules/caddyhttp/rewrite/rewrite.go b/modules/caddyhttp/rewrite/rewrite.go
index d47c388a..4370a77f 100644
--- a/modules/caddyhttp/rewrite/rewrite.go
+++ b/modules/caddyhttp/rewrite/rewrite.go
@@ -303,13 +303,24 @@ func (rep replacer) do(r *http.Request, repl *caddy.Replacer) bool {
        find := repl.ReplaceAll(rep.Find, "")
        replace := repl.ReplaceAll(rep.Replace, "")
 
-       oldPath := r.URL.Path
+       oldPath := r.URL.RawPath
        oldQuery := r.URL.RawQuery
 
-       r.URL.Path = strings.Replace(oldPath, find, replace, lim)
        r.URL.RawQuery = strings.Replace(oldQuery, find, replace, lim)
 
-       return r.URL.Path != oldPath && r.URL.RawQuery != oldQuery
+       if oldPath == "" {
+               oldPath = r.URL.Path
+               r.URL.Path = strings.Replace(oldPath, find, replace, lim)
+
+               return r.URL.Path != oldPath && r.URL.RawQuery != oldQuery
+       }
+
+       r.URL.RawPath = strings.Replace(oldPath, find, replace, lim)
+       if p, err := url.PathUnescape(r.URL.RawPath); err == nil {
+               r.URL.Path = p
+       }
+
+       return r.URL.RawPath != oldPath && r.URL.RawQuery != oldQuery
 }
 
 // Interface guard

What do you think? Maybe there's a better way?

@mholt
Copy link
Member

mholt commented Aug 1, 2020

Thanks for the issue / suggestion!

Usually I think r.URL.EscapedPath() is preferable (according to the Go docs).

But why do we need this?

This might be undesirable

Do you have any examples of when this is the case?

@go-d
Copy link
Contributor Author

go-d commented Aug 1, 2020

We need this so that we can proxy requests largely unmodified to backends. Here's an example. Note that retrieving the TEST2 value using the same path doesn't work.

Caddyfile:

:80
uri replace /kv/ /key-value/
reverse_proxy :8080

Simple HTTP backend server in Go:

package main

import (
	"io/ioutil"
	"net/http"
	"net/url"
)

func main() {
	m := make(map[url.URL][]byte)
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		println(r.URL.String())
		w.Write(m[*r.URL])
		if r.Method == http.MethodPut {
			if b, err := ioutil.ReadAll(r.Body); err == nil {
				m[*r.URL] = b
			}
		}
	})
	http.ListenAndServe(":8080", nil)
}

Test by making requests to caddy:

$ echo TEST1 | curl -XPUT --data-binary @- "localhost/kv/Caddy%20v2%20rulez"
$ curl "localhost/key-value/Caddy%20v2%20rulez"
TEST1

$ echo TEST2 | curl -XPUT --data-binary @- "localhost/kv/Caddy%2Fv2%20rulez!"
$ curl "localhost/key-value/Caddy%2Fv2%20rulez!"

$ curl "localhost/key-value/Caddy/v2%20rulez%21"
TEST2

@mholt
Copy link
Member

mholt commented Aug 6, 2020

Hmmm my senses are tingling and I think it's because your example has a mistake. localhost/kv/Caddy%2Fv2%20rulez! is not properly URL-encoded; as you can see, the ! should be %21 just as much as a space should be %20. So I believe this is a case of bad/poor client behavior (a poorly-constructed or malformed URL). It doesn't make sense to encode only some characters but not others.

There is a standardized form of the URL you can use in your server's map (m), but using the entire URL struct as the key will also consider the unstandardized form, which is not what you want.

So there are two places where the behavior is actually buggy (the client for sending a malformed URL) and the Go program for indexing on the un-standardized form of the URL.

I am going to close this as I'm pretty convinced we shouldn't make the proposed change (which will probably break a lot of other cases) but feel free to continue discussion if necessary. Thanks!

@mholt mholt closed this as completed Aug 6, 2020
@go-d
Copy link
Contributor Author

go-d commented Aug 7, 2020

I think by standardized you mean decoded. The example deliberately uses URL struct as map key to expose the problem with Caddy's rewrite. Focusing on how this example code would improve by using decoded form misses the point. Namely that URL.Path is lossy.

My example does not have a mistake. Encoding ! isn't mandatory. See for example this in your browser's JavaScript console:

encodeURIComponent("Caddy/v2 rulez!")
"Caddy%2Fv2%20rulez!"

These URLs aren't semantically equivalent:

  1. /a/b/c/d!e
  2. /a/b%2Fc/d%21e

Because:

  1. has 4 path components, the 4th having two subcomponents: d and e.
  2. just 3 path components.

A server must be able to distinguish which one a client has sent, and optionally treat them differently.

Passing requests unmangled is currently not possible with rewrite, either v1 or v2, but this v1 Caddyfile works fine, because without uses URL.RawPath:

:80
proxy /kv/ :8080/key-value { without /kv }
proxy / :8080

@mholt
Copy link
Member

mholt commented Aug 8, 2020

Will give this some more consideration.

@mholt mholt reopened this Aug 8, 2020
@segevfiner
Copy link

When the Go builtin ReverseProxy does rewriting (httputil.NewSingleHostReverseProxy) it explicitly takes care to manipulate both Path and RawPath, otherwise the path ends up losing the exact way it was escaped as, which leads to stuff like escaped slashes becoming slashes, breaking the request when that is important. e.g. A path parameter containing slashes.

@mholt
Copy link
Member

mholt commented Dec 2, 2020

Been fiddling with this for a while. Thought the proposed fix made tests fail, but turns out git hadn't applied the whole thing for some reason so I manually verified it before running tests again and now they are passing.

@go-d Do you want to submit a PR, and add some test cases to TestRewrite() in rewrite_test.go to verify? Then you can get credit for it (and probably we can get it merged in sooner rather than later). If I do it, it might take longer until I get around to it.

@mholt mholt added this to the 2.x milestone Dec 2, 2020
@mholt mholt added bug 🐞 Something isn't working help wanted 🆘 Extra attention is needed labels Dec 2, 2020
@mholt
Copy link
Member

mholt commented Dec 12, 2020

Pinging @go-d -- it's too late to put this in v2.3, but I'll add it to the 2.3.1 or 2.4 tree in a couple of weeks (-ish) unless you want to submit a PR to get the credit for it?

@mholt mholt modified the milestones: 2.x, v2.4.0, v2.3.1 Dec 12, 2020
go-d added a commit to go-d/caddy that referenced this issue Dec 13, 2020
Prevent information loss, i.e. the encoded form that was sent by the
client, when using URL strip/replace.
mholt pushed a commit that referenced this issue Jan 11, 2021
Prevent information loss, i.e. the encoded form that was sent by the
client, when using URL strip/replace.
@mholt mholt removed the help wanted 🆘 Extra attention is needed label Jan 11, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug 🐞 Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants