Encoded slashes in URL #797

Closed
robsdedude opened this issue Nov 5, 2015 · 5 comments

@robsdedude

As requested in issue #477, I suggest giving the user the option to leave %2F sequences in the URL undecoded.
Apache has supported this behaviour since version 2.1.5 (http://httpd.apache.org/docs/2.2/mod/core.html#allowencodedslashes) with the directive AllowEncodedSlashes NoDecode. So why shouldn't Werkzeug offer a way to configure it so that decoding is left to the user? That would allow, e.g., distinguishing the URLs /foo/bar%2F1 and /foo%2Fbar/1.
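To make the distinction concrete, here is a minimal sketch using only Python's standard library. The helper name `split_then_decode` is made up for illustration and is not part of Werkzeug; it only shows what is lost when the whole path is decoded before it is split into segments.

```python
from urllib.parse import unquote

RAW_A = "/foo/bar%2F1"
RAW_B = "/foo%2Fbar/1"


def split_then_decode(raw_path):
    """Split the raw path on literal slashes first, then decode each segment."""
    return [unquote(segment) for segment in raw_path.split("/")]


# Decoding the whole path first makes the two URLs identical.
decoded_a = unquote(RAW_A)
decoded_b = unquote(RAW_B)

# Splitting before decoding keeps the encoded slash inside its segment.
segments_a = split_then_decode(RAW_A)
segments_b = split_then_decode(RAW_B)
```

After a full decode both paths become `/foo/bar/1`, while splitting first yields `bar/1` as one segment for the first URL and `foo/bar` as one segment for the second.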

@mitsuhiko
Contributor

Even if Apache passes it through, it cannot do so in all cases. Primarily, anything that requires partial matching on the server has to resolve them. The way this directive works is that once Apache has determined that no matching against the file path is necessary, it just passes them on. This however does not help us, because we still need to perform matching everywhere else (script name / path info handling).

This just opens a can of worms, and only for URLs which should not exist in the first place. If you need to preserve slashes, move them into a query string where they belong.

Everything in Werkzeug is based on paths being decoded, as this is the only sensible way to perform URL matching. Unless we rewrite everything from scratch to change this behavior and break backwards compatibility, this cannot happen.

@exhuma

exhuma commented Sep 26, 2017

I'm sorry to touch this subject again, as it is quite clear that this issue is very difficult to address in flask. I have to disagree with the following statement though:

If you need to preserve slashes, move them into a query string where they belong.

And please correct me if I'm wrong, I'm still trying to wrap my head around this issue.

According to RFC 3986, Section 1.2.3, URIs should allow hierarchical identifiers. It also says that some schemes consider everything after the scheme as "opaque" (i.e. with no additional meaning attached to any character). I was unable to find an authoritative answer to my question: "Should HTTP consider the part after the : as opaque?" But I would wager that the answer is "no", and that / is used as the hierarchical separator.

For the sake of argument, imagine an API which exposes books in the form /book/<author>/<title>. As soon as both the author and the title contain a / in their names, this would be impossible to route in any WSGI application (if I understood the core issue correctly).

One obvious workaround is to use surrogate identifiers (like generated IDs). For example:

  • /book/Example%2FAuthor/The%2FTitle (using names)
  • /book/123/1 (using surrogate IDs)

Neither example seems to violate the RFC, so I don't think saying that "slashes belong in the query string" is valid.

If the names were moved into the query string, a request would look like the following:

  • /book?author=Example%2FAuthor&title=The%2FTitle

This is certainly a workable workaround as well, but I personally don't like it as much.
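For reference, the query-string form survives decoding because the values are split on `&` and `=` before they are percent-decoded, so an encoded slash never acts as a separator. A small sketch with the standard library, using the example URL from above:

```python
from urllib.parse import parse_qs, urlsplit

url = "/book?author=Example%2FAuthor&title=The%2FTitle"

parts = urlsplit(url)
# parse_qs splits on "&" and "=" first and only then decodes each value.
params = parse_qs(parts.query)

author = params["author"][0]
title = params["title"][0]
```

Here `author` ends up as `Example/Author` and `title` as `The/Title`, with the path staying at `/book`.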

Now, again, from reading through the related issues, this seems to be an issue with the WSGI spec, which is unfortunate.

For context: I am currently faced with an issue where an application uses natural keys of entities in the URL (quite similar to the Book/Author/Title example above). This seems to me to be a very sensible, and understandable URI design. But now an entry has appeared in the database which contains a slash in the natural key, which broke routing. You might still argue that the real issue is exposing the natural keys of the DB to the URI (but I would still argue that it is a perfectly sensible design).

I can use the workaround using the path converter in flask routing for now, as only one element in the URI has the possibility of including a slash. But the day I need to add a route with two segments containing a slash I'll be in trouble and need to find another workaround.
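The two-segment problem can be illustrated without any framework. The sketch below is not how Flask or Werkzeug match routes; `candidate_splits` is a hypothetical helper that simply lists every (author, title) pair consistent with a path that has already been decoded.

```python
def candidate_splits(decoded_path, prefix="/book/"):
    """List every (author, title) pair consistent with an already-decoded path."""
    parts = decoded_path[len(prefix):].split("/")
    return [
        ("/".join(parts[:i]), "/".join(parts[i:]))
        for i in range(1, len(parts))
    ]


# One slash-containing value after a fixed segment: everything after the
# prefix belongs to that single value, so there is nothing to guess.
single = "/book/123/The/Title"[len("/book/123/"):]

# Two slash-containing values: the decoded path no longer says where the
# author ends and the title begins.
ambiguous = candidate_splits("/book/Example/Author/The/Title")
```

With one greedy segment the remainder is unambiguous (`The/Title`), but with two there are three equally plausible readings, and only one of them is the intended `("Example/Author", "The/Title")`.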

@mitsuhiko
Contributor

mitsuhiko commented Sep 26, 2017

The TL;DR is that a webserver or anything really that routes needs to decode and normalize the URL and that removes the information you are after. There is not much anyone can do about this.

Just to clarify: this is not a shortcoming of the WSGI spec but a limitation of how URLs work in practice.

@robsdedude
Author

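@exhuma I worked around this by double URL-encoding the slashes. E.g. if you want to query /api/category/<catname> with catname := "text/javascript", then query /api/category/text%252Fjavascript. The server decodes this once to text%2Fjavascript, which still routes as a single segment, and the handler decodes it a second time.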
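A minimal sketch of that double-encoding round trip, using only `urllib.parse`. The function names are made up for illustration, and the "server" step is simulated with a single `unquote` call rather than a real Werkzeug request.

```python
from urllib.parse import quote, unquote


def double_encode_segment(value):
    """Client side: percent-encode a path segment twice."""
    once = quote(value, safe="")   # "/" becomes "%2F"
    return quote(once, safe="")    # "%" becomes "%25"


def decode_in_handler(routed_value):
    """Handler side: undo the second layer left over after routing."""
    return unquote(routed_value)


encoded = double_encode_segment("text/javascript")
# Simulate the single decode the server performs before routing.
seen_by_router = unquote(encoded)
catname = decode_in_handler(seen_by_router)
```

The router only ever sees `text%2Fjavascript`, which contains no literal slash, and the handler recovers `text/javascript`. Every client of the API has to know about the extra encoding step, which is the price of this approach.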

@sp-7indigo

@exhuma I worked around this by subclassing werkzeug.serving.WSGIRequestHandler, overriding make_environ, and adding a new environ variable PATH_INFO_RAW that contains self.path.
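The same idea can be sketched with the standard library's `wsgiref` server so that it runs without extra dependencies. Note that this is an analogue, not Werkzeug's API: `wsgiref` names the hook `get_environ` rather than `make_environ`. `RawPathHandler`, `app` and `fetch` are illustrative names, and `PATH_INFO_RAW` is a non-standard key taken from the comment above.

```python
import http.client
import threading
from wsgiref.simple_server import WSGIRequestHandler, make_server


class RawPathHandler(WSGIRequestHandler):
    """Request handler that also exposes the undecoded request path."""

    def get_environ(self):
        environ = super().get_environ()
        # self.path is the request target exactly as the client sent it.
        environ["PATH_INFO_RAW"] = self.path.split("?", 1)[0]
        return environ

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def app(environ, start_response):
    """Echo the decoded and the raw path, separated by a space."""
    body = f"{environ['PATH_INFO']} {environ['PATH_INFO_RAW']}".encode()
    headers = [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]


def fetch(path):
    """Serve exactly one request on an ephemeral port and return the body."""
    with make_server("127.0.0.1", 0, app, handler_class=RawPathHandler) as server:
        worker = threading.Thread(target=server.handle_request)
        worker.start()
        conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
        conn.request("GET", path)
        text = conn.getresponse().read().decode()
        conn.close()
        worker.join()
    return text
```

Calling `fetch("/foo%2Fbar/1")` returns the decoded `PATH_INFO` followed by the raw path, so the application can re-split the raw value itself. This only works when the application talks to the handler directly; a proxy or another server in front may already have normalized the path.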

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 13, 2020