Encoded slashes in URL #797

robsdedude · 2015-11-05T12:44:16Z

As requested in issue #477, I suggest giving the user the option to leave %2F sequences in the URL undecoded.
Apache has supported this behaviour since version 2.1.5 (http://httpd.apache.org/docs/2.2/mod/core.html#allowencodedslashes) via the directive AllowEncodedSlashes NoDecode. So why shouldn't Werkzeug offer a way to configure it so that decoding is left to the user? That would allow, e.g., distinguishing the URLs /foo/bar%2F1 and /foo%2Fbar/1.
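To make the distinction concrete, here is a minimal sketch using only Python's standard library. It is not Werkzeug code; the helper name split_then_decode is made up for illustration. It shows that decoding the whole path first collapses both URLs into the same string, while splitting on literal slashes before decoding keeps them apart.

```python
from urllib.parse import unquote

raw_a = "/foo/bar%2F1"
raw_b = "/foo%2Fbar/1"

# Decoding the whole path up front loses the difference between
# a literal "/" and an encoded "%2F".
decoded_a = unquote(raw_a)
decoded_b = unquote(raw_b)


def split_then_decode(raw_path):
    """Hypothetical helper: split on literal slashes, then decode each segment."""
    return [unquote(segment) for segment in raw_path.split("/")[1:]]


segments_a = split_then_decode(raw_a)
segments_b = split_then_decode(raw_b)

print(decoded_a == decoded_b)  # True: both are "/foo/bar/1"
print(segments_a)              # ['foo', 'bar/1']
print(segments_b)              # ['foo/bar', '1']
```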


mitsuhiko · 2015-11-05T12:47:58Z

Even if Apache passes it through, it cannot do so in all cases. Primarily, anything that requires partial matching on the server has to resolve them. The way this directive works is that once it has determined that no matching against the file path is necessary, it just passes them on. This, however, does not help us, because we still need to perform matching everywhere else (script name / path info handling).

This just opens a can of worms, and it's only for URLs which should not exist in the first place. If you need to preserve slashes, move them into a query string where they belong.

Everything in Werkzeug is based on paths being decoded, as this is the only sensible way to perform URL matching. Unless we rewrite everything from scratch to change this behavior and break backwards compatibility, this cannot happen.

exhuma · 2017-09-26T09:36:31Z

I'm sorry to bring this subject up again, as it is quite clear that this issue is very difficult to address in Flask. I have to disagree with the following statement though:

"If you need to preserve slashes, move them into a query string where they belong."

And please correct me if I'm wrong, I'm still trying to wrap my head around this issue.

According to RFC 3986, Section 1.2.3, URIs should allow hierarchical identifiers. It also says that some schemes consider everything after the scheme as "opaque" (i.e. with no additional meaning attached to any character). I was unable to find an authoritative answer to my question: "Should HTTP consider the part after the : as opaque?" But I would wager the answer to that should be "no", with / acting as the hierarchical separator.

For the sake of argument, imagine that there's an API which exposes books in the form /book/<author>/<title>. As soon as both the author and the title contain a / in their names, this would be impossible to route in any WSGI application (if I understood the core issue correctly).

One obvious workaround is to use surrogate identifiers (like generated IDs). For example:

/book/Example%2FAuthor/The%2FTitle (using names)
/book/123/1 (using surrogate IDs)

Both examples don't seem to violate the RFC, so I don't think saying that "slashes belong in the query string" is valid.

If the slashes were moved into the query string instead, a request would look like the following:

/book?author=Example%2FAuthor&title=The%2FTitle
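As a small illustration of the query-string form, the sketch below builds and parses that URL with Python's standard library. The /book endpoint and the parameter names come from the example above; nothing here is specific to Flask or Werkzeug.

```python
from urllib.parse import parse_qs, urlencode

# urlencode percent-encodes "/" inside values, so the slashes survive
# as data instead of being read as path separators.
query = urlencode({"author": "Example/Author", "title": "The/Title"})
url = "/book?" + query

# Parsing the query string restores the original values.
params = parse_qs(query)

print(url)     # /book?author=Example%2FAuthor&title=The%2FTitle
print(params)  # {'author': ['Example/Author'], 'title': ['The/Title']}
```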

This is certainly a workable workaround as well, but I personally don't like it as much.

Now, again, from reading through the related issues, this seems to be an issue with the WSGI spec, which is unfortunate.

For context: I am currently faced with an issue where an application uses natural keys of entities in the URL (quite similar to the Book/Author/Title example above). This seems to me to be a very sensible, and understandable URI design. But now an entry has appeared in the database which contains a slash in the natural key, which broke routing. You might still argue that the real issue is exposing the natural keys of the DB to the URI (but I would still argue that it is a perfectly sensible design).

I can use the workaround using the path converter in flask routing for now, as only one element in the URI has the possibility of including a slash. But the day I need to add a route with two segments containing a slash I'll be in trouble and need to find another workaround.

mitsuhiko · 2017-09-26T09:44:21Z

The TL;DR is that a webserver or anything really that routes needs to decode and normalize the URL and that removes the information you are after. There is not much anyone can do about this.

Just to clarify: this is not a shortcoming of the WSGI spec but a limitation of how URLs work in practice.

robsdedude · 2017-09-26T15:29:19Z

@exhuma I worked around this by double URL-encoding the slashes. E.g. if you want to query /api/category/<catname> with catname := "text/javascript" then query /api/category/text%252Fjavascript.
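A minimal sketch of the double-encoding workaround, using only the standard library. The helper name double_quote_segment is made up for illustration, and the server's single decoding pass is simulated here with unquote. It assumes nothing between client and application decodes the path a second time.

```python
from urllib.parse import quote, unquote


def double_quote_segment(value):
    """Hypothetical helper: percent-encode a path segment twice."""
    return quote(quote(value, safe=""), safe="")


catname = "text/javascript"
url = "/api/category/" + double_quote_segment(catname)

# The server decodes the path once before routing, leaving "%2F" in place,
# so the segment still matches a single <catname> placeholder.
path_info = unquote(url)
received = path_info.rsplit("/", 1)[1]

# The view then has to decode once more to recover the original value.
restored = unquote(received)

print(url)        # /api/category/text%252Fjavascript
print(path_info)  # /api/category/text%2Fjavascript
print(restored)   # text/javascript
```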

sp-7indigo · 2017-10-02T07:25:37Z

@exhuma I've bypassed this by subclassing werkzeug.serving.WSGIRequestHandler, overriding make_environ, and adding a new environ variable PATH_INFO_RAW that contains self.path.
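The same idea can be sketched with the standard library's wsgiref server, whose request handler exposes an equivalent hook called get_environ. This is a stand-in, not the Werkzeug code from the comment: with Werkzeug, the method to override would be make_environ as described above. The names RawPathRequestHandler, app, and fetch are made up for illustration, and PATH_INFO_RAW is a non-standard environ key.

```python
import http.client
import threading
from wsgiref.simple_server import WSGIRequestHandler, make_server


class RawPathRequestHandler(WSGIRequestHandler):
    """Handler that also exposes the undecoded request target."""

    def get_environ(self):
        environ = super().get_environ()
        # self.path is the request target exactly as it arrived on the wire.
        environ["PATH_INFO_RAW"] = self.path
        return environ

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def app(environ, start_response):
    """Tiny WSGI app that echoes the decoded and the raw path."""
    text = "{}|{}".format(environ["PATH_INFO"], environ["PATH_INFO_RAW"])
    body = text.encode("utf-8")
    headers = [("Content-Type", "text/plain"),
               ("Content-Length", str(len(body)))]
    start_response("200 OK", headers)
    return [body]


def fetch(path):
    """Serve one request on an ephemeral local port and return the body."""
    server = make_server("127.0.0.1", 0, app,
                         handler_class=RawPathRequestHandler)
    worker = threading.Thread(target=server.handle_request, daemon=True)
    worker.start()
    conn = http.client.HTTPConnection("127.0.0.1", server.server_port,
                                      timeout=5)
    try:
        conn.request("GET", path)
        return conn.getresponse().read().decode("utf-8")
    finally:
        conn.close()
        worker.join(timeout=5)
        server.server_close()


if __name__ == "__main__":
    print(fetch("/foo%2Fbar/1"))  # /foo/bar/1|/foo%2Fbar/1
```

An application can then route on PATH_INFO as usual and consult PATH_INFO_RAW only where the encoded slashes matter. Note that the raw value still includes any query string.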

mitsuhiko closed this as completed Nov 5, 2015

jgehrcke mentioned this issue Feb 24, 2016

http/wsgi.py: do not unquote path before injecting into environ benoitc/gunicorn#1211

Closed

samuelfekete mentioned this issue Jan 27, 2017

Add RAW_URI to environ (as Gunicorn does) #1048

Closed

github-actions bot locked as resolved and limited conversation to collaborators Nov 13, 2020
