Encoded slashes in URL #797
Comments
Even if Apache passes encoded slashes through, it cannot do so in all cases. Primarily, anything that requires partial matching on the server has to resolve them. The way this directive works is that once Apache has determined that no matching against the file path is necessary, it just passes them on. This, however, does not help us, because we still need to perform matching everywhere else (script name / path info handling). This just opens a can of worms, and it's only for URLs which should not exist in the first place. If you need to preserve slashes, move them into a query string, where they belong. Everything in Werkzeug is based on paths being decoded, as this is the only sensible way to perform URL matching. Unless we rewrite everything from scratch to change this behavior and break backwards compatibility, this cannot happen.
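As a rough illustration of the query-string suggestion, here is a minimal sketch using only the Python standard library. The `/books` path and the `author` and `title` parameter names are hypothetical and not taken from this thread; the point is only that a slash inside a query value survives the round trip, because query values are not used for path matching.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical endpoint and parameter names, for illustration only.
params = {"author": "AC/DC", "title": "Back in Black"}
url = "/books?" + urlencode(params)

# The slash is percent-encoded inside the query string...
assert "%2F" in url

# ...and recovered intact when the query string is parsed.
recovered = parse_qs(urlsplit(url).query)
print(recovered["author"][0])
```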
I'm sorry to touch this subject again, as it is quite clear that this issue is very difficult to address in Flask. I have to disagree with the following statement though:
And please correct me if I'm wrong, I'm still trying to wrap my head around this issue. Following RFC-3986 Section 1.2.3 URIs should allow hierarchical identifiers. It also says that some schemes consider everything after the scheme as "opaque" (i.e. with no additional meaning attached to any character). I was unable to find an authoritative answer to my question: "Should HTTP consider the part after the For the sake of argument imagine that there's an API which exposes books in the form One obvious workaround is to use surrogate identifiers (like generated IDs). For example:
Both examples don't seem to violate the RFC, so I don't think saying that "slashes belong in the query string" is valid. In which case a request would look like the following:
This is certainly workable as well, but I personally don't like it as much. Now, again, from reading through the related issues, this seems to be an issue with the WSGI spec, which is unfortunate. For context: I am currently faced with an issue where an application uses natural keys of entities in the URL (quite similar to the Book/Author/Title example above). This seems to me to be a very sensible and understandable URI design. But now an entry has appeared in the database which contains a slash in the natural key, which broke routing. You might still argue that the real issue is exposing the natural keys of the DB in the URI (but I would still argue that it is a perfectly sensible design). I can use the workaround using the
The TL;DR is that a web server, or really anything that routes, needs to decode and normalize the URL, and that removes the information you are after. There is not much anyone can do about this. Just to clarify: this is not a shortcoming of the WSGI spec but a limitation of how URLs work in practice.
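The information loss described here can be sketched with the standard library alone. The two paths are the ones from the original report; the helper names `decode_then_split` and `split_then_decode` are made up for this example and are not Werkzeug functions. This is only an illustration of the ordering problem, not how Werkzeug's router is implemented.

```python
from urllib.parse import unquote

def decode_then_split(raw_path):
    """Decode the whole path first, then split into segments."""
    return unquote(raw_path).strip("/").split("/")

def split_then_decode(raw_path):
    """Split on literal slashes first, then decode each segment."""
    return [unquote(segment) for segment in raw_path.strip("/").split("/")]

a = "/foo/bar%2F1"
b = "/foo%2Fbar/1"

# Once decoded, the two paths are indistinguishable.
print(decode_then_split(a), decode_then_split(b))

# Only a router that sees the raw path can tell them apart.
print(split_then_decode(a), split_then_decode(b))
```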
@exhuma I worked around this by double URL-encoding the slashes. E.g. if you want to query
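A minimal sketch of the double-encoding idea, assuming the server decodes the path exactly once before routing and the application decodes the captured segment a second time. The helper name `double_encode` and the `/books/...` path layout are hypothetical.

```python
from urllib.parse import quote, unquote

def double_encode(segment):
    """Percent-encode a path segment twice, so '/' becomes '%252F'."""
    once = quote(segment, safe="")   # '/' -> '%2F'
    return quote(once, safe="")      # '%' -> '%25', giving '%252F'

key = "AC/DC"
path = "/books/" + double_encode(key) + "/title"

# What the server sees after its single decoding pass: the slash is
# still encoded, so the path keeps exactly three segments.
server_view = unquote(path)
segments = server_view.strip("/").split("/")

# The application decodes the captured segment once more.
recovered = unquote(segments[1])
print(path, server_view, recovered)
```

The client and the application both have to know about the extra layer of encoding, so this is a convention between them rather than something the server enforces.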
@exhuma I've bypassed this by subclassing werkzeug.serving.WSGIRequestHandler, overriding make_environ, and adding a new environ variable PATH_INFO_RAW that contains self.path.
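The same idea can be sketched with the standard library's `wsgiref` handler, so that the example runs without extra dependencies. Note the substitution: this uses `wsgiref.simple_server.WSGIRequestHandler` and its `get_environ` method, not Werkzeug's handler and `make_environ` as described in the comment. The key `PATH_INFO_RAW` comes from the comment; `raw_path_of`, `RawPathRequestHandler`, `app`, and `serve` are names invented here. Stripping the query string from `self.path` is an assumption of this sketch; the comment stores `self.path` as-is.

```python
from wsgiref.simple_server import WSGIRequestHandler, make_server

def raw_path_of(request_target):
    """Return the path part of a request target, still percent-encoded."""
    return request_target.split("?", 1)[0]

class RawPathRequestHandler(WSGIRequestHandler):
    def get_environ(self):
        environ = super().get_environ()
        # self.path is the request target as it appeared on the request
        # line, before any percent-decoding.
        environ["PATH_INFO_RAW"] = raw_path_of(self.path)
        return environ

def app(environ, start_response):
    # Show the decoded and the raw path side by side.
    body = "decoded={!r} raw={!r}\n".format(
        environ["PATH_INFO"], environ.get("PATH_INFO_RAW")
    )
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [body.encode("utf-8")]

def serve(port=8000):
    """Run a development server that exposes PATH_INFO_RAW to the app."""
    with make_server("127.0.0.1", port, app,
                     handler_class=RawPathRequestHandler) as httpd:
        httpd.serve_forever()
```

Calling `serve()` and requesting `/foo%2Fbar/1` should show the decoded path next to the raw one. This only helps with the development server; behind Apache or another production server the raw path has to come from that server instead.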
As requested in issue #477, I suggest giving the user the option to leave %2F sequences in URLs undecoded.
Apache supports this behaviour since version 2.1.5 (http://httpd.apache.org/docs/2.2/mod/core.html#allowencodedslashes) with the directive AllowEncodedSlashes NoDecode. So why shouldn't Werkzeug offer a way to configure it so that decoding is left to the user? That would allow, e.g., distinguishing the URLs /foo/bar%2F1 and /foo%2Fbar/1.