Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect decoding of the URI path #451

Open
neilgmiller opened this issue Jul 25, 2017 · 13 comments
Open

Incorrect decoding of the URI path #451

neilgmiller opened this issue Jul 25, 2017 · 13 comments

Comments

@neilgmiller
Copy link

NanoHTTPD percent decodes the entire request-target (path) without first splitting on the segment delimiter. This means that when handling requests like "GET /foo/bar/7%2F24%20file.ext" the HTTPSession reports the "uri" property as "/foo/bar/7/24 file.ext" which is now indistinguishable from the request for "GET /foo/bar/7/24%20file.ext".

NanoHTTP must either: split the path into segments before decoding, not decode '%2F' to '/', or not decode at all.

@LordFokas
Copy link
Member

Is this in violation of any RFCs?

@neilgmiller
Copy link
Author

https://tools.ietf.org/html/rfc3986#section-2.4

"When a URI is dereferenced, the components and subcomponents
significant to the scheme-specific dereferencing process (if any)
must be parsed and separated before the percent-encoded octets within
those components can be safely decoded, as otherwise the data may be
mistaken for component delimiters. The only exception is for
percent-encoded octets corresponding to characters in the unreserved
set, which can be decoded at any time."

@LordFokas
Copy link
Member

A bug it is, then!

@LordFokas LordFokas added the bug label Jul 26, 2017
@ai6aj
Copy link

ai6aj commented Nov 13, 2018

I ran across this issue as well and discovered that much of the web ignores the difference between / and %2F, even the almighty WIkipedia. These two links will take you to the same page even though they're technically different URLs:

https://en.wikipedia.org/wiki%2FEscape_sequence

https://en.wikipedia.org/wiki/Escape_sequence

So getUri()'s behavior, while technically incorrect, should probably continue as is for the sake of convenience.

However for those of us who need strict parsing, IHTTPSession should provide a getRawUri() method. I hacked this into my own local copy pretty easily.

@LordFokas
Copy link
Member

I'll try to keep this in mind. I'll also mark this as an enhancement so that I don't forget about the features as well if I end up being the dude implementing this.

Also, I find it very funny that you picked the escape sequence article to illustrate your point, well played sir!

@ai6aj
Copy link

ai6aj commented Nov 24, 2018

Excellent, thank you! One further refinement would be to skip my getRawUri() hack in favor of a method like this:

String[] getUriComponents()

...which parses the URI exactly according to spec and returns the decoded pieces, i.e. "foo/with%2Fslashes/bar" would become ["foo","with/slashes","bar"].

Thanks for your work on this project, it's incredibly useful for those times you need to expose something to the Web but don't want to rewire it to fit some framework's idea of the Right Way to Do Things.

@LordFokas
Copy link
Member

I was maybe thinking of just returning one of these:
https://docs.oracle.com/javase/7/docs/api/java/net/URI.html

But maybe that won't work for us. We could, however, reinvent the square wheel and roll our own representation...

My aim is somewhat of returning an immutable object (immutable == good) that represents a single URI that can get the user whatever representation they want (string, segment array, specific segment [say, domain] and so on).

@ai6aj
Copy link

ai6aj commented Nov 25, 2018

Java's URI class also has this bug:

URI uri = new URI("http://foo.com/decode%2Fme");
System.out.println(uri.getPath());

produces

/decode/me

It has a getRawPath() method, but that puts the responsibility of decoding everything right back on the programmer. It's also marked final so you can't subclass it and add in a getPathComponents() method. Thanks, Sun!

Not sure it's necessary to reinvent a complete URI class since the Request class already has (almost) everything you need from the URI anyway. The only issue is that a String[] is not immutable, but you could skirt this by parsing the raw URI path every time getPathComponents() is called - or alternatively, return a copy of a precalculated result. That way the programmer is assured that every call to getPathComponents() returns an untampered value, if they want to mess with it from there that's their problem.

@LordFokas
Copy link
Member

There are also immutable lists that we can return instead of String[] and be done with it.

@ai6aj
Copy link

ai6aj commented Nov 26, 2018

TIL there's a Collections.unmodifiableList() method... perfect!

@LordFokas
Copy link
Member

And you'll learn many more things if you stick around 😉

@hrishikesh-kadam
Copy link

@LordFokas Even I need some fix for this.

From client code I am asking for -
http://127.0.0.1:8080/sikan/OEBPS/Text/%E4%BA%B2%E5%AD%902018%E5%B9%B44%E6%9C%88%E5%88%8A_0001.xhtml

But getting this in IHTTPSession.uri -
/sikan/OEBPS/Text/亲子2018年4月刊_0001.xhtml

May be this isn't the bug as explained by @ai6aj here - #451 (comment)

because If I copy paste this URL http://127.0.0.1:8080/sikan/OEBPS/Text/%E4%BA%B2%E5%AD%902018%E5%B9%B44%E6%9C%88%E5%88%8A_0001.xhtml in Chrome address bar and hit enter, it shows URL like this http://127.0.0.1:8080/sikan/OEBPS/Text/亲子2018年4月刊_0001.xhtml

What should be done in this case to fetch the exact original URL?

@ai6aj
Copy link

ai6aj commented Dec 27, 2018

@hrishikesh-kadam this is similar to my problem, but not the same. You are getting a correctly decoded URI. %E4 in a URI means "a byte with the value 0xE4", which displays as 亲. If you want to pass the literal string "%E4" you need to encode the '%' on the client as '%25':

http://localhost/%25E4

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants