-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Incorrect decoding of the URI path #451
Comments
Is this in violation of any RFCs? |
https://tools.ietf.org/html/rfc3986#section-2.4 "When a URI is dereferenced, the components and subcomponents |
A bug it is, then! |
I ran across this issue as well and discovered that much of the web ignores the difference between / and %2F, even the almighty WIkipedia. These two links will take you to the same page even though they're technically different URLs: https://en.wikipedia.org/wiki%2FEscape_sequence https://en.wikipedia.org/wiki/Escape_sequence So getUri()'s behavior, while technically incorrect, should probably continue as is for the sake of convenience. However for those of us who need strict parsing, IHTTPSession should provide a getRawUri() method. I hacked this into my own local copy pretty easily. |
I'll try to keep this in mind. I'll also mark this as an enhancement so that I don't forget about the features as well if I end up being the dude implementing this. Also, I find it very funny that you picked the escape sequence article to illustrate your point, well played sir! |
Excellent, thank you! One further refinement would be to skip my getRawUri() hack in favor of a method like this: String[] getUriComponents() ...which parses the URI exactly according to spec and returns the decoded pieces, i.e. "foo/with%2Fslashes/bar" would become ["foo","with/slashes","bar"]. Thanks for your work on this project, it's incredibly useful for those times you need to expose something to the Web but don't want to rewire it to fit some framework's idea of the Right Way to Do Things. |
I was maybe thinking of just returning one of these: But maybe that won't work for us. We could, however, reinvent the square wheel and roll our own representation... My aim is somewhat of returning an immutable object (immutable == good) that represents a single URI that can get the user whatever representation they want (string, segment array, specific segment [say, domain] and so on). |
Java's URI class also has this bug:
produces
It has a getRawPath() method, but that puts the responsibility of decoding everything right back on the programmer. It's also marked final so you can't subclass it and add in a getPathComponents() method. Thanks, Sun! Not sure it's necessary to reinvent a complete URI class since the Request class already has (almost) everything you need from the URI anyway. The only issue is that a String[] is not immutable, but you could skirt this by parsing the raw URI path every time getPathComponents() is called - or alternatively, return a copy of a precalculated result. That way the programmer is assured that every call to getPathComponents() returns an untampered value, if they want to mess with it from there that's their problem. |
There are also immutable lists that we can return instead of String[] and be done with it. |
TIL there's a Collections.unmodifiableList() method... perfect! |
And you'll learn many more things if you stick around 😉 |
@LordFokas Even I need some fix for this. From client code I am asking for - But getting this in May be this isn't the bug as explained by @ai6aj here - #451 (comment) because If I copy paste this URL http://127.0.0.1:8080/sikan/OEBPS/Text/%E4%BA%B2%E5%AD%902018%E5%B9%B44%E6%9C%88%E5%88%8A_0001.xhtml in Chrome address bar and hit enter, it shows URL like this http://127.0.0.1:8080/sikan/OEBPS/Text/亲子2018年4月刊_0001.xhtml What should be done in this case to fetch the exact original URL? |
@hrishikesh-kadam this is similar to my problem, but not the same. You are getting a correctly decoded URI. %E4 in a URI means "a byte with the value 0xE4", which displays as 亲. If you want to pass the literal string "%E4" you need to encode the '%' on the client as '%25': |
NanoHTTPD percent decodes the entire request-target (path) without first splitting on the segment delimiter. This means that when handling requests like "GET /foo/bar/7%2F24%20file.ext" the HTTPSession reports the "uri" property as "/foo/bar/7/24 file.ext" which is now indistinguishable from the request for "GET /foo/bar/7/24%20file.ext".
NanoHTTP must either: split the path into segments before decoding, not decode '%2F' to '/', or not decode at all.
The text was updated successfully, but these errors were encountered: