Issue #18 path modes #420

gregw · 2021-09-21T02:04:15Z

For #18
@markt-asf this is my first draft of the things we've been discussing in #18. I think a bit of work is needed and there are still some changes that are at best to be considered as thought bubbles.
I've allowed servlet-api committers to edit this branch so update if you like, alternatively if you think this is in the ballpark, then I can move it to a branch in eclipse-ee4j repo so we can work on it collaboratively.

First draft of clarification of paths for jakartaee#18

gregw · 2021-09-21T02:10:39Z

api/src/main/java/jakarta/servlet/ServletContext.java

+     * <a href="https://datatracker.ietf.org/doc/html/rfc2396#section-3.3">RFC2396 Section 3.3</a>.</li>
+     * </ul>
+     */
+    public enum PathCodingMode {


The approach I landed on is to have the 3 modes: LEGACY is what we have now and will be the default for any web.xml <= 5.0. The DECODED mode is more or less what we've called SAFE before, it will be the default for >=6.0 and paths will not have any encodings in them and encoded reserved characters are just illegal. ENCODED is the optional mode that allows encoded reserved characters.

I might be persuaded that the 6.0 default should be ENCODED... in which case perhaps we could drop DECODED and go to a boolean mode? But naming is difficult and I can't convince myself 100% of this.

I've included path parameters behaviours, but perhaps that needs a n independent mode?

We could put something in web.xml for this... but let's finalize the design before thinking about that.

I prefer the boolean approach. I also prefer taking the boolean approach further and essentially making LEGACY container specific configuration.
I'm not currently convinced of the need for multiple modes but given the large number of complexities and edge cases I am keeping an open in case there is a need for multiple modes.

I'd like to get to boolean approach too.... but so far I've been unable to reduce down the behaviours to just two. So let's iterate on how we see the various modes working... and if we get down to 2 of them, we can go to the binary model and get rid of this enum and associated setter.

Sounds good to me.

api/src/main/java/jakarta/servlet/ServletContext.java

…ed characters.

markt-asf

Lots of detail in the comments but the overview is I don't (currently) think we need modes:

Make LEGACY a container specific option
Simplify DECODED mode
Use DECODED mode to map requests
Use ENCODED mode for paths in the API (return values and parameters)

markt-asf · 2021-09-22T08:39:41Z

api/src/main/java/jakarta/servlet/ServletContext.java

+     * <a href="https://datatracker.ietf.org/doc/html/rfc2396#section-3.3">RFC2396 Section 3.3</a>.</li>
+     * </ul>
+     */
+    public enum PathCodingMode {


I prefer the boolean approach. I also prefer taking the boolean approach further and essentially making LEGACY container specific configuration.
I'm not currently convinced of the need for multiple modes but given the large number of complexities and edge cases I am keeping an open in case there is a need for multiple modes.

markt-asf · 2021-09-22T08:41:42Z

api/src/main/java/jakarta/servlet/ServletContext.java

+         * Effected methods will return/accept paths as was done prior to the introduction of the PathCodingMode in Servlet 6.0.
+         * The path returned by {@link HttpServletRequest#getContextPath()} is not decoded. Paths returned by the methods
+         * {@link HttpServletRequest#getServletPath()} and {@link HttpServletRequest#getPathInfo()} are fully decoded and have
+         * all path parameters removed. Decoding may include ambiguous characters such as a decoded <code>%2F</code>, which is


I would never expect to see '%2F' decoded. To do so is opening up the possibility of a path traversal vulnerability.

I don't see how we can avoid decoding %2F as the current api says that the strings are decoded, so any container that doesn't decode is in violation. For jetty we by default reject such requests with a 400 unless the application is configured to say that it is prepared to accept such decoded strings.

For me that is the key difference between LEGACY and DECODED mode: the former will allow containers to act as they have previous, whilst the later is a well defined mode that never returns a decoded %2F because they have been rejected with a 400.

By default Undertow does not decode these, however if I had my time again I would respond with a 400 instead. Either way we will need a container specific option to allow this, as some people want it, but IMHO we should be strict about rejecting this in the spec.

Although something I just thought of is that we likely need to be clearer about reserved characters in the different parts of the URI, e.g. %2F is likely fine as part of a query string. A query string does not really have the same security issues as the actual URI, so rejecting encoded reserved characters in the query string is almost certainly the wrong thing to do.

We can change the API to state that %2F is never decoded if we wish. The need for this is rare but there was a Tomcat user that needed to use an identifier in the REST API that included %2F. Decoding it isn't an option because of path traversal attacks. Rejecting it would work for most users but break stuff for the small number that actually need it. Leaving it as %2F should work for everyone. Users that don't use it will be unaffected and users that do need it can %nn decode it. If we go this route we should add a clear warning that only %2F should be decode and no other %nn sequences else there will be the possibility of double decoding and the range of security issues that opens up.

I agree query string needs a different handling. This should cover it:

First (unencoded) '?' denotes the start of the query string

Parse the query string into key/value pairs

%nn decode (all characters) both keys and values

%nn decoding needs to be after parsing in case keys and/or values contain & or =.

markt-asf · 2021-09-22T08:44:28Z

api/src/main/java/jakarta/servlet/ServletContext.java

+         * {@link HttpServletRequest#getServletPath()} and {@link HttpServletRequest#getPathInfo()} are fully decoded and have
+         * all path parameters removed. Decoding may include ambiguous characters such as a decoded <code>%2F</code>, which is
+         * indistinguishable from a segment separator '/' in the request URI. Methods accepting paths may accept encoded
+         * characters. A '?' character in a path will be treated as indicating a query string and will not be encoded. Some


"...and will not be encoded." sounds odd to me. It suggests to me that the container will be encoding some characters. How about "The first unencoded '?' in a path will be treated as the delimiter indicating the start of the query string."

I think some characters can be encoded by at least some of the methods. For example, in LEGACAY mode getRequestDispatcher("/foo bar") may result in a dispatcher for /foo%20bar. But the main intention here is to say that if you have a resource called foo?bar then you need to pass in /foo%3Fbar and not /foo?bar

Got it. I understand the intention here now and agree with it. How about something along the lines of:

"Paths passed to this method should use appropriate %nn encoding if the path includes characters that would otherwise be considered to be a delimiter per RFC 3986 such as '?'."

markt-asf · 2021-09-22T08:53:40Z

api/src/main/java/jakarta/servlet/ServletContext.java

+         * characters. A '?' character in a path will be treated as indicating a query string and will not be encoded. Some
+         * implementation specific behavioural differences may be apparent in this mode.
+         */
+        LEGACY,


How much a LEGACY mode is required is going to depend on how different the Servlet 6.0 behaviour is compared to what was implemented for Servlet 5.0. That is going to be container dependent. If there are some containers where the change in behaviour is small enough to be considered part of the usual changes between major versions then LEGACY mode may not be required.

yep - very happy to get rid of modal behaviour. Let's iterate on what we think the minimal modes are and try to reduce to 2... or better yet 1

markt-asf · 2021-09-22T09:02:40Z

api/src/main/java/jakarta/servlet/ServletContext.java

+         * TODO or this: If a path with encoded reserve characters is received, then the request will be rejected by the servlet
+         * container with a 400 BAD REQUEST response.
+         */
+        DECODED,


I think this can be simplified. Once path parameters have been removed, it is safe to decode any %nn reserved characters apart from %2F. Therefore, I think we can remove the requirement to throw an error in this mode entirely. This mode then becomes:

decode unreserved %nn

remove all path parameters

normalize

decode reserved %nn except %2F

This achieves the same result without causing problems for apps (like Jira) that use encoded reserved characters in some requests and removes the need to triggering exceptions that will break some apps that would otherwise work correctly.

Note: There is an interesting edge case here. /aaa/..;a=b/. The proposal above would result in /. However it can be argued that this is inconsistent with RFC 3986. There are other approaches. Given there is no legitimate reason I can think of for sequences like this I quite like rejecting such requests with a 400 response.

The beauty of what you propose here is that it get's us really close to not needing to be modal, as we would only ever return sane paths here and insane ones would be rejected with 400.

But I'm not so sure that %2F is really the only problem character?

Potentially there are also issues with URIs like /aaa/..%5c../foo which according to RFC3986 should decode to /aaa/..\../foo, which can be missinterpreted on some file systems.

We've also seen issues with URIs like /foo%00. Ah but this indicates a problem with specifing in terms of reserved from the RFC. I think we probably should specify in terms of unreserved, as there is a gap between those definitions that includes control characters.

Also if we allow encoded '%' (%25), then the resulting paths can contain percent sequences that can be dangerous to pass onto methods that a little flexible about receiving encoded characters or not. So for example /foo/%252E%252E/bar could result in a /foo/%2e%2e/bar path that may then get later converted to /foo/../bar and normalized to just /bar

So I'm not totally convinced that %2F is the only danger encoding.

Regarding the edge case of /aaa/..;, jetty is now rejecting such requests with 400 unless told to do otherwise.
The question is, if we do let such requests through, should the result be / or /aaa/.. ? The later is compliant with RFC3986 but potentially dangerous when passed to file handling code. I think your process about would result in /

So to be compliant with RFC3986 we'd could have a process like:

decode unreserved %nn

normalize

remove all path parameters

reject ambigous/scary/insane URIs. eg containing %2F, ".." etc. See Jetty's enumeration of scary URIs.

decode reserved %nn

But despite these reservations, I think the general approach of path methodes never returning scary paths may be workable. But my concern is that we then move the "modes" to the definition of what is scary. Is that something we need to standardize? I think we need to at least enumerate the ones that should be rejected by default... we could then perhaps leave it up to containers to configure exceptions.

However, if we want applications that handle the exceptional cases by calling getRequestURI to be portable, then maybe we need to only fail scary URIs if the path methods are called rather than reject up front?

I also think we need to think through the modes on how methods that accept paths act. Currently I think there is a lot of flexibility, which is why so many scary URIs can be damaging if let flow through to these methods. I'm not sure how we can make them stricter without introducing at least a LEGACY mode.

Geeze this is a horrid problem!

A horrid problem indeed.

I don't recall any user requests specifically around %00 but I suspect that there are requirements around other control characters like %0A and %0D. Tomcat handles all these issues at the point the URI is converted to a file system path. The short version is that if the canonical file name and path does not match the requested file name and path then a 404 response is generated. I don't think we want to define how containers protect against this class of problem but I do think it is worth calling it out. Something like:

"URIs may permit characters which have a special meaning for the underlying file system. When mapping a request URI, or part of a URI, to a file system path containers are expected (required?) to ensure that any characters with special meaning for the underlying file system are appropriately handled to avoid any potential security issues."

/aaa/..;b=2/ and similar constructs have been an ongoing source of problems including security issues. I've had long discussions with the httpd team - including briefly with Roy Fielding who co-authored RFC 3986 - and the short version is:

/..;b=2/ is not equivalent to /../ and should not be treated as such

If the servlet spec chooses to ignore path parameters in mappings, security constraints etc then that is going to create problems when interacting with other systems like httpd

If we want to allow path parameters (I think we should as some users need them) then we either need to completely redefine the mapping process, security constraints etc or we need to introduce special handling to avoid the path traversal issues that removing path parameters from /..;b=2/ creates. I prefer the latter option. I think the scary cases can all be covered be something along these lines:

"Any URI containing a segment with path parameters that, when those path parameters are removed, is a dot-segment must be rejected with a 400 response."

Expanding on your proposed processing sequence and tweaking it a bit I'd suggest:

extract query string, if any, for separate processing

decode unreserved %nn

normalize

reject dot-segemts with path parameters

remove all path parameters

decode reserved %nn expect %2F

map the request

Looking through Jetty's scary URIs, I see various types:

Those I'd class as 'suspicious' like /%2E%2E/ that look like attempts to do something nefarious although the attempt should fail. I don't think the spec should require these to be rejected. Containers would be free to block these as a differentiating security feature.

// != '/' as far as RFC 3986 is concerned but given the huge number of apps that make mistakes when building URIs I think we need to define 'normalize' to include collapsing multiple '/' to a single '/'.

%2F leave as %2F but as with the suspicious sequences containers could offer an option to reject

dot-segments with path parameters should be rejected with a 400 as above

Use of %25 - see comments below on double decoding

Containers should be free to allow URIs to use encodings other than UTF-8. I'd expect strict enforcement though. Any URI that fails decoding as per the container configured encoding for URIs should be rejected

Double decoding is another area where there is opportunity to introduce security vulnerabilities. I think we need to go through the Servlet API and for every method that either accepts or returns a path we should state which of the following categories that the path falls into:

encoded, unnormalized and still containing any path parameters

decoded unreserved characters, normalized, path parameters removed, all reserved characters encoded

decoded unreserved characters, normalized, path parameters removed, all reserved characters decoded except %2F

My initial thoughts are that we'd use 1) for returned paths currently described as encoded / 'original'; 2) for paths passed into methods (mainly so we can distinguish between ? in a path and the start of a query string and 3) for returned paths currently described as decoded.

I think we can define a small set of 'scary' URIs that should always be rejected without impacting users. Rejection of 'suspicious' URIs I think we can leave to the container.

markt-asf · 2021-09-22T09:26:21Z

api/src/main/java/jakarta/servlet/ServletContext.java

+         * non-encoded reserved character is not possible, then the method will throw {@link MalformedURLException} if declared,
+         * otherwise a {@link IllegalArgumentException}.
+         */
+        ENCODED,


If path parameters may be present in any segment present then it is not going to be possible write a <url-pattern> that matches the servlet path. The only place path parameters will be usable is in path info. I think that is too restrictive.

I currently favour a mode that maps requests to servlets based using the simplified version of DECODED that I described above but for methods that return path, those paths include the path parameters as described here. The mapped path (i.e. the one without the path parameters) remains available via HttpServletMapping.

The idea is that with this approach things are essentially unchanged for the majority of applications. The small number of applications using path parameters may need changes. Containers have the option of providing a LEGACY mode if there is sufficient demand from their user base from users with applications that use path parameters. I suspect that given that most apps are going to migrate from Jakarta EE 8 to Jakarta EE 10 then the few apps where changes to path parameters have an impact will recode the relevant bits of their app to handle the changes.

I don't like treating ';' as a normal character for path methods, but special for path matching. I think it needs to be one or the other, not both... specially when we consider the complexities of methods taking paths.

We could continue to treat ';' as special and remove all parameters from path methods. Apps wanting to handle them would need to use the requestURI as they do now. But we need to specify handling of scary URIs that use ';' to hide '..' etc.

Or we treat ';' as a character allowed in a segment (as is now apparently is). Then /..; is not a problem because the segment is ..; and not ... But if we are treating is as part of a segment, the it needs to be able to be mapped. I don't see it is impossible to map that because ';' can be included in a <url-pattern> as the spec says that all other parts of a pattern string are used for exact matches, so they can contain ;, \, ? and other scary characters.

Sorry - forgot to click the add button for this comment.

Arrgghhhh! Just when I think there is a workable solution in sight, I realise I have forgotten about one of the requirements. I think the suggestions above would give us a workable solution that removes most (hopefully all) the ambiguity in the current API. What it doesn't address is providing access to the path parameters and/or the 'raw' versions of contextPath, servletPath and pathInfo.

What if, rather than introducing the concept of modes, we added methods to HttpServletMapping. I'm thinking:

getContextPathEncoded()

getServletPathEncoded()

getPathInfoEncoded()

where only the following had been performed;

extract query string, if any, for separate processing

decode unreserved %nn

normalize

reject dot-segemts with path parameters

gregw · 2021-09-22T23:53:40Z

@markt-asf I've kind of rebutted all your comments..... but I don't reject your basic premise... my comments are really just me thinking out loud. I do think there is potential value in the general approach you are advocating - ie make the returns from the path methods non-modal and handle the scary URIs in a different way. My concern remains defining scary URIs and then making sure we equally consider the risks of methods taking path strings.

This non modal approach is essentially how jetty is working now... at least for the methods producing paths.

I'm going to ponder your feedback more as getting rid of modes would be good and I don't think they are perfect either. But the good aspect of modes is that they do allow us to be stricter on what methods accepting paths will accept.

markt-asf

@gregw Thanks for your comments re rebuttal. I feel much the same way but couldn't figure out how to express it.

I understand and share your concerns regarding security. Tomcat has had to deal with a bunch of issues in this area and I really don't want to do anything that opens up a new (or old) security issue.

markt-asf · 2021-09-23T07:13:38Z

api/src/main/java/jakarta/servlet/ServletContext.java

+     * <a href="https://datatracker.ietf.org/doc/html/rfc2396#section-3.3">RFC2396 Section 3.3</a>.</li>
+     * </ul>
+     */
+    public enum PathCodingMode {


Sounds good to me.

markt-asf · 2021-09-23T07:20:43Z

api/src/main/java/jakarta/servlet/ServletContext.java

+         * Effected methods will return/accept paths as was done prior to the introduction of the PathCodingMode in Servlet 6.0.
+         * The path returned by {@link HttpServletRequest#getContextPath()} is not decoded. Paths returned by the methods
+         * {@link HttpServletRequest#getServletPath()} and {@link HttpServletRequest#getPathInfo()} are fully decoded and have
+         * all path parameters removed. Decoding may include ambiguous characters such as a decoded <code>%2F</code>, which is


We can change the API to state that %2F is never decoded if we wish. The need for this is rare but there was a Tomcat user that needed to use an identifier in the REST API that included %2F. Decoding it isn't an option because of path traversal attacks. Rejecting it would work for most users but break stuff for the small number that actually need it. Leaving it as %2F should work for everyone. Users that don't use it will be unaffected and users that do need it can %nn decode it. If we go this route we should add a clear warning that only %2F should be decode and no other %nn sequences else there will be the possibility of double decoding and the range of security issues that opens up.

I agree query string needs a different handling. This should cover it:

First (unencoded) '?' denotes the start of the query string

Parse the query string into key/value pairs

%nn decode (all characters) both keys and values

%nn decoding needs to be after parsing in case keys and/or values contain & or =.

markt-asf · 2021-09-23T07:35:18Z

api/src/main/java/jakarta/servlet/ServletContext.java

+         * {@link HttpServletRequest#getServletPath()} and {@link HttpServletRequest#getPathInfo()} are fully decoded and have
+         * all path parameters removed. Decoding may include ambiguous characters such as a decoded <code>%2F</code>, which is
+         * indistinguishable from a segment separator '/' in the request URI. Methods accepting paths may accept encoded
+         * characters. A '?' character in a path will be treated as indicating a query string and will not be encoded. Some


Got it. I understand the intention here now and agree with it. How about something along the lines of:

"Paths passed to this method should use appropriate %nn encoding if the path includes characters that would otherwise be considered to be a delimiter per RFC 3986 such as '?'."

markt-asf · 2021-09-23T08:42:11Z

api/src/main/java/jakarta/servlet/ServletContext.java

+         * TODO or this: If a path with encoded reserve characters is received, then the request will be rejected by the servlet
+         * container with a 400 BAD REQUEST response.
+         */
+        DECODED,


A horrid problem indeed.

I don't recall any user requests specifically around %00 but I suspect that there are requirements around other control characters like %0A and %0D. Tomcat handles all these issues at the point the URI is converted to a file system path. The short version is that if the canonical file name and path does not match the requested file name and path then a 404 response is generated. I don't think we want to define how containers protect against this class of problem but I do think it is worth calling it out. Something like:

"URIs may permit characters which have a special meaning for the underlying file system. When mapping a request URI, or part of a URI, to a file system path containers are expected (required?) to ensure that any characters with special meaning for the underlying file system are appropriately handled to avoid any potential security issues."

/aaa/..;b=2/ and similar constructs have been an ongoing source of problems including security issues. I've had long discussions with the httpd team - including briefly with Roy Fielding who co-authored RFC 3986 - and the short version is:

/..;b=2/ is not equivalent to /../ and should not be treated as such

If the servlet spec chooses to ignore path parameters in mappings, security constraints etc then that is going to create problems when interacting with other systems like httpd

If we want to allow path parameters (I think we should as some users need them) then we either need to completely redefine the mapping process, security constraints etc or we need to introduce special handling to avoid the path traversal issues that removing path parameters from /..;b=2/ creates. I prefer the latter option. I think the scary cases can all be covered be something along these lines:

"Any URI containing a segment with path parameters that, when those path parameters are removed, is a dot-segment must be rejected with a 400 response."

Expanding on your proposed processing sequence and tweaking it a bit I'd suggest:

extract query string, if any, for separate processing

decode unreserved %nn

normalize

reject dot-segemts with path parameters

remove all path parameters

decode reserved %nn expect %2F

map the request

Looking through Jetty's scary URIs, I see various types:

Those I'd class as 'suspicious' like /%2E%2E/ that look like attempts to do something nefarious although the attempt should fail. I don't think the spec should require these to be rejected. Containers would be free to block these as a differentiating security feature.

// != '/' as far as RFC 3986 is concerned but given the huge number of apps that make mistakes when building URIs I think we need to define 'normalize' to include collapsing multiple '/' to a single '/'.

%2F leave as %2F but as with the suspicious sequences containers could offer an option to reject

dot-segments with path parameters should be rejected with a 400 as above

Use of %25 - see comments below on double decoding

Containers should be free to allow URIs to use encodings other than UTF-8. I'd expect strict enforcement though. Any URI that fails decoding as per the container configured encoding for URIs should be rejected

Double decoding is another area where there is opportunity to introduce security vulnerabilities. I think we need to go through the Servlet API and for every method that either accepts or returns a path we should state which of the following categories that the path falls into:

encoded, unnormalized and still containing any path parameters

decoded unreserved characters, normalized, path parameters removed, all reserved characters encoded

decoded unreserved characters, normalized, path parameters removed, all reserved characters decoded except %2F

My initial thoughts are that we'd use 1) for returned paths currently described as encoded / 'original'; 2) for paths passed into methods (mainly so we can distinguish between ? in a path and the start of a query string and 3) for returned paths currently described as decoded.

I think we can define a small set of 'scary' URIs that should always be rejected without impacting users. Rejection of 'suspicious' URIs I think we can leave to the container.

gregw · 2021-09-23T23:34:49Z

@markt-asf I'm going to do a meta response first before diving into the details of your reply. I think we need to agree a general approach and then deep dive to stress test it.

I like your 'suspicious' sub-catagory of 'scary'. The key sub-catagory in Jetty's handling is ambiguous, with the key example being decoding %2F where it is impossible for the app to tell if it is a real '/' without looking at the requestURI. We also catagorize things like '%2e%2e' as ambiguous as currently the spec is a bit vague about what normalization should be done, specially for requests that may transit a RequestDispatcher (or 2) so an app can have a hard time knowing if a '..' is really a '..' or the result of decoding, double decoding or even tripple decoding etc.

But regardless of the exact membership of the catagories, I'm seeing a general approaches we are discussing fitting into the following pattern/options:

existing path providing methods will have their exact decoding,normalization and de-parameterization defined. For the existing methods they will:
- never return undecoded characters in a path. A % character in the path could only be there by being decoded from %25(including from RequestDispatcher).
- never contain path parameters. a ; character must have been encoded (including from RequestDispatcher).
- never contain a real '..' or '.' segment and so if they are present, they must have been encoded (including from RequestDispatcher)
default 6.0 behavior will be that path providing methods will not return 'ambiguous' paths. This will be prevented either by:
a) rejecting the request with a 400 before handling; or
b) throwing an exception if the decoded path methods are called.
A container may also be configured to reject 'suspicious' paths.
Apps that want 'suspicious' or 'ambiguous' paths can:
c) configure the container so that these methods may return decoded suspicious and/or ambigous paths (modal approach); and/or
d) use new methods that are never ambiguous (method approach)
methods that take paths:
e) will be default reject any paths that are not allowed to be returned by the current settings for the path providing methods (modal approach); and/or
f) new method that take specifically encoded paths will be defined that avoid ambiguities.

I'm not saying we do all of a-f, but that our solution is probably a subset of a-f.

The approach a) feels better that b), except that it doesn't work well with new methods approaches d) and/or f). If an app is safe because it is using the new methods then we should not reject with a 400 so they can use the new methods... but then if they have one component that calls the old methods they could be unsafe. So perhaps the default rejection is a), but an app can declare it wants b) so that it can access new methods.

The advantage of c) modal approach is that we can match the behavior of the methods taking paths e) so that they only accept what can be produced.
Conversely the problem of d) new methods is that the methods that take paths cannot know from where a passed string came. It could be made from old methods, new methods, the app, the user or a brain-blowing combination of all!

You know... I think we are perhaps giving ourselves an impossible task. If we want an unambiguous safe well specified string representation of a URI, then that is a normalized encoded URI. I don't think we can come up with a string representation any better than that. Perhaps it would be better if any new methods were not string based. We already have the (badly named) getRequestURI method that returns the URI string and can be passed to whatever URI decoding library the application wishes to use.

So rather than trying to invent a new way to make a safe string URI representation, we should just prevent the exising methods from working with unsafe string URI representations. Then we provide new methods that are not based on strings, but rather classes. We are after all programming in a strongly typed language. Perhaps these methods could use the existing java.net.URI class, but I don't think that is very helpful, as it really just provides help with the structure of an absolute URI and for the path segment is basically just is a well checked string, but with all the same issues on getPath() vs getRawPath() and no smarts about HTTP paths (eg parameters). What we need is a HttpURIPath class that provides safe and conveniant access to the path, segment by segment split into matched segments and info segments etc. A new class like that would stop the brain explosions resulting from any method taking a string path, whose providence is entirely unknown.

In fact, the idea of a new class is kind of just aggregatting the new methods suggested by @markt-asf into a class that can have a lot more convenience methods and use modern language features (Optional, Streams etc.) for navigating the segments/mapping of a URI.

I'll stop at this point, as this is already a too long response... but is the idea of a new class a non-starter or something I should flesh out a bit more?

Sigh... just realized we need to include HttpServletMapping.getMatchValue() in our considerations!

markt-asf · 2021-09-27T11:40:35Z

I think the idea of a new class is a good one.

I also agree with your general approach for the existing methods and much of the detail as well although there are a few places where we aren't quite aligned.

The existing methods are what I am most concerned about at this point. As much as it pains me to suggest it, given the time constraints around Servlet 6.0, how do you feel about a call to try and work through some of the detail? I'm thinking you, me and any other interested parties with the aim of bringing a proposal back here for wider feedback.

gregw · 2021-09-27T23:14:39Z

@markt-asf I think a call would be good to progress this, as time is short. But I think we need to do a bit of preparation for it, so all coming in are up to speed with the exact issues and we can focus on solution. So can we make the call later in the week - Thursday or Friday? Timezone might be a bit of an issue, although @stuartwdouglas and I are in the same zone. Is there anybody from a European timezone that wishes to attend?

What I'd like to do is move the preparatory discussion back to #18 itself rather than here on this PR, as I think no matter what there will be changes needed to this PR and we should not bias the discussion one way or another. So today I will try to write a concise summary of the exact issue together with some example scary URIs. I'll then try to put together a series of criteria/questions against which we can evaluate all the potential solutions. I'll post something to #18 shortly. Should I close this PR for now?

markt-asf · 2021-09-28T07:54:31Z

I am in UTC+1 but can go late into my evening. Thursday works for me. No objections to moving the discussion back to #18. I think closing this PR for now makes sense.

gregw added 2 commits September 21, 2021 11:42

First draft of clarification of paths for jakartaee#18

a5d57fd

First draft of clarification of paths for jakartaee#18

Updated resource methods

981bd8a

gregw requested review from markt-asf and stuartwdouglas September 21, 2021 02:04

gregw linked an issue Sep 21, 2021 that may be closed by this pull request

Clarify behaviour of getRequestURI(), getContextPath(), getServletPath() and getPathInfo() #18

Open

gregw commented Sep 21, 2021

View reviewed changes

gregw added 3 commits September 21, 2021 12:16

improved line wrap

ae9d248

improved javadoc

caa68e0

fixed format

6b760b9

stuartwdouglas reviewed Sep 22, 2021

View reviewed changes

api/src/main/java/jakarta/servlet/ServletContext.java Show resolved Hide resolved

more javadoc

624a8ed

stuartwdouglas reviewed Sep 22, 2021

View reviewed changes

api/src/main/java/jakarta/servlet/ServletContext.java Show resolved Hide resolved

gregw added 2 commits September 22, 2021 11:16

alternative for DECODED handling

107e9b3

simple DECODED handling. Just reject any requests with encoded reserv…

5c896a1

…ed characters.

markt-asf reviewed Sep 22, 2021

View reviewed changes

markt-asf reviewed Sep 23, 2021

View reviewed changes

gregw closed this Sep 28, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Issue #18 path modes #420

Issue #18 path modes #420

gregw commented Sep 21, 2021

gregw Sep 21, 2021

markt-asf Sep 22, 2021

gregw Sep 22, 2021

markt-asf Sep 23, 2021

markt-asf left a comment

markt-asf Sep 22, 2021

markt-asf Sep 22, 2021

gregw Sep 22, 2021

stuartwdouglas Sep 22, 2021

markt-asf Sep 23, 2021

markt-asf Sep 22, 2021

gregw Sep 22, 2021

markt-asf Sep 23, 2021

markt-asf Sep 22, 2021

gregw Sep 22, 2021

markt-asf Sep 22, 2021

gregw Sep 22, 2021

markt-asf Sep 23, 2021

markt-asf Sep 22, 2021

gregw Sep 22, 2021

markt-asf Sep 23, 2021

gregw commented Sep 22, 2021

markt-asf left a comment

markt-asf Sep 23, 2021

markt-asf Sep 23, 2021

markt-asf Sep 23, 2021

markt-asf Sep 23, 2021

gregw commented Sep 23, 2021

markt-asf commented Sep 27, 2021

gregw commented Sep 27, 2021

markt-asf commented Sep 28, 2021

Issue #18 path modes #420

Issue #18 path modes #420

Conversation

gregw commented Sep 21, 2021

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

markt-asf left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

gregw commented Sep 22, 2021

markt-asf left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

gregw commented Sep 23, 2021

markt-asf commented Sep 27, 2021

gregw commented Sep 27, 2021

markt-asf commented Sep 28, 2021