-
Notifications
You must be signed in to change notification settings - Fork 91
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Issue #18 path modes #420
Issue #18 path modes #420
Conversation
First draft of clarification of paths for jakartaee#18
* <a href="https://datatracker.ietf.org/doc/html/rfc2396#section-3.3">RFC2396 Section 3.3</a>.</li> | ||
* </ul> | ||
*/ | ||
public enum PathCodingMode { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The approach I landed on is to have the 3 modes: LEGACY
is what we have now and will be the default for any web.xml <= 5.0. The DECODED
mode is more or less what we've called SAFE
before, it will be the default for >=6.0 and paths will not have any encodings in them and encoded reserved characters are just illegal. ENCODED
is the optional mode that allows encoded reserved characters.
I might be persuaded that the 6.0 default should be ENCODED
... in which case perhaps we could drop DECODED
and go to a boolean mode? But naming is difficult and I can't convince myself 100% of this.
I've included path parameters behaviours, but perhaps that needs a n independent mode?
We could put something in web.xml for this... but let's finalize the design before thinking about that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer the boolean approach. I also prefer taking the boolean approach further and essentially making LEGACY
container specific configuration.
I'm not currently convinced of the need for multiple modes but given the large number of complexities and edge cases I am keeping an open in case there is a need for multiple modes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to get to boolean approach too.... but so far I've been unable to reduce down the behaviours to just two. So let's iterate on how we see the various modes working... and if we get down to 2 of them, we can go to the binary model and get rid of this enum and associated setter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lots of detail in the comments but the overview is I don't (currently) think we need modes:
- Make
LEGACY
a container specific option - Simplify
DECODED
mode - Use
DECODED
mode to map requests - Use
ENCODED
mode for paths in the API (return values and parameters)
* <a href="https://datatracker.ietf.org/doc/html/rfc2396#section-3.3">RFC2396 Section 3.3</a>.</li> | ||
* </ul> | ||
*/ | ||
public enum PathCodingMode { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer the boolean approach. I also prefer taking the boolean approach further and essentially making LEGACY
container specific configuration.
I'm not currently convinced of the need for multiple modes but given the large number of complexities and edge cases I am keeping an open in case there is a need for multiple modes.
* Effected methods will return/accept paths as was done prior to the introduction of the PathCodingMode in Servlet 6.0. | ||
* The path returned by {@link HttpServletRequest#getContextPath()} is not decoded. Paths returned by the methods | ||
* {@link HttpServletRequest#getServletPath()} and {@link HttpServletRequest#getPathInfo()} are fully decoded and have | ||
* all path parameters removed. Decoding may include ambiguous characters such as a decoded <code>%2F</code>, which is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would never expect to see '%2F' decoded. To do so is opening up the possibility of a path traversal vulnerability.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see how we can avoid decoding %2F as the current api says that the strings are decoded, so any container that doesn't decode is in violation. For jetty we by default reject such requests with a 400 unless the application is configured to say that it is prepared to accept such decoded strings.
For me that is the key difference between LEGACY
and DECODED
mode: the former will allow containers to act as they have previous, whilst the later is a well defined mode that never returns a decoded %2F because they have been rejected with a 400.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By default Undertow does not decode these, however if I had my time again I would respond with a 400 instead. Either way we will need a container specific option to allow this, as some people want it, but IMHO we should be strict about rejecting this in the spec.
Although something I just thought of is that we likely need to be clearer about reserved characters in the different parts of the URI, e.g. %2F is likely fine as part of a query string. A query string does not really have the same security issues as the actual URI, so rejecting encoded reserved characters in the query string is almost certainly the wrong thing to do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can change the API to state that %2F
is never decoded if we wish. The need for this is rare but there was a Tomcat user that needed to use an identifier in the REST API that included %2F
. Decoding it isn't an option because of path traversal attacks. Rejecting it would work for most users but break stuff for the small number that actually need it. Leaving it as %2F
should work for everyone. Users that don't use it will be unaffected and users that do need it can %nn decode it. If we go this route we should add a clear warning that only %2F
should be decode and no other %nn
sequences else there will be the possibility of double decoding and the range of security issues that opens up.
I agree query string needs a different handling. This should cover it:
- First (unencoded) '?' denotes the start of the query string
- Parse the query string into key/value pairs
- %nn decode (all characters) both keys and values
%nn decoding needs to be after parsing in case keys and/or values contain &
or =
.
* {@link HttpServletRequest#getServletPath()} and {@link HttpServletRequest#getPathInfo()} are fully decoded and have | ||
* all path parameters removed. Decoding may include ambiguous characters such as a decoded <code>%2F</code>, which is | ||
* indistinguishable from a segment separator '/' in the request URI. Methods accepting paths may accept encoded | ||
* characters. A '?' character in a path will be treated as indicating a query string and will not be encoded. Some |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"...and will not be encoded." sounds odd to me. It suggests to me that the container will be encoding some characters. How about "The first unencoded '?' in a path will be treated as the delimiter indicating the start of the query string."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think some characters can be encoded by at least some of the methods. For example, in LEGACAY mode getRequestDispatcher("/foo bar")
may result in a dispatcher for /foo%20bar
. But the main intention here is to say that if you have a resource called foo?bar
then you need to pass in /foo%3Fbar
and not /foo?bar
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. I understand the intention here now and agree with it. How about something along the lines of:
"Paths passed to this method should use appropriate %nn encoding if the path includes characters that would otherwise be considered to be a delimiter per RFC 3986 such as '?'."
* characters. A '?' character in a path will be treated as indicating a query string and will not be encoded. Some | ||
* implementation specific behavioural differences may be apparent in this mode. | ||
*/ | ||
LEGACY, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How much a LEGACY
mode is required is going to depend on how different the Servlet 6.0 behaviour is compared to what was implemented for Servlet 5.0. That is going to be container dependent. If there are some containers where the change in behaviour is small enough to be considered part of the usual changes between major versions then LEGACY
mode may not be required.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep - very happy to get rid of modal behaviour. Let's iterate on what we think the minimal modes are and try to reduce to 2... or better yet 1
* TODO or this: If a path with encoded reserve characters is received, then the request will be rejected by the servlet | ||
* container with a 400 BAD REQUEST response. | ||
*/ | ||
DECODED, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this can be simplified. Once path parameters have been removed, it is safe to decode any %nn reserved characters apart from %2F
. Therefore, I think we can remove the requirement to throw an error in this mode entirely. This mode then becomes:
- decode unreserved %nn
- remove all path parameters
- normalize
- decode reserved %nn except
%2F
This achieves the same result without causing problems for apps (like Jira) that use encoded reserved characters in some requests and removes the need to triggering exceptions that will break some apps that would otherwise work correctly.
Note: There is an interesting edge case here. /aaa/..;a=b/
. The proposal above would result in /
. However it can be argued that this is inconsistent with RFC 3986. There are other approaches. Given there is no legitimate reason I can think of for sequences like this I quite like rejecting such requests with a 400 response.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The beauty of what you propose here is that it get's us really close to not needing to be modal, as we would only ever return sane paths here and insane ones would be rejected with 400.
But I'm not so sure that %2F is really the only problem character?
Potentially there are also issues with URIs like /aaa/..%5c../foo
which according to RFC3986 should decode to /aaa/..\../foo
, which can be missinterpreted on some file systems.
We've also seen issues with URIs like /foo%00
. Ah but this indicates a problem with specifing in terms of reserved from the RFC. I think we probably should specify in terms of unreserved, as there is a gap between those definitions that includes control characters.
Also if we allow encoded '%' (%25), then the resulting paths can contain percent sequences that can be dangerous to pass onto methods that a little flexible about receiving encoded characters or not. So for example /foo/%252E%252E/bar
could result in a /foo/%2e%2e/bar
path that may then get later converted to /foo/../bar
and normalized to just /bar
So I'm not totally convinced that %2F is the only danger encoding.
Regarding the edge case of /aaa/..;
, jetty is now rejecting such requests with 400 unless told to do otherwise.
The question is, if we do let such requests through, should the result be /
or /aaa/..
? The later is compliant with RFC3986 but potentially dangerous when passed to file handling code. I think your process about would result in /
So to be compliant with RFC3986 we'd could have a process like:
- decode unreserved %nn
- normalize
- remove all path parameters
- reject ambigous/scary/insane URIs. eg containing %2F, ".." etc. See Jetty's enumeration of scary URIs.
- decode reserved %nn
But despite these reservations, I think the general approach of path methodes never returning scary paths may be workable. But my concern is that we then move the "modes" to the definition of what is scary. Is that something we need to standardize? I think we need to at least enumerate the ones that should be rejected by default... we could then perhaps leave it up to containers to configure exceptions.
However, if we want applications that handle the exceptional cases by calling getRequestURI to be portable, then maybe we need to only fail scary URIs if the path methods are called rather than reject up front?
I also think we need to think through the modes on how methods that accept paths act. Currently I think there is a lot of flexibility, which is why so many scary URIs can be damaging if let flow through to these methods. I'm not sure how we can make them stricter without introducing at least a LEGACY mode.
Geeze this is a horrid problem!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A horrid problem indeed.
I don't recall any user requests specifically around %00
but I suspect that there are requirements around other control characters like %0A
and %0D
. Tomcat handles all these issues at the point the URI is converted to a file system path. The short version is that if the canonical file name and path does not match the requested file name and path then a 404 response is generated. I don't think we want to define how containers protect against this class of problem but I do think it is worth calling it out. Something like:
"URIs may permit characters which have a special meaning for the underlying file system. When mapping a request URI, or part of a URI, to a file system path containers are expected (required?) to ensure that any characters with special meaning for the underlying file system are appropriately handled to avoid any potential security issues."
/aaa/..;b=2/
and similar constructs have been an ongoing source of problems including security issues. I've had long discussions with the httpd team - including briefly with Roy Fielding who co-authored RFC 3986 - and the short version is:
/..;b=2/
is not equivalent to/../
and should not be treated as such- If the servlet spec chooses to ignore path parameters in mappings, security constraints etc then that is going to create problems when interacting with other systems like httpd
If we want to allow path parameters (I think we should as some users need them) then we either need to completely redefine the mapping process, security constraints etc or we need to introduce special handling to avoid the path traversal issues that removing path parameters from /..;b=2/
creates. I prefer the latter option. I think the scary cases can all be covered be something along these lines:
"Any URI containing a segment with path parameters that, when those path parameters are removed, is a dot-segment must be rejected with a 400 response."
Expanding on your proposed processing sequence and tweaking it a bit I'd suggest:
- extract query string, if any, for separate processing
- decode unreserved
%nn
- normalize
- reject dot-segemts with path parameters
- remove all path parameters
- decode reserved
%nn
expect%2F
- map the request
Looking through Jetty's scary URIs, I see various types:
- Those I'd class as 'suspicious' like
/%2E%2E/
that look like attempts to do something nefarious although the attempt should fail. I don't think the spec should require these to be rejected. Containers would be free to block these as a differentiating security feature. //
!= '/' as far as RFC 3986 is concerned but given the huge number of apps that make mistakes when building URIs I think we need to define 'normalize' to include collapsing multiple '/' to a single '/'.%2F
leave as%2F
but as with the suspicious sequences containers could offer an option to reject- dot-segments with path parameters should be rejected with a 400 as above
- Use of %25 - see comments below on double decoding
- Containers should be free to allow URIs to use encodings other than UTF-8. I'd expect strict enforcement though. Any URI that fails decoding as per the container configured encoding for URIs should be rejected
Double decoding is another area where there is opportunity to introduce security vulnerabilities. I think we need to go through the Servlet API and for every method that either accepts or returns a path we should state which of the following categories that the path falls into:
- encoded, unnormalized and still containing any path parameters
- decoded unreserved characters, normalized, path parameters removed, all reserved characters encoded
- decoded unreserved characters, normalized, path parameters removed, all reserved characters decoded except
%2F
My initial thoughts are that we'd use 1) for returned paths currently described as encoded / 'original'; 2) for paths passed into methods (mainly so we can distinguish between ?
in a path and the start of a query string and 3) for returned paths currently described as decoded.
I think we can define a small set of 'scary' URIs that should always be rejected without impacting users. Rejection of 'suspicious' URIs I think we can leave to the container.
* non-encoded reserved character is not possible, then the method will throw {@link MalformedURLException} if declared, | ||
* otherwise a {@link IllegalArgumentException}. | ||
*/ | ||
ENCODED, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If path parameters may be present in any segment present then it is not going to be possible write a <url-pattern>
that matches the servlet path. The only place path parameters will be usable is in path info. I think that is too restrictive.
I currently favour a mode that maps requests to servlets based using the simplified version of DECODED
that I described above but for methods that return path, those paths include the path parameters as described here. The mapped path (i.e. the one without the path parameters) remains available via HttpServletMapping
.
The idea is that with this approach things are essentially unchanged for the majority of applications. The small number of applications using path parameters may need changes. Containers have the option of providing a LEGACY
mode if there is sufficient demand from their user base from users with applications that use path parameters. I suspect that given that most apps are going to migrate from Jakarta EE 8 to Jakarta EE 10 then the few apps where changes to path parameters have an impact will recode the relevant bits of their app to handle the changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like treating ';' as a normal character for path methods, but special for path matching. I think it needs to be one or the other, not both... specially when we consider the complexities of methods taking paths.
We could continue to treat ';' as special and remove all parameters from path methods. Apps wanting to handle them would need to use the requestURI as they do now. But we need to specify handling of scary URIs that use ';' to hide '..' etc.
Or we treat ';' as a character allowed in a segment (as is now apparently is). Then /..;
is not a problem because the segment is ..;
and not ..
. But if we are treating is as part of a segment, the it needs to be able to be mapped. I don't see it is impossible to map that because ';' can be included in a <url-pattern>
as the spec says that all other parts of a pattern string are used for exact matches, so they can contain ;
, \
, ?
and other scary characters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry - forgot to click the add button for this comment.
Arrgghhhh! Just when I think there is a workable solution in sight, I realise I have forgotten about one of the requirements. I think the suggestions above would give us a workable solution that removes most (hopefully all) the ambiguity in the current API. What it doesn't address is providing access to the path parameters and/or the 'raw' versions of contextPath, servletPath and pathInfo.
What if, rather than introducing the concept of modes, we added methods to HttpServletMapping
. I'm thinking:
getContextPathEncoded()
getServletPathEncoded()
getPathInfoEncoded()
where only the following had been performed;
- extract query string, if any, for separate processing
- decode unreserved %nn
- normalize
- reject dot-segemts with path parameters
@markt-asf I've kind of rebutted all your comments..... but I don't reject your basic premise... my comments are really just me thinking out loud. I do think there is potential value in the general approach you are advocating - ie make the returns from the path methods non-modal and handle the scary URIs in a different way. My concern remains defining scary URIs and then making sure we equally consider the risks of methods taking path strings. This non modal approach is essentially how jetty is working now... at least for the methods producing paths. I'm going to ponder your feedback more as getting rid of modes would be good and I don't think they are perfect either. But the good aspect of modes is that they do allow us to be stricter on what methods accepting paths will accept. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gregw Thanks for your comments re rebuttal. I feel much the same way but couldn't figure out how to express it.
I understand and share your concerns regarding security. Tomcat has had to deal with a bunch of issues in this area and I really don't want to do anything that opens up a new (or old) security issue.
* <a href="https://datatracker.ietf.org/doc/html/rfc2396#section-3.3">RFC2396 Section 3.3</a>.</li> | ||
* </ul> | ||
*/ | ||
public enum PathCodingMode { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good to me.
* Effected methods will return/accept paths as was done prior to the introduction of the PathCodingMode in Servlet 6.0. | ||
* The path returned by {@link HttpServletRequest#getContextPath()} is not decoded. Paths returned by the methods | ||
* {@link HttpServletRequest#getServletPath()} and {@link HttpServletRequest#getPathInfo()} are fully decoded and have | ||
* all path parameters removed. Decoding may include ambiguous characters such as a decoded <code>%2F</code>, which is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can change the API to state that %2F
is never decoded if we wish. The need for this is rare but there was a Tomcat user that needed to use an identifier in the REST API that included %2F
. Decoding it isn't an option because of path traversal attacks. Rejecting it would work for most users but break stuff for the small number that actually need it. Leaving it as %2F
should work for everyone. Users that don't use it will be unaffected and users that do need it can %nn decode it. If we go this route we should add a clear warning that only %2F
should be decode and no other %nn
sequences else there will be the possibility of double decoding and the range of security issues that opens up.
I agree query string needs a different handling. This should cover it:
- First (unencoded) '?' denotes the start of the query string
- Parse the query string into key/value pairs
- %nn decode (all characters) both keys and values
%nn decoding needs to be after parsing in case keys and/or values contain &
or =
.
* {@link HttpServletRequest#getServletPath()} and {@link HttpServletRequest#getPathInfo()} are fully decoded and have | ||
* all path parameters removed. Decoding may include ambiguous characters such as a decoded <code>%2F</code>, which is | ||
* indistinguishable from a segment separator '/' in the request URI. Methods accepting paths may accept encoded | ||
* characters. A '?' character in a path will be treated as indicating a query string and will not be encoded. Some |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. I understand the intention here now and agree with it. How about something along the lines of:
"Paths passed to this method should use appropriate %nn encoding if the path includes characters that would otherwise be considered to be a delimiter per RFC 3986 such as '?'."
* TODO or this: If a path with encoded reserve characters is received, then the request will be rejected by the servlet | ||
* container with a 400 BAD REQUEST response. | ||
*/ | ||
DECODED, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A horrid problem indeed.
I don't recall any user requests specifically around %00
but I suspect that there are requirements around other control characters like %0A
and %0D
. Tomcat handles all these issues at the point the URI is converted to a file system path. The short version is that if the canonical file name and path does not match the requested file name and path then a 404 response is generated. I don't think we want to define how containers protect against this class of problem but I do think it is worth calling it out. Something like:
"URIs may permit characters which have a special meaning for the underlying file system. When mapping a request URI, or part of a URI, to a file system path containers are expected (required?) to ensure that any characters with special meaning for the underlying file system are appropriately handled to avoid any potential security issues."
/aaa/..;b=2/
and similar constructs have been an ongoing source of problems including security issues. I've had long discussions with the httpd team - including briefly with Roy Fielding who co-authored RFC 3986 - and the short version is:
/..;b=2/
is not equivalent to/../
and should not be treated as such- If the servlet spec chooses to ignore path parameters in mappings, security constraints etc then that is going to create problems when interacting with other systems like httpd
If we want to allow path parameters (I think we should as some users need them) then we either need to completely redefine the mapping process, security constraints etc or we need to introduce special handling to avoid the path traversal issues that removing path parameters from /..;b=2/
creates. I prefer the latter option. I think the scary cases can all be covered be something along these lines:
"Any URI containing a segment with path parameters that, when those path parameters are removed, is a dot-segment must be rejected with a 400 response."
Expanding on your proposed processing sequence and tweaking it a bit I'd suggest:
- extract query string, if any, for separate processing
- decode unreserved
%nn
- normalize
- reject dot-segemts with path parameters
- remove all path parameters
- decode reserved
%nn
expect%2F
- map the request
Looking through Jetty's scary URIs, I see various types:
- Those I'd class as 'suspicious' like
/%2E%2E/
that look like attempts to do something nefarious although the attempt should fail. I don't think the spec should require these to be rejected. Containers would be free to block these as a differentiating security feature. //
!= '/' as far as RFC 3986 is concerned but given the huge number of apps that make mistakes when building URIs I think we need to define 'normalize' to include collapsing multiple '/' to a single '/'.%2F
leave as%2F
but as with the suspicious sequences containers could offer an option to reject- dot-segments with path parameters should be rejected with a 400 as above
- Use of %25 - see comments below on double decoding
- Containers should be free to allow URIs to use encodings other than UTF-8. I'd expect strict enforcement though. Any URI that fails decoding as per the container configured encoding for URIs should be rejected
Double decoding is another area where there is opportunity to introduce security vulnerabilities. I think we need to go through the Servlet API and for every method that either accepts or returns a path we should state which of the following categories that the path falls into:
- encoded, unnormalized and still containing any path parameters
- decoded unreserved characters, normalized, path parameters removed, all reserved characters encoded
- decoded unreserved characters, normalized, path parameters removed, all reserved characters decoded except
%2F
My initial thoughts are that we'd use 1) for returned paths currently described as encoded / 'original'; 2) for paths passed into methods (mainly so we can distinguish between ?
in a path and the start of a query string and 3) for returned paths currently described as decoded.
I think we can define a small set of 'scary' URIs that should always be rejected without impacting users. Rejection of 'suspicious' URIs I think we can leave to the container.
@markt-asf I'm going to do a meta response first before diving into the details of your reply. I think we need to agree a general approach and then deep dive to stress test it. I like your 'suspicious' sub-catagory of 'scary'. The key sub-catagory in Jetty's handling is But regardless of the exact membership of the catagories, I'm seeing a general approaches we are discussing fitting into the following pattern/options:
I'm not saying we do all of a-f, but that our solution is probably a subset of a-f. The approach a) feels better that b), except that it doesn't work well with new methods approaches d) and/or f). If an app is safe because it is using the new methods then we should not reject with a 400 so they can use the new methods... but then if they have one component that calls the old methods they could be unsafe. So perhaps the default rejection is a), but an app can declare it wants b) so that it can access new methods. The advantage of c) modal approach is that we can match the behavior of the methods taking paths e) so that they only accept what can be produced. You know... I think we are perhaps giving ourselves an impossible task. If we want an unambiguous safe well specified string representation of a URI, then that is a normalized encoded URI. I don't think we can come up with a string representation any better than that. Perhaps it would be better if any new methods were not string based. We already have the (badly named) So rather than trying to invent a new way to make a safe string URI representation, we should just prevent the exising methods from working with unsafe string URI representations. Then we provide new methods that are not based on strings, but rather classes. We are after all programming in a strongly typed language. Perhaps these methods could use the existing java.net.URI class, but I don't think that is very helpful, as it really just provides help with the structure of an absolute URI and for the path segment is basically just is a well checked string, but with all the same issues on In fact, the idea of a new class is kind of just aggregatting the new methods suggested by @markt-asf into a class that can have a lot more convenience methods and use modern language features (Optional, Streams etc.) for navigating the segments/mapping of a URI. I'll stop at this point, as this is already a too long response... but is the idea of a new class a non-starter or something I should flesh out a bit more? Sigh... just realized we need to include |
I think the idea of a new class is a good one. I also agree with your general approach for the existing methods and much of the detail as well although there are a few places where we aren't quite aligned. The existing methods are what I am most concerned about at this point. As much as it pains me to suggest it, given the time constraints around Servlet 6.0, how do you feel about a call to try and work through some of the detail? I'm thinking you, me and any other interested parties with the aim of bringing a proposal back here for wider feedback. |
@markt-asf I think a call would be good to progress this, as time is short. But I think we need to do a bit of preparation for it, so all coming in are up to speed with the exact issues and we can focus on solution. So can we make the call later in the week - Thursday or Friday? Timezone might be a bit of an issue, although @stuartwdouglas and I are in the same zone. Is there anybody from a European timezone that wishes to attend? What I'd like to do is move the preparatory discussion back to #18 itself rather than here on this PR, as I think no matter what there will be changes needed to this PR and we should not bias the discussion one way or another. So today I will try to write a concise summary of the exact issue together with some example scary URIs. I'll then try to put together a series of criteria/questions against which we can evaluate all the potential solutions. I'll post something to #18 shortly. Should I close this PR for now? |
I am in UTC+1 but can go late into my evening. Thursday works for me. No objections to moving the discussion back to #18. I think closing this PR for now makes sense. |
For #18
@markt-asf this is my first draft of the things we've been discussing in #18. I think a bit of work is needed and there are still some changes that are at best to be considered as thought bubbles.
I've allowed servlet-api committers to edit this branch so update if you like, alternatively if you think this is in the ballpark, then I can move it to a branch in eclipse-ee4j repo so we can work on it collaboratively.