Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inconsistent default port number in HttpURI and HostPort #11488

Closed
cowwoc opened this issue Mar 4, 2024 · 23 comments · Fixed by #11578
Closed

Inconsistent default port number in HttpURI and HostPort #11488

cowwoc opened this issue Mar 4, 2024 · 23 comments · Fixed by #11578
Assignees
Labels
Bug For general bugs on Jetty side

Comments

@cowwoc
Copy link
Contributor

cowwoc commented Mar 4, 2024

Jetty version(s)
12.0.6

Jetty Environment
core

Description
HttpURI.Immutable(String) uses a default port number of -1 but HostPort uses a default value of 0. This means that anyone who wants to parse URIs coming from Jetty needs to handle both possible values.

Expected behavior: use a consistent default across all classes in Jetty.

Related issue/comment: #7269 (comment)

@cowwoc cowwoc added the Bug For general bugs on Jetty side label Mar 4, 2024
@cowwoc cowwoc changed the title Inconsistent default value between HostPort and HttpURI Inconsistent default port number Mar 4, 2024
@cowwoc
Copy link
Contributor Author

cowwoc commented Mar 4, 2024

Personally, I prefer a default value of -1 as it is consistent with the value used by URI. This means that if I try constructing a new URI, like this:

new URI(requestUri.getScheme(), requestUri.getUser(), requestUri.getHost(), requestUri.getPort(), "/callback", null, null);

if the port number is -1 I will get a URI that uses the default port that is associated with the scheme. If the port number is 0 I will end up with a URI whose port number is literally 0 and the URI cannot be opened.

@joakime
Copy link
Contributor

joakime commented Mar 5, 2024

@cowwoc the HttpURI class has undergone some changes in Jetty 12.0.7, esp around port numbers.

You can test 12.0.7 right now by using the maven repository URL https://oss.sonatype.org/content/groups/jetty-with-staging/

Also, be careful of the multi-parameter URI constructors.
They will encode the various parameters before constructing the URI.

Eg:

jshell> var u = new URI("http", null, "localhost", 4444, "/a%2Fb", "c=d%2Fe", "frag")
u ==> http://localhost:4444/a%252Fb?c=d%252Fe#frag

jshell> var u = new URI("http", null, "localhost", 4444, "/a\\b", "c=d\\e", "frag");
u ==> http://localhost:4444/a%5Cb?c=d%5Ce#frag

The host, path, query, and fragment parameters can all experience this encoding step.

@cowwoc
Copy link
Contributor Author

cowwoc commented Mar 5, 2024

@joakime Interesting. Take a look at https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/net/URI.html#identities-heading

So you're right that the constructors encode the parameters, but it turns out that the getters (e.g. getPath()) decode the values before they return the value. So will I really run into a problem?

So long as HttpURI does the same (return decoded values) then we should be good.

@joakime
Copy link
Contributor

joakime commented Mar 5, 2024

So long as HttpURI does the same (return decoded values) then we should be good.

Nope, as the URI.toASCIIString() has the (double) encoded values, it's been mangled and no longer represents the input.
Eg: you pass in something pct-encoded, you now have the % itself encoded to %25.

jshell> var u = new URI("http", null, "localhost", 4444, "/a\\b", "c=d\\e", "frag")
u ==> http://localhost:4444/a%5Cb?c=d%5Ce#frag

jshell> u.toASCIIString()
$2 ==> "http://localhost:4444/a%5Cb?c=d%5Ce#frag"

jshell> var u1 = new URI("http", null, "localhost", 4444, "/a%2Fb", "c=d%2Fe", null)
u1 ==> http://localhost:4444/a%252Fb?c=d%252Fe

jshell> u1.toASCIIString()
$4 ==> "http://localhost:4444/a%252Fb?c=d%252Fe"

@joakime
Copy link
Contributor

joakime commented Mar 5, 2024

This doesn't happen with the single constructor version, (or URI.create(String) method)

jshell> var u2 = new URI("http://localhost:4444/a%2Fb")
u2 ==> http://localhost:4444/a%2Fb

jshell> u2.toASCIIString()
$6 ==> "http://localhost:4444/a%2Fb"

jshell> var u3 = URI.create("http://localhost:5555/foo%2Fbar/")
u3 ==> http://localhost:5555/foo%2Fbar/

jshell> u3.toASCIIString()
$8 ==> "http://localhost:5555/foo%2Fbar/"

Note that per the URI spec, the path /a%2Fbar/ and /a/b/ are not equivalent.
The first one is a single path segment of a/b, while the second is two path segments a and b (due to delimitor rules)
If you are not careful with your encoding & decoding you can wind up in this situation quite easily.

@cowwoc
Copy link
Contributor Author

cowwoc commented Mar 5, 2024

It doesn't sound like we're talking about the same thing exactly. I agree that toASCIIString() returns an encoded string. My point was that all of URI's component-oriented constructors/getters work well with each other.

If you use the URI constructor that accepts multiple components as input, and then read them out using the individual getters, you should get back your original values. I believe the problem only occurs if you push in multiple components then try to pull out a full string, or vice-versa. So long as your in and out operations are consistent you should be okay.

That's why I said that if HttpURI's getters were to also return the decoded values, then I would be able to pass them unchanged into the multi-argument URI constructor.

@joakime
Copy link
Contributor

joakime commented Mar 5, 2024

Yes, but the flaw is that things like URI.getPath() is returning a fully decoded version of all pct-encoded octets, it frankly decodes too much.

It should never decode a reserved character - https://datatracker.ietf.org/doc/html/rfc3986#section-2.2
And it should only decode unreserved characters - https://datatracker.ietf.org/doc/html/rfc3986#section-2.3
Any other pct-encoded octets should only be decoded according to either the scheme specific rules or application rules. (Thankfully this is still possible, but only by using the URI.getRaw*() methods. the normal getters, like URI.getPath(), have no value to either an Http Server or an Http Client implementation)

If you use the multi-arg URI constructor, and have very precise pct-encoding requirements, such as needing to use %2F in place of / in a specific place on the path section of a URI, then you are not producing the results that you need. Since the multi-arg constructor encodes, you might think, oh, but I want it to encode, so I can use things like ...

jshell> var u = new URI("http", null, "local", 5555, "/a\\b/", null, null)
u ==> http://local:5555/a%5Cb/

jshell> var u = new URI("http", null, "local", 5555, "/a b/", null, null)
u ==> http://local:5555/a%20b/

But what if what you want to have encoded is a reserved character?

jshell> var u = new URI("http", null, "local", 5555, "/a/b/", null, null)
u ==> http://local:5555/a/b/

No, we cannot do that with the multi-arg constructor.

Note that URI is also braindead when it comes to UTF-8.

jshell> var u = new URI("http", null, "local", 5555, "/euro/€/", null, null)
u ==> http://local:5555/euro/€/

jshell> u.toASCIIString()
$5 ==> "http://local:5555/euro/%E2%82%AC/"

jshell> u.toString()
$7 ==> "http://local:5555/euro/€/"

jshell> u.getRawPath()
$8 ==> "/euro/€/"

jshell> u.getPath()
$9 ==> "/euro/€/"

There's no way in the URI class to obtain the properly encoded path segment with UTF-8 encoding.

Unlike other auto-encoding characters.

jshell> var u = new URI("http", null, "local", 5555, "/a b/", null, null)
u ==> http://local:5555/a%20b/

jshell> u.getPath()
$11 ==> "/a b/"

jshell> u.getRawPath()
$12 ==> "/a%20b/"

Spring has this bug btw, as it relies on the multi-arg constructor. It's not possible to send, using the spring Http client, the path /foo/a%2Fb/ on the HTTP protocol, it will send /foo/a%252Fb/ instead. This was a big problem in Servlet 5 and older releases, now with Servlet 6 we can reject those kinds of Path definitions now (as it's impossible to properly represent that path in a consistent way across the API with all of the encoding / decoding behaviors present in the Servlet spec)

If you want to see how messy things can get, check out the https://github.com/jetty/jetty.project/blob/jetty-12.0.x/jetty-ee10/jetty-ee10-servlet/src/test/resources/jar-resource-odd.jar
That is an example of a resource jar that Servlet is expected to handle.

[jetty-12.0.x]$ jar -tvf jetty-ee10/jetty-ee10-servlet/src/test/resources/jar-resource-odd.jar 
     0 Wed Apr 10 13:59:32 CDT 2019 rez/oddities/
    24 Wed Apr 10 13:47:34 CDT 2019 rez/oddities/;
    35 Wed Apr 10 13:49:02 CDT 2019 rez/oddities/#hashcode
    30 Wed Apr 10 13:49:18 CDT 2019 rez/oddities/index.html#fragment
    20 Wed Apr 10 13:59:32 CDT 2019 rez/oddities/other%2fkind%2Fof%2fslash
    40 Wed Apr 10 13:46:36 CDT 2019 rez/oddities/a file with a space
    53 Wed Apr 10 13:48:00 CDT 2019 rez/oddities/;" onmousedown="alert(document.location)"
    35 Wed Apr 10 13:46:22 CDT 2019 rez/oddities/some\slash\you\got\there

While these are difficult to exist on all filesystems, they are perfectly valid names for resources from a jar/zip/war, and can be accessed via Jetty (even have properly encoded directory listings produced)
Hopefully you can see why we are especially touchy about encoding / decoding on URI, and why we have our own HttpURI implementation.

Anyway, this is a long side conversation now. 😄
Back to the port concerns. 🤓

Have you tested 12.0.7 changes?
Are you happy with them?

@cowwoc
Copy link
Contributor Author

cowwoc commented Mar 5, 2024

:) Sorry, I was tied up with frontend bugs today. I'll take a look later on this week.

Regarding the URI discussion, it sounds like you are saying that the only safe constructor to use is URI(String). Fair enough, but users need to be able to manipulate individual components of a URI (e.g. add/remove path elements, query parameters, etc). Not to mention having to handle encoding.

Honestly, I think that we need a UriBuilder-style API with a clean room implementation.

In an ideal world, we'd have a framework-independent implementation that Jetty and others could just link against. Baring that, you could just add your own version inside Jetty. Using URI correctly is turning out to be much harder than I initially expected.

@janbartel
Copy link
Contributor

The discussion seems to have strayed from the original topic: do we need to change Jetty's default port number to -1?

@cowwoc
Copy link
Contributor Author

cowwoc commented Mar 6, 2024

I would still recommend it for consistency. I'd there any downside to it?

The URI class uses -1 and everywhere else in Jetty's code port 0 means "any available port".

The reason we strayed into the URI conversation was that the original use-case that promoting this issue was the ability to build a new URI based on an existing HttpURI. Unless you plan to remove all getter methods and force users to use toURI() then I'd argue HttpURI must behave likes a URI itself.

I still haven't had a chance to try out version 12.0.7 but I'll try to get around to it later on today.

@cowwoc
Copy link
Contributor Author

cowwoc commented Mar 6, 2024

I reviewed the latest version HttpURI. It looks good, except for a possible bug at

.

Shouldn't this set the port to -1 instead of 0? The rest of the file uses -1...

@cowwoc
Copy link
Contributor Author

cowwoc commented Mar 12, 2024

Hey guys, I think version 12.0.7 is broken.

When a client requests:

https://licensed.app/callback?code=...&state=...

I invoke request.getHttpURI().toURI() and I get back:

https://licensed.app:0/callback?code=...&state=...

This is because HttpURI.toURI() invokes return new URI(getScheme(), null, getHost(), getPort(), getPath(), query == null ? null : UrlEncoded.decodeString(query), null); where getPort() returns 0.

You might want to consider releasing a hot-patch to fix this because this is a pretty common operation.

@cowwoc
Copy link
Contributor Author

cowwoc commented Mar 12, 2024

I tracked the problem down to the HttpURI.Mutable() class. The default constructor leaves the _port field unset, which means it gets initialized to 0 which causes this problem later on.

I suggest changing

to private int _port = -1;

as well as fixing #11488 (comment)

And it wouldn't hurt to add a unit test to cover some of these use-cases...

@cowwoc
Copy link
Contributor Author

cowwoc commented Mar 20, 2024

@joakime any update on fixing this bug?

@cowwoc
Copy link
Contributor Author

cowwoc commented Mar 26, 2024

@sbordet Do you guys plan to fix this bug before releasing version 12.0.8?

joakime added a commit that referenced this issue Mar 26, 2024
+ Changing unset port in HttpURI to -1 (from 0)
+ Changing unset port in HostPort to -1 (from 0)
@joakime
Copy link
Contributor

joakime commented Mar 26, 2024

@cowwoc this is a change that will not make it into 12.0.8.
While it seems straight forward, once to you start the first domino, it actually impacts quite a bit of the codebase.

See initial impl at PR #11578

@gregw
Copy link
Contributor

gregw commented Mar 27, 2024

Only tuning in here late in the conversation. Given that a) there is inconsistent usage of -1 and 0; b) neither -1 nor 0 are valid ports; then we should be strict in what we generate and lenient in what we accept.

I.e. passing in a port <=0 should mean the same thing. We should never ever generate a URI with "host:0" as the authority (can we fix that at least for 12.0.8?)
We should always report -1 for an unset/unknown port.

@gregw
Copy link
Contributor

gregw commented Mar 27, 2024

@cowwoc I believe we have already fixed the ":0" port issue in #11468 that was in 12.0.7??? So I cannot reproduce what you report in #11488 (comment)

I just wrote/ran the test:

    @Test
    public void testDefaultPort()
    {
        HttpURI uri = HttpURI.from("http://host/path/info;param?query=value#fragment");

        assertThat(uri.getScheme(), is("http"));
        assertThat(uri.getUser(), nullValue());
        assertThat(uri.getHost(), is("host"));
        assertThat(uri.getPort(), is(-1));
        assertThat(uri.getPath(), is("/path/info;param"));
        assertThat(uri.getCanonicalPath(), is("/path/info"));
        assertThat(uri.getParam(), is("param"));
        assertThat(uri.getQuery(), is("query=value"));
        assertThat(uri.getFragment(), is("fragment"));
        assertThat(uri.getAuthority(), is("host"));
        assertThat(uri.toURI().toASCIIString(), is("http://host/path/info;param?query=value#fragment"));
    }

which passes fine.

I also note that our asString code says:

                if (normalizedPort > 0)
                    out.append(':').append(normalizedPort);

and we never call the URI constructor that takes components anymore (as it just re-assembles them and parses the string anyway).

So I cannot reproduce. If we did produce a URI like https://licensed.app:0/callback?code=...&state=..., then I agree that we should definitely fix in this release cycle. But as for consistency between 0 and -1, that can wait until the next.

So please check if you can reproduce in 12.0.7 ASAP, as we are closing the door on 12.0.8 very soon.

@cowwoc
Copy link
Contributor Author

cowwoc commented Mar 27, 2024

@gregw I'll get back to you with a testcase later on today.

@cowwoc
Copy link
Contributor Author

cowwoc commented Mar 27, 2024

@gregw It turns out that HttpURI.from() is the wrong method to start with. If you run a full-fledged server, the following call sequence occurs:

HttpConnection.startRequest(String method, String uri, HttpVersion version) invokes
HttpConnection.newHttpStream(String method, String uri, HttpVersion version) which invokes
new HttpStreamOverHTTP1(method, uri, version); which invokes
_uri = uri == null ? null : HttpURI.build(method, uri); which invokes
return HttpURI.build().pathQuery(uri); which invokes
return new Mutable() which constructs a mutable HttpURI with _port having a default value of zero.

Consequently, when user code invokes Request.getHttpURI().getPort() they get a value of zero.

Is this sufficient information to reproduce the problem?

I'd make the following changes:

  1. Initialize Mutable._port with a value of -1.
  2. Have normalize() set the port number to -1. Given that this method is public, there is no guarantee that users won't invoke it and then HttpURI.getPort() will return an unexpected value.
  3. HostHeaderCustomizer.customize() uses URIUtil.normalizePortForScheme() to construct a new HttpURI with a port of zero, which is bad. You could either fix HostHeaderCustomizer or have URIUtil.normalizePortForScheme() return -1. I personally prefer the latter for consistency across the entire API.

joakime added a commit that referenced this issue Apr 4, 2024
…11578)

+ Changing unset port in HttpURI to -1 (from 0)
+ Changing unset port in HostPort to -1 (from 0)
@cowwoc
Copy link
Contributor Author

cowwoc commented Apr 4, 2024

Is it safe to assume that this fix will be included in version 12.0.9, which is due out in ~1 month from now?

@joakime
Copy link
Contributor

joakime commented Apr 4, 2024

Is it safe to assume that this fix will be included in version 12.0.9, which is due out in ~1 month from now?

Yes, barring anything that would cause us to revert it, this is going to be in 12.0.9

@cowwoc
Copy link
Contributor Author

cowwoc commented Apr 4, 2024

Perfect, thank you!

@joakime joakime changed the title Inconsistent default port number Inconsistent default port number in HttpURI and HostPort Apr 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug For general bugs on Jetty side
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants