Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

lib: parse the incoming http message url with the whatwg url class #51498

Draft
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

stefanbeigel
Copy link

@stefanbeigel stefanbeigel commented Jan 17, 2024

Draft for #51311
TODO:

  • add test
  • should the url object be added to the request object? (This would allow easy access to the search parameters)
  • should the property be named originalUrl or rawUrl

@nodejs-github-bot
Copy link
Collaborator

Review requested:

  • @nodejs/http
  • @nodejs/net

@nodejs-github-bot nodejs-github-bot added http Issues or PRs related to the http subsystem. needs-ci PRs that need a full CI run. labels Jan 17, 2024
@@ -1030,6 +1035,11 @@ function parserOnIncoming(server, socket, state, req, keepAlive) {
if (req.upgrade)
return 2;
}
if(server.parseIncomingUrlWithWhatWg) {
req.originalUrl = req.url;
const newUrl = new URL(req.url, 'http://localhost');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't using the request host header be better than using localhost?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, but it depends on if we should expose the whole url instance

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 here, let's rebuild the original URL using the scheme and the host.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 here, let's rebuild the original URL using the scheme and the host.

@mcollina
Copy link
Member

This will:

  1. slow signicantly the performance of the HTTP server down when enabled
  2. possibly create all sort of compatibility issues

Moreover, a user can always do it themselves, and both Fastify and Express offer ways to specify custom http servers and intercept the request prior to routing.

As a result, I'm not convinced this is useful.

@stefanbeigel
Copy link
Author

stefanbeigel commented Jan 17, 2024

Hi @mcollina

  1. slow signicantly the performance of the HTTP server down when enabled

Yes thats true, as currently no parsing is done.
Because of this it's an option you need to activate.
But if we would expose the parsed url object in http server it's maybe possible remove parsing in express and fastify, as you can use the query params from the url object (please correct me if I am wrong here)

  1. possibly create all sort of compatibility issues

Also true, but I think there are already compatibility issues, as the message.url does not follow the whatwg standard.
As stated karwa:

Furthermore, I think RFC-3986 is clear that . and .. components are intended only for hierarchical traversal within the path, and any servers which give them a different interpretation could reasonably be described as "incorrect".

Moreover, a user can always do it themselves, and both Fastify and Express offer ways to specify custom http servers and intercept the request prior to routing.

Hmm yes, but I think it is really is useful when a framework behaves consistent.
Why the url property does not follow the URL rules by the URL class?

BR Stefan

@dougwilson
Copy link
Member

I believe message.url is actually a http request-target and not a url (so the name in Node.js is confusing). For example it could be the single character * too, for the OPTIONS * HTTP/1.1 requests, or an absolute url for things like CONNECT https://www.google.com HTTP/1.1.

@stefanbeigel
Copy link
Author

stefanbeigel commented Jan 17, 2024

@dougwilson

(so the name in Node.js is confusing).

I guess most Node.Js starters think that message.url is the full url and not only the path + query.

CONNECT https://www.google.com HTTP/1.1

And I also guess that most of the experienced Node.Js developer don't know that message.url can also contain a full url if the request is forged right :D, as they only saw path + query requests.

@dougwilson
Copy link
Member

I cannot comment on what those groups may or may not think. I was just trying to answer the "Why the url property does not follow the URL rules by the URL class?"

@@ -445,6 +445,11 @@ function storeHTTPOptions(options) {
validateInteger(maxHeaderSize, 'maxHeaderSize', 0);
this.maxHeaderSize = maxHeaderSize;

const parseIncomingUrlWithWhatWg = options.parseIncomingUrlWithWhatWg;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This option name needs to be definitely shorter. :)

@@ -1030,6 +1035,11 @@ function parserOnIncoming(server, socket, state, req, keepAlive) {
if (req.upgrade)
return 2;
}
if(server.parseIncomingUrlWithWhatWg) {
req.originalUrl = req.url;
const newUrl = new URL(req.url, 'http://localhost');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 here, let's rebuild the original URL using the scheme and the host.

@@ -1030,6 +1035,11 @@ function parserOnIncoming(server, socket, state, req, keepAlive) {
if (req.upgrade)
return 2;
}
if(server.parseIncomingUrlWithWhatWg) {
req.originalUrl = req.url;
const newUrl = new URL(req.url, 'http://localhost');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 here, let's rebuild the original URL using the scheme and the host.

@ShogunPanda
Copy link
Contributor

I agree with @mcollina and @dougwilson. This will slow down things and maybe be confusing for the user.
Rather than adding an option to automatically parse the URL, I'd rather add a new method or getter to IncomingMessage so that we get a situation where the current server is unaffected and the user has a convenience method to access a WHATWG compliant URL only when needed.

To clarify the last sentence, while your example for .. holds, it might only be useful on some routes (to stay in your example). With a server wide approach this is not possible.

@ShogunPanda
Copy link
Contributor

It looks fine to me now.
Can you please add docs and a test?

@targos
Copy link
Member

targos commented Jan 23, 2024

I'm not sure it's fine to provide a builtin getter that can throw an error on invalid incoming messages. It may even become a security risk.

@ShogunPanda
Copy link
Contributor

I get it's not ideal to have a getter that might throw.
But I don't get the security risk. What could pose such a risk?

@targos
Copy link
Member

targos commented Jan 23, 2024

It would be easy for the delevoper to trust the URL returned without thinking about the fact it is entirely built from user-provided input. For example, the Host header can be anything while the developer expects it to be the regular host name of their service.

@stefanbeigel
Copy link
Author

stefanbeigel commented Jan 23, 2024

Thanks for all the responses and the input

I am not sure if the URL getter really helps for solving the original issue, that ../ and ./ is not evaluated for incoming requests.
Frameworks like fastify and express would need to use pathname of the url instance for routing / handler matching.
And I guess @mcollina you also don't like this idea. Ofcourse it would possible to use this url instance to rewrite req.url but well the developer would still need to know about this...

@targos

It would be easy for the delevoper to trust the URL returned without thinking about the fact it is entirely built from user-provided input. For example, the Host header can be anything while the developer expects it to be the regular host name of their service.

Hmm, as far as I know there is no other way to resolve the host.
We could also put in the ip adresse there, but this is not helpfull at all I guess.

@ShogunPanda
Copy link
Contributor

It would be easy for the delevoper to trust the URL returned without thinking about the fact it is entirely built from user-provided input. For example, the Host header can be anything while the developer expects it to be the regular host name of their service.

Good point. We have to make this very clear in the docs.
Also, if we choose a method over a property getter we might allow an option host parameter that user can use to eliminate the risk. Do you think this would work?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
http Issues or PRs related to the http subsystem. needs-ci PRs that need a full CI run.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants