URLPattern #1011

Open
zloirock opened this issue Nov 21, 2021 · 3 comments

Comments

@zloirock
Owner

Since core-js already contains URL and URLSearchParams, it could be good to implement URLPattern. I hope that we could reuse the URL parser from the web.url module.

https://developer.mozilla.org/en-US/docs/Web/API/URL_Pattern_API
https://wicg.github.io/urlpattern/
https://github.com/kenchris/urlpattern-polyfill (however, I think that core-js should follow another way)

I haven't started working on it yet, so if anyone wants to contribute, feel free to do so.

@precious-void

@zloirock Hey, unfortunately, the spec explicitly emphasizes how its parser differs from the URL parser:

The URLPattern constructor string algorithm is very similar to the basic URL parser algorithm, but some differences prevent us from using that algorithm directly.

First, the URLPattern constructor string parser operates on tokens generated using the "lenient" tokenize policy. In contrast, the basic URL parser operates on code points. Operating on tokens allows the URLPattern constructor string parser to more easily distinguish between code points that are significant pattern syntax and code points that might be a URL component separator. For example, it makes it trivial to handle named groups like ":hmm" in "https://a.c:hmm.example.com:8080" without getting confused with the port number.

Second, the URLPattern constructor string parser needs to avoid applying URL canonicalization to all code points like basic URL parser does. Instead, we perform canonicalization on only parts of the pattern string we know are safe later when compiling each component pattern string.

Finally, the URLPattern constructor string parser does not handle some parts of the basic URL parser state machine. For example, it does not treat backslashes specially as they would all be treated as pattern characters and would require excessive escaping. In addition, this parser may not handle some more esoteric parts of the URL parsing algorithm like file URLs with a hostname. The goal with this parser was to handle the most common URLs while allowing any niche case to be handled instead via the URLPatternInit constructor.

https://wicg.github.io/urlpattern/#constructor-string-parsing
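The first difference (tokens instead of code points) is easier to see with a toy tokenizer. This is a heavily simplified sketch, not core-js code and not the spec algorithm: `tokenizeLenient` is a hypothetical name, it walks UTF-16 code units rather than code points, uses ASCII-only name rules instead of IdentifierStart / IdentifierPart, and does not handle regexp groups `(...)`. It borrows the spec's token type names (`name`, `char`, `invalid-char`, `end`, and so on) and mimics the lenient policy by emitting an `invalid-char` token instead of throwing.

```javascript
// ASCII-only stand-ins for IdentifierStart / IdentifierPart (a simplification).
const NAME_START = /^[A-Za-z_$]$/;
const NAME_PART = /^[A-Za-z0-9_$]$/;

// Hypothetical, simplified "lenient" tokenizer over UTF-16 code units.
function tokenizeLenient(input) {
  const tokens = [];
  const push = (type, index, value) => tokens.push({ type, index, value });
  let i = 0;
  while (i < input.length) {
    const c = input[i];
    if (c === '*') {
      push('asterisk', i, c);
      i += 1;
    } else if (c === '+' || c === '?') {
      push('other-modifier', i, c);
      i += 1;
    } else if (c === '{' || c === '}') {
      push(c === '{' ? 'open' : 'close', i, c);
      i += 1;
    } else if (c === '\\') {
      if (i + 1 === input.length) {
        // Lenient policy: a trailing backslash is recorded, not thrown.
        push('invalid-char', i, c);
        i += 1;
      } else {
        // The escaped character is kept literally.
        push('escaped-char', i, input[i + 1]);
        i += 2;
      }
    } else if (c === ':') {
      // A named group is ":" followed by an identifier-like name.
      let j = i + 1;
      while (j < input.length && (j === i + 1 ? NAME_START : NAME_PART).test(input[j])) j += 1;
      if (j === i + 1) {
        // No valid name follows (e.g. the port in ":8080"), so the lenient
        // policy emits an invalid-char token instead of throwing.
        push('invalid-char', i, c);
        i += 1;
      } else {
        push('name', i, input.slice(i + 1, j));
        i = j;
      }
    } else {
      // Everything else, including "(", since regexp groups are not handled here.
      push('char', i, c);
      i += 1;
    }
  }
  push('end', input.length, '');
  return tokens;
}

const tokens = tokenizeLenient('https://a.c:hmm.example.com:8080');
// Non-char tokens: invalid-char ':' at 5, name 'hmm' at 11,
// invalid-char ':' at 27, end at 32.
console.log(tokens.filter((token) => token.type !== 'char'));
```

Because `:hmm` becomes a single `name` token while the `:` before `8080` becomes an `invalid-char` token, a later parsing stage can tell the named group apart from the port separator without re-examining individual code points.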

I have started a draft branch, but I'm stuck in a couple of places and not sure how to resolve them:

  1. Usage of RegExpCreate, on this line: https://github.com/shtelzerartem/core-js/blob/feature/url-pattern/packages/core-js/modules/web.url-pattern.js#L580
  2. How do I check whether a code point is contained in IdentifierStart / IdentifierPart? Honestly, I'm not really sure what these are.
    See this line: https://github.com/shtelzerartem/core-js/blob/feature/url-pattern/packages/core-js/modules/web.url-pattern.js#L30
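On the second question: IdentifierStart and IdentifierPart are sets from the ECMAScript lexical grammar (the characters allowed at the start of, and inside, a JavaScript identifier), which the URLPattern spec reuses for the names of named groups such as `:id`. In the ECMAScript grammar, IdentifierStartChar is ID_Start plus `$` and `_`, and IdentifierPartChar is ID_Continue plus `$`, U+200C (ZWNJ), and U+200D (ZWJ). The sketch below assumes Unicode property escapes are usable when the engine supports them and otherwise falls back to an ASCII-only, incomplete check; `isIdentifierStart` and `isIdentifierPart` are hypothetical helper names.

```javascript
// IdentifierStartChar: ID_Start, "$", "_"
// IdentifierPartChar:  ID_Continue, "$", U+200C (ZWNJ), U+200D (ZWJ)
// (ID_Continue already includes "_".)
let identifierStart;
let identifierPart;
try {
  // Unicode property escapes need the `u` flag and a modern engine.
  identifierStart = new RegExp('^[$_\\p{ID_Start}]$', 'u');
  identifierPart = new RegExp('^[$\\u200C\\u200D\\p{ID_Continue}]$', 'u');
} catch (error) {
  // Fallback for older engines: ASCII-only, therefore incomplete.
  identifierStart = /^[$_A-Za-z]$/;
  identifierPart = /^[$_A-Za-z0-9]$/;
}

// `char` is a string holding exactly one code point.
function isIdentifierStart(char) {
  return identifierStart.test(char);
}

function isIdentifierPart(char) {
  return identifierPart.test(char);
}
```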

@zloirock
Owner Author

Thanks, @shtelzerartem.

Yes, I saw this note. They are different, but as I see it, it should be possible to adapt the parser to handle both cases.
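To see what such a shared parser would have to change, one can feed pattern syntax to the existing WHATWG URL parser (the global `URL` class, which core-js's `web.url` module polyfills). This sketch only illustrates the canonicalization and backslash differences quoted above; the input strings are illustrative examples.

```javascript
// Pattern syntax run through the URL parser: the host is canonicalized
// (lowercased) and the optional-group braces are percent-encoded.
const grouped = new URL('https://EXAMPLE.com/books{/:id}?');
console.log(grouped.hostname); // "example.com"
console.log(grouped.pathname); // "/books%7B/:id%7D"
// The trailing "?" modifier is consumed as the start of an empty query.
console.log(grouped.search); // ""

// For special schemes such as https, "\" is treated exactly like "/",
// whereas in a URLPattern string "\" escapes the next character.
const special = new URL('https://example.com\\docs\\guide');
console.log(special.pathname); // "/docs/guide"

// For non-special schemes, the backslash stays an ordinary path character.
const nonSpecial = new URL('foo://example.com/docs\\guide');
console.log(nonSpecial.pathname); // "/docs\guide"
```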

RegExpCreate is not a problem; it can simply be the RegExp constructor. The problem is the u flag: core-js does not polyfill it, since that would require a full Unicode implementation, which could be too heavy. I think it's possible to detect whether the engine supports the u flag. If it does, create the regex with this flag. If it doesn't, and the pattern contains nothing that requires this flag, create the regex without flags; otherwise, throw an error.
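A minimal sketch of that fallback strategy. `supportsUnicodeFlag`, `needsUnicodeFlag`, and `createPatternRegExp` are hypothetical names, and the test for "entries that require this flag" is an assumed, deliberately incomplete heuristic: it only looks for `\u{...}` escapes, `\p{...}` / `\P{...}` property escapes, and astral characters. A real check would have to follow from exactly what regex source the pattern compiler can emit.

```javascript
// Feature-detect the `u` flag: engines without it throw a SyntaxError.
function supportsUnicodeFlag() {
  try {
    new RegExp('', 'u');
    return true;
  } catch (error) {
    return false;
  }
}

// Hypothetical, incomplete heuristic for regex source that only works
// (or only means the same thing) with the `u` flag:
//   \u{...} code point escapes, \p{...} / \P{...} property escapes,
//   and astral characters (surrogate pairs), which `u` treats as one code point.
function needsUnicodeFlag(source) {
  return /\\u\{|\\[pP]\{|[\uD800-\uDBFF][\uDC00-\uDFFF]/.test(source);
}

function createPatternRegExp(source, unicodeSupported = supportsUnicodeFlag()) {
  // 1. Native support: keep the `u` flag.
  if (unicodeSupported) return new RegExp(source, 'u');
  // 2. No support, but nothing in the source depends on it: drop the flag.
  if (!needsUnicodeFlag(source)) return new RegExp(source);
  // 3. Otherwise an equivalent regex cannot be built, so fail loudly.
  throw new TypeError('This pattern requires RegExp u flag support: ' + source);
}
```

The `unicodeSupported` parameter exists only so the fallback branches can be exercised on an engine that does support the flag.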

The same applies to IdentifierStart / IdentifierPart: for the same reason, core-js uses an incomplete RegExpIdentifierName check for named capture groups (NCG).

@zloirock
Owner Author

zloirock commented Aug 1, 2022

nodejs/node#42133

Projects: None yet · Development: No branches or pull requests · 2 participants