Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Decoding URL Components #234

Closed
ivan-tymoshenko opened this issue Jan 14, 2022 · 13 comments · Fixed by #282
Closed

Decoding URL Components #234

ivan-tymoshenko opened this issue Jan 14, 2022 · 13 comments · Fixed by #282
Labels

Comments

@ivan-tymoshenko
Copy link
Collaborator

There is how decoding URL components is working right now. If URL contains "%" symbol - URL will be decoded by standard js decodeURI function and if URL contains one of this symbols "# $ & + , / : ; = ? @", URL params will be parsed from the original (non-decoded) url and will be decoded either by fastDecode or customDecode function. There is a problem here. If one part of a URL has one of this symbols "# $ & + , / : ; = ? @", all url params will be parsed by fastDecode function and not parsed with decodeURI function.

test('Decode url components', t => {
  t.plan(3)

  const findMyWay = FindMyWay()

  findMyWay.on('GET', '/:param1/:param2', () => {})

  t.same(findMyWay.find('GET', '/foo%23bar/foo%23bar').params, { param1: 'foo#bar', param2: 'foo#bar' })
  t.same(findMyWay.find('GET', '/%F0%9F%8D%8C/%F0%9F%8D%8C').params, { param1: '🍌', param2: '🍌' })
  t.same(findMyWay.find('GET', '/%F0%9F%8D%8C/foo%23bar'), null) // shouldn't be true
  // t.same(findMyWay.find('GET', '/%F0%9F%8D%8C/foo%23bar'), { param1: '🍌', param2: 'foo#bar' }) // should be true
})
@ivan-tymoshenko
Copy link
Collaborator Author

I'm pretty sure that the previous test should work how I described it. But here I have a problem. Which of the two last commented test cases should be correct.

test('Decode url components', t => {
  t.plan(4)

  const findMyWay = FindMyWay()

  findMyWay.on('GET', '/:param', () => {})

  t.same(findMyWay.find('GET', '/foo%23bar').params, { param: 'foo#bar' })
  t.same(findMyWay.find('GET', '/%F0%9F%8D%8C').params, { param: '🍌' })
  t.same(findMyWay.find('GET', '/%F0%9F%8D%8C-foo').params, { param: '🍌-foo' })
  t.same(findMyWay.find('GET', '/%F0%9F%8D%8C-foo%23bar').params, { param: '🍌' }) // IMHO, it's wrong
  // t.same(findMyWay.find('GET', '/%F0%9F%8D%8C-foo%23bar').params, { param: '🍌-foo#bar' }) // First option
  // t.same(findMyWay.find('GET', '/%F0%9F%8D%8C-foo%23bar').params, { param: '%F0%9F%8D%8C-foo#bar' }) // Second option
})

@ivan-tymoshenko
Copy link
Collaborator Author

Another way of this problem: when people create a custom decoder what url they expect? If we pass to the decoder completely raw URL as we do know, then in one case if it's just '/%F0%9F%8D%8C' they will receive '🍌', but if it's '/%F0%9F%8D%8C' + something that contains "# $ & + , / : ; = ? @", they will receive "/%F0%9F%8D%8C" + something. And they will have to parse not only this symbols "# $ & + , / : ; = ? @", but also banana.

@ivan-tymoshenko
Copy link
Collaborator Author

There is a test that I don't understand.

  1. There defined a custom decoder.
  2. There is a dynamic route
    findMyWay.on('GET', '/:param', paramHandler)
  3. We pass "/%23%F0%9F%8D%8C" (encoded /#🍌 as I understand)
    const crazyRoute = findMyWay.find('GET', '/%23%F0%9F%8D%8C')
  4. We expected "/%23%F" as a result

The last part is simply ignored.

@Eomm
Copy link
Contributor

Eomm commented Jan 14, 2022

If URL contains "%" symbol - URL will be decoded by standard js decodeURI function

True

if URL contains one of this symbols "# $ & + , / : ; = ? @"

Almost: the URL must contain those encoded symbols

If one part of a URL has one of this symbols "# $ & + , / : ; = ? @", all url params will be parsed by fastDecode function and not parsed with decodeURI function

Do you think fastDecode has some issues?

t.same(findMyWay.find('GET', '/%F0%9F%8D%8C-foo%23bar').params, { param: '🍌' }) // IMHO, it's wrong

I agree

Which of the two last commented test cases should be correct.

I think the first one: '🍌-foo#bar'

We expected "/%23%F" as a result

I agree that it is an error. There is some issue with the parameter counter when the slice function is applied.

{
  originPath: '/%23%F0%9F%8D%8C',
  decoded: '/%23🍌',
  containsEncodedComponents: true
}

@ivan-tymoshenko
Copy link
Collaborator Author

I have a problem here and I need your help with it. I will describe how we decode a URL with params. At first, we decode the whole URL with decodeURI function, in order to decode symbols we need to match the URL to a registered path. Then if we find a param, we need to parse and decode it again from the original path (from the path that we didn't decode with decodeURI). We found the param in the decoded path and should parse it from the non-decoded path, and these paths could mismatch.

Example:

  1. We set up /🍌/:param URL
  2. Trying to match this URL /%F0%9F%8D%8C/param%23 (encoded 🍌 symbol == %F0%9F%8D%8C). We expect that '%F0%9F%8D%8C' would be decoded to the 🍌, then we will find a param and will decode it. Param should be equal 'param#'
  3. We parse the URL with decodeURI function and get /🍌/param%23 URL
  4. We find a param in the decoded URL /🍌/param%23, it starts at index 3
  5. We need to parse and decode param from the original URL /%F0%9F%8D%8C/param%23, but all we know is that param starts at index 3 in the decoded URI.

Any good ideas of how should we do that?

@ivan-tymoshenko
Copy link
Collaborator Author

@mcollina @Eomm I need your help here. Maybe you know a good example of another lib or standard which should describe how it should work?

@mcollina
Copy link
Collaborator

I would recommend looking at path-to-regexp or URLPatter https://wicg.github.io/urlpattern/

@ivan-tymoshenko
Copy link
Collaborator Author

@mcollina I tested the path-to-regexp library.

const { match } = require("path-to-regexp")

const matchStatic = match("/🍌", { decode: decodeURIComponent });
console.log(matchStatic("/%F0%9F%8D%8C")) // returns false

console.log(decodeURI('/%F0%9F%8D%8C')) // returns "/🍌"

Is this correct behavior?

@ivan-tymoshenko
Copy link
Collaborator Author

ivan-tymoshenko commented Apr 18, 2022

URLPattern matches the url, but it doesn't decode the param.

const pattern = new URLPattern('https://host.com/🍌/:id')
console.log(pattern.exec('https://host.com/%F0%9F%8D%8C/123%23')) // returns param id as 123%23, not as 123# 

@ivan-tymoshenko
Copy link
Collaborator Author

path-to-regexp, URLPattern, and find-my-way have three different ways to deal with encoded URLs.

@mcollina
Copy link
Collaborator

I don't think there is a preferred way to handle this.

@ivan-tymoshenko
Copy link
Collaborator Author

Ok, but the problem is that find-my-way has an issue with that.
#234 (comment)

@mcollina
Copy link
Collaborator

mcollina commented Apr 19, 2022

Unfortunately I do not have enough time right now to design a solution for this. I'd be happy to review a PR.

IMHO we should align to either path-to-regexp or URLPattern. Ideally in a way that minimizes breaking changes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
3 participants