-
Notifications
You must be signed in to change notification settings - Fork 126
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve uri parsing #739
Improve uri parsing #739
Conversation
You can see the tests results between the current/new implementation here: 4c67dc4?diff=unified&w=0#diff-869a2a47503f122e64605995d33927aecc3ca2014575ce859982df0dcae33023 |
lsp/src/uri0.ml
Outdated
let group re n = Re.Group.get_opt re n |> Option.value ~default:"" in | ||
let authority = group res 4 |> Uri.pct_decode in | ||
let path = | ||
let path = group res 5 |> Uri.pct_decode in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are pulling the uri package only for percent decoding, right? Doesn't seem like it's worth it as we can do percent encoding quite easily ourselves.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it's also being used to encode
/decode
non-ASCII characters. For instance, when you convert a Uri to a string you should encode all non-ASCII characters.
Uri.of_path "/Users/jrieken/Code/_samples/18500/Mödel + Other Thîngß/model.js" |> Uri.to_string
If you don't encode, the string representation is: file:///Users/jrieken/Code/_samples/18500/Mödel + Other Thîngß/model.js
which is not correct. It should be instead: file:///Users/jrieken/Code/_samples/18500/M%C3%B6del%20%2B%20Other%20Th%C3%AEng%C3%9F/model.js
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah sorry, I think I misunderstood your question. Yes, that's correct, it's only being used for percent encoding.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, so I see that there's a reference table that shows the percent encoded value for some characters: https://www.w3schools.com/tags/ref_urlencode.asp. It could be easily implemented without the uri
package.
But what about characters that are outside of that table? For instance, with the vscode-uri
package:
URI.file("/Users/mod❤.js").toString() // file:///Users/mod%E2%9D%A4.js
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But what about characters that are outside of that table? For instance, with the vscode-uri package:
I would assume it's just a percent encoding of the special characters in a particular encoding. utf8 most likely?
6933c24
to
f07734b
Compare
Great PR @tatchi. Let's try to get this polished and release 1.12.3 with it. Users are going to be very appreciative of this. |
1ee25ea
to
7861780
Compare
lsp/src/uri_lexer.mll
Outdated
| "" { { scheme = "file"; authority = ""; path = "/" } } | ||
| "//" ([^ '/']* as authority) (['/']_* as path) { { scheme = "file"; authority; path } } | ||
| "//" ([^ '/']* as authority) { { scheme = "file"; authority; path = "/" } } | ||
| _* as path |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do we need this case for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To cover the case when the string is not empty and doesn't start with //
. That's the equivalent of 7861780#diff-4e2de9533ce99abc9d651d3c346c050c2e1923251c6ad49528deb91519d8748fL48
Also, if I remove that line I get the following error:
File "lsp/src/uri_lexer.mll", line 49, character 0: syntax error.
Do I misunderstand something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no, I think you got it. Can you split it into two cases though? One with the /
prefix and one without the /
. Should remove the needed for any helper functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Clever. Amazing how this of_path
function is now way simpler! 🤩
…ndows failing test" This reverts commit 9175448.
7861780
to
68a43ae
Compare
I should have addressed all the comments. I still have to write a change entry and eventually add some tests. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good.
CHANGES: ## Fixes - Fix a bad interaction between inferred interfaces and promotion code actions in watch mode (ocaml/ocaml-lsp#753) - Fix URI parsing (ocaml/ocaml-lsp#739 fixes ocaml/ocaml-lsp#471 and ocaml/ocaml-lsp#459)
Closing it as it was committed in 3ea0bb3 |
This PR changes the Uri module implementation to match the "official" one from vscode-uri. I tried to keep the code as close as possible to the vscode-uri to avoid silly mistakes from my end.
The first commit adds more tests taken from vscode-uri. The second commit is the one that changes the implementation.
Now that we have a "correct" implementation with more test cases, I'm happy to refactor it if you prefer. Keeping it closer to the base implementation might make updating it easier (although the base implementation doesn't seem to be changed often).
Four e2e tests are failing on windows on the master branch:
This PR fixes the first two. The workspace symbol test failing is probably due to how we print the path on the test side. It shouldn't be hard to fix.
For the code action, I'll have to investigate.