Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve uri parsing #739

Closed
wants to merge 24 commits into from
Closed

Improve uri parsing #739

wants to merge 24 commits into from

Conversation

tatchi
Copy link
Collaborator

@tatchi tatchi commented Jun 23, 2022

This PR changes the Uri module implementation to match the "official" one from vscode-uri. I tried to keep the code as close as possible to the vscode-uri to avoid silly mistakes from my end.

The first commit adds more tests taken from vscode-uri. The second commit is the one that changes the implementation.

Now that we have a "correct" implementation with more test cases, I'm happy to refactor it if you prefer. Keeping it closer to the base implementation might make updating it easier (although the base implementation doesn't seem to be changed often).

Four e2e tests are failing on windows on the master branch:

  • FAIL tests/textDocument-diagnostics.ts
  • FAIL tests/ocamllsp-switchImplIntf.ts
  • FAIL tests/workspace-symbol.test.ts
  • FAIL tests/textDocument-codeAction.test.ts

This PR fixes the first two. The workspace symbol test failing is probably due to how we print the path on the test side. It shouldn't be hard to fix.
For the code action, I'll have to investigate.

@tatchi
Copy link
Collaborator Author

tatchi commented Jun 23, 2022

You can see the tests results between the current/new implementation here: 4c67dc4?diff=unified&w=0#diff-869a2a47503f122e64605995d33927aecc3ca2014575ce859982df0dcae33023

lsp/src/uri0.ml Outdated
let group re n = Re.Group.get_opt re n |> Option.value ~default:"" in
let authority = group res 4 |> Uri.pct_decode in
let path =
let path = group res 5 |> Uri.pct_decode in
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are pulling the uri package only for percent decoding, right? Doesn't seem like it's worth it as we can do percent encoding quite easily ourselves.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it's also being used to encode/decode non-ASCII characters. For instance, when you convert a Uri to a string you should encode all non-ASCII characters.

Uri.of_path "/Users/jrieken/Code/_samples/18500/Mödel + Other Thîngß/model.js" |> Uri.to_string

If you don't encode, the string representation is: file:///Users/jrieken/Code/_samples/18500/Mödel + Other Thîngß/model.js which is not correct. It should be instead: file:///Users/jrieken/Code/_samples/18500/M%C3%B6del%20%2B%20Other%20Th%C3%AEng%C3%9F/model.js

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah sorry, I think I misunderstood your question. Yes, that's correct, it's only being used for percent encoding.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, so I see that there's a reference table that shows the percent encoded value for some characters: https://www.w3schools.com/tags/ref_urlencode.asp. It could be easily implemented without the uri package.

But what about characters that are outside of that table? For instance, with the vscode-uri package:

URI.file("/Users/mod❤.js").toString() // file:///Users/mod%E2%9D%A4.js

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But what about characters that are outside of that table? For instance, with the vscode-uri package:

I would assume it's just a percent encoding of the special characters in a particular encoding. utf8 most likely?

@rgrinberg
Copy link
Member

Great PR @tatchi. Let's try to get this polished and release 1.12.3 with it. Users are going to be very appreciative of this.

@rgrinberg rgrinberg added this to the 1.12.3 milestone Jun 28, 2022
@tatchi tatchi force-pushed the improve-uri-parsing branch 2 times, most recently from 1ee25ea to 7861780 Compare July 1, 2022 21:28
| "" { { scheme = "file"; authority = ""; path = "/" } }
| "//" ([^ '/']* as authority) (['/']_* as path) { { scheme = "file"; authority; path } }
| "//" ([^ '/']* as authority) { { scheme = "file"; authority; path = "/" } }
| _* as path
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do we need this case for?

Copy link
Collaborator Author

@tatchi tatchi Jul 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To cover the case when the string is not empty and doesn't start with //. That's the equivalent of 7861780#diff-4e2de9533ce99abc9d651d3c346c050c2e1923251c6ad49528deb91519d8748fL48

Also, if I remove that line I get the following error:

File "lsp/src/uri_lexer.mll", line 49, character 0: syntax error.

Do I misunderstand something?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, I think you got it. Can you split it into two cases though? One with the / prefix and one without the /. Should remove the needed for any helper functions.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clever. Amazing how this of_path function is now way simpler! 🤩

@tatchi tatchi force-pushed the improve-uri-parsing branch from 7861780 to 68a43ae Compare July 3, 2022 08:37
@tatchi
Copy link
Collaborator Author

tatchi commented Jul 3, 2022

I should have addressed all the comments. I still have to write a change entry and eventually add some tests.

Copy link
Member

@rgrinberg rgrinberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

rgrinberg added a commit to rgrinberg/opam-repository that referenced this pull request Jul 7, 2022
CHANGES:

## Fixes

- Fix a bad interaction between inferred interfaces and promotion code actions
  in watch mode (ocaml/ocaml-lsp#753)

- Fix URI parsing (ocaml/ocaml-lsp#739 fixes ocaml/ocaml-lsp#471 and ocaml/ocaml-lsp#459)
@tatchi tatchi closed this Jul 7, 2022
@tatchi
Copy link
Collaborator Author

tatchi commented Jul 7, 2022

Closing it as it was committed in 3ea0bb3

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants