Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Uri] Uri(String, UriKind) throws IndexOutOfRangeException for certain inputs #1487

Open
davkean opened this issue Jun 19, 2017 · 12 comments
Open

Comments

@davkean
Copy link
Member

davkean commented Jun 19, 2017

We're relying on Uri in the http://github.com/dotnet/project-system, to represent paths. Testing our path handling code, I found that Uri throws IndexOutOfRangeException (instead of UriFormatException) in the following situations:

new Uri("/\\‎//", UriKind.Absolute);
new Uri("\\/\u200e", UriKind.Absolute);
new Uri("/\\\\\r", UriKind.Absolute);
new Uri("\\\\\\\\\\‎", UriKind.Absolute);
@danmoseley
Copy link
Member

@davkean none of these throw anything (including UFE) on Desktop for me..?

Clearly this isa bug though as they throw IOORE on Core.

Unhandled Exception: System.IndexOutOfRangeException: Index was outside the bounds of the array.
   at System.Uri.CreateUriInfo(Flags cF)
   at System.Uri.EnsureUriInfo()
   at System.Uri.ParseRemaining()
   at System.Uri.EnsureParseRemaining()
   at System.Uri.InitializeUri(ParsingError err, UriKind uriKind, UriFormatException& e)
   at System.Uri.CreateThis(String uri, Boolean dontEscape, UriKind uriKind)
   at System.Uri..ctor(String uriString, UriKind uriKind)
   at ConsoleApp1.Program.Main(String[] args) in C:\dotnet\1\Program.cs:line 13

@karelz is this the right label for Uri? Please consider whether this is a 2.0 issue.

@danmoseley
Copy link
Member

@Priya91 fyi

@danmoseley
Copy link
Member

My guess: these are not mainstream enough cases to churn Uri further for 2.0.

@karelz
Copy link
Member

karelz commented Jun 19, 2017

It's a good label. @Priya91 can you please help with this one?

@Priya91
Copy link
Contributor

Priya91 commented Jun 19, 2017

@davkean These are invalid URIs? Will look into this.

@davkean
Copy link
Member Author

davkean commented Jun 19, 2017

@danmosemsft Are you testing under 4.5+? Uri was changed in 4.5. I just verified and they all repro for me under desktop.

@Priya91 TBH, I've not done the legwork to figure out if they are invalid according to spec - they were generated for me by Pex. I don't need them for 2.0.

@danmoseley
Copy link
Member

@davkean ah, no I used csc out of the v4.0 folder...

@davkean
Copy link
Member Author

davkean commented Jun 19, 2017

Yeah, you'll go down the 4.0 path.

@karelz
Copy link
Member

karelz commented Jun 22, 2017

Does not look like 2.0 bug, moving to Future for further consideration.

@rmkerr
Copy link
Contributor

rmkerr commented May 2, 2018

It looks like all of these are failing because they contain the Unicode LTR control character. Historically we have stripped control characters out of URIs, and that behavior is causing the issue here. We calculate the index of the end of the path and then strip out the control character, shortening the string and causing the index to fall beyond the end of the string.

The fix here is non-trivial, as the change will have to be made to some fairly complex unsafe code. I'll keep the issue assigned to me but don't have an ETA on this.

@rmkerr rmkerr changed the title Uri(String, UriKind) throws IndexOutOfRangeException for certain inputs [Uri] Uri(String, UriKind) throws IndexOutOfRangeException for certain inputs May 2, 2018
@rmkerr rmkerr self-assigned this May 2, 2018
@davidsh
Copy link
Contributor

davidsh commented Aug 28, 2019

As of .NET Core 3.0 Preview 8 these

new Uri("/\\‎//", UriKind.Absolute);
new Uri("\\/\u200e", UriKind.Absolute);
new Uri("\\\\\\\\\\‎", UriKind.Absolute);

are still throwing IndexOutOfRangeException. So, we still have a bug to fix.

@karelz karelz transferred this issue from dotnet/corefx Jan 9, 2020
@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added area-System.Net untriaged New issue has not been triaged by the area owner labels Jan 9, 2020
@karelz karelz added bug and removed untriaged New issue has not been triaged by the area owner labels Jan 9, 2020
@karelz karelz added this to the 5.0 milestone Jan 9, 2020
@karelz karelz modified the milestones: 5.0, Future Mar 31, 2020
@CarnaViire CarnaViire self-assigned this Jul 15, 2020
@CarnaViire
Copy link
Member

As it was mentioned before, the problem is caused by Unicode bidi chars (characters that say that the string should be printed right-to-left, or with right alignment, etc.). These chars are removed from the authority part of a uri during the parsing.

At first I thought that the presence of bidi chars always breaks the parsing, but now I found out it's not the case. Here are examples of uri with bidi chars that are parsed perfectly fine with our current code:

"scheme://user\u200Einfo@ho\u200Est:1234/pa\u200Eth?que\u200Ery#fragment"
"scheme://\u200E"
"scheme://ab\u200E"
"scheme://ab\u200Ecd"
"scheme://\u200E//"
"scheme://\u200E//ab"
"\\\\ab\u200E"
"\\\\\u200Ecd"
"\\\\ab\u200E//"
"\\\\ab\u200E//ab"

And here are the examples where current parsing breaks with IOORE

"\\\\\u200E"
"\\\\\u200E:1234"
"\\\\\u200E//"
"\\\\\u200E//ab"

These all seem to be some corner cases where host ends up being empty AND when scheme is "file".

So we can't fix the issue by simply adjusting the internal parsing index on bidi chars removal. It will actually break all the cases in the first list - last symbols of the host will start to appear as if they were first symbols of the query.

@MihaZupan sees this as one of the problems stemming from the complexity in Uri around re-creating internal uri string representation when non ascii characters are present. The logic is sprinkled around all over the place and clearly hard to reason about. So we think it might be better to deal with this by extracting out the re-writing logic from the shared parsing path. However, this is much broader change than just this particular bug.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

8 participants