-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Uri] Uri(String, UriKind) throws IndexOutOfRangeException for certain inputs #1487
Comments
@davkean none of these throw anything (including UFE) on Desktop for me..? Clearly this isa bug though as they throw IOORE on Core.
@karelz is this the right label for Uri? Please consider whether this is a 2.0 issue. |
@Priya91 fyi |
My guess: these are not mainstream enough cases to churn Uri further for 2.0. |
It's a good label. @Priya91 can you please help with this one? |
@davkean These are invalid URIs? Will look into this. |
@danmosemsft Are you testing under 4.5+? Uri was changed in 4.5. I just verified and they all repro for me under desktop. @Priya91 TBH, I've not done the legwork to figure out if they are invalid according to spec - they were generated for me by Pex. I don't need them for 2.0. |
@davkean ah, no I used csc out of the v4.0 folder... |
Yeah, you'll go down the 4.0 path. |
Does not look like 2.0 bug, moving to Future for further consideration. |
It looks like all of these are failing because they contain the Unicode LTR control character. Historically we have stripped control characters out of URIs, and that behavior is causing the issue here. We calculate the index of the end of the path and then strip out the control character, shortening the string and causing the index to fall beyond the end of the string. The fix here is non-trivial, as the change will have to be made to some fairly complex unsafe code. I'll keep the issue assigned to me but don't have an ETA on this. |
As of .NET Core 3.0 Preview 8 these new Uri("/\\//", UriKind.Absolute);
new Uri("\\/\u200e", UriKind.Absolute);
new Uri("\\\\\\\\\\", UriKind.Absolute); are still throwing IndexOutOfRangeException. So, we still have a bug to fix. |
As it was mentioned before, the problem is caused by Unicode bidi chars (characters that say that the string should be printed right-to-left, or with right alignment, etc.). These chars are removed from the authority part of a uri during the parsing. At first I thought that the presence of bidi chars always breaks the parsing, but now I found out it's not the case. Here are examples of uri with bidi chars that are parsed perfectly fine with our current code:
And here are the examples where current parsing breaks with IOORE
These all seem to be some corner cases where host ends up being empty AND when scheme is "file". So we can't fix the issue by simply adjusting the internal parsing index on bidi chars removal. It will actually break all the cases in the first list - last symbols of the host will start to appear as if they were first symbols of the query. @MihaZupan sees this as one of the problems stemming from the complexity in Uri around re-creating internal uri string representation when non ascii characters are present. The logic is sprinkled around all over the place and clearly hard to reason about. So we think it might be better to deal with this by extracting out the re-writing logic from the shared parsing path. However, this is much broader change than just this particular bug. |
We're relying on
Uri
in the http://github.com/dotnet/project-system, to represent paths. Testing our path handling code, I found that Uri throws IndexOutOfRangeException (instead of UriFormatException) in the following situations:The text was updated successfully, but these errors were encountered: