Optimize Uri.GetHashCode #32713

MihaZupan · 2020-02-23T14:23:21Z

Use text.GetHashCode(StringComparison.OrdinalIgnoreCase) instead of text.ToLowerInvariant().GetHashCode().

For relative Uris also cache the computed hash like we do for absolute Uris. This makes dictionary lookups on relative Uris perform 2-5000 times faster depending on length (field access instead of string allocation and hash computation).

For absolute Uris, cache the RemoteUrl we just computed instead of discarding it.

stephentoub · 2020-02-23T15:47:50Z

Use text.GetHashCode(StringComparison.OrdinalIgnoreCase) instead of text.ToLowerInvariant().GetHashCode().

Does any form of Uri equality check use invariant culture? If yes, this could cause a break, as instances that are considered equal may no longer map to the same hash code.

stephentoub · 2020-02-23T15:48:12Z

cc: @tarekgh, @GrabYourPitchforks

MihaZupan · 2020-02-23T15:49:44Z

Does any form of Uri equality check use invariant culture?

No, it uses Ordinal/OrdinalIgnoreCase

stephentoub · 2020-02-23T15:49:49Z

For relative Uris also cache the computed hash like we do for absolute Uris.

It appears this was being special-cased very explicitly. Do we know why?

MihaZupan · 2020-02-23T15:56:16Z

It appears this was being special-cased very explicitly. Do we know why?

For relative Uris, the equality comparison is an ordinal comparison of the entire string, so GetHashCode need not do any more processing and it is enough to only compute the case-insensitive hash of the string.

For AbsoluteUri, the equality check ultimately comes down to (sometimes case-insensitive) ordinal equality of a normalized representation of everything but the fragment and UserInfo. That is what the call to GetParts(UriComponents.HttpRequestUrl, UriFormat.SafeUnescaped) is fetching.

davidsh · 2020-02-23T15:59:12Z

If yes, this could cause a break

Changing the hash codes of existing Uri's is a 'breaking change'. We know that we declined to do things like this in .NET Framework between versions. Many of our customers have databases with Uri's in them. So, we should be careful here.

MihaZupan · 2020-02-23T16:07:06Z

In fact, a relative Uri can never be equal to an absolute one, and they are matched using an ordinal, case-sensitive comparison, so there is no need for us to compute a non-case-sensitive hash for GetHashCode (for relative Uris).

MihaZupan · 2020-02-23T16:23:48Z

Changing the hash codes of existing Uri's is a 'breaking change'. We know that we declined to do things like this in .NET Framework between versions. Many of our customers have databases with Uri's in them. So, we should be careful here.

As the hash computation is handed over to string hashing, that has already been broken when the change to randomized string hashing was introduced.

Also, GetHashCode documentation has very clear warnings against such use.

scalablecory · 2020-02-23T20:44:28Z

As the hash computation is handed over to string hashing, that has already been broken when the change to randomized string hashing was introduced.

Perhaps @GrabYourPitchforks can help us understand what we learned from doing that. Maybe we are happy with it, or maybe we would rethink doing it again. At the very least we could better understand what sort of response we would get from our users.

GrabYourPitchforks · 2020-02-23T20:58:15Z

As a reminder, I explicitly did not change this code path as part of the related work in #31968 because someString.ToLowerInvariant().GetHashCode() and someString.GetHashCode(StringComparison.OrdinalIgnoreCase) are behaviorally different.

Now, it's possible that this code path should've been using GetHashCode(OrdinalIgnoreCase) all along. But that would be a bug fix (and I wasn't qualified to make a judgment call on whether the original URI behavior was incorrect), not a perf improvement.

GrabYourPitchforks · 2020-02-23T21:39:25Z

To expand a little bit on what I said above, the only required relationship between object.Equals and object.GetHashCode is as follows.

Object.Equals(a, b) ⇒ a.GetHashCode() == a.GetHashCode()
StringComparer.FromComparison(comparison).Equals(a, b) ⇒ a.GetHashCode(comparison) == b.GetHashCode(comparison)

And the contrapositives:

a.GetHashCode() != b.GetHashCode() ⇒ !Object.Equals(a, b)
a.GetHashCode(comparison) != b.GetHashCode(comparison) ⇒ !StringComparer.FromComparison(comparison).Equals(a, b)

If you have a call to someString.ToLowerInvariant().GetHashCode(), we would expect to see it paired with the expression stringA.ToLowerInvariant() == stringB.ToLowerInvariant() at some point. Similarly, if you have a call to someString.GetHashCode(StringComparison.OrdinalIgnoreCase), we would expect to see it paired with the expression string.Equals(stringA, stringB, StringComparison.OrdinalIgnoreCase) at some point.

Importantly, these two expressions do not have the same behavior. Converting two strings to lowercase and checking them for equality is not the same as checking the two strings for equality using an OrdinalIgnoreCase comparison.

For example depending on operating system and environmental configuration:

"ς".ToLowerInvariant() == "σ".ToLowerInvariant(); // currently always FALSE across all supported OSes
string.Equals("ς", "σ", StringComparison.OrdinalIgnoreCase); // may return TRUE depending on OS

MihaZupan · 2020-02-23T23:49:06Z

Importantly, these two expressions do not have the same behavior. Converting two strings to lowercase and checking them for equality is not the same as checking the two strings for equality using an OrdinalIgnoreCase comparison.

Now, it's possible that this code path should've been using GetHashCode(OrdinalIgnoreCase) all along. But that would be a bug fix (and I wasn't qualified to make a judgment call on whether the original URI behavior was incorrect), not a perf improvement.

The current GetHashCode/Equals does not honor that. It uses ToLowerInvariant for GetHashCode and Ordinal(IgnoreCase) for equals. So that makes this change a bug-fix as well.

MihaZupan · 2020-02-23T23:54:02Z

The failing test assumes that GetHashCode is always ignoring casing, which is no longer true after this change. I will update the test to only expect that for absolute Uris that are Unc or Dos paths.

GrabYourPitchforks · 2020-02-24T18:52:42Z

src/libraries/System.Private.Uri/src/System/Uri.cs

-            if ((object?)info.MoreInfo == null)
-            {
-                info.MoreInfo = new MoreInfo();
+                MoreInfo moreInfo = (_info ??= new UriInfo()).MoreInfo ??= new MoreInfo();


In this code path, just return OriginalString.GetHashCode() directly. There's no need to complicate the code path by caching the result. string.GetHashCode() is already fairly efficient, and if we need to optimize it even further by introducing caching we should make the improvements directly in string itself, not here.

There is a dedicated field on the MoreInfo for caching this hash, it just wasn't being stored for relative Uris before.

Having a cached hash can also be used as another fast-path optimization in Equals.

There's no need to complicate the code path by caching the result. string.GetHashCode() is already fairly efficient

Should we also not cache it for AbsoluteUris? If so, we can remove that field to save a pointer size on the allocation by trading GetHashCode/Equals perf.

There is a dedicated field on the MoreInfo for caching this hash

I'd guess that this caching mechanism wasn't intended to work around any perceived slowness in string.GetHashCode as much as it was to work around the possible allocation in the below code path.

runtime/src/libraries/System.Private.Uri/src/System/Uri.cs

Lines 4946 to 4949 in 106eebd

internal static int CalculateCaseInsensitiveHashCode(string text)

{

return text.ToLowerInvariant().GetHashCode();

}

For absolute URIs there's still the possibility for allocation in the hash code computation routine, right?

For absolute URIs there's still the possibility for allocation in the hash code computation routine, right?

Yes, but only once. GetHashCode and Equals calls will cache the value in MoreInfo.RemoteUrl if/when they compute it.

I'd guess that this caching mechanism wasn't intended to work around any perceived slowness in string.GetHashCode as much as it was to work around the possible allocation in the below code path.

I don't know why it was not cached for the relative uri path, I assume it's because Info and MoreInfo are not used for anything by relative Uris otherwise.

I removed the caching of the hash.

src/libraries/System.Runtime/tests/System/Uri.MethodsTests.cs

GrabYourPitchforks · 2020-02-24T19:04:43Z

src/libraries/System.Private.Uri/src/System/Uri.cs

-                    chkString = GetParts(UriComponents.HttpRequestUrl, UriFormat.SafeUnescaped);
-                tempHash = CalculateCaseInsensitiveHashCode(chkString);
-                if (tempHash == 0)
+                MoreInfo moreInfo = EnsureUriInfo().MoreInfo ??= new MoreInfo();


@stephentoub - how far down the rabbit hole do we want to go here with regard to null coalescing syntax? This is a clever application but I admit it might affect maintainability of code since it involves null coalescing plus a possible assignment to a field on an object (not this) that's returned as part of a function call plus a dereference of that field, all in the same line.

I tried to not take it too far :)

With #33042, it becomes MoreInfo moreInfo = EnsureUriInfo().MoreInfo;

GrabYourPitchforks · 2020-02-24T19:07:38Z

I don't have concerns about using randomized hash code calculations here. The updated call sites look fairly straightforward. The NCL team should double-check the comparers used to make sure they're appropriate for the specific scenarios. (Thanks @MihaZupan for finding the discrepancy!)

MihaZupan · 2020-02-24T19:09:56Z

I will likely be opening a PR focused on the Equals ~~tomorrow~~ soon

Hash codes being equal doesn't mean that the values are equal

Optimize Uri.GetHashCode

6108fd7

MihaZupan added the area-System.Net label Feb 23, 2020

MihaZupan added this to the 5.0 milestone Feb 23, 2020

MihaZupan requested review from stephentoub and a team February 23, 2020 14:23

Don't use case-insensitive string hashing when not necessary

80c1951

Don't expect Uri.GetHashCode to always be case-insensitive in tests

4f15c66

GrabYourPitchforks requested changes Feb 24, 2020

View reviewed changes

MihaZupan added 3 commits February 24, 2020 22:52

Correct Uri.Equals test

eca5fd3

Hash codes being equal doesn't mean that the values are equal

Do not cache Uri GetHashCode

684854d

Simplify Uri.Equals test

450be78

MihaZupan requested a review from GrabYourPitchforks March 11, 2020 19:18

GrabYourPitchforks approved these changes Apr 28, 2020

View reviewed changes

MihaZupan merged commit b4b2c46 into dotnet:master Apr 29, 2020

MihaZupan mentioned this pull request May 4, 2020

Make Uri Thread-Safe #33042

Merged

ghost locked as resolved and limited conversation to collaborators Dec 10, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Optimize Uri.GetHashCode #32713

Optimize Uri.GetHashCode #32713

MihaZupan commented Feb 23, 2020

stephentoub commented Feb 23, 2020

stephentoub commented Feb 23, 2020

MihaZupan commented Feb 23, 2020

stephentoub commented Feb 23, 2020

MihaZupan commented Feb 23, 2020 •

edited

Loading

davidsh commented Feb 23, 2020

MihaZupan commented Feb 23, 2020 •

edited

Loading

MihaZupan commented Feb 23, 2020

scalablecory commented Feb 23, 2020

GrabYourPitchforks commented Feb 23, 2020

GrabYourPitchforks commented Feb 23, 2020

MihaZupan commented Feb 23, 2020 •

edited

Loading

MihaZupan commented Feb 23, 2020

GrabYourPitchforks Feb 24, 2020

MihaZupan Feb 24, 2020

GrabYourPitchforks Feb 24, 2020

MihaZupan Feb 24, 2020

MihaZupan Feb 24, 2020

MihaZupan Mar 2, 2020

GrabYourPitchforks Feb 24, 2020

MihaZupan Feb 24, 2020

MihaZupan Mar 11, 2020

GrabYourPitchforks commented Feb 24, 2020

MihaZupan commented Feb 24, 2020 •

edited

Loading

	internal static int CalculateCaseInsensitiveHashCode(string text)
	{
	return text.ToLowerInvariant().GetHashCode();
	}

Optimize Uri.GetHashCode #32713

Optimize Uri.GetHashCode #32713

Conversation

MihaZupan commented Feb 23, 2020

stephentoub commented Feb 23, 2020

stephentoub commented Feb 23, 2020

MihaZupan commented Feb 23, 2020

stephentoub commented Feb 23, 2020

MihaZupan commented Feb 23, 2020 • edited Loading

davidsh commented Feb 23, 2020

MihaZupan commented Feb 23, 2020 • edited Loading

MihaZupan commented Feb 23, 2020

scalablecory commented Feb 23, 2020

GrabYourPitchforks commented Feb 23, 2020

GrabYourPitchforks commented Feb 23, 2020

MihaZupan commented Feb 23, 2020 • edited Loading

MihaZupan commented Feb 23, 2020

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

GrabYourPitchforks commented Feb 24, 2020

MihaZupan commented Feb 24, 2020 • edited Loading

MihaZupan commented Feb 23, 2020 •

edited

Loading

MihaZupan commented Feb 23, 2020 •

edited

Loading

MihaZupan commented Feb 23, 2020 •

edited

Loading

MihaZupan commented Feb 24, 2020 •

edited

Loading