Hashcode implementation proposal for DataClassificationSet #4933

Merged
merged 8 commits into from
Feb 15, 2024
@@ -83,7 +83,16 @@ public DataClassificationSet Union(DataClassificationSet other)
/// Gets a hash code for the current object instance.
/// </summary>
/// <returns>The hash code value.</returns>
public override int GetHashCode() => _classifications.GetHashCode();
public override int GetHashCode()
{
int hash = 0;
Member

Could you please make this code conditional on #if NETFRAMEWORK and then add separate code that uses this API on .NET Core?

https://learn.microsoft.com/en-us/dotnet/api/system.hashcode.add?view=net-8.0
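
For illustration, the split requested above might look roughly like the following. This is a minimal sketch, not the code that was merged: `ClassificationSetSketch` is a hypothetical stand-in that holds strings instead of `DataClassification` values. Note that `HashCode.Add` mixes values in call order, so the non-Framework branch assumes the set enumerates in a stable order.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for DataClassificationSet; only the hashing strategy matters here.
public sealed class ClassificationSetSketch
{
    private readonly HashSet<string> _classifications;

    public ClassificationSetSketch(IEnumerable<string> items)
    {
        _classifications = new HashSet<string>(items);
    }

    public override int GetHashCode()
    {
#if NETFRAMEWORK
        // System.HashCode is not part of the .NET Framework base class library,
        // so fold the item hashes together manually with XOR.
        int hash = 0;
        foreach (var item in _classifications)
        {
            hash ^= item.GetHashCode();
        }

        return hash;
#else
        // HashCode.Add is order-sensitive: this branch relies on the set
        // enumerating its items in the same order on every call.
        var hash = default(HashCode);
        foreach (var item in _classifications)
        {
            hash.Add(item);
        }

        return hash.ToHashCode();
#endif
    }
}
```
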

Member

I wonder if we should compute the hash code just once and store it in a field of the DataClassificationSet struct. I worry about the cost of repeated enumeration, and about the fact that enumeration of a HashSet is not guaranteed to return the sequence in the same order every time. Realistically, since the set is not mutated once initialized, it's highly likely that enumeration will always be consistent. But who knows, maybe somebody will change HashSet in some clever way in the future that breaks this.

Contributor Author

@damianhorna Feb 11, 2024

Thanks for the feedback!

About the first point - makes perfect sense, I'll rewrite it to use HashCode.Add.

About the second point - the fact that HashSet is not guaranteed to return the sequence in the same order should not be much of a problem as long as we use commutative operations (such as XOR) to calculate the hash code - correct? With those, we should be able to calculate the hash code in an order-independent way.

For performance reasons it would definitely make sense to compute the hash code just once, since the set is used as a key in a dictionary. For that we could either calculate the hash code lazily in the GetHashCode method or calculate it during object initialization (in the constructor). Because of thread-safety concerns, I would prefer to calculate it in the constructor.

I will send a commit that addresses this - please let me know what you think.
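
A minimal sketch of the approach described in this comment, assuming a hypothetical `CachedHashSetSketch` type that stores strings rather than `DataClassification` values (the merged code may differ). The hash is folded with XOR once in the constructor, so it does not depend on enumeration order and `GetHashCode` never re-enumerates the set.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for DataClassificationSet, used only to illustrate the idea.
public sealed class CachedHashSetSketch : IEquatable<CachedHashSetSketch>
{
    private readonly HashSet<string> _items;
    private readonly int _hash;

    public CachedHashSetSketch(IEnumerable<string> items)
    {
        _items = new HashSet<string>(items);

        // XOR is commutative and associative, so the order in which the
        // set enumerates its items cannot change the result.
        int hash = 0;
        foreach (var item in _items)
        {
            hash ^= item.GetHashCode();
        }

        // Computed once here; the readonly field is safe to read from any thread.
        _hash = hash;
    }

    public override int GetHashCode() => _hash;

    public bool Equals(CachedHashSetSketch other) =>
        !(other is null) && _items.SetEquals(other._items);

    public override bool Equals(object obj) => Equals(obj as CachedHashSetSketch);
}
```

Two sets built from the same items in a different order compare equal and report the same hash code, which is what a dictionary lookup requires. One trade-off worth keeping in mind is that XOR mixes bits weakly, so unrelated sets can collide more often than with an order-sensitive combiner.
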

foreach (var item in _classifications)
{
hash ^= item.GetHashCode();
}

return hash;
}

/// <summary>
/// Compares an object with the current instance to see if they contain the same classifications.
3 changes: 2 additions & 1 deletion test/Generators/Microsoft.Gen.Logging/Generated/Utils.cs
@@ -3,6 +3,7 @@

using System;
using System.Collections.Generic;
using Microsoft.Extensions.Compliance.Classification;
using Microsoft.Extensions.Compliance.Redaction;
using Microsoft.Extensions.Compliance.Testing;
using Microsoft.Extensions.DependencyInjection;
@@ -82,7 +83,7 @@ public static TestLogger GetLogger()
{
builder.SetRedactor<PlusRedactor>(new PublicDataAttribute().Classification);
builder.SetRedactor<MinusRedactor>(new PrivateDataAttribute().Classification);
builder.SetRedactor<HashRedactor>(new PrivateDataAttribute().Classification, new PublicDataAttribute().Classification);
builder.SetRedactor<HashRedactor>(new DataClassificationSet(new PrivateDataAttribute().Classification, new PublicDataAttribute().Classification));
builder.SetFallbackRedactor<StarRedactor>();
});

@@ -30,4 +30,22 @@ public static void Basic()
Assert.False(dc1.Equals(null));
#pragma warning restore CA1508 // Avoid dead conditional code
}

[Fact]
public static void TestHashCodes()
{
var dc1 = new DataClassificationSet(FakeTaxonomy.PublicData);
var dc2 = new DataClassificationSet(new[] { FakeTaxonomy.PublicData });
var dc3 = new DataClassificationSet(new List<DataClassification> { FakeTaxonomy.PublicData });
var dc4 = (DataClassificationSet)FakeTaxonomy.PublicData;
var dc5 = DataClassificationSet.FromDataClassification(FakeTaxonomy.PublicData);

Assert.Equal(dc1.GetHashCode(), dc2.GetHashCode());
Assert.Equal(dc1.GetHashCode(), dc3.GetHashCode());
Assert.Equal(dc1.GetHashCode(), dc4.GetHashCode());
Assert.Equal(dc1.GetHashCode(), dc5.GetHashCode());

var dc6 = dc1.Union(FakeTaxonomy.PrivateData);
Assert.NotEqual(dc1, dc6);
}
}
@@ -3,6 +3,7 @@

using System;
using Microsoft.Extensions.Compliance.Classification;
using Microsoft.Extensions.DependencyInjection;
using Xunit;

namespace Microsoft.Extensions.Compliance.Redaction.Test;
@@ -46,6 +47,46 @@ public void RedactorProvider_Returns_Redactor_For_Data_Classifications()
Assert.Equal(typeof(ErasingRedactor), r3.GetType());
}

[Fact]
public void RedactorProvider_Returns_Same_Redactor_For_Logically_Same_Data_Classification()
{
var dc1 = new DataClassificationSet(new DataClassification("DummyTaxonomy", "Classification"));
var dc2 = new DataClassificationSet(new DataClassification("DummyTaxonomy", "Classification2"));
var dc3 = new DataClassificationSet(new DataClassification("DummyTaxonomy", "Classification3"));
var dc4 = new DataClassificationSet(new DataClassification("DummyTaxonomy", "Classification4"));
var dc5 = new DataClassificationSet(new DataClassification("DummyTaxonomy", "Classification5"));
var dc6 = new DataClassificationSet(new DataClassification("DummyTaxonomy", "Classification6"));
var dc7 = new DataClassificationSet(new DataClassification("DummyTaxonomy", "Classification7"));
var dc8 = new DataClassificationSet(new DataClassification("DummyTaxonomy", "Classification8"));

var dc9 = new DataClassification("DummyTaxonomy", "Classification9");

var dc1LogicalCopy = new DataClassificationSet(new[] { new DataClassification("DummyTaxonomy", "Classification") });

var redactorProvider = new ServiceCollection()
.AddRedaction(redaction =>
{
redaction.SetRedactor<NullRedactor>(dc1);
redaction.SetRedactor<NullRedactor>(dc2);
redaction.SetRedactor<NullRedactor>(dc3);
redaction.SetRedactor<NullRedactor>(dc4);
redaction.SetRedactor<NullRedactor>(dc5);
redaction.SetRedactor<NullRedactor>(dc6);
redaction.SetRedactor<NullRedactor>(dc7);
redaction.SetRedactor<NullRedactor>(dc8);
})
.BuildServiceProvider()
.GetRequiredService<IRedactorProvider>();

var r1 = redactorProvider.GetRedactor(dc1);
var r2 = redactorProvider.GetRedactor(dc1LogicalCopy);
var r3 = redactorProvider.GetRedactor(dc9);

Assert.Equal(typeof(NullRedactor), r1.GetType());
Assert.Equal(typeof(NullRedactor), r2.GetType());
Assert.Equal(typeof(ErasingRedactor), r3.GetType());
}

[Fact]
public void RedactorProvider_Throws_On_Ctor_When_Options_Come_As_Null()
{