Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sets and maps with hash collision resistance #33

Open
AArnott opened this issue Oct 28, 2024 · 11 comments
Open

Sets and maps with hash collision resistance #33

AArnott opened this issue Oct 28, 2024 · 11 comments
Milestone

Comments

@AArnott
Copy link
Contributor

AArnott commented Oct 28, 2024

The means that these type shapes offer to construct a map or a set do not offer a way to provide a collision resistant hash function (in the form of a custom IEqualityComparer<TKey>. This is important when deserializing untrusted data.

@eiriktsarpalis
Copy link
Owner

eiriktsarpalis commented Oct 28, 2024

Do you have any suggestions? The shapes APIs abstract away concerns such as whether a collection is a hash table or a sorted collection or non-generic so it couldn't accept IEqualityComparer<TKey> in all cases. In my mind there are a few approaches to be pursued that don't require changing any abstractions:

  1. Rely on a collection type having built-in mitigations in the way that S.C.G.Dictionary or S.C.G.HashSet do.
  2. Use a specialized collection converter for types that are known to be susceptible to hash collision attacks, e.g.if (typeof(TCollection) == typeof(FooSet<TElement>)) return HashResilientFooSetConverter<TElement>(...);
  3. Buffer all entries in an intermediate set/dictionary that is collision resilient as they are being read before copying them to the actual deserialized value. It should result in more copying but the intermediate collections can be pooled. It also has the benefit of controlling duplicate handling rather than outsourcing that responsibility to the resultant type.

@AArnott
Copy link
Contributor Author

AArnott commented Oct 28, 2024

Rely on a collection type having built-in mitigations in the way that S.C.G.Dictionary or S.C.G.HashSet do.

They don't though. That's partly why MessagePack-CSharp has had 2 security advisories on hash collision non-resistance. These collections rely on object.GetHashCode() or its overrides, most or all of which are not collision resistant by default.

Buffer all entries in an intermediate set/dictionary that is collision resilient as they are being read before copying them to the actual deserialized value.

I don't see how that would help. If an intermediate collection has collision resistance, it'll be fine. But once you copy its contents into a non-collision resistant collection, the attack will have the same impact as if the intermediate collection were not used.

Do you have any suggestions?

I believe all (built-in) collections that use hash tables (e.g. HashSet<T>, Dictionary<T> and even Hashtable) take an IEqualityComparer<T> as a parameter to some of their constructor overloads. The immutable collections too (not by way of constructors, but they have APIs to set them). So as I imagine you have code to support constructing each or all of these already, can we have them select those constructors and pass in EqualityComparer<T>.Default by default but allow the ITypeShape caller to provide a custom one?

Along these lines too, some collections take a 'hint' as to their eventual size. For msgpack, we always know the final size up front, so it would be nice if a List<T> or any other such collection could be created taking that size into account for better deserialization perf.
So maybe (just brainstorming) a struct with int Size and IEqualityComparer<TKey> fields, can be fed into your delegate, which then uses (or drops) those values based on the collection type and whether it accepts such data?

@AArnott
Copy link
Contributor Author

AArnott commented Oct 28, 2024

Even given the API support, one challenge here is that not all key types will have a collision resistant implementation of IEqualityComparer<T> available to them. For MessagePack-CSharp we wrote a bunch of code to support as many by default as possible, and still in v3 when we turned on UntrustedData as the default mode it breaks folks.

@AArnott
Copy link
Contributor Author

AArnott commented Oct 28, 2024

not all key types will have a collision resistant implementation of IEqualityComparer<T> available to them

Actually, your TypeShape library might help us here. Anything we can effectively serialize can be securely hashed. And obviously if we are deserializing every key, we can also hash the serialized representation for low cost.

We'd need to special case a few types though, and allow the user to add to these, because the above risks violating the contract that equality implies a hash code match. There are data types (including a few primitives like double) which have two or more binary representations for what is considered an equal value. In our secure hash functions in MessagePack we have those we know about special cased to normalize those values before hashing.

@AArnott
Copy link
Contributor Author

AArnott commented Nov 5, 2024

@eiriktsarpalis An easier goal to reach in the short-term is to allow a program to defend itself against hash collision attacks.

Consider this data type:

[GenerateShape]
public partial class HashCollisionResistance : IEquatable<HashCollisionResistance>
{
    public HashCollisionResistance()
    {
        this.Dictionary = new Dictionary<string, string>(SipHashEqualityComparer<string>.Default);
    }

    public Dictionary<string, string> Dictionary { get; }
}

This class creates its own dictionary instance, pre-initialized with a hash-collision resistant equality comparer. The expectation then is that the deserializer will merely populate the collection rather than try to create the collection itself. In fact the lack of a property setter on the dictionary property enforces this.

In my first test of this technique with TypeShape, it failed to deserialize anything into that collection. I'm guessing this isn't TypeShape's fault, but rather the fault of my deserialization layer, so I'll look into that.

@AArnott
Copy link
Contributor Author

AArnott commented Nov 5, 2024

I was able to get it to work. AArnott/Nerdbank.MessagePack#29 adds support for read-only collection properties, thus enabling hash collision attack mitigation within user code.

This isn't a great end story, since users are very likely to not be aware of this requirement on them, or miss one place in their data types where they forget to apply the mitigation. So we should keep this issue active and look for a way to do this by default like MessagePack-CSharp does.

@AArnott
Copy link
Contributor Author

AArnott commented Nov 14, 2024

From #52 (comment)

Seems the best approach is to hardcode mitigations in converters for known hash table implementations like Dictionary, HashSet, etc.

That could work. I'd likely use reflection to search for a constructor that takes an IEqualityComparer<TKey> on the collection type though, which would likely fail with trimming. I could 'lock in' the known collection types with strong references and use the reflection approach as a fallback for other collection types, but if your source generator did this and created a delegate for it, it would guarantee to not be trimmed.

@eiriktsarpalis
Copy link
Owner

I'd likely use reflection to search for a constructor that takes an IEqualityComparer on the collection type though, which would likely fail with trimming.

The source generator could possibly help in that regard, e.g. by emitting [DynamicallyAccessedMembers] annotations for the type. It would probably negatively impact application size though.

In terms of what built-in support would look like, my expectation is that we would have to augment all collection construction delegates with an optional IEqualityComparer<T> parameter. This would be a no-op for non hashtable collections but would otherwise map to relevant constructor parameters (or static factory methods for the immutable collections). We'd need to probably also expose a flag on the shape types indicating whether the collection uses a hashtable. Special care should be taken for non-generic collections (specifically HashTable) which accept a non-generic IEqualityComparer: shapes for non-generic collections surface the key type as object, so the IEqualityComparer<object> must be adapted to IEqualityComparer.

We'd also need to adapt to whatever design C# uses for dictionary expressions. Support for custom comparers is still an open question over there.

@AArnott
Copy link
Contributor Author

AArnott commented Nov 14, 2024

The source generator could possibly help in that regard, e.g. by emitting [DynamicallyAccessedMembers] annotations for the type. It would probably negatively impact application size though.

I don't understand why this isn't a feasible strategy:

 private global::PolyType.Abstractions.ITypeShape<global::System.Collections.Generic.Dictionary<string, int>> Create_Dictionary_String_Int32()
 {
     return new global::PolyType.SourceGenModel.SourceGenDictionaryTypeShape<global::System.Collections.Generic.Dictionary<string, int>, string, int>
     {
         KeyType = String,
         ValueType = Int32,
         GetDictionaryFunc = static obj => obj,
         ConstructionStrategy = global::PolyType.Abstractions.CollectionConstructionStrategy.Mutable,
         DefaultConstructorFunc = static () => new global::System.Collections.Generic.Dictionary<string, int>(),
+        KeyedConstructorFunc = static (IEqualityComparer<string> eq) => new global::System.Collections.Generic.Dictionary<string, int>(eq),
         AddKeyValuePairFunc = static (ref global::System.Collections.Generic.Dictionary<string, int> dict, global::System.Collections.Generic.KeyValuePair<string, int> kvp) => dict[kvp.Key] = kvp.Value,
         EnumerableConstructorFunc = null,
         SpanConstructorFunc = null,
         Provider = this,
     };
 }

It wouldn't require reflection at runtime, or limiting what the linker can do. It could be left as null when such a constructor cannot be detected.

@eiriktsarpalis
Copy link
Owner

What about ImmutableDictionary/Set or FrozenDictionary/Set? These accept an equality comparer from factory methods so they need a separate type of constructor delegate.

@eiriktsarpalis
Copy link
Owner

Today we have three construction strategies for collections: mutable collections, collections with an IEnumerable ctor/factory, or collections with a span ctor/factory. All three would need to support IEqualityComparer separately.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants