[API Proposal]: Uuid v5 and v7 implementations into System.Guid #88290
Please note that SQL Server needs a different ordering for `uniqueidentifier`. SQL Server-style sequential GUIDs work better there than UUIDv7; for an example generator, see https://github.com/nhibernate/nhibernate-core/blob/master/src/NHibernate/Id/GuidCombGenerator.cs. A quick benchmark with 100k inserted records shows 98% fragmentation for UUIDv7 vs 44% for sequential GUIDs. |
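To make the ordering difference concrete: SQL Server compares `uniqueidentifier` values starting with the last six bytes, so a time-ordered value for SQL Server has to carry its timestamp there rather than in the leading bytes, where UUIDv7 puts it. The following is a simplified, hypothetical COMB-style sketch (`CreateSqlServerSequential` is an illustrative name, and it uses a Unix millisecond timestamp rather than the linked generator's exact encoding). It uses `System.Data.SqlTypes.SqlGuid`, which applies SQL Server's comparison order, to check the result.

```csharp
using System;
using System.Data.SqlTypes;
using System.Security.Cryptography;

// Hypothetical COMB-style generator (illustrative, not NHibernate's exact algorithm):
// 10 random bytes followed by a 48-bit Unix-millisecond timestamp in bytes 10..15.
// No UUID version/variant bits are set; this is not an RFC 9562 UUID.
static Guid CreateSqlServerSequential(DateTimeOffset timestamp)
{
    Span<byte> bytes = stackalloc byte[16];
    RandomNumberGenerator.Fill(bytes);

    long ms = timestamp.ToUnixTimeMilliseconds();
    for (int i = 0; i < 6; i++)
    {
        // Most significant timestamp byte lands in byte 10, least significant in byte 15.
        bytes[10 + i] = (byte)(ms >> (8 * (5 - i)));
    }

    // This constructor reads the same layout ToByteArray() produces; bytes 8..15 are
    // taken as-is, so the timestamp stays where SQL Server looks first.
    return new Guid(bytes);
}

DateTimeOffset t0 = DateTimeOffset.FromUnixTimeMilliseconds(1_700_000_000_000);
Guid earlier = CreateSqlServerSequential(t0);
Guid later = CreateSqlServerSequential(t0.AddMilliseconds(1));

// SqlGuid compares in SQL Server's order (bytes 10..15 first), so this is negative.
Console.WriteLine(new SqlGuid(earlier).CompareTo(new SqlGuid(later)));
```

Under `System.Guid.CompareTo`, the same two values would compare essentially at random, because the leading bytes are random. That is the mismatch between the two orderings.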
Hi Robbie, yes, but that's SQL Server specific, which is why I didn't mention it. UUIDv7 is still only a proposal; hopefully it can land as an official format. |
The container proposed in this API Proposal could have been an excellent data type for storing Uuid v5 and v7, as well as any other versions, but it was closed. The alternative proposed option, in my opinion, does not solve the problems described here. |
Tagging subscribers to this area: @dotnet/area-system-runtime

Issue Details

Background and motivation

With the growth of distributed storage and the extensive use of UUIDs in databases, UUIDv4 insert times grow linearly, which makes UUIDv4 inefficient as a primary key.

API Proposal

```csharp
namespace System;

public readonly partial struct Guid
{
    public static Guid NewUuid5()
    {
        ...
    }

    public static Guid NewUuid7()
    {
        ...
    }
}
```

API Usage

```csharp
Guid uuid5 = Guid.NewUuid5();
Guid uuid7 = Guid.NewUuid7();
```

Alternative Designs

No response

Risks

No response
|
UUID v5 and v7 have to do with how new values are constructed. It has little to do with the storage mechanism of the underlying bits. Once a correct sequence of bits has been generated, it can then be serialized as required by the consumer. For some scenarios, this will be
It should probably be called
And as per the existing
There will probably be some discussion as to whether they should be called Exposing Exposing There may likewise need to be discussion on if the methods need any parameterization.
Also noting, with regards to the parameterization, the UUID spec covers:
|
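A sketch of the point that a version is about how the bits are constructed: a UUIDv7, in network (big-endian) byte order, is a 48-bit Unix millisecond timestamp, a 4-bit version (`0111`), and 74 random bits interrupted by the 2-bit variant (`10`). `CreateUuidV7` is a hypothetical helper name, not a proposed API, and the code assumes .NET 8 or later for the `Guid(ReadOnlySpan<byte>, bool bigEndian)` constructor.

```csharp
using System;
using System.Security.Cryptography;

// Hypothetical helper: builds the RFC 9562 UUIDv7 bit layout in big-endian byte order.
static Guid CreateUuidV7(DateTimeOffset timestamp)
{
    Span<byte> bytes = stackalloc byte[16];
    RandomNumberGenerator.Fill(bytes); // rand_a and rand_b start out random

    // Octets 0..5: 48-bit big-endian Unix timestamp in milliseconds.
    long ms = timestamp.ToUnixTimeMilliseconds();
    for (int i = 0; i < 6; i++)
    {
        bytes[i] = (byte)(ms >> (8 * (5 - i)));
    }

    bytes[6] = (byte)((bytes[6] & 0x0F) | 0x70); // high nibble of octet 6 = version 7
    bytes[8] = (byte)((bytes[8] & 0x3F) | 0x80); // top two bits of octet 8 = variant 10

    // Interpret the 16 octets in RFC (big-endian) order. How they are serialized
    // afterwards is a separate choice made by the consumer.
    return new Guid(bytes, bigEndian: true);
}

Guid id = CreateUuidV7(DateTimeOffset.FromUnixTimeMilliseconds(0x0123_4567_89AB));
Console.WriteLine(id); // begins with "01234567-89ab-7"
```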
Does this affect what it means to serialize as big vs little endian? The six bytes comprising that value won't be reversed on their own and will thus change where they are in the resulting byte order. |
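The effect in question can be seen directly with the .NET 8 `ToByteArray(bool bigEndian)` overload. Only the first three fields (4 + 2 + 2 bytes) are byte-swapped in the default little-endian form, and the last eight bytes are identical. Because a UUIDv7 timestamp occupies the first six bytes, the little-endian form splits it across two independently reversed fields. The value below is an arbitrary example.

```csharp
using System;

// Arbitrary example value laid out like a UUIDv7 (version nibble 7, variant 10).
Guid g = Guid.Parse("01234567-89ab-7cde-8f01-23456789abcd");

byte[] bigEndian = g.ToByteArray(bigEndian: true); // RFC 9562 / network order
byte[] littleEndian = g.ToByteArray();              // historical .NET/Windows order

Console.WriteLine(Convert.ToHexString(bigEndian));    // 0123456789AB7CDE8F0123456789ABCD
Console.WriteLine(Convert.ToHexString(littleEndian)); // 67452301AB89DE7C8F0123456789ABCD
```

In the second line the timestamp bytes `01 23 45 67 89 AB` appear as `67 45 23 01 AB 89`, while `8F01...ABCD` is unchanged.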
The actual definition of
And the general layout description in the spec has been simplified to:
Thus, the latest draft spec defaults to The So we'd serialize it correctly if the user specifies
|
To answer, I'll quote the proposal for the new UUID versions. So it's not just me saying that; it is the stated reason for having pseudo-sequential generation of random IDs.
|
If you are using UUIDs for the purpose of DB indexing, I would recommend https://github.com/mareek/UUIDNext. It uses v7 for most DBs and a custom little-endian v8 format for MSSQL. The only thing missing is Guid type support for formatting little-endian UUIDs, which we will hopefully get soon.
Am I missing something? The proposal (with the bigEndian boolean) completely solves the issue; it allows us to add APIs mentioned in the original post like |
I'm not concerned about whether we'd roundtrip it well; we could output any order we wanted and as long as we used the same order for deserialization it would roundtrip :) My question is really:
Or asked another way, do the new APIs we've approved make sense to use with Guids created from the APIs proposed in this issue? |
Yes. For That is, the There are then two well-defined layouts:
It would not make sense for us to define a new format where we have some |
As for database index performance, measuring inserts alone does not give the full picture. Here is a test harness designed specifically for this problem: |
UUIDv7 is a very novel concept; all currently existing implementations use the OSF DCE-compatible layout (i.e. big-endian), and specifically for

Note that RFC 4122 and recent IETF drafts only describe

First of all,

```
{00000000-0000-0000-C000-000000000046} // IUnknown
{00000001-0000-0000-C000-000000000046} // IClassFactory
{00000112-0000-0000-C000-000000000046} // IOleObject
{0000031b-0000-0000-C000-000000000046} // CLSID_ErrorObject
```

Note that the

```
{B196B28F-BAB4-101A-B69C-00AA00341D07} // IClassFactory2
{99FCFEC4-5260-101B-BBCB-00AA0021347A} // IID_IObjectExporter
{4D9F4AB8-7D1C-11CF-861E-0020AF6E7C57} // IID_IActivation
```

Note that the

```
{4193A62E-F128-410D-9746-E57B5485903D} // Windows.UI.Internal.Input.IMouseCapture
{C0EFA91A-EEB7-41C7-97FA-F0ED645EFB24} // MsRDP.MsRDP.10
{214A1A2F-232B-4FEF-93B7-2F7D9AF053AD} // produced by System.Guid.NewGuid()
```

Note that the

So, considering the existence of such an "MSFT flavor", I think input from the Windows dev team is desirable here. If
|
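The three groups of GUIDs quoted in the comment above differ in their variant bits (top bits of octet 8) and version nibble (high nibble of octet 6), both read in RFC byte order. A minimal decoder makes this visible. `Describe` is a hypothetical helper, the code assumes .NET 8+ for `ToByteArray(bool)`, and the version nibble is only meaningful for the RFC 4122/9562 variant.

```csharp
using System;

// Hypothetical helper: reads variant and version from the RFC (big-endian) byte order.
static (string Variant, int Version) Describe(Guid g)
{
    byte[] b = g.ToByteArray(bigEndian: true);
    int top3 = b[8] >> 5; // top three bits of octet 8
    string variant = (top3 >> 2) == 0 ? "NCS"                  // 0xx
        : (top3 >> 1) == 0b10 ? "RFC 4122/9562"                // 10x
        : top3 == 0b110 ? "Microsoft (COM)"                    // 110
        : "Reserved";                                          // 111
    int version = b[6] >> 4; // high nibble of octet 6
    return (variant, version);
}

foreach (string s in new[]
{
    "{00000000-0000-0000-C000-000000000046}", // IUnknown
    "{B196B28F-BAB4-101A-B69C-00AA00341D07}", // IClassFactory2
    "{214A1A2F-232B-4FEF-93B7-2F7D9AF053AD}", // from Guid.NewGuid()
})
{
    var (variant, version) = Describe(Guid.Parse(s));
    Console.WriteLine($"{s}: variant={variant}, version={version}");
}
```

The first line reports the Microsoft variant (`C0`), the second the RFC variant with version 1, and the third the RFC variant with version 4.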
The internal field layout used by a given runtime (for it's .NET's internal field layout is, due to historical reasons, compatible with the Microsoft `GUID` definition:

```c
typedef struct _GUID {
    unsigned long  Data1;
    unsigned short Data2;
    unsigned short Data3;
    unsigned char  Data4[8];
} GUID;
```

This definition has not fundamentally changed since its original introduction, and it's not going to change in the future either, because doing so would be massively breaking to the entire Windows ecosystem. Since these are just regular fields, the byte order observed via unsafe code matches the endianness of the host system. Since Windows does not currently support any big-endian systems, the layout in practice always appears to users as "little endian", and discussions often simplify it down to this, even if it's not "technically correct". This also means that the layout of these bytes is "functionally fixed". Given a The
This proposal is simply asking for additional functions which would generate
The general discussion was being somewhat simplified as it basically boils down to, in practice, that Windows uses "little endian" and many other (but not all) systems use "big endian". In both cases, this is regardless of the variant/version. Ultimately, users who need to serialize a There is no need to discuss hypothetical alternative layouts which are extremely unlikely to be encountered in practice and the APIs being exposed are sufficient for what is effectively the two de-facto layouts users will actually encounter. |
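The claim that the in-memory bytes are just the `Data1`/`Data2`/`Data3`/`Data4` fields in host byte order can be checked by reinterpreting a `Guid` as raw bytes with `MemoryMarshal` (no `unsafe` keyword needed) and comparing the result with `ToByteArray()`. The value is an arbitrary example, and the expected hex in the comments assumes a little-endian host.

```csharp
using System;
using System.Runtime.InteropServices;

Guid value = Guid.Parse("01234567-89ab-cdef-0123-456789abcdef");

// View the struct's storage (Data1, Data2, Data3, Data4[8]) as 16 raw bytes.
ReadOnlySpan<byte> raw = MemoryMarshal.AsBytes(MemoryMarshal.CreateReadOnlySpan(ref value, 1));
string rawHex = Convert.ToHexString(raw);
string defaultHex = Convert.ToHexString(value.ToByteArray());

Console.WriteLine(BitConverter.IsLittleEndian); // True on typical x64/Arm64 hosts
Console.WriteLine(rawHex);                       // 67452301AB89EFCD0123456789ABCDEF on little-endian hosts
Console.WriteLine(rawHex == defaultHex);         // True on little-endian hosts
```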
I'm interested specifically in |
Endianness makes no difference as to the value. Just as the 32-bit unsigned integer The same is true that |
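To illustrate the point with both an integer and a `Guid`: the bytes differ between the two byte orders, but reading each back with the order it was written in yields the same value. This uses `BinaryPrimitives` for the integer and the .NET 8 `bigEndian` overloads for `Guid`.

```csharp
using System;
using System.Buffers.Binary;

uint number = 0x01020304;
Span<byte> le = stackalloc byte[4];
Span<byte> be = stackalloc byte[4];
BinaryPrimitives.WriteUInt32LittleEndian(le, number);
BinaryPrimitives.WriteUInt32BigEndian(be, number);

Console.WriteLine(Convert.ToHexString(le)); // 04030201
Console.WriteLine(Convert.ToHexString(be)); // 01020304

// Different bytes, same value, as long as each is read back with its own order.
uint fromLe = BinaryPrimitives.ReadUInt32LittleEndian(le);
uint fromBe = BinaryPrimitives.ReadUInt32BigEndian(be);

// The same holds for Guid: both layouts round-trip to an equal value.
Guid guid = Guid.NewGuid();
Guid viaLe = new Guid(guid.ToByteArray());
Guid viaBe = new Guid(guid.ToByteArray(bigEndian: true), bigEndian: true);

Console.WriteLine(fromLe == number && fromBe == number); // True
Console.WriteLine(viaLe == guid && viaBe == guid);       // True
```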
In the proposal: public static Guid NewUuid5() |
Yes, you're right, it is namespace based - so the signature would be different. |
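Since v5 is name-based, a generator needs at least a namespace `Guid` and a name. A minimal sketch of the RFC 9562 algorithm hashes the namespace (in big-endian byte order) followed by the name bytes with SHA-1, keeps the first 16 bytes, and overwrites the version and variant bits. `CreateVersion5` and its `(Guid, string)` signature are hypothetical. UTF-8 encoding of the name is an assumption, chosen because it matches Python's `uuid.uuid5`. The code requires .NET 8+ for the `bigEndian` overloads.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical signature; the real API shape was still under discussion in this thread.
static Guid CreateVersion5(Guid namespaceId, string name)
{
    byte[] nameBytes = Encoding.UTF8.GetBytes(name);
    byte[] input = new byte[16 + nameBytes.Length];

    // The namespace ID is hashed in network (big-endian) byte order, then the name.
    namespaceId.TryWriteBytes(input.AsSpan(0, 16), bigEndian: true, out _);
    nameBytes.CopyTo(input, 16);

    byte[] hash = SHA1.HashData(input);          // 20-byte digest; only the first 16 are used
    hash[6] = (byte)((hash[6] & 0x0F) | 0x50);   // version 5
    hash[8] = (byte)((hash[8] & 0x3F) | 0x80);   // variant 10

    return new Guid(hash.AsSpan(0, 16), bigEndian: true);
}

// The DNS namespace ID defined by RFC 4122 / RFC 9562.
Guid dnsNamespace = Guid.Parse("6ba7b810-9dad-11d1-80b4-00c04fd430c8");
Console.WriteLine(CreateVersion5(dnsNamespace, "python.org"));
```

With the DNS namespace and the name `python.org`, this should produce `886313e1-3b8a-5372-9b90-0c9aee199e5d`, the value shown for `uuid.uuid5(uuid.NAMESPACE_DNS, 'python.org')` in the Python documentation. That makes it a convenient cross-check against another implementation.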
According to this, there is no standard way to `CompareTo`/order a Guid: |
That’s referring to the across all GUID types. The .NET System.Guid type has long had a standardized sorting behavior and matches how I detailed it above. |
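A small check of `System.Guid`'s ordering: `CompareTo` compares `Data1` (as unsigned), then `Data2`, `Data3`, and the remaining bytes in order. The result agrees with a lexicographic comparison of the big-endian bytes, but not with a comparison of `ToByteArray()`'s default layout. This is also why a UUIDv7, whose timestamp sits in the leading bytes, sorts by time under `System.Guid` ordering. The two values are arbitrary examples, and the code assumes .NET 8+ for `ToByteArray(bool)`.

```csharp
using System;

Guid a = Guid.Parse("00000001-0000-0000-0000-000000000000");
Guid b = Guid.Parse("00000100-0000-0000-0000-000000000000");

int byGuid = a.CompareTo(b); // System.Guid ordering
int byBigEndian = ((ReadOnlySpan<byte>)a.ToByteArray(bigEndian: true))
    .SequenceCompareTo(b.ToByteArray(bigEndian: true));
int byLittleEndian = ((ReadOnlySpan<byte>)a.ToByteArray())
    .SequenceCompareTo(b.ToByteArray());

Console.WriteLine($"{Math.Sign(byGuid)} {Math.Sign(byBigEndian)} {Math.Sign(byLittleEndian)}"); // -1 -1 1
```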
The new UUID specification was published a few days ago as RFC 9562: https://datatracker.ietf.org/doc/rfc9562/ Any chance this could now be prioritized? |
Looking forward to this as well! In the meantime, my praise goes to @tannergooding for his thorough and insightful posts in this thread. |
I expect this won't happen in .NET 9; it's a bit late in the cycle, with about 1-2 months before the time we normally snap for RC1. There's a lot of other work across the stack that is already the focus and a priority, but this can potentially be included for .NET 10. In order for that to happen (or to even attempt getting it in for .NET 9 if there are community members willing to do the work), the proposed surface needs to be updated to account for some of the feedback given here. Namely we should ensure that signatures exist that make sense given the existing APIs on For example, |
I have created a library for Ulid (890 stars) that has similar characteristics to UUIDv7. While its characteristics are desirable, the fact that it is a custom type (struct Ulid) has caused significant frustration because the .NET ecosystem and databases require GUIDs. |
I opened #103658 which covers this for UUIDv7 and opens the path to also do it for UUIDv5 in the future if that's desired |
Background and motivation
With the growth of distributed storage and the extensive use of UUIDs in databases, UUIDv4 insert times grow linearly, which makes UUIDv4 inefficient as a primary key.
The newer versions have been shown to have less impact on indexes than v4.
A first benchmark with UUIDv7 on my 8-core machine with Docker/PostgreSQL 15 shows:
- Inserting 1 million UUIDv4 values with EF Core takes 8.5 seconds into an empty table, and 16 seconds into a table with 12,750,000 rows (the table has only an id column).
- Inserting 1 million UUIDv7 values with EF Core takes 8.5 seconds into an empty table, and 9 seconds into a table with 12,750,000 rows (the table has only an id column).
API Proposal
API Usage
Alternative Designs
No response
Risks
No response