Please provide API and intrinsics to unmanaged memory operations (volatile and atomic operations) to be on parity with managed memory operations #4209
Comments
💡 You have many cases of a parameter
True, thanks for the notice. It was a quick and dirty API example based on the existing managed API style, which uses ref addresses as arguments.
Thanks for the suggestion. Just tagging some folks who may be in a better position to comment. @ericeil who I think did some of our other atomic operations? @BruceForstall from the codegen team. @terrajobst API review?
Unique advantage of managed pointers (
Thus the existing volatile and atomic operations defined on managed pointers should work fine for your scenario. The code is not as straightforward as it could be with an unsafe helper method; unsafe constructs are deliberately not straightforward in C# and the .NET Core libraries, to encourage writing safe code. BTW: take a look at the proposed C# features Ref Returns and Locals and Array Slicing. If they materialize, they should allow writing more code that avoids copies and operates on unmanaged memory in safe C#. For the other part of your proposal, wrappers for the CLFLUSH, CLFLUSHOPT, CLWB and PCOMMIT instructions: they feel pretty specialized, with an unclear path to portability to non-Intel processors. I think they should start as an independent NuGet package. Would you like to start one? These processor instructions do not look particularly cheap, so there is no need for them to be JIT intrinsics, at least to start with.
Thanks for your help and the suggestion. I have written a small example in C# and it appears to work. It looks like the original issue comes from F# where the two types the
The other proposal was only an additional suggestion in case of an API review; I'll check later how it could be implemented.
Is there any way to add intrinsics as extensible plugins to the current JIT? If I remember correctly, the SIMD functionality works as a JIT extension and/or plugin. I would like to replace some calls to my own class methods with single instructions. For example, I would like to use a single PAUSE instruction as an intrinsic in a tight loop in a low-latency environment (use case: high-performance inter-thread messaging). It could be implemented via P/Invoke or executable memory with a delegate [1], but that comes with a huge amount of overhead compared to a single instruction. The other option is to search for the instruction's usage in the CoreCLR source and try to use it if possible. CoreCLR has the following define in the standalone GC sample:
#pragma intrinsic(_mm_pause)
#define YieldProcessor _mm_pause
I know my request is probably not a generic use case, but I would like to explore the available options.
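For comparison, the spin-wait pattern described above looks like this in native C11. This is a minimal sketch, assuming a GCC/Clang x86 target with SSE2, where `_mm_pause()` compiles to a single PAUSE instruction (the same intrinsic behind the YieldProcessor define); `spin_wait_for_flag` and `CPU_RELAX` are hypothetical names, not CoreCLR code, and the non-x86 fallback is only a compiler barrier.

```c
#include <stdatomic.h>
#include <stdbool.h>

#if defined(__SSE2__)
#include <immintrin.h>
#define CPU_RELAX() _mm_pause() /* a single PAUSE instruction */
#else
/* Hypothetical fallback for other targets: only prevents compiler reordering. */
#define CPU_RELAX() atomic_signal_fence(memory_order_seq_cst)
#endif

/* Spin until *flag becomes nonzero, giving up after max_spins iterations
   so the caller can fall back to a blocking wait. Returns true if the flag was seen. */
static bool spin_wait_for_flag(atomic_int *flag, long max_spins)
{
    for (long i = 0; i < max_spins; i++) {
        if (atomic_load_explicit(flag, memory_order_acquire) != 0)
            return true;
        CPU_RELAX(); /* hint to the core that this is a spin-wait loop */
    }
    return false;
}
```

The bounded loop is a design choice: PAUSE lowers power use and contention while spinning, but a waiter that spins forever still burns a core, so real messaging code usually escalates to a blocking wait after some number of spins.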
The unmanaged-pointer to managed-pointer conversions are unsafe casts. They should be close to a no-op in JITed code, assuming optimizations work as expected. The SIMD APIs are exposed via an independent NuGet package, but SIMD code generation is built into the JIT; it is not a pluggable extension. The JIT and CLR treat the SIMD NuGet package in a special way. Customizing code generation in the JIT is a tricky subject: it would make some of the inner workings of the JIT public and limit what kinds of changes can be made to the JIT in the future. I think customizing code generation may be a better fit for an AOT compiler, where the additional abstractions needed to expose the inner workings are less of a concern. cc @CarolEidt
As most of the SIMD and HW intrinsics are already implemented in CoreCLR (including PDEP/PEXT!), would you please reconsider also supporting the PAUSE intrinsic and the following explicit cache control intrinsics: CLFLUSH, CLFLUSHOPT, CLWB, CLZERO(?), SFENCE (PCOMMIT was deprecated by Intel)? Please note that persistent memory devices are already available on the market, and this is no longer Intel-only: AMD also supports all of them except CLWB in the Zen microarchitecture (https://support.amd.com/TechDocs/24594.pdf, pages 260, 138, 141, 143). The PAUSE instruction targets high-performance / low-latency inter-thread communication: lots of energy could be saved, at the cost of a small latency increase, if these tight communication loops could use a single PAUSE instruction instead of a bare busy-spin. The CLFLUSH, CLFLUSHOPT, CLWB, CLZERO(?) and SFENCE intrinsics would target persistent memory support from user space, e.g.: https://qconsf.com/sf2017/system/files/presentation-slides/rethink_nvm.pdf cc @fiigii
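As a rough native picture of the persistence use case, the sketch below writes back a byte range with CLFLUSH and then issues SFENCE. It assumes GCC/Clang on an SSE2-capable x86 target and 64-byte cache lines; `flush_range` is a hypothetical helper, and on real persistent memory the buffer would also have to come from a DAX mapping (not shown). CLFLUSH is used because every x86-64 CPU implements it; CLFLUSHOPT and CLWB, where available, are the cheaper variants.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <immintrin.h>
#endif

#define CACHE_LINE 64 /* assumption: 64-byte cache lines */

/* Write back every cache line that overlaps [addr, addr + len). */
static void flush_range(const void *addr, size_t len)
{
#if defined(__SSE2__)
    uintptr_t p = (uintptr_t)addr & ~(uintptr_t)(CACHE_LINE - 1); /* round down to a line start */
    uintptr_t end = (uintptr_t)addr + len;
    for (; p < end; p += CACHE_LINE)
        _mm_clflush((const void *)p); /* CLFLUSH: write back and invalidate the line */
    /* SFENCE: CLFLUSH is already ordered with later stores, but CLFLUSHOPT
       and CLWB are not, so the same shape works if those are swapped in. */
    _mm_sfence();
#else
    (void)addr; /* other architectures use different instructions; not shown */
    (void)len;
#endif
}
```

Flushing does not change the data, only where it lives; the point is that after the fence a later store (for example a "commit" flag) cannot become durable before the flushed range.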
Closing this. We've already exposed APIs that take As mentioned by Jan above, taking a pointer and creating a
Please provide volatile and atomic (interlocked, including increment/decrement/add) operation intrinsics not only for managed memory but also for unmanaged memory (e.g. memory-mapped files, GCHandle.Alloc*, memory regions provided by external libraries/APIs, etc.).
It is important to provide not only the API but also intrinsics: the JIT should replace these calls with single CPU instructions.
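For reference, C11 `<stdatomic.h>` already offers these operations on arbitrary native memory. This is a minimal sketch, assuming a C11 compiler with lock-free 64-bit atomics; the `unmanaged_*` names are hypothetical and only mimic the return conventions of .NET's Interlocked class (Increment/Decrement/Add return the new value, CompareExchange returns the original value). The non-`_explicit` forms used here are sequentially consistent. On x86-64 such calls typically compile to a single LOCK XADD or LOCK CMPXCHG instruction, which is the kind of code generation the request asks the JIT to produce.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical helpers operating on a 64-bit counter that lives in unmanaged
   memory (malloc, mmap, a device buffer...). Return values follow Interlocked. */
static int64_t unmanaged_increment(_Atomic int64_t *p)
{
    return atomic_fetch_add(p, 1) + 1; /* new value */
}

static int64_t unmanaged_decrement(_Atomic int64_t *p)
{
    return atomic_fetch_sub(p, 1) - 1; /* new value */
}

static int64_t unmanaged_add(_Atomic int64_t *p, int64_t value)
{
    return atomic_fetch_add(p, value) + value; /* new value */
}

/* Stores `value` only if *p == comparand; returns the value seen before the attempt. */
static int64_t unmanaged_compare_exchange(_Atomic int64_t *p, int64_t value, int64_t comparand)
{
    /* On failure, comparand is overwritten with the current value; on success it
       already equals the old value, so returning it covers both cases. */
    atomic_compare_exchange_strong(p, &comparand, value);
    return comparand;
}
```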
Today, when high-performance GPU devices, RDMA-capable network devices, NVMe devices and zero-copy APIs are available to everybody, the .NET virtual machine should provide support for these high-performance abstractions. These abstractions (hardware and/or libraries) provide user-space-accessible buffers, command packet queues (ring buffers) and variables for notification (doorbells). Sometimes the doorbell variable is just a volatile field in the command packet.
As far as I know, the usual implementation looks like this: the high-performance device and/or library provides one (or more) preallocated memory regions mapped into the user-space application, along with one (or more) command queues (ring buffers) and variables (doorbells) for notification. The application writes the data directly (zero copy) to the provided buffer, creates a command queue packet and writes to the doorbell variable (which could be a field in the new command packet).
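The publish/doorbell protocol described above can be sketched in C11 as a single-producer, single-consumer ring with a per-slot doorbell flag. This is an illustration under assumptions: the packet layout, `RING_SLOTS`, `PAYLOAD_SIZE` and the `ring_publish`/`ring_consume` names are hypothetical, and a real device defines its own layout and doorbell register. The important part is the ordering: the payload is written with plain stores and the doorbell with a release store, so a consumer that reads the doorbell with an acquire load is guaranteed to see the complete payload.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define RING_SLOTS   8   /* hypothetical; must be a power of two */
#define PAYLOAD_SIZE 56  /* hypothetical payload capacity per packet */

/* Hypothetical command packet: payload plus a doorbell field. */
struct packet {
    unsigned char payload[PAYLOAD_SIZE];
    uint32_t length;
    atomic_uint doorbell; /* 0 = slot free, 1 = packet published */
};

/* Single-producer/single-consumer ring living in one preallocated region. */
struct ring {
    struct packet slots[RING_SLOTS];
    uint32_t head; /* next slot to fill; touched only by the producer */
    uint32_t tail; /* next slot to drain; touched only by the consumer */
};

/* Producer: write the payload directly into the slot, then ring the doorbell. */
static bool ring_publish(struct ring *r, const void *data, uint32_t len)
{
    struct packet *p = &r->slots[r->head % RING_SLOTS];
    if (len > PAYLOAD_SIZE || atomic_load_explicit(&p->doorbell, memory_order_acquire) != 0)
        return false; /* payload too large, or consumer has not drained this slot */
    memcpy(p->payload, data, len);
    p->length = len;
    /* Release store: payload and length become visible before the doorbell does. */
    atomic_store_explicit(&p->doorbell, 1u, memory_order_release);
    r->head++;
    return true;
}

/* Consumer: check the doorbell with an acquire load, copy the packet out, free the slot. */
static bool ring_consume(struct ring *r, void *out, uint32_t *len)
{
    struct packet *p = &r->slots[r->tail % RING_SLOTS];
    if (atomic_load_explicit(&p->doorbell, memory_order_acquire) == 0)
        return false; /* nothing published yet */
    memcpy(out, p->payload, p->length);
    *len = p->length;
    /* Release store: reads of the slot finish before the producer may reuse it. */
    atomic_store_explicit(&p->doorbell, 0u, memory_order_release);
    r->tail++;
    return true;
}
```

In .NET terms, the release store and acquire load are what Volatile.Write and Volatile.Read provide for managed locations, which is why the request asks for the same operations on unmanaged pointers.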
High-performance applications also use similar zero-copy abstractions (memory-mapped files as buffers, command queues and variables) for communication between threads and/or processes.
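A minimal POSIX sketch of the inter-process case, assuming Linux or another system with `MAP_ANONYMOUS` and `fork()`: a shared anonymous mapping stands in for a memory-mapped file (a file-backed `MAP_SHARED` mapping behaves the same way for this purpose), and two processes update a counter in it atomically. `shared_counter_demo` is a hypothetical name. The sketch also assumes `atomic_long` is lock-free, since only lock-free atomics are expected to be address-free and therefore usable across processes.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS on glibc */
#include <stdatomic.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

/* Parent and child each add 1 to a shared counter n times.
   Returns the final counter value, or -1 on error. */
static long shared_counter_demo(int n)
{
    atomic_long *counter = mmap(NULL, sizeof *counter, PROT_READ | PROT_WRITE,
                                MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (counter == MAP_FAILED)
        return -1;
    atomic_init(counter, 0);

    pid_t pid = fork();
    if (pid < 0) {
        munmap(counter, sizeof *counter);
        return -1;
    }
    for (int i = 0; i < n; i++)
        atomic_fetch_add(counter, 1); /* both processes hit the same physical memory */
    if (pid == 0)
        _exit(0); /* child is done */

    waitpid(pid, NULL, 0);
    long result = atomic_load(counter);
    munmap(counter, sizeof *counter);
    return result;
}
```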
However, accessing these high-performance abstractions from .NET is currently limited because of the limited support for unmanaged memory operations. Java also provides these unmanaged memory operations via Unsafe, which may become official later.
Volatile Read:
Volatile Write:
Atomic Operations:
It would be good to have some kind of typed intrinsics API for volatile read/write and atomic operations, including increment/decrement/add. (The "typed" native pointer IntPtr<'T> idea is based on the F# NativePtr<'T>.)
API Example:
UnmanagedVolatile class (Read operations):
UnmanagedVolatile class (Write operations):
UnmanagedInterlocked class:
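For comparison, C11 exposes the volatile read/write half of this proposal as type-generic operations on raw pointers. This is a minimal sketch, assuming C11 `<stdatomic.h>`; the `UNMANAGED_VOLATILE_*` macro names are hypothetical and only mirror the proposed UnmanagedVolatile class. They use acquire/release ordering, which matches the semantics documented for .NET's Volatile.Read and Volatile.Write.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Volatile read: acquire load; later memory operations cannot move before it. */
#define UNMANAGED_VOLATILE_READ(p) atomic_load_explicit((p), memory_order_acquire)

/* Volatile write: release store; earlier memory operations cannot move after it. */
#define UNMANAGED_VOLATILE_WRITE(p, v) atomic_store_explicit((p), (v), memory_order_release)
```

Because the element type travels with the pointer type (`_Atomic int32_t *`, `_Atomic int64_t *`, ...), one macro pair covers every atomic scalar, which is roughly what the typed IntPtr<'T> idea aims for.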
However, these operations require proper memory alignment, which may require some extended (aligned) allocation operations on the .NET side too. Currently there is no proper way to perform an aligned allocation in .NET that stays aligned even if the GC moves the data around. Some more or less usable hacks are available, but I am afraid these are far from a proper solution [2] [3] [4]. Please note that proper SIMD support will have a similar memory alignment requirement.
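On the native side the alignment problem is simpler because native memory never moves. This is a minimal sketch, assuming a C11 library that provides `aligned_alloc` (MSVC uses `_aligned_malloc` instead); `alloc_aligned` and `is_aligned` are hypothetical helpers. Aligning a buffer to 64 bytes (a typical cache-line size) satisfies the natural-alignment requirement of atomic and SIMD accesses and lets each slot start on its own cache line.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Allocate `size` bytes aligned to `alignment` (a power of two, e.g. 64).
   C11 aligned_alloc expects size to be a multiple of alignment, so round up.
   Release with free(); the block never moves, so it stays aligned. */
static void *alloc_aligned(size_t alignment, size_t size)
{
    size_t rounded = (size + alignment - 1) & ~(alignment - 1);
    return aligned_alloc(alignment, rounded);
}

/* True if p is a multiple of `alignment` (a power of two). */
static bool is_aligned(const void *p, size_t alignment)
{
    return ((uintptr_t)p & (alignment - 1)) == 0;
}
```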
Please also consider additional memory operation intrinsics to support persistent memory architectures, which require explicit control over when stores reach a consistent, durable state. The following new instructions are available to support persistent memory on x86: CLFLUSH, CLFLUSHOPT, CLWB, PCOMMIT [1].
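Because these instructions are optional extensions (the portability concern raised in the thread), native code usually checks for them at run time. This sketch assumes GCC or Clang on x86 and their `<cpuid.h>` helpers; `detect_flush_support` is a hypothetical name. The bit positions come from the CPUID documentation: CLFLUSH is leaf 1 EDX bit 19, CLFLUSHOPT and CLWB are leaf 7 (subleaf 0) EBX bits 23 and 24. PCOMMIT is not checked because it was deprecated.

```c
#include <stdbool.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

struct flush_support {
    bool clflush;
    bool clflushopt;
    bool clwb;
};

/* Query CPUID for the cache write-back instructions; all false on non-x86. */
static struct flush_support detect_flush_support(void)
{
    struct flush_support s = { false, false, false };
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        s.clflush = (edx >> 19) & 1;      /* CPUID.01H:EDX[19] */
    if (__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx)) {
        s.clflushopt = (ebx >> 23) & 1;   /* CPUID.(EAX=07H,ECX=0):EBX[23] */
        s.clwb = (ebx >> 24) & 1;         /* CPUID.(EAX=07H,ECX=0):EBX[24] */
    }
#endif
    return s;
}
```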
Update: added some syntax highlighting because the original format did not show the typed IntPtr<'T>. Added a note about the memory alignment requirement.
Best Regards,
Zoltan
[1] https://software.intel.com/sites/default/files/managed/0d/53/319433-022.pdf
[2] https://stackoverflow.com/questions/1951290/memory-alignment-of-classes-in-c
[3] https://stackoverflow.com/questions/13413323/allocate-memory-with-16-byte-alignment
[4] https://stackoverflow.com/questions/10239659/is-there-a-way-to-new-a-net-object-aligned-to-64-bytes