-
Notifications
You must be signed in to change notification settings - Fork 443
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix inconsistency in reporting device local memory between type and heap on macOS. #789
Conversation
_memoryProperties.memoryHeaps[1].flags = 0; | ||
_memoryProperties.memoryHeaps[1].flags = VK_MEMORY_HEAP_DEVICE_LOCAL_BIT; | ||
_memoryProperties.memoryTypes[2].heapIndex = 1; // Shared memory in the shared heap |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering if we should actually clear the VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT
from the memory type corresponding to MTLStorageModeShared
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought along those lines at first...but then I realized that it's shared memory we're talking about. It is device local as far as both Vulkan and Metal are concerned.
From an Apple doc on the matter:
In both [unified and discrete] memory models, a resource with a MTLStorageModeShared mode resides in system memory accessible to both the CPU and the GPU.
BTW...what was the original problem that the second heap was trying to solve? The original code seems to be trying to say that shared memory is not accessible to the GPU.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
vkd3d
checks how many heaps there are, and if at least one is not host-visible, to see if the device is a UMA device.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was always of the understanding that MTLStorageModeShared
was equivalent to, for example, cudaMallocHost
- memory that is device visible, but still on the host. That would suggest that it shouldn't be marked device local in Vulkan parlance, as that suggests a certain level of performance that device visible but host local memory doesn't have.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if at least one is not host-visible
Do you mean not GPU-visible? How do you test for the host-visibility of a heap?
At least one heap must include
VK_MEMORY_HEAP_DEVICE_LOCAL_BIT
inVkMemoryHeap::flags
. If there are multiple heaps that all have similar performance characteristics, they may all includeVK_MEMORY_HEAP_DEVICE_LOCAL_BIT
. In a unified memory architecture (UMA) system there is often only a single memory heap which is considered to be equally “local” to the host and to the device, and such an implementation must advertise the heap as device-local.
So a UMA device must have it's single heap listed as device-local...implying (to me at least) that the device local flags don't imply special fast device-only memory...just memory that the GPU can access.
However...the spec also states that...
Device memory is memory that is visible to the device
...which would imply that ALL heaps and memory types (which are collectively discussed under the Device Memory banner) are assumed to be device-visible...implying also that the requirement for device-local flags is almost redundant in the UMA case above.
...[sigh]...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
...[sigh]...
I've posted a question to the Vulkan Working Group about this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've updated this PR...and the logic should be right now.
Because this PR has been sitting open for a while, there is a bunch of other semi-related stuff in it now...but, if you are reviewing, the areas of interest for the memory logic are:
MVKPhysicalDevice::initMemoryProperties()
, which I've made much more procedural, organized, and traceable.- The value of
MVK_VK_MEMORY_TYPE_METAL_SHARED
no longer includesVK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT
orVK_MEMORY_PROPERTY_HOST_CACHED_BIT
...butVK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT
is added dynamically for unified memory. MVKPhysicalDevice::getHasUnifiedMemory()
adds a bit of additional logic to test for the low-power GPU and device count in the case theMTLDevice
does not respond tohasUnifiedMemory
.
… macOS. Remove VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT and VK_MEMORY_PROPERTY_HOST_CACHED_BIT from MVK_VK_MEMORY_TYPE_METAL_SHARED. Refactor MVKPhysicalDevice::initMemoryProperties() to configure memory types and heaps in a more organized, procedural, and traceable manner. Enhance MVKPhysicalDevice::getHasUnifiedMemory() detection of unified memory. Simplify MVKPhysicalDevice::getVRAMSize(). On macOS MVKImage::validateUseTexelBuffer() tests the underlying device memory directly to determine if it has coherent memory and cannot support texel buffer. Add MVKInstance::getPhysicalDeviceCount(). Add mvkClear() template function for clearing structs and replace use of memset(*, 0, sz) with mvkClear(). Refactor mvkCopyStruct() and rename to mvkCopy().
Update What's New document.
Change MVKPhysicalDeviceMetalFeatures::dynamicMTLBuffers from boolean to uint dynamicMTLBufferSize limit and guard dynamic buffer usage sizes against it. Update VK_MVK_MOLTENVK_SPEC_VERSION to 23.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant to say this before you merged, but it totally slipped my mind.
return [_mtlDevice respondsToSelector: @selector(hasUnifiedMemory)] && _mtlDevice.hasUnifiedMemory; | ||
return (([_mtlDevice respondsToSelector: @selector(hasUnifiedMemory)] && _mtlDevice.hasUnifiedMemory) | ||
|| _mtlDevice.isLowPower | ||
|| getInstance()->getPhysicalDeviceCount() == 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Won't that erroneously include Macs with one discrete card and no integrated graphics?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Won't that erroneously include Macs with one discrete card and no integrated graphics?
Hmmm...you might be right. It was intended to cover the case where there was only integrated graphics which might not be listed as low-power (I figured a Mac with only one integrated GPU might not list that one GPU as low-power...and I don't have a machine to test that on).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I figured a Mac with only one integrated GPU might not list that one GPU as low-power
Hmmm...looks like my assumption was wrong. From Apple's docs on isLowPower
...
This property returns YES for integrated GPUs and NO for discrete GPUs.
Pretty straightforward. I'll remove the single-GPU test...but leave the isLowPower
test...on the assumption that if hasUnifiedMemory
is not available...then isLowPower
is a valid indication of integrated, and therefore unified, memory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll remove the single-GPU test...but leave the isLowPower test...on the assumption that if hasUnifiedMemory is not available...then isLowPower is a valid indication of integrated, and therefore unified, memory.
Implemented in PR #795
Fixes issue #786.