Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix inconsistency in reporting device local memory between type and heap on macOS. #789

Merged
merged 6 commits into from
Dec 16, 2019

Conversation

billhollings
Copy link
Contributor

Fixes issue #786.

Comment on lines 1847 to 1848
_memoryProperties.memoryHeaps[1].flags = 0;
_memoryProperties.memoryHeaps[1].flags = VK_MEMORY_HEAP_DEVICE_LOCAL_BIT;
_memoryProperties.memoryTypes[2].heapIndex = 1; // Shared memory in the shared heap
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if we should actually clear the VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT from the memory type corresponding to MTLStorageModeShared.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cdavis5e

I thought along those lines at first...but then I realized that it's shared memory we're talking about. It is device local as far as both Vulkan and Metal are concerned.

From an Apple doc on the matter:

In both [unified and discrete] memory models, a resource with a MTLStorageModeShared mode resides in system memory accessible to both the CPU and the GPU.

BTW...what was the original problem that the second heap was trying to solve? The original code seems to be trying to say that shared memory is not accessible to the GPU.

Copy link
Collaborator

@cdavis5e cdavis5e Dec 6, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

vkd3d checks how many heaps there are, and if at least one is not host-visible, to see if the device is a UMA device.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was always of the understanding that MTLStorageModeShared was equivalent to, for example, cudaMallocHost - memory that is device visible, but still on the host. That would suggest that it shouldn't be marked device local in Vulkan parlance, as that suggests a certain level of performance that device visible but host local memory doesn't have.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cdavis5e

if at least one is not host-visible

Do you mean not GPU-visible? How do you test for the host-visibility of a heap?

@mbarriault

At least one heap must include VK_MEMORY_HEAP_DEVICE_LOCAL_BIT in VkMemoryHeap::flags. If there are multiple heaps that all have similar performance characteristics, they may all include VK_MEMORY_HEAP_DEVICE_LOCAL_BIT. In a unified memory architecture (UMA) system there is often only a single memory heap which is considered to be equally “local” to the host and to the device, and such an implementation must advertise the heap as device-local.

So a UMA device must have it's single heap listed as device-local...implying (to me at least) that the device local flags don't imply special fast device-only memory...just memory that the GPU can access.

However...the spec also states that...

Device memory is memory that is visible to the device 

...which would imply that ALL heaps and memory types (which are collectively discussed under the Device Memory banner) are assumed to be device-visible...implying also that the requirement for device-local flags is almost redundant in the UMA case above.

...[sigh]...

Copy link
Contributor Author

@billhollings billhollings Dec 9, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...[sigh]...

I've posted a question to the Vulkan Working Group about this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cdavis5e @mbarriault

I've updated this PR...and the logic should be right now.

Because this PR has been sitting open for a while, there is a bunch of other semi-related stuff in it now...but, if you are reviewing, the areas of interest for the memory logic are:

  • MVKPhysicalDevice::initMemoryProperties(), which I've made much more procedural, organized, and traceable.
  • The value of MVK_VK_MEMORY_TYPE_METAL_SHARED no longer includes VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT or VK_MEMORY_PROPERTY_HOST_CACHED_BIT...but VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT is added dynamically for unified memory.
  • MVKPhysicalDevice::getHasUnifiedMemory() adds a bit of additional logic to test for the low-power GPU and device count in the case the MTLDevice does not respond to hasUnifiedMemory.

… macOS.

Remove VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT and VK_MEMORY_PROPERTY_HOST_CACHED_BIT
from MVK_VK_MEMORY_TYPE_METAL_SHARED.
Refactor MVKPhysicalDevice::initMemoryProperties() to configure memory
types and heaps in a more organized, procedural, and traceable manner.
Enhance MVKPhysicalDevice::getHasUnifiedMemory() detection of unified memory.
Simplify MVKPhysicalDevice::getVRAMSize().
On macOS MVKImage::validateUseTexelBuffer() tests the underlying device memory
directly to determine if it has coherent memory and cannot support texel buffer.
Add MVKInstance::getPhysicalDeviceCount().
Add mvkClear() template function for clearing structs and replace use of
memset(*, 0, sz) with mvkClear().
Refactor mvkCopyStruct() and rename to mvkCopy().
Update What's New document.
Change MVKPhysicalDeviceMetalFeatures::dynamicMTLBuffers from boolean to uint
dynamicMTLBufferSize limit and guard dynamic buffer usage sizes against it.
Update VK_MVK_MOLTENVK_SPEC_VERSION to 23.
@billhollings billhollings merged commit ef8575f into KhronosGroup:master Dec 16, 2019
Copy link
Collaborator

@cdavis5e cdavis5e left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant to say this before you merged, but it totally slipped my mind.

return [_mtlDevice respondsToSelector: @selector(hasUnifiedMemory)] && _mtlDevice.hasUnifiedMemory;
return (([_mtlDevice respondsToSelector: @selector(hasUnifiedMemory)] && _mtlDevice.hasUnifiedMemory)
|| _mtlDevice.isLowPower
|| getInstance()->getPhysicalDeviceCount() == 1);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't that erroneously include Macs with one discrete card and no integrated graphics?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't that erroneously include Macs with one discrete card and no integrated graphics?

Hmmm...you might be right. It was intended to cover the case where there was only integrated graphics which might not be listed as low-power (I figured a Mac with only one integrated GPU might not list that one GPU as low-power...and I don't have a machine to test that on).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I figured a Mac with only one integrated GPU might not list that one GPU as low-power

Hmmm...looks like my assumption was wrong. From Apple's docs on isLowPower...

This property returns YES for integrated GPUs and NO for discrete GPUs.

Pretty straightforward. I'll remove the single-GPU test...but leave the isLowPower test...on the assumption that if hasUnifiedMemory is not available...then isLowPower is a valid indication of integrated, and therefore unified, memory.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll remove the single-GPU test...but leave the isLowPower test...on the assumption that if hasUnifiedMemory is not available...then isLowPower is a valid indication of integrated, and therefore unified, memory.

Implemented in PR #795

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants