Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pipeline cache API and implementation for Vulkan #5319

Merged
merged 33 commits into from
May 16, 2024

Conversation

DJMcNab
Copy link
Contributor

@DJMcNab DJMcNab commented Feb 28, 2024

Connections

Description
Adds a PipelineCache resource to wgpu's API, and wires it down to Vulkan vkCreatePipelineCache.
This cache is added as a parameter to ComputePipelineDescriptor and RenderPipelineDescriptor.
This parameter is used as expected on Vulkan, and ignored on DirectX/Metal/WebGPU/gles.

Testing
Testing has been performed on a Pixel 6, which has validated that this works as expected between runs.
This is in the branch in my fork of Vello.
This does only test the cache for Vulkan pipelining.

Checklist

  • Run cargo fmt.
  • Run cargo clippy. If applicable, add:
    • --target wasm32-unknown-unknown
    • --target wasm32-unknown-emscripten - gets the same errors on main, so I assume it's fine
  • Run cargo xtask test to run tests - I get the same failures on main, so I assume it's fine
  • Add change to CHANGELOG.md. See simple instructions inside file.

Status
This PR is now fully ready for review

Future Work
I am planning on implementing additional validation inside wgpu that the cache is suitable, as discussed in https://medium.com/@zeuxcg/creating-a-robust-pipeline-cache-with-vulkan-961d09416cda. That is mainly about storing the driver version and file size inline in the file. This should be done before this feature is released, but could be in a different pull request. Please advise if you have any preference here.

This could also be extended to different platforms/APIs - see e.g. gfx-rs/gfx#2877
I am not intending to do this work myself

@DJMcNab
Copy link
Contributor Author

DJMcNab commented Mar 1, 2024

cc @expenses, as you had a previous investigation in gfx-rs/gfx#3716

Copy link
Member

@cwfitzgerald cwfitzgerald left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this! The code itself is fine, keep it up like this, and the documentation is dreamy.

The api has a few issues at multiple layers. At points wgpu-hal is expected to do validation, at other points the end user is just expected to know how to pass the correct thing. I think we should unify the api as follows.

wgpu-hal

The wgpu-hal layer should just expect that the blob passed to it is correct for that platform and driver combination. It shouldn't do any additional validation. All of its internal code can expect that the cache key is honored.

It may append additional data structures to the blob which contain shader intermediate states (such as HLSL generated or DXBC/DXIL).

There is a method on the Adapter which returns a backend-specific string of bytes to serve as a "cache key". This includes the wgpu-hal version.

wgpu-core

Core does all of the validation that the key is honored and the file was not corrupted.

It adds metadata to the top of the blob that stores the magic number, file size, file hash, and pipeline key.

If this errors, it will create a new empty cache.

wgpu

Passes through the core apis.

User

The user will pull the pipeline key from the adapter, hash it, and use it as a filename to store the data.

Does this all make sense?

wgpu-core/src/device/resource.rs Outdated Show resolved Hide resolved
wgpu-core/src/pipeline.rs Show resolved Hide resolved
wgpu-hal/src/vulkan/device.rs Outdated Show resolved Hide resolved
wgpu-types/src/lib.rs Outdated Show resolved Hide resolved
@DJMcNab
Copy link
Contributor Author

DJMcNab commented Mar 14, 2024

Thanks for the review @cwfitzgerald

In terms of validation, my intent was to follow up with that, because I wasn't sure I was doing the right thing. So I can definitely add that in here.
So I can definitely add that validation in this PR, since the rest of the shape is approximately right.
One thing I need guidance on is how to hash the data in wgpu_core. Are you aware of a preferred library for this? It feels like the sort of thing which is quite opinionated

I just want some clarification about what you mean by:

The user will pull the pipeline key from the adapter, hash it, and use it as a filename to store the data.

In my prototype integration, I wasn't hashing the filename. I find it unlikely that users would do so, either.
Should we pre-hash the value from the cache key method?

@DJMcNab
Copy link
Contributor Author

DJMcNab commented Mar 14, 2024

It shouldn't do any additional validation

This is also slightly troubling, as different APIs have different validation needs. For example, Vulkan probably needs to validate the driver version is the same, whereas the concept of a driver version isn't exposed at the higher levels

@DJMcNab DJMcNab force-pushed the pipeline-cache-api-vulkan branch from 397326f to d256e9c Compare March 14, 2024 09:19
@xorgy
Copy link

xorgy commented Mar 14, 2024

For example, Vulkan probably needs to validate the driver version is the same, whereas the concept of a driver version isn't exposed at the higher levels

That's interesting... are there Vk drivers in the wild that accept incompatible pipeline caches from applications, or is this more preemptive (due to the quality issues with some drivers)?

@DJMcNab
Copy link
Contributor Author

DJMcNab commented Mar 14, 2024

According to the linked article, there have been cases in the past where a driver either:

  1. Doesn't validate the UUID which is meant to detect this at all
  2. Fails to update the UUID when it would be incompatible

@cwfitzgerald cwfitzgerald self-requested a review March 14, 2024 17:00
@DJMcNab DJMcNab marked this pull request as draft March 14, 2024 17:50
@cwfitzgerald
Copy link
Member

This is also slightly troubling, as different APIs have different validation needs. For example, Vulkan probably needs to validate the driver version is the same, whereas the concept of a driver version isn't exposed at the higher levels

The idea is that the wgpu-hal decides on its pipeline key based on its validation needs, and wgpu-core is the one who does the comparison and rejection of the pipeline key. This way the driver version will be part of the wgpu-hal pipeline key.

In my prototype integration, I wasn't hashing the filename. I find it unlikely that users would do so, either.
Should we pre-hash the value from the cache key method?

If we make cache key an opaque type we can put a into_bytes and hash method on it.

@DJMcNab
Copy link
Contributor Author

DJMcNab commented Mar 14, 2024

The key factor which needs to be accounted for is:

  1. If the driver updates, the old cache you still have is useless.

I'm planning on implementing that with effectively two keys, a "device" key (which identifies the device/backend combination) and a further validation key. The device key should related to the "file name" being used, whereas the validation key is just used to bail from using the cache. That is potentially what you are describing, but I think it's good to clarify that

@DJMcNab DJMcNab marked this pull request as ready for review March 18, 2024 10:56
@DJMcNab DJMcNab force-pushed the pipeline-cache-api-vulkan branch from e973ce8 to 7626051 Compare March 18, 2024 10:56
@DJMcNab
Copy link
Contributor Author

DJMcNab commented Mar 18, 2024

This is now ready for follow-up review. I am going to now test this updated version in Vello, and also need some guidance on computing the hash (it currently uses a hardcoded value).

I'm unlikely to make further functional changes before another review

@DJMcNab
Copy link
Contributor Author

DJMcNab commented Apr 5, 2024

Having done some more thinking on this, I have decided to exclude the hashing of the cache data from this PR. The main reasons for this are:

  • I believe it to be unlikely that filesystems would corrupt the data in a way which keeps the same length, but changes some of the content
  • Any attempt at meddling in the file by other processes/application is already declared out of scope by our safety comment
  • We do not already have a dependency which would compute a suitable hash. We could likely produce a hash designed for a hashmap, but these have significantly different requirements to our hashing need.

I also don't see the necessity in making the pipeline cache key an opaque set of bytes, rather than a readable string.
These files should be in a location which would be rarely accessed by end-users.
Additionally, if using these as a file name, an application would have to sanitise the bytes to be reasonable for use as a file name. I suspect at least some applications would not do so, and potentially lose some of portability which wgpu provides.
But I also think it's plausible that I've missed some important reason for this to be an incorrect assessment.

Copy link
Member

@cwfitzgerald cwfitzgerald left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is absolutely fabulous work!

I think the setup for caching is perfectly reasonable, and the core/hal relationship is exactly what i wanted to see!

Have some comments. I also would like to see an execution test that actually uses the pipeline cache. Maybe make a cache, make a pipeline, serialize the cache, re-parse it to a new cache, and make an identical pipeline. Just to make sure everything actually works and there aren't any api level issues.

examples/src/mipmap/mod.rs Show resolved Hide resolved
wgpu-core/src/pipeline_cache.rs Show resolved Hide resolved
wgpu/src/lib.rs Outdated Show resolved Hide resolved
wgpu-hal/src/vulkan/device.rs Outdated Show resolved Hide resolved
wgpu/src/lib.rs Outdated Show resolved Hide resolved
@DJMcNab DJMcNab force-pushed the pipeline-cache-api-vulkan branch from 53f7f1e to 84e1616 Compare April 19, 2024 09:23
@DJMcNab
Copy link
Contributor Author

DJMcNab commented Apr 19, 2024

Re-requesting review pending CI

Copy link
Member

@cwfitzgerald cwfitzgerald left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for taking so long to get back to you on this - we're basically there, just some small nits at this point.

tests/tests/pipeline_cache.rs Outdated Show resolved Hide resolved
tests/tests/pipeline_cache.rs Outdated Show resolved Hide resolved
tests/tests/pipeline_cache.rs Outdated Show resolved Hide resolved
tests/tests/pipeline_cache.rs Outdated Show resolved Hide resolved
tests/tests/pipeline_cache.rs Outdated Show resolved Hide resolved
tests/tests/pipeline_cache.rs Outdated Show resolved Hide resolved
wgpu-hal/src/vulkan/device.rs Outdated Show resolved Hide resolved
wgpu-types/src/lib.rs Show resolved Hide resolved
@DJMcNab DJMcNab force-pushed the pipeline-cache-api-vulkan branch from d3d4fb9 to 20906e9 Compare May 9, 2024 08:45
@DJMcNab DJMcNab requested a review from cwfitzgerald May 14, 2024 08:08
Copy link
Member

@cwfitzgerald cwfitzgerald left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just need your throughs on where the pipeline cache should go, then we're good to land.

@DJMcNab DJMcNab requested a review from cwfitzgerald May 16, 2024 11:53
@cwfitzgerald
Copy link
Member

Looks to be an issue with the ash update

@cwfitzgerald
Copy link
Member

I'll fix this up and get it in

@cwfitzgerald cwfitzgerald enabled auto-merge (squash) May 16, 2024 13:28
@cwfitzgerald cwfitzgerald merged commit 4902e47 into gfx-rs:trunk May 16, 2024
25 checks passed
@DJMcNab DJMcNab deleted the pipeline-cache-api-vulkan branch May 16, 2024 13:54
jimblandy pushed a commit to jimblandy/wgpu that referenced this pull request May 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Pipeline Caching (between program executions)
3 participants