Asynchronous Pipeline Creation using CPU shaders #485

Open
DJMcNab opened this issue Mar 1, 2024 · 1 comment
Labels
O-Windows Applicable specifically to the Windows target

Comments

Member

DJMcNab commented Mar 1, 2024

This is part of my investigation into startup time on Android (#gpu > Android Startup Time Investigation).

We should create the compute pipelines asynchronously, rendering with the CPU shaders while the GPU pipelines are being created (except perhaps for fine).
This is especially important for first-run performance, when the pipeline caches built into drivers have not been filled yet. On my Google Pixel 6, creating pipelines with a cold cache takes approximately 1.7 seconds, which does not give a good user experience for the first run of the app [1].
This has been somewhat mitigated by #455; prior to that, it took more than 4 seconds.

These 1.7 seconds currently block app startup, whereas using the CPU shaders means renderer creation takes only 140 ms.
Note that this does have an impact on frame latency: my measurements suggest that each frame of Tiger takes 30 ms when using the CPU shaders vs <10 ms with the GPU shaders. Counting that extra ~20 ms for the first frame, this approach should be expected to save approximately $1700 - 140 - 20 = 1540$ milliseconds, i.e. about 1.5 seconds of time to first frame on first run. This is the vast majority of the current time to first frame.
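
To make the proposal concrete, here is a minimal sketch of the swap using only the Rust standard library. `GpuPipelines`, `Backend`, and `Renderer` are hypothetical names for illustration and are not the project's actual types; the `compile` closure stands in for the slow pipeline creation. The renderer is constructed immediately, draws with the CPU path, and switches over once the background thread delivers the pipelines.

```rust
use std::sync::mpsc::{self, Receiver, TryRecvError};
use std::thread;

/// Hypothetical stand-in for the compiled GPU compute pipelines.
#[derive(Debug)]
pub struct GpuPipelines;

/// Which shader implementation rendered a given frame.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    Cpu,
    Gpu,
}

/// Hypothetical renderer that starts on CPU shaders and upgrades itself.
pub struct Renderer {
    /// Channel on which the background thread will deliver the pipelines.
    pending: Option<Receiver<GpuPipelines>>,
    /// Set once the GPU pipelines have arrived.
    gpu: Option<GpuPipelines>,
}

impl Renderer {
    /// Returns immediately; `compile` (the slow part) runs on another thread.
    pub fn new<F>(compile: F) -> Self
    where
        F: FnOnce() -> GpuPipelines + Send + 'static,
    {
        let (tx, rx) = mpsc::channel();
        thread::spawn(move || {
            // If the renderer was dropped in the meantime, the send fails
            // and the result is simply discarded.
            let _ = tx.send(compile());
        });
        Renderer { pending: Some(rx), gpu: None }
    }

    /// Renders one frame without blocking on pipeline creation.
    pub fn render_frame(&mut self) -> Backend {
        if let Some(rx) = self.pending.take() {
            match rx.try_recv() {
                // Pipelines are ready: use them from this frame onwards.
                Ok(pipelines) => self.gpu = Some(pipelines),
                // Still compiling: keep polling on later frames.
                Err(TryRecvError::Empty) => self.pending = Some(rx),
                // The compile thread died: stay on the CPU shaders.
                Err(TryRecvError::Disconnected) => {}
            }
        }
        if self.gpu.is_some() { Backend::Gpu } else { Backend::Cpu }
    }
}
```

Polling with `try_recv` once per frame keeps the render loop non-blocking, so the first frames pay the higher CPU-shader frame time instead of waiting for pipeline creation. A real implementation would also need to decide what to do about stages that are impractical on the CPU.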

This is also applicable to desktop use cases [2], but desktop is not the motivating example, because startup time there is shorter, even with a cold cache.

Footnotes

  1. Note that on current main there is no pipeline caching on Android. This is blocked on gfx-rs/wgpu#5319 ("Pipeline cache API and implementation for Vulkan").

  2. `MESA_SHADER_CACHE_DISABLE=1 cargo run -p with_winit --release` can be used with Mesa to test the time without caches. On my machine, pipeline creation takes ~200 ms with 14 threads without the cache, versus ~5 ms with it.

Member Author

DJMcNab commented Dec 11, 2024

This is actually even more important on DirectX, because on-device shader compilation is extremely slow on that platform.

The workgroup zeroing work (#575) largely resolved the issues on Android (at least sufficiently to proceed).

DJMcNab added the O-Windows label (Applicable specifically to the Windows target) and removed the O-Android label (Applicable specifically to the Android target) on Dec 11, 2024