-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[vulkan phase2] Vulkan Runtime #6924
Conversation
Error on failed string termination
Add HALIDE_VERSION compiler definitions to compilation
C:\build_bot\worker\halide-testbranch-main-llvm16-x86-64-windows-cmake\halide-source\src\CodeGen_Vulkan_Dev.cpp(1082): error C2398: Element '3': conversion from 'SpvBuiltIn_' to '_Ty' requires a narrowing conversion |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally looks very nice; my comments are mostly nits but a few actionable items. My main concern here is that there are a lot of TODO comments (and also some if 0
unimplemented sections) in Codegen_Dev, and it's not clear to me which/how-many of them are "need to complete before this will work" vs "this should be improved someday but isn't blocking a functional initial release" -- clarifying those would IMHO make this easier for other folks to read and grasp the current state.
Oooh, thanks! I'll fix this! |
Thanks! Okay, cool! Yes, the CodeGen_Dev is in the process of being refactored to use the SPIRV_IR classes and to address most of the TODO comments. Once this is in place, I'll update the docs and status for what's working, etc! |
Monday Morning Review Ping -- where does this PR stand? |
Apologies … I’m on vacation but will be back later this week! Most of the requested changes are pushed … remaining items are on the codegen, and docs! |
Added splitmix64 based hashing scheme for types and constants. Numerous fixes to instruction packing. Added debug symbols to all variables.
Hmmm. I'm not sure how this might be added as a feature ... we'd need to know that you intend to pass a buffer of 32bit integers with the intent of indexing it as two float16s? And we'd have to know at CodeGen time, which would imply that you'd need some set of feature flags to trigger this. Feel free to add additional suggestions and use cases to the ticket I created: |
Cleanup constant value declarations with template helper methods Add comments on workgroup size usage
Add nearest_multiple constraint to block/region allocator
Add nearest_multiple constrating to vulkan memory allocatori + fixes correctness/multiple_outputs test Add vkCreateBuffer/vkDestroyBuffer debug output i + for gpu_object_lifetime_tracker Cleanup shutdown for shader_module destruction
Fix incorrect lowering for intrinsic.
Changes applied. Ready to merge
@steven-johnson I don't have write access to do the merge ... could you approve this PR when you get a chance? |
Sure, but did the other reviewers approve it? It still says "needs 1 approving review" |
Nope. Nobody that has made comments has hit the approval button, even though all comments were addressed so far. |
@rootjalex @shoaibkamil One of you, please approve this so we can land it :-) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for addressing the feedback!
…ide#7347) In PyBind11, if you specify a default argument for a method, it is evaluated when the Python module is initialized, *not* when the method is called (as you might expect in C++). For defaults that are just constants/literals, this is no big deal, but when calling get_*_target_from_environment, this means it is called at module init time -- also normally not a big deal (since the values ~never change at runtime anyway), with one big exception (no pun intended): if the function throws an exception (e.g. via calling user_assert() or similar), that exception is thrown at Module-initialization time, which is a much more inscrutable crash, and one that is very hard to recover from. This may seem unlikely, but can happen pretty easily if you set (say) HL_JIT_TARGET=host-cuda (or other gpu) and the given GPU runtime isn't present on the given system; the current behavior is basically "make if impossible for the libHalidePython bindings to run", whereas what we want is "runtime exception thrown when you call the method". This changes the relevant methods to use `Target()` as the default value, and inside the method wrapper, if the value passed equals `Target()`, it replaces the value with the righ `get_*_target_from_environment()` call. (This turned up while doing some testing of halide#6924 on a system without Vulkan available)
* Import Vulkan runtime changes from personal branch * Fix build to work with latest changes in main * Hookup Vulkan into Target, DeviceInterface and OffloadGPULoops * Add Vulkan runtime to Makefile * Add Vulkan target to Python bindings * Add runtime linker support to target Vulkan CodeGen * Add Vulkan windows decorator to runtime targets * Wrap debug messages for internal runtime classes with DEBUG_INTERNAL Error on failed string termination * Silence clang-tidy warnings for redundant expressions on Vulkan enum values * Clang tidy & format pass * Fix formatting for single line statements * Move Vulkan option to top-level CMakeLists.txt and enable SPIR-V as needed * Fix Vulkan & SPIRV dependencies for makefile * Add Halide version info to Makefile Add HALIDE_VERSION compiler definitions to compilation * Add HL_VERSION_FLAGS to RUNTIME_CXX_FLAGS * Finish refactoring of Vulkan CodeGen to use SpirV-IR. Added splitmix64 based hashing scheme for types and constants. Numerous fixes to instruction packing. Added debug symbols to all variables. * Clang tidy/format pass. * Fix formatting * Remove leftover ifdef * Fix build error for clang OSX for mismatched type comparison * Refactor loops and conditionals to use blocks * Clang tidy/format pass * Add detailed comments for acquire context parameters * Add comments describing loader method exports and dynamically resolved function pointers Other minor cleanups * Change aborts to debug asserts for context parameters. Add error handling to acquire context. * Cache Vulkan descriptor sets and other shader module objects in compilation cache for reuse * Replace platform specific strncpy for grabbing Extension strings with StringUtils::copy_upto * Enable device features for selected device * Fix alignment constraints for to match Vulkan buffer memory requirements. Add env vars to control Vulkan Memory Allocator config. * Add Vulkan to list of supported APIs in README.md Add Vulkan specific README_vulkan.md * Clang tidy/format pass * Fix conform_alignment to handle zero values * Fix declaration of custom_allocation_callbacks to be static. Change to constexpr for invalid values * Whitespace change to trigger build. * Handle Vulkan kernels that don't require storage buffers. Updated test status. Fixes 7 test cases. * Add src/mini_vulkan.h Apache 2.0 license requirements to License file * Add descriptor set binding info as pre-amble to SPIR-V code module Fix shared memory allocation to use global variables in workgroup storage space Add extern calls for spirv and glsl builtins Add memory fence call to gpu thread barrier Add missing visitors to Vulkan CodeGen Add scalar index & vector index methods for load/store * Clang tidy & format pass * Update test results for Vulkan docs. Passing: 326 Failing: 39 * Fix formatting * Remove extraneous parentheses for is_array_type() * Add Vulkan library to linkage fo Halide generator helpers * Add SPIR-V formatted output (for debugging) * Only declare SIMT intrinics that are actually used. Cleanup & refactor add_kernel method. * Add Vulkan handler to test targets * Clang format/tidy pass * Add doc-strings to SPIR-V interface * Adjust runtime array to widest vector width based on alignment and dense vector loads/stores Fix scalar and vector load/stores Fix casts for vectors Add missing nan, inf, neg_inf, is_finite builtins * Add missing bitwise and logical and methods. Cleanups. * Add comments about necessary packages on Ubuntu v22.04 vs earlier versions * Clang tidy & format pass. * Update Vulkan test results. Pass: 329 Fail: 36 * Remove unused Produce/Consume visitor method * Fix Molten VK initialization to work with v1.3+ loader Add support for direct casts for same-size types Add missing mux, mix, lerp, sinh, tanh, etc intrinsics Add explicit storage access for variables Add a macro to enable debug messages in Vulkan Memory Allocator * Disable dynamic shared memory portion of test for Vulkan (since its not supported yet) * Disable uncached portion of test for Vulkan (since it may OOM) * Disable float64 support in Type::supports_type() for Vulkan target since it's not widely supported * Fix Shuffle to handle all known cases Hookup VulkanMemoryAllocator to gpu allocation cache. Fix if_then_else to allow calls and statements to be used Fix loop counter comparison, and don't allow dynamic loops to be unrolled. Fix scalarize to use CompositeInsert instead of VectorInsertDynamic Fix FMod to use FRem (cause SPIR-V's FMod doesn't do what you'd expect ... but FRem does?!) Use exact same sematics for barriers as GLSL Compute ... still not passing everything Fix SPIR-V block termination checks, keys for null constants, and other cleanups * Clang tidy & format pass * Update correctness test results. PASS: 338, FAIL: 27 * Move counter inside debug #define to fix build * Relax tolerance for newton's method to match other GPU APIs Skip gpu dynamic shared testfor Vulkan (since dynamic shared allocations aren't supported yet) Update correctness test status. PASS: 340, FAIL: 25 * Clang format/tidy pass * Skip Vulkan for float64 for correctness test round (since f64 is optional) * Skip Vulkan for tests that rely upon device crop, and slice. * Only test small vector widths for Vulkan (since widths >=8 are optional) * Caninicalize gpu vars for Vulkan * Fix loop initialization, and increments Add all explicit types, and fix constant declarations Add missing fast intrinsics Convert results of logical ops into expected types (instead of bools) * Add SpvInstruction::add_operands(), add_immediates() and template based append() Make integer logical operations explicit. Better handling of constant data. * Clang format & tidy pass * Fix windows build ... refactor convert_to_bool to use std::vectors rather than dynamic fixed sized arrays * Skip asyn_device_copy, device_buffer_copy, device_crop, and device_slice tests for Vulkan (for now). * Don't test large vector widths for Vulkan (since they are optionally supported) * Clear Vulkan buffer allocations prior to use (tbd if this is necessary) * Skip Vulkan for async copy chain test * Skip Vulkan for interpreter test * Clang tidy/format pass * Fix formatting * Fix build ... use error messages for errors * Separate shared memory resources by element type for Vulkan. * Add Vulkan to conditional for fusing gpu loops * Reorder reset method to match declaration ordering. * Cleanup debug log messages for Vulkan resources * Assert alignment is power of two * Only split regions that have already been freed. Add more debug messages to log * Explicitly cleanup Vulkan command buffers as after they are used Avoid recreating descriptor sets Tidy up Vulkan debug messages * Fix Div, Mod, and div_round_to_zero for integer cases Cleanup reset method * Skip Vulkan for async_copy_chain * Skip 64-bit values on Vulkan since they are optionally supported * Skip interleave_rgb for Vulkan (which doesn't support cropping) * Skip interpreter for Vulkan (which doesn't support dynamic allocation of shared mem). * Clang Tidy/Format pass * Handle calls to pow with negative values for Vulkan Add integer and float constant helpers to SPIRV * Only test real numbers for pow with Vulkan * Clang tidy/format pass * Fix logic so a region request of an entire block matches if exactly the same size as an empty block * Create a zero size buffer to check for alignment Return null handles after freeing * Add more verbose debug output for malloc * Fix UConvert logic to avoid narrowing an integer type less than 8 bits Remove optimization path for division which seems to fail worse than DIV Cleanup DIV and MOD operators * Clang format/tidy pass * Fix SConvert & UConvert ops * Add retain semantics to block allocator interface Update test to validate retain/release/reclaim functionality * Implement device_crop, device_slice and release_crop for Vulkan. Re-enable device_crop, device_slice and interleave_rgb tests. * Clang format/tidy pass * Implement device copy for Vulkan. Enable device copy test. * Clang format/tidy pass * Fix signed mod operator and use euclidean identity (just like glsl) * Clang format/tidy pass * Fix to handle Mod on vectors (use vector constant for bitwise and) * Fix pow operator for Vulkan, and re-enable math test to full range. * Add error checking for return types for conditionals Use bool types for ops that require them, and adapt to expected return types * Handle deallocation for existing regions prior to coalescing. Cleanup region allocator logic for availability. Augment block_allocator test to cover allocation reuse. * Clang tidy/format pass * Fix reserved accounting for regions * Add more details to Windows specific Vulkan build config * Update SPIR-V headers to v1.6 * Add support for dynamic shared memory allocations for Vulkan Add dynamic workgroup dispatching to Vulkan Add optional feature flags for Vulkan capabilities Add Vulkan API version flags for target features Enable v1.3 path if requested Re-enable tests for added features Update Vulkan docs with status updates and feature flags * Enable Vulkan asyc_device_copy test. * Disable Vulkan performance test for async gpu (for now). * Disable Vulkan from python AOT tests and tutorials (since it requires linkage against the vulkan loader system library). * Update Vulkan readme with latest status. Everything works! More or less. =) * Clang format pass * Cleanup formatting for Halide version info in Makefile * Fix typos and address review comments for Vulkan readme * Change value casts to match Halide conventions * Fix typos in comments * Add static_assert to rotl to make compilation errors clearer (instead of using enable_if) Fix debug(3) formatting to avoid super long messages Use lookup table for SPIR-V op code names * Fix typos and logic for Vulkan capabilities * Remove leftover debug ifdef * Fix typo in comments * Rename copy_upto(...) method to be copy_up_to(...) * Handle error case for uninitialized buffer allocation (rather than abort) Fix typos in comments * Support any arbitary number of devices and queues for context creation Fix typos in comments * Add get/set alloc_config methods and API hooks for configuring the VulkanMemoryAllocator * Remove leftover debug ifdef * Hookup API methods for get/set alloc_config when initializing the VulkanMemoryAllocator * Remove empty lines in main * Add required capability flags for 8-bit and 16-bit uniform and storage buffer access Handle casts for GLSL ops (spec requires all args to be the same type as the return type) * Add VkPhysicalDevice8BitStorageFeaturesKHR and related constants * Query for 8-bit and 16-bit uniform and storage access support. Enable these as part of the device feature query chain. * Use VK_WHOLE_SIZE for setting buffer (to pass validation ... otherwise size has to be a multiple of alignment) Remove useless debug asserts for static variables Fix debug logging messages for allocations of scalars (which may not have a dim array) * Query for device limits to enforce min alignment constraints for storage and uniform buffers * Fix shutdown sequence to iterate over descriptor sets Avoid bug in validation layer by reordering destruction sequence * Clang format & tidy pass * Fix logic for locating entry point shader binding ... assume exact match for entry point name Cleanup entry point binding variables and clarify usage * Remove accidentally uncommented debug statements * Cleanup debug output for buffer related updates * Fix split and allocate methods in region allocator to fix issues with alignment constraints - discovered a hang if requested size couldn't be fulfilled after adjusting to aligned sizes - cause was incorrect splitting of existing regions Cleanup region allocator iteration, cleanup and shutdown Added maximum_pool_size configuration option to Vulkan Memory Allocator to restrict pool sizes * Added notes about TARGET_VULKAN=ON being the default now Added links to LunarG MoltenVK SDK installer, and brew packages * Fix markdown formatting * Fix error code handling in Vulkan runtime and internal datastructures. Refactor all (well nearly all) return values to use halide error codes. Reduce the usage of abort_if() for recoverable errors. * Fix typo in error message * Fix typo in readme * Skip GPU allocation cache test on MacOSX since MoltenVK only supports 30 buffers to be allocated * Skip widening reduction test on Vulkan for Mac OSX/IOS since MoltenVK fails to translate calls with vector types for builtins like min/max. etc * Skip doubles in vector cast test on Vulkan for Mac OSX/IOS since Molten doesn't support them * Skip gpu_dynamic_shared and gpu_specialize test for Vulkan on Mac OSX/IOS since MoltenVK doesn't support the dynamic shared memory allocation or dynamic grid size. * Clang format / tidy pass * Resolve conflicts for mini_webgpu.h ... revert to main * Use unique intrinsic var names for each kernel Cleanup constant value declarations with template helper methods Add comments on workgroup size usage * Wrap debug output under ifdef DEBUG_RUNTIME_INTERNAL macro guard Add nearest_multiple constraint to block/region allocator * Add vk_clear_device_buffer utility method Add nearest_multiple constrating to vulkan memory allocatori + fixes correctness/multiple_outputs test Add vkCreateBuffer/vkDestroyBuffer debug output i + for gpu_object_lifetime_tracker Cleanup shutdown for shader_module destruction * Add note about nearest_multiple constraint for vulkan memory allocator * Hookup gpu_object_lifetime_tracker with Vulkan debug statements * Skip dynamic shared memory portion of test for Vulkan on iOS/OSX. * Fix stale comment for float type support. Fix incorrect lowering for intrinsic. --------- Co-authored-by: Derek Gerstmann <[email protected]> Co-authored-by: Steven Johnson <[email protected]>
This set of changes is Phase2 for adding Vulkan support, which adds the runtime class interfaces for interacting with the Vulkan API. Specifically, it adds the following files:
The forthcoming Phase3 will address testing and CI modifications.