[New API] A new overloaded variant of `execute` has been added, which allows the variant pack to be specified as pairs of (uid, device pointer). To use this overload, the user is expected to provide uids for the tensors they create.
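As a rough illustration of the flow described above, the sketch below assigns uids to two tensors and passes a `{uid -> device pointer}` map to `execute`. The exact map type, the `set_uid` setter, and the tensor names/uids used here are assumptions for illustration, not an excerpt from the library.

```cpp
// Minimal sketch, assuming the v1.x graph API (cudnn_frontend::graph) and a
// uid-keyed execute overload shaped as described above. Names and uids are
// illustrative only.
#include <cudnn_frontend.h>
#include <unordered_map>

namespace fe = cudnn_frontend;

void execute_with_uids(cudnnHandle_t handle,
                       fe::graph::Graph& graph,  // already built/finalized
                       std::shared_ptr<fe::graph::Tensor_attributes> X,
                       std::shared_ptr<fe::graph::Tensor_attributes> Y,
                       void* x_dev, void* y_dev, void* workspace) {
    // The user assigns uids to the tensors they created ...
    X->set_uid(1);
    Y->set_uid(2);

    // ... and can then pass a {uid -> device pointer} map instead of a
    // {Tensor_attributes -> device pointer} map to the new overload.
    std::unordered_map<int64_t, void*> variant_pack = {
        {1, x_dev},
        {2, y_dev},
    };
    auto status = graph.execute(handle, variant_pack, workspace);
    (void)status;  // error handling elided in this sketch
}
```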
[New API] Serialization: The `Graph` class now supports serialization and deserialization after the final plan is built. Serialization is currently supported only on runtime-compiled engines in the cuDNN backend, but may be extended to other engines in the future. Deserialization requires a cuDNN handle created for a GPU identical to the one the original graph/plan was built on. A new sample showcasing this has been added in `samples/cpp/serialization.cpp`.
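A rough sketch of the serialize/deserialize round trip described above. The byte-vector-based `serialize`/`deserialize` signatures are assumptions; `samples/cpp/serialization.cpp` remains the authoritative example.

```cpp
// Minimal sketch, assuming serialize()/deserialize() exchange a byte buffer
// and that deserialize() needs a handle created on an identical GPU, as the
// release note states.
#include <cudnn_frontend.h>
#include <vector>

namespace fe = cudnn_frontend;

std::vector<uint8_t> save_plan(fe::graph::Graph& built_graph) {
    // Only valid after the final plan has been built; currently limited to
    // runtime-compiled engines in the cuDNN backend.
    std::vector<uint8_t> blob;
    built_graph.serialize(blob);
    return blob;
}

void load_plan(cudnnHandle_t handle,
               std::vector<uint8_t> const& blob,
               fe::graph::Graph& graph) {
    // The handle must have been created for a GPU identical to the one the
    // original graph/plan was built on.
    graph.deserialize(handle, blob);
}
```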
[New API] Autotuning: If the graph allows multiple engine configs for a given topology, each of them can now be built and executed in parallel. The expected flow is that the user queries the number of candidate plans and spawns a thread per plan so that the plans are finalized in parallel. A set of new APIs supports this flow (see the sketch below).
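The sketch below illustrates that expected flow under stated assumptions: the member names `get_execution_plan_count` and `build_plan_at_index`, and the one-handle-per-thread choice, are guesses at the API surface rather than an excerpt from the library.

```cpp
// Minimal sketch of the flow described above: query the plan count, then
// finalize each candidate plan on its own thread. Member names are assumed;
// consult the autotuning sample/headers for the exact signatures.
#include <cudnn_frontend.h>
#include <thread>
#include <vector>

namespace fe = cudnn_frontend;

void build_all_plans_in_parallel(fe::graph::Graph& graph) {
    int64_t plan_count = graph.get_execution_plan_count();

    std::vector<std::thread> workers;
    for (int64_t index = 0; index < plan_count; ++index) {
        workers.emplace_back([&graph, index]() {
            // cuDNN handles should not be shared across threads at the same
            // time, so each worker finalizes its plan with its own handle.
            cudnnHandle_t handle = nullptr;
            cudnnCreate(&handle);
            auto status = graph.build_plan_at_index(handle, index);
            (void)status;  // per-plan error handling elided in this sketch
            cudnnDestroy(handle);
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    // The caller can then time the finalized plans and pick the fastest one.
}
```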
[New feature] sdpa_node now allows ragged offset to be set in the input and output tensors.
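A minimal, hypothetical sketch of attaching a ragged offset tensor to an SDPA input follows; the `set_ragged_offset` setter, the offset tensor's shape, and its data type are assumptions for illustration only.

```cpp
// Sketch only: wiring a per-sequence ragged offset tensor onto an SDPA input.
// The setter name, dimensions, and data type below are assumptions.
#include <cudnn_frontend.h>

namespace fe = cudnn_frontend;

void attach_ragged_offset(fe::graph::Graph& graph,
                          std::shared_ptr<fe::graph::Tensor_attributes> q,
                          int64_t batch) {
    // Offsets into the packed (ragged) Q buffer, one entry per sequence
    // plus a terminating offset.
    auto q_ragged_offset = graph.tensor(fe::graph::Tensor_attributes()
                                            .set_name("q_ragged_offset")
                                            .set_dim({batch + 1, 1, 1, 1})
                                            .set_stride({1, 1, 1, 1})
                                            .set_data_type(fe::DataType_t::INT32));
    q->set_ragged_offset(q_ragged_offset);
}
```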
[Bug Fix] Certain parts of the FE code used to throw exceptions even with the `DISABLE_EXCEPTION` flag set. This has been cleaned up.
[Bug Fix] For the sdpa node, cuDNN now correctly returns `NOT_SUPPORTED` when s_q is not a multiple of 64, the padding mask is on, and the cuDNN version is less than 9.0.0.
[Bug Fix] For the sdpa backward node, cuDNN now correctly returns `NOT_SUPPORTED` when s_q is less than 64 and the cuDNN version is less than 9.0.0.
[Bug Fix] Fixed an issue with the pointwise Modulo operation.
[Bug Fix] Fixed an issue in the sdpa node where the intermediate data types were wrong.
[Samples] Added a sample to showcase matmul with int8 and FP8 precisions.
[Cleanup] Python samples have moved from `samples/python` to `tests/python_fe`.
[Cleanup] Removed the `cudnn_frontend::throw_if` function.