v1.1-rc #60

Merged
merged 1 commit into from Feb 7, 2024

Conversation

Anerudhan (Collaborator) commented Feb 6, 2024

[New API] A new overloaded variant of `execute` has been added that allows the variant pack to be specified as (uid, device pointer) pairs. To use it, the user is expected to provide the uid for each tensor created.

```
error_t
cudnn_frontend::graph::Graph::execute(cudnnHandle_t handle,
            std::unordered_map<int64_t, void*>& tensor_to_pointer_map, void *workspace) const;
```
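
A minimal sketch of how this overload might be used, assuming the graph has already been built and its tensors were given explicit UIDs via `set_uid()`; the UID constants, function name, and device pointers below are illustrative, not part of the API:

```
// Sketch only: `graph` is assumed to be validated and its plan already built,
// with tensors created using Tensor_attributes::set_uid().
#include <cudnn_frontend.h>

#include <cstdint>
#include <unordered_map>

namespace fe = cudnn_frontend;

fe::error_t
run_with_uids(fe::graph::Graph const& graph,
              cudnnHandle_t handle,
              void* x_dev,
              void* y_dev,
              void* workspace) {
    // UIDs the user assigned when building the graph (hypothetical values).
    constexpr int64_t X_UID = 1;
    constexpr int64_t Y_UID = 2;

    // The new overload takes a uid -> device pointer map instead of
    // tensor attribute handles.
    std::unordered_map<int64_t, void*> variant_pack = {
        {X_UID, x_dev},
        {Y_UID, y_dev},
    };
    return graph.execute(handle, variant_pack, workspace);
}
```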

[New API] Serialization: The Graph class now supports serialization and deserialization after the final plan is built. Serialization is currently supported only on runtime-compiled engines in the cuDNN backend, but may be extended to other engines in the future. Deserialization requires a cuDNN handle created for an identical GPU to the one the original graph/plan was built on. New samples showcasing this have been added in `samples/cpp/serialization.cpp`.

```
error_t
cudnn_frontend::graph::Graph::serialize(std::vector<uint8_t>& data) const;

error_t
cudnn_frontend::graph::Graph::deserialize(cudnnHandle_t handle,
                   std::vector<uint8_t> const& data);
```
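
A minimal sketch of the round trip, under the same assumptions (final plan already built, deserializing handle targets an identical GPU); the helper names are illustrative and error handling is elided:

```
// Sketch only: check the returned error_t in real code.
#include <cudnn_frontend.h>

#include <cstdint>
#include <vector>

namespace fe = cudnn_frontend;

std::vector<uint8_t>
save_plan(fe::graph::Graph const& built_graph) {
    std::vector<uint8_t> blob;
    auto status = built_graph.serialize(blob);  // runtime-compiled engines only, today
    (void)status;
    return blob;
}

fe::graph::Graph
load_plan(cudnnHandle_t handle, std::vector<uint8_t> const& blob) {
    fe::graph::Graph graph;
    auto status = graph.deserialize(handle, blob);  // graph can now be executed directly
    (void)status;
    return graph;
}
```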

[New API] Autotuning: If the graph allows multiple engine configs for a given topology, each of these can now be built and executed in parallel. The expected flow is that the user queries the number of plans present and spawns a new thread for each plan so they can be finalized in parallel. The APIs supporting this are as follows:

```
int64_t
Graph::get_execution_plan_count() const;

error_t
Graph::build_plan_at_index(cudnnHandle_t const &handle, int64_t index);

error_t
Graph::execute_plan_at_index(cudnnHandle_t const &handle,
                         std::unordered_map<int64_t, void*>&,
                         void* workspace,
                         int64_t plan_index) const;

int64_t
get_workspace_size_plan_at_index(int64_t plan_index) const;
```
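
A minimal sketch of that flow, with illustrative helper names; whether a single cuDNN handle may safely be shared across the worker threads is an assumption noted in the comments:

```
// Sketch only: query the plan count, finalize each candidate plan in its own
// thread, then execute one by index. Sharing one handle across threads is shown
// for brevity; consult cuDNN's thread-safety guarantees before doing this.
#include <cudnn_frontend.h>

#include <cstdint>
#include <thread>
#include <unordered_map>
#include <vector>

namespace fe = cudnn_frontend;

void
build_all_plans_in_parallel(fe::graph::Graph& graph, cudnnHandle_t handle) {
    int64_t const plan_count = graph.get_execution_plan_count();

    std::vector<std::thread> workers;
    for (int64_t i = 0; i < plan_count; ++i) {
        workers.emplace_back([&graph, handle, i]() {
            auto status = graph.build_plan_at_index(handle, i);  // finalize plan i
            (void)status;  // check the returned error_t in real code
        });
    }
    for (auto& w : workers) {
        w.join();
    }
}

void
run_plan(fe::graph::Graph const& graph,
         cudnnHandle_t handle,
         std::unordered_map<int64_t, void*>& variant_pack,
         void* workspace,
         int64_t plan_index) {
    // Workspace must be at least get_workspace_size_plan_at_index(plan_index) bytes.
    auto status = graph.execute_plan_at_index(handle, variant_pack, workspace, plan_index);
    (void)status;
}
```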

[New feature] sdpa_node now allows a ragged offset to be set on the input and output tensors.

[Bug Fix] Certain parts of the FE code used to throw exceptions even with the `DISABLE_EXCEPTION` flag set. This has been cleaned up.

[Bug Fix] For the sdpa node, cudnn now correctly returns `NOT_SUPPORTED` when s_q is not a multiple of 64, the padding mask is on, and the cudnn version is less than 9.0.0.

[Bug Fix] For the sdpa backward node, cudnn now correctly returns `NOT_SUPPORTED` when s_q is less than 64 and the cudnn version is less than 9.0.0.

[Bug Fix] Fixed an issue with the pointwise Modulo operation.

[Bug Fix] Fixed an issue in the sdpa node where the intermediate data types were wrong.

[Samples] Added a sample to showcase matmul with int8 and FP8 precisions.

[Cleanup] Python samples have moved from `samples/python` to `tests/python_fe`.

[Cleanup] Removed the `cudnn_frontend::throw_if` function.
