v1.1-rc #60

Merged
merged 1 commit into from Feb 7, 2024

Conversation

Anerudhan (Collaborator) commented Feb 6, 2024

[New API] A new overloaded variant of `execute` has been added that allows the variant pack to be specified as (uid, device pointer) pairs. To use it, the user is expected to provide the uid for each tensor created.

```
error_t
cudnn_frontend::graph::Graph::execute(cudnnHandle_t handle,
            std::unordered_map<int64_t, void*>& tensor_to_pointer_map, void *workspace) const;
```
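
A minimal sketch of how this overload might be used, assuming the graph has already been built and its tensors were given explicit UIDs via `set_uid()`; the UID constants, function name, and device pointers below are illustrative, not part of the API:

```
// Sketch only: `graph` is assumed to be validated and its plan already built,
// with tensors created using Tensor_attributes::set_uid().
#include <cudnn_frontend.h>

#include <cstdint>
#include <unordered_map>

namespace fe = cudnn_frontend;

fe::error_t
run_with_uids(fe::graph::Graph const& graph,
              cudnnHandle_t handle,
              void* x_dev,
              void* y_dev,
              void* workspace) {
    // UIDs the user assigned when building the graph (hypothetical values).
    constexpr int64_t X_UID = 1;
    constexpr int64_t Y_UID = 2;

    // The new overload takes a uid -> device pointer map instead of
    // tensor attribute handles.
    std::unordered_map<int64_t, void*> variant_pack = {
        {X_UID, x_dev},
        {Y_UID, y_dev},
    };
    return graph.execute(handle, variant_pack, workspace);
}
```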

[New API] Serialization: The Graph class now supports serialization and deserialization after the final plan is built. Serialization is currently supported only on runtime-compiled engines in the cuDNN backend, but may be extended to other engines in the future. Deserialization requires a cuDNN handle created for an identical GPU to the one the original graph/plan was built on. New samples showcasing this have been added in `samples/cpp/serialization.cpp`.

```
error_t
cudnn_frontend::graph::Graph::serialize(std::vector<uint8_t>& data) const;

error_t
cudnn_frontend::graph::Graph::deserialize(cudnnHandle_t handle,
                   std::vector<uint8_t> const& data);
```
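
A minimal sketch of the round trip, under the same assumptions (final plan already built, deserializing handle targets an identical GPU); the helper names are illustrative and error handling is elided:

```
// Sketch only: check the returned error_t in real code.
#include <cudnn_frontend.h>

#include <cstdint>
#include <vector>

namespace fe = cudnn_frontend;

std::vector<uint8_t>
save_plan(fe::graph::Graph const& built_graph) {
    std::vector<uint8_t> blob;
    auto status = built_graph.serialize(blob);  // runtime-compiled engines only, today
    (void)status;
    return blob;
}

fe::graph::Graph
load_plan(cudnnHandle_t handle, std::vector<uint8_t> const& blob) {
    fe::graph::Graph graph;
    auto status = graph.deserialize(handle, blob);  // graph can now be executed directly
    (void)status;
    return graph;
}
```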

[New API] Autotuning: If the graph allows multiple engine configs for a given topology, each of these can now be built and executed in parallel. The expected flow is that the user queries the number of plans present and spawns a new thread for each plan so they can be finalized in parallel. The APIs supporting this are as follows:

```
int64_t
Graph::get_execution_plan_count() const;

error_t
Graph::build_plan_at_index(cudnnHandle_t const &handle, int64_t index);

error_t
Graph::execute_plan_at_index(cudnnHandle_t const &handle,
                         std::unordered_map<int64_t, void*>&,
                         void* workspace,
                         int64_t plan_index) const;

int64_t
get_workspace_size_plan_at_index(int64_t plan_index) const;
```
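
A minimal sketch of that flow, with illustrative helper names; whether a single cuDNN handle may safely be shared across the worker threads is an assumption noted in the comments:

```
// Sketch only: query the plan count, finalize each candidate plan in its own
// thread, then execute one by index. Sharing one handle across threads is shown
// for brevity; consult cuDNN's thread-safety guarantees before doing this.
#include <cudnn_frontend.h>

#include <cstdint>
#include <thread>
#include <unordered_map>
#include <vector>

namespace fe = cudnn_frontend;

void
build_all_plans_in_parallel(fe::graph::Graph& graph, cudnnHandle_t handle) {
    int64_t const plan_count = graph.get_execution_plan_count();

    std::vector<std::thread> workers;
    for (int64_t i = 0; i < plan_count; ++i) {
        workers.emplace_back([&graph, handle, i]() {
            auto status = graph.build_plan_at_index(handle, i);  // finalize plan i
            (void)status;  // check the returned error_t in real code
        });
    }
    for (auto& w : workers) {
        w.join();
    }
}

void
run_plan(fe::graph::Graph const& graph,
         cudnnHandle_t handle,
         std::unordered_map<int64_t, void*>& variant_pack,
         void* workspace,
         int64_t plan_index) {
    // Workspace must be at least get_workspace_size_plan_at_index(plan_index) bytes.
    auto status = graph.execute_plan_at_index(handle, variant_pack, workspace, plan_index);
    (void)status;
}
```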

[New feature] sdpa_node now allows a ragged offset to be set on the input and output tensors.

[Bug Fix] Certain parts of the FE code used to throw exceptions even with the `DISABLE_EXCEPTION` flag set. This has been cleaned up.

[Bug Fix] For the sdpa node, cudnn now correctly returns `NOT_SUPPORTED` when s_q is not a multiple of 64, the padding mask is on, and the cudnn version is less than 9.0.0.

[Bug Fix] For the sdpa backward node, cudnn now correctly returns `NOT_SUPPORTED` when s_q is less than 64 and the cudnn version is less than 9.0.0.

[Bug Fix] Fixed an issue with the pointwise Modulo operation.

[Bug Fix] Fixed an issue in the sdpa node where the intermediate data types were wrong.

[Samples] Added a sample to showcase matmul with int8 and FP8 precisions.

[Cleanup] Python samples have moved from `samples/python` to `tests/python_fe`.

[Cleanup] Removed the `cudnn_frontend::throw_if` function.
