[SCU][CINN][Add Backend Pass Comment No.13] Add comment for UpdateBufferAxis #70271

Merged · 3 commits · Dec 19, 2024
39 changes: 36 additions & 3 deletions paddle/cinn/optim/update_buffer_axis_pass.h
@@ -21,9 +21,42 @@ namespace cinn {
namespace optim {

/**
- * Used in OptimizeExprGpu. Given Expr AST, analyze the Buffer axis, if it is
- * shared/local GPU memory and access indices are same at the same axis, then
- * it means we may not need that much memory. We set those indices to 0
+ * UpdateBufferAxisPass optimizes buffer accesses by replacing redundant
+ * access indices with zero.
+ *
+ * This pass is used in `OptimizeExprGpu`. It applies when a buffer that
+ * resides in shared or local GPU memory is accessed with the same index
+ * expression on a given axis at every access site; in that case the full
+ * extent of that axis is not actually needed.
+ *
+ * Given an Expr AST, the pass analyzes the buffer access patterns,
+ * identifies axes whose index expression is identical across all accesses
+ * to shared or local GPU memory, and rewrites those indices to zero so
+ * that the buffer can later be shrunk along those axes.
+ *
+ * Performance impact: the pass can reduce shared/local memory allocation
+ * and simplify index computation, improving memory usage and access
+ * efficiency on GPUs.
+ *
+ * Examples:
+ * 1. Consistent index access in GPU shared memory:
+ *    Input IR:
+ *      `A[i * 3][j] = ...`
+ *      `... = A[k][j]`
+ *    Output IR:
+ *      `A[i * 3][0] = ...`
+ *      `... = A[k][0]`
+ *
+ * 2. Single-dimension (flattened) access:
+ *    Input IR:
+ *      `B[i * n + j] = ...`
+ *      `... = B[k * n + j]`
+ *    Output IR:
+ *      `B[i * n + 0] = ...`
+ *      `... = B[k * n + 0]`
*/
void UpdateBufferAxisPass(ir::Expr* expr);
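As a quick illustration of how the declaration above might be called, here is a minimal usage sketch. It assumes the surrounding CINN headers and build setup; the wrapper function OptimizeSharedBuffer is a hypothetical name introduced here, not part of the library.

#include "paddle/cinn/optim/update_buffer_axis_pass.h"

// Hypothetical wrapper: runs the pass on an expression. Per the comment
// above, indices that are identical on an axis across all shared/local
// memory accesses are rewritten to 0, which can reduce the required memory.
void OptimizeSharedBuffer(cinn::ir::Expr* body) {
  cinn::optim::UpdateBufferAxisPass(body);
}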

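To make the per-axis check described in the comment concrete, here is a self-contained toy sketch. It is not the CINN implementation (the real pass works on symbolic ir::Expr indices, not strings), and the names Access and CollapsibleAxes are made up for illustration.

#include <string>
#include <vector>

// One buffer access = the index expression used on each axis, written as a
// canonicalized string (e.g. {"i * 3", "j"} for A[i * 3][j]).
using Access = std::vector<std::string>;

// Returns, per axis, whether every access uses the same index expression on
// that axis. Such an axis can have its index rewritten to 0 and the buffer
// extent along it reduced.
std::vector<bool> CollapsibleAxes(const std::vector<Access>& accesses) {
  if (accesses.empty()) return {};
  const std::size_t rank = accesses.front().size();
  std::vector<bool> collapsible(rank, true);
  for (std::size_t axis = 0; axis < rank; ++axis) {
    for (const Access& access : accesses) {
      if (access[axis] != accesses.front()[axis]) {
        collapsible[axis] = false;  // index expressions differ: keep this axis
        break;
      }
    }
  }
  return collapsible;
}

For example 1 above, the accesses {"i * 3", "j"} and {"k", "j"} yield {false, true}, so only the second axis has its index rewritten to 0.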