[SCU][CINN][Add Backend Pass Comment No.13] Add comment for UpdateBufferAxis #70271

Merged · 3 commits · Dec 19, 2024
39 changes: 36 additions & 3 deletions paddle/cinn/optim/update_buffer_axis_pass.h
@@ -21,9 +21,42 @@ namespace cinn {
namespace optim {

/**
- * Used in OptimizeExprGpu. Given Expr AST, analyze the Buffer axis, if it is
- * shared/local GPU memory and access indices are same at the same axis, then
- * it means we may not need that much memory. We set those indices to 0
+ * UpdateBufferAxisPass optimizes buffer accesses by replacing redundant
+ * access indices with zero.
+ *
+ * This pass is used in `OptimizeExprGpu`. It applies when a buffer that
+ * resides in shared or local GPU memory is accessed with the same index
+ * expression on a given axis at every access site; in that case the full
+ * extent of that axis is not actually needed.
+ *
+ * Given an Expr AST, the pass analyzes the buffer access patterns,
+ * identifies axes whose index expression is identical across all accesses
+ * to shared or local GPU memory, and rewrites those indices to zero so
+ * that the buffer can later be shrunk along those axes.
+ *
+ * Performance impact: the pass can reduce shared/local memory allocation
+ * and simplify index computation, improving memory usage and access
+ * efficiency on GPUs.
+ *
+ * Examples:
+ * 1. Consistent index access in GPU shared memory:
+ *    Input IR:
+ *      `A[i * 3][j] = ...`
+ *      `... = A[k][j]`
+ *    Output IR:
+ *      `A[i * 3][0] = ...`
+ *      `... = A[k][0]`
+ *
+ * 2. Single-dimension (flattened) access:
+ *    Input IR:
+ *      `B[i * n + j] = ...`
+ *      `... = B[k * n + j]`
+ *    Output IR:
+ *      `B[i * n + 0] = ...`
+ *      `... = B[k * n + 0]`
*/
void UpdateBufferAxisPass(ir::Expr* expr);
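As a quick illustration of how the declaration above might be called, here is a minimal usage sketch. It assumes the surrounding CINN headers and build setup; the wrapper function OptimizeSharedBuffer is a hypothetical name introduced here, not part of the library.

#include "paddle/cinn/optim/update_buffer_axis_pass.h"

// Hypothetical wrapper: runs the pass on an expression. Per the comment
// above, indices that are identical on an axis across all shared/local
// memory accesses are rewritten to 0, which can reduce the required memory.
void OptimizeSharedBuffer(cinn::ir::Expr* body) {
  cinn::optim::UpdateBufferAxisPass(body);
}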

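To make the per-axis check described in the comment concrete, here is a self-contained toy sketch. It is not the CINN implementation (the real pass works on symbolic ir::Expr indices, not strings), and the names Access and CollapsibleAxes are made up for illustration.

#include <string>
#include <vector>

// One buffer access = the index expression used on each axis, written as a
// canonicalized string (e.g. {"i * 3", "j"} for A[i * 3][j]).
using Access = std::vector<std::string>;

// Returns, per axis, whether every access uses the same index expression on
// that axis. Such an axis can have its index rewritten to 0 and the buffer
// extent along it reduced.
std::vector<bool> CollapsibleAxes(const std::vector<Access>& accesses) {
  if (accesses.empty()) return {};
  const std::size_t rank = accesses.front().size();
  std::vector<bool> collapsible(rank, true);
  for (std::size_t axis = 0; axis < rank; ++axis) {
    for (const Access& access : accesses) {
      if (access[axis] != accesses.front()[axis]) {
        collapsible[axis] = false;  // index expressions differ: keep this axis
        break;
      }
    }
  }
  return collapsible;
}

For example 1 above, the accesses {"i * 3", "j"} and {"k", "j"} yield {false, true}, so only the second axis has its index rewritten to 0.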