[JIT] Kernel Creation, Improving memory-planner, etc... #32

Merged
merged 59 commits into from
Aug 27, 2024


hikettei
Owner

@hikettei hikettei commented Aug 23, 2024

  • Simplify the scheduling process of symbolic shaped/strided operation
  • Implement ngroup algorithm proposed in the issue
    • Single Loop Single Kernel Rule
    • Reuse tmpvars across different function
    • float _tmp_xxx
    • BugFix: Reinitialize allocation (copy from the original code!)
    • TODO: Identify the Args
    • Manage TmpVars with a Reference_Counter
    • Can Vectorize/Unrolling/Tiling be applied?
  • Refactor for the render: implement metadata
  • Compile threefry2x32
    • But when the input is dynamic, it loses the randomness
  • CUSTOM/RAND
  • (caten (!matmul (make-tensor `(10 3)) (!matmul (make-tensor `(3 5)) (make-tensor `(5 7)))))
  • (make-backend :clang :opt xxx)
  • Export to ./.caten
  • JIT Kernel Test Update
  • (First priority) TODO: Take a look into symbolic compilation and the ISL-related parts!
    • Do something about the Symbolic handling in render-isl-aref
    • forall uses lparallel
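
For reference, the Threefry-2x32 block function named in the list above can be sketched in plain C as follows. This is a hand-written sketch of the 20-round variant using the publicly documented rotation constants and key-schedule constant, not the kernel that Caten generates; the names `rotl32` and `threefry2x32_20` are hypothetical and do not come from this PR.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit word left by r bits (r must be in 1..31). */
static uint32_t rotl32(uint32_t x, unsigned r) {
  assert(r > 0 && r < 32);
  return (x << r) | (x >> (32u - r));
}

/* Sketch of Threefry-2x32 with 20 rounds.
   key and ctr are two 32-bit words each; out receives two 32-bit words. */
void threefry2x32_20(const uint32_t key[2], const uint32_t ctr[2],
                     uint32_t out[2]) {
  static const unsigned ROT[8] = {13, 15, 26, 6, 17, 29, 16, 24};
  /* Extended key: the third word is the XOR of the key words and a constant. */
  const uint32_t ks[3] = {key[0], key[1], key[0] ^ key[1] ^ 0x1BD11BDAu};

  uint32_t x0 = ctr[0] + ks[0];
  uint32_t x1 = ctr[1] + ks[1];

  for (unsigned round = 0; round < 20; round++) {
    /* One mix step: add, rotate, xor. */
    x0 += x1;
    x1 = rotl32(x1, ROT[round % 8]);
    x1 ^= x0;
    /* Inject the rotated key schedule after every fourth round. */
    if (round % 4 == 3) {
      unsigned s = round / 4 + 1; /* injection index 1..5 */
      x0 += ks[s % 3];
      x1 += ks[(s + 1) % 3] + s;
    }
  }
  out[0] = x0;
  out[1] = x1;
}
```

For a fixed key the function is a bijection on the counter, so calling it with counters (0, 0), (1, 0), (2, 0), ... yields distinct output pairs; a random tensor can be filled by giving each element its own counter value.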

@hikettei
Owner Author

hikettei commented Aug 24, 2024

The generated softmax kernel is getting more refined. In addition, val_1 and val_11 could be replaced with tmpvars because they are independent of c0.
(Is it OK to use metadata such as :reduction?)

CATEN> (caten (!softmax (make-tensor `(3 3))))
WARNING: WIP: MaxOp
Compiled[e4]:
#include <math.h>
#include <stdint.h>
#define boolean _Bool
#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

/*
Arrays:
  - val_11[float32]: (3 1) // IO, TMP
  - val_0[float32]: (3 3) // IO, USER
  - val_1[float32]: (3 1) // IO, TMP
*/
void main9327315_e4_k0(float* val_11, float* val_0, float* val_1);
void main9327315_e4_k0(float* val_11, float* val_0, float* val_1) {
  for(int c0=0;(c0<=2);c0+=1) {
    val_1[c0+0] = 0.0;
    for(int c1=0;(c1<=2);c1+=1) {
      val_1[c0+0] = max(val_1[c0+0], val_0[3*c0+c1]);
    }
    val_11[c0+0] = 0.0;
    val_1[c0+0] = -(val_1[c0+0]);
    for(int c1=2;(c1<=4);c1+=1) {
      val_0[3*c0+(c1-2)] = exp2(((val_0[3*c0+(c1-2)]+val_1[c0+0])*1.442695));
      val_11[c0+0] = (val_11[c0+0]+val_0[3*c0+(c1-2)]);
    }
    val_11[c0+0] = 1/(val_11[c0+0]);
    for(int c1=4;(c1<=6);c1+=1) {
      val_0[3*c0+(c1-4)] = (val_0[3*c0+(c1-4)]*val_11[c0+0]);
    }
  }
}

@hikettei
Owner Author

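The purpose of this PR is to optimize the handling of temporary regions and to lay the groundwork for custom kernels, so let's forget about reduction.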

@hikettei hikettei changed the title from "Feature/tmpvar jit" to "[JIT] Kernel Creation, Improving memory-planner, etc..." Aug 24, 2024
@hikettei
Owner Author

hikettei commented Aug 27, 2024

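To be written up somewhere.
Constraints on Tensor/Loop Bound

  • Shape: Symbolic/Tensor/Fixnum are OK
  • Stride: Symbolic/Fixnum are OK (auto-generated as a rule)
  • Permute: all Fixnum
  • View: only By must be a Fixnum (or, a View that includes By is not scheduled)
  • In exchange, either allow code to be written directly in a Lisp-like DSL, or permit Lexicographical Memory Accessing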
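
The Stride and Permute constraints above can be illustrated with a small sketch. It assumes row-major layout, which is consistent with the `3*c0+c1` indexing in the softmax kernel earlier in this thread but is not stated in the list itself; the helper names `row_major_strides`, `permute_axes`, and `flat_index` are hypothetical and not part of Caten.

```c
#include <assert.h>
#include <stddef.h>

/* Derive row-major strides from a shape: the last axis is contiguous.
   Assumption: this mirrors what "auto-generated" strides would look like. */
void row_major_strides(const size_t *shape, size_t rank, size_t *stride) {
  assert(rank > 0);
  size_t acc = 1;
  for (size_t i = rank; i-- > 0;) {
    stride[i] = acc;
    acc *= shape[i];
  }
}

/* Apply a fixed (Fixnum-only) axis order to a shape or stride vector. */
void permute_axes(const size_t *order, size_t rank, const size_t *in,
                  size_t *out) {
  for (size_t i = 0; i < rank; i++) {
    assert(order[i] < rank);
    out[i] = in[order[i]];
  }
}

/* Flat buffer offset of a multi-dimensional index under the given strides. */
size_t flat_index(const size_t *idx, const size_t *stride, size_t rank) {
  size_t offset = 0;
  for (size_t i = 0; i < rank; i++) {
    offset += idx[i] * stride[i];
  }
  return offset;
}
```

Because the permutation is known at compile time, a permuted view only reorders the stride vector; the shape entries themselves could still be symbolic without affecting that reordering.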

@hikettei hikettei marked this pull request as ready for review August 27, 2024 11:11
@hikettei
Owner Author

Merging this since the SBCL checks passed.

@hikettei hikettei merged commit 069a40c into main Aug 27, 2024
3 checks passed