Releases: hikettei/Caten
v0.2
What's Changed
- [Refactor] Reimplement Everything (ShapeTracker+New JIT+New AutoScheduler) by @hikettei in #146
- Revert "[Refactor] Reimplement Everything (ShapeTracker+New JIT+New AutoScheduler)" by @hikettei in #151
- Reimplement JIT by @hikettei in #154
- Update Readme.md by @hikettei in #175
- Refactor: fold-constant without recursive function call by @hikettei in #178
- Add: caten/benchmark for profiling the compilation time in CI by @hikettei in #179
- feat: add softshrink activation by @abourramouss in #177
- CI: Rename to benchmark by @hikettei in #181
- CI: AUTO_SCHEDULER + Dynamic is broken by @hikettei in #183
- op logsigmoid + gitignore by @abourramouss in #184
- Opt: 2x times speed up on lowerer, Running external-simplifiers in the small groups by @hikettei in #186
- Opt: remove verify-graph for 10x time faster lowerer by @hikettei in #188
- BugFix: Removing extra node-writes purged by memory-planner by @hikettei in #190
- defsimplifier is O(N) by @hikettei in #189
- selu, softmin and hardtanh by @abourramouss in #185
- remove print by @hikettei in #194
- Refactor: Integrate the local and global memory-planner by @hikettei in #193
- regression tests for ShapeTracker by @hikettei in #199
- BugFix: Permute+Reshape(Uprank) Shape Tracker Merging for ConvND by @hikettei in #202
- Accuracy Testing for LayerNorm/RMSNorm/MultiHeadAttention by @hikettei in #205
- Refactor: Module supports multiple outputs + view ops by @hikettei in #212
- BugFix: ExprGraph(x = y + y) by @hikettei in #214
- merge-views: Reverse masks for NIL by @hikettei in #218
- bugfix: rotate the permutation in schedule.lisp by @hikettei in #219
- Lowerer: BATCH_SIZE=1 by @hikettei in #223
- Refactor: MultiHeadAttention by @hikettei in #226
- a lil tweak on MHA+Use UINT64/INT64 in default by @hikettei in #227
- Docs: Support English by @hikettei in #229
- Support Full Symbolic JIT (Allow duplicated seen for ds) by @hikettei in #230
- opt: don't lower the cached schedule-item (3x faster jit compiler) by @hikettei in #232
- Enhancement: Support Parallel Compilation by @hikettei in #236
- Feat: GPT2 Inference Infrastructure (GGUF, StateDict, BPE Tokenizer) by @hikettei in #237
- refactor: make sure two groups have the same rank before merging by @hikettei in #239
- Refactor: Memory Planner Newid creates seen by @hikettei in #241
- BugFix: Tensor Shaped Index-Components by @hikettei in #240
- Feat: KVCache GPT2 by @hikettei in #243
- CI: sbcl-bin/latest by @hikettei in #246
- BugFix: Fix structures to be visible at compile time from `trivia` library by @elderica in #248
- ShapeTracker: tr-stride instead of tr-shape-for-stride + tr-permute by @hikettei in #245
- Load GPT2 Parameters by @hikettei in #249
- BugFix: Print `double-float` number in the correct way by @elderica in #253
- SERIALIZE=1 to no fusion by @hikettei in #254
- Scheduler: bfs topological sort after scheduling by @hikettei in #255
- feat: Rotatory Positional Encoding by @abourramouss in #215
- [Refactor] Brand New Scheduler by @hikettei in #257
- Feat: with-inference-mode by @hikettei in #262
- Bring back Facet APIs by @hikettei in #263
- ShapeTracker: Masked Reshape by @hikettei in #266
- Feat: defcall by @hikettei in #268
- BugFix: MHA in rtol<1e-5 by @hikettei in #270
- CI: timeout-minutes=25 by @hikettei in #271
- Opt: Faster Symbolic Lowerer by @hikettei in #275
- more tests on symbolic by @hikettei in #277
- Patch: :ALLOCATE loads a buffer from state-dict by @hikettei in #280
- BugFix: JIT=1 should support !assign to get KVCache worked. by @hikettei in #282
- gguf: force synchronize the offset between fastio and sbcl by @hikettei in #286
- numerical accuracy stuff by @hikettei in #287
- ONNX Support in Caten by @hikettei in #288
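Several entries above concern simplifier performance (#178's fold-constant without recursive function calls, #189's O(N) defsimplifier). As an illustration only, here is a minimal worklist-based constant-folding pass that avoids recursion; the node layout and helper names are invented for this sketch and are not Caten's actual IR or API:

```python
# Minimal sketch of non-recursive constant folding over an expression
# graph, in the spirit of replacing recursive descent with a worklist.
# (Illustrative only: node tuples and names are hypothetical.)
import operator

OPS = {"+": operator.add, "*": operator.mul}

def fold_constants(nodes):
    """nodes: dict id -> ("const", value) or (op, lhs_id, rhs_id).
    Iteratively folds any op whose operands are both constants."""
    worklist = list(nodes)
    while worklist:
        nid = worklist.pop()
        node = nodes[nid]
        if node[0] in OPS:
            op, l, r = node
            lhs, rhs = nodes[l], nodes[r]
            if lhs[0] == "const" and rhs[0] == "const":
                nodes[nid] = ("const", OPS[op](lhs[1], rhs[1]))
                # Re-scan remaining ops: folding one node may make
                # its users foldable (a real pass would track users).
                worklist = [k for k in nodes if nodes[k][0] in OPS]
    return nodes

graph = {
    "a": ("const", 2),
    "b": ("const", 3),
    "c": ("+", "a", "b"),   # 2 + 3
    "d": ("*", "c", "c"),   # (2 + 3) * (2 + 3)
}
fold_constants(graph)
print(graph["d"])  # → ('const', 25)
```

The re-scan on every fold keeps the sketch short at the cost of worst-case quadratic work; a user-tracking worklist would give the linear behavior the PRs above aim for.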
Full Changelog: v0.1...v0.2
v0.1
What's Changed
- Enhancement: Get AOT Compilation Worked Again by @hikettei in #73
- BugFix: Fix for recursive deps by @hikettei in #74
- Pre-compiled tensor initializers for NN Ops (+ IDIV) by @hikettei in #77
- Optimize: Simplify duplicated Load(A) by @hikettei in #79
- O(n) Embedding by @hikettei in #80
- Enhancement: External/GGUF by @hikettei in #84
- [WIP] ANSI-COLOR, CLI Tool, GPT2 Inference, Enhancements on Logger, etc... by @hikettei in #88
- [Refactor] type-safe air node by defnode by @hikettei in #91
- [Refactor] FastGraph: 10x times faster Pattern Matcher by @hikettei in #101
- [BugFix] Eliminate Known issues in Simplifier by @hikettei in #105
- Allowing (forward module tensor nil) by @hikettei in #106
- WIP: PyTorch Interop for writing tests by @hikettei in #108
- Various enhancements and refactorings on caten/ajit by @hikettei in #110
- Enhancement: MultiExpr in the same domain by @hikettei in #114
- retrying with maximize_coincidence by @hikettei in #115
- Post-MultiExpr Fusion in the equivalent domain by @hikettei in #116
- JIT: Post-MultiExpr Optimization between different iteration spaces (Final) by @hikettei in #117
- Enhancement: DOT=1 by @hikettei in #119
- !reshape always use !contiguous by @hikettei in #120
- New Linalg Ops: Einsum/Tril/Triu etc... by @hikettei in #123
- Documentation by @hikettei in #126
- Support INF/-INF/NaN by @hikettei in #128
- a lil improvement and bugfix to the scheduler by @hikettei in #130
- maximize_band_depth and Loop Fusion by @hikettei in #133
- [Prepreq] Revisit the algorithm in transformer.lisp by @hikettei in #138
- Fix: Propagate Scalar anywhere by @hikettei in #139
Full Changelog: v0.0...v0.1
v0.0
Features
- Fundamentals of building a compiler based on a polyhedral compiler and a pattern matcher
  - `caten/ajit`: a polyhedral compiler based on ISL (and more!), targeting multiple hardware. Currently supports basic loop-oriented optimizations, including Loop Fusion, Loop Unrolling (automatic parallelization and vectorizing), and Permuting.
  - `caten/air`: an optimizing pattern matcher based on trivia.
- Implement a MemoryPlanner.
  - All unary activation ops are always in-place.
- VM
  - Lisp VM (for testing)
- JIT
  - Currently supports Clang and Metal JIT.
- Basic Neural Network Operations (Linear, ConvND, Activations, Norms, etc.)
- Fundamental Tensor Operations (mathematical functions, arithmetic, reduce, broadcast, view, reshape, permute, etc.)
- Implements reverse-mode Autograd (still unstable).
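`caten/air` is described above as an optimizing pattern matcher based on trivia. As a rough analogy only (the rule shapes and names below are invented for illustration; Caten's real simplifier works over its own node IR, not nested tuples), a rewrite-rule simplifier looks like:

```python
# Toy rewrite-rule simplifier, loosely analogous to a trivia-style
# pattern matcher over expression nodes. (Illustrative names only.)
def simplify(expr):
    """expr: nested tuples like ("*", x, 1). Applies rules bottom-up."""
    if not isinstance(expr, tuple):
        return expr
    op, *args = expr
    args = [simplify(a) for a in args]
    # Rules: x * 1 -> x, x + 0 -> x, 0 * x -> 0
    if op == "*" and args[1] == 1:
        return args[0]
    if op == "+" and args[1] == 0:
        return args[0]
    if op == "*" and args[0] == 0:
        return 0
    return (op, *args)

print(simplify(("+", ("*", "x", 1), 0)))  # → 'x'
```

Matching structurally on node shape, as trivia does for Lisp data, lets each rule stay a one-line pattern rather than hand-written traversal code.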
For the next release...
- (Metal JIT) Complete the Metal JIT by implementing BEAM Search to determine the optimal `global_size`/`local_size`. #44
- (AJIT) Implement `Hide Latency Optimization` and `Index Computation Simplification`.
- (AIR) Implement an `Attr` class for type safety. #70
- (NN) Add tests for `ConvND` and Norms
- (CI) Add a pipeline for benchmarking (my goal is to beat PyTorch)
- (AJIT) Gemm-specific optimization (8x8 gemm tiling)
- (AJIT) `packed-funcall` -> `simd` mutation for Clang and Metal
- (CI) Implement YoLoV3 and run it on CI
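Much of the AJIT roadmap rests on loop-oriented optimization; Loop Fusion, already listed among the v0.0 features, can be pictured as merging two elementwise loops so the intermediate buffer is never materialized. A plain-Python sketch of the idea (illustrative only, not the polyhedral machinery itself):

```python
# Before fusion: two loops, intermediate array t is materialized.
def unfused(a, b):
    t = [a[i] + b[i] for i in range(len(a))]   # loop 1: t = a + b
    return [t[i] * 2 for i in range(len(t))]   # loop 2: out = t * 2

# After fusion: one loop, no intermediate buffer.
def fused(a, b):
    return [(a[i] + b[i]) * 2 for i in range(len(a))]

a, b = [1, 2, 3], [10, 20, 30]
assert unfused(a, b) == fused(a, b) == [22, 44, 66]
```

Beyond saving a buffer, the fused loop touches each element once, which is what makes fusion a prerequisite for the unrolling and vectorizing passes listed above.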
What's Changed
- [CI Debugging+Enhancement] All mathematical functions should (aim to) satisfy the 1.0ULP accuracy by @hikettei in #4
- [wip] refactors, implemented nn ops etc by @hikettei in #5
- [WIP] polyhedral compiler passing all tests by @hikettei in #8
- [Feature] Activations/ReduceMax/ReduceMin by @hikettei in #11
- [WIP] Establish APIs, NN Ops, Control Flow etc... by @hikettei in #12
- Threefry2x32 by @hikettei in #13
- AOT Compilation, Shape Inference by @hikettei in #15
- [Draft] Make JIT completely equivalent to VM (Forward Mode) by @hikettei in #16
- [WIP] (Backward Mode) Make JIT completely equivalent to the VM Mode by @hikettei in #18
- [JIT] Kernel Creation, Improving memory-planner, etc... by @hikettei in #32
- Patch for CCL and ECL by @hikettei in #37
- gc-reachable isl objects by @hikettei in #39
- [Feature] New Backends: Metal and Lisp by @hikettei in #42
- [Enhancements] (Much Better) Loop Fusion and Packed Funcall Mutation (for unrolling/vectorizing/tiling). by @hikettei in #49
- [WIP] Fix invalid shape inference in composed gemm and index-components by @hikettei in #51
- Revisit: fix %where by @hikettei in #53
- CI: Improvements on CI and Testing by @hikettei in #54
- CI: Testing against all nn ops by @hikettei in #66
- [Refactor] Introduce Memory Optimization based on MIP by @hikettei in #69
Full Changelog: https://github.com/hikettei/Caten/commits/v0.0