This repository is still in the early stages of development and includes many experimental approaches. Please treat it as a place where I experiment with my ideas. Do not use it in production under any circumstances.
Caten = Compile+AbstracTENsor
Caten is an experimental deep learning compiler. Our goal is to create a solution that’s as simple as tinygrad yet as flexible as TVM—all while extending the possibilities of interactive programming into the realm of AI.
We're looking for collaborators! Please join our Discord and let me know if you'd like to contribute!
Caten is still under development, but it aims to support a wide range of models in the future—from image processing to text generation, and vision language models! Some models are already up and running.
We have two doc files that explain how the Caten compilation pipeline works:
- End-to-End Example: shows how the end-to-end compilation pipeline works.
- Getting Started: an intro to Caten.
$ BACKEND=CLANG PARALLEL=8 ./roswell/caten.ros llm-example --model "gpt2" --prompt "Hello" --max-length 100
Give the GPT2 demo a try! You can pass compilation settings through environment variables.
For example, setting BACKEND=CLANG enables JIT compilation, while JIT_DEBUG >= 2 allows you to view the schedule and the generated kernels. Setting PARALLEL=8 divides the ScheduleGraph and compiles it in parallel.
You may still find the tokens/ms rate slow: we have not yet reached the stage of implementing an AutoScheduler to accelerate kernel performance, or GPU support. Once our IR matures enough to handle a wide range of deep learning models, we plan to focus on speeding things up!
Caten is capable of generating the necessary kernels independently!
Instead of relying on OpenBLAS bindings or hand-optimized CUDA kernels, Caten avoids abstractions that would restrict us to specific libraries.
Let’s take Matmul+Activation Fusion as an example to illustrate this approach:
(in-package :caten-user)
(pprint-graph
 (tensor-graph (!relu (!matmul (make-tensor `(a b)) (make-tensor `(b c))))))
When you set BACKEND=CLANG, the graph is compiled to an external language. You can view the generated code by specifying JIT_DEBUG >= 2.
Give it a try in your REPL!
(in-package :caten-user)
;; (setf (ctx:getenv :BACKEND) "CLANG") to set globally
(ctx:with-contextvar (:BACKEND "CLANG")
  (caten (!relu (!matmul (make-tensor `(a b)) (make-tensor `(b c))))))
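To also print the schedule and the generated kernels from the REPL, JIT_DEBUG can presumably be set the same way as BACKEND. The sketch below is an assumption rather than documented usage: it supposes that JIT_DEBUG is accepted by the same ctx:getenv accessor that the comment above uses for BACKEND, and that caten is already loaded with (ql:quickload :caten). Note that setf changes the setting globally for the session.

```lisp
(in-package :caten-user)

;; Assumption: JIT_DEBUG is a context variable that can be set like BACKEND.
;; A value of 2 or more is what the text above says reveals the generated code.
(setf (ctx:getenv :JIT_DEBUG) 2)

;; Same Matmul+ReLU compilation as above, now with debug output enabled.
(ctx:with-contextvar (:BACKEND "CLANG")
  (caten (!relu (!matmul (make-tensor `(a b)) (make-tensor `(b c))))))
```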
We’ve adopted a RISC-style architecture. Ultimately, everything in Caten boils down to just 26 composable primitive ops.
When you replace tensor-graph with tensor-lowered-graph, you’ll see exactly what we mean! And by using ->dot instead of pprint-graph, you can visualize that graph right in your browser!
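As a minimal sketch, the two substitutions can be applied to the Matmul+ReLU graph from the fusion example. It only recombines names already mentioned in this README (tensor-graph, tensor-lowered-graph, pprint-graph, ->dot). It assumes that ->dot accepts the same graph object that pprint-graph does, and that caten is already loaded with (ql:quickload :caten).

```lisp
(in-package :caten-user)

;; a, b and c are symbolic dimensions, as in the fusion example.

;; Lowered view: print the graph after lowering it to primitive ops.
(pprint-graph
 (tensor-lowered-graph (!relu (!matmul (make-tensor `(a b)) (make-tensor `(b c))))))

;; Browser view: assumed to take the same graph that pprint-graph prints.
(->dot
 (tensor-graph (!relu (!matmul (make-tensor `(a b)) (make-tensor `(b c))))))
```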
Finally, our lazy evaluation doesn’t make debugging any harder. If you want to check an intermediate result, just insert proceed at any point; it won’t break the computation graph!
;; These two forms are equivalent
(proceed (!sin (!cos (ax+b `(3 3) 1 0))))
(proceed (!sin (proceed (!cos (ax+b `(3 3) 1 0)))))
(in-package :caten-user)

(defsequence MLP (in-features hidden-dim out-features &key (activation #'!relu))
  (Linear in-features hidden-dim)
  (asnode activation)
  (Linear hidden-dim hidden-dim)
  (asnode activation)
  (Linear hidden-dim out-features))

(defun build-mlp-model ()
  (let* ((model (MLP 64 32 16))
         (outputs (call model (make-tensor `(b 64) :from :x)))
         (loss (!cross-entropy (!softmax outputs) (make-tensor `(b 16) :from :y)))
         (runner (caten loss)))
    (values runner (hook-optimizers runner (SGD :lr 1e-3)))))

(defun train ()
  (multiple-value-bind (runner optimizers) (build-mlp-model)
    (dotimes (i 10)
      (forward runner `(:x . ,(rand `(10 64))) `(:y . ,(rand `(10 16))) `(b . 10)) ;; replace with mnist dataloader
      (backward runner)
      (mapc #'step-optimizer optimizers)
      (mapc #'zero-grad optimizers))))
Although our focus is still on inference, we plan to support training models. (Still experimental and unstable.) I am not sure our backward scheduler can scale to larger and more complicated graphs. :(
- Install Roswell and a suitable IDE. (If unsure, Emacs or Lem is recommended)
- Install ISL (Integer Set Library) for fast kernel generation.
- Install libyaml for YAML parsing and emitting, if it is not already installed.
- Install Qlot
- Check out getting-started.lisp
$ git clone git@github.com:hikettei/Caten.git
$ cd Caten
$ qlot install
$ qlot exec ros run
> (ql:quickload :caten)
> (in-package :caten-user)
> (proceed (!randn `(3 3)))
- Join our Discord Server.
- Check out our roadmap.
- Create a PR.
Caten is a project that started only a few months ago. We are currently in the stage of building a solid foundational library. Here’s what we’re looking for:
- Feature additions with tests (e.g., new activations, unimplemented matrix operations)
- Bug reports and additional tests
- Refactoring of the core compiler components
- Improving the documentation
- etc.
Before contributing, please note that there is no linter here. Make an effort to adhere to the Google Common Lisp Style Guide. Changes that do not follow it will likely be rejected in review.
- Generative AI
- GPT2
- Llama3
- TinyLLAMA
- StableDiffusion
- QwenVL2
- Classification
- MobileNetV2
- MobileNetV3
- ResNet18/ResNet34/ResNet50
- VIT_B_16
- Segmentation
- CenterNet
- Detection
- YOLOv3
- YOLOv7
- Common Lisp Frontend (caten/api)
- ONNX (caten/onnx)
- GGUF (caten/gguf)
- Support Dequantization from GGUF
- Support QOPs
- Autodiff
- Fast Autodiff
- Support Training (But still limited)
- Distributed Training
- LISP VM (BACKEND=LISP)
- LISP JIT (BACKEND=NATIVE)
- CLANG JIT (BACKEND=CLANG)
- METAL (BACKEND=METAL)
- WebGPU (BACKEND=WEBGPU)
- CUDA (BACKEND=CUDA)
- LLVM (BACKEND=LLVM)
- OpenCL (BACKEND=OPENCL)
- Finish AutoScheduler (Polyhedral Compiler + BEAM Search)
- LISP RUNTIME
- Exported Lisp Runtime (BACKEND=NATIVE)
- Exported to dylib (BACKEND=CLANG)
- JavaScript Runtime (BACKEND=WEBGPU)
Before running the test suite, install Python, NumPy, and PyTorch by running make install_extra. If no version is specified, install the latest one.
$ make install_extra # extra dependencies for running tests
$ make test