
[Refactor] Brand New Scheduler #257

Merged
merged 90 commits into main
Nov 25, 2024

Conversation

@hikettei hikettei commented Nov 22, 2024

Workload (Final)

  • Check the kernel manually for batch_norm/layer_norm/feed_forward/rope/attention/softmax/chunk/Transformer (add tests if needed)
  • Embedding+Embedding, Softmax+Softmax
  • Fuse Chunk/Attention, Failing Case
  • GPT2 No Segv
  • GPT2 in CI
  • Almost there; fix the schedule and the schedule cache, then merge
  • no-grad won't change the kernel (open an issue first)
  • realize regardless of pause/backward (create a table in ctx)
  • optimize the kernel for backward

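Several of the checklist items above (Embedding+Embedding, Softmax+Softmax, "Fuse Chunk/Attention") come down to whether adjacent ops land in a single generated kernel. A minimal sketch of why elementwise fusion matters, with plain Python loops standing in for generated kernels; the function names are illustrative and not Caten's IR:

```python
import math

def unfused(xs):
    # Two "kernels": one loop per op, with an intermediate buffer between them.
    tmp = [math.sin(x) for x in xs]    # kernel 1: Sin
    return [t * 2.0 for t in tmp]      # kernel 2: Mul

def fused(xs):
    # One "kernel": both ops in a single loop, no intermediate buffer.
    return [math.sin(x) * 2.0 for x in xs]

# Fusion must not change results, only the number of loops/buffers.
assert unfused([0.0, 1.0, 2.0]) == fused([0.0, 1.0, 2.0])
```

The manual kernel checks listed above (softmax, attention, etc.) are essentially verifying that the scheduler produces the `fused` shape without breaking this equivalence.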
Memo

  • GPT2 Test in CI
  • final: update group-mergeable-p
  • Chunk=1, Attention=4
  • Test: Matmul(GeLU(X), X) Matmul(X, GeLU(X)) as well as GeLU=Sin
  • Fix for JIT: it is OK to realize save-for-backward across pause/backward
  • RoPE JIT=1
  • BatchNorm JIT=1
  • Transformer
    • SERIALIZE=1 worked
    • SERIALIZE=0? -> no segv, it is working!
  • ideally scheduler.lisp < 400 lines
  • docs for pprint-graph
  • move pprint-graph to caten/aasm
  • break the graph first?
    • patch for fast dynamic shape first. (Load is scheduled multiple times)
    • and chase down to unfused views? (merge views)
    • LayerNorm = 1 kernel
    • The new scheduler fuses kernels too aggressively; loop fission by the ISL scheduler is NEEDED!

Workload

  • BatchNorm/LayerNorm/Softmax/... = 1 kernel each
  • RoPE with JIT=1 in 2 kernels, passing the tests
  • Fix for dynamic shape compilation (faster!)
  • Update: Scalar <-> Scalar Pointer Mutation
  • Update: Matmul(Matmul(..., ...))
  • Remove old scheduler
  • Update the Lowerer
  • Softmax (10)
  • Refactor
  • TODO: automatically determine PARALLEL; enable PARALLEL by default
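For the dynamic-shape item above, the usual way such compilation gets faster is to keep symbolic dimensions as runtime kernel arguments, so each kernel is compiled once and reused across shapes rather than re-scheduled per shape (the memo notes Load being scheduled multiple times). A hedged sketch of that caching idea; the cache, names, and "compilation" here are illustrative, not Caten's actual compiler API:

```python
compiled_cache = {}

def get_kernel(op_name):
    # "Compile" at most once per op; the size `n` stays symbolic
    # until call time instead of being baked into the kernel.
    if op_name not in compiled_cache:
        def kernel(buf, n):
            for i in range(n):          # n is a runtime argument
                buf[i] = buf[i] * 2.0
        compiled_cache[op_name] = kernel
    return compiled_cache[op_name]

# One compiled artifact serves every shape:
k = get_kernel("mul2")
a = [1.0, 2.0]
k(a, len(a))
b = [1.0, 2.0, 3.0]
k(b, len(b))
assert get_kernel("mul2") is k  # no recompilation for the new shape
```

The speedup comes from the cache hit: changing the input shape changes only the `n` argument, never the compiled kernel.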

Lowerer Workload

  • Update how node-reads/node-writes are used for the schedule-graph
  • Update Scalar Pointer Manipulation
    • Fix for !rand '(a b)
    • as well as randn
  • Refactor exprify.lisp
  • Update Scalar Pointer Manipulation
  • (!softmax `(10)) (Restart and create an additional loop)
  • Transformer
  • Transformer = 29 kernels is a must

@hikettei hikettei changed the title reimplementing scheduler.lisp? Brand new scheduler Nov 23, 2024
@hikettei hikettei changed the title Brand new scheduler Brand New Scheduler Nov 23, 2024
@hikettei hikettei changed the title Brand New Scheduler [Refactor] Brand New Scheduler Nov 23, 2024
@hikettei hikettei marked this pull request as ready for review November 25, 2024 06:41
@hikettei hikettei merged commit 05819c9 into main Nov 25, 2024
6 checks passed