Milestone: Refactor on caten/air graph #90

Closed
13 of 15 tasks
hikettei opened this issue Sep 16, 2024 · 6 comments

hikettei commented Sep 16, 2024

Why

  • Compiling GPT-2 is extremely slow (some O(N^2) algorithms remain).
  • There are probably O(N^2) algorithms in places where they are not needed.
  • There are potential bugs in symbolic compilation.
  • Lower->Module is still unstable.

Node

  • Introduce an Attribute class to make an air node type-safe.
(defattr :Add (JITGraph) ;; (doing defclass (Attribute) ...)
  ((reduce t)
    (_reads ... )))
(defparameter *keyword->class* (make-hash-table))
;; :Add -> AddAttr
(defun make-node (...)
  (make-instance (gethash :Add *keyword->class*) :reduce ...))
  • Make it optional at first, then mandatory in the end.
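
The keyword-to-class idea above can be sketched in plain CLOS. This is only an illustration under stated assumptions, not the caten/air implementation: `attribute`, `defattr-sketch`, `add-attr`, `attr-reduce`, and `make-attr-sketch` are hypothetical names, and the `(JITGraph)` argument and `_reads` slot from the snippet above are left out because their meaning is not specified here.

```lisp
;; Hypothetical sketch: none of these names are taken from caten/air.
(defclass attribute () ()
  (:documentation "Base class for all typed node attributes."))

(defparameter *keyword->class* (make-hash-table :test #'eq)
  "Maps a node type keyword such as :Add to its attribute class name.")

(defmacro defattr-sketch (keyword class-name slots)
  "Define CLASS-NAME as a subclass of ATTRIBUTE and register it under KEYWORD."
  `(progn
     (defclass ,class-name (attribute) ,slots)
     (setf (gethash ,keyword *keyword->class*) ',class-name)
     ',class-name))

;; :Add -> add-attr, with a single typed slot filled from the :reduce initarg.
(defattr-sketch :Add add-attr
  ((reduction :initarg :reduce :initform nil :reader attr-reduce)))

(defun make-attr-sketch (type &rest initargs)
  "Instantiate the attribute class registered for TYPE, or signal an error."
  (let ((class (gethash type *keyword->class*)))
    (unless class
      (error "No attribute class is registered for ~S" type))
    (apply #'make-instance class initargs)))
```

With this shape, `(make-attr-sketch :Add :reduce t)` returns an `add-attr` instance, while an unregistered keyword fails at construction time instead of producing a malformed node. Making attributes optional first and mandatory later could then amount to turning the unregistered-keyword case from a fallback into an error.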

Data Structure (Slow)

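  • id->value and id->users are slow
  • Stop doing (car (node-writes ...))
  • Purging isolated graphs is slow
    • Sorting only needs to happen once, right before AVM execution
  • Consider the three stages of compilation
    • Before passing through AJIT
      • VM execution is not needed, so there is no need to sort
      • id->value and value->id need to be fast
      • Is it really necessary to apply the Pattern Matcher to a lowered Module multiple times?
      • Benchmark which of id->value and purge-isolated-graph is the slow one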
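
One way the id->value and id->users lookups could be made fast is to build hash-table indexes once per graph instead of scanning the node list on every query. The sketch below is an assumption about the approach, not code from caten/air: `sketch-node`, `id->value/linear`, and `build-index` are hypothetical names, and each node is assumed to write exactly one id.

```lisp
;; Hypothetical sketch: the node struct and function names are illustrative only.
(defstruct (sketch-node (:constructor make-sketch-node (id reads)))
  id      ; symbol this node writes
  reads)  ; list of ids this node reads

(defun id->value/linear (nodes id)
  "O(N) per query: scan the node list for the node that writes ID."
  (find id nodes :key #'sketch-node-id :test #'eq))

(defun build-index (nodes)
  "Single pass over NODES. Returns two hash tables: id -> node and id -> users."
  (let ((by-id (make-hash-table :test #'eq))
        (users (make-hash-table :test #'eq)))
    (dolist (node nodes)
      (setf (gethash (sketch-node-id node) by-id) node)
      ;; Record NODE as a user of every id it reads.
      (dolist (read-id (sketch-node-reads node))
        (push node (gethash read-id users))))
    (values by-id users)))

(defparameter *nodes*
  (list (make-sketch-node 'a '())
        (make-sketch-node 'b '(a))
        (make-sketch-node 'c '(a b))))

(multiple-value-bind (by-id users) (build-index *nodes*)
  (gethash 'b by-id)    ; the node that writes B
  (gethash 'a users))   ; the nodes that read A
```

Calling the linear lookup once per node is what gives the O(N^2) behaviour; with the indexes, each query is an expected constant-time `gethash`. The trade-off is that the tables must be rebuilt or updated whenever the graph is rewritten.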

@hikettei

#101

CATEN/LLM> (ctx:with-contextvar (:safety 0)
	     (with-no-grad
	       (time (caten (call (Transformer 64 8 2 1e-5 64) (make-tensor `(10 3)) (iconst 2)))))
	     nil)
Evaluation took:
  7.795 seconds of real time
  7.787674 seconds of total run time (7.675815 user, 0.111859 system)
  [ Real times consist of 0.051 seconds GC time, and 7.744 seconds non-GC time. ]
  [ Run times consist of 0.050 seconds GC time, and 7.738 seconds non-GC time. ]
  99.91% CPU
  6,590,503,968 bytes consed
  
NIL
CATEN/LLM> (ctx:with-contextvar (:safety 0)
	     (with-no-grad
	       (time (caten (call (Transformer 64 8 1 1e-5 64) (make-tensor `(10 3)) (iconst 2)))))
	     nil)
Evaluation took:
  6.085 seconds of real time
  5.785396 seconds of total run time (5.222497 user, 0.562899 system)
  [ Real times consist of 0.006 seconds GC time, and 6.079 seconds non-GC time. ]
  [ Run times consist of 0.006 seconds GC time, and 5.780 seconds non-GC time. ]
  95.07% CPU
  4,794,871,344 bytes consed
  
NIL
CATEN/LLM> (ctx:with-contextvar (:safety 1)
	     (with-no-grad
	       (time (caten (call (Transformer 64 8 2 1e-5 64) (make-tensor `(10 3)) (iconst 2)))))
	     nil)
Evaluation took:
  8.040 seconds of real time
  8.027498 seconds of total run time (7.817753 user, 0.209745 system)
  [ Real times consist of 0.011 seconds GC time, and 8.029 seconds non-GC time. ]
  [ Run times consist of 0.009 seconds GC time, and 8.019 seconds non-GC time. ]
  99.84% CPU
  7,229,228,800 bytes consed
  
NIL
CATEN/LLM> (ctx:with-contextvar (:safety 1)
	     (with-no-grad
	       (time (caten (call (Transformer 64 8 1 1e-5 64) (make-tensor `(10 3)) (iconst 2)))))
	     nil)
Evaluation took:
  5.237 seconds of real time
  5.237717 seconds of total run time (5.214703 user, 0.023014 system)
  [ Real times consist of 0.007 seconds GC time, and 5.230 seconds non-GC time. ]
  [ Run times consist of 0.007 seconds GC time, and 5.231 seconds non-GC time. ]
  100.02% CPU
  4,995,774,448 bytes consed
  
NIL
CATEN/LLM> 

@hikettei

So the performance problem is not related to air?

  seconds  |     gc     |     consed    |    calls   |  sec/call  |  name  
----------------------------------------------------------------
     0.390 |      0.005 |   144,309,440 | 11,806,716 |   0.000000 | CATEN/AIR::%GETATTR
     0.274 |      0.000 |    75,557,664 |        769 |   0.000356 | CATEN/AIR:->GRAPH
     0.229 |      0.000 | 2,205,735,008 | 11,808,608 |   0.000000 | CATEN/AIR:GETATTR
     0.058 |      0.000 |   150,920,000 |    165,817 |   0.000000 | CATEN/AIR:INSERT-NODES
     0.041 |      0.000 |    39,118,400 |     12,591 |   0.000003 | CATEN/AIR::RESOLVE-ISOLATED-NODES
     0.033 |      0.000 |   280,195,680 |  2,062,985 |   0.000000 | CATEN/AIR:GRAPH-NODES
     0.013 |      0.000 |    77,782,016 |    577,431 |   0.000000 | (SETF CATEN/AIR:GRAPH-NODES)
     0.007 |      0.000 |     8,706,624 |      9,566 |   0.000001 | CATEN/AIR:DUMP-INTO-LIST
     0.005 |      0.005 |     1,833,712 |      2,136 |   0.000002 | CATEN/AASM:%RECIP
     0.004 |      0.000 |    22,081,280 |    165,136 |   0.000000 | CATEN/AIR:MAKE-GRAPH
     0.004 |      0.000 |     2,162,400 |     59,870 |   0.000000 | CATEN/AIR::%BOUNDP
     0.001 |      0.000 |       262,080 |         56 |   0.000024 | CATEN/AIR:VERIFY-ARGS
     0.001 |      0.000 |             0 |         56 |   0.000012 | CATEN/AIR::MAKE-ATTR
     0.001 |      0.000 |       655,104 |      3,282 |   0.000000 | CATEN/AIR::SPECIAL-P
     0.000 |      0.000 |       327,504 |         56 |   0.000008 | CATEN/AIR::ATTRIBUTE->INSTANCE
     0.000 |      0.000 |       589,792 |      4,445 |   0.000000 | CATEN/AASM::NODE->ID1
     0.000 |      0.000 |        65,520 |         56 |   0.000001 | CATEN/AIR::%MAKE-NODE
     0.000 |      0.000 |    18,741,168 |    577,431 |   0.000000 | (SETF CATEN/AIR::%GRAPH-NODES)

@hikettei

Hmm

  seconds  |     gc     |     consed    |  calls  |  sec/call  |  name  
-------------------------------------------------------------
     6.540 |      0.005 | 5,835,638,592 |  12,128 |   0.000539 | CATEN/APIS:FORWARD

@hikettei

(ctx:with-contextvar (:safety 0)
  (with-no-grad
    (time  (call (Transformer 64 8 2 1e-5 64) (make-tensor `(10 3)) (iconst 2))))
    nil)
Evaluation took:
  6.601 seconds of real time
  6.474324 seconds of total run time (6.129876 user, 0.344448 system)
  [ Real times consist of 0.015 seconds GC time, and 6.586 seconds non-GC time. ]
  [ Run times consist of 0.015 seconds GC time, and 6.460 seconds non-GC time. ]
  98.08% CPU
  5,908,658,160 bytes consed

@hikettei

Completed, except for bug fixes (which might not be related to the air refactoring).
