defsimplifier is O(N) #189

Merged
hikettei merged 12 commits into main from on-simplifier
Nov 9, 2024

hikettei
Owner

@hikettei hikettei commented Nov 9, 2024

  • Remove the simplifier pass that was applied to the entire graph
  • Raise the benchmark limit from N=12 to N=32
  • Optimize defsimplifier for handling large graphs. If it is O(N), the JIT should also be O(N)
  • Set restart points for the simplified FastGraph: record the paths that were simplified and use them as the next graph-outputs
  • 99% of the simplifier time on large graphs is spent on meaningless node comparisons
  • JIT always uses FastGraph
  • optimize-aasm may be slow (simplify-dynamic-arithmetic)
CATEN/LLM> (defparameter *transformer* (time (caten (forward *model* (make-tensor `(10 32)) (iconst 1)))))
COMPOSE-VIEWS-FROM-GRAPH: 56095 calls for 2302 nodes (24.36794%)
APPLY-FOLD-CONSTANT: 5205 calls for 1871 nodes (2.7819347%)
FUSE-VMOPS: 1748 calls for 1748 nodes (1.0%)
Evaluation took:
  0.796 seconds of real time
  0.778812 seconds of total run time (0.686056 user, 0.092756 system)
  97.86% CPU
  599,419,392 bytes consed
  
*TRANSFORMER*
CATEN/LLM> 
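
The restart-point idea in the list above can be illustrated outside of caten. The following is a minimal sketch, not the defsimplifier implementation: the toy IR, the single constant-folding rule, and every name in it (`try_fold`, `simplify_rescan`, `simplify_worklist`, `chain`) are assumptions made up for illustration. It compares a driver that rescans the whole graph after each change with a worklist driver that revisits only the users of a rewritten node, counting match attempts in both.

```python
from collections import deque

# Hypothetical toy IR (not caten's): node id -> tuple.
# ("const", 3) is a literal; ("add", a, b) and ("mul", a, b) refer to node ids.

def try_fold(graph, nid):
    """Fold an add/mul whose operands are both constants. Return True if rewritten."""
    op, *args = graph[nid]
    if op in ("add", "mul") and all(graph[a][0] == "const" for a in args):
        x, y = (graph[a][1] for a in args)
        graph[nid] = ("const", x + y if op == "add" else x * y)
        return True
    return False

def simplify_rescan(graph):
    """Naive driver: rescan every node until a full pass changes nothing."""
    attempts = 0
    changed = True
    while changed:
        changed = False
        for nid in list(graph):
            attempts += 1
            if try_fold(graph, nid):
                changed = True
    return attempts

def simplify_worklist(graph):
    """Worklist driver: after a rewrite, revisit only the users of that node."""
    users = {nid: [] for nid in graph}
    for nid, (op, *args) in graph.items():
        if op != "const":
            for a in args:
                users[a].append(nid)
    queue = deque(graph)  # visit every node once to start
    attempts = 0
    while queue:
        nid = queue.popleft()
        attempts += 1
        if try_fold(graph, nid):
            queue.extend(users[nid])  # restart points: only nodes that may now match
    return attempts

def chain(n):
    """Build 0 + 1 + 1 + ... as n add nodes, stored output-first (worst case for rescans)."""
    graph = {}
    for i in reversed(range(n)):
        left = f"a{i - 1}" if i > 0 else "c"
        graph[f"a{i}"] = ("add", left, "one")
    graph["c"] = ("const", 0)
    graph["one"] = ("const", 1)
    return graph

if __name__ == "__main__":
    for n in (50, 100, 200):
        print(n, simplify_rescan(chain(n)), simplify_worklist(chain(n)))
```

On this chain the rescan driver needs one full pass per folded node, so its attempts grow quadratically with the chain length, while the worklist driver's attempts grow linearly. Both drivers leave the graph in the same final state.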

@hikettei
Owner Author

hikettei commented Nov 9, 2024

❯  ./roswell/caten.ros benchmark transformer_compile_time 12 0
[22:40:49, 11/09/2024 (GMT+9)] : Running the benchmark transformer_compile_time
[22:40:49, 11/09/2024 (GMT+9)] : Configuration: N=12 JIT=0 PATH=NIL
[22:40:49, 11/09/2024 (GMT+9)] : Running with 0 layer transformers...
COMPOSE-VIEWS-FROM-GRAPH: 1388 calls for 1082 nodes (128.28096%)
APPLY-FOLD-CONSTANT: 1648 calls for 830 nodes (198.55421%)
FUSE-VMOPS: 826 calls for 826 nodes (100.0%)
COMPOSE-VIEWS-FROM-GRAPH: 12 calls for 16 nodes (75.0%)
APPLY-FOLD-CONSTANT: 12 calls for 12 nodes (100.0%)
FUSE-VMOPS: 12 calls for 12 nodes (100.0%)
COMPOSE-VIEWS-FROM-GRAPH: 1091 calls for 824 nodes (132.40291%)
APPLY-FOLD-CONSTANT: 2001 calls for 674 nodes (296.88428%)
FUSE-VMOPS: 668 calls for 668 nodes (100.0%)
COMPOSE-VIEWS-FROM-GRAPH: 1271 calls for 349 nodes (364.18338%)
APPLY-FOLD-CONSTANT: 751 calls for 264 nodes (284.4697%)
FUSE-VMOPS: 251 calls for 251 nodes (100.0%)
[22:40:52, 11/09/2024 (GMT+9)] : Completed in 0.083978 secs
[22:40:52, 11/09/2024 (GMT+9)] : Running with 1 layer transformers...
COMPOSE-VIEWS-FROM-GRAPH: 1063 calls for 786 nodes (135.24173%)
APPLY-FOLD-CONSTANT: 1308 calls for 660 nodes (198.18182%)
FUSE-VMOPS: 656 calls for 656 nodes (100.0%)
COMPOSE-VIEWS-FROM-GRAPH: 7681 calls for 979 nodes (784.5761%)
APPLY-FOLD-CONSTANT: 2188 calls for 763 nodes (286.7628%)
FUSE-VMOPS: 732 calls for 732 nodes (100.0%)
[22:40:53, 11/09/2024 (GMT+9)] : Completed in 0.161378 secs
[22:40:53, 11/09/2024 (GMT+9)] : Running with 2 layer transformers...
COMPOSE-VIEWS-FROM-GRAPH: 18367 calls for 1582 nodes (1160.9988%)
APPLY-FOLD-CONSTANT: 3550 calls for 1237 nodes (286.98462%)
FUSE-VMOPS: 1188 calls for 1188 nodes (100.0%)
[22:40:53, 11/09/2024 (GMT+9)] : Completed in 0.252982 secs
[22:40:53, 11/09/2024 (GMT+9)] : Running with 3 layer transformers...
COMPOSE-VIEWS-FROM-GRAPH: 33319 calls for 2185 nodes (1524.897%)
APPLY-FOLD-CONSTANT: 4912 calls for 1711 nodes (287.0836%)
FUSE-VMOPS: 1644 calls for 1644 nodes (100.0%)
[22:40:53, 11/09/2024 (GMT+9)] : Completed in 0.425999 secs
[22:40:53, 11/09/2024 (GMT+9)] : Running with 4 layer transformers...
COMPOSE-VIEWS-FROM-GRAPH: 52537 calls for 2788 nodes (1884.3975%)
APPLY-FOLD-CONSTANT: 6274 calls for 2185 nodes (287.1396%)
FUSE-VMOPS: 2100 calls for 2100 nodes (100.0%)
[22:40:54, 11/09/2024 (GMT+9)] : Completed in 0.678491 secs
[22:40:54, 11/09/2024 (GMT+9)] : Running with 5 layer transformers...
COMPOSE-VIEWS-FROM-GRAPH: 76021 calls for 3391 nodes (2241.8462%)
APPLY-FOLD-CONSTANT: 7636 calls for 2659 nodes (287.17563%)
FUSE-VMOPS: 2556 calls for 2556 nodes (100.0%)
[22:40:55, 11/09/2024 (GMT+9)] : Completed in 1.016605 secs
[22:40:55, 11/09/2024 (GMT+9)] : Running with 6 layer transformers...
COMPOSE-VIEWS-FROM-GRAPH: 103771 calls for 3994 nodes (2598.172%)
APPLY-FOLD-CONSTANT: 8998 calls for 3133 nodes (287.20078%)
FUSE-VMOPS: 3012 calls for 3012 nodes (100.0%)
[22:40:57, 11/09/2024 (GMT+9)] : Completed in 1.498794 secs
[22:40:57, 11/09/2024 (GMT+9)] : Running with 7 layer transformers...
COMPOSE-VIEWS-FROM-GRAPH: 135787 calls for 4597 nodes (2953.8179%)
APPLY-FOLD-CONSTANT: 10360 calls for 3607 nodes (287.2193%)
FUSE-VMOPS: 3468 calls for 3468 nodes (100.0%)
[22:40:59, 11/09/2024 (GMT+9)] : Completed in 2.036387 secs
[22:40:59, 11/09/2024 (GMT+9)] : Running with 8 layer transformers...
COMPOSE-VIEWS-FROM-GRAPH: 172069 calls for 5200 nodes (3309.019%)
APPLY-FOLD-CONSTANT: 11722 calls for 4081 nodes (287.23352%)
FUSE-VMOPS: 3924 calls for 3924 nodes (100.0%)
[22:41:01, 11/09/2024 (GMT+9)] : Completed in 2.753269 secs
[22:41:01, 11/09/2024 (GMT+9)] : Running with 9 layer transformers...
COMPOSE-VIEWS-FROM-GRAPH: 212617 calls for 5803 nodes (3663.9153%)
APPLY-FOLD-CONSTANT: 13084 calls for 4555 nodes (287.2448%)
FUSE-VMOPS: 4380 calls for 4380 nodes (100.0%)
[22:41:05, 11/09/2024 (GMT+9)] : Completed in 3.703634 secs
[22:41:05, 11/09/2024 (GMT+9)] : Running with 10 layer transformers...
COMPOSE-VIEWS-FROM-GRAPH: 257431 calls for 6406 nodes (4018.592%)
APPLY-FOLD-CONSTANT: 14446 calls for 5029 nodes (287.25394%)
FUSE-VMOPS: 4836 calls for 4836 nodes (100.0%)
[22:41:10, 11/09/2024 (GMT+9)] : Completed in 4.831347 secs
[22:41:10, 11/09/2024 (GMT+9)] : Running with 11 layer transformers...
COMPOSE-VIEWS-FROM-GRAPH: 306511 calls for 7009 nodes (4373.106%)
APPLY-FOLD-CONSTANT: 15808 calls for 5503 nodes (287.26147%)
FUSE-VMOPS: 5292 calls for 5292 nodes (100.0%)
[22:41:16, 11/09/2024 (GMT+9)] : Completed in 6.071198 secs
[Result]
N=0 | 0.083978 sec
N=1 | 0.161378 sec
N=2 | 0.252982 sec
N=3 | 0.425999 sec
N=4 | 0.678491 sec
N=5 | 1.016605 sec
N=6 | 1.498794 sec
N=7 | 2.036387 sec
N=8 | 2.753269 sec
N=9 | 3.703634 sec
N=10 | 4.831347 sec
N=11 | 6.071198 sec
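
One way to read the growth rate off this log is to take successive differences of the per-layer call counts: constant first differences indicate linear growth, constant second differences indicate quadratic growth. The sketch below does that for the call counts printed above for 1 to 11 layers. The helper names `differences` and `growth_order` are hypothetical, and the page does not state which revision this particular run reflects.

```python
# Call counts copied from the benchmark log above (1 to 11 layers).
compose_views_calls = [7681, 18367, 33319, 52537, 76021, 103771,
                       135787, 172069, 212617, 257431, 306511]
fold_constant_calls = [2188, 3550, 4912, 6274, 7636, 8998,
                       10360, 11722, 13084, 14446, 15808]

def differences(values):
    """Return the successive differences values[i + 1] - values[i]."""
    return [b - a for a, b in zip(values, values[1:])]

def growth_order(values):
    """Classify an integer sequence as 'linear', 'quadratic' or 'other'."""
    first = differences(values)
    if len(set(first)) == 1:      # constant first differences
        return "linear"
    second = differences(first)
    if len(set(second)) == 1:     # constant second differences
        return "quadratic"
    return "other"

if __name__ == "__main__":
    print("APPLY-FOLD-CONSTANT:", growth_order(fold_constant_calls))
    print("COMPOSE-VIEWS-FROM-GRAPH:", growth_order(compose_views_calls))
```

For this run, the APPLY-FOLD-CONSTANT call count rises by 1362 per added layer, whereas the COMPOSE-VIEWS-FROM-GRAPH first differences themselves rise by 4266 per layer, which matches the superlinear wall-clock times in the result table.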

@hikettei
Owner Author

hikettei commented Nov 9, 2024

Linear?
[image]

@hikettei hikettei marked this pull request as ready for review November 9, 2024 13:57
@hikettei hikettei changed the title from "WIP: Finally O(n) simplifier" to "Finally O(N) Simplifier" Nov 9, 2024
@hikettei
Owner Author

hikettei commented Nov 9, 2024

closer
[image]

@hikettei hikettei changed the title from "Finally O(N) Simplifier" to "defsimplifier is O(N)" Nov 9, 2024
@hikettei hikettei merged commit 0ec09fa into main Nov 9, 2024
6 checks passed
@hikettei hikettei deleted the on-simplifier branch November 9, 2024 14:11