Skip to content

Commit

Permalink
Update README.md: add mention of -f unroll loops option for gcc
Browse files Browse the repository at this point in the history
  • Loading branch information
karpathy authored Aug 1, 2023
1 parent def12a2 commit e270c6e
Showing 1 changed file with 2 additions and 0 deletions.
2 changes: 2 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -144,6 +144,8 @@ The fastest throughput I saw so far on my MacBook Air (M1) so far is with `make

You can also experiment with replacing `gcc` with `clang`.

If compiling with gcc, try experimenting with `-funroll-all-loops`, see PR [#183](https://github.com/karpathy/llama2.c/pull/183)

### OpenMP
Big improvements can also be achieved by compiling with OpenMP, which "activates" the `#pragma omp parallel for` inside the matmul and attention, allowing the work in the loops to be split up over multiple processors.
You'll need to install the OpenMP library and the clang compiler first (e.g. `apt install clang libomp-dev` on ubuntu). I was not able to get improvements from OpenMP on my MacBook, though. Then you can compile with `make runomp`, which does:
Expand Down

0 comments on commit e270c6e

Please sign in to comment.