llama.cpp 2960 (New Formula) Fix failing test.
llama.cpp 2960 (New Formula) Fix test.

Adds a more comprehensive test that loads a model and then checks whether generation actually happened.
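The pattern behind that test — run a command, capture its output, assert an expected marker appears — can be sketched in plain Ruby. Homebrew's test DSL provides `shell_output` and `assert_includes` for real; the stand-in definitions below are hypothetical so the sketch runs outside Homebrew, and `echo` stands in for `llama-cli`:

```ruby
# Minimal stand-ins for Homebrew's test helpers (hypothetical, for
# illustration only; the real ones come from Homebrew's test DSL).
def shell_output(command)
  `#{command}`
end

def assert_includes(haystack, needle)
  raise "expected output to include #{needle.inspect}" unless haystack.include?(needle)
end

# Stand-in for llama-cli: the formula's test looks for the BOS token "<s>"
# in the generated text, which is a sign that generation actually ran.
output = shell_output(%q(echo "<s> Once upon a time"))
assert_includes(output, "<s>")
```

Checking for a token that only appears in generated output (rather than just "Log start", which is printed before any generation) is what makes the new test meaningfully stricter.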

llama.cpp 2960 (New Formula) Add llama.cpp to autobump

Adds llama.cpp to the autobump file, as I forgot to add it earlier.

llama.cpp 2960 (New Formula) Add curl dependency

llama.cpp 2960 (New Formula) Fix the order of statements.

llama.cpp 2960 (New Formula) Fix the failing test.

llama.cpp 2960 (New Formula) Fix the failing test. x2

llama.cpp 2960 (New Formula) Fix the failing test. x3

llama.cpp 2960 (New Formula) Fix style issues.

llama.cpp 2960 (New Formula) Fix order.

llama.cpp 2960 (New Formula) All tests pass locally.

All style checks and the actual test cases now pass locally.

llama.cpp 2960 (New Formula) Restrict the macOS version.

All style checks and the actual test cases now pass locally.

llama.cpp 2960 (New Formula) Fix Xcode issues.

llama.cpp 2960 (New Formula) Fix style issues.
Vaibhavs10 committed May 24, 2024
1 parent d08feb7 commit dd9ad47
Showing 2 changed files with 24 additions and 5 deletions.
1 change: 1 addition & 0 deletions .github/autobump.txt
@@ -1344,6 +1344,7 @@ literate-git
little-cms2
livekit
livekit-cli
+llama-cpp
llm
llvm
lmdb
28 changes: 23 additions & 5 deletions Formula/l/llama.cpp.rb
@@ -3,19 +3,37 @@ class LlamaCpp < Formula
homepage "https://github.com/ggerganov/llama.cpp"
# pull from git tag to get submodules
url "https://github.com/ggerganov/llama.cpp.git",
-    tag: "b2950",
-    revision: "db10f01310beea8a1ef7798651b9d692fd1149d0"
+    tag: "b2963",
+    revision: "95fb0aefab568348da159efdd370e064d1b35f97"
license "MIT"

livecheck do
throttle 10
end

depends_on xcode: ["15.0", :build]
depends_on arch: :arm64
depends_on macos: :ventura
depends_on :macos
uses_from_macos "curl"

def install
-    system "make", "DLLAMA_FATAL_WARNINGS=ON", "DLLAMA_METAL_EMBED_LIBRARY=ON", "DLLAMA_CURL=ON"
+    system "make", "LLAMA_FATAL_WARNINGS=ON", "LLAMA_METAL_EMBED_LIBRARY=ON", "LLAMA_CURL=ON"

bin.install "./main" => "llama-cli"
bin.install "./server" => "llama-server"
end

test do

[CI annotation] Check failure on line 27 in Formula/l/llama.cpp.rb — GitHub Actions / macOS 14-arm64: `brew test --verbose llama.cpp` failed on macOS Sonoma (14) on Apple Silicon.

[CI annotation] Check failure on line 27 in Formula/l/llama.cpp.rb — GitHub Actions / macOS 13-arm64: `brew test --verbose llama.cpp` failed on macOS Ventura (13) on Apple Silicon.

(In both jobs the verbose logs show stories15M-q4_0.gguf downloading from the ggml-org/tiny-llamas repo on Hugging Face and its GGUF metadata loading before the log was cut off.)
-    llama_cli_command = "llama-cli"
-    assert_includes shell_output(llama_cli_command), "Log start"
+    llama_cli_command = ["llama-cli",
+                         "--hf-repo",
+                         "ggml-org/tiny-llamas",
+                         "-m",
+                         "stories15M-q4_0.gguf",
+                         "-n",
+                         "400",
+                         "-p",
+                         "I"].join(" ")
+    assert_includes shell_output(llama_cli_command), "<s>"
end
end
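The build-flag change in the diff above swaps cmake-style `DLLAMA_*` arguments for plain `LLAMA_*=ON` make variables: `make` accepts `VAR=value` assignments on its command line, while the `-D` prefix belongs to cmake. A small Ruby sketch (a throwaway Makefile, hypothetical and for illustration only) shows the mechanism the corrected `system "make", ...` call relies on:

```ruby
require "tmpdir"

# Write a one-rule Makefile that echoes a variable, then invoke make with
# a VAR=value assignment on the command line — the same way the formula
# passes LLAMA_FATAL_WARNINGS=ON, LLAMA_METAL_EMBED_LIBRARY=ON, etc.
Dir.mktmpdir do |dir|
  mk = File.join(dir, "flagdemo.mk")
  # Semicolon-style recipe avoids the tab-prefix requirement.
  File.write(mk, "all: ; @echo \"LLAMA_CURL=$(LLAMA_CURL)\"\n")
  puts `make -f #{mk} LLAMA_CURL=ON`  # prints LLAMA_CURL=ON
end
```

With the old `DLLAMA_*` spelling, make simply defined variables named `DLLAMA_...` that nothing in llama.cpp's Makefile reads, so the options were silently ignored.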
