[User] ./quantize fails with illegal hardware instruction when AVX is not supported
Prerequisites
Please answer the following questions for yourself before submitting an issue.
I reviewed the Discussions, and have a new bug or useful enhancement to share.
Expected Behavior
When running ./build/bin/quantize ./models/30B/ggml-model-f16.bin ./models/30B/ggml-model-q8_0.bin q8_0, ./quantize either successfully quantises the model, saving it as ggml-model-q8_0.bin, or gives an error message without the program breaking.
Current Behavior
When running this command, ./quantize gives an illegal hardware instruction. As per Valgrind:
vex amd64->IR: unhandled instruction bytes: 0xC5 0xF1 0xEF 0xC9 0xC4 0xC1 0x71 0xC4 0xCF 0x0
vex amd64->IR: REX=0 REX.W=0 REX.R=0 REX.X=0 REX.B=0
vex amd64->IR: VEX=0 VEX.L=0 VEX.nVVVV=0x0 ESC=NONE
vex amd64->IR: PFX.66=0 PFX.F2=0 PFX.F3=0
==13694== valgrind: Unrecognised instruction at address 0x13c4f0.
==13694== at 0x13C4F0: ggml_init (in /mnt/Biblioteko/Gits/llama.cpp/build/bin/quantize)
==13694== by 0x113219: llama_init_backend (in /mnt/Biblioteko/Gits/llama.cpp/build/bin/quantize)
==13694== by 0x10FCF9: main (in /mnt/Biblioteko/Gits/llama.cpp/build/bin/quantize)
Environment and Context
Please provide detailed information about your computer setup. This is important in case the issue is not reproducible except for under certain specific conditions.
Physical (or virtual) hardware you are using:
~ % lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 40 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 24
On-line CPU(s) list: 0-23
Vendor ID: GenuineIntel
Model name: Intel(R) Xeon(R) CPU X5690 @ 3.47GHz
CPU family: 6
Model: 44
Thread(s) per core: 2
Core(s) per socket: 6
Socket(s): 2
Stepping: 2
Frequency boost: enabled
CPU max MHz: 3468,0000
CPU min MHz: 1600,0000
BogoMIPS: 6932.61
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ht tm pbe
syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf
pni dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 popcnt lahf_lm epb pti tpr_shadow vnmi flexpriority ept vpid dtherm ida arat
Virtualization features:
Virtualization: VT-x
Caches (sum of all):
L1d: 384 KiB (12 instances)
L1i: 384 KiB (12 instances)
L2: 3 MiB (12 instances)
L3: 24 MiB (2 instances)
NUMA:
NUMA node(s): 2
NUMA node0 CPU(s): 0-5,12-17
NUMA node1 CPU(s): 6-11,18-23
Vulnerabilities:
Itlb multihit: KVM: Mitigation: VMX disabled
L1tf: Mitigation; PTE Inversion; VMX conditional cache flushes, SMT vulnerable
Mds: Vulnerable: Clear CPU buffers attempted, no microcode; SMT vulnerable
Meltdown: Mitigation; PTI
Mmio stale data: Not affected
Retbleed: Not affected
Spec store bypass: Vulnerable
Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Spectre v2: Mitigation; Retpolines, STIBP disabled, RSB filling, PBRSB-eIBRS Not affected
Srbds: Not affected
Tsx async abort: Not affected
Operating System:
~ % uname -a
Linux computer-pig.net 5.19.3-gentoo #2 SMP PREEMPT_DYNAMIC Fri Aug 26 20:41:30 PDT 2022 x86_64 Intel(R) Xeon(R) CPU X5690 @ 3.47GHz GenuineIntel GNU/Linux
SDK version:
~ % python3 --version
Python 3.9.16
~ % make --version
GNU Make 4.3
Built for x86_64-pc-linux-gnu
Copyright (C) 1988-2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
I built with cmake, not make, so I am also including:
~ % cmake --version
cmake version 3.25.2
CMake suite maintained and supported by Kitware (kitware.com/cmake).
~ % g++ --version
g++ (Gentoo 11.3.0 p4) 11.3.0
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Pip cannot install requirements.txt verbatim due to the externally-managed-environment error on Gentoo, and sentencepiece is not in the native Gentoo repositories, so I installed that package manually with pip using the --user --break-system-packages flags; numpy appears to be managed by the system package manager. The following is the current package info:
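Concretely, the install steps described above amount to roughly the following; the Gentoo package name for numpy is an assumption on my part:
# numpy via the system package manager (assumed package name):
emerge --ask dev-python/numpy
# sentencepiece manually via pip, as described:
pip3 install --user --break-system-packages sentencepiece==0.1.98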
Important notes
Originally, I had installed sentencepiece 0.1.98 several days ago. When I first tried to convert the model today, I got the following error:
Traceback (most recent call last):
File "/mnt/Biblioteko/Gits/llama.cpp/convert.py", line 25, in <module>
from sentencepiece import SentencePieceProcessor # type: ignore
File "/home/happysmash27/.local/lib/python3.9/site-packages/sentencepiece/__init__.py", line 13, in <module>
from . import _sentencepiece
ImportError: /home/happysmash27/.local/lib/python3.9/site-packages/sentencepiece/_sentencepiece.cpython-39-x86_64-linux-gnu.so: file too short
So, I ran pip3 install --upgrade --user --break-system-packages sentencepiece to upgrade it, and after this the conversion went through seemingly successfully. The quantisation, however, did not, which is what prompted me to start creating this issue.
In the process of writing this issue, however, it was necessary to rule out this slightly newer version as the cause of the error. After downgrading back to 0.1.98 and re-running convert.py, conversion still works, but the error from running ./quantize is the same:
vex amd64->IR: unhandled instruction bytes: 0xC5 0xF1 0xEF 0xC9 0xC4 0xC1 0x71 0xC4 0xCF 0x0
vex amd64->IR: REX=0 REX.W=0 REX.R=0 REX.X=0 REX.B=0
vex amd64->IR: VEX=0 VEX.L=0 VEX.nVVVV=0x0 ESC=NONE
vex amd64->IR: PFX.66=0 PFX.F2=0 PFX.F3=0
==18826== valgrind: Unrecognised instruction at address 0x13c4f0.
==18826== at 0x13C4F0: ggml_init (in /mnt/Biblioteko/Gits/llama.cpp/build/bin/quantize)
==18826== by 0x113219: llama_init_backend (in /mnt/Biblioteko/Gits/llama.cpp/build/bin/quantize)
==18826== by 0x10FCF9: main (in /mnt/Biblioteko/Gits/llama.cpp/build/bin/quantize)
For this reason, the steps to reproduce do not include this upgrade-downgrade cycle, to keep this issue from becoming too disorganised. However, to give as much context as possible, here are the steps I have followed at this point in time, in more detail (a consolidated command transcript follows the list):
Clone repository.
Install numpy 1.24.2 using Gentoo package manager, and sentencepiece 0.1.98 using pip.
mkdir build; cd build; cmake ..
cmake --build . --config Release
cd ..
Using cp --reflink=auto, copy all model files to the models directory
python convert.py models/30B/ results in error.
Attempt to fix the error: pip3 install --upgrade --user --break-system-packages sentencepiece. (If 0.1.99 is no longer the latest version, modify that command to install 0.1.99 specifically.)
python convert.py models/30B/ now runs seemingly successfully.
./quantize does not run successfully.
Downgrade sentencepiece: pip3 install --user --break-system-packages sentencepiece==0.1.98
./quantize still fails.
mv models/30B/ggml-model-f16.bin models/30B/ggml-model-f16.bin.old
python convert.py models/30B/
As the conversion is finishing, delete ggml-model-f16.bin.old.
Run ./quantize again. The error still appears to be identical.
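Hypothetically, the whole sequence as a single shell transcript (the repository URL and the source location of the model weights are assumptions on my part):
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
mkdir build; cd build; cmake ..
cmake --build . --config Release
cd ..
# copy the original weights into place (source path assumed):
cp --reflink=auto /path/to/LLaMA/30B/* models/30B/
python convert.py models/30B/
./build/bin/quantize ./models/30B/ggml-model-f16.bin ./models/30B/ggml-model-q8_0.bin q8_0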
After looking through the documentation some more and finding a bunch of sha256sums, I also decided to run sha256sum on the converted model, to be absolutely sure that convert.py is not causing the error. This appears to be distinct from #41, as I am on x86_64, not ARM.
Failure Logs
When compiling, some warnings appeared which may or may not be related.
Every command I have used to get this error:
% valgrind ./build/bin/quantize ./models/30B/ggml-model-f16.bin ./models/30B/ggml-model-q8_0.bin q8_0
==13694== Memcheck, a memory error detector
==13694== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==13694== Using Valgrind-3.19.0 and LibVEX; rerun with -h for copyright info
==13694== Command: ./build/bin/quantize ./models/30B/ggml-model-f16.bin ./models/30B/ggml-model-q8_0.bin q8_0
==13694==
vex amd64->IR: unhandled instruction bytes: 0xC5 0xF1 0xEF 0xC9 0xC4 0xC1 0x71 0xC4 0xCF 0x0
vex amd64->IR: REX=0 REX.W=0 REX.R=0 REX.X=0 REX.B=0
vex amd64->IR: VEX=0 VEX.L=0 VEX.nVVVV=0x0 ESC=NONE
vex amd64->IR: PFX.66=0 PFX.F2=0 PFX.F3=0
==13694== valgrind: Unrecognised instruction at address 0x13c4f0.
==13694== at 0x13C4F0: ggml_init (in /mnt/Biblioteko/Gits/llama.cpp/build/bin/quantize)
==13694== by 0x113219: llama_init_backend (in /mnt/Biblioteko/Gits/llama.cpp/build/bin/quantize)
==13694== by 0x10FCF9: main (in /mnt/Biblioteko/Gits/llama.cpp/build/bin/quantize)
==13694== Your program just tried to execute an instruction that Valgrind
==13694== did not recognise. There are two possible reasons for this.
==13694== 1. Your program has a bug and erroneously jumped to a non-code
==13694== location. If you are running Memcheck and you just saw a
==13694== warning about a bad jump, it's probably your program's fault.
==13694== 2. The instruction is legitimate but Valgrind doesn't handle it,
==13694== i.e. it's Valgrind's fault. If you think this is the case or
==13694== you are not sure, please let us know and we'll try to fix it.
==13694== Either way, Valgrind will now raise a SIGILL signal which will
==13694== probably kill your program.
==13694==
==13694== Process terminating with default action of signal 4 (SIGILL)
==13694== Illegal opcode at address 0x13C4F0
==13694== at 0x13C4F0: ggml_init (in /mnt/Biblioteko/Gits/llama.cpp/build/bin/quantize)
==13694== by 0x113219: llama_init_backend (in /mnt/Biblioteko/Gits/llama.cpp/build/bin/quantize)
==13694== by 0x10FCF9: main (in /mnt/Biblioteko/Gits/llama.cpp/build/bin/quantize)
==13694==
==13694== HEAP SUMMARY:
==13694== in use at exit: 73,064 bytes in 6 blocks
==13694== total heap usage: 6 allocs, 0 frees, 73,064 bytes allocated
==13694==
==13694== LEAK SUMMARY:
==13694== definitely lost: 0 bytes in 0 blocks
==13694== indirectly lost: 0 bytes in 0 blocks
==13694== possibly lost: 0 bytes in 0 blocks
==13694== still reachable: 73,064 bytes in 6 blocks
==13694== suppressed: 0 bytes in 0 blocks
==13694== Rerun with --leak-check=full to see details of leaked memory
==13694==
==13694== For lists of detected and suppressed errors, rerun with: -s
==13694== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
zsh: illegal hardware instruction valgrind ./build/bin/quantize ./models/30B/ggml-model-f16.bin q8_0
After going to the build/bin directory, in case the issue was related to linking, the issue still persists.
As mentioned in "Important notes", I had initially converted using sentencepiece 0.1.99. Downgrading to sentencepiece 0.1.98 to see if that fixes it gives the same result, both before re-converting the model and after:
% valgrind ./build/bin/quantize ./models/30B/ggml-model-f16.bin ./models/30B/ggml-model-q8_0.bin q8_0
==18826== Memcheck, a memory error detector
==18826== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==18826== Using Valgrind-3.19.0 and LibVEX; rerun with -h for copyright info
==18826== Command: ./build/bin/quantize ./models/30B/ggml-model-f16.bin ./models/30B/ggml-model-q8_0.bin q8_0
==18826==
vex amd64->IR: unhandled instruction bytes: 0xC5 0xF1 0xEF 0xC9 0xC4 0xC1 0x71 0xC4 0xCF 0x0
vex amd64->IR: REX=0 REX.W=0 REX.R=0 REX.X=0 REX.B=0
vex amd64->IR: VEX=0 VEX.L=0 VEX.nVVVV=0x0 ESC=NONE
vex amd64->IR: PFX.66=0 PFX.F2=0 PFX.F3=0
==18826== valgrind: Unrecognised instruction at address 0x13c4f0.
==18826== at 0x13C4F0: ggml_init (in /mnt/Biblioteko/Gits/llama.cpp/build/bin/quantize)
==18826== by 0x113219: llama_init_backend (in /mnt/Biblioteko/Gits/llama.cpp/build/bin/quantize)
==18826== by 0x10FCF9: main (in /mnt/Biblioteko/Gits/llama.cpp/build/bin/quantize)
==18826== Your program just tried to execute an instruction that Valgrind
==18826== did not recognise. There are two possible reasons for this.
==18826== 1. Your program has a bug and erroneously jumped to a non-code
==18826== location. If you are running Memcheck and you just saw a
==18826== warning about a bad jump, it's probably your program's fault.
==18826== 2. The instruction is legitimate but Valgrind doesn't handle it,
==18826== i.e. it's Valgrind's fault. If you think this is the case or
==18826== you are not sure, please let us know and we'll try to fix it.
==18826== Either way, Valgrind will now raise a SIGILL signal which will
==18826== probably kill your program.
==18826==
==18826== Process terminating with default action of signal 4 (SIGILL)
==18826== Illegal opcode at address 0x13C4F0
==18826== at 0x13C4F0: ggml_init (in /mnt/Biblioteko/Gits/llama.cpp/build/bin/quantize)
==18826== by 0x113219: llama_init_backend (in /mnt/Biblioteko/Gits/llama.cpp/build/bin/quantize)
==18826== by 0x10FCF9: main (in /mnt/Biblioteko/Gits/llama.cpp/build/bin/quantize)
==18826==
==18826== HEAP SUMMARY:
==18826== in use at exit: 73,064 bytes in 6 blocks
==18826== total heap usage: 6 allocs, 0 frees, 73,064 bytes allocated
==18826==
==18826== LEAK SUMMARY:
==18826== definitely lost: 0 bytes in 0 blocks
==18826== indirectly lost: 0 bytes in 0 blocks
==18826== possibly lost: 0 bytes in 0 blocks
==18826== still reachable: 73,064 bytes in 6 blocks
==18826== suppressed: 0 bytes in 0 blocks
==18826== Rerun with --leak-check=full to see details of leaked memory
==18826==
==18826== For lists of detected and suppressed errors, rerun with: -s
==18826== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
zsh: illegal hardware instruction valgrind ./build/bin/quantize ./models/30B/ggml-model-f16.bin q8_0
Update: I have recompiled with -DLLAMA_NATIVE=ON (to try to remove the unsupported instructions, which did not seem to work) and with debug symbols, and have run it under gdb. My output is as follows:
(gdb) run
Starting program: /mnt/Biblioteko/Gits/llama.cpp/build/bin/quantize ../models/30B/ggml-model-f16.bin ../models/30B/ggml-model-q8_0.bin q8_0
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Program received signal SIGILL, Illegal instruction.
0x000055555559b449 in _cvtsh_ss (__S=0) at /usr/lib/gcc/x86_64-pc-linux-gnu/11.3.0/include/f16cintrin.h:40
40 __v8hi __H = __extension__ (__v8hi){ (short) __S, 0, 0, 0, 0, 0, 0, 0 };
(gdb) backtrace
#0 0x000055555559b449 in _cvtsh_ss (__S=0) at /usr/lib/gcc/x86_64-pc-linux-gnu/11.3.0/include/f16cintrin.h:40
#1 ggml_init (params=...) at /home/happysmash27/Gits/llama.cpp/ggml.c:3905
#2 0x000055555555d269 in llama_init_backend () at /home/happysmash27/Gits/llama.cpp/llama.cpp:865
#3 0x0000555555559cc1 in main (argc=4, argv=0x7fffffffd1f8) at /home/happysmash27/Gits/llama.cpp/examples/quantize/quantize.cpp:53
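For reference, a debug rebuild and gdb session along these lines reproduces the trace above; the CMAKE_BUILD_TYPE flag is my assumption for how the debug symbols were added:
cd build
cmake .. -DLLAMA_NATIVE=ON -DCMAKE_BUILD_TYPE=Debug   # build type flag assumed
cmake --build .
gdb --args ./bin/quantize ../models/30B/ggml-model-f16.bin ../models/30B/ggml-model-q8_0.bin q8_0
# inside gdb: 'run', then after the SIGILL: 'backtrace'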
Update 2:
After some more research, using a combination of a bunch of ChatGPT sessions and manually finding documentation (to better deal with hallucinations), I believe the issue is that this code calls an AVX instruction, while AVX was not implemented in Intel CPUs until a generation or so after my CPUs were made.
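This is consistent with the lscpu flags listed above, which include sse4_2 but no avx, f16c, or fma. A quick check along these lines (my own sketch) confirms which of the relevant extensions the CPU reports:
grep -owE 'avx|avx2|f16c|fma' /proc/cpuinfo | sort -u   # prints nothing on this machine, i.e. none are supported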
Update 3:
Edited update 1 to note that I had also added -DLLAMA_NATIVE=ON at that point. Now I have also added -DLLAMA_AVX2=OFF -DLLAMA_AVX=OFF to my CMake flags. Unfortunately, the error still appears to persist.
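For clarity, the configure step at this point was presumably something like this (my reconstruction from the flags named above):
cmake .. -DLLAMA_NATIVE=ON -DLLAMA_AVX=OFF -DLLAMA_AVX2=OFF
cmake --build . --config Release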
Update 4:
I now realise that by Update 3 the Valgrind output had actually changed:
==6513== Memcheck, a memory error detector
==6513== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==6513== Using Valgrind-3.19.0 and LibVEX; rerun with -h for copyright info
==6513== Command: ./bin/quantize ../models/30B/ggml-model-f16.bin ../models/30B/ggml-model-q8_0.bin q8_0
==6513==
vex amd64->IR: unhandled instruction bytes: 0xC5 0xF9 0xEF 0xC0 0xC5 0xF9 0xC4 0xC0 0x0 0xC5
vex amd64->IR: REX=0 REX.W=0 REX.R=0 REX.X=0 REX.B=0
vex amd64->IR: VEX=0 VEX.L=0 VEX.nVVVV=0x0 ESC=NONE
vex amd64->IR: PFX.66=0 PFX.F2=0 PFX.F3=0
==6513== valgrind: Unrecognised instruction at address 0x14f775.
==6513== at 0x14F775: _cvtsh_ss (f16cintrin.h:40)
==6513== by 0x14F775: ggml_init (ggml.c:3905)
==6513== by 0x111268: llama_init_backend (llama.cpp:865)
==6513== by 0x10DCC0: main (quantize.cpp:53)
==6513== Your program just tried to execute an instruction that Valgrind
==6513== did not recognise. There are two possible reasons for this.
==6513== 1. Your program has a bug and erroneously jumped to a non-code
==6513== location. If you are running Memcheck and you just saw a
==6513== warning about a bad jump, it's probably your program's fault.
==6513== 2. The instruction is legitimate but Valgrind doesn't handle it,
==6513== i.e. it's Valgrind's fault. If you think this is the case or
==6513== you are not sure, please let us know and we'll try to fix it.
==6513== Either way, Valgrind will now raise a SIGILL signal which will
==6513== probably kill your program.
==6513==
==6513== Process terminating with default action of signal 4 (SIGILL)
==6513== Illegal opcode at address 0x14F775
==6513== at 0x14F775: _cvtsh_ss (f16cintrin.h:40)
==6513== by 0x14F775: ggml_init (ggml.c:3905)
==6513== by 0x111268: llama_init_backend (llama.cpp:865)
==6513== by 0x10DCC0: main (quantize.cpp:53)
==6513==
==6513== HEAP SUMMARY:
==6513== in use at exit: 73,064 bytes in 6 blocks
==6513== total heap usage: 6 allocs, 0 frees, 73,064 bytes allocated
==6513==
==6513== LEAK SUMMARY:
==6513== definitely lost: 0 bytes in 0 blocks
==6513== indirectly lost: 0 bytes in 0 blocks
==6513== possibly lost: 0 bytes in 0 blocks
==6513== still reachable: 73,064 bytes in 6 blocks
==6513== suppressed: 0 bytes in 0 blocks
==6513== Rerun with --leak-check=full to see details of leaked memory
==6513==
==6513== For lists of detected and suppressed errors, rerun with: -s
==6513== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
zsh: illegal hardware instruction valgrind ./bin/quantize ../models/30B/ggml-model-f16.bin q8_0
More importantly, though, after adding yet another flag to disable F16C (presumably -DLLAMA_F16C=OFF; I had noticed it seems to be the only one of these variables that shows up in the relevant source code), the Valgrind output has changed again:
==14777== Memcheck, a memory error detector
==14777== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==14777== Using Valgrind-3.19.0 and LibVEX; rerun with -h for copyright info
==14777== Command: ./bin/quantize ../models/30B/ggml-model-f16.bin ../models/30B/ggml-model-q8_0.bin q8_0
==14777==
vex amd64->IR: unhandled instruction bytes: 0xC5 0xFA 0x10 0x5 0xAE 0x63 0x3 0x0 0xC5 0xFA
vex amd64->IR: REX=0 REX.W=0 REX.R=0 REX.X=0 REX.B=0
vex amd64->IR: VEX=0 VEX.L=0 VEX.nVVVV=0x0 ESC=NONE
vex amd64->IR: PFX.66=0 PFX.F2=0 PFX.F3=0
==14777== valgrind: Unrecognised instruction at address 0x1467ae.
==14777== at 0x1467AE: ggml_compute_fp16_to_fp32 (ggml.c:269)
==14777== by 0x14F596: ggml_init (ggml.c:3905)
==14777== by 0x111268: llama_init_backend (llama.cpp:865)
==14777== by 0x10DCC0: main (quantize.cpp:53)
==14777== Your program just tried to execute an instruction that Valgrind
==14777== did not recognise. There are two possible reasons for this.
==14777== 1. Your program has a bug and erroneously jumped to a non-code
==14777== location. If you are running Memcheck and you just saw a
==14777== warning about a bad jump, it's probably your program's fault.
==14777== 2. The instruction is legitimate but Valgrind doesn't handle it,
==14777== i.e. it's Valgrind's fault. If you think this is the case or
==14777== you are not sure, please let us know and we'll try to fix it.
==14777== Either way, Valgrind will now raise a SIGILL signal which will
==14777== probably kill your program.
==14777==
==14777== Process terminating with default action of signal 4 (SIGILL)
==14777== Illegal opcode at address 0x1467AE
==14777== at 0x1467AE: ggml_compute_fp16_to_fp32 (ggml.c:269)
==14777== by 0x14F596: ggml_init (ggml.c:3905)
==14777== by 0x111268: llama_init_backend (llama.cpp:865)
==14777== by 0x10DCC0: main (quantize.cpp:53)
==14777==
==14777== HEAP SUMMARY:
==14777== in use at exit: 73,064 bytes in 6 blocks
==14777== total heap usage: 6 allocs, 0 frees, 73,064 bytes allocated
==14777==
==14777== LEAK SUMMARY:
==14777== definitely lost: 0 bytes in 0 blocks
==14777== indirectly lost: 0 bytes in 0 blocks
==14777== possibly lost: 0 bytes in 0 blocks
==14777== still reachable: 73,064 bytes in 6 blocks
==14777== suppressed: 0 bytes in 0 blocks
==14777== Rerun with --leak-check=full to see details of leaked memory
==14777==
==14777== For lists of detected and suppressed errors, rerun with: -s
==14777== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
zsh: illegal hardware instruction valgrind ./bin/quantize ../models/30B/ggml-model-f16.bin q8_0
It doesn't actually solve the issue, unfortunately, but I consider it good progress that it has at least gotten to the issue through a different route than it did before.
I finally, finally got it to work! All I had to do was disable yet another one of the new instruction sets listed in CMakeLists.txt that are enabled by default: FMA. Inputting the following when configuring with CMake will make it work completely correctly:
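Presumably the full configure step is along these lines, with LLAMA_FMA being the remaining option to turn off (the exact option name is an inference from the pattern of the other flags):
cmake .. -DLLAMA_NATIVE=ON -DLLAMA_AVX=OFF -DLLAMA_AVX2=OFF -DLLAMA_F16C=OFF -DLLAMA_FMA=OFF
cmake --build . --config Release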