# Prerequisites

Please answer the following questions for yourself before submitting an issue.
# Expected Behavior

Attempted to train a model via:

```
train-text-from-scratch --vocab-model models/ggml-vocab-llama.gguf --train-data ../cruft.llama/icbmlog.ttk.2.txt --adam-iter 500 --head 16 --layer 16
```
# Current Behavior

Segmentation fault in llama_build_train_graphs():
```
main: init model
print_params: n_vocab: 32000
print_params: n_ctx:   128
print_params: n_embd:  256
print_params: n_head:  16
print_params: n_ff:    768
print_params: n_layer: 16
print_params: n_rot:   16
main: total train_iterations 0
main: seen train_samples     0
main: seen train_tokens      0
main: completed train_epochs 0
main: model_size = 240290304 bytes (229.2 MB)
main: opt_size   = 360288288 bytes (343.6 MB)
main: opt iter 0
main: input_size = 131076128 bytes (125.0 MB)
Segmentation fault
```

# Environment and Context

* Physical (or virtual) hardware you are using, e.g. for Linux:

```
$ lscpu
Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   39 bits physical, 48 bits virtual
CPU(s):                          12
On-line CPU(s) list:             0-11
Thread(s) per core:              2
Core(s) per socket:              6
Socket(s):                       1
NUMA node(s):                    1
Vendor ID:                       GenuineIntel
CPU family:                      6
Model:                           158
Model name:                      Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
Stepping:                        10
CPU MHz:                         800.332
CPU max MHz:                     4500.0000
CPU min MHz:                     800.0000
BogoMIPS:                        5199.98
Virtualization:                  VT-x
L1d cache:                       192 KiB
L1i cache:                       192 KiB
L2 cache:                        1.5 MiB
L3 cache:                        12 MiB
NUMA node0 CPU(s):               0-11
Vulnerability Itlb multihit:     KVM: Mitigation: Split huge pages
Vulnerability L1tf:              Mitigation; PTE Inversion; VMX conditional cache flushes, SMT vulnerable
Vulnerability Mds:               Mitigation; Clear CPU buffers; SMT vulnerable
Vulnerability Meltdown:          Mitigation; PTI
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1:        Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:        Mitigation; Full generic retpoline, IBPB conditional, IBRS_FW, STIBP conditional, RSB filling
Vulnerability Tsx async abort:   Not affected
Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp md_clear flush_l1d
```
* Operating System, e.g. for Linux:

```
$ uname -a
Linux kirov.ciar.org 5.4.10 #1 SMP Thu Jan 9 14:13:31 CST 2020 x86_64 Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz GenuineIntel GNU/Linux
```
* SDK version, e.g. for Linux:

```
$ python3 --version
Python 3.8.1
$ make --version
GNU Make 4.2.1
$ g++ --version
g++ (GCC) 12.1.0
```
# Failure Information (for bugs)

In train-text-from-scratch.cpp, llama_build_train_graphs() tries to initialize the contents of KQ_pos->data while that pointer is NULL.
llama_build_train_graphs calls ggml_new_tensor_1d, which calls ggml_new_tensor, which calls ggml_new_tensor_impl.
In ggml_new_tensor_impl, when the context was created with no_alloc enabled, no storage is reserved for the tensor's contents, so the returned tensor's data pointer is left NULL (a graph allocator is expected to attach real memory later).
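For illustration, a minimal self-contained sketch (assuming the ggml C API of this era of the tree; this is not code from the repo) showing that a tensor created in a no_alloc context comes back with data == NULL:

```c
#include <stdio.h>
#include "ggml.h"

int main(void) {
    struct ggml_init_params params = {
        /*.mem_size   =*/ 16*1024*1024,
        /*.mem_buffer =*/ NULL,
        /*.no_alloc   =*/ true,   // same mode used when building graphs for an allocator
    };
    struct ggml_context * ctx = ggml_init(params);

    // ggml_new_tensor_1d -> ggml_new_tensor -> ggml_new_tensor_impl;
    // with no_alloc, no storage is reserved for the tensor's contents.
    struct ggml_tensor * t = ggml_new_tensor_1d(ctx, GGML_TYPE_I32, 128);
    printf("t->data = %p\n", t->data);   // prints (nil): data was never allocated

    ggml_free(ctx);
    return 0;
}
```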
Immediately after assigning the returned tensor to KQ_pos, llama_build_train_graphs tries to set N elements of KQ_pos->data to n_past + i. Since KQ_pos->data is NULL, the write segfaults.
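To make the failure mode concrete, here is a hedged sketch of the offending pattern and of the guard used elsewhere in the tree (e.g. the finetune example). The names KQ_pos, N, and n_past follow the issue; ctx and alloc stand in for the surrounding function's ggml context and graph allocator, and the exact upstream code may differ:

```c
// Failing pattern (paraphrased): write through KQ_pos->data immediately
// after creating the tensor, even though data is NULL under no_alloc.
struct ggml_tensor * KQ_pos = ggml_new_tensor_1d(ctx, GGML_TYPE_I32, N);
int * data = (int *) KQ_pos->data;   // NULL in a no_alloc context
for (int i = 0; i < N; ++i) {
    data[i] = n_past + i;            // segfault: store through a NULL pointer
}

// Guard pattern: give the tensor real storage via the graph allocator first,
// and skip the write entirely during the allocator's measure pass.
ggml_allocr_alloc(alloc, KQ_pos);
if (!ggml_allocr_is_measure(alloc)) {
    int * pos = (int *) KQ_pos->data;  // now backed by allocated memory
    for (int i = 0; i < N; ++i) {
        pos[i] = n_past + i;
    }
}
```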
# Steps to Reproduce

Please provide detailed steps for reproducing the issue. We are not sitting in front of your screen, so the more detail the better.

1. Check out commit bc39553 (see Failure Logs below) and build.
2. Run the train-text-from-scratch command shown under Expected Behavior.
# Failure Logs

```
$ git log | head -1
commit bc39553c901a91cfcb757863586250838c83eeab

$ pip3 list | egrep "torch|numpy|sentencepiece"
numpy          1.22.1
sentencepiece  0.1.99
torch          2.0.1
torchvision    0.15.2

$ make --version | head -1
GNU Make 4.2.1
```
Confirmed that this update fixes the problem. Thanks for all that you do, folks!
I do notice this at the end of the training run, but it might be unrelated:

```
main: total training time: 00:11:53
double free or corruption (!prev)
Aborted
```