can't start new thread #1624

Closed
JerryYao80 opened this issue May 28, 2023 · 6 comments

@JerryYao80

JerryYao80 commented May 28, 2023

Prerequisites

Please answer the following questions for yourself before submitting an issue.

  • [√] I am running the latest code. Development is very rapid so there are no tagged versions as of now.
  • [√] I carefully followed the README.md.
  • [√] I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • [√] I reviewed the Discussions, and have a new bug or useful enhancement to share.

Expected Behavior

I just want to run ggml-model-q4_0.bin on my Windows 7 machine.

Current Behavior

I followed the instructions in README.md and executed:

docker run -v /models/llama:/home ghcr.io/ggerganov/llama.cpp:full --convert "/home/7B" 1

but I got the error:

Loading model file /home/pytorch_model-00001-of-00033.bin
......
Loading model file /home/pytorch_model-00033-of-00033.bin
Loading vocab file /home/tokenizer.model
Writing vocab...
......
RuntimeError: can't start new thread

Environment and Context

Docker Toolbox 1.13.1
docker client: 1.13.1, os/arch: windows 7/amd64
docker server: 19.03.12, os/arch: ubuntu 22.04/amd64

Steps to Reproduce

  1. Installed Docker Toolbox 1.13.1 on my Windows 7 machine
  2. Pulled the image ghcr.io/ggerganov/llama.cpp:full
  3. Downloaded Llama-7b-hf from huggingface to D:\installation\images\ptm
  4. Mounted D:\installation\images\ptm to /models in the Ubuntu VM, and it worked
  5. Executed the command: docker run -v /models/llama:/home ghcr.io/ggerganov/llama.cpp:full --convert "/home/7B" 1

What should I do? Thanks in advance.

@JerryYao80 JerryYao80 changed the title Can't find model in directory /models/7B can't start new thread May 28, 2023
@KerfuffleV2
Collaborator

It seems like that script tries to use 8 threads to write the vocabulary. If you have some sort of resource limit set for your user or in Docker that caps the number of threads, you could try adjusting it (no concrete suggestions, I don't use Docker or Windows).
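For example, if it's a per-process thread limit, something like this might raise it — a guess at the relevant knob, and I don't know whether Docker Toolbox 1.13.1 honors it:

# hypothetical: raise the thread/process ulimit inside the container
docker run --ulimit nproc=4096:4096 -v /models/llama:/home ghcr.io/ggerganov/llama.cpp:full --convert "/home/7B" 1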

If you can figure out how to edit the convert.py script in the container:

ndarrays = bounded_parallel_map(do_item, model.items(), concurrency=8)

You could try setting the concurrency to a lower value there. Unfortunately, it doesn't appear configurable without actually editing the script. This may or may not help you: https://stackoverflow.com/questions/47490307/editing-files-inside-of-a-docker-container

Looks like the scripts and such will be under the /app directory in the container.
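A hedged sketch of one way to do that with stock Docker commands (the names llama-tmp and llama-cpp-patched are placeholders, and /app/convert.py is my guess at the script path inside the image):

# create a stopped container from the image so files can be copied in and out
docker create --name llama-tmp ghcr.io/ggerganov/llama.cpp:full
docker cp llama-tmp:/app/convert.py .
# ...edit convert.py locally, lowering the concurrency= value...
docker cp convert.py llama-tmp:/app/convert.py
# snapshot the modified container as a new image and clean up
docker commit llama-tmp llama-cpp-patched
docker rm llama-tmp

After that, run llama-cpp-patched in place of ghcr.io/ggerganov/llama.cpp:full.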

@JerryYao80
Author

@KerfuffleV2 I changed the concurrency to 1, committed a new image, and ran it again:

ndarrays = bounded_parallel_map(do_item, model.items(), concurrency=1)

but I got the same problem:

Loading model file /home/pytorch_model-00001-of-00033.bin
......
Loading model file /home/pytorch_model-00033-of-00033.bin
Loading vocab file /home/tokenizer.model
Writing vocab...
......
RuntimeError: can't start new thread

Has anyone run llama.cpp correctly on Windows 7?

@KerfuffleV2
Collaborator

Hmm, seems like starting threads just doesn't work with your setup. I don't know if it's an issue with the Docker container or something else. Sorry, I also don't know about the other question. I would guess there are very few people still on Windows 7 at this point so there is a decent chance that particular setup isn't very well tested.

One thing you can possibly try is changing that parallel map to just a normal map. I think you could do:

ndarrays = map(do_item, model.items()) 

Making sure to preserve the existing indentation when you change it.
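In context the edit might look like this (a sketch; the surrounding code in convert.py may differ from what's shown here):

# before: spawns a pool of worker threads, which is what fails on this setup
# ndarrays = bounded_parallel_map(do_item, model.items(), concurrency=8)
# after: Python 3's built-in map is lazy and single-threaded, so each tensor
# is still processed one at a time as the writer consumes the iterator
ndarrays = map(do_item, model.items())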

@JerryYao80
Author

JerryYao80 commented May 29, 2023

@KerfuffleV2 Thanks, that works. I changed bounded_parallel_map to map and got ggml-model-f16.bin.
But a new error occurs:

ERROR: /app/.devops/tools.sh: line 40: 6 Illegal instruction ./quantize $arg2

when I execute:

--quantize "/models/7B/ggml-model-f16.bin" "/models/7B/ggml-model-q4_0.bin" 2

and I checked my quantize file:

quantize: ELF 64-bit LSB shared object, x86-64, version 1 (GNU/Linux) ...... for GNU/Linux 3.2.0, not stripped

But the architecture of docker image named ghcr.io/ggerganov/llama.cpp:full is amd64:

docker image inspect ghcr.io/ggerganov/llama.cpp:full | grep Architecture
"Architecture": "amd64"

and my environment is:

Docker Toolbox 1.13.1
docker client: 1.13.1, os/arch: windows 7/amd64
docker server: 19.03.12, os/arch: ubuntu 22.04/amd64
CPU: Intel Core i7-6700, supported instruction sets: MMX, SSE, SSE2, ......, AVX, AVX2, FMA3, TSX

So I'm confused:
  1. Must I recompile quantize?
  2. How do I recompile it if I use the Docker image?
  3. Is there anywhere I can download it directly?

@KerfuffleV2
Collaborator

Unfortunately, I think we're reaching the point where I can't really help you anymore. I haven't actually quantized my own models or used the container, and I also don't use Windows. So I'm just going by random stuff I've seen.

There are two possible explanations I can think of here:

  1. Some of the tools aren't really user friendly and will actually call abort() on conditions like missing files. This causes the app to crash and can show something like "Illegal instruction". Double check that the files exist, are called what you expect, etc. Also check for other warnings/errors above the "Illegal instruction" part which may have more information about what actually happened.
  2. Issue #1583 (Cmake file always assumes AVX2 support) seems to imply that the binaries in things like the container are compiled with AVX2 support (or, read another way, an AVX2 requirement). If you're on Windows 7, it seems possible that you're on old hardware that doesn't actually support AVX2, so apps compiled to require it will probably just die with "Illegal instruction".

In the case of #2, I'm not sure there's a lot you can do other than find the model already converted/quantized (it's usually not that hard; I can't point you to it specifically since directly linking to those models isn't allowed) and simply download it, or try to compile the project locally yourself. I don't know if WSL even works with Windows 7, so I can't give you advice on that part.
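If you want to test the AVX2 theory first, one quick check is to look at the CPU flags the container actually sees. Docker Toolbox runs containers inside a VirtualBox Linux VM, so the flags in that VM's /proc/cpuinfo are what matter, not what Windows reports for the host CPU. A hedged sketch, assuming the image is Ubuntu-based and ships grep:

# prints "avx2" if the VM exposes it to the container, nothing otherwise
docker run --rm --entrypoint grep ghcr.io/ggerganov/llama.cpp:full -o -m1 avx2 /proc/cpuinfo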

@JerryYao80
Author

@KerfuffleV2 thanks for your help. The CPU I use supports AVX2 and the Docker image architecture is amd64, but the type of the quantize file is x86-64. I have also tested further: not only the quantization phase but also the run phase (with ggml_model_q4_0.bin downloaded from huggingface) fails with the same error, which means this one problem is all that stands between me and success.
