can't start new thread #1624

Closed
JerryYao80 opened this issue May 28, 2023 · 6 comments

@JerryYao80

JerryYao80 commented May 28, 2023

Prerequisites

Please answer the following questions for yourself before submitting an issue.

  • [√] I am running the latest code. Development is very rapid so there are no tagged versions as of now.
  • [√] I carefully followed the README.md.
  • [√] I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • [√] I reviewed the Discussions, and have a new bug or useful enhancement to share.

Expected Behavior

I just want to run ggml-model-q4_0.bin on my Windows 7 machine.

Current Behavior

I followed the instructions in README.md and executed:

docker run -v /models/llama:/home ghcr.io/ggerganov/llama.cpp:full --convert "/home/7B" 1

but I got the error:

Loading model file /home/pytorch_model-00001-of-00033.bin
......
Loading model file /home/pytorch_model-00033-of-00033.bin
Loading vocab file /home/tokenizer.model
Writing vocab...
......
RuntimeError: can't start new thread

Environment and Context

Docker Toolbox 1.13.1
docker client: 1.13.1, os/arch: windows 7/amd64
docker server: 19.03.12, os/arch: ubuntu 22.04/amd64

Steps to Reproduce

  1. Installed Docker Toolbox 1.13.1 on my Windows 7 machine
  2. Pulled the image ghcr.io/ggerganov/llama.cpp:full
  3. Downloaded Llama-7b-hf from huggingface to D:\installation\images\ptm
  4. Mounted D:\installation\images\ptm to /models in the Ubuntu VM, and it worked
  5. Executed the command: docker run -v /models/llama:/home ghcr.io/ggerganov/llama.cpp:full --convert "/home/7B" 1

What should I do? Thanks in advance.

@JerryYao80 JerryYao80 changed the title Can't find model in directory /models/7B can't start new thread May 28, 2023
@KerfuffleV2
Collaborator

It seems like that script tries to use 8 threads to write the vocabulary. If you have some sort of resource limit set for your user or in Docker that caps the number of threads, you could try adjusting it (no concrete suggestions, I don't use Docker or Windows).
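For example, if it's a per-process thread limit, something like this might raise it — a guess at the relevant knob, and I don't know whether Docker Toolbox 1.13.1 honors it:

# hypothetical: raise the thread/process ulimit inside the container
docker run --ulimit nproc=4096:4096 -v /models/llama:/home ghcr.io/ggerganov/llama.cpp:full --convert "/home/7B" 1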

If you can figure out how to edit the convert.py script in the container:

ndarrays = bounded_parallel_map(do_item, model.items(), concurrency=8)

You could try setting the concurrency to a lower value there. Unfortunately, it doesn't appear configurable without actually editing the script. This may or may not help you: https://stackoverflow.com/questions/47490307/editing-files-inside-of-a-docker-container

Looks like the scripts and such will be under the /app directory in the container.
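A hedged sketch of one way to do that with stock Docker commands (the names llama-tmp and llama-cpp-patched are placeholders, and /app/convert.py is my guess at the script path inside the image):

# create a stopped container from the image so files can be copied in and out
docker create --name llama-tmp ghcr.io/ggerganov/llama.cpp:full
docker cp llama-tmp:/app/convert.py .
# ...edit convert.py locally, lowering the concurrency= value...
docker cp convert.py llama-tmp:/app/convert.py
# snapshot the modified container as a new image and clean up
docker commit llama-tmp llama-cpp-patched
docker rm llama-tmp

After that, run llama-cpp-patched in place of ghcr.io/ggerganov/llama.cpp:full.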

@JerryYao80
Author

@KerfuffleV2 I changed the concurrency to 1, committed a new image, and ran it again:

ndarrays = bounded_parallel_map(do_item, model.items(), concurrency=1)

but I got the same problem:

Loading model file /home/pytorch_model-00001-of-00033.bin
......
Loading model file /home/pytorch_model-00033-of-00033.bin
Loading vocab file /home/tokenizer.model
Writing vocab...
......
RuntimeError: can't start new thread

Has anyone run llama.cpp correctly on Windows 7?

@KerfuffleV2
Collaborator

Hmm, seems like starting threads just doesn't work with your setup. I don't know if it's an issue with the Docker container or something else. Sorry, I also don't know about the other question. I would guess there are very few people still on Windows 7 at this point so there is a decent chance that particular setup isn't very well tested.

One thing you can possibly try is changing that parallel map to just a normal map. I think you could do:

ndarrays = map(do_item, model.items()) 

Making sure to preserve the existing indentation when you change it.
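In context the edit might look like this (a sketch; the surrounding code in convert.py may differ from what's shown here):

# before: spawns a pool of worker threads, which is what fails on this setup
# ndarrays = bounded_parallel_map(do_item, model.items(), concurrency=8)
# after: Python 3's built-in map is lazy and single-threaded, so each tensor
# is still processed one at a time as the writer consumes the iterator
ndarrays = map(do_item, model.items())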

@JerryYao80
Author

JerryYao80 commented May 29, 2023

@KerfuffleV2 Thanks, that works. I changed bounded_parallel_map to map and got ggml-model-f16.bin.
But a new error occurs:

ERROR: /app/.devops/tools.sh: line 40: 6 Illegal instruction ./quantize $arg2

when I execute:

--quantize "/models/7B/ggml-model-f16.bin" "/models/7B/ggml-model-q4_0.bin" 2

and I checked my quantize file:

quantize: ELF 64-bit LSB shared object, x86-64, version 1 (GNU/Linux) ...... for GNU/Linux 3.2.0, not stripped

But the architecture of docker image named ghcr.io/ggerganov/llama.cpp:full is amd64:

docker image inspect ghcr.io/ggerganov/llama.cpp:full | grep Architecture
"Architecture": "amd64"

and my environment is:

Docker Toolbox 1.13.1
docker client: 1.13.1, os/arch: windows 7/amd64
docker server: 19.03.12, os/arch: ubuntu 22.04/amd64
CPU: Intel Core i7-6700, supported instruction sets: MMX, SSE, SSE2, ......, AVX, AVX2, FMA3, TSX

So I'm confused:
  1. Must I recompile quantize?
  2. How do I recompile it if I use the Docker image?
  3. Is there anywhere I can download it directly?

@KerfuffleV2
Collaborator

Unfortunately, I think we're reaching the point where I can't really help you anymore. I haven't actually quantized my own models or used the container, and I also don't use Windows. So I'm just going by random stuff I've seen.

There are two possible explanations I can think of here:

  1. Some of the tools aren't really user friendly and will actually call abort() on conditions like missing files. This causes the app to crash and can show something like "Illegal instruction". Double check that the files exist, are called what you expect, etc. Also check for other warnings/errors above the "Illegal instruction" part which may have more information about what actually happened.
  2. Issue #1583 (Cmake file always assumes AVX2 support) seems to imply that the binaries in things like the container are compiled with AVX2 support (or, read another way, an AVX2 requirement). If you're on Windows 7, it seems possible that you're on old hardware that doesn't actually support AVX2, so apps compiled to require it will probably just die with "Illegal instruction".

In the case of #2, I'm not sure there's a lot you can do other than find the model already converted/quantized (it's usually not that hard; I can't point you to it specifically since directly linking to those models isn't allowed) and simply download it, or try to compile the project locally yourself. I don't know if WSL even works with Windows 7, so I can't give you advice on that part.
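If you want to test the AVX2 theory first, one quick check is to look at the CPU flags the container actually sees. Docker Toolbox runs containers inside a VirtualBox Linux VM, so the flags in that VM's /proc/cpuinfo are what matter, not what Windows reports for the host CPU. A hedged sketch, assuming the image is Ubuntu-based and ships grep:

# prints "avx2" if the VM exposes it to the container, nothing otherwise
docker run --rm --entrypoint grep ghcr.io/ggerganov/llama.cpp:full -o -m1 avx2 /proc/cpuinfo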

@JerryYao80
Author

@KerfuffleV2 thanks for your help. The CPU I use supports AVX2 and the Docker image architecture is amd64, but the type of the quantize file is x86-64. I have also tested further: not only the quantization phase but also the run phase (with ggml_model_q4_0.bin downloaded from huggingface) fails with the same error, which means this one problem is all that stands between me and success.
