Replies: 1 comment
-
Loving llm so far as well! I think I'm seeing this too. Following the steps from this blog post: https://simonwillison.net/2023/Aug/1/llama-2-mac/
Input this:
Got this (notice the truncation at the end):
Expected something more complete, like in the blog post:
This may also just be due to my misunderstanding of how these models work. Very new to LLMs. Installed via pip.
-
Hi there,
I love llm so far. Thanks for building this!
One request: add the ability to pass the "ctx_size" option through to llama-cpp when running models from Hugging Face, for example (see the sketch after the listing below). Here is the output from llm models list in my environment:
OpenAI Chat: gpt-3.5-turbo (aliases: 3.5, chatgpt)
OpenAI Chat: gpt-3.5-turbo-16k (aliases: chatgpt-16k, 3.5-16k)
OpenAI Chat: gpt-4 (aliases: 4, gpt4)
OpenAI Chat: gpt-4-32k (aliases: 4-32k)
gpt4all: ggml-all-MiniLM-L6-v2-f16 - Bert, 43.41MB download, needs 1GB RAM
gpt4all: orca-mini-3b - Mini Orca (Small), 1.80GB download, needs 4GB RAM
gpt4all: ggml-gpt4all-j-v1 - Groovy, 3.53GB download, needs 8GB RAM
gpt4all: llama-2-7b-chat - Llama-2-7B Chat, 3.53GB download, needs 8GB RAM
...
gpt4all: wizardLM-13B-Uncensored - Wizard Uncensored, 7.58GB download, needs 16GB RAM
LlamaModel: llama-2-7b-chat.ggmlv3.q8_0 (aliases: llama2-chat, l2c)
LlamaModel: Wizard-Vicuna-7B-Uncensored.ggmlv3.q4_0 (aliases: wizard-vicuna-7b, wizard)
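For what it's worth, llm already has a generic "-o key value" mechanism for model options, so the request boils down to the llama-cpp plugin exposing a context-size option through it. Here is a minimal sketch of what that could look like, assuming llm's documented plugin API (llm.Model, llm.Options, @llm.hookimpl) and llama-cpp-python; the option name n_ctx, the model id, and the file path are placeholders, not the plugin's actual interface:

```python
# Minimal sketch only, not the real llm-llama-cpp plugin.
from typing import Optional

import llm
from llama_cpp import Llama


class LlamaCppChat(llm.Model):
    model_id = "llama-2-7b-chat.ggmlv3.q8_0"  # placeholder id
    can_stream = True

    class Options(llm.Options):
        # The requested knob: let users override the context window per call.
        n_ctx: Optional[int] = None

    def __init__(self, model_path):
        self.model_path = model_path

    def execute(self, prompt, stream, response, conversation):
        n_ctx = prompt.options.n_ctx or 4096  # fall back to the model's full window
        model = Llama(model_path=self.model_path, n_ctx=n_ctx)
        for chunk in model(prompt.prompt, max_tokens=1024, stream=True):
            yield chunk["choices"][0]["text"]


@llm.hookimpl
def register_models(register):
    register(LlamaCppChat("llama-2-7b-chat.ggmlv3.q8_0.bin"))  # placeholder path
```

With something like that registered, the call would be e.g. llm -m l2c -o n_ctx 4096 "..." (the -o mechanism exists today; n_ctx is the hypothetical option name).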
For the two LlamaModel entries above, the default batch size is 512 tokens. I know the context size is 4096 tokens, but what seems to happen is that there is a "break" after 512 tokens while the model finalizes its answer. See ggerganov/llama.cpp#1403 for a discussion.
The problem is that llm appears to treat this "break" as the end of the response, so I cannot get more than 512 tokens per response from those models.
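As I understand it, n_batch controls how many prompt tokens are evaluated per batch, while n_ctx caps the total prompt-plus-response window and max_tokens caps the generated length. A standalone llama-cpp-python sketch for comparison (the model path and numbers are placeholders):

```python
from llama_cpp import Llama

model = Llama(
    model_path="llama-2-7b-chat.ggmlv3.q8_0.bin",  # placeholder path
    n_ctx=4096,   # total window that the prompt plus the reply must fit into
    n_batch=512,  # how many prompt tokens are evaluated per batch
)

result = model(
    "Write a detailed, multi-paragraph explanation of context windows.",
    max_tokens=1024,  # allow replies longer than 512 tokens
)
print(result["choices"][0]["text"])
```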
Any suggestions would be great!
Thanks.