I created a Streamlit app to chat with the Llama 2 7B chat model.
When I open two tabs and send a query from each, the app crashes with the error `OSError: exception: access violation writing 0x0000000000000380`.

Streaming was disabled (`streaming=False`):

```python
llm = LlamaCpp(
    model_path="./models/llama-2-7b-chat.ggmlv3.q4_0.bin",
    n_ctx=6000,
    n_gpu_layers=512,
    n_batch=512,
    temperature=0.9,
    max_tokens=64,
    n_parts=1,
    streaming=False,
)
llm_chain = LLMChain(prompt=prompt, llm=llm)
llm_chain.run(st.session_state["chat_query"])
```
> open two tabs

Edit: In progress: #2813 (comment)
> streamlit app

Not supported. The README lists the supported platforms.
How do I load the model once in memory and use it concurrently?
See #2813
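Until the work tracked in #2813 lands, one common workaround is to share a single cached model per process and serialize access to it, since one llama.cpp context cannot safely serve concurrent requests. The sketch below is an illustration, not an official fix: the helper names `load_llm` and `answer`, the lock, and the prompt template are all assumptions layered on the reporter's original config. It uses Streamlit's `st.cache_resource` to load the model once and a `threading.Lock` so sessions take turns generating.

```python
import threading

import streamlit as st
from langchain.chains import LLMChain
from langchain.llms import LlamaCpp
from langchain.prompts import PromptTemplate

# One lock per process: concurrent sessions must take turns,
# because a single llama.cpp context is not thread-safe.
_llm_lock = threading.Lock()


@st.cache_resource  # load the model once, share it across all sessions/tabs
def load_llm() -> LlamaCpp:
    return LlamaCpp(
        model_path="./models/llama-2-7b-chat.ggmlv3.q4_0.bin",
        n_ctx=6000,
        n_gpu_layers=512,
        n_batch=512,
        temperature=0.9,
        max_tokens=64,
        streaming=False,
    )


def answer(query: str) -> str:
    # Hypothetical prompt; substitute the app's real template.
    prompt = PromptTemplate.from_template("Q: {question}\nA:")
    chain = LLMChain(prompt=prompt, llm=load_llm())
    with _llm_lock:  # serialize generation across tabs/sessions
        return chain.run(question=query)
```

This trades throughput for stability: requests from different tabs queue on the lock instead of writing into the same context simultaneously.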
This issue was closed because it has been inactive for 14 days since being marked as stale.