Persimmon-8B Support? #3071
Unanswered
loretoparisi asked this question in Q&A
Replies: 1 comment
Seems like they use a custom flash attention, so that operation would need to get added to ggml. Not sure if any of their other stuff would require new ops. Also, wow. 262,000 vocab size? That's way bigger than any other model I've ever heard of.
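For context on what such a ggml op would have to compute: flash attention produces the same result as standard scaled dot-product attention, just in a fused kernel that avoids materializing the full seq_len x seq_len score matrix. A minimal, pure-Python reference sketch of the unfused computation (illustrative only, not Persimmon's customized variant):

```python
import math

def softmax(xs):
    # numerically stable softmax over a list of floats
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(Q, K, V):
    """Unfused scaled dot-product attention.

    Q, K, V are lists of row vectors (seq_len x head_dim). A flash
    attention kernel computes the same output without ever storing the
    full score matrix built in the inner loop below.
    """
    d = len(Q[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for q in Q:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in K]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out
```

Whatever Persimmon changed in its "improved" variant would sit on top of this same interface, which is why a dedicated ggml op (rather than composing existing ops) would likely be needed.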
What are the steps necessary to support Persimmon-8B in llama.cpp?

Adept AI released an 8B LLM, Persimmon-8B. The base model is under the Apache 2.0 license, while the chat version's license is more restrictive (CC-BY-NC 4.0). Persimmon-8B has an interesting 16k context window, a customized (improved) flash attention, and a 262k-token vocabulary from a unigram SentencePiece model. Benchmarks are reported in the Persimmon-8B Model Card.
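For llama.cpp support, the tokenizer matters as much as the weights: a unigram SentencePiece model segments text by picking the piece sequence with the highest product of unigram probabilities, via Viterbi search. A toy sketch of that segmentation (the vocabulary and probabilities below are made up for illustration; a real model reads them from the trained .model file):

```python
import math

# Hypothetical unigram vocabulary: piece -> probability.
VOCAB = {
    "per": 0.05, "simmon": 0.02, "persimmon": 0.04,
    "p": 0.01, "e": 0.01, "r": 0.01, "s": 0.01, "i": 0.01,
    "m": 0.01, "o": 0.01, "n": 0.01,
}

def tokenize(text, vocab=VOCAB):
    """Viterbi segmentation: maximize the sum of piece log-probabilities."""
    n = len(text)
    best = [(-math.inf, 0)] * (n + 1)  # (score, length of last piece)
    best[0] = (0.0, 0)
    for end in range(1, n + 1):
        for start in range(end):
            piece = text[start:end]
            if piece in vocab and best[start][0] > -math.inf:
                score = best[start][0] + math.log(vocab[piece])
                if score > best[end][0]:
                    best[end] = (score, end - start)
    # walk the backpointers to recover the best segmentation
    pieces, pos = [], n
    while pos > 0:
        length = best[pos][1]
        pieces.append(text[pos - length:pos])
        pos -= length
    return pieces[::-1]
```

With these made-up probabilities, `tokenize("persimmon")` returns the single piece `["persimmon"]`, since one probable piece beats any multi-piece split.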
The inference script, as well as the base and chat weights, are available at:
https://github.com/persimmon-ai-labs/adept-inference
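The 262k vocabulary also has a practical memory cost worth noting before porting. A back-of-envelope sketch, assuming fp16 weights and a hypothetical hidden size of 4096 (the discussion above only states the ~262k vocabulary and 8B parameter count):

```python
# Rough memory cost of one 262k-entry embedding matrix.
vocab_size = 262_000
hidden_size = 4096          # assumed for illustration, not stated above
bytes_per_param = 2         # fp16

embedding_bytes = vocab_size * hidden_size * bytes_per_param
print(f"{embedding_bytes / 2**30:.2f} GiB")  # roughly 2 GiB per matrix
```

If the input and output embeddings are untied, that cost is paid twice, which is a noticeable share of an 8B model's total footprint.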