Persimmon-8B Support? #3071
Unanswered
loretoparisi asked this question in Q&A
Replies: 1 comment
Seems like they use a custom flash attention, so that operation would need to get added to ggml. Not sure if any of their other stuff would require new ops. Also, wow. 262,000 vocab size? That's way bigger than any other model I've ever heard of.
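For context on what such a ggml op would have to compute: flash attention produces the same result as standard scaled dot-product attention, just in a fused kernel that avoids materializing the full seq_len x seq_len score matrix. A minimal, pure-Python reference sketch of the unfused computation (illustrative only, not Persimmon's customized variant):

```python
import math

def softmax(xs):
    # numerically stable softmax over a list of floats
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(Q, K, V):
    """Unfused scaled dot-product attention.

    Q, K, V are lists of row vectors (seq_len x head_dim). A flash
    attention kernel computes the same output without ever storing the
    full score matrix built in the inner loop below.
    """
    d = len(Q[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for q in Q:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in K]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out
```

Whatever Persimmon changed in its "improved" variant would sit on top of this same interface, which is why a dedicated ggml op (rather than composing existing ops) would likely be needed.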
What are the steps necessary to support Persimmon-8B in llama.cpp?

Adept AI released an 8B LLM, Persimmon-8B. The base model is under the Apache 2.0 license, while the chat version's license is more restrictive (CC-BY-NC 4.0). Persimmon-8B has an interesting 16k context window, a customized (improved) flash attention, and a 262k-token vocabulary from a unigram SentencePiece model. Benchmarks are reported in the Persimmon-8B Model Card.
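For llama.cpp support, the tokenizer matters as much as the weights: a unigram SentencePiece model segments text by picking the piece sequence with the highest product of unigram probabilities, via Viterbi search. A toy sketch of that segmentation (the vocabulary and probabilities below are made up for illustration; a real model reads them from the trained .model file):

```python
import math

# Hypothetical unigram vocabulary: piece -> probability.
VOCAB = {
    "per": 0.05, "simmon": 0.02, "persimmon": 0.04,
    "p": 0.01, "e": 0.01, "r": 0.01, "s": 0.01, "i": 0.01,
    "m": 0.01, "o": 0.01, "n": 0.01,
}

def tokenize(text, vocab=VOCAB):
    """Viterbi segmentation: maximize the sum of piece log-probabilities."""
    n = len(text)
    best = [(-math.inf, 0)] * (n + 1)  # (score, length of last piece)
    best[0] = (0.0, 0)
    for end in range(1, n + 1):
        for start in range(end):
            piece = text[start:end]
            if piece in vocab and best[start][0] > -math.inf:
                score = best[start][0] + math.log(vocab[piece])
                if score > best[end][0]:
                    best[end] = (score, end - start)
    # walk the backpointers to recover the best segmentation
    pieces, pos = [], n
    while pos > 0:
        length = best[pos][1]
        pieces.append(text[pos - length:pos])
        pos -= length
    return pieces[::-1]
```

With these made-up probabilities, `tokenize("persimmon")` returns the single piece `["persimmon"]`, since one probable piece beats any multi-piece split.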
The inference script, as well as the base and chat weights, are available at:
https://github.com/persimmon-ai-labs/adept-inference
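The 262k vocabulary also has a practical memory cost worth noting before porting. A back-of-envelope sketch, assuming fp16 weights and a hypothetical hidden size of 4096 (the discussion above only states the ~262k vocabulary and 8B parameter count):

```python
# Rough memory cost of one 262k-entry embedding matrix.
vocab_size = 262_000
hidden_size = 4096          # assumed for illustration, not stated above
bytes_per_param = 2         # fp16

embedding_bytes = vocab_size * hidden_size * bytes_per_param
print(f"{embedding_bytes / 2**30:.2f} GiB")  # roughly 2 GiB per matrix
```

If the input and output embeddings are untied, that cost is paid twice, which is a noticeable share of an 8B model's total footprint.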