Add Cohere's Command-R #1089

Open
carmocca opened this issue Mar 12, 2024 · 4 comments

Comments

@carmocca (Contributor) commented Mar 12, 2024

https://txt.cohere.com/command-r/
https://huggingface.co/CohereForAI/c4ai-command-r-v01

I don't think the architecture needs any changes to support this

@Andrei-Aksionov (Collaborator)

I don't think the architecture needs any changes to support this

I thought the same about Gemma 😄.

This model requires custom modeling and tokenizer classes.
So, it might not be that straightforward to implement.

@carmocca (Contributor, Author)

Do you see any specific differences in the modeling?

@Andrei-Aksionov (Collaborator)

I posted that without even looking at the code.
I mean, why would anyone provide custom code if it were identical to what's already in transformers?

So, after a really quick scan of the modeling code, I found a couple of interesting details.

  1. They have a logits_scale that is applied to the lm_head output:
    https://huggingface.co/CohereForAI/c4ai-command-r-v01/blob/2a6d259c29bd319c3bdb8dd88b8d59b8c303c318/modeling_cohere.py#L1164

  2. The forward method in CohereDecoderLayer does things a bit differently: https://huggingface.co/CohereForAI/c4ai-command-r-v01/blob/2a6d259c29bd319c3bdb8dd88b8d59b8c303c318/modeling_cohere.py#L689-L709
    It's definitely parallel_residual + shared_attention_norm, but none of this is mentioned in the config file. Since it's just a matter of a proper config setting, it shouldn't give us any problems (see the sketch below).

Maybe there's something more.
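A minimal sketch of how those two details might look, assuming a standard PyTorch decoder structure (ParallelResidualBlock, scaled_logits, and the other names here are illustrative, not the actual Cohere classes):

import torch
import torch.nn as nn


class ParallelResidualBlock(nn.Module):
    # Point 2: one shared layer norm feeds both the attention and the MLP
    # branch, and both branch outputs are added to the same residual stream.
    def __init__(self, n_embd: int, attn: nn.Module, mlp: nn.Module) -> None:
        super().__init__()
        self.shared_norm = nn.LayerNorm(n_embd)
        self.attn = attn
        self.mlp = mlp

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        normed = self.shared_norm(x)
        return x + self.attn(normed) + self.mlp(normed)


def scaled_logits(lm_head: nn.Linear, hidden_states: torch.Tensor, logits_scale: float) -> torch.Tensor:
    # Point 1: the lm_head output is multiplied by a scalar logits_scale
    # (the actual value would come from the checkpoint's config, not hard-coded).
    return lm_head(hidden_states) * logits_scale

The key point is that a single norm feeds both branches (parallel_residual + shared_attention_norm) and the head output is rescaled by a constant read from the config.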

@choyakawa commented Mar 13, 2024

I think you've missed the rotate_half part; the tokenizer, though, is the same as Llama's.

import torch

def rotate_half(x):
    # Interleaved variant: split channels into even/odd pairs and rotate
    # each pair (x1, x2) -> (-x2, x1), then flatten back.
    x1 = x[..., ::2]
    x2 = x[..., 1::2]
    rot_x = torch.stack([-x2, x1], dim=-1).flatten(-2)
    return rot_x

ggerganov/llama.cpp#6033 (comment)
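For contrast, here is a minimal sketch of the half-split rotate_half used by the LLaMA-style RoPE code in transformers; the snippet above instead interleaves even/odd channels:

import torch

def rotate_half_llama(x):
    # Half-split variant: swap the two halves of the last dimension with a
    # sign flip, instead of interleaving even/odd channels as above.
    x1 = x[..., : x.shape[-1] // 2]
    x2 = x[..., x.shape[-1] // 2 :]
    return torch.cat((-x2, x1), dim=-1)

If the checkpoint's rotary layout really is interleaved, a weight-conversion script would presumably need to account for that difference rather than reusing the half-split path unchanged.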
