Add Cohere's Command-R #1089

Open
carmocca opened this issue Mar 12, 2024 · 4 comments

Comments

@carmocca (Contributor) commented Mar 12, 2024

https://txt.cohere.com/command-r/
https://huggingface.co/CohereForAI/c4ai-command-r-v01

I don't think the architecture needs any changes to support this

@Andrei-Aksionov (Collaborator)

I don't think the architecture needs any changes to support this

I thought the same about Gemma 😄.

This model requires custom modeling and tokenizer classes.
So, it might not be that straightforward to implement.

@carmocca (Contributor, Author)

Do you see any specific differences in the modeling?

@Andrei-Aksionov (Collaborator)

I posted that without even looking at the code.
I mean, why would anyone provide custom code if it were identical to what's already in transformers?

So, after a really quick scan of the modeling code, I found a couple of interesting details.

  1. They have a logits_scale that is applied to the lm_head output:
    https://huggingface.co/CohereForAI/c4ai-command-r-v01/blob/2a6d259c29bd319c3bdb8dd88b8d59b8c303c318/modeling_cohere.py#L1164

  2. The forward method in CohereDecoderLayer does things a bit differently: https://huggingface.co/CohereForAI/c4ai-command-r-v01/blob/2a6d259c29bd319c3bdb8dd88b8d59b8c303c318/modeling_cohere.py#L689-L709
    It's definitely parallel_residual + shared_attention_norm, but none of this is mentioned in the config file. Since it's just a matter of a proper config setting, it shouldn't give us any problems (see the sketch below).

Maybe there's something more.
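A minimal sketch of how those two details might look, assuming a standard PyTorch decoder structure (ParallelResidualBlock, scaled_logits, and the other names here are illustrative, not the actual Cohere classes):

import torch
import torch.nn as nn


class ParallelResidualBlock(nn.Module):
    # Point 2: one shared layer norm feeds both the attention and the MLP
    # branch, and both branch outputs are added to the same residual stream.
    def __init__(self, n_embd: int, attn: nn.Module, mlp: nn.Module) -> None:
        super().__init__()
        self.shared_norm = nn.LayerNorm(n_embd)
        self.attn = attn
        self.mlp = mlp

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        normed = self.shared_norm(x)
        return x + self.attn(normed) + self.mlp(normed)


def scaled_logits(lm_head: nn.Linear, hidden_states: torch.Tensor, logits_scale: float) -> torch.Tensor:
    # Point 1: the lm_head output is multiplied by a scalar logits_scale
    # (the actual value would come from the checkpoint's config, not hard-coded).
    return lm_head(hidden_states) * logits_scale

The key point is that a single norm feeds both branches (parallel_residual + shared_attention_norm) and the head output is rescaled by a constant read from the config.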

@choyakawa commented Mar 13, 2024

I think you've missed the rotate_half part; the tokenizer, though, is the same as Llama's.

import torch

def rotate_half(x):
    # Interleaved variant: split channels into even/odd pairs and rotate
    # each pair (x1, x2) -> (-x2, x1), then flatten back.
    x1 = x[..., ::2]
    x2 = x[..., 1::2]
    rot_x = torch.stack([-x2, x1], dim=-1).flatten(-2)
    return rot_x

ggerganov/llama.cpp#6033 (comment)
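For contrast, here is a minimal sketch of the half-split rotate_half used by the LLaMA-style RoPE code in transformers; the snippet above instead interleaves even/odd channels:

import torch

def rotate_half_llama(x):
    # Half-split variant: swap the two halves of the last dimension with a
    # sign flip, instead of interleaving even/odd channels as above.
    x1 = x[..., : x.shape[-1] // 2]
    x2 = x[..., x.shape[-1] // 2 :]
    return torch.cat((-x2, x1), dim=-1)

If the checkpoint's rotary layout really is interleaved, a weight-conversion script would presumably need to account for that difference rather than reusing the half-split path unchanged.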
