
RoPE embeddings #30

Open
PRamoneda opened this issue Aug 13, 2024 · 1 comment
@PRamoneda

My conclusions about changing the positional encoding are that NoPE and ALiBi do not work well for encoder-only models because, unlike in decoder-only models, they do not capture position at all (the model remains permutation equivariant). However, RoPE (Rotary Position Embedding) seems promising: although it cannot extrapolate directly, it can be adapted to longer sequences with only about 1000 additional training steps. Even if it does not work perfectly, it provides relative positional encoding (we can see it as an improvement over sinusoidal positional encoding), which I believe makes a lot of sense for music. This is likely why the authors of Transformer++ used it. Additionally, RoPE seems to accelerate convergence and improve training stability, which is why even well-known decoder-only LLMs (e.g., LLaMA) use it; ALiBi can extrapolate, but it is very unstable during training.

We can borrow the code from https://github.com/lucidrains/rotary-embedding-torch/blob/main/rotary_embedding_torch/rotary_embedding_torch.py
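
For reference, a minimal self-contained sketch of what that code does: rotate query and key channels by a position-dependent angle before attention, so the dot product depends only on relative position. This is my own illustrative re-implementation, not the linked repo's API; the tensor shapes and names are assumptions.

```python
# Sketch of rotary position embeddings (RoPE) applied to q/k before attention.
import torch


def rope_rotate(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Rotate channel pairs of x with shape (batch, heads, seq_len, head_dim)."""
    b, h, n, d = x.shape
    assert d % 2 == 0, "head_dim must be even for RoPE"
    # Per-pair frequencies, the same schedule as sinusoidal positional encoding.
    inv_freq = 1.0 / (base ** (torch.arange(0, d, 2, device=x.device).float() / d))
    angles = torch.einsum("n,f->nf", torch.arange(n, device=x.device).float(), inv_freq)
    cos, sin = angles.cos(), angles.sin()          # (seq_len, head_dim/2)
    x1, x2 = x[..., 0::2], x[..., 1::2]            # split channels into pairs
    # 2D rotation of each (x1, x2) pair, then interleave back to head_dim.
    return torch.stack((x1 * cos - x2 * sin, x1 * sin + x2 * cos), dim=-1).flatten(-2)


# Hypothetical usage inside encoder self-attention (values stay unrotated):
q = torch.randn(2, 8, 128, 64)   # (batch, heads, seq_len, head_dim)
k = torch.randn(2, 8, 128, 64)
q, k = rope_rotate(q), rope_rotate(k)
attn = torch.softmax(q @ k.transpose(-2, -1) / 64 ** 0.5, dim=-1)
```

In practice we would probably just wrap the linked rotary-embedding-torch package rather than maintain this by hand; the sketch is only to show where the rotation slots into an encoder block.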

@VarunGumma

Here is a relevant paper we recently wrote on the same topic: https://arxiv.org/abs/2408.11382
