added query-key norm to accommodate OLMo2 #1894

ysjprojects · 2024-12-28T22:14:50Z

query_states = self.q_norm(self.q_proj(hidden_states))
key_states = self.k_norm(self.k_proj(hidden_states))
value_states = self.v_proj(hidden_states)

OLMo2 applies RMSNorm to the query and key projections in its attention layer, which litgpt's architecture does not yet support.

To support adding OLMo2, this PR introduces the config.norm_qk option, which normalizes the query and key projections and defaults to False.

Currently, the q and k norms are assumed to use the same norm class as the rest of the model.
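
The effect of the flag can be illustrated with a minimal, dependency-free sketch. It is not litgpt's implementation: `Config`, `norm_eps`, `rms_norm`, `matvec`, and `project_qkv` are hypothetical names introduced here, the projections are plain matrix-vector products on a single token, and the learned RMSNorm scale is omitted for brevity. Only the `norm_qk` flag and its `False` default come from the description above.

```python
import math
from dataclasses import dataclass


@dataclass
class Config:
    # Hypothetical stand-in for the model config; only norm_qk mirrors this PR.
    norm_qk: bool = False
    norm_eps: float = 1e-6  # assumed epsilon, not taken from the PR


def rms_norm(x, eps):
    """RMSNorm without a learned scale: x / sqrt(mean(x^2) + eps)."""
    mean_sq = sum(v * v for v in x) / len(x)
    scale = 1.0 / math.sqrt(mean_sq + eps)
    return [v * scale for v in x]


def matvec(weight, x):
    """Matrix-vector product standing in for a bias-free linear projection."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


def project_qkv(config, hidden, w_q, w_k, w_v):
    """Project one hidden vector to q, k, v; normalize q and k if enabled."""
    q = matvec(w_q, hidden)
    k = matvec(w_k, hidden)
    v = matvec(w_v, hidden)
    if config.norm_qk:
        # Only queries and keys are normalized; values pass through unchanged.
        q = rms_norm(q, config.norm_eps)
        k = rms_norm(k, config.norm_eps)
    return q, k, v
```

With `Config()` the projections are returned unchanged, so existing models are unaffected. With `Config(norm_qk=True)` the query and key vectors are rescaled to a root mean square of roughly 1, while the value vector is left as is.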

litgpt/model.py

Andrei-Aksionov · 2025-01-02T17:53:02Z

Looks good now.
Thank you.

added query-key norm to accommodate OLMo2

0b4629a

ysjprojects requested review from rasbt and lantiga as code owners December 28, 2024 22:14

Andrei-Aksionov reviewed Dec 30, 2024

Andrei-Aksionov and others added 5 commits December 30, 2024 19:24

Add rerun on failures for test_readme/download[model,books]

5f9df57

Merge branch 'main' into norm_qk

d1ebefa

Merge branch 'main' into norm_qk

2a434ff

refactoring

395f99f

Merge branch 'main' into norm_qk

3bc659e

Andrei-Aksionov merged commit 40c08dc into Lightning-AI:main Jan 2, 2025
8 of 9 checks passed
