
Implement the YaRN RoPE scaling feature #109

Merged 1 commit into intel:main on Feb 6, 2024

Conversation

@xiguiw (Contributor) commented Feb 1, 2024

Add new APIs for YaRN RoPE scaling:

NE_API struct ne_tensor* ne_rope_custom_inplace(struct ne_context* ctx, struct ne_tensor* a, int n_past, int n_dims,
                                                int mode, int prompt_size, float freq_base, float freq_scale,
                                                int yarn_orig_ctx, float ext_factor, float attn_factor,
                                                float beta_fast, float beta_slow);

// shift all tokens by a given p (n_shift)
// Optionally give a 1d tensor of precomputed interleaved cos/sin values of n_shift * scale^k for k \in [0, n_dims)
NE_API struct ne_tensor* ne_rope_custom_shift_inplace(struct ne_context* ctx, struct ne_tensor* a, int n_shift,
                                                      int n_dims, int mode, int prompt_size, int n_keep,
                                                      struct ne_tensor* cossin, float freq_base, float freq_scale,
                                                      int yarn_orig_ctx, float ext_factor, float attn_factor,
                                                      float beta_fast, float beta_slow);
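
For reference, a minimal usage sketch of the new API while building a model graph; the helper name and all parameter values below are illustrative assumptions, not taken from this PR.

// Hypothetical usage sketch (not from this PR): apply YaRN-scaled RoPE in place
// to a query/key tensor during graph construction. The parameter values are
// illustrative; real models would read them from the converted model file.
static struct ne_tensor* apply_yarn_rope(struct ne_context* ctx, struct ne_tensor* cur,
                                         int n_past, int n_dims) {
  const int   mode          = 0;        // standard RoPE mode
  const int   prompt_size   = 512;
  const float freq_base     = 10000.0f;
  const float freq_scale    = 0.25f;    // e.g. a 4x context extension
  const int   yarn_orig_ctx = 4096;     // original training context length
  const float ext_factor    = 1.0f;     // enable the YaRN ramp mixing
  const float attn_factor   = 1.0f;     // attention magnitude scaling
  const float beta_fast     = 32.0f;    // YaRN correction-range defaults
  const float beta_slow     = 1.0f;
  return ne_rope_custom_inplace(ctx, cur, n_past, n_dims, mode, prompt_size,
                                freq_base, freq_scale, yarn_orig_ctx,
                                ext_factor, attn_factor, beta_fast, beta_slow);
}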

Change the ne_layer internal API:

Original:
struct ne_tensor* ne_rope_impl(struct ne_context* ctx, struct ne_tensor* a, int n_past, int n_dims, int mode,
                               int prompt_size, bool inplace, int n_keep, struct ne_tensor* cossin, int* n_padding,
                               bool padding_left, float freq_base, float freq_scale)

New:
struct ne_tensor* ne_rope_impl(struct ne_context* ctx, struct ne_tensor* a, int n_past, int n_dims, int mode,
                               int prompt_size, bool inplace, int n_keep, struct ne_tensor* cossin, int* n_padding,
                               bool padding_left, float freq_base, float freq_scale,
                               int yarn_orig_ctx, float ext_factor, float attn_factor,
                               float beta_fast, float beta_slow)

Models calling ne_rope_impl should see no change in behavior.
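
As a sanity check on that claim, here is a sketch of a behavior-preserving call using assumed neutral YaRN values (not quoted from the diff): with ext_factor = 0 the YaRN mixing is disabled and attn_factor = 1 leaves the magnitude untouched, so the call reduces to the original linear freq_scale path.

// Assumed behavior-preserving call (illustrative, not quoted from the diff):
// neutral YaRN arguments so the result matches the pre-YaRN ne_rope_impl.
struct ne_tensor* out = ne_rope_impl(ctx, a, n_past, n_dims, mode, prompt_size,
                                     /*inplace=*/true, /*n_keep=*/0,
                                     /*cossin=*/NULL, /*n_padding=*/NULL,
                                     /*padding_left=*/true,
                                     /*freq_base=*/10000.0f, /*freq_scale=*/1.0f,
                                     /*yarn_orig_ctx=*/0, /*ext_factor=*/0.0f,
                                     /*attn_factor=*/1.0f,
                                     /*beta_fast=*/32.0f, /*beta_slow=*/1.0f);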

@airMeng (Contributor) commented Feb 1, 2024

You can include the related Python script updates in this PR:

fout.write(struct.pack("f", 0.0)) # config.json "rope_scaling.factor", not enabled

@airMeng airMeng requested review from DDEle and intellinjun February 1, 2024 10:59
@intellinjun (Contributor) commented
https://inteltf-jenk.sh.intel.com/job/neural_speed_extension/49/
This CI run tests llama2 and gptneox to check whether this PR affects the original models that use RoPE.

@airMeng (Contributor) commented Feb 2, 2024

Fix the format issues, then merge.

@intellinjun (Contributor) commented Feb 2, 2024

https://inteltf-jenk.sh.intel.com/job/neural_speed_extension/50/ This CI run tests llama2 and gptneox to check whether this PR affects the original models that use RoPE.

Please wait for the result before merging.

Interpolate the rotary position embedding.
Only inference is implemented; training is not implemented.
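
For context on what this commit computes, below is a self-contained sketch of the per-dimension YaRN interpolation, following the published YaRN formulation rather than this PR's actual kernel code; the function names and constants are illustrative.

#include <math.h>

// Illustrative, standalone sketch of YaRN "NTK-by-parts" interpolation
// (based on the published YaRN formulation, not this PR's kernels).

// Rotary dimension at which a frequency completes `beta` rotations over the
// original context length; used to derive the correction range [low, high]
// from beta_fast and beta_slow.
static float yarn_corr_dim(int n_dims, int yarn_orig_ctx, float beta, float freq_base) {
  const float pi = 3.14159265358979f;
  return n_dims * logf(yarn_orig_ctx / (beta * 2.0f * pi)) / (2.0f * logf(freq_base));
}

// Ramp: 1 keeps the original (extrapolated) angle for high-frequency dims,
// 0 uses the scaled (interpolated) angle for low-frequency dims.
static float yarn_ramp(float low, float high, int i) {
  float y = ((float)i / 2.0f - low) / fmaxf(0.001f, high - low);
  return 1.0f - fminf(1.0f, fmaxf(0.0f, y));
}

// Blend the interpolated angle (freq_scale * theta) with the original angle,
// controlled by ext_factor; ext_factor == 0 falls back to plain linear scaling.
static float yarn_theta(float theta_extrap, float freq_scale, float low, float high,
                        int i, float ext_factor) {
  float theta_interp = freq_scale * theta_extrap;
  float ramp_mix = yarn_ramp(low, high, i) * ext_factor;
  return theta_interp * (1.0f - ramp_mix) + theta_extrap * ramp_mix;
}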
Comment on lines +3040 to +3041:
/* what is the difference between setting parameters in b->data and in op_parameters */
/* are float and int stored in different data ?? */
A contributor replied:

No functional difference. The only distinction is that b->data can be sized case by case, while op params have a fixed maximum size, NE_MAX_OP_PARAMS. In addition, ->data was used as a workaround before op_param was introduced.
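
To make the explanation above concrete, here is a generic packing sketch with hypothetical names (not the actual NE helpers): both routes ultimately store the raw bytes of mixed int and float parameters; the only practical difference is whether the buffer is the fixed-size op_params array or a separately sized ->data blob.

#include <stdint.h>
#include <string.h>

// Generic illustration (hypothetical names, not NE API): ints and floats share
// one raw byte buffer regardless of whether it lives in a fixed-size op_params
// array or in a tensor's separately allocated ->data.
#define OP_PARAMS_MAX_BYTES 64  /* stand-in for an NE_MAX_OP_PARAMS-style limit */

struct rope_op_params {
  int   n_past, n_dims, mode, prompt_size, yarn_orig_ctx;
  float freq_base, freq_scale, ext_factor, attn_factor, beta_fast, beta_slow;
};

static void pack_rope_op_params(uint8_t dst[OP_PARAMS_MAX_BYTES],
                                const struct rope_op_params* src) {
  _Static_assert(sizeof(struct rope_op_params) <= OP_PARAMS_MAX_BYTES,
                 "params must fit in the fixed op_params buffer");
  memcpy(dst, src, sizeof(*src));  // raw copy; consumer unpacks with the same layout
}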

Remove or modify this comment if there are no further questions on it.

@VincyZhang merged commit 08be9a6 into intel:main on Feb 6, 2024
10 checks passed
VincyZhang added a commit that referenced this pull request Feb 6, 2024
xiguiw pushed a commit to xiguiw/neural-speed that referenced this pull request Feb 7, 2024
Resubmit "Implement the YaRN RoPE scaling feature (intel#109)"

This reverts commit 2e94db2.
VincyZhang added a commit that referenced this pull request Mar 1, 2024
5 participants