
Implement the YaRN RoPE scaling feature #109

Merged 1 commit into intel:main on Feb 6, 2024

Conversation

@xiguiw (Contributor) commented Feb 1, 2024

Add new APIs for YaRN RoPE scaling:

NE_API struct ne_tensor* ne_rope_custom_inplace(struct ne_context* ctx, struct ne_tensor* a, int n_past, int n_dims,
                                                int mode, int prompt_size, float freq_base, float freq_scale,
                                                int yarn_orig_ctx, float ext_factor, float attn_factor,
                                                float beta_fast, float beta_slow);

// shift all tokens by a given p (n_shift)
// Optionally give a 1d tensor of precomputed interleaved cos/sin values of n_shift * scale^k for k \in [0, n_dims)
NE_API struct ne_tensor* ne_rope_custom_shift_inplace(struct ne_context* ctx, struct ne_tensor* a, int n_shift,
                                                      int n_dims, int mode, int prompt_size, int n_keep,
                                                      struct ne_tensor* cossin, float freq_base, float freq_scale,
                                                      int yarn_orig_ctx, float ext_factor, float attn_factor,
                                                      float beta_fast, float beta_slow);
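
For reference, a minimal usage sketch of the new API while building a model graph; the helper name and all parameter values below are illustrative assumptions, not taken from this PR.

// Hypothetical usage sketch (not from this PR): apply YaRN-scaled RoPE in place
// to a query/key tensor during graph construction. The parameter values are
// illustrative; real models would read them from the converted model file.
static struct ne_tensor* apply_yarn_rope(struct ne_context* ctx, struct ne_tensor* cur,
                                         int n_past, int n_dims) {
  const int   mode          = 0;        // standard RoPE mode
  const int   prompt_size   = 512;
  const float freq_base     = 10000.0f;
  const float freq_scale    = 0.25f;    // e.g. a 4x context extension
  const int   yarn_orig_ctx = 4096;     // original training context length
  const float ext_factor    = 1.0f;     // enable the YaRN ramp mixing
  const float attn_factor   = 1.0f;     // attention magnitude scaling
  const float beta_fast     = 32.0f;    // YaRN correction-range defaults
  const float beta_slow     = 1.0f;
  return ne_rope_custom_inplace(ctx, cur, n_past, n_dims, mode, prompt_size,
                                freq_base, freq_scale, yarn_orig_ctx,
                                ext_factor, attn_factor, beta_fast, beta_slow);
}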

Change the ne_layer internal API:

Original:
struct ne_tensor* ne_rope_impl(struct ne_context* ctx, struct ne_tensor* a, int n_past, int n_dims, int mode,
                               int prompt_size, bool inplace, int n_keep, struct ne_tensor* cossin, int* n_padding,
                               bool padding_left, float freq_base, float freq_scale)

New:
struct ne_tensor* ne_rope_impl(struct ne_context* ctx, struct ne_tensor* a, int n_past, int n_dims, int mode,
                               int prompt_size, bool inplace, int n_keep, struct ne_tensor* cossin, int* n_padding,
                               bool padding_left, float freq_base, float freq_scale,
                               int yarn_orig_ctx, float ext_factor, float attn_factor,
                               float beta_fast, float beta_slow)

Models calling ne_rope_impl should see no change in behavior.
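
As a sanity check on that claim, here is a sketch of a behavior-preserving call using assumed neutral YaRN values (not quoted from the diff): with ext_factor = 0 the YaRN mixing is disabled and attn_factor = 1 leaves the magnitude untouched, so the call reduces to the original linear freq_scale path.

// Assumed behavior-preserving call (illustrative, not quoted from the diff):
// neutral YaRN arguments so the result matches the pre-YaRN ne_rope_impl.
struct ne_tensor* out = ne_rope_impl(ctx, a, n_past, n_dims, mode, prompt_size,
                                     /*inplace=*/true, /*n_keep=*/0,
                                     /*cossin=*/NULL, /*n_padding=*/NULL,
                                     /*padding_left=*/true,
                                     /*freq_base=*/10000.0f, /*freq_scale=*/1.0f,
                                     /*yarn_orig_ctx=*/0, /*ext_factor=*/0.0f,
                                     /*attn_factor=*/1.0f,
                                     /*beta_fast=*/32.0f, /*beta_slow=*/1.0f);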

@airMeng (Contributor) commented Feb 1, 2024

You can include the related Python script updates in this PR:

fout.write(struct.pack("f", 0.0)) # config.json "rope_scaling.factor", not enabled

@airMeng airMeng requested review from DDEle and intellinjun February 1, 2024 10:59
@intellinjun (Contributor) commented
https://inteltf-jenk.sh.intel.com/job/neural_speed_extension/49/
This CI run tests llama2 and gptneox to check whether this PR affects the original models that use RoPE.

@airMeng (Contributor) commented Feb 2, 2024

Fix the format issues, then merge.

@intellinjun (Contributor) commented Feb 2, 2024

https://inteltf-jenk.sh.intel.com/job/neural_speed_extension/50/ This CI run tests llama2 and gptneox to check whether this PR affects the original models that use RoPE.

Please wait for the result before merging.

Interpolate the rotary position embedding.
Only inference is implemented; training is not implemented.
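
For context on what this commit computes, below is a self-contained sketch of the per-dimension YaRN interpolation, following the published YaRN formulation rather than this PR's actual kernel code; the function names and constants are illustrative.

#include <math.h>

// Illustrative, standalone sketch of YaRN "NTK-by-parts" interpolation
// (based on the published YaRN formulation, not this PR's kernels).

// Rotary dimension at which a frequency completes `beta` rotations over the
// original context length; used to derive the correction range [low, high]
// from beta_fast and beta_slow.
static float yarn_corr_dim(int n_dims, int yarn_orig_ctx, float beta, float freq_base) {
  const float pi = 3.14159265358979f;
  return n_dims * logf(yarn_orig_ctx / (beta * 2.0f * pi)) / (2.0f * logf(freq_base));
}

// Ramp: 1 keeps the original (extrapolated) angle for high-frequency dims,
// 0 uses the scaled (interpolated) angle for low-frequency dims.
static float yarn_ramp(float low, float high, int i) {
  float y = ((float)i / 2.0f - low) / fmaxf(0.001f, high - low);
  return 1.0f - fminf(1.0f, fmaxf(0.0f, y));
}

// Blend the interpolated angle (freq_scale * theta) with the original angle,
// controlled by ext_factor; ext_factor == 0 falls back to plain linear scaling.
static float yarn_theta(float theta_extrap, float freq_scale, float low, float high,
                        int i, float ext_factor) {
  float theta_interp = freq_scale * theta_extrap;
  float ramp_mix = yarn_ramp(low, high, i) * ext_factor;
  return theta_interp * (1.0f - ramp_mix) + theta_extrap * ramp_mix;
}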
Comment on lines +3040 to +3041:
/* what is the difference between setting parameters in b->data and in op_parameters */
/* are float and int stored in different data ?? */
A contributor replied:

No functional difference. The only distinction is that b->data can be sized case by case, while op params have a fixed maximum size, NE_MAX_OP_PARAMS. In addition, ->data was used as a workaround before op_param was introduced.
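
To make the explanation above concrete, here is a generic packing sketch with hypothetical names (not the actual NE helpers): both routes ultimately store the raw bytes of mixed int and float parameters; the only practical difference is whether the buffer is the fixed-size op_params array or a separately sized ->data blob.

#include <stdint.h>
#include <string.h>

// Generic illustration (hypothetical names, not NE API): ints and floats share
// one raw byte buffer regardless of whether it lives in a fixed-size op_params
// array or in a tensor's separately allocated ->data.
#define OP_PARAMS_MAX_BYTES 64  /* stand-in for an NE_MAX_OP_PARAMS-style limit */

struct rope_op_params {
  int   n_past, n_dims, mode, prompt_size, yarn_orig_ctx;
  float freq_base, freq_scale, ext_factor, attn_factor, beta_fast, beta_slow;
};

static void pack_rope_op_params(uint8_t dst[OP_PARAMS_MAX_BYTES],
                                const struct rope_op_params* src) {
  _Static_assert(sizeof(struct rope_op_params) <= OP_PARAMS_MAX_BYTES,
                 "params must fit in the fixed op_params buffer");
  memcpy(dst, src, sizeof(*src));  // raw copy; consumer unpacks with the same layout
}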

Remove or modify this comment if there are no further questions on it.

@VincyZhang merged commit 08be9a6 into intel:main on Feb 6, 2024
10 checks passed
VincyZhang added a commit that referenced this pull request Feb 6, 2024
xiguiw pushed a commit to xiguiw/neural-speed that referenced this pull request Feb 7, 2024
Resubmit "Implement the YaRN RoPE scaling feature (intel#109)"

This reverts commit 2e94db2.
VincyZhang added a commit that referenced this pull request Mar 1, 2024
5 participants