Implement the YaRN rope scaling feature #109
Conversation
You can include the related Python script updates in this PR.
https://inteltf-jenk.sh.intel.com/job/neural_speed_extension/49/
Fix the format issues, then merge.
Please wait for the result before merging.
Interpolate the rotary position embedding. Only inference is implemented; training is not implemented.
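For context, a minimal sketch of the per-dimension YaRN blending, modeled on the upstream llama.cpp rope_yarn helper; the function names and the clamping constant below are illustrative assumptions, not this PR's exact code:

#include <math.h>

/* Ramp is 1 for high-frequency dims (kept extrapolated) and falls to 0
 * for low-frequency dims (position-interpolated), over the corrected
 * dimension range [low, high]. i0 indexes interleaved elements, so the
 * pair index is i0 / 2. */
static float rope_yarn_ramp(float low, float high, int i0) {
  const float y = (i0 / 2 - low) / fmaxf(0.001f, high - low);
  return 1.0f - fminf(1.0f, fmaxf(0.0f, y));
}

/* Blend the interpolated angle (freq_scale * theta) with the original
 * extrapolated angle, weighted by the ramp and ext_factor. */
static float rope_yarn_theta(float theta_extrap, float freq_scale,
                             float low, float high, int i0, float ext_factor) {
  const float theta_interp = freq_scale * theta_extrap;
  const float ramp_mix = rope_yarn_ramp(low, high, i0) * ext_factor;
  return theta_interp * (1.0f - ramp_mix) + theta_extrap * ramp_mix;
}

With ext_factor = 0 the blend collapses to plain position interpolation (or plain RoPE when freq_scale = 1), which is what makes the feature opt-in.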
/* what is the difference between setting parameters in b->data and in op_parameters? */
/* are float and int kept in different data? */
No difference in semantics. The only practical difference is that b->data can be sized case by case, while op params have a fixed maximum size of NE_MAX_OP_PARAMS. In addition, ->data was used as a workaround before op_param was introduced.
Please remove or update this comment if there are no further questions on this.
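To illustrate why int and float values can share one parameter buffer: a minimal sketch of packing heterogeneous scalars into a fixed-size byte array in the style of op_params. The helper and the parameter choice are hypothetical, not the actual neural_speed API:

#include <stdint.h>
#include <string.h>

#define NE_MAX_OP_PARAMS 64 /* fixed upper bound shared by all ops */

/* Pack mixed int/float scalars into one raw byte buffer; the op kernel
 * reads them back at the same offsets, so no type-specific storage is
 * needed -- only consistent offsets on both sides. */
static void pack_rope_params(uint8_t params[NE_MAX_OP_PARAMS],
                             int n_dims, float freq_base, float freq_scale) {
  size_t off = 0;
  memcpy(params + off, &n_dims, sizeof(n_dims));         off += sizeof(n_dims);
  memcpy(params + off, &freq_base, sizeof(freq_base));   off += sizeof(freq_base);
  memcpy(params + off, &freq_scale, sizeof(freq_scale));
}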
This reverts commit 08be9a6.
Add a new API for YaRN rope scaling:
NE_API struct ne_tensor* ne_rope_custom_inplace(struct ne_context* ctx, struct ne_tensor* a, int n_past, int n_dims, int mode,
int prompt_size, float freq_base, float freq_scale,
int yarn_orig_ctx, float ext_factor, float attn_factor,
float beta_fast, float beta_slow);
// shift all tokens by a given p (n_shift)
// Optionally pass a 1-D tensor of precomputed interleaved cos/sin values of n_shift * scale^k for k in [0, n_dims)
NE_API struct ne_tensor* ne_rope_custom_shift_inplace(struct ne_context* ctx, struct ne_tensor* a, int n_shift, int n_dims,
int mode, int prompt_size, int n_keep, struct ne_tensor* cossin,
float freq_base, float freq_scale,
int yarn_orig_ctx, float ext_factor, float attn_factor,
float beta_fast, float beta_slow);
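For reference, a hedged usage sketch of the new entry point. The variables ctx and cur and all argument values are illustrative assumptions, not values taken from this PR:

/* Apply YaRN-scaled RoPE in place to tensor `cur`. */
struct ne_tensor* rotated = ne_rope_custom_inplace(
    ctx, cur,
    /*n_past=*/0, /*n_dims=*/128, /*mode=*/0, /*prompt_size=*/0,
    /*freq_base=*/10000.0f, /*freq_scale=*/0.25f, /* e.g. 4x context extension */
    /*yarn_orig_ctx=*/4096,
    /*ext_factor=*/1.0f, /*attn_factor=*/1.0f,
    /*beta_fast=*/32.0f, /*beta_slow=*/1.0f);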
Change the ne_layer internal API:
Original:
struct ne_tensor* ne_rope_impl(struct ne_context* ctx, struct ne_tensor* a, int n_past, int n_dims, int mode,
int prompt_size, bool inplace, int n_keep, struct ne_tensor* cossin, int* n_padding,
bool padding_left, float freq_base, float freq_scale)
New:
struct ne_tensor* ne_rope_impl(struct ne_context* ctx, struct ne_tensor* a, int n_past, int n_dims, int mode,
int prompt_size, bool inplace, int n_keep, struct ne_tensor* cossin, int* n_padding,
bool padding_left, float freq_base, float freq_scale,
int yarn_orig_ctx, float ext_factor, float attn_factor,
float beta_fast, float beta_slow)
For models calling ne_rope_impl, the behavior should not change.
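One way to read this compatibility claim: existing call sites can forward neutral YaRN parameters so the math reduces to the previous RoPE behavior. A minimal sketch under that assumption; the wrapper name and the neutral values (which mirror common llama.cpp defaults) are assumptions, not confirmed by this PR:

/* Hypothetical wrapper: forwards an old-style call to the extended
 * signature with neutral YaRN parameters (ext_factor = 0 disables the
 * YaRN blend), so existing callers see unchanged behavior. */
static struct ne_tensor* ne_rope_impl_compat(struct ne_context* ctx, struct ne_tensor* a, int n_past, int n_dims,
                                             int mode, int prompt_size, bool inplace, int n_keep,
                                             struct ne_tensor* cossin, int* n_padding, bool padding_left,
                                             float freq_base, float freq_scale) {
  return ne_rope_impl(ctx, a, n_past, n_dims, mode, prompt_size, inplace, n_keep, cossin, n_padding,
                      padding_left, freq_base, freq_scale,
                      /*yarn_orig_ctx=*/0, /*ext_factor=*/0.0f, /*attn_factor=*/1.0f,
                      /*beta_fast=*/32.0f, /*beta_slow=*/1.0f);
}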