feat: Rotary Positional Encoding #215
Conversation
Ideally the error should be on the 1e-5 scale. The issue is that the tensors aren't identical: some values differ.
Your code looks very clean! LGTM
The code looks very good. Will merge this after fixing the error to the 1e-5 scale.
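For context, a tolerance check at that scale might look like the following sketch (the arrays here are hypothetical; the actual test uses the project's own tensors):

```python
import numpy as np

# Hypothetical comparison: two outputs that agree to ~1e-6,
# which passes a 1e-5-scale tolerance check.
out = np.array([0.5403023, -0.4161468, -0.9899925])
ref = out + 1e-6

# Passes: the error is below the 1e-5 thresholds.
assert np.allclose(out, ref, rtol=1e-5, atol=1e-5)

# A single value off by 1e-3 fails the same check.
bad = ref.copy()
bad[1] += 1e-3
assert not np.allclose(out, bad, rtol=1e-5, atol=1e-5)
```

`np.allclose` treats values as equal when `|a - b| <= atol + rtol * |b|`, so "on the 1e-5 scale" here means every element clears that combined bound.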
Hmm, with JIT it should be scheduled to a single kernel? (Don't worry about that, I will take this!)
#245: will it fix something for JIT=0?
Co-Authored-By: hikettei <[email protected]>
I've been doing some tests in torch and mlx. The issue is that mlx and PyTorch yield different results, and my implementation is basically inspired by mlx, while the test uses the torch implementation. This is the code I've been using:
and the rtol/atol:
I will reimplement the call function using the torch implementation in order to lower the rtol and atol to the required values.
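The snippet itself didn't survive in this page, but the mismatch described above is a known pitfall: RoPE has two common layouts that disagree elementwise while encoding the same rotations. A library-free numpy sketch (function names are hypothetical; the split-halves variant is commonly associated with mlx-style code, the interleaved one with the original RoPE paper):

```python
import numpy as np

def rope_half(x, base=10000.0):
    """Split-halves layout: dimension i is paired with i + dim/2."""
    seq, dim = x.shape[-2], x.shape[-1]
    half = dim // 2
    inv_freq = base ** (-np.arange(half) / half)     # theta_i
    t = np.arange(seq)[:, None] * inv_freq[None, :]  # (seq, half) angles
    cos, sin = np.cos(t), np.sin(t)
    x1, x2 = x[..., :half], x[..., half:]
    return np.concatenate([x1 * cos - x2 * sin,
                           x1 * sin + x2 * cos], axis=-1)

def rope_interleaved(x, base=10000.0):
    """Interleaved layout: dimension 2i is paired with 2i + 1."""
    seq, dim = x.shape[-2], x.shape[-1]
    half = dim // 2
    inv_freq = base ** (-np.arange(half) / half)
    t = np.arange(seq)[:, None] * inv_freq[None, :]
    cos, sin = np.cos(t), np.sin(t)
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = np.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

# A 4-D (batch, head, seq, dim) tensor, as the PR's assertion expects.
x = np.random.RandomState(0).randn(1, 2, 5, 8)
a, b = rope_half(x), rope_interleaved(x)

# Elementwise, the two conventions disagree far beyond 1e-5 ...
print(np.allclose(a, b, rtol=1e-5, atol=1e-5))  # False

# ... but they encode the same rotations up to a permutation of the
# last dimension (even indices first, then odd indices).
perm = np.concatenate([np.arange(0, 8, 2), np.arange(1, 8, 2)])
print(np.allclose(rope_half(x[..., perm]), b[..., perm]))  # True
```

So a test written against the torch convention will report large errors against an mlx-inspired implementation even when both are correct, which matches the symptom described above: not a precision problem, but a layout mismatch.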
Added: an assertion that the tensor has 4 dimensions, make-list with an initial element of true, multiple-value-bind instead of manual initialization, and an assertion instead of a when. Co-Authored-By: hikettei <[email protected]>
This PR implements RoPE (Rotary Positional Encoding).
#195