[EPIC][CPU] DT Enablement #15313
Comments
I'm currently investigating a ~2x slowdown on an internal model when DT is enabled. I'll update the list once I know more!
I did another round of benchmarking with ToT and reverting the problematic
Good news! With @NatashaKnk's fix for the vecmat/matvec DT issue + @hanhanW's fix for the i1 issue, performance goes from 5x slower to ~17% faster for the i8 version of the model :) (the f32 version still seems to be off, but that's probably a bug with an easy fix). I also see 2-3x improvements vs previous DT numbers for LLaMA and Falcon! Awesome work! Thanks for bearing with me and sorry for the pressure to fix all of this before the default enablement. Green light from me to do so :)
This is great! I appreciate the high bar and I'm glad we were able to clear all of the hurdles. Thanks for all of the work on this. Really great to see the across-the-board improvements.
Creating this issue to track all the DT issues (compilation, runtime, or performance) that need a fix before the default enablement. We shouldn't include here performance improvements that would be nice to have but whose absence does not currently lead to a performance regression over the existing non-DT approach.
The issues are sorted by priority: