
[EPIC][CPU] DT Enablement #15313

Closed

dcaballe opened this issue Oct 26, 2023 · 4 comments

dcaballe (Contributor) commented Oct 26, 2023

Creating this issue to track all the DT compilation, runtime, and performance issues that need a fix before the default enablement. We shouldn't include performance improvements that would be nice to have but do not currently cause a regression relative to the existing non-DT approach.

The issues are sorted by priority:

### Tasks
- [ ] #15061
- [ ] #15692
- [ ] #15242
- [ ] #15349 
- [ ] #15391
- [ ] #15392
- [ ] #15249
dcaballe (Contributor, Author) commented Oct 26, 2023

I'm currently investigating a ~2x slowdown on an internal model when DT is enabled. I'll update the list once I know more!

@hanhanW hanhanW self-assigned this Oct 26, 2023
@dcaballe dcaballe changed the title [CPU] DT Enablement [EPIC][CPU] DT Enablement Oct 30, 2023
dcaballe (Contributor, Author) commented:
I did another round of benchmarking with ToT, reverting the problematic ExpandVector commit. The performance regression of DT against the original CG base went from 10x to 4.9x slower. Not bad :). I'll look at the profiles again, but fixing #15242 should cover most of what remains! Great job everyone, and fantastic coordination and collaboration to address all these issues! Thank you all!

dcaballe (Contributor, Author) commented:
Good news! With @NatashaKnk's fix for the vecmat/matvec DT issue plus @hanhanW's fix for the i1 issue, performance goes from 5x slower to ~17% faster for the i8 version of the model :) (the f32 version still seems to be off, but it's probably a bug with an easy fix). I also see 2-3x improvements over previous DT numbers for LLaMA and Falcon! Awesome work!

Thanks for bearing with me and sorry for the pressure to fix all of this before the default enablement. Green light from me to do so :)

@stellaraccident @jpienaar

stellaraccident (Collaborator) commented:
This is great! I appreciate the high bar, and I'm glad we were able to clear all of the hurdles. Thanks for all of the work on this. Really great to see the across-the-board improvements.
