
[Launch Blocker] The default --quantize config/data/desktop.json is slow with eager, compile and aoti #661

Closed
mergennachin opened this issue May 3, 2024 · 3 comments · Fixed by #664

@mergennachin (Contributor) commented May 3, 2024

One of the action items in issue #621 was to add a default desktop.json quantization config. Since that was added, I tried running with that config, and it was slow.

1. Eager:

   python3 torchchat.py generate llama3 --quantize config/data/desktop.json --prompt "Hello, my name is"

   Average tokens/sec: 0.67

2. Eager + Compile:

   python3 torchchat.py generate llama3 --quantize config/data/desktop.json --prompt "Hello, my name is" --compile

   Average tokens/sec: 0.60

3. AOTI:

   python3 torchchat.py generate llama3 --quantize config/data/desktop.json --dso-path llama3.so --prompt "Hello my name is"

   Average tokens/sec: (pending - it is currently taking a long time to reach the first byte)

The default desktop.json setting should be performant. The action item is either to change desktop.json to a configuration that is fast, or to make execution with the current configuration performant.
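For context, a torchchat `--quantize` config is a JSON file mapping quantization scheme names to their options. The actual contents of config/data/desktop.json at this commit are not shown in this issue; the sketch below is a hypothetical example of the file format only, and every key and value in it is an assumption, not the real file:

```json
{
  "executor": {"accelerator": "fast"},
  "precision": {"dtype": "fast16"},
  "linear:int4": {"groupsize": 256}
}
```

If the slowdown comes from one scheme (e.g. the int4 linear quantization lacking accelerated kernels on this backend), narrowing the config to the remaining keys would isolate it.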

Setup:

git commit: 4a83474
python version: 3.10.0
macbook pro M1

Internal Task: T187941181

@mergennachin mergennachin changed the title [Launch Blocker] The default --quantize config/data/desktop.json is slow with eager and compile [Launch Blocker] The default --quantize config/data/desktop.json is slow with eager, compile and aoti May 3, 2024
@mikekgfb (Contributor) commented May 4, 2024

OK, we can fix this for now by removing quantization, because quantization won't be accelerated until we move the PyTorch pin.
Temporary fix: #664

@mikekgfb mikekgfb closed this as completed May 4, 2024
@mikekgfb mikekgfb reopened this May 4, 2024
@mikekgfb (Contributor) commented May 4, 2024

Please confirm fix and close if resolved. @mergennachin

@mikekgfb mikekgfb linked a pull request May 4, 2024 that will close this issue
@mergennachin (Contributor, Author)

> Please confirm fix and close if resolved.

Let's keep it open until we move the PyTorch pin to include MPS kernels. I will run the commands again on the new pin. @mikekgfb
