
[Launch Blocker] The default --quantize config/data/desktop.json is slow with eager, compile and aoti #661

Closed
mergennachin opened this issue May 3, 2024 · 3 comments · Fixed by #664

@mergennachin (Contributor) commented May 3, 2024

One of the action items in issue #621 was to add a default desktop.json quantization config. Since that was added, I tried running with that config, and it was slow.

1. Eager:

   python3 torchchat.py generate llama3 --quantize config/data/desktop.json --prompt "Hello, my name is"

   Average tokens/sec: 0.67

2. Eager + Compile:

   python3 torchchat.py generate llama3 --quantize config/data/desktop.json --prompt "Hello, my name is" --compile

   Average tokens/sec: 0.60

3. AOTI:

   python3 torchchat.py generate llama3 --quantize config/data/desktop.json --dso-path llama3.so --prompt "Hello my name is"

   Average tokens/sec: (pending - it is currently taking a long time to reach the first byte)

The default desktop.json setting should be performant. The action item is either to change desktop.json to a configuration that is fast, or to make execution with the current configuration performant.
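For context, a torchchat `--quantize` config is a JSON file mapping quantization scheme names to their options. The actual contents of config/data/desktop.json at this commit are not shown in this issue; the sketch below is a hypothetical example of the file format only, and every key and value in it is an assumption, not the real file:

```json
{
  "executor": {"accelerator": "fast"},
  "precision": {"dtype": "fast16"},
  "linear:int4": {"groupsize": 256}
}
```

If the slowdown comes from one scheme (e.g. the int4 linear quantization lacking accelerated kernels on this backend), narrowing the config to the remaining keys would isolate it.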

Setup:

git commit: 4a83474
python version: 3.10.0
macbook pro M1

Internal Task: T187941181

@mergennachin mergennachin changed the title [Launch Blocker] The default --quantize config/data/desktop.json is slow with eager and compile [Launch Blocker] The default --quantize config/data/desktop.json is slow with eager, compile and aoti May 3, 2024
@mikekgfb (Contributor) commented May 4, 2024

OK, we can fix this for now by removing quantization, because quantization won't be accelerated until we move the PyTorch pin.
Temporary fix: #664

@mikekgfb mikekgfb closed this as completed May 4, 2024
@mikekgfb mikekgfb reopened this May 4, 2024
@mikekgfb (Contributor) commented May 4, 2024

Please confirm fix and close if resolved. @mergennachin

@mikekgfb mikekgfb linked a pull request May 4, 2024 that will close this issue
@mergennachin (Contributor, Author)

> Please confirm fix and close if resolved.

Let's keep it open until we move the PyTorch pin to include MPS kernels. I will run the commands again on the new pin. @mikekgfb
