This repository has been archived by the owner on Aug 30, 2024. It is now read-only.

[BesTLA] AVX2: Use loaded registers of B. #151

Merged on Mar 5, 2024 (1 commit)

@parvizmp parvizmp commented Mar 4, 2024

Type of Change

Small performance patch for AVX2.

Description

It appears that the B registers are loaded, but the FMA still reads B through the same pointers rather than using the registers that were already loaded. This patch makes the FMA use the loaded registers.
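To illustrate the pattern described above, here is a minimal sketch using AVX2/FMA intrinsics. It is not the BesTLA kernel: the function names, the 2x8 float tile, and the layout of B (k consecutive groups of 8 floats) are all assumptions made for illustration, and the actual kernel may use different instructions and data types. The sketch also assumes GCC or Clang on x86-64, since it relies on `__attribute__((target(...)))` and `__builtin_cpu_supports`.

```cpp
#include <cassert>
#include <cstddef>
#include <immintrin.h>

// Hypothetical micro-kernel: two rows of A share one 8-float block of B per step.
// B holds k consecutive groups of 8 floats. Names and shapes are illustrative only.

// "Before" pattern: B is loaded into a register, but each FMA reads B
// through the pointer again, so the loaded register goes unused.
__attribute__((target("avx2,fma")))
void kernel_pointer_b(const float* a0, const float* a1, const float* b,
                      float* c0, float* c1, std::size_t k) {
  assert(a0 && a1 && b && c0 && c1);
  __m256 acc0 = _mm256_setzero_ps();
  __m256 acc1 = _mm256_setzero_ps();
  for (std::size_t i = 0; i < k; ++i) {
    __m256 vb = _mm256_loadu_ps(b + 8 * i);  // loaded, then never used
    (void)vb;
    acc0 = _mm256_fmadd_ps(_mm256_set1_ps(a0[i]), _mm256_loadu_ps(b + 8 * i), acc0);
    acc1 = _mm256_fmadd_ps(_mm256_set1_ps(a1[i]), _mm256_loadu_ps(b + 8 * i), acc1);
  }
  _mm256_storeu_ps(c0, acc0);
  _mm256_storeu_ps(c1, acc1);
}

// "After" pattern: B is loaded once per step and the register feeds both FMAs.
__attribute__((target("avx2,fma")))
void kernel_register_b(const float* a0, const float* a1, const float* b,
                       float* c0, float* c1, std::size_t k) {
  assert(a0 && a1 && b && c0 && c1);
  __m256 acc0 = _mm256_setzero_ps();
  __m256 acc1 = _mm256_setzero_ps();
  for (std::size_t i = 0; i < k; ++i) {
    const __m256 vb = _mm256_loadu_ps(b + 8 * i);  // single load of B
    acc0 = _mm256_fmadd_ps(_mm256_set1_ps(a0[i]), vb, acc0);
    acc1 = _mm256_fmadd_ps(_mm256_set1_ps(a1[i]), vb, acc1);
  }
  _mm256_storeu_ps(c0, acc0);
  _mm256_storeu_ps(c1, acc1);
}

// Runtime guard so callers can skip the kernels on CPUs without AVX2/FMA.
bool cpu_has_avx2_fma() {
  return __builtin_cpu_supports("avx2") && __builtin_cpu_supports("fma");
}
```

Both functions compute the same result. An optimizing compiler may well fold the repeated loads in the first version, so the difference is most visible in hand-written or JIT-generated assembly, where the choice between a memory operand and a register operand is explicit.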

Expected Behavior & Potential Risk

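On SPR (Sapphire Rapids), with AVX512/AMX disabled, I see a performance improvement for both prefill and decode. The first prediction, which includes prefill of the prompt, drops from about 5483 ms to about 4707 ms, and each later decode step drops from about 1170 ms to about 1126 ms: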

NEURAL_SPEED_VERBOSE=0 taskset --cpu-list 1 build/bin/run_llama -m llama2.ne.weight-int8.group-128.compute-int8.avx2.bin --seed 1 -t 1 -n 8 -c 1024 -b 1024 --memory-auto -p 'Ha ha, well, a woodchuck would certainly be able to chuck some wood! But if you’re looking for a more straightforward answer, it depends on the size of the woodchuck and the type of wood. A small woodchuck might only be able to move a few sticks of firewood at a time, while a larger one might be able to move a whole log or two. Is'
...
<before PR>
========== eval time log of each prediction ==========
prediction   0, time: 5483.49ms
prediction   1, time: 1171.33ms
prediction   2, time: 1170.22ms
prediction   3, time: 1169.63ms
prediction   4, time: 1170.13ms
prediction   5, time: 1169.81ms
prediction   6, time: 1170.32ms
prediction   7, time: 1170.42ms
<after PR>
========== eval time log of each prediction ==========
prediction   0, time: 4706.76ms
prediction   1, time: 1127.17ms
prediction   2, time: 1126.01ms
prediction   3, time: 1125.48ms
prediction   4, time: 1126.33ms
prediction   5, time: 1125.82ms
prediction   6, time: 1126.05ms
prediction   7, time: 1128.50ms
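
The size of the improvement can be worked out from the timings above. The sketch below is only a convenience calculation, not part of the PR: the helper names are hypothetical, and it assumes that prediction 0 represents prefill and predictions 1 to 7 represent decode.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Arithmetic mean of a non-empty list of timings.
double mean(const std::vector<double>& v) {
  assert(!v.empty());
  return std::accumulate(v.begin(), v.end(), 0.0) / static_cast<double>(v.size());
}

// Percentage by which `after` is lower than `before`.
double percent_reduction(double before, double after) {
  assert(before > 0.0);
  return 100.0 * (before - after) / before;
}

// Timings in milliseconds, copied from the logs above.
// Assumption: prediction 0 is prefill, predictions 1..7 are decode steps.
const double prefill_before = 5483.49;
const double prefill_after = 4706.76;
const std::vector<double> decode_before = {1171.33, 1170.22, 1169.63, 1170.13,
                                           1169.81, 1170.32, 1170.42};
const std::vector<double> decode_after = {1127.17, 1126.01, 1125.48, 1126.33,
                                          1125.82, 1126.05, 1128.50};

// Reduction in prefill time, in percent.
double prefill_gain() {
  return percent_reduction(prefill_before, prefill_after);
}

// Reduction in mean decode time per token, in percent.
double decode_gain() {
  return percent_reduction(mean(decode_before), mean(decode_after));
}
```

With these numbers, `prefill_gain()` comes out at roughly 14% and `decode_gain()` at roughly 3.7%. This is a single run on one core, so the figures are indicative rather than a rigorous benchmark.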

How has this PR been tested?

See above.

Dependency Change?

None.

@kevinintel kevinintel requested a review from airMeng March 4, 2024 23:44
@DDEle DDEle requested a review from luoyu-intel March 5, 2024 04:23

@DDEle DDEle left a comment

Nice catch! Looks good to me at first glance.

@airMeng airMeng left a comment

nice catch

@airMeng airMeng merged commit aa4a8ab into intel:main Mar 5, 2024
11 checks passed