Perceptron predictor #374

Merged: ABenC377 merged 20 commits into dev from perceptron_predictor on Feb 14, 2024

@ABenC377 ABenC377 commented Jan 19, 2024

This PR introduces a new PerceptronPredictor class as an alternative to the GenericPredictor currently used in all the config files. It improves the branch prediction rate for all the benchmarks except STREAM, which the GenericPredictor already predicted well. The perceptron predictor does more work per prediction, so it is marginally slower per prediction than the GenericPredictor, but its improved accuracy compensates: SimEng was faster with it for all but 15 of the 72 benchmarks. Those 15 benchmarks are generally already very quick, and the worst regression I observed was 3.6%, compared to improvements of up to 35% for the other benchmarks. On average, runtime changed by -5% (relative) and the misprediction rate by -8.5 percentage points (absolute).

| Benchmark | DEV time | DEV mispredict | PERCEPTRON time | PERCEPTRON mispredict | Performance change (percentage) | Mispredict change (raw) |
| --- | --- | --- | --- | --- | --- | --- |
| CloverLeaf serial gcc8.3.0 armv8.4 | 251655ms | 35.7% | 161811ms | 25% | -35% | -10.7% |
| CloverLeaf serial gcc9.3.0 armv8.4 | 171576ms | 34.5% | 158353ms | 23% | -7.7% | -11.5% |
| CloverLeaf serial gcc10.3.0 armv8.4 | 173001ms | 36.7% | 163963ms | 24.8% | -5.2% | -11.9% |
| CloverLeaf serial armclang20 armv8.4 | 142201ms | 31.6% | 134777ms | 22.5% | -5.2% | -9.1% |
| CloverLeaf openmp gcc8.3.0 armv8.4 | 227772ms | 34.6% | 205901ms | 24.1% | -9.6% | -10.5% |
| CloverLeaf openmp gcc9.3.0 armv8.4 | 225624ms | 33.5% | 199719ms | 22.6% | -11.5% | -10.9% |
| CloverLeaf openmp gcc10.3.0 armv8.4 | 226401ms | 34.5% | 200722ms | 23.5% | -11.3% | -11% |
| CloverLeaf openmp armclang20 armv8.4 | 193129ms | 31.6% | 170551ms | 20.5% | -11.7% | -11.1% |
| miniBUDE openmp gcc8.3.0 armv8.4 | 201411ms | 9.96% | 203321ms | 8.64% | +0.9% | -1.3% |
| miniBUDE openmp gcc9.3.0 armv8.4 | 201666ms | 9.93% | 202440ms | 8.59% | +0.4% | -1.3% |
| miniBUDE openmp gcc10.3.0 armv8.4 | 201324ms | 10% | 202448ms | 8.61% | +0.6% | -1.4% |
| miniBUDE openmp armclang20 armv8.4 | 183828ms | 11.6% | 185367ms | 11.4% | +0.8% | -0.2% |
| STREAM serial gcc8.3.0 armv8.4 | 74849ms | 0.619% | 77580ms | 0.601% | +3.6% | -0.1% |
| STREAM serial gcc9.3.0 armv8.4 | 76025ms | 0.942% | 78402ms | 0.774% | +3.1% | -0.2% |
| STREAM serial gcc10.3.0 armv8.4 | 76237ms | 0.654% | 78152ms | 0.838% | +2.5% | +0.2% |
| STREAM serial armclang20 armv8.4 | 84461ms | 1.16% | 87023ms | 1.24% | +3.0% | +0.1% |
| STREAM openmp gcc8.3.0 armv8.4 | 129391ms | 11.5% | 113201ms | 2.76% | -12.5% | -8.7% |
| STREAM openmp gcc9.3.0 armv8.4 | 127899ms | 11% | 115083ms | 2.4% | -10% | -8.6% |
| STREAM openmp gcc10.3.0 armv8.4 | 126097ms | 11.4% | 112675ms | 2.75% | -10.6% | -8.6% |
| STREAM openmp armclang20 armv8.4 | 131196ms | 14.4% | 123741ms | 5.07% | -5.7% | -9.3% |
| TeaLeaf 2D serial gcc8.3.0 armv8.4 | 128641ms | 24.9% | 126449ms | 20.7% | -1.7% | -4.2% |
| TeaLeaf 2D serial gcc9.3.0 armv8.4 | 127964ms | 25% | 126731ms | 21.6% | -1.0% | -3.6% |
| TeaLeaf 2D serial gcc10.3.0 armv8.4 | 129052ms | 24.9% | 128624ms | 20.8% | -0.3% | -4.1% |
| TeaLeaf 2D serial armclang20 armv8.4 | 233988ms | 13.4% | 236103ms | 10.9% | +0.9% | -2.5% |
| TeaLeaf 2D openmp gcc8.3.0 armv8.4 | 217511ms | 27.6% | 182287ms | 11.2% | -16.2% | -16.4% |
| TeaLeaf 2D openmp gcc9.3.0 armv8.4 | 214477ms | 25.4% | 183910ms | 11.7% | -14.3% | -11.1% |
| TeaLeaf 2D openmp gcc10.3.0 armv8.4 | 216781ms | 27.6% | 183236ms | 11.5% | -15.5% | -16.1% |
| TeaLeaf 2D openmp armclang20 armv8.4 | 598907ms | 16.5% | 585522ms | 9.29% | -2.2% | -7.2% |
| TeaLeaf 3D serial gcc8.3.0 armv8.4 | 153938ms | 17.3% | 146189ms | 11.2% | -5.0% | -6.1% |
| TeaLeaf 3D serial gcc9.3.0 armv8.4 | 158193ms | 18.6% | 152422ms | 12.5% | -3.6% | -6.1% |
| TeaLeaf 3D serial gcc10.3.0 armv8.4 | 156685ms | 17.9% | 152368ms | 12.9% | -2.8% | -5.0% |
| TeaLeaf 3D serial armclang20 armv8.4 | 220240ms | 29.2% | 216672ms | 22.4% | -1.6% | -6.8% |
| TeaLeaf 3D openmp gcc8.3.0 armv8.4 | 297986ms | 27.9% | 240845ms | 9.66% | -19.2% | -18.2% |
| TeaLeaf 3D openmp gcc9.3.0 armv8.4 | 307134ms | 28.5% | 248632ms | 11.1% | -19.0% | -17.4% |
| TeaLeaf 3D openmp gcc10.3.0 armv8.4 | 300711ms | 28% | 242736ms | 10.5% | -19.3% | -17.5% |
| TeaLeaf 3D openmp armclang20 armv8.4 | 489231ms | 28.8% | 449169ms | 17.3% | -8.2% | -11.5% |
| CloverLeaf serial gcc8.3.0 armv8.4+sve | 169930ms | 35.2% | 150743ms | 24.3% | -11.3% | -10.9% |
| CloverLeaf serial gcc9.3.0 armv8.4+sve | 166786ms | 34.2% | 149215ms | 22.9% | -10.5% | -11.3% |
| CloverLeaf serial gcc10.3.0 armv8.4+sve | 171339ms | 36.5% | 149683ms | 25.2% | -12.6% | -11.3% |
| CloverLeaf serial armclang20 armv8.4+sve | 159721ms | 33% | 137543ms | 21.7% | -13.9% | -11.3% |
| CloverLeaf openmp gcc8.3.0 armv8.4+sve | 225588ms | 34.3% | 198457ms | 23.6% | -12.0% | -10.7% |
| CloverLeaf openmp gcc9.3.0 armv8.4+sve | 221890ms | 33.4% | 194197ms | 22.3% | -12.5% | -11.1% |
| CloverLeaf openmp gcc10.3.0 armv8.4+sve | 223602ms | 34.1% | 195536ms | 23.4% | -12.6% | -10.7% |
| CloverLeaf openmp armclang20 armv8.4+sve | 206803ms | 32.5% | 179017ms | 21.6% | -13.4% | -10.9% |
| miniBUDE openmp gcc8.3.0 armv8.4+sve | 84149ms | 23.6% | 80624ms | 16.5% | -4.2% | -7.1% |
| miniBUDE openmp gcc9.3.0 armv8.4+sve | 81961ms | 25.1% | 77745ms | 16.7% | -5.1% | -8.4% |
| miniBUDE openmp gcc10.3.0 armv8.4+sve | 80809ms | 24% | 77152ms | 16.5% | -4.5% | -7.5% |
| miniBUDE openmp armclang20 armv8.4+sve | 80084ms | 22.9% | 80877ms | 22.6% | +1.0% | -0.3% |
| STREAM serial gcc8.3.0 armv8.4+sve | 40075ms | 1.68% | 40545ms | 1.9% | +1.2% | +0.2% |
| STREAM serial gcc9.3.0 armv8.4+sve | 40034ms | 1.84% | 41154ms | 2.1% | +2.8% | +0.3% |
| STREAM serial gcc10.3.0 armv8.4+sve | 39628ms | 2.04% | 40867ms | 2.02% | +3.1% | -0.0% |
| STREAM serial armclang20 armv8.4+sve | 24085ms | 2.13% | 24842ms | 1.88% | +3.1% | -0.2% |
| STREAM openmp gcc8.3.0 armv8.4+sve | 91458ms | 19.5% | 77394ms | 5.01% | -15.4% | -14.4% |
| STREAM openmp gcc9.3.0 armv8.4+sve | 91222ms | 19.7% | 78769ms | 6.66% | -13.7% | -13% |
| STREAM openmp gcc10.3.0 armv8.4+sve | 89951ms | 19.5% | 77005ms | 5.41% | -14.4% | -15.1% |
| STREAM openmp armclang20 armv8.4+sve | 76133ms | 18.3% | 65194ms | 5.69% | -14.4% | -12.6% |
| TeaLeaf 2D serial gcc8.3.0 armv8.4+sve | 130373ms | 25% | 128608ms | 21.3% | -1.4% | -3.7% |
| TeaLeaf 2D serial gcc9.3.0 armv8.4+sve | 129521ms | 24.9% | 127291ms | 21.7% | -1.7% | -3.2% |
| TeaLeaf 2D serial gcc10.3.0 armv8.4+sve | 131156ms | 25% | 128176ms | 20.7% | -2.3% | -4.3% |
| TeaLeaf 2D serial armclang20 armv8.4+sve | 99518ms | 19.5% | 95118ms | 14.2% | -4.4% | -5.3% |
| TeaLeaf 2D openmp gcc8.3.0 armv8.4+sve | 217552ms | 27.4% | 185782ms | 11.6% | -14.6% | -15.6% |
| TeaLeaf 2D openmp gcc9.3.0 armv8.4+sve | 214994ms | 26.5% | 192862ms | 11.7% | -10.3% | -14.8% |
| TeaLeaf 2D openmp gcc10.3.0 armv8.4+sve | 213683ms | 27.5% | 180638ms | 11.9% | -15.5% | -15.7% |
| TeaLeaf 2D openmp armclang20 armv8.4+sve | 822471ms | 16.4% | 576071ms | 8.79% | -30.0% | -7.6% |
| TeaLeaf 3D serial gcc8.3.0 armv8.4+sve | 130973ms | 26.4% | 130850ms | 20.6% | -0.1% | -5.8% |
| TeaLeaf 3D serial gcc9.3.0 armv8.4+sve | 131887ms | 26.4% | 131509ms | 20.8% | -0.3% | -5.6% |
| TeaLeaf 3D serial gcc10.3.0 armv8.4+sve | 132279ms | 26.1% | 130915ms | 21.7% | -1.0% | -4.4% |
| TeaLeaf 3D serial armclang20 armv8.4+sve | 219685ms | 29.9% | 212249ms | 23.1% | -3.4% | -6.8% |
| TeaLeaf 3D openmp gcc8.3.0 armv8.4+sve | 269587ms | 31.3% | 228886ms | 14.1% | -15.1% | -16.2% |
| TeaLeaf 3D openmp gcc9.3.0 armv8.4+sve | 273317ms | 30.7% | 234772ms | 14.3% | -14.1% | -16.4% |
| TeaLeaf 3D openmp gcc10.3.0 armv8.4+sve | 273946ms | 32.1% | 226325ms | 15.5% | -17.4% | -16.6% |
| TeaLeaf 3D openmp armclang20 armv8.4+sve | 530223ms | 30.8% | 481883ms | 17.4% | -9.1% | -13.4% |
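
The two change columns appear to be derived as follows: the performance change is the relative difference in runtime, and the mispredict change is the plain difference between the two misprediction rates (that is, percentage points). The sketch below spells that arithmetic out; the function names are hypothetical and chosen only for this illustration.

```cpp
#include <cassert>

// Hypothetical helper: relative runtime change in percent.
// Negative values mean the perceptron run was faster than the dev run.
double runtimeChangePercent(double devMs, double perceptronMs) {
  assert(devMs > 0.0);
  return (perceptronMs - devMs) / devMs * 100.0;
}

// Hypothetical helper: absolute change in misprediction rate, in
// percentage points (both arguments are rates already given in percent).
double mispredictChangePoints(double devRatePercent,
                              double perceptronRatePercent) {
  return perceptronRatePercent - devRatePercent;
}
```

For the CloverLeaf serial gcc9.3.0 armv8.4 row, runtimeChangePercent(171576, 158353) comes out at roughly -7.7 and mispredictChangePoints(34.5, 23) at -11.5, which agrees with the values listed for that row.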

@ABenC377 ABenC377 linked an issue Jan 19, 2024 that may be closed by this pull request
@FinnWilkinson FinnWilkinson added the enhancement New feature or request label Jan 19, 2024

@dANW34V3R dANW34V3R left a comment

The implementation on the whole looks great. I have just a few comments on clarity and naming.

@FinnWilkinson

Please can you update the documentation with the new config options and add a small section about how the new predictor works?
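
For readers unfamiliar with the technique, the general idea behind a perceptron branch predictor can be sketched as below. This is a minimal illustration of the classic scheme described by Jiménez and Lin (2001), not the PerceptronPredictor class added in this PR, which may differ in indexing, history handling, and weight storage. The class name ToyPerceptronPredictor, the table size, the index function, and the unsaturated int weights are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Hypothetical illustration class; NOT SimEng's PerceptronPredictor.
class ToyPerceptronPredictor {
 public:
  ToyPerceptronPredictor(std::size_t tableSize, std::size_t historyLength)
      // Training threshold floor(1.93 * h + 14), as in Jimenez & Lin.
      : threshold_(static_cast<int>(1.93 * historyLength + 14)),
        // One weight vector per entry: a bias plus one weight per history bit.
        table_(tableSize, std::vector<int>(historyLength + 1, 0)),
        history_(historyLength, false) {
    assert(tableSize > 0 && historyLength > 0);
  }

  // Predict taken when the perceptron output is non-negative.
  bool predict(std::uint64_t address) const { return output(address) >= 0; }

  // Train with the resolved direction, then record it in the history.
  void update(std::uint64_t address, bool taken) {
    const int out = output(address);
    const bool predictedTaken = out >= 0;
    // Train on a misprediction or when confidence is below the threshold.
    if (predictedTaken != taken || std::abs(out) <= threshold_) {
      std::vector<int>& w = table_[index(address)];
      w[0] += taken ? 1 : -1;  // bias weight follows the outcome
      for (std::size_t i = 0; i < history_.size(); ++i) {
        // Strengthen weights whose history bit agreed with the outcome,
        // weaken those that disagreed. Real designs saturate the weights;
        // this sketch leaves them as plain ints.
        w[i + 1] += (history_[i] == taken) ? 1 : -1;
      }
    }
    // Shift the outcome into the global history, dropping the oldest bit.
    history_.erase(history_.begin());
    history_.push_back(taken);
  }

 private:
  // Assumed index function: drop the two low address bits, then wrap.
  std::size_t index(std::uint64_t address) const {
    return static_cast<std::size_t>((address >> 2) % table_.size());
  }

  // Bias plus the sum of weights, each signed by its history bit.
  int output(std::uint64_t address) const {
    const std::vector<int>& w = table_[index(address)];
    int sum = w[0];
    for (std::size_t i = 0; i < history_.size(); ++i) {
      sum += history_[i] ? w[i + 1] : -w[i + 1];
    }
    return sum;
  }

  int threshold_;
  std::vector<std::vector<int>> table_;
  std::vector<bool> history_;
};
```

With a history length of 4, feeding this sketch a strictly alternating taken/not-taken branch at a single address lets the weights settle within about ten updates, after which predict() follows the alternation. The dot product over the history on every prediction is also where the extra per-prediction work mentioned in the PR description would come from in a scheme like this.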

@rahahahat

I was just wondering if it would be possible to add the sources (research papers/websites) you used to carry out the implementation.

This could help reviewers and is a good piece of documentation for future changes.

@JosephMoore25

I suspect these tests were run in debug mode. It's great that you have lots of results, but it may be worth doing a couple of comparative spot checks in release mode to ensure that the speedup is consistent between the two (the misprediction rate should remain the same), as this is the mode in which performance matters most. There is no need to rerun all the tests unless differences are noticed, or if these results are already from release mode.

@JosephMoore25 JosephMoore25 left a comment

Looks good overall, nice work.

A couple small comments, although others' reviews mostly encapsulate changes that need to happen before approval.

@ABenC377 ABenC377 force-pushed the perceptron_predictor branch from 6f71505 to 7cf73cd on February 7, 2024 13:25
@ABenC377 ABenC377 force-pushed the perceptron_predictor branch from 43d94b0 to 4351315 on February 7, 2024 15:42
FinnWilkinson previously approved these changes Feb 9, 2024

@FinnWilkinson FinnWilkinson left a comment

All looks great

jj16791 previously approved these changes Feb 9, 2024
jj16791 previously approved these changes Feb 9, 2024
FinnWilkinson previously approved these changes Feb 12, 2024
JosephMoore25 previously approved these changes Feb 12, 2024

@JosephMoore25 JosephMoore25 left a comment

Looks all good now. Nice work, and nice performance improvements 👍

@ABenC377 ABenC377 merged commit 7c9ed78 into dev Feb 14, 2024
2 checks passed
@ABenC377 ABenC377 deleted the perceptron_predictor branch February 14, 2024 15:27
Labels: 0.9.6 (Part of SimEng Release 0.9.6), enhancement (New feature or request)
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

Update generic branch predictor
6 participants