Further vectorisation of BitArray? #29661

Gnbrkm41 · 2019-05-25T01:05:02Z

I've noticed a couple things while strolling through the source of BitArray:

And, Or, Xor have Sse2 path however it does not have Avx2 path
Not is not vectorised at all

I tested using a relatively simple implementation (....I pretty much copy & pasted the Sse2 path and amended it for Avx2/Vector256 :^) ), and it looks like vectorising might have some positive effect on the time taken.

(See https://github.com/Gnbrkm41/BitArrayIntrinsicsBenchmark)
N is the size of the array (in bits), methods with _Opt suffix are the vectorised methods and _Unopt are the original methods.

Method	N	Mean	Error	StdDev	Rank
And_Opt	8388608	33,938.916 ns	376.3072 ns	314.2336 ns	36
Or_Opt	8388608	34,029.502 ns	297.8839 ns	278.6408 ns	36
Xor_Opt	8388608	33,466.440 ns	541.9196 ns	506.9119 ns	36
Not_Opt	8388608	22,278.547 ns	54.4056 ns	48.2292 ns	34
And_Unopt	8388608	42,449.080 ns	414.0216 ns	387.2761 ns	38
Or_Unopt	8388608	42,595.040 ns	574.9606 ns	537.8185 ns	38
Xor_Unopt	8388608	42,963.108 ns	508.2865 ns	475.4516 ns	38
Not_Unopt	8388608	88,049.039 ns	686.3357 ns	573.1214 ns	39
And_Opt	8388613	32,711.390 ns	196.6901 ns	174.3607 ns	35
Or_Opt	8388613	32,282.328 ns	501.3276 ns	468.9421 ns	35
Xor_Opt	8388613	32,521.586 ns	634.6536 ns	593.6554 ns	35
Not_Opt	8388613	22,272.668 ns	37.5989 ns	33.3304 ns	34
And_Unopt	8388613	38,650.046 ns	640.1163 ns	534.5261 ns	37
Or_Unopt	8388613	38,996.726 ns	763.5589 ns	965.6576 ns	37
Xor_Unopt	8388613	38,742.539 ns	182.6211 ns	152.4969 ns	37
Not_Unopt	8388613	89,282.173 ns	1,320.3705 ns	1,170.4741 ns	39

Overall, There has been about 16% decrease in execution time for And/Or/Xor and 70% decrease in execution time for Not.

Would it be worth adding the paths & vectorising the Not method? I don't have extensive knowledge on raw instructions so I can't tell much about the Avx2 path however the Not case definitely looks convincing to me.

The text was updated successfully, but these errors were encountered:

danmoseley · 2019-05-25T04:55:09Z

@tannergooding is the right person here

danmoseley · 2019-05-25T04:55:41Z

The numbers look very promising though!

danmoseley · 2019-05-25T04:56:37Z

Could you share a link to the branch?

Gnbrkm41 · 2019-05-25T08:43:54Z

https://github.com/Gnbrkm41/corefx/tree/bitarrayintrinsics. Looks like the tests pass w/o problem, but I'm not sure if mine's the most optimal. Feedbacks... and more tricks appreciated.

EgorBo · 2019-05-26T15:30:57Z

@Gnbrkm41 It's important to keep performance the same for all values, not only for some large arrays (where AVX is obviously good). I see you extended the switch from 4 to 8 cases, I am not sure it will be converted to a jump table and also I bet there will be a performance regression between 4 and 8. So could you please benchmark those against the current implementation?

Gnbrkm41 · 2019-05-27T20:09:20Z

I don't expect mine to be the absolute best (in fact I'm not really good at optimisation), it was just a simple proof-of-concept so it can get some attention. Would love to learn from it hopefully :^)

It looked like there was some regression for any case that isn't vectorised (i.e. <7 elements in the array), about 3 nanoseconds. though once it gets over 8 elements it did get faster than the previous ones.

Gnbrkm41 · 2019-11-06T11:45:52Z

Closing in favour of dotnet/corefx#41762

Gnbrkm41 closed this as completed Nov 6, 2019

msftgits transferred this issue from dotnet/corefx Feb 1, 2020

msftgits added this to the 5.0 milestone Feb 1, 2020

ghost locked as resolved and limited conversation to collaborators Dec 13, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Further vectorisation of BitArray? #29661

Further vectorisation of BitArray? #29661

Gnbrkm41 commented May 25, 2019

danmoseley commented May 25, 2019

danmoseley commented May 25, 2019

danmoseley commented May 25, 2019

Gnbrkm41 commented May 25, 2019

EgorBo commented May 26, 2019 •

edited

Loading

Gnbrkm41 commented May 27, 2019 •

edited

Loading

Gnbrkm41 commented Nov 6, 2019

Further vectorisation of BitArray? #29661

Further vectorisation of BitArray? #29661

Comments

Gnbrkm41 commented May 25, 2019

danmoseley commented May 25, 2019

danmoseley commented May 25, 2019

danmoseley commented May 25, 2019

Gnbrkm41 commented May 25, 2019

EgorBo commented May 26, 2019 • edited Loading

Gnbrkm41 commented May 27, 2019 • edited Loading

Gnbrkm41 commented Nov 6, 2019

EgorBo commented May 26, 2019 •

edited

Loading

Gnbrkm41 commented May 27, 2019 •

edited

Loading