-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Further vectorisation of BitArray? #29661
Comments
@tannergooding is the right person here |
The numbers look very promising though! |
Could you share a link to the branch? |
https://github.com/Gnbrkm41/corefx/tree/bitarrayintrinsics. Looks like the tests pass w/o problem, but I'm not sure if mine's the most optimal. Feedbacks... and more tricks appreciated. |
@Gnbrkm41 It's important to keep performance the same for all values, not only for some large arrays (where AVX is obviously good). I see you extended the switch from 4 to 8 cases, I am not sure it will be converted to a jump table and also I bet there will be a performance regression between 4 and 8. So could you please benchmark those against the current implementation? |
I don't expect mine to be the absolute best (in fact I'm not really good at optimisation), it was just a simple proof-of-concept so it can get some attention. Would love to learn from it hopefully :^) It looked like there was some regression for any case that isn't vectorised (i.e. <7 elements in the array), about 3 nanoseconds. though once it gets over 8 elements it did get faster than the previous ones. |
Closing in favour of dotnet/corefx#41762 |
I've noticed a couple things while strolling through the source of BitArray:
And
,Or
,Xor
haveSse2
path however it does not haveAvx2
pathNot
is not vectorised at allI tested using a relatively simple implementation (....I pretty much copy & pasted the
Sse2
path and amended it forAvx2/Vector256
:^) ), and it looks like vectorising might have some positive effect on the time taken.(See https://github.com/Gnbrkm41/BitArrayIntrinsicsBenchmark)
N is the size of the array (in bits), methods with
_Opt
suffix are the vectorised methods and_Unopt
are the original methods.Overall, There has been about 16% decrease in execution time for And/Or/Xor and 70% decrease in execution time for
Not
.Would it be worth adding the paths & vectorising the
Not
method? I don't have extensive knowledge on raw instructions so I can't tell much about the Avx2 path however theNot
case definitely looks convincing to me.The text was updated successfully, but these errors were encountered: