Add AltiVec RAID-Z #9539
dc35d89
to
8b35a67
Compare
@rdolbeau I should have access to a couple of ppc64el systems I can give this a spin on. Though I might not be able to get to it for a little bit.
@behlendorf Thanks, and there's no hurry. Even if all architectures work, there's still the issue of detecting AltiVec in-kernel; I'm still not sure how to do that. And there are even some current PPC64 cores w/o AltiVec (the e5500 core in the new AmigaOne X5000 is one of them), so enabling AltiVec all the time isn't a good option. Though the real fun will start with the variable-length SIMD ISAs like Arm's SVE and RISC-V's V :-) Question: do the ppc & ppc64 buildbots run any tests that could validate the code, or do they really just build?
Codecov Report
@@ Coverage Diff @@
## master #9539 +/- ##
========================================
- Coverage 79% 79% -<1%
========================================
Files 385 385
Lines 121644 121644
========================================
- Hits 96606 96586 -20
- Misses 25038 25058 +20
They really only verify the compilation and don't perform any testing.
I'm going to be a bit harsh here, but is there any use for this?
@Ornias1993 It's a legitimate question to ask :-) I'll try to answer... a) Big-endian PPC (32, 64) is obsolete for desktop/laptop/servers, but there's still some Linux-supported hardware out there, and it's still in use in the embedded market (and don't tell the Amiga crowd their current processor of choice is obsolete ;-) ). Also, the older hardware needs the performance boost the most; b) Little-endian PPC (64) is very much alive at IBM, with POWER8 and POWER9 out there and POWER10 announced - and as far as I understand, standard AltiVec code should work on those VSX-enabled systems (they might use a bit more parallelism than what is in this patch; if someone wants to donate me a Blackbird mainboard with CPU and cooler, I'll make sure to check & tune for POWER9 ;-) ). Also, this doesn't require "significant work to get working", as ZFS is already working on those architectures, and the SIMD infrastructure is common to all of them. It's only a bit of assembly, and detecting the availability of the ISA. And also, why the port to this hardware? "Because it's there" :-) Cordially,
a) it's obsolete. period. sue me.
e1f068b
to
2b6caae
Compare
@behlendorf This should work on ppc64el now (QEMU only for me), and I've added some in-kernel detection (and updated my description accordingly).
2b6caae
to
73b1956
Compare
The NEON code replicates too closely the SSE code, including a masked 16-bits shift. But NEON, like AltiVec (openzfs#9539), has unsigned 8-bits shift, so use that instead and drop the masking. Signed-off-by: Romain Dolbeau <[email protected]>
The NEON code replicates too closely the SSE code, including a masked 16-bits shift. But NEON, like AltiVec (#9539), has unsigned 8-bits shift, so use that instead and drop the masking. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Romain Dolbeau <[email protected]> Closes #9725
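To illustrate the point of that commit message: a minimal sketch in plain C (no intrinsics) of why the mask is only needed when the shift is done on 16-bit lanes. The helper names are hypothetical; each `uint16_t` here stands in for two adjacent byte lanes of a vector register.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Shift two packed bytes left by one using a 16-bit lane shift.
 * Bit 7 of the low byte leaks into bit 0 of the high byte, so the
 * result has to be masked (this is the "masked 16-bits shift").
 */
static uint16_t shl1_masked16(uint16_t two_bytes)
{
	return (uint16_t)((two_bytes << 1) & 0xfefe);
}

/*
 * Same operation with a true per-byte shift, which is what an
 * unsigned 8-bit vector shift gives directly: no mask required.
 */
static uint16_t shl1_bytes(uint16_t two_bytes)
{
	uint8_t lo = (uint8_t)(two_bytes & 0xff);
	uint8_t hi = (uint8_t)(two_bytes >> 8);

	lo = (uint8_t)(lo << 1);
	hi = (uint8_t)(hi << 1);
	return (uint16_t)((hi << 8) | lo);
}
```

Both helpers return the same value for every 16-bit input, which is why dropping the mask is safe once an 8-bit shift is available.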
73b1956
to
26d1dc5
Compare
rebased & rechecked.
@rdolbeau I was able to test your latest version of this PR on a little endian POWER9 system running a 4.14.0 kernel with ztest and raidz_test. Functionally everything worked great. The accelerated AltiVec code definitely provides a nice performance bump. Below are the raidz_test benchmark results which I thought you'd like to see. Nice work!
raidz_test -Bv
Benchmarking parity generation...
impl, math, dcols, iosize, disk_bw, total_bw, iter
original, gen_p, 8, 4096, 3656.968864, 32912.719774, 1048576
original, gen_p, 8, 8192, 391.565382, 3524.088434, 524288
original, gen_p, 8, 16384, 278.245982, 2504.213842, 262144
original, gen_p, 8, 32768, 243.224165, 2189.017486, 131072
original, gen_p, 8, 65536, 246.290623, 2216.615610, 65536
original, gen_p, 8, 131072, 241.208152, 2170.873364, 32768
original, gen_p, 8, 262144, 245.671028, 2211.039251, 16384
original, gen_p, 8, 524288, 246.788158, 2221.093423, 8192
original, gen_p, 8, 1048576, 247.661341, 2228.952066, 4096
original, gen_p, 8, 2097152, 247.455910, 2227.103189, 2048
original, gen_p, 8, 4194304, 247.287560, 2225.588040, 1024
original, gen_p, 8, 8388608, 246.660975, 2219.948771, 512
original, gen_p, 8, 16777216, 244.810247, 2203.292220, 256
original, gen_pq, 8, 4096, 2053.563877, 20535.638767, 1048576
original, gen_pq, 8, 8192, 295.594390, 2955.943900, 524288
original, gen_pq, 8, 16384, 207.364322, 2073.643217, 262144
original, gen_pq, 8, 32768, 180.671352, 1806.713523, 131072
original, gen_pq, 8, 65536, 181.452697, 1814.526970, 65536
original, gen_pq, 8, 131072, 181.149970, 1811.499696, 32768
original, gen_pq, 8, 262144, 181.579363, 1815.793634, 16384
original, gen_pq, 8, 524288, 181.978324, 1819.783238, 8192
original, gen_pq, 8, 1048576, 182.004321, 1820.043209, 4096
original, gen_pq, 8, 2097152, 133.262406, 1332.624063, 2048
original, gen_pq, 8, 4194304, 182.149313, 1821.493132, 1024
original, gen_pq, 8, 8388608, 150.016988, 1500.169876, 512
original, gen_pq, 8, 16777216, 170.881586, 1708.815860, 256
original, gen_pqr, 8, 4096, 1491.024417, 16401.268592, 1048576
original, gen_pqr, 8, 8192, 189.621040, 2085.831436, 524288
original, gen_pqr, 8, 16384, 132.406009, 1456.466094, 262144
original, gen_pqr, 8, 32768, 115.047409, 1265.521498, 131072
original, gen_pqr, 8, 65536, 115.176816, 1266.944978, 65536
original, gen_pqr, 8, 131072, 114.208494, 1256.293431, 32768
original, gen_pqr, 8, 262144, 115.369561, 1269.065176, 16384
original, gen_pqr, 8, 524288, 115.817483, 1273.992312, 8192
original, gen_pqr, 8, 1048576, 115.838635, 1274.224989, 4096
original, gen_pqr, 8, 2097152, 115.986786, 1275.854644, 2048
original, gen_pqr, 8, 4194304, 115.924897, 1275.173863, 1024
original, gen_pqr, 8, 8388608, 111.482268, 1226.304945, 512
original, gen_pqr, 8, 16777216, 106.950381, 1176.454191, 256
scalar, gen_p, 8, 4096, 1841.727771, 16575.549943, 1048576
scalar, gen_p, 8, 8192, 1853.161948, 16678.457530, 524288
scalar, gen_p, 8, 16384, 1822.644436, 16403.799926, 262144
scalar, gen_p, 8, 32768, 1807.432256, 16266.890304, 131072
scalar, gen_p, 8, 65536, 1991.621234, 17924.591105, 65536
scalar, gen_p, 8, 131072, 1545.706177, 13911.355592, 32768
scalar, gen_p, 8, 262144, 1811.350960, 16302.158642, 16384
scalar, gen_p, 8, 524288, 2062.910320, 18566.192877, 8192
scalar, gen_p, 8, 1048576, 2132.803900, 19195.235103, 4096
scalar, gen_p, 8, 2097152, 2129.542420, 19165.881776, 2048
scalar, gen_p, 8, 4194304, 2127.168241, 19144.514172, 1024
scalar, gen_p, 8, 8388608, 2055.131842, 18496.186576, 512
scalar, gen_p, 8, 16777216, 1939.809176, 17458.282585, 256
scalar, gen_pq, 8, 4096, 942.578063, 9425.780634, 1048576
scalar, gen_pq, 8, 8192, 728.793025, 7287.930247, 524288
scalar, gen_pq, 8, 16384, 650.667441, 6506.674414, 262144
scalar, gen_pq, 8, 32768, 616.020909, 6160.209087, 131072
scalar, gen_pq, 8, 65536, 641.924889, 6419.248888, 65536
scalar, gen_pq, 8, 131072, 612.713743, 6127.137432, 32768
scalar, gen_pq, 8, 262144, 676.097445, 6760.974452, 16384
scalar, gen_pq, 8, 524288, 683.510598, 6835.105981, 8192
scalar, gen_pq, 8, 1048576, 691.094490, 6910.944903, 4096
scalar, gen_pq, 8, 2097152, 693.805901, 6938.059008, 2048
scalar, gen_pq, 8, 4194304, 692.127050, 6921.270495, 1024
scalar, gen_pq, 8, 8388608, 677.172715, 6771.727148, 512
scalar, gen_pq, 8, 16777216, 686.917284, 6869.172843, 256
scalar, gen_pqr, 8, 4096, 633.388832, 6967.277155, 1048576
scalar, gen_pqr, 8, 8192, 368.661178, 4055.272961, 524288
scalar, gen_pqr, 8, 16384, 302.184634, 3324.030973, 262144
scalar, gen_pqr, 8, 32768, 277.590806, 3053.498870, 131072
scalar, gen_pqr, 8, 65536, 284.902513, 3133.927646, 65536
scalar, gen_pqr, 8, 131072, 289.570407, 3185.274474, 32768
scalar, gen_pqr, 8, 262144, 296.502373, 3261.526107, 16384
scalar, gen_pqr, 8, 524288, 294.142975, 3235.572725, 8192
scalar, gen_pqr, 8, 1048576, 306.761183, 3374.373011, 4096
scalar, gen_pqr, 8, 2097152, 307.112779, 3378.240568, 2048
scalar, gen_pqr, 8, 4194304, 307.381541, 3381.196954, 1024
scalar, gen_pqr, 8, 8388608, 307.121678, 3378.338460, 512
scalar, gen_pqr, 8, 16777216, 306.083266, 3366.915930, 256
powerpc_altivec, gen_p, 8, 4096, 2960.059604, 26640.536432, 1048576
powerpc_altivec, gen_p, 8, 8192, 2526.591637, 22739.324731, 524288
powerpc_altivec, gen_p, 8, 16384, 2300.113210, 20701.018890, 262144
powerpc_altivec, gen_p, 8, 32768, 2216.791852, 19951.126667, 131072
powerpc_altivec, gen_p, 8, 65536, 2495.108856, 22455.979701, 65536
powerpc_altivec, gen_p, 8, 131072, 1794.564766, 16151.082893, 32768
powerpc_altivec, gen_p, 8, 262144, 2218.366059, 19965.294530, 16384
powerpc_altivec, gen_p, 8, 524288, 2600.452232, 23404.070087, 8192
powerpc_altivec, gen_p, 8, 1048576, 2714.474529, 24430.270764, 4096
powerpc_altivec, gen_p, 8, 2097152, 2698.543308, 24286.889771, 2048
powerpc_altivec, gen_p, 8, 4194304, 2688.745976, 24198.713788, 1024
powerpc_altivec, gen_p, 8, 8388608, 2543.496952, 22891.472572, 512
powerpc_altivec, gen_p, 8, 16777216, 2150.532052, 19354.788469, 256
powerpc_altivec, gen_pq, 8, 4096, 1540.390169, 15403.901690, 1048576
powerpc_altivec, gen_pq, 8, 8192, 1115.555622, 11155.556222, 524288
powerpc_altivec, gen_pq, 8, 16384, 977.258056, 9772.580560, 262144
powerpc_altivec, gen_pq, 8, 32768, 923.560405, 9235.604047, 131072
powerpc_altivec, gen_pq, 8, 65536, 969.052376, 9690.523755, 65536
powerpc_altivec, gen_pq, 8, 131072, 861.794187, 8617.941874, 32768
powerpc_altivec, gen_pq, 8, 262144, 989.944038, 9899.440383, 16384
powerpc_altivec, gen_pq, 8, 524288, 1015.455283, 10154.552830, 8192
powerpc_altivec, gen_pq, 8, 1048576, 1020.936189, 10209.361889, 4096
powerpc_altivec, gen_pq, 8, 2097152, 1026.762787, 10267.627866, 2048
powerpc_altivec, gen_pq, 8, 4194304, 1029.455805, 10294.558049, 1024
powerpc_altivec, gen_pq, 8, 8388608, 1022.938885, 10229.388850, 512
powerpc_altivec, gen_pq, 8, 16777216, 998.514707, 9985.147074, 256
powerpc_altivec, gen_pqr, 8, 4096, 1041.330279, 11454.633066, 1048576
powerpc_altivec, gen_pqr, 8, 8192, 648.855154, 7137.406696, 524288
powerpc_altivec, gen_pqr, 8, 16384, 546.148364, 6007.632003, 262144
powerpc_altivec, gen_pqr, 8, 32768, 508.354000, 5591.894002, 131072
powerpc_altivec, gen_pqr, 8, 65536, 500.657823, 5507.236051, 65536
powerpc_altivec, gen_pqr, 8, 131072, 484.756543, 5332.321970, 32768
powerpc_altivec, gen_pqr, 8, 262144, 510.101846, 5611.120303, 16384
powerpc_altivec, gen_pqr, 8, 524288, 541.016596, 5951.182553, 8192
powerpc_altivec, gen_pqr, 8, 1048576, 539.579715, 5935.376864, 4096
powerpc_altivec, gen_pqr, 8, 2097152, 538.651620, 5925.167824, 2048
powerpc_altivec, gen_pqr, 8, 4194304, 539.098816, 5930.086976, 1024
powerpc_altivec, gen_pqr, 8, 8388608, 536.446906, 5900.915970, 512
powerpc_altivec, gen_pqr, 8, 16777216, 528.037246, 5808.409709, 256
Benchmarking data reconstruction...
impl, math, dcols, iosize, disk_bw, total_bw, iter
original, rec_p, 8, 32768, 1365.404873, 15019.453606, 16384
original, rec_p, 8, 65536, 1500.172203, 16501.894237, 8192
original, rec_p, 8, 131072, 1540.449718, 16944.946903, 4096
original, rec_p, 8, 262144, 1607.881029, 17686.691317, 2048
original, rec_p, 8, 524288, 1570.430251, 17274.732761, 1024
original, rec_p, 8, 1048576, 1631.146830, 17942.615133, 512
original, rec_p, 8, 2097152, 1645.682839, 18102.511230, 256
original, rec_p, 8, 4194304, 1672.468958, 18397.158538, 128
original, rec_p, 8, 8388608, 1664.757571, 18312.333284, 64
original, rec_p, 8, 16777216, 1620.383041, 17824.213455, 32
original, rec_q, 8, 32768, 258.946541, 2848.411954, 16384
original, rec_q, 8, 65536, 263.869676, 2902.566432, 8192
original, rec_q, 8, 131072, 269.861720, 2968.478924, 4096
original, rec_q, 8, 262144, 270.006075, 2970.066823, 2048
original, rec_q, 8, 524288, 273.609740, 3009.707135, 1024
original, rec_q, 8, 1048576, 273.103945, 3004.143397, 512
original, rec_q, 8, 2097152, 273.461091, 3008.072003, 256
original, rec_q, 8, 4194304, 273.374356, 3007.117921, 128
original, rec_q, 8, 8388608, 273.133757, 3004.471328, 64
original, rec_q, 8, 16777216, 272.439348, 2996.832824, 32
original, rec_r, 8, 32768, 28.272344, 310.995788, 16384
original, rec_r, 8, 65536, 29.191250, 321.103745, 8192
original, rec_r, 8, 131072, 29.460318, 324.063496, 4096
original, rec_r, 8, 262144, 29.357856, 322.936418, 2048
original, rec_r, 8, 524288, 29.614490, 325.759394, 1024
original, rec_r, 8, 1048576, 29.617688, 325.794568, 512
original, rec_r, 8, 2097152, 29.655820, 326.214018, 256
original, rec_r, 8, 4194304, 28.939436, 318.333798, 128
original, rec_r, 8, 8388608, 29.799910, 327.799012, 64
original, rec_r, 8, 16777216, 29.441554, 323.857096, 32
original, rec_pq, 8, 32768, 81.007427, 891.081699, 16384
original, rec_pq, 8, 65536, 82.856417, 911.420582, 8192
original, rec_pq, 8, 131072, 83.886532, 922.751849, 4096
original, rec_pq, 8, 262144, 84.022392, 924.246313, 2048
original, rec_pq, 8, 524288, 88.307637, 971.384005, 1024
original, rec_pq, 8, 1048576, 86.591222, 952.503444, 512
original, rec_pq, 8, 2097152, 88.104370, 969.148067, 256
original, rec_pq, 8, 4194304, 89.467332, 984.140656, 128
original, rec_pq, 8, 8388608, 85.940431, 945.344746, 64
original, rec_pq, 8, 16777216, 85.884375, 944.728125, 32
original, rec_pr, 8, 32768, 11.623874, 127.862612, 16384
original, rec_pr, 8, 65536, 11.567150, 127.238647, 8192
original, rec_pr, 8, 131072, 11.529299, 126.822290, 4096
original, rec_pr, 8, 262144, 11.415832, 125.574150, 2048
original, rec_pr, 8, 524288, 11.260947, 123.870421, 1024
original, rec_pr, 8, 1048576, 11.149352, 122.642868, 512
original, rec_pr, 8, 2097152, 11.092749, 122.020238, 256
original, rec_pr, 8, 4194304, 11.008933, 121.098259, 128
original, rec_pr, 8, 8388608, 11.342347, 124.765822, 64
original, rec_pr, 8, 16777216, 11.293360, 124.226965, 32
original, rec_qr, 8, 32768, 11.608232, 127.690553, 16384
original, rec_qr, 8, 65536, 11.756600, 129.322601, 8192
original, rec_qr, 8, 131072, 11.596978, 127.566758, 4096
original, rec_qr, 8, 262144, 11.383399, 125.217394, 2048
original, rec_qr, 8, 524288, 11.400347, 125.403813, 1024
original, rec_qr, 8, 1048576, 11.418515, 125.603668, 512
original, rec_qr, 8, 2097152, 11.404965, 125.454615, 256
original, rec_qr, 8, 4194304, 11.340958, 124.750533, 128
original, rec_qr, 8, 8388608, 11.655530, 128.210834, 64
original, rec_qr, 8, 16777216, 11.920010, 131.120113, 32
original, rec_pqr, 8, 32768, 9.719898, 106.918877, 16384
original, rec_pqr, 8, 65536, 9.590323, 105.493550, 8192
original, rec_pqr, 8, 131072, 9.420586, 103.626441, 4096
original, rec_pqr, 8, 262144, 9.143235, 100.575581, 2048
original, rec_pqr, 8, 524288, 8.926874, 98.195610, 1024
original, rec_pqr, 8, 1048576, 8.769308, 96.462390, 512
original, rec_pqr, 8, 2097152, 8.707765, 95.785419, 256
original, rec_pqr, 8, 4194304, 8.619654, 94.816197, 128
original, rec_pqr, 8, 8388608, 9.047230, 99.519531, 64
original, rec_pqr, 8, 16777216, 9.209364, 101.303004, 32
scalar, rec_p, 8, 32768, 1722.008893, 18942.097828, 16384
scalar, rec_p, 8, 65536, 1922.851589, 21151.367478, 8192
scalar, rec_p, 8, 131072, 2000.711816, 22007.829973, 4096
scalar, rec_p, 8, 262144, 2092.398965, 23016.388617, 2048
scalar, rec_p, 8, 524288, 2029.589641, 22325.486052, 1024
scalar, rec_p, 8, 1048576, 2113.481612, 23248.297731, 512
scalar, rec_p, 8, 2097152, 2139.860693, 23538.467624, 256
scalar, rec_p, 8, 4194304, 2128.252556, 23410.778116, 128
scalar, rec_p, 8, 8388608, 2064.779955, 22712.579501, 64
scalar, rec_p, 8, 16777216, 1949.351760, 21442.869358, 32
scalar, rec_q, 8, 32768, 536.894032, 5905.834357, 16384
scalar, rec_q, 8, 65536, 562.671286, 6189.384148, 8192
scalar, rec_q, 8, 131072, 578.279406, 6361.073469, 4096
scalar, rec_q, 8, 262144, 552.370216, 6076.072374, 2048
scalar, rec_q, 8, 524288, 591.942946, 6511.372408, 1024
scalar, rec_q, 8, 1048576, 603.384965, 6637.234617, 512
scalar, rec_q, 8, 2097152, 606.392302, 6670.315317, 256
scalar, rec_q, 8, 4194304, 605.955614, 6665.511753, 128
scalar, rec_q, 8, 8388608, 605.920120, 6665.121318, 64
scalar, rec_q, 8, 16777216, 605.101051, 6656.111561, 32
scalar, rec_r, 8, 32768, 357.845576, 3936.301340, 16384
scalar, rec_r, 8, 65536, 370.608057, 4076.688628, 8192
scalar, rec_r, 8, 131072, 369.235271, 4061.587982, 4096
scalar, rec_r, 8, 262144, 373.443551, 4107.879066, 2048
scalar, rec_r, 8, 524288, 387.798563, 4265.784192, 1024
scalar, rec_r, 8, 1048576, 398.905324, 4387.958564, 512
scalar, rec_r, 8, 2097152, 401.993218, 4421.925399, 256
scalar, rec_r, 8, 4194304, 403.176288, 4434.939171, 128
scalar, rec_r, 8, 8388608, 402.539666, 4427.936321, 64
scalar, rec_r, 8, 16777216, 402.087308, 4422.960389, 32
scalar, rec_pq, 8, 32768, 348.133726, 3829.470981, 16384
scalar, rec_pq, 8, 65536, 358.920096, 3948.121061, 8192
scalar, rec_pq, 8, 131072, 365.928799, 4025.216794, 4096
scalar, rec_pq, 8, 262144, 367.182858, 4039.011435, 2048
scalar, rec_pq, 8, 524288, 374.582448, 4120.406929, 1024
scalar, rec_pq, 8, 1048576, 377.202619, 4149.228806, 512
scalar, rec_pq, 8, 2097152, 375.485363, 4130.338996, 256
scalar, rec_pq, 8, 4194304, 376.623923, 4142.863152, 128
scalar, rec_pq, 8, 8388608, 376.274054, 4139.014590, 64
scalar, rec_pq, 8, 16777216, 375.262198, 4127.884174, 32
scalar, rec_pr, 8, 32768, 261.087141, 2871.958550, 16384
scalar, rec_pr, 8, 65536, 268.793841, 2956.732247, 8192
scalar, rec_pr, 8, 131072, 274.633992, 3020.973914, 4096
scalar, rec_pr, 8, 262144, 271.735599, 2989.091586, 2048
scalar, rec_pr, 8, 524288, 279.579945, 3075.379392, 1024
scalar, rec_pr, 8, 1048576, 283.390444, 3117.294880, 512
scalar, rec_pr, 8, 2097152, 284.499873, 3129.498599, 256
scalar, rec_pr, 8, 4194304, 285.468453, 3140.152980, 128
scalar, rec_pr, 8, 8388608, 285.574598, 3141.320574, 64
scalar, rec_pr, 8, 16777216, 285.222815, 3137.450965, 32
scalar, rec_qr, 8, 32768, 160.174862, 1761.923483, 16384
scalar, rec_qr, 8, 65536, 163.101908, 1794.120983, 8192
scalar, rec_qr, 8, 131072, 164.839079, 1813.229874, 4096
scalar, rec_qr, 8, 262144, 163.858116, 1802.439271, 2048
scalar, rec_qr, 8, 524288, 166.517959, 1831.697551, 1024
scalar, rec_qr, 8, 1048576, 168.883924, 1857.723168, 512
scalar, rec_qr, 8, 2097152, 169.959898, 1869.558880, 256
scalar, rec_qr, 8, 4194304, 170.192651, 1872.119162, 128
scalar, rec_qr, 8, 8388608, 170.206913, 1872.276041, 64
scalar, rec_qr, 8, 16777216, 169.988370, 1869.872065, 32
scalar, rec_pqr, 8, 32768, 125.973107, 1385.704180, 16384
scalar, rec_pqr, 8, 65536, 127.771243, 1405.483678, 8192
scalar, rec_pqr, 8, 131072, 129.033768, 1419.371453, 4096
scalar, rec_pqr, 8, 262144, 129.283033, 1422.113360, 2048
scalar, rec_pqr, 8, 524288, 131.051098, 1441.562079, 1024
scalar, rec_pqr, 8, 1048576, 132.074573, 1452.820299, 512
scalar, rec_pqr, 8, 2097152, 132.262644, 1454.889085, 256
scalar, rec_pqr, 8, 4194304, 132.363997, 1456.003964, 128
scalar, rec_pqr, 8, 8388608, 132.138988, 1453.528870, 64
scalar, rec_pqr, 8, 16777216, 132.308035, 1455.388383, 32
powerpc_altivec, rec_p, 8, 32768, 2062.916238, 22692.078614, 16384
powerpc_altivec, rec_p, 8, 65536, 2381.982269, 26201.804962, 8192
powerpc_altivec, rec_p, 8, 131072, 2451.254506, 26963.799567, 4096
powerpc_altivec, rec_p, 8, 262144, 2645.385969, 29099.245664, 2048
powerpc_altivec, rec_p, 8, 524288, 2545.830215, 28004.132360, 1024
powerpc_altivec, rec_p, 8, 1048576, 2694.220913, 29636.430043, 512
powerpc_altivec, rec_p, 8, 2097152, 2664.085723, 29304.942949, 256
powerpc_altivec, rec_p, 8, 4194304, 2632.513425, 28957.647672, 128
powerpc_altivec, rec_p, 8, 8388608, 2469.897259, 27168.869847, 64
powerpc_altivec, rec_p, 8, 16777216, 2102.279442, 23125.073867, 32
powerpc_altivec, rec_q, 8, 32768, 985.001187, 10835.013055, 16384
powerpc_altivec, rec_q, 8, 65536, 1059.659176, 11656.250934, 8192
powerpc_altivec, rec_q, 8, 131072, 1080.222705, 11882.449760, 4096
powerpc_altivec, rec_q, 8, 262144, 1022.957892, 11252.536816, 2048
powerpc_altivec, rec_q, 8, 524288, 1127.513933, 12402.653266, 1024
powerpc_altivec, rec_q, 8, 1048576, 1133.197023, 12465.167250, 512
powerpc_altivec, rec_q, 8, 2097152, 1133.449954, 12467.949489, 256
powerpc_altivec, rec_q, 8, 4194304, 1136.923599, 12506.159594, 128
powerpc_altivec, rec_q, 8, 8388608, 1134.996401, 12484.960411, 64
powerpc_altivec, rec_q, 8, 16777216, 1127.215915, 12399.375064, 32
powerpc_altivec, rec_r, 8, 32768, 746.311646, 8209.428108, 16384
powerpc_altivec, rec_r, 8, 65536, 789.128794, 8680.416734, 8192
powerpc_altivec, rec_r, 8, 131072, 740.453027, 8144.983294, 4096
powerpc_altivec, rec_r, 8, 262144, 806.123008, 8867.353089, 2048
powerpc_altivec, rec_r, 8, 524288, 825.341087, 9078.751957, 1024
powerpc_altivec, rec_r, 8, 1048576, 826.715775, 9093.873525, 512
powerpc_altivec, rec_r, 8, 2097152, 827.377957, 9101.157530, 256
powerpc_altivec, rec_r, 8, 4194304, 836.060895, 9196.669844, 128
powerpc_altivec, rec_r, 8, 8388608, 835.287547, 9188.163020, 64
powerpc_altivec, rec_r, 8, 16777216, 832.549718, 9158.046894, 32
powerpc_altivec, rec_pq, 8, 32768, 627.698909, 6904.688001, 16384
powerpc_altivec, rec_pq, 8, 65536, 655.366771, 7209.034485, 8192
powerpc_altivec, rec_pq, 8, 131072, 668.994076, 7358.934838, 4096
powerpc_altivec, rec_pq, 8, 262144, 678.056253, 7458.618783, 2048
powerpc_altivec, rec_pq, 8, 524288, 687.138476, 7558.523235, 1024
powerpc_altivec, rec_pq, 8, 1048576, 691.050006, 7601.550068, 512
powerpc_altivec, rec_pq, 8, 2097152, 693.391627, 7627.307894, 256
powerpc_altivec, rec_pq, 8, 4194304, 694.614706, 7640.761771, 128
powerpc_altivec, rec_pq, 8, 8388608, 693.442189, 7627.864076, 64
powerpc_altivec, rec_pq, 8, 16777216, 691.485597, 7606.341571, 32
powerpc_altivec, rec_pr, 8, 32768, 526.813381, 5794.947193, 16384
powerpc_altivec, rec_pr, 8, 65536, 546.085555, 6006.941106, 8192
powerpc_altivec, rec_pr, 8, 131072, 544.750774, 5992.258513, 4096
powerpc_altivec, rec_pr, 8, 262144, 553.919737, 6093.117110, 2048
powerpc_altivec, rec_pr, 8, 524288, 567.634231, 6243.976536, 1024
powerpc_altivec, rec_pr, 8, 1048576, 568.227901, 6250.506910, 512
powerpc_altivec, rec_pr, 8, 2097152, 567.402074, 6241.422809, 256
powerpc_altivec, rec_pr, 8, 4194304, 568.030638, 6248.337013, 128
powerpc_altivec, rec_pr, 8, 8388608, 568.315850, 6251.474347, 64
powerpc_altivec, rec_pr, 8, 16777216, 567.375011, 6241.125125, 32
powerpc_altivec, rec_qr, 8, 32768, 378.569690, 4164.266594, 16384
powerpc_altivec, rec_qr, 8, 65536, 388.179104, 4269.970148, 8192
powerpc_altivec, rec_qr, 8, 131072, 386.562253, 4252.184782, 4096
powerpc_altivec, rec_qr, 8, 262144, 392.206001, 4314.266016, 2048
powerpc_altivec, rec_qr, 8, 524288, 400.264785, 4402.912637, 1024
powerpc_altivec, rec_qr, 8, 1048576, 400.918811, 4410.106918, 512
powerpc_altivec, rec_qr, 8, 2097152, 398.745683, 4386.202515, 256
powerpc_altivec, rec_qr, 8, 4194304, 398.989236, 4388.881591, 128
powerpc_altivec, rec_qr, 8, 8388608, 398.931527, 4388.246794, 64
powerpc_altivec, rec_qr, 8, 16777216, 398.311458, 4381.426040, 32
powerpc_altivec, rec_pqr, 8, 32768, 294.490208, 3239.392291, 16384
powerpc_altivec, rec_pqr, 8, 65536, 296.667955, 3263.347508, 8192
powerpc_altivec, rec_pqr, 8, 131072, 304.419184, 3348.611022, 4096
powerpc_altivec, rec_pqr, 8, 262144, 308.369860, 3392.068460, 2048
powerpc_altivec, rec_pqr, 8, 524288, 310.252375, 3412.776120, 1024
powerpc_altivec, rec_pqr, 8, 1048576, 310.320527, 3413.525798, 512
powerpc_altivec, rec_pqr, 8, 2097152, 309.558867, 3405.147540, 256
powerpc_altivec, rec_pqr, 8, 4194304, 309.448900, 3403.937904, 128
powerpc_altivec, rec_pqr, 8, 8388608, 309.409284, 3403.502121, 64
powerpc_altivec, rec_pqr, 8, 16777216, 308.730451, 3396.034960, 32
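To put a number on the "nice performance bump", here is a small sketch that computes the powerpc_altivec / scalar ratio for a few rows of the output above (all at iosize 1048576). The struct and function names are made up for illustration; only the disk_bw values are copied from the benchmark.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* One comparison: the same math/iosize row under two implementations. */
struct bw_pair {
	const char *math;
	double scalar_bw;	/* disk_bw reported for impl "scalar" */
	double altivec_bw;	/* disk_bw reported for impl "powerpc_altivec" */
};

/* Ratio of accelerated to baseline bandwidth. */
static double speedup(double accelerated, double baseline)
{
	return accelerated / baseline;
}

/* Print the speedup for three rows copied from the results above. */
static void print_speedups(void)
{
	static const struct bw_pair rows[] = {
		{ "gen_p",   2132.803900, 2714.474529 },
		{ "gen_pq",  691.094490,  1020.936189 },
		{ "rec_pqr", 132.074573,  310.320527 },
	};

	for (size_t i = 0; i < sizeof (rows) / sizeof (rows[0]); i++)
		printf("%-8s %.2fx\n", rows[i].math,
		    speedup(rows[i].altivec_bw, rows[i].scalar_bw));
}
```

Calling `print_speedups()` from a `main()` prints one ratio per row; for these three rows the gain grows with the amount of parity math involved.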
Unrelated to this PR, I did run into one strange build issue which only seems to occur on this platform; I'll get a separate PR opened for it.
Is there any additional testing, or pending work, you'd like to see done before we merge this? If not, then from my perspective this should be ready to go.
@behlendorf Now that #9848 is merged, I'll try to rebase/upgrade and use the new CPU handling.
26d1dc5
to
0b50ac3
Compare
Implements the RAID-Z function using AltiVec SIMD. This is basically the NEON code translated to AltiVec. Note that the 'fletcher' algorithm requires 64-bits operations, and the initial implementations of AltiVec (PPC74xx a.k.a. G4, PPC970 a.k.a. G5) only has up to 32-bits operations, so no 'fletcher'. Signed-off-by: Romain Dolbeau <[email protected]>
0b50ac3
to
c83827e
Compare
Rebased, updated to use #9848, rechecked with raidz_test on 32BE/64BE/64LE.
I've retested with raidz_test and ztest using the hardware I have available, 64LE, and everything looks good.
However, loading the kmods on an altivec enabled kernel resulted in the following warning. This maps to the following kernel WARN_ON.
WARNING: CPU: 92 PID: 123163 at arch/powerpc/kernel/process.c:285 enable_kernel_altivec+0x110/0x170
void enable_kernel_altivec(void)
{
....
>>> WARN_ON(preemptible());
Looking at the other enable_kernel_altivec() callers, it appears that the caller is responsible for disabling preemption, unlike arm and x86. Adding the missing preempt_disable() and preempt_enable() resolved the issue. Like this:
diff --git a/include/os/linux/kernel/linux/simd_powerpc.h b/include/os/linux/kernel/linux/simd_powerpc.h
index ebb88f9..194eeaa 100644
--- a/include/os/linux/kernel/linux/simd_powerpc.h
+++ b/include/os/linux/kernel/linux/simd_powerpc.h
@@ -57,16 +57,27 @@
#include <sys/types.h>
#include <linux/version.h>
-#define kfpu_allowed() 1
-#define kfpu_begin() enable_kernel_altivec()
+#define kfpu_allowed() 1
+#define kfpu_begin() \
+{ \
+ preempt_disable(); \
+ enable_kernel_altivec(); \
+}
#if LINUX_VERSION_CODE >= KERNEL_VERSION(4, 5, 0)
-#define kfpu_end() disable_kernel_altivec()
+#define kfpu_end() \
+{ \
+ disable_kernel_altivec(); \
+ preempt_enable(); \
+}
#else
/* seems that before 4.5 no-one bothered disabling ... */
-#define kfpu_end() ((void) 0)
+#define kfpu_end() \
+{ \
+ preempt_enable(); \
+}
#endif
-#define kfpu_init() 0
-#define kfpu_fini() ((void) 0)
+#define kfpu_init() 0
+#define kfpu_fini() ((void) 0)
/*
* Check if AltiVec instruction set is available
Everything worked well after resolving this and the small issue (commented inline) which prevented me from forcing altivec to be used.
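For readers without a PowerPC kernel tree at hand, the ordering the diff establishes can be modeled in userland. This is only a sketch: the four functions below are stubs that share names with the kernel calls but merely track state, the `warnings` counter stands in for the WARN_ON(preemptible()) check, and the macros are wrapped in do { } while (0) (a choice of this sketch, not what the diff does) so they behave as single statements.

```c
#include <assert.h>

static int preempt_depth;	/* > 0 means preemption is disabled */
static int altivec_enabled;	/* 1 while the vector unit is in use */
static int warnings;		/* counts simulated WARN_ON hits */

/* Stubs standing in for the kernel primitives. */
static void preempt_disable(void) { preempt_depth++; }
static void preempt_enable(void) { preempt_depth--; }

static void enable_kernel_altivec(void)
{
	/* Simplified model of WARN_ON(preemptible()). */
	if (preempt_depth == 0)
		warnings++;
	altivec_enabled = 1;
}

static void disable_kernel_altivec(void) { altivec_enabled = 0; }

/* Preemption goes off before AltiVec goes on, and back in reverse. */
#define	kfpu_begin() \
	do { preempt_disable(); enable_kernel_altivec(); } while (0)
#define	kfpu_end() \
	do { disable_kernel_altivec(); preempt_enable(); } while (0)
```

In this model, calling enable_kernel_altivec() on its own bumps `warnings`, while a kfpu_begin()/kfpu_end() pair leaves it untouched and returns `preempt_depth` to zero.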
7 no 32 bits ztest, userland for BE is 32 bits and ztest crashes at start-up:
I wasn't able to do any 32-bit testing, but based on your last comment it sounds like you were able to test the 32BE implementation. Should this comment from the top post be updated? Are there any other specific tests you'd like to run?
.gen = RAIDZ_GEN_METHODS(powerpc_altivec),
.rec = RAIDZ_REC_METHODS(powerpc_altivec),
.is_supported = &raidz_will_powerpc_altivec_work,
.name = "powerpc_altivec"
"powerpc_altivec" is right at the 16 character limit, which causes EINVAL to be returned when trying to set it with "echo powerpc_altivec >/sys/module/zfs/parameters/zfs_vdev_raidz_impl". Increasing RAIDZ_IMPL_NAME_MAX from 16 to 20 resolves the issue. Alternately we could shorten the name to "altivec".
a) Will fix preemption ASAP
b) For BE, I did test with raidz_test, loading modules and some zpool/zfs operations, but ztest itself always crashes on my (32 bits userland) BE systems
c) I count only 15 for powerpc_altivec... I'll push the limit to 20, as I think we should keep the $arch_$simd nomenclature for clarity.
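The two counts (15 vs. "right at the 16 character limit") can be reconciled under one assumption that is not confirmed in this thread: that the parameter setter rejects any input whose length reaches the limit, and that the newline appended by echo is counted. A sketch with a hypothetical `check_impl_name()` helper, not the actual ZFS function:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical length check: accept the value only if it, plus the
 * terminating NUL, fits in a buffer of name_max bytes.
 */
static int check_impl_name(const char *val, size_t name_max)
{
	size_t len = strlen(val);

	if (len == 0 || len >= name_max)
		return (-EINVAL);
	return (0);
}
```

With this check, "powerpc_altivec" (15 characters) passes a limit of 16, but "powerpc_altivec\n" as written by echo (16 characters) does not; a limit of 20 accepts both.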
I've put the updated PR through some additional manual testing, including moving an existing pool between architectures, and didn't encounter any problems. From my perspective this PR is ready to be merged.
@behlendorf I still need to merge the commits, I'll do that ASAP
@rdolbeau sounds good. Alternately, I can squash them when merging if you prefer.
@behlendorf if you can squash while merging it's OK for me. BTW - I couldn't get the pre-emption issue message in syslog before the patch, weirdly. I might not have run operations long-running enough for the issue to show up though :-(
Implements the RAID-Z function using AltiVec SIMD. This is basically the NEON code translated to AltiVec. Note that the 'fletcher' algorithm requires 64-bits operations, and the initial implementations of AltiVec (PPC74xx a.k.a. G4, PPC970 a.k.a. G5) only has up to 32-bits operations, so no 'fletcher'. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Romain Dolbeau <[email protected]> Closes openzfs#9539
Implements the RAID-Z function using AltiVec SIMD.
This is basically the NEON code translated to AltiVec.
Note that the 'fletcher' algorithm requires 64-bits
operations, and the initial implementations of AltiVec
(PPC74xx a.k.a. G4, PPC970 a.k.a. G5) only has up to
32-bits operations, so no 'fletcher'.
Signed-off-by: Romain Dolbeau [email protected]
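For context on the 64-bit remark, a scalar sketch assuming 'fletcher' refers to the Fletcher-4 checksum as used by ZFS: four running sums kept in 64-bit accumulators and fed with 32-bit words. The type and function names here are illustrative, not the ZFS ones.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The four running sums; each one needs 64-bit additions. */
typedef struct fletcher4_sums {
	uint64_t a, b, c, d;
} fletcher4_sums_t;

/* Scalar Fletcher-4 over n 32-bit words in native byte order. */
static fletcher4_sums_t fletcher4(const uint32_t *words, size_t n)
{
	fletcher4_sums_t s = { 0, 0, 0, 0 };

	for (size_t i = 0; i < n; i++) {
		s.a += words[i];
		s.b += s.a;
		s.c += s.b;
		s.d += s.c;
	}
	return (s);
}
```

Two input words of 0xffffffff already push the first sum past 32 bits, which is why a SIMD unit limited to 32-bit lane arithmetic is a poor fit for this loop.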
Motivation and Context
Performance only, on a limited amount of hardware...
Description
This adds AltiVec (PowerPC SIMD) support for RAID-Z SIMD.
However:
- Only been tested on a big-endian 64 bits PPC (ppc64) & BE 32 bits PPC (ppc).
- This should be tested on little-endian 64 bits PPC (ppc64el).
- I've also tested it on Debian ppc64el in QEMU, it passes ztest & raidz_test.
- The testing code for kernel is disabled, as it seems to be GPL-only on my kernel and I don't know how to properly detect AltiVec in-kernel (catching SIGILL is probably not an option there...).
- This checks the MSR 'Vec' bit is set when AltiVec is enabled.
- Seems that adding -maltivec is crashing every non-ppc arch, I'm not sure how to properly add the option to the file that needs it (the compiler doesn't seem to want to deal with AltiVec asm without -maltivec).
- Fixed, but Makefile implementation might not be very clean.
- Makefile was updated to use the symbols introduced with Unify target_cpu handling #9848.
- However, ztest seems OK in QEMU on ppc64el with 64 bits userland.
- disable_kernel_altivec() was apparently introduced around kernel 4.5 so fails to compile on e.g. 3.16.
- disable_kernel_altivec() is only used for kernel >= 4.5.
Performance (on G5):
How Has This Been Tested?
Only tested on a BE PPC64 970MP "G5" running 4.19, both raidz_test & trying a pool.
Also on a BE PPC 7455 "G4" running 5.3 (and to a limited extent 3.16).
Also on a LE POWER9 in QEMU, running 4.19, ztest & raidz_test.
ztest was not run on BE systems, see above.
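The description notes that disable_kernel_altivec() is only used for kernel >= 4.5, and the kernels tested above span 3.16 to 5.3. A userland sketch of that version gate follows; `KVER` mirrors the classic shift-based encoding of the kernel's KERNEL_VERSION macro, and `has_disable_kernel_altivec()` is a hypothetical helper for illustration only.

```c
#include <assert.h>

/* Classic kernel version encoding: major, minor, patch packed into an int. */
#define	KVER(a, b, c)	(((a) << 16) + ((b) << 8) + (c))

/* Hypothetical helper: is the 4.5 threshold from the description met? */
static int has_disable_kernel_altivec(int major, int minor, int patch)
{
	return (KVER(major, minor, patch) >= KVER(4, 5, 0));
}
```

With this encoding 3.16 falls below the threshold while 4.19 and 5.3 are above it, matching the split described for the tested kernels.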