Use SIMD in matrix multiplication #8

akiradeveloper · 2021-09-26T12:58:26Z

I think portable-simd could speed up multiplication, because the operation is essentially a gather:

    fn apply(self, to: Self) -> Self {
        let out = gather(&self.inv_perm, &to.inv_perm);
        Self { inv_perm: out }
    }

portable_simd provides a gather operation:

    pub fn gather_or_default(slice: &[T], idxs: Simd<usize, LANES>) -> Self
    where
        T: Default,
    {
        Self::gather_or(slice, idxs, Self::splat(T::default()))
    }

It looks to be at quite an early stage, but let's trust it.
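
For reference, the gather that `apply` relies on can be modelled in plain scalar code. This is only a sketch, not the crate's implementation: `gather_scalar` is a hypothetical name, and the 54-element `u8` array shape is an assumption borrowed from the later comment in this thread.

```rust
/// Scalar model of a gather: out[i] = v[index[i]].
/// `gather_scalar` is a hypothetical helper, not part of any library.
/// Assumes every entry of `index` is a valid position in `v` (0..54).
fn gather_scalar(index: &[u8; 54], v: &[u8; 54]) -> [u8; 54] {
    let mut out = [0u8; 54];
    for (o, &i) in out.iter_mut().zip(index.iter()) {
        *o = v[i as usize];
    }
    out
}
```

Gathering through the identity permutation returns `v` unchanged, and gathering a permutation through itself composes it with itself.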

akiradeveloper · 2021-09-26T13:10:23Z

packed_simd will be deprecated, and portable_simd will become the mainstream option.

rust-lang/packed_simd#282

Going forward, the project group will be developing a stdsimd crate, which will be eventually be stabilized within the standard library under core::simd and std::simd. The design of stdsimd will likely be similar to the packed_simd crate in terms of practical usage, but there will inevitably be small changes in the transition.

Of course, this will take quite a bit of time to develop and stabilize. Accordingly, the project group will take over maintenance of the packed_simd crate, and we will at least ensure that it keeps building (such as fixing up the recent mmx related issues).

Broadly speaking, we are not planning to develop packed_simd any further. It should be considered to be in long-term maintenance mode, and when stdsimd is eventually stabilized then packed_simd will become completely deprecated.

akiradeveloper · 2021-09-26T13:55:30Z

I am not sure about the reason, but core_simd doesn't support u8x64 vectors.

rust-lang/portable-simd#80 (comment)

I reduced the maximum lane count from 64 to 32, since the error only occurred with 64-length vectors.

This prevents AVX-512 vectors of u8, but this is really only temporary until we can get an LLVM fix (and get that fixed version into rustc)

Here is the error, with 1.57 nightly and current master (c2f59483f96cf1ab1e92cf10e0f9094432a8374c):

error[E0277]: the trait bound `LaneCount<64_usize>: SupportedLaneCount` is not satisfied
  --> src/matrix/math.rs:76:13
   |
76 |     let a = Simd::from_array(vv);
   |             ^^^^^^^^^^^^^^^^^^^^ the trait `SupportedLaneCount` is not implemented for `LaneCount<64_usize>`
   |
   = help: the following implementations were found:
             <LaneCount<16_usize> as SupportedLaneCount>
             <LaneCount<1_usize> as SupportedLaneCount>
             <LaneCount<2_usize> as SupportedLaneCount>
             <LaneCount<32_usize> as SupportedLaneCount>
           and 2 others
note: required by a bound in `Simd`
  --> /Users/akira/.cargo/git/checkouts/portable-simd-311bd65e6952a0da/c2f5948/crates/core_simd/src/vector.rs:20:23
   |
20 |     LaneCount<LANES>: SupportedLaneCount;
   |                       ^^^^^^^^^^^^^^^^^^ required by this bound in `Simd`

On second thought, though, depending on AVX-512 might not be a good approach.

akiradeveloper · 2021-09-26T14:32:25Z

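fn gather_simd(index: &[u8;54], v: &[u8;54]) -> [u8;54] {
    // 54 indices do not fit in one 32-lane vector, so split them into two.
    // 55 is out of range for `v`, so the padded lanes fall back to the default (0).
    let mut idx0 = [55;32];
    let mut idx1 = [55;32];
    // Copy in: widen the u8 indices to usize.
    for i in 0..32 {
        idx0[i] = index[i] as usize;
    }
    for i in 0..22 {
        idx1[i] = index[i+32] as usize;
    }
    let idx0 = Simd::from_array(idx0);
    let idx1 = Simd::from_array(idx1);
    let res0 = Simd::gather_or_default(v, idx0);
    let res1 = Simd::gather_or_default(v, idx1);
    // Copy out: only the first 22 lanes of the second vector are meaningful.
    let mut out = [0;54];
    for i in 0..32 {
        out[i] = res0[i];
    }
    for i in 0..22 {
        out[i+32] = res1[i];
    }
    out
}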

The initial implementation is terrible: about 70x slower than before. The hope is that once 512-bit vectors are supported, we can remove the copies in both directions (because a Simd vector could be used as inv_perm directly), and the whole gather would of course become a single operation.
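
To make the padding and the two copy steps explicit, here is a stable-Rust model of the same chunked approach. It is a sketch under assumptions, not the portable_simd API: `gather_or_default_model` is a hypothetical scalar stand-in for one `gather_or_default` call, and `gather_chunked`, `N`, `LANES` and `PAD` are names introduced only for illustration.

```rust
const N: usize = 54; // elements in the permutation
const LANES: usize = 32; // lane count used in the comment above
const PAD: usize = 55; // out-of-range index, so padded lanes read the default

/// Hypothetical scalar stand-in for one `gather_or_default` call:
/// in-range indices read from `v`, out-of-range ones yield `u8::default()`.
fn gather_or_default_model(v: &[u8], idxs: [usize; LANES]) -> [u8; LANES] {
    let mut out = [0u8; LANES];
    for (o, &i) in out.iter_mut().zip(idxs.iter()) {
        *o = v.get(i).copied().unwrap_or_default();
    }
    out
}

/// Hypothetical chunked gather: one padded index vector per LANES-sized chunk.
fn gather_chunked(index: &[u8; N], v: &[u8; N]) -> [u8; N] {
    let mut out = [0u8; N];
    for (src, dst) in index.chunks(LANES).zip(out.chunks_mut(LANES)) {
        // Copy in: widen the u8 indices and pad the unused lanes.
        let mut idxs = [PAD; LANES];
        for (lane, &i) in idxs.iter_mut().zip(src) {
            *lane = i as usize;
        }
        let res = gather_or_default_model(v, idxs);
        // Copy out: keep only the lanes that belong to this chunk.
        dst.copy_from_slice(&res[..dst.len()]);
    }
    out
}
```

Both copy loops exist only because the data lives in a `[u8; 54]` rather than in a vector type, which is the overhead a single 64-lane vector would avoid.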

akiradeveloper added the enhancement label Sep 26, 2021

akiradeveloper self-assigned this Sep 26, 2021

akiradeveloper closed this as completed Sep 26, 2021
