Use SIMD in matrix multiplication #8

Closed
akiradeveloper opened this issue Sep 26, 2021 · 3 comments

@akiradeveloper
Owner

I think portable-simd can speed up multiplication, because the multiplication is essentially a gather operation.

    fn apply(self, to: Self) -> Self {
        let out = gather(&self.inv_perm, &to.inv_perm);
        Self { inv_perm: out }
    }

and portable_simd provides one:

    pub fn gather_or_default(slice: &[T], idxs: Simd<usize, LANES>) -> Self
    where
        T: Default,
    {
        Self::gather_or(slice, idxs, Self::splat(T::default()))
    }

It looks like it is still at quite an early stage, but let's trust it.
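
To make the "multiplication is a gather" point concrete, here is a minimal scalar sketch. The name `gather_scalar` and the fixed length of 54 are assumptions for illustration (54 matches the array size used later in this thread); the actual `gather` in this repository may look different.

```rust
/// Scalar gather: out[i] = v[index[i]].
/// Hypothetical helper for illustration, not the repository's actual `gather`.
/// Every entry of `index` is assumed to be in 0..54.
fn gather_scalar(index: &[u8; 54], v: &[u8; 54]) -> [u8; 54] {
    let mut out = [0u8; 54];
    for i in 0..54 {
        // Each output element is picked from `v` at the position named by `index`.
        out[i] = v[index[i] as usize];
    }
    out
}
```

Composing two permutations stored this way is exactly one such gather, which is why a hardware gather instruction looks attractive here.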

@akiradeveloper akiradeveloper added the enhancement New feature or request label Sep 26, 2021
@akiradeveloper akiradeveloper self-assigned this Sep 26, 2021
@akiradeveloper
Owner Author

packed_simd will be deprecated and portable_simd will become the mainstream.

rust-lang/packed_simd#282

Going forward, the project group will be developing a stdsimd crate, which will be eventually be stabilized within the standard library under core::simd and std::simd. The design of stdsimd will likely be similar to the packed_simd crate in terms of practical usage, but there will inevitably be small changes in the transition.

Of course, this will take quite a bit of time to develop and stabilize. Accordingly, the project group will take over maintenance of the packed_simd crate, and we will at least ensure that it keeps building (such as fixing up the recent mmx related issues).

Broadly speaking, we are not planning to develop packed_simd any further. It should be considered to be in long-term maintenance mode, and when stdsimd is eventually stabilized then packed_simd will become completely deprecated.

@akiradeveloper
Owner Author

core_simd doesn't support u8x64 vectors at the moment. The reason appears to be this:

rust-lang/portable-simd#80 (comment)

I reduced the maximum lane count from 64 to 32, since the error only occurred with 64-length vectors.

This prevents AVX-512 vectors of u8, but this is really only temporary until we can get an LLVM fix (and get that fixed version into rustc)

Here is the error, with 1.57 nightly and portable-simd at current master (c2f59483f96cf1ab1e92cf10e0f9094432a8374c):

error[E0277]: the trait bound `LaneCount<64_usize>: SupportedLaneCount` is not satisfied
  --> src/matrix/math.rs:76:13
   |
76 |     let a = Simd::from_array(vv);
   |             ^^^^^^^^^^^^^^^^^^^^ the trait `SupportedLaneCount` is not implemented for `LaneCount<64_usize>`
   |
   = help: the following implementations were found:
             <LaneCount<16_usize> as SupportedLaneCount>
             <LaneCount<1_usize> as SupportedLaneCount>
             <LaneCount<2_usize> as SupportedLaneCount>
             <LaneCount<32_usize> as SupportedLaneCount>
           and 2 others
note: required by a bound in `Simd`
  --> /Users/akira/.cargo/git/checkouts/portable-simd-311bd65e6952a0da/c2f5948/crates/core_simd/src/vector.rs:20:23
   |
20 |     LaneCount<LANES>: SupportedLaneCount;
   |                       ^^^^^^^^^^^^^^^^^^ required by this bound in `Simd`

But on second thought, depending on AVX-512 might not be a good idea.
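
One reason not to depend on it is that many CPUs simply don't have AVX-512. As a rough sketch of how a caller could check at runtime instead of assuming it, using the standard library's `is_x86_feature_detected!` macro; the function names `has_avx512f` and `pick_gather_path` are hypothetical and not part of this project:

```rust
/// True if the CPU running this program reports AVX-512 Foundation support.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn has_avx512f() -> bool {
    std::is_x86_feature_detected!("avx512f")
}

/// On non-x86 targets there is no AVX-512, so always report false.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
fn has_avx512f() -> bool {
    false
}

/// Hypothetical dispatcher: names the code path a caller might choose.
fn pick_gather_path() -> &'static str {
    if has_avx512f() {
        "wide (64-lane)"
    } else {
        "portable fallback"
    }
}
```

Any design that relies on 512-bit vectors would still need the fallback path to be reasonably fast.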

@akiradeveloper
Owner Author

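// Gather 54 elements using two 32-lane vectors, since 64-lane vectors are not available.
fn gather_simd(index: &[u8;54], v: &[u8;54]) -> [u8;54] {
    // Pad unused lanes with 55, which is out of bounds for `v`,
    // so gather_or_default fills those lanes with the default value 0.
    let mut idx0 = [55;32];
    let mut idx1 = [55;32];
    // Copy the u8 indices into usize arrays: the first 32, then the remaining 22.
    for i in 0..32 {
        idx0[i] = index[i] as usize;
    }
    for i in 0..22 {
        idx1[i] = index[i+32] as usize;
    }
    let idx0 = Simd::from_array(idx0);
    let idx1 = Simd::from_array(idx1);
    let res0 = Simd::gather_or_default(v, idx0);
    let res1 = Simd::gather_or_default(v, idx1);
    // Copy the gathered lanes back into a plain array, dropping the 10 padding lanes.
    let mut out = [0;54];
    for i in 0..32 {
        out[i] = res0[i];
    }
    for i in 0..22 {
        out[i+32] = res1[i];
    }
    out
}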

The initial impl is awful: it is about 70x slower than before. The hope is that once 512-bit vectors are supported, we can remove the copies in both directions (because we can use a Simd vector as inv_perm directly), and of course the gather will be a single operation.
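
The padding with 55 in the function above relies on out-of-range indices producing the default value. Here is a minimal scalar model of that behaviour, assuming the semantics of the `gather_or_default` quoted earlier in the thread (in-bounds lanes read from the slice, out-of-bounds lanes get `T::default()`). The names `gather_or_default_scalar` and `gather_chunked` are hypothetical, and no SIMD is used, so it runs on stable Rust:

```rust
/// Scalar model of a 32-lane gather that yields 0 for out-of-bounds indices.
/// Hypothetical stand-in for the SIMD call; it only mirrors the assumed semantics.
fn gather_or_default_scalar(slice: &[u8], idxs: [usize; 32]) -> [u8; 32] {
    let mut out = [0u8; 32];
    for (lane, &i) in idxs.iter().enumerate() {
        // `get` returns None when `i` is out of bounds, so the lane gets the default 0.
        out[lane] = slice.get(i).copied().unwrap_or_default();
    }
    out
}

/// Same chunking as the SIMD version: 54 indices split into 32 + 22 lanes,
/// with unused lanes padded by 55 (out of bounds for a 54-element slice).
fn gather_chunked(index: &[u8; 54], v: &[u8; 54]) -> [u8; 54] {
    let mut idx0 = [55usize; 32];
    let mut idx1 = [55usize; 32];
    for i in 0..32 {
        idx0[i] = index[i] as usize;
    }
    for i in 0..22 {
        idx1[i] = index[i + 32] as usize;
    }
    let res0 = gather_or_default_scalar(v, idx0);
    let res1 = gather_or_default_scalar(v, idx1);
    let mut out = [0u8; 54];
    // Keep all 32 lanes of the first chunk and only the 22 real lanes of the second.
    out[..32].copy_from_slice(&res0);
    out[32..].copy_from_slice(&res1[..22]);
    out
}
```

The model makes the overhead visible: two index copies in, two result copies out, and ten wasted lanes, all of which would disappear with a single 64-lane vector.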
