Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More inner loops with more arguments than necessary #350

Open
2 of 7 tasks
sjakobi opened this issue Jan 18, 2021 · 0 comments
Open
2 of 7 tasks

More inner loops with more arguments than necessary #350

sjakobi opened this issue Jan 18, 2021 · 0 comments

Comments

@sjakobi
Copy link
Member

sjakobi commented Jan 18, 2021

In #273 and #347 we have optimized several functions by essentially performing static argument transformations by hand. The functions below look like good candidates to attempt similar optimizations.

  • unfoldrN: (Done in Hand SAT optimization for Data.ByteString.unfoldrN #356) I believe we could float out the p argument and use n as the offset to use with pokeByteOff.
    unfoldrN :: Int -> (a -> Maybe (Word8, a)) -> a -> (ByteString, Maybe a)
    unfoldrN i f x0
    | i < 0 = (empty, Just x0)
    | otherwise = unsafePerformIO $ createAndTrim' i $ \p -> go p x0 0
    where
    go !p !x !n
    | n == i = return (0, n, Just x)
    | otherwise = case f x of
    Nothing -> return (0, n, Nothing)
    Just (w,x') -> do poke p w
    go (p `plusPtr` 1) x' (n+1)
  • partition: In the sep and rev loops, p2 could be computed from p1. Alternatively, both memory locations could be tracked via a single offset.

    bytestring/Data/ByteString.hs

    Lines 1477 to 1507 in c470cf2

    partition :: (Word8 -> Bool) -> ByteString -> (ByteString, ByteString)
    partition f s = unsafeDupablePerformIO $
    do fp' <- mallocByteString len
    withForeignPtr fp' $ \p ->
    do let end = p `plusPtr` (len - 1)
    mid <- sep 0 p end
    rev mid end
    let i = mid `minusPtr` p
    return (BS fp' i,
    BS (plusForeignPtr fp' i) (len - i))
    where
    len = length s
    incr = (`plusPtr` 1)
    decr = (`plusPtr` (-1))
    sep !i !p1 !p2
    | i == len = return p1
    | f w = do poke p1 w
    sep (i + 1) (incr p1) p2
    | otherwise = do poke p2 w
    sep (i + 1) p1 (decr p2)
    where
    w = s `unsafeIndex` i
    rev !p1 !p2
    | p1 >= p2 = return ()
    | otherwise = do a <- peek p1
    b <- peek p2
    poke p1 b
    poke p2 a
    rev (incr p1) (decr p2)
  • hGetLine.findEOL: It might be worthwhile to float out the w and raw arguments.

    bytestring/Data/ByteString.hs

    Lines 1846 to 1852 in c470cf2

    findEOL r w raw
    | r == w = return w
    | otherwise = do
    c <- readWord8Buf raw r
    if c == fromIntegral (ord '\n')
    then return r -- NB. not r+1: don't include the '\n'
    else findEOL (r+1) w raw
  • Char8.readInt.digits: The p and b arguments are always incremented in sync, which indicates that one could be made static and floated out.
    digits !maxq !maxr !e !ptr = go ptr
    where
    go :: Ptr Word8 -> Int -> Word -> IO (Int, Word, Bool)
    go !p !b !a | p == e = return (b, a, True)
    go !p !b !a = do
    !w <- fromIntegral <$> peek p
    let !d = w - 0x30
    if d > 9 -- No more digits
    then return (b, a, True)
    else if a < maxq -- Look for more
    then go (p `plusPtr` 1) (b + 1) (a * 10 + d)
    else if a > maxq -- overflow
    then return (b, a, False)
    else if d <= maxr -- Ideally this will be the last digit
    then go (p `plusPtr` 1) (b + 1) (a * 10 + d)
    else return (b, a, False) -- overflow
  • Lazy.Char8.readInt.digits: Same situation as in the strict version
    digits !maxq !maxr !e !ptr = go ptr
    where
    go :: Ptr Word8 -> Int -> Word -> IO (Int, Word, Bool)
    go !p !b !a | p == e = return (b, a, True)
    go !p !b !a = do
    !w <- fromIntegral <$> peek p
    let !d = w - 0x30
    if d > 9 -- No more digits
    then return (b, a, True)
    else if a < maxq -- Look for more
    then go (p `plusPtr` 1) (b + 1) (a * 10 + d)
    else if a > maxq -- overflow
    then return (b, a, False)
    else if d <= maxr -- Ideally this will be the last digit
    then go (p `plusPtr` 1) (b + 1) (a * 10 + d)
    else return (b, a, False) -- overflow
  • packZipWith.zipWith_: The r argument could be floated out. @Boarders points out that this doesn't always improve performance though: Try optimizing loops in Data.ByteString.findIndex[End] in the style of #273 #338 (comment). Investigating this might be worthwhile though.

    bytestring/Data/ByteString.hs

    Lines 1667 to 1684 in c470cf2

    packZipWith :: (Word8 -> Word8 -> Word8) -> ByteString -> ByteString -> ByteString
    packZipWith f (BS fp l) (BS fq m) = unsafeDupablePerformIO $
    withForeignPtr fp $ \a ->
    withForeignPtr fq $ \b ->
    create len $ go a b
    where
    go p1 p2 = zipWith_ 0
    where
    zipWith_ :: Int -> Ptr Word8 -> IO ()
    zipWith_ !n !r
    | n >= len = return ()
    | otherwise = do
    x <- peekByteOff p1 n
    y <- peekByteOff p2 n
    pokeByteOff r n (f x y)
    zipWith_ (n+1) r
    len = min l m
  • Short.partition.go.go': bw2 = br - bw1
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant