Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bad dropWhile performance #108

Closed
nmk opened this issue Feb 16, 2016 · 13 comments
Closed

Bad dropWhile performance #108

nmk opened this issue Feb 16, 2016 · 13 comments

Comments

@nmk
Copy link

nmk commented Feb 16, 2016

I am filing this as a follow-up to http://stackoverflow.com/questions/35419908/efficient-alternative-to-data-vector-dropwhile

module Main where

import           Criterion.Main
import qualified Data.Vector as V

bNoDrop :: V.Vector Double -> Double
bNoDrop xs
  | V.null xs = 0
  | otherwise = V.last xs / V.head xs

bDropWhile' :: V.Vector Double -> Double
bDropWhile' xs
  | V.null xs = 0
  | otherwise = V.last xss / V.head xss
    where xss = dropWhile' (< 10) xs

          dropWhile' :: (a -> Bool) -> V.Vector a -> V.Vector a
          dropWhile' p v = V.drop (go 0) v where
            go n | n == V.length v       = n
                 | p (V.unsafeIndex v n) = go (n + 1)
                 | otherwise             = n

bDropWhile :: V.Vector Double -> Double
bDropWhile xs
  | V.null xs = 0
  | otherwise = V.last xss / V.head xss
    where xss = V.dropWhile (< 10) xs

setupEnv :: IO (V.Vector Double)
setupEnv = return $ V.enumFromN 0 10000000

main :: IO ()
main = defaultMain [
  env setupEnv $ \v ->
    bgroup "funcs" [
      bench "noDrop"      $ nf bNoDrop     v
    , bench "V.dropWhile" $ nf bDropWhile  v
    , bench "dropWhile'"  $ nf bDropWhile' v
    ]
  ]

Compiling this with GHC 7.10.3 and --make -O2 results in the following:

$ ./A --time-limit 20         
benchmarking funcs/noDrop
time                 30.34 ns   (30.12 ns .. 30.62 ns)
                     0.999 R²   (0.999 R² .. 1.000 R²)
mean                 30.92 ns   (30.57 ns .. 31.44 ns)
std dev              1.819 ns   (1.473 ns .. 2.487 ns)
variance introduced by outliers: 80% (severely inflated)

benchmarking funcs/V.dropWhile
time                 84.39 ms   (83.17 ms .. 85.75 ms)
                     0.999 R²   (0.997 R² .. 1.000 R²)
mean                 84.63 ms   (83.96 ms .. 85.43 ms)
std dev              1.710 ms   (1.242 ms .. 2.378 ms)

benchmarking funcs/dropWhile'
time                 103.2 ns   (102.4 ns .. 104.2 ns)
                     0.999 R²   (0.999 R² .. 1.000 R²)
mean                 104.1 ns   (103.1 ns .. 105.3 ns)
std dev              4.647 ns   (3.911 ns .. 5.638 ns)
variance introduced by outliers: 68% (severely inflated)

./A --time-limit 20  63.85s user 1.83s system 99% cpu 1:05.81 total
@cartazio
Copy link
Contributor

  1. what optimization level did you build the code with? -O2 ? or default? (which would be -O1 or '-O0') i think

  2. could you benchmark that again with setupEnv marked with {-# NOINLINE setupEnv #-}, i suspect that its possibly seeing "through" your io wrapper, inlining it, and in the case of your code, partially evaluating it away! 100ns seems suspiciously fast for walking through 10,000,000 numbers in a loop! If you make your setup function NONINLINE, i believe it should be a more apples to apples comparison

  3. if neither of the previous steps line things up, could you compile the module with
    -ddump-to-file -ddump-simpl -dsuppress-all and link to a gist.github paste of the core?

i suspect that its either 1 or 2, but if not, 3 might be useful.

could you also tell us what OS / GHC version / Vector version you're running this with?

@cartazio
Copy link
Contributor

also it'd be worth seeing if theres a performance difference if you used Data.Vector.Unboxed instead

@nmk
Copy link
Author

nmk commented Feb 17, 2016

  1. All examples are compiled with -O2, GHC 7.10.3 on OS X 10.11.3. Using vector 0.11.0.0.

  2. {-# NOINLINE -#} for setupEnv should not make a difference as it is not measured at all. The vectors are evaluated to NF and then passed to the benchmarked functions. So walking the list is not a part of the measured code. I tried it anyway, the results are more or less the same:

$ ./A
benchmarking funcs/noDrop
time                 33.63 ns   (33.12 ns .. 34.19 ns)
                     0.998 R²   (0.997 R² .. 0.999 R²)
mean                 33.67 ns   (33.25 ns .. 34.07 ns)
std dev              1.468 ns   (1.271 ns .. 1.732 ns)
variance introduced by outliers: 67% (severely inflated)

benchmarking funcs/V.dropWhile
time                 88.60 ms   (86.72 ms .. 90.20 ms)
                     0.999 R²   (0.998 R² .. 1.000 R²)
mean                 88.49 ms   (87.77 ms .. 89.11 ms)
std dev              1.124 ms   (861.5 μs .. 1.351 ms)

benchmarking funcs/dropWhile'
time                 112.5 ns   (111.1 ns .. 113.9 ns)
                     0.998 R²   (0.998 R² .. 0.999 R²)
mean                 114.0 ns   (112.1 ns .. 115.8 ns)
std dev              6.062 ns   (5.231 ns .. 7.442 ns)
variance introduced by outliers: 73% (severely inflated)

I tried it with unboxed vectors, and it speeds up all functions by a factor of 2, though the ratio remains more or less the same:

module Main where

import           Criterion.Main
import qualified Data.Vector.Unboxed as V

bNoDrop :: V.Vector Double -> Double
bNoDrop xs
  | V.null xs = 0
  | otherwise = V.last xs / V.head xs

bDropWhile' :: V.Vector Double -> Double
bDropWhile' xs
  | V.null xs = 0
  | otherwise = V.last xss / V.head xss
    where xss = dropWhile' (< 10) xs

          dropWhile' :: V.Unbox a => (a -> Bool) -> V.Vector a -> V.Vector a
          dropWhile' p v = V.drop (go 0) v where
            go n | n == V.length v       = n
                 | p (V.unsafeIndex v n) = go (n + 1)
                 | otherwise             = n

bDropWhile :: V.Vector Double -> Double
bDropWhile xs
  | V.null xs = 0
  | otherwise = V.last xss / V.head xss
    where xss = V.dropWhile (< 10) xs

setupEnv :: IO (V.Vector Double)
setupEnv = return $ V.enumFromN 0 10000000
{-# NOINLINE setupEnv #-}

main :: IO ()
main = defaultMain [
  env setupEnv $ \v ->
    bgroup "funcs" [
      bench "noDrop"      $ nf bNoDrop     v
    , bench "V.dropWhile" $ nf bDropWhile  v
    , bench "dropWhile'"  $ nf bDropWhile' v
    ]
  ]
$ ./A
benchmarking funcs/noDrop
time                 22.69 ns   (22.51 ns .. 22.92 ns)
                     0.999 R²   (0.998 R² .. 0.999 R²)
mean                 22.96 ns   (22.68 ns .. 23.40 ns)
std dev              1.138 ns   (811.1 ps .. 1.623 ns)
variance introduced by outliers: 73% (severely inflated)

benchmarking funcs/V.dropWhile
time                 39.46 ms   (37.93 ms .. 41.89 ms)
                     0.987 R²   (0.968 R² .. 0.998 R²)
mean                 38.79 ms   (37.89 ms .. 40.21 ms)
std dev              2.271 ms   (1.181 ms .. 3.348 ms)
variance introduced by outliers: 19% (moderately inflated)

benchmarking funcs/dropWhile'
time                 53.11 ns   (52.61 ns .. 53.70 ns)
                     0.999 R²   (0.998 R² .. 0.999 R²)
mean                 53.36 ns   (52.69 ns .. 54.22 ns)
std dev              2.515 ns   (1.871 ns .. 3.525 ns)
variance introduced by outliers: 69% (severely inflated)
  1. https://gist.github.com/nmk/ed115b30480a30cc917c (this is for the version with unboxed vectors)

@cartazio
Copy link
Contributor

ok, i think it has to do with sharing, i'll test it after i get back from lunch, stay tuned!

@cartazio
Copy link
Contributor

my hypothesis i shall test this afternoon, based on reading the core , is if you put the (\x -> V.last $ V.dropWhile (< 10) x) and (\x -> V.head $ dropWhile (< 10) x) into two seperate top level functions and mark them noinline, the performance should matchup.

@cartazio
Copy link
Contributor

I'm still trying to do some sleuthing, but so we can unpack whats going on, lets unwrap some definitions!

the generic vector dropwhile is define like so:

-- | /O(n)/ Drop the longest prefix of elements that satisfy the predicate
-- without copying.
dropWhile :: Vector v a => (a -> Bool) -> v a -> v a
{-# INLINE dropWhile #-}
dropWhile f = unstream . Bundle.dropWhile f . stream

then the bundle rep is defined as follows

-- | Drop the longest prefix of elements that satisfy the predicate
dropWhile :: Monad m => (a -> Bool) -> Bundle m v a -> Bundle m v a
{-# INLINE dropWhile #-}
dropWhile f = dropWhileM (return . f)

-- | Drop the longest prefix of elements that satisfy the monadic predicate
dropWhileM :: Monad m => (a -> m Bool) -> Bundle m v a -> Bundle m v a
{-# INLINE_FUSED dropWhileM #-}
dropWhileM f Bundle{sElems = s, sSize = n} = fromStream (S.dropWhileM f s) (toMax n)

then the stream version is defined as

-- | Drop the longest prefix of elements that satisfy the predicate
dropWhile :: Monad m => (a -> Bool) -> Stream m a -> Stream m a
{-# INLINE dropWhile #-}
dropWhile f = dropWhileM (return . f)

data DropWhile s a = DropWhile_Drop s | DropWhile_Yield a s | DropWhile_Next s

-- | Drop the longest prefix of elements that satisfy the monadic predicate
dropWhileM :: Monad m => (a -> m Bool) -> Stream m a -> Stream m a
{-# INLINE_FUSED dropWhileM #-}
dropWhileM f (Stream step t) = Stream step' (DropWhile_Drop t)
  where
    -- NOTE: we jump through hoops here to have only one Yield; local data
    -- declarations would be nice!

    {-# INLINE_INNER step' #-}
    step' (DropWhile_Drop s)
      = do
          r <- step s
          case r of
            Yield x s' -> do
                            b <- f x
                            return $ if b then Skip (DropWhile_Drop    s')
                                          else Skip (DropWhile_Yield x s')
            Skip    s' -> return $ Skip (DropWhile_Drop    s')
            Done       -> return $ Done

    step' (DropWhile_Yield x s) = return $ Yield x (DropWhile_Next s)

    step' (DropWhile_Next s)
      = liftM (\r ->
          case r of
            Yield x s' -> Skip    (DropWhile_Yield x s')
            Skip    s' -> Skip    (DropWhile_Next    s')
            Done       -> Done
        ) (step s)

@treeowl
Copy link
Contributor

treeowl commented Feb 17, 2016

Without digging deeply into this, I think the liftM in step' looks a bit
weird. I'd have expected <$!> instead, or (better, perhaps) manual
expansion with >>= pushing the returns into the case branches. But this
could have nothing to do with it.
On Feb 17, 2016 4:03 PM, "Carter Tazio Schonwald" [email protected]
wrote:

I'm still trying to do some sleuthing, but so we can unpack whats going
on, lets unwrap some definitions!

the generic vector dropwhile is define like so:

-- | /O(n)/ Drop the longest prefix of elements that satisfy the predicate
-- without copying.
dropWhile :: Vector v a => (a -> Bool) -> v a -> v a
{-# INLINE dropWhile #-}
dropWhile f = unstream . Bundle.dropWhile f . stream

then the bundle rep is defined as follows

-- | Drop the longest prefix of elements that satisfy the predicate
dropWhile :: Monad m => (a -> Bool) -> Bundle m v a -> Bundle m v a
{-# INLINE dropWhile #-}
dropWhile f = dropWhileM (return . f)

-- | Drop the longest prefix of elements that satisfy the monadic predicate
dropWhileM :: Monad m => (a -> m Bool) -> Bundle m v a -> Bundle m v a
{-# INLINE_FUSED dropWhileM #-}
dropWhileM f Bundle{sElems = s, sSize = n} = fromStream (S.dropWhileM f s) (toMax n)

then the stream version is defined as

-- | Drop the longest prefix of elements that satisfy the predicate
dropWhile :: Monad m => (a -> Bool) -> Stream m a -> Stream m a
{-# INLINE dropWhile #-}
dropWhile f = dropWhileM (return . f)

data DropWhile s a = DropWhile_Drop s | DropWhile_Yield a s | DropWhile_Next s

-- | Drop the longest prefix of elements that satisfy the monadic predicate
dropWhileM :: Monad m => (a -> m Bool) -> Stream m a -> Stream m a
{-# INLINE_FUSED dropWhileM #-}
dropWhileM f (Stream step t) = Stream step' (DropWhile_Drop t)
where
-- NOTE: we jump through hoops here to have only one Yield; local data
-- declarations would be nice!

{-# INLINE_INNER step' #-}
step' (DropWhile_Drop s)
  = do
      r <- step s
      case r of
        Yield x s' -> do
                        b <- f x
                        return $ if b then Skip (DropWhile_Drop    s')
                                      else Skip (DropWhile_Yield x s')
        Skip    s' -> return $ Skip (DropWhile_Drop    s')
        Done       -> return $ Done

step' (DropWhile_Yield x s) = return $ Yield x (DropWhile_Next s)

step' (DropWhile_Next s)
  = liftM (\r ->
      case r of
        Yield x s' -> Skip    (DropWhile_Yield x s')
        Skip    s' -> Skip    (DropWhile_Next    s')
        Done       -> Done
    ) (step s)


Reply to this email directly or view it on GitHub
#108 (comment).

@treeowl
Copy link
Contributor

treeowl commented Feb 17, 2016

Returning the result of an if-then-else is also surprisingly (to me) lazy.
On Feb 17, 2016 4:03 PM, "Carter Tazio Schonwald" [email protected]
wrote:

I'm still trying to do some sleuthing, but so we can unpack whats going
on, lets unwrap some definitions!

the generic vector dropwhile is define like so:

-- | /O(n)/ Drop the longest prefix of elements that satisfy the predicate
-- without copying.
dropWhile :: Vector v a => (a -> Bool) -> v a -> v a
{-# INLINE dropWhile #-}
dropWhile f = unstream . Bundle.dropWhile f . stream

then the bundle rep is defined as follows

-- | Drop the longest prefix of elements that satisfy the predicate
dropWhile :: Monad m => (a -> Bool) -> Bundle m v a -> Bundle m v a
{-# INLINE dropWhile #-}
dropWhile f = dropWhileM (return . f)

-- | Drop the longest prefix of elements that satisfy the monadic predicate
dropWhileM :: Monad m => (a -> m Bool) -> Bundle m v a -> Bundle m v a
{-# INLINE_FUSED dropWhileM #-}
dropWhileM f Bundle{sElems = s, sSize = n} = fromStream (S.dropWhileM f s) (toMax n)

then the stream version is defined as

-- | Drop the longest prefix of elements that satisfy the predicate
dropWhile :: Monad m => (a -> Bool) -> Stream m a -> Stream m a
{-# INLINE dropWhile #-}
dropWhile f = dropWhileM (return . f)

data DropWhile s a = DropWhile_Drop s | DropWhile_Yield a s | DropWhile_Next s

-- | Drop the longest prefix of elements that satisfy the monadic predicate
dropWhileM :: Monad m => (a -> m Bool) -> Stream m a -> Stream m a
{-# INLINE_FUSED dropWhileM #-}
dropWhileM f (Stream step t) = Stream step' (DropWhile_Drop t)
where
-- NOTE: we jump through hoops here to have only one Yield; local data
-- declarations would be nice!

{-# INLINE_INNER step' #-}
step' (DropWhile_Drop s)
  = do
      r <- step s
      case r of
        Yield x s' -> do
                        b <- f x
                        return $ if b then Skip (DropWhile_Drop    s')
                                      else Skip (DropWhile_Yield x s')
        Skip    s' -> return $ Skip (DropWhile_Drop    s')
        Done       -> return $ Done

step' (DropWhile_Yield x s) = return $ Yield x (DropWhile_Next s)

step' (DropWhile_Next s)
  = liftM (\r ->
      case r of
        Yield x s' -> Skip    (DropWhile_Yield x s')
        Skip    s' -> Skip    (DropWhile_Next    s')
        Done       -> Done
    ) (step s)


Reply to this email directly or view it on GitHub
#108 (comment).

@cartazio
Copy link
Contributor

did this ticket have any actionable conclusions or was it ultimately just a good "tradeoffs are complicated" dialogue? :)

@cartazio
Copy link
Contributor

i'm closing for now, please chime in if you think theres a concrete action we can take that improves perf for all

@treeowl
Copy link
Contributor

treeowl commented Jul 24, 2016

I think removing the mysterious laziness is likely good for compiler
analysis generally. Even if it works okay in simple cases, it could get
confused by more complex context.

On Jul 24, 2016 2:55 PM, "Carter Tazio Schonwald" [email protected]
wrote:

i'm closing for now, please chime in if you think theres a concrete action
we can take that improves perf for all


You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
#108 (comment), or mute
the thread
https://github.com/notifications/unsubscribe-auth/ABzi_b9EMaZx6oKBUPGgRNTOoeI6oH7Vks5qY7USgaJpZM4HbBrN
.

@cartazio
Copy link
Contributor

@treeowl could you throw up a PR with the proposed diff so we can see what you're meaning? sounds like it would be an easy win if you can walk me and @dolio through the change and the semantics that way

@nmk nmk mentioned this issue Dec 22, 2016
@charles-cooper
Copy link

Perhaps I am misunderstanding the implementation, but it seems that dropWhile uses unstream which ultimately calls vmunstream -- resulting in a big fat copy of the rest of the vector. Meanwhile drop uses unsafeSlice which does no copy.

The fix would then seem to be to use the findIndex -> drop method for the Vector version of dropWhile and maintain the current stream version otherwise.

charles-cooper added a commit to charles-cooper/vector that referenced this issue Dec 22, 2016
The `unstream` version results in a copy of the remainder of the vector
which is rarely what you want to do. This commit uses `drop` directly
after `findIndex` which results in a big speed gain if the remaining
part of the vector is large.

Cf. haskell#108
and haskell#141
gksato added a commit to gksato/vector that referenced this issue Aug 2, 2020
Now dropWhile's implementation uses stream
only in case the stream fusion is already running.
This is expected to reduce unnecessary re-allocation.

cf. haskell#141
haskell#108
Shimuuar pushed a commit that referenced this issue Aug 13, 2020
* make dropWhile fusion adaptive

Now dropWhile's implementation uses stream
only in case the stream fusion is already running.
This is expected to reduce unnecessary re-allocation.

cf. #141
#108

* Adaptive dropWhile Implementation: Implementation comment modified

Modified the comment for the RULE "dropWhile/unstream [Vector]"
in order to make the intention of the implementation there clear.
lehins pushed a commit that referenced this issue Jan 16, 2021
* make dropWhile fusion adaptive

Now dropWhile's implementation uses stream
only in case the stream fusion is already running.
This is expected to reduce unnecessary re-allocation.

cf. #141
#108

* Adaptive dropWhile Implementation: Implementation comment modified

Modified the comment for the RULE "dropWhile/unstream [Vector]"
in order to make the intention of the implementation there clear.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants