-
Notifications
You must be signed in to change notification settings - Fork 139
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Generic.takeWhile is not copy-free #330
Comments
Great catch!
But what the problem? Both functions are basically same. In case of no fusion: find index of first element satisfying predicate and then slice underlying vector. Ony difference is instead of |
Think of: import qualified Data.Vector.Unboxed as VU
xs :: VU.Vector Int
xs = [some stray vector actually living on heap]
f :: Int -> Bool
f = ...
g :: Int -> Word
g = ...
h :: Word -> Bool
h = ...
n :: Int
n = ...
val :: VU.Vector Word
val = VU.take n $ VU.filter h $ VU.map g $ VU.takeWhile f xs Let: t0 = fromMaybe (VU.length xs) $ VU.findIndex (not . f) xs
t1 = maybe (VU.length xs) (+1) $ VU.findIndices (h . g) xs VU.!? (n-1) Then, the time complexity of the computation of |
Therefore, |
My proposal is to utilize the field Data.Vector.Generic.new (Data.Vector.Generic.New.unstream Bundle{ sVector = Just v } = v Of course, we redefine |
Specifically, all the slice-taking functions must set |
Wouldn't that be nice. Unfortunately a rule like that is simply not possible. The problem is that However, what I think might be possible is defining different |
@lehins Thanks for analysis.
I see. I had a hunch that it's the case; it looked somewhat weird since it had pattern match in LHS 😅
I don't really get it. Brutally roughly speaking, I was thinking of: -- in Data.Vector.Generic.New
{-# INLINE_INNER clone' #-} -- Very creepy we have to have INLINE_INNER here.
clone' v = Data.Vector.Generic.clone v
{-# INLINE_FUSED unstream #-}
unstream (B.Bundle {sVector = Just v}) = clone' v
unstream B.Bundle{..} = [as always]
{-# RULES "new/clone'" Data.Vector.Generic.new (clone' v) = v #-} -- Of course it goes to Data.Vector.Generic How different is your solution from mine? Where does the function
I can't agree more. With my proposal, the fix would be a big deal for a single function anyhow we implement it. Let me point out that we should make sure we remove "without copying" from the documentation if we are not able to make |
Uh, I've forgotten to say. I'd be happy to implement it. That is, I'm a laaaaazy person and it might take a month or two, so I'd be happy if you'd wait for me. This would be a big change, so I needed to discuss possible implementations of my vague idea. |
Oops. That doesn't work.... |
Here is what I had in mind is at first: -- Data.Vector.Generic
unstreamSlice :: Vector v a => Bundle v a -> v a
{-# INLINE unstream #-}
unstreamSlice (B.Bundle {sVector = Just v}) = v
unstreamSlice s = unstream s followed by addition of more rewrite rules, but ... now I think all these complications might be unnecessary. We might get away with a simpler solution. We can keep --- Data.Vector.Generic
unstream :: Vector v a => Bundle v a -> v a
{-# INLINE unstream #-}
unstream (MBundle.Bundle {MBundle.sVector = Just v}) = v
unstream s = new (New.unstream s)
--- Data.Vector.Generic.Base (or some other module with all functions that don't rely on streaming)
sliceTakeWhile :: Vector v a => (a -> Bool) -> v a -> v a
{-# INLINE takeWhile #-}
sliceTakeWhile f v = take (go 0) v
where
k = length v
go i
| i < k && f (unsafeIndex v i) = go (i + 1)
| otherwise = i
-- Data.Vector.Fusion.Bundle.Monadic
takeWhile :: Monad m => (a -> Bool) -> Bundle m v a -> Bundle m v a
{-# INLINE takeWhile #-}
takeWhile f b@(B.Bundle {sVector = mv}) = (takeWhileM (return . f) b) { sVector = fmap (sliceTakeWhile f) mv) } I've tried this idea out here lehins@1cab907 and it seems to work, but I did not investigate it any deeper. If you are up for a challenge and would like to work on it that would be fantastic. Here are some things I think would be important to have answers to if you do decide to pursue this:
There is no rush. I don't have much time on my hands to work on this and I don't see anyone else eager on getting this done, so we can definitely wait for you.
We haven't discussed the timeline for the next release, but I imagine we could have one before the end of the year. So I think if you won't get it done prior to the next release we can certainly remove "without copying" before making the actual release. |
After looking at your solution, I was thinking of utilizing |
I think I now clarified the situation a lot. I seemingly found something very inconvenient -- we can't make For this comment, I consider any method, including the ones utilizing
I claim that we have an only one solution satisfying the conditions above, and in the solution we rewrite -- in Data.Vector.Generic.New
unstream :: Vector v a => Bundle v a -> New v a
{-# INLINE_FUSED unstream #-}
unstream Bundle {sVector = Just v} = New (Data.Vector.Generic.unsafeThaw v)
unstream s = s `seq` New (MVector.vunstream s) and redefine all slice-taking
Actually, this "solution" replaces However, this is CREEPY. This is a highly unsafe solution. Since we regard Generic.new (New.modify f (New.unstream (Generic.stream v))) is completely legal and modifies the immutable vector |
I am not terribly convinced that this approach is the right one, stuff like I see now how my suggested change to takeWhile :: Vector v a => (a -> Bool) -> v a -> v a
{-# INLINE takeWhile #-}
takeWhile f = unstreamSlice . Bundle.takeWhile f . stream
-- | /O(n)/ Construct a vector from a 'Bundle', but unlike `unstream` prioritize getting a
-- potentially sliced vector, while falling back on construction from the stream.
unstreamSlice :: Vector v a => Bundle v a -> v a
{-# INLINE_FUSED unstreamSlice #-}
unstreamSlice (MBundle.Bundle {MBundle.sVector = Just v}) = v
unstreamSlice s = new (New.unstream s)
{-# RULES
"stream/unstreamSlice [Vector]" forall s.
stream (unstreamSlice s) = s
#-} Do you think this approach will not work and if so what sort of problems do you think we can run into going this route? The benefits I see is that:
This approach would satisfy the no-copy part of documentation, because when the input is a stream it would either fuse if or materialize it, but if
All that being said, I do not have 100% confidence in either solution since we don't have any of these: actual implementation, corresponding benchmarks or tests which check that things fuse properly. So whichever approach you choose to implement it's up to you. If it works in the end and doesn't cause any problems I am sure it will get merged. What I'd recommend is collecting a whole bunch of examples (like I listed above), define their expected behavior and try them out on some rough implementation see if it even works on individual cases. Math and proofs are nice, but without hard code and a running executable they are just theory. ;) |
Huh... I didn't think of a verson of Also, thanks for providing a list of possible tests! In the previous comment, I just needed math because I wanted to reject some buggy implementations: I wanted to detect some bugs that may or may not be invoked depending on GHC's preference of optimization order (and I thought I ruled everything out and I was sad 😅). It's tests that'll finally matter, actually. |
Since current implementation of takeWhile is not copy-free, correct the document to reflect that fact. See also: haskell#330
Since current implementation of takeWhile is not copy-free, correct the document to reflect that fact. See also: haskell#330
I've posted a PR #359 for documentation fix for now. |
Since current implementation of takeWhile is not copy-free, correct the document to reflect that fact. See also: #330
Renamed due to #359 |
This issue is closely related to #182, but it is a contract failure, so I guess we need to do something about this.
The description of this issue is simple: the document of the function
Data.Vector.Generic.takeWhile
says:However, the function's implementation is:
(See it on Hackage, or on GitHub)
The document says the function is copy-free, but it is obvious from the code that it requires copy when:
Note that
Data.Vector.Generic.dropWhile
also had this problem, and that we resolved it on PR #327 by makingdropWhile
actually copy-free.On PR #327, we made it possible by letting
dropWhile
be fusible only in the case the argument vector to the function is already anunstream
ed stream.An obvious solution to this problem is to remove the phrase "without copying" from the documentation. It is completely sensible to choose that way.
The problem is more complex than that of
dropWhile
, and we cannot utilize the method same as the one used in #327.I guess I've come up with a solution that makes
takeWhile
copy-free while preserving all required stream fusion, but it is rather global and complicated, and might let bugs sneak in.I'm being lazy and I couldn't write up everything at once. I'll explain the difficulty of this problem and the proposed solution making
takeWhile
copy-free in subsequent comments.See Also: #182 #327(+#141)
The text was updated successfully, but these errors were encountered: