Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

findIndexEnd #155

Merged
merged 4 commits into from
Jul 6, 2020
Merged

findIndexEnd #155

merged 4 commits into from
Jul 6, 2020

Conversation

strake
Copy link
Contributor

@strake strake commented May 4, 2018

No description provided.

@strake
Copy link
Contributor Author

strake commented Jul 3, 2018

ping

1 similar comment
@strake
Copy link
Contributor Author

strake commented Aug 19, 2018

ping

@strake
Copy link
Contributor Author

strake commented Mar 26, 2019

ping

1 similar comment
@strake
Copy link
Contributor Author

strake commented Nov 23, 2019

ping

Copy link
Member

@sjakobi sjakobi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks very solid! A few requests though:

  1. Add complexity annotations.
  2. Add tests, ideally property tests.
  3. Check whether we can refactor elemIndexEnd to use findIndexEnd and still get the same core.

let !n' = n + S.length c
!i = fmap (fromIntegral . (n +)) $ S.findIndexEnd k c
in findIndexEnd' n' cs `mplus` i
{-# INLINE findIndexEnd #-}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how efficient this is, but it's very similar to the existing elemIndexEnd, so at least it shouldn't be worse.

-- | /O(n)/ The 'elemIndexEnd' function returns the last index of the
-- element in the given 'ByteString' which is equal to the query
-- element, or 'Nothing' if there is no such element. The following
-- holds:
--
-- > elemIndexEnd c xs ==
-- > (-) (length xs - 1) `fmap` elemIndex c (reverse xs)
--
-- @since 0.10.6.0
elemIndexEnd :: Word8 -> ByteString -> Maybe Int64
elemIndexEnd w = elemIndexEnd' 0
where
elemIndexEnd' _ Empty = Nothing
elemIndexEnd' n (Chunk c cs) =
let !n' = n + S.length c
!i = fmap (fromIntegral . (n +)) $ S.elemIndexEnd w c
in elemIndexEnd' n' cs `mplus` i

if k w
then return (Just n)
else go ptr (n-1)
{-# INLINE findIndexEnd #-}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very similar to elemIndexEnd, although some style and variable name differences make that a bit hard to see.

bytestring/Data/ByteString.hs

Lines 1092 to 1109 in 95fe6bd

-- | /O(n)/ The 'elemIndexEnd' function returns the last index of the
-- element in the given 'ByteString' which is equal to the query
-- element, or 'Nothing' if there is no such element. The following
-- holds:
--
-- > elemIndexEnd c xs ==
-- > (-) (length xs - 1) `fmap` elemIndex c (reverse xs)
--
elemIndexEnd :: Word8 -> ByteString -> Maybe Int
elemIndexEnd ch (PS x s l) = accursedUnutterablePerformIO $ withForeignPtr x $ \p ->
go (p `plusPtr` s) (l-1)
where
go !p !i | i < 0 = return Nothing
| otherwise = do ch' <- peekByteOff p i
if ch == ch'
then return $ Just i
else go p (i-1)
{-# INLINE elemIndexEnd #-}

@strake strake force-pushed the findIndexEnd branch 5 times, most recently from d0928c4 to 8815799 Compare December 19, 2019 07:21
@hvr
Copy link
Member

hvr commented Dec 19, 2019

The reason I was hesitating about this new addition is that it tends to increase the surface API of bytestring; and for text which is effectively bytestring's cousin we have the policy to grow the surface API only when there's good motivation.

Moreover, this PR is a bit assymetric as it only extends the non-Char8 version.

@sjakobi
Copy link
Member

sjakobi commented Dec 19, 2019

Given that elemIndex has an End variant, and given that findIndex is a very simple generalization of elemIndex, I think it's just natural and consistent to add findIndexEnd too.

Since Char8.findIndexEnd is so trivial to define in terms of the Word8 version, I'm not sure about the value of adding it too. But of course, consistency is always nice.

BTW Data.ByteString.Lazy.Char8 doesn't have elemIndexEnd, as noted in #104. I'm not sure whether I should read that issue as a feature request though.

@strake
Copy link
Contributor Author

strake commented Dec 19, 2019

Indeed, given we have findIndex, elemIndex, and elemIndexEnd, it was an unpleasant surprise we lack findIndexEnd.

@sjakobi
Copy link
Member

sjakobi commented Jan 25, 2020

The reason I was hesitating about this new addition is that it tends to increase the surface API of bytestring; and for text which is effectively bytestring's cousin we have the policy to grow the surface API only when there's good motivation.

While I don't think that API considerations for text should necessarily be an important concern in decisions on the bytestring API, I'd like to point out that findIndexEnd has also been requested for Text:

haskell/text#243

Copy link
Contributor

@Bodigrim Bodigrim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I second that it's worth to fix assymetry between elemIndex / findIndex / elemIndexEnd / missing findIndexEnd. Let us merge this and raise a separate issue about similar changes for Char8 version.

@Bodigrim
Copy link
Contributor

@strake care to add changelog entries please?

@strake
Copy link
Contributor Author

strake commented May 13, 2020

@Bodigrim Done

@Bodigrim
Copy link
Contributor

@strake sorry for confusion, 0.10.10.1 has not been released yet, so please put your entry under it.

@sjakobi are you satisfied with changes after your review?

@strake
Copy link
Contributor Author

strake commented May 13, 2020

@Bodigrim I believe this needs a minor (rather than patch) version bump, per PVP.

@Bodigrim
Copy link
Contributor

I agree, I think it is fair to change 0.10.10.1 to 0.10.X.X and the rest of the line to TBA, since I believe @hvr will make a release only at some point in future.

@strake
Copy link
Contributor Author

strake commented May 13, 2020

@Bodigrim I'm not sure whether you're asking me to bump the version in the changelog or implying someone else will do so and i should then merge/rebase.

@Bodigrim
Copy link
Contributor

I'm asking you to change the existing header in Changelog.md from 0.10.10.1 ... to 0.10.X.X TBA and put your entry under this section.

There was no 0.10.10.1 release, and the next version number will be up to @hvr to decide, when the time is right. So let's put 0.10.X.X for now.

@sjakobi
Copy link
Member

sjakobi commented May 14, 2020

Since I was interested in the effect of the refactoring of elemIndexEnd, I compared the old Core with the one produced by this patch:

--- RHS size: {terms: 102, types: 81, coercions: 0, joins: 3/5}
-elemIndexEnd
-  = \ ch ds ->
-      case ds of { PS dt dt1 dt2 dt3 ->
-      let { ww = plusAddr# dt dt2 } in
-      let { ww1 = -# dt3 1# } in
+-- RHS size: {terms: 102, types: 79, coercions: 0, joins: 3/5}
+$welemIndexEnd
+  = \ w ww ww1 ww2 ww3 ->
+      let { ww4 = plusAddr# ww ww2 } in
+      let { ww5 = -# ww3 1# } in
       join {
-        exit ww2 ipv
-          = case touch# dt1 ipv of { __DEFAULT -> Just (I# ww2) } } in
-      join { exit1 w = case touch# dt1 w of { __DEFAULT -> Nothing } } in
-      case <# ww1 0# of {
+        exit ww6 ipv
+          = case touch# ww1 ipv of { __DEFAULT -> Just (I# ww6) } } in
+      join {
+        exit1 w1 = case touch# ww1 w1 of { __DEFAULT -> Nothing } } in
+      case <# ww5 0# of {
         __DEFAULT ->
-          case readWord8OffAddr# (plusAddr# ww ww1) 0# realWorld# of
+          case readWord8OffAddr# (plusAddr# ww4 ww5) 0# realWorld# of
           { (# ipv, ipv1 #) ->
-          case ch of { W8# x ->
+          case w of { W8# x ->
           case eqWord# x ipv1 of {
             __DEFAULT ->
               joinrec {
-                $wgo2 @ b ww2 ww3 w
-                  = case <# ww3 0# of {
+                $wgo2 @ b ww6 ww7 w1
+                  = case <# ww7 0# of {
                       __DEFAULT ->
-                        case readWord8OffAddr# (plusAddr# ww2 ww3) 0# w of
+                        case readWord8OffAddr# (plusAddr# ww6 ww7) 0# w1 of
                         { (# ipv2, ipv3 #) ->
                         case eqWord# x ipv3 of {
-                          __DEFAULT -> jump $wgo2 ww2 (-# ww3 1#) ipv2;
-                          1# -> jump exit ww3 ipv2
+                          __DEFAULT -> jump $wgo2 ww6 (-# ww7 1#) ipv2;
+                          1# -> jump exit ww7 ipv2
                         }
                         };
-                      1# -> jump exit1 w
+                      1# -> jump exit1 w1
                     }; } in
-              jump $wgo2 ww (-# ww1 1#) ipv;
-            1# -> jump exit ww1 ipv
+              jump $wgo2 ww4 (-# ww5 1#) ipv;
+            1# -> jump exit ww5 ipv
           }
           }
           };
         1# -> jump exit1 realWorld#
       }
+
+-- RHS size: {terms: 11, types: 7, coercions: 0, joins: 0/0}
+elemIndexEnd
+  = \ w w1 ->
+      case w1 of { PS ww1 ww2 ww3 ww4 ->
+      $welemIndexEnd w ww1 ww2 ww3 ww4
+      }

Apart from a few changed variable names, the main effect is that elemIndexEnd is split into a wrapper around a worker function $welemIndexEnd. That seems alright to me. :)

Copy link
Member

@sjakobi sjakobi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM modulo the changelog entry. :)

Thanks @strake!

@strake
Copy link
Contributor Author

strake commented May 16, 2020

I'm asking you to change the existing header in Changelog.md from 0.10.10.1 ... to 0.10.X.X TBA and put your entry under this section.

@Bodigrim Done

@Bodigrim
Copy link
Contributor

@hvr @cartazio could we please have this merged?

@sjakobi
Copy link
Member

sjakobi commented Jul 2, 2020

This looks ready to go to me.

@hsyl20 Would you like to review?

Copy link
Contributor

@hsyl20 hsyl20 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps add INLINABLE pragmas to elemIndexEnd to ensure that they are exported.

Otherwise it looks good to me.

@sjakobi sjakobi added this to the 0.10.12.0 milestone Jul 2, 2020
@strake
Copy link
Contributor Author

strake commented Jul 6, 2020

@hsyl20 done

@Bodigrim Bodigrim merged commit 89261e0 into haskell:master Jul 6, 2020
@Bodigrim
Copy link
Contributor

Bodigrim commented Jul 6, 2020

Thanks, @strake!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants