Merge shortbytestring package back into bytestring wrt #444 (#471)
* Merge `shortbytestring` package back into `bytestring` wrt #444

* Fix build on ARM

Reusing compareByteArrays and avoiding
excessive pointer arithmetic.

* Speed up reverse by using byteSwap64 tricks

* Remove phase control from inlines

* Improve performance of elemIndex

* Use setByteArray in replicate

* Implement intercalate manually

* Annotate partial functions with HasCallStack

* Fix build on base < 4.12.0.0

* Add uncons/unsnoc

* Correct complexities

* Exclude reverse optimization path from ARM

It seems to cause segfaults on armv7, suggesting
there are issues with 'indexWord8ArrayAsWord64#'.

All other platforms are fine and tests pass.

* Add benchmarks for ShortByteString

* Improve inlining

* Adjust haddock identifiers

* Get rid of writeCharArray#

* Haddock fixes

* Clean up tests

* Use -fexpose-all-unfoldings

* Improve reverse

* Cleanup 'reverse'

* Fix possible GC race with foreign imports

For more information, see
  #471 (comment)

* Disable asserts in shortbytestring.c

* Remove redundant import

* Add documentation about partial functions

* Fold ShortByteString prop tests into ByteString

* Restore previous INLINEs

* Improve naming of bindings

* Consolidate error handling functions

* Remove trailing whitespace

* Fix uncons in documentation

* Rename indexWord64Array to indexWord8ArrayAsWord64

* Improve error message

* Clean up incorrect documentation

* Use div/mod instead of quot/rem

* Simplify branching in reverse

* Move asserts to Haskell

* Prefix C functions

* Fix return type of c_elem_index

* Fix documentation in unfoldrN

* Make unfoldrN more efficient

* Fix maintainer field

* Fix formatting

* Implement takeEnd, dropEnd and splitAt manually

* Fix some haddock identifiers

* Fix unfoldrN doc

* Add a primops bounds-checking job to CI

* Document and clean up createAndTrim

* Rename errorEmptyList to errorEmptySBS

* Improve documentation for findFromEndUntil

* Improve documentation and naming

* Optimize out quotRem

* Document compareByteArraysOff

* Simplify findIndexOrLength and findFromEndUntil

* Use c_count for count

* Simplify elemIndex

* Remove use of 'mempty'

* Make sure breakSubstring is inlined into isInfixOf

* Simplify stripSuffix and stripPrefix

* Fix redundant import warnings

* Improve 'take'

* Use existing bounds check in 'drop'

* Avoid 'create' when bytestring is empty

* Optimize filter

* Remove redundant INLINABLE

* Use shorter 'createAndTrim' in 'filter'

* Simplify 'take'

* Simplify 'drop'

* Better formatting

* Add comment to explain DNDEBUG

* Refactor elemIndex

* Optimize 'partition'

* Optimize hot loop in 'partition'

(cherry picked from commit 731caea)
hasufell authored and sjakobi committed Feb 15, 2022
1 parent 90d98c4 commit bb60525
Showing 14 changed files with 1,793 additions and 143 deletions.
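
The `reverse` speed-up mentioned in the commit message rests on the fact that `byteSwap64` reverses the eight bytes of a word in a single operation. The following is a minimal sketch of that idea over plain lists, using only `base`. It is not the code from this commit (which, per the message, involves `indexWord8ArrayAsWord64#`); the names `packLE`, `unpackLE` and `reverseBytes` are hypothetical and introduced only for illustration.

```haskell
import Data.Bits (shiftL, shiftR, (.|.))
import Data.Word (Word8, Word64, byteSwap64)

-- Pack up to eight bytes into a Word64, with the first byte in the
-- lowest 8 bits (the layout a little-endian load would produce).
packLE :: [Word8] -> Word64
packLE = foldr (\b acc -> (acc `shiftL` 8) .|. fromIntegral b) 0

-- Unpack a Word64 into eight bytes, lowest 8 bits first.
unpackLE :: Word64 -> [Word8]
unpackLE w = [ fromIntegral (w `shiftR` (8 * i)) | i <- [0 .. 7 :: Int] ]

-- Reverse a byte sequence eight bytes at a time. Each full chunk is
-- reversed with one byteSwap64; the trailing chunk shorter than eight
-- bytes falls back to an ordinary reverse.
reverseBytes :: [Word8] -> [Word8]
reverseBytes = go []
  where
    go acc bs = case splitAt 8 bs of
      (chunk, rest)
        | length chunk == 8 ->
            go (unpackLE (byteSwap64 (packLE chunk)) ++ acc) rest
        | otherwise -> reverse chunk ++ acc
```

For a 20-byte input, two chunks go through `byteSwap64` and only the last four bytes take the slow path; the result should agree with `reverse` on the same list.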
21 changes: 21 additions & 0 deletions .github/workflows/ci.yml
@@ -92,3 +92,24 @@ jobs:
ghc --version
ghc --make -Iinclude -itests:tests/builder -o Main cbits/*.c tests/Main.hs +RTS -s
./Main +RTS -s
+   bounds-checking:
+     runs-on: ubuntu-latest
+     container:
+       image: fedora:34
+     steps:
+     - name: install deps
+       run: |
+         dnf install -y gcc gmp gmp-devel make ncurses ncurses-compat-libs xz perl
+         curl --proto '=https' --tlsv1.2 -sSf https://get-ghcup.haskell.org | BOOTSTRAP_HASKELL_NONINTERACTIVE=1 BOOTSTRAP_HASKELL_MINIMAL=1 sh
+         source ~/.ghcup/env
+         ghcup install ghc -u https://downloads.haskell.org/~ghcup/unofficial-bindists/ghc/9.3.20220124/ghc-9.3.20220124-x86_64-linux-fedora-34-bounds-checking-ddf50f4b.tar.xz --set 9.3.20220124
+         ghcup install cabal
+       shell: bash
+     - uses: actions/checkout@v1
+     - name: test
+       run: |
+         source ~/.ghcup/env
+         cabal update
+         cabal run -w ghc-9.3.20220124 --ghc-options='-fcheck-prim-bounds -fno-ignore-asserts' bytestring-tests
+       shell: bash
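
The test step above passes `-fno-ignore-asserts` because GHC otherwise drops calls to `assert` once optimisation is enabled, so Haskell-side assertions would never fire in an optimised test build. Below is a small sketch of the kind of assertion this flag keeps alive. `checkedIndex` is a hypothetical helper written for this note, not a function from bytestring.

```haskell
import Control.Exception (assert)

-- Hypothetical helper: index into a list, asserting the bounds first.
-- Compiled with -O and without -fno-ignore-asserts, GHC rewrites
-- 'assert p e' to 'e', so the bounds check disappears.
checkedIndex :: [a] -> Int -> a
checkedIndex xs i = assert (i >= 0 && i < length xs) (xs !! i)
```

With assertions enabled, an out-of-range index fails with an `AssertionFailed` exception that names the source location of the `assert`, which is more informative than the generic error from `!!`.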
8 changes: 8 additions & 0 deletions Changelog.md
@@ -1,3 +1,11 @@
+ [0.11.3.0] — Unreleased
+
+ * merge `shortbytestring` package back into `bytestring` wrt [#444](https://github.com/haskell/bytestring/issues/444),
+ adding lots of additional API:
+ - [Add `all`, `any`, `append`, `break`, `breakEnd`, `breakSubstring`, `concat`, `cons`, `count`, `drop`, `dropEnd`, `dropWhile`, `dropWhileEnd`, `elem`, `elemIndex`, `elemIndices`, `filter`, `find`, `findIndex`, `findIndices`, `foldl'`, `foldl`, `foldl1'`, `foldl1`, `foldr'`, `foldr`, `foldr1'`, `foldr1`, `head`, `init`, `intercalate`, `isInfixOf`, `isPrefixOf`, `isSuffixOf`, `last`, `map`, `partition`, `replicate`, `reverse`, `singleton`, `snoc`, `span`, `spanEnd`, `split`, `splitAt`, `splitWith`, `stripPrefix`, `stripSuffix`, `tail`, `take`, `takeEnd`, `takeWhile`, `takeWhileEnd`, `uncons`, `unfoldr`, `unfoldrN`, `unsnoc`](https://github.com/haskell/bytestring/pull/471)
+
+ [0.11.3.0]: https://github.com/haskell/bytestring/compare/0.11.2.0...0.11.3.0
+
[0.11.2.0] — December 2021

* [Add `Data.ByteString.isValidUtf8`](https://github.com/haskell/bytestring/pull/423)
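
To give a feel for the API listed in the changelog entry above, here is a brief sketch that uses a few of the new `ShortByteString` functions. It assumes a bytestring version that includes this change (0.11.3.0 or later); the helper names `firstByte`, `lastByte` and `byteSum` are made up for the example.

```haskell
import qualified Data.ByteString.Short as SBS
import Data.ByteString.Short (ShortByteString)
import Data.Word (Word8)

-- First byte, if any. 'SBS.uncons' is total, unlike 'SBS.head'.
firstByte :: ShortByteString -> Maybe Word8
firstByte = fmap fst . SBS.uncons

-- Last byte, if any. 'SBS.unsnoc' is total, unlike 'SBS.last'.
lastByte :: ShortByteString -> Maybe Word8
lastByte = fmap snd . SBS.unsnoc

-- Sum of all bytes, computed with a strict left fold.
byteSum :: ShortByteString -> Int
byteSum = SBS.foldl' (\acc w -> acc + fromIntegral w) 0
```

Before this change, code like this would typically have had to go through `SBS.unpack` or convert to a strict `ByteString` with `fromShort` first.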
10 changes: 5 additions & 5 deletions Data/ByteString.hs
@@ -951,7 +951,7 @@ splitAt n ps@(BS x l)
| otherwise = (BS x n, BS (plusForeignPtr x n) (l-n))
{-# INLINE splitAt #-}

- -- | Similar to 'P.takeWhile',
+ -- | Similar to 'Prelude.takeWhile',
-- returns the longest (possibly empty) prefix of elements
-- satisfying the predicate.
takeWhile :: (Word8 -> Bool) -> ByteString -> ByteString
@@ -979,7 +979,7 @@ takeWhileEnd :: (Word8 -> Bool) -> ByteString -> ByteString
takeWhileEnd f ps = unsafeDrop (findFromEndUntil (not . f) ps) ps
{-# INLINE takeWhileEnd #-}

- -- | Similar to 'P.dropWhile',
+ -- | Similar to 'Prelude.dropWhile',
-- drops the longest (possibly empty) prefix of elements
-- satisfying the predicate and returns the remainder.
dropWhile :: (Word8 -> Bool) -> ByteString -> ByteString
@@ -997,7 +997,7 @@ dropWhile f ps = unsafeDrop (findIndexOrLength (not . f) ps) ps
dropWhile (`eqWord8` x) = snd . spanByte x
#-}

- -- | Similar to 'P.dropWhileEnd',
+ -- | Similar to 'Prelude.dropWhileEnd',
-- drops the longest (possibly empty) suffix of elements
-- satisfying the predicate and returns the remainder.
--
@@ -1008,7 +1008,7 @@ dropWhileEnd :: (Word8 -> Bool) -> ByteString -> ByteString
dropWhileEnd f ps = unsafeTake (findFromEndUntil (not . f) ps) ps
{-# INLINE dropWhileEnd #-}

- -- | Similar to 'P.break',
+ -- | Similar to 'Prelude.break',
-- returns the longest (possibly empty) prefix of elements which __do not__
-- satisfy the predicate and the remainder of the string.
--
@@ -1054,7 +1054,7 @@ breakByte c p = case elemIndex c p of
breakEnd :: (Word8 -> Bool) -> ByteString -> (ByteString, ByteString)
breakEnd p ps = splitAt (findFromEndUntil p ps) ps

- -- | Similar to 'P.span',
+ -- | Similar to 'Prelude.span',
-- returns the longest (possibly empty) prefix of elements
-- satisfying the predicate and the remainder of the string.
--
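
The haddock comments touched in this file describe `span` and `break` as returning the longest prefix that does (or does not) satisfy a predicate, together with the remainder. A short sketch of how these functions relate on strict `ByteString`s follows; `isSpace8`, `firstWord` and `trim` are illustrative names, not part of bytestring.

```haskell
import qualified Data.ByteString as BS
import Data.ByteString (ByteString)
import Data.Word (Word8)

-- Illustrative predicate: is this byte an ASCII space?
isSpace8 :: Word8 -> Bool
isSpace8 w = w == 32

-- 'break' stops at the first byte satisfying the predicate, so this
-- returns the first space-delimited word and everything after it.
firstWord :: ByteString -> (ByteString, ByteString)
firstWord = BS.break isSpace8

-- 'dropWhile' removes a matching prefix, 'dropWhileEnd' a matching
-- suffix; composing them trims spaces from both ends.
trim :: ByteString -> ByteString
trim = BS.dropWhileEnd isSpace8 . BS.dropWhile isSpace8
```

Note that `BS.span p` and `BS.break (not . p)` give the same result on any input, which is why the two comments read as mirror images of each other.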
10 changes: 5 additions & 5 deletions Data/ByteString/Lazy.hs
@@ -849,7 +849,7 @@ splitAt i cs0 = splitAt' i cs0
in (Chunk c cs', cs'')


- -- | Similar to 'P.takeWhile',
+ -- | Similar to 'Prelude.takeWhile',
-- returns the longest (possibly empty) prefix of elements
-- satisfying the predicate.
takeWhile :: (Word8 -> Bool) -> ByteString -> ByteString
@@ -882,7 +882,7 @@ takeWhileEnd f = takeWhileEnd'
c' | S.length c' == S.length c -> (True, Chunk c bs)
| otherwise -> (False, fromStrict c' `append` bs)

- -- | Similar to 'P.dropWhile',
+ -- | Similar to 'Prelude.dropWhile',
-- drops the longest (possibly empty) prefix of elements
-- satisfying the predicate and returns the remainder.
dropWhile :: (Word8 -> Bool) -> ByteString -> ByteString
@@ -893,7 +893,7 @@ dropWhile f = dropWhile'
n | n < S.length c -> Chunk (S.drop n c) cs
| otherwise -> dropWhile' cs

- -- | Similar to 'P.dropWhileEnd',
+ -- | Similar to 'Prelude.dropWhileEnd',
-- drops the longest (possibly empty) suffix of elements
-- satisfying the predicate and returns the remainder.
--
@@ -916,7 +916,7 @@ dropWhileEnd f = go []
x' | S.null x' -> dropEndBytes xs
| otherwise -> List.foldl' (flip Chunk) Empty (x' : xs)

- -- | Similar to 'P.break',
+ -- | Similar to 'Prelude.break',
-- returns the longest (possibly empty) prefix of elements which __do not__
-- satisfy the predicate and the remainder of the string.
--
@@ -995,7 +995,7 @@ spanByte c (LPS ps) = case (spanByte' ps) of (a,b) -> (LPS a, LPS b)
| otherwise -> (x' : [], x'' : xs)
-}

- -- | Similar to 'P.span',
+ -- | Similar to 'Prelude.span',
-- returns the longest (possibly empty) prefix of elements
-- satisfying the predicate and the remainder of the string.
--
107 changes: 97 additions & 10 deletions Data/ByteString/Short.hs
@@ -2,10 +2,10 @@

-- |
-- Module : Data.ByteString.Short
- -- Copyright : (c) Duncan Coutts 2012-2013
+ -- Copyright : (c) Duncan Coutts 2012-2013, Julian Ospald 2022
-- License : BSD-style
--
- -- Maintainer : [email protected]
+ -- Maintainer : [email protected]
-- Stability : stable
-- Portability : ghc only
--
@@ -67,26 +67,113 @@ module Data.ByteString.Short (
-- small unpinned strings are allocated in the same way as normal heap
-- allocations, rather than in a separate pinned area.

- -- * Conversions
- toShort,
- fromShort,
+ -- * Introducing and eliminating 'ShortByteString's
+ empty,
+ singleton,
pack,
unpack,
+ fromShort,
+ toShort,

- -- * Other operations
- empty, null, length, index, indexMaybe, (!?),
+ -- * Basic interface
+ snoc,
+ cons,
+ append,
+ last,
+ tail,
+ uncons,
+ head,
+ init,
+ unsnoc,
+ null,
+ length,

- -- ** Encoding validation
+ -- * Encoding validation
isValidUtf8,

+ -- * Transforming ShortByteStrings
+ map,
+ reverse,
+ intercalate,
+
+ -- * Reducing 'ShortByteString's (folds)
+ foldl,
+ foldl',
+ foldl1,
+ foldl1',
+
+ foldr,
+ foldr',
+ foldr1,
+ foldr1',
+
+ -- ** Special folds
+ all,
+ any,
+ concat,
+
+ -- ** Generating and unfolding ByteStrings
+ replicate,
+ unfoldr,
+ unfoldrN,
+
+ -- * Substrings
+
+ -- ** Breaking strings
+ take,
+ takeEnd,
+ takeWhileEnd,
+ takeWhile,
+ drop,
+ dropEnd,
+ dropWhile,
+ dropWhileEnd,
+ breakEnd,
+ break,
+ span,
+ spanEnd,
+ splitAt,
+ split,
+ splitWith,
+ stripSuffix,
+ stripPrefix,
+
+ -- * Predicates
+ isInfixOf,
+ isPrefixOf,
+ isSuffixOf,
+
+ -- ** Search for arbitrary substrings
+ breakSubstring,
+
+ -- * Searching ShortByteStrings
+
+ -- ** Searching by equality
+ elem,
+
+ -- ** Searching with a predicate
+ find,
+ filter,
+ partition,
+
+ -- * Indexing ShortByteStrings
+ index,
+ indexMaybe,
+ (!?),
+ elemIndex,
+ elemIndices,
+ count,
+ findIndex,
+ findIndices,
+
-- * Low level conversions
-- ** Packing 'Foreign.C.String.CString's and pointers
packCString,
packCStringLen,

- -- ** Using ByteStrings as 'Foreign.C.String.CString's
+ -- ** Using ShortByteStrings as 'Foreign.C.String.CString's
useAsCString,
- useAsCStringLen
+ useAsCStringLen,
) where

import Data.ByteString.Short.Internal
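
The export list above now includes partial functions such as `head`, `tail`, `init` and `last`, and the commit message notes that partial functions are annotated with `HasCallStack`. As a rough illustration of that pattern, using only `base` and a hypothetical list-based function `headOrDie` rather than the actual bytestring code:

```haskell
import GHC.Stack (HasCallStack)

-- Hypothetical partial function. The HasCallStack constraint makes the
-- 'error' message include the source location of the caller, not just
-- the location of the 'error' call inside this definition.
headOrDie :: HasCallStack => [a] -> a
headOrDie (x : _) = x
headOrDie []      = error "headOrDie: empty input"
```

Without the constraint, the reported call stack would end inside `headOrDie` itself, which says little about which call site passed the empty input.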