Iron out API discrepancies #289

Bodigrim · 2020-09-27T21:43:10Z

Missing for lazy bytestrings:

unfoldrN (see also Benchmark unfoldrN and inline createAndTrim' #128)
isInfixOf, breakSubstring (see Lazy variants of isInfixOf and breakSubstring #30 and Search algorithms #307)
~~foldr' and foldr1' (not that strict folds make tremendously much sense for lazy structures, but we already have foldl' and foldl1' in place) (see Straighten folds and scans. #373)~~
~~scanl1, scanr and scanr1 (see Straighten folds and scans. #373)~~
~~takeEnd, dropEnd, takeWhileEnd, dropWhileEnd, spanEnd and breakEnd (see Lazy dropEnd and friends #306).~~

Missing both for strict and lazy bytestrings:

replace :: ByteString -> ByteString -> ByteString -> ByteString (lazy variant would require breakSubstring or similar first, so see Search algorithms #307)
splitOn :: ByteString -> ByteString -> [ByteString] (see splitOn for Data.ByteString/Data.ByteString.Lazy #100)

There are minor but annoying differences between API provided by Data.ByteString, Data.ByteString.Char8, Data.ByteString.Lazy, Data.ByteString.Lazy.Char8. Plus we are missing certain encoding-agnostic functions available for Text, such as compareLength, replace, splitOn, etc.

~~A good starting point (even if a bit outdated) is https://github.com/haskell-backpack/backpack-str/tree/master/str-sig#feature-matrix~~

The text was updated successfully, but these errors were encountered:

sjakobi · 2020-09-28T12:29:38Z

There are minor but annoying differences between API provided by Data.ByteString, Data.ByteString.Char8, Data.ByteString.Lazy, Data.ByteString.Lazy.Char8.

I agree that it would be nice if these APIs were more consistent.

Plus we are missing certain encoding-agnostic functions available for Text, such as compareLength, replace, splitOn, etc.

compareLength makes some sense for lazy bytestrings, but not so much for strict ones IMHO.

Bodigrim · 2020-10-03T21:27:01Z

Missing for lazy bytestrings:

compareLength :: ByteString -> Int -> Ordering
foldr' and foldr1' (not that strict folds make tremendously much sense for lazy structures, but we already have foldl' and foldl1' in place)
scanl1, scanr and scanr1
unfoldrN
isInfixOf, breakSubstring (these may be non-trivial to do right)
takeEnd, dropEnd, takeWhileEnd, dropWhileEnd, spanEnd and breakEnd (but these require design discussion first about a desired degree of laziness).

Missing both for strict and lazy bytestrings:

concatReplicate :: Int -> ByteString -> ByteString (which is called replicate in text)
replace :: ByteString -> ByteString -> ByteString -> ByteString
splitOn :: ByteString -> ByteString -> [ByteString]

sjakobi · 2020-10-04T10:03:16Z

concatReplicate :: Int -> ByteString -> ByteString (which is called replicate in text)

I think users could simply use stimes. No need for an API extension.

For an earlier attempt at a lazy breakSubstring see #129.

Bodigrim · 2020-10-04T14:11:04Z

I think users could simply use stimes. No need for an API extension.

Good idea. In this case it boils down to defining an efficient stimes.

kindaro · 2021-02-18T12:30:11Z

I would like to dibs this. (Together with all the subordinate issues.) It looks like a bunch of fun little exercises, and I think I am able to prototype all of them easily. Then there is the low level optimization part, and at that I may need some advice. Would this sort of help be wanted?

My first goal would be to make a draft pull request with all functions doing roughly the right thing, so that we may discuss small discrepancies in behaviour to death and codify the final behaviour in automated checks.

Bodigrim · 2021-02-18T18:38:56Z

@kindaro yes, sure. Please leave functions depending on lazy breakSubstring out of scope for a while, because we need to decide on a search algorithm first (see #307), but everything else on the list should be accessible and mergeable in a finite amount of time.

kindaro · 2021-03-01T12:04:53Z

sort is missing from lazy variants of byte strings. Is it intentional? I understand that it would need to force the entire stream, but the same reasoning applies to Data.List and to strict folds that we are presently adding and no one is objecting to it. Should we add lazy sorting as well? It would likely mean mashing the chunks together, delegating to strict sort and then chunking up again.

sjakobi · 2021-03-01T12:28:39Z

Should we add lazy sorting as well? It would likely mean mashing the chunks together, delegating to strict sort and then chunking up again.

Yeah, let's address that discrepancy too. I think we can use a counting sort where we make a single pass over the input to determine the total length and the number of occurrences for each byte. Then we simply write out the bytes. Take a look at the strict sort which uses a counting sort for larger inputs too.

kindaro · 2021-03-01T12:50:52Z

Alright, I added sort to the list in #364. A counting algorithm makes sense, I shall use it.

sjakobi · 2021-03-01T13:10:48Z

@kindaro My recommendation would be to split this work across multiple PRs. If you do it all in a single PR, it will be harder to review and you're more likely to run into merge conflicts.

kindaro · 2021-03-01T13:28:14Z

Can you explain why it will be easier to review multiple small pull requests? Either way there is the same amount of code. But with smaller pull requests more bureaucracy is added. What is the gain?

sjakobi · 2021-03-01T13:55:19Z

I simply find large diffs harder to understand and navigate. For example, the question "What tests do we have for addition x?" is IMHO easier to figure out in a small diff.

kindaro · 2021-03-01T14:14:22Z

This makes sense. But by this logic I should add functions one by one. This would result in there being a dozen of tiny pull requests. It would be hard to manage.

"What tests do we have for addition x?"

So I think we should find a cheaper way to answer this question. In most cases you can replace a function by an error "…" and see what breaks. (In the presence of fancy types, some values cannot be replaced by error "…", but we do not have any fancy types in this project.)

I agree that sort may be out of place in #364, since all other features there are essentially variations of foldr.

kindaro · 2021-03-01T14:45:08Z

I narrowed the scope of #364 to folds and scans.

Bodigrim added the pr-welcome label Sep 27, 2020

Bodigrim mentioned this issue Oct 13, 2020

Expand ShortByteStrings API? #83

Closed

kindaro mentioned this issue Feb 20, 2021

Straighten folds and scans. #364

Merged

6 tasks

kindaro mentioned this issue Mar 1, 2021

Straighten folds and scans. #373

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Iron out API discrepancies #289

Iron out API discrepancies #289

Bodigrim commented Sep 27, 2020 •

edited

Loading

sjakobi commented Sep 28, 2020

Bodigrim commented Oct 3, 2020 •

edited

Loading

sjakobi commented Oct 4, 2020

Bodigrim commented Oct 4, 2020

kindaro commented Feb 18, 2021

Bodigrim commented Feb 18, 2021

kindaro commented Mar 1, 2021

sjakobi commented Mar 1, 2021

kindaro commented Mar 1, 2021 •

edited

Loading

sjakobi commented Mar 1, 2021

kindaro commented Mar 1, 2021

sjakobi commented Mar 1, 2021

kindaro commented Mar 1, 2021

kindaro commented Mar 1, 2021

Iron out API discrepancies #289

Iron out API discrepancies #289

Comments

Bodigrim commented Sep 27, 2020 • edited Loading

sjakobi commented Sep 28, 2020

Bodigrim commented Oct 3, 2020 • edited Loading

sjakobi commented Oct 4, 2020

Bodigrim commented Oct 4, 2020

kindaro commented Feb 18, 2021

Bodigrim commented Feb 18, 2021

kindaro commented Mar 1, 2021

sjakobi commented Mar 1, 2021

kindaro commented Mar 1, 2021 • edited Loading

sjakobi commented Mar 1, 2021

kindaro commented Mar 1, 2021

sjakobi commented Mar 1, 2021

kindaro commented Mar 1, 2021

kindaro commented Mar 1, 2021

Bodigrim commented Sep 27, 2020 •

edited

Loading

Bodigrim commented Oct 3, 2020 •

edited

Loading

kindaro commented Mar 1, 2021 •

edited

Loading