Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fusion not working as advertised #202

Closed
nomeata opened this issue Oct 17, 2017 · 3 comments
Closed

Fusion not working as advertised #202

nomeata opened this issue Oct 17, 2017 · 3 comments
Labels
bug question Requires more investigation

Comments

@nomeata
Copy link
Contributor

nomeata commented Oct 17, 2017

The documentation of Data.Text states

Most of the functions in this module are subject to fusion, meaning that a pipeline of such functions will usually allocate at most one Text value.

As an example, consider the following pipeline:

import Data.Text as T
import Data.Text.Encoding as E
import Data.ByteString (ByteString)
 
countChars :: ByteString -> Int
countChars = T.length . T.toUpper . E.decodeUtf8

From the type signatures involved, this looks like it should allocate one ByteString value, and two Text values. However, when a module is compiled with optimisation enabled under GHC, the two intermediate Text values will be optimised away, and the function will be compiled down to a single loop over the source ByteString.

Functions that can be fused by the compiler are documented with the phrase "Subject to fusion".

but (with GHC-8.0 or 8.2 and text-1.2.2.2) this is not true: Looking at -ddump-simpl I clearly see a Text value constructed of which the length is calculated.

I assume that originally, the example from the introduction did fuse properly, and then it broke…

Do you have a test suite that somehow tests the fusion promises of text? I stumbled over this because I am thinking about a proper, structured way of testing such non-functional properties of code in test suites, based on first successes with using ghc-proofs for that in generic-lens, and maybe text would be a good second case study.

nomeata added a commit to nomeata/inspection-testing that referenced this issue Nov 7, 2017
including the failing example from
haskell/text#202
It seems that in partilar `T.length` fails to fuse as advertised.
@nomeata
Copy link
Contributor Author

nomeata commented May 14, 2018

I did a more comprehensive test of testing properties of the text library, and the test harness that I created spits out this information:

Data.Text:
Total functions (95)
Untestable functions (32)
Fuses (40): pack, decodeUtf8, singleton, unfoldr, unfoldrN, cons, snoc, append, tail, init, map, intersperse, toCaseFold, toLower, toUpper, toTitle, justifyLeft, scanl, take, drop, takeWhile, dropWhile, filter, unpack, head, last, null, compareLength, foldl, foldl', foldl1, foldl1', foldr, foldr1, any, all, maximum, minimum, findIndex, isPrefixOf
Does not fuse (but should) (11): unpackCString#, reverse, scanl1, scanr1, takeWhileEnd, dropWhileEnd, dropAround, strip, stripStart, stripEnd, length
Fuses (but should not) (2): find, index
Does not fuse (10)

Data.Text.Lazy:
Total functions (100)
Untestable functions (37)
Fuses (36): pack, singleton, cons, snoc, append, tail, init, map, intersperse, toLower, toUpper, toTitle, justifyLeft, scanl, take, drop, takeWhile, dropWhile, filter, unpack, head, last, null, length, compareLength, foldl, foldl', foldl1, foldl1', foldr, foldr1, any, all, maximum, minimum, isPrefixOf
Does not fuse (but should) (6): decodeUtf8, toCaseFold, scanl1, dropAround, strip, stripStart
Fuses (but should not) (4): unfoldr, unfoldrN, find, index
Does not fuse (17)
  • The Untestable functions are functions that have additional Text arguments that are not expected to fuse, but which would confuse my test of whether things fuse. But none of these claim to be “Subject to fusion” with the exception of uncons, where the documentation claims that fusion happens, but the definitions is clearly not set up for fusion.
  • Yay, many functions that we claim fuse actually fuse!
  • Most interesting are of course the Does not fuse (but should) category. It included length, responsible for the original bug report. This is where the most work is needed.
  • Fuses (but should not) means that the documentation for these functions shoul say “Subject to fusion”.

For most functions, the question of whether it fuses or not is a clear-cut yes/no question. An exception is singleton: When a text pipeline starts with singleton then some functions suddenly fuse that did not before (e.g. center), others fail to fuse. I am not worried about that.

This is tested with GHC-8.4 and text ==1.2.3.0.

@Bodigrim
Copy link
Contributor

Automatic fusion has been disabled some time ago.

@nomeata
Copy link
Contributor Author

nomeata commented May 18, 2022

Oh are they? Ah, found it: #348

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug question Requires more investigation
Projects
None yet
Development

No branches or pull requests

3 participants