
unfoldrN, unfoldrNM and fromListN are dangerous #301

Closed
lehins opened this issue Feb 28, 2020 · 23 comments · Fixed by #387

@lehins
Contributor

lehins commented Feb 28, 2020

The size argument supplied to all three functions unfoldrN, unfoldrNM and fromListN serves as a hint for the upper bound on the length of the vector. The problem is that, regardless of how big the vector actually turns out to be, memory for the supplied upper bound is allocated immediately, even if it is far too big, which can result in the application being killed with an asynchronous HeapOverflow exception.

fromListN.hs:

import qualified Data.Vector.Primitive as V

main = do
  let xs = [1, 2, 3, 4, 5] :: [Int]
  print $ V.fromListN (maxBound `div` 8) xs
$ ghc fromListN.hs && ./fromListN
[1 of 1] Compiling Main             ( fromListN.hs, fromListN.o )
fromListN: Out of memory

unfoldrN.hs

import qualified Data.Vector.Primitive as V

main = print (V.unfoldrN (maxBound `div` 8) (const Nothing) () :: V.Vector Int)
$ ghc unfoldrN.hs && ./unfoldrN
[1 of 1] Compiling Main             ( unfoldrN.hs, unfoldrN.o )
Linking unfoldrN ...
unfoldrN: Out of memory

unfoldrNM.hs

import Control.Exception
import qualified Data.Vector.Primitive as V

main = do
  eRes <- try (V.unfoldrNM (maxBound `div` 8) (const $ pure Nothing) () :: IO (V.Vector Int))
  print (eRes :: Either SomeException (V.Vector Int))
$ stack exec -- ghc unfoldrNM.hs -O2 && ./unfoldrNM
[1 of 1] Compiling Main             ( unfoldrNM.hs, unfoldrNM.o )
Linking unfoldrNM ...
Left heap overflow
@cartazio cartazio changed the title unfoldrN, unfoldrNM and fromListN are dangerous unfoldrN, unfoldrNM and fromListN are *unreasonable* with respect to their size parameters Feb 28, 2020
@cartazio
Contributor

Dangerous might be the wrong word here, but I agree with the intent (and please forgive my edit).

Let's think about when/how we might want it to fail/behave instead! If end users of an application can tickle this, that's certainly a denial of service (for at least the thread doing the calculation).

I actually think it's probably very reasonable for APIs to throw an exception when too much memory is requested (at least if that's not exposed in the normal API results).

But you do raise a really important point: the size here should perhaps be treated as a combination of both a hint and a lint!

I don't think there's a trivial answer here as such, since you CAN (with extreme care) have off-heap memory-mapped arrays that take up terabytes of virtual/physical memory. I think physical memory limits on most platforms are ~48 bits of addressable bytes, but I believe that's orthogonal to CPU-architecture-related virtual memory limits?

@cartazio
Contributor

To be clear: I don't view this as a security problem as such; at that rate, Eq and Ord on vectors are security problems :)

@cartazio
Contributor

cartazio commented Feb 28, 2020

@lehins would you prefer that it errors instead when the list doesn't match the provided size (with one of those amortized doubling schemes plus a force/deep copy at the end)?

@lehins
Contributor Author

lehins commented Feb 28, 2020

Dangerous was a very good word here. An application normally should not recover from AsyncExceptions, so this can be viewed as a security concern.

There is no need to throw any errors in any of the three functions, since they are well behaved. I think the easiest solution would be to set the size to Unknown, and that would handle the problem for all three of them.

@lehins
Contributor Author

lehins commented Feb 28, 2020

I mean replacing Max with an Unknown
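For context, vector tracks stream sizes with an internal size hint. A sketch of the type being discussed, assuming the pre-0.13 shape of Data.Vector.Fusion.Bundle.Size (the exact definition may differ between releases):

    -- Sketch of vector's internal size hint (Data.Vector.Fusion.Bundle.Size);
    -- constructor details may vary between vector versions.
    data Size
      = Exact Int  -- ^ the stream yields exactly this many elements
      | Max Int    -- ^ upper bound: a buffer of this size is allocated up front
      | Unknown    -- ^ no information: the buffer grows by amortized doubling

    -- The change proposed here: have unfoldrN n f report 'Unknown' (grow as
    -- needed) instead of 'Max n' (allocate n elements immediately).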

@cartazio
Contributor

cartazio commented Feb 28, 2020 via email

@cartazio
Contributor

cartazio commented Feb 28, 2020 via email

@lehins
Contributor Author

lehins commented Feb 28, 2020

> Out of curiosity, how'd you hit this sharp edge?

I looked at the code :)

@lehins
Contributor Author

lehins commented Feb 28, 2020

> That particular change sounds analogous to the fusible slice vs exceptions issue we discussed before.

Not exactly. slice can fail by definition. fromListN, on the other hand, should never fail.

@lehins
Contributor Author

lehins commented Feb 28, 2020

> that's certainly a denial of service (for at least the thread doing the calculation).

This is not entirely true, at least for a correct implementation of concurrency. The exception should be rethrown in the main thread:

module Main where

import Control.Concurrent
import Control.Concurrent.Async
import Control.Exception
import qualified Data.Vector.Primitive as V

main = do
  eRes <- try $ concurrently_
    (print =<< (V.unfoldrNM (maxBound `div` 8) (const $ pure Nothing) () :: IO (V.Vector Int)))
    (print "foo" >> threadDelay 1000000 >> print "bar")
  print (eRes :: Either AsyncException ())
$ ghc unfoldrn.hs -O2 && ./unfoldrn
[1 of 1] Compiling Main             ( unfoldrn.hs, unfoldrn.o )
Linking unfoldrn ...
"foo"
Left heap overflow

@cartazio
Contributor

cartazio commented Feb 28, 2020 via email

@lehins
Contributor Author

lehins commented Feb 28, 2020

I don't think I follow what you are suggesting.

All three functions take the size argument as an upper bound; it never has to be exact. That's the whole point of the unfoldrN[M] functions.

As far as fromListN is concerned, the documentation makes it clear that the size is also assumed to be an upper bound:

vector/Data/Vector.hs

Lines 1709 to 1716 in eeb42ad

-- | /O(n)/ Convert the first @n@ elements of a list to a vector
--
-- @
-- fromListN n xs = 'fromList' ('take' n xs)
-- @
fromListN :: Int -> [a] -> Vector a
{-# INLINE fromListN #-}
fromListN = G.fromListN

> Because if you look at some of the patches motivated by folks using vector with compact heap code, fromListN is used with the Exact semantics in a few operations. I think traverse for boxed vectors is one example.

The implementation of Traversable can continue using the dangerous version, because there we know the source vector has the correct size.

@cartazio
Contributor

cartazio commented Feb 28, 2020 via email

@lehins
Contributor Author

lehins commented Feb 28, 2020

Note that the current semantics of fromListN in vector do not follow other libraries, so it could just as well be a bug, but I personally don't really care which way we go. Especially since the IsList documentation states:

> If the given hint does not equal to the input list's length the behaviour of fromListN is not specified.

Other data structures that have an IsList instance (e.g. List, NonEmpty, Set) take the size argument as a hint, and it has no effect on the resulting data structure. But since vector has used it as an upper bound for the longest time, it might not make sense to switch the semantics now.

@cartazio
Contributor

cartazio commented Feb 28, 2020 via email

@lehins
Contributor Author

lehins commented Feb 28, 2020

Documenting it is certainly a good idea. But you can't really grep for it :) Any place where OverloadedLists is used, there is a chance that the function is called. But I don't think it is really important.

@cartazio
Contributor

cartazio commented Feb 28, 2020 via email

@cartazio
Contributor

https://gist.github.com/cartazio/517752ce92b3859a5b86fc396b404f6b (I did a rg fromListN -t haskell over all of current Hackage),
using an old run of cabal list --simple | awk '{ print $1 }' | uniq | xargs -P12 -n1 cabal get from Feb 2019
(which gives us all the extant packages that use fromListN).

I'll look through it and a more recent run I'm preparing,

and then combine that with the positions and perspectives articulated on the libraries list to make a determination about this :)

@Shimuuar Shimuuar added this to the 0.13 milestone Jun 11, 2020
@lehins lehins changed the title unfoldrN, unfoldrNM and fromListN are *unreasonable* with respect to their size parameters unfoldrN, unfoldrNM and fromListN are dangerous Jan 16, 2021
@Shimuuar Shimuuar mentioned this issue Jan 17, 2021
@Shimuuar
Contributor

I think that changing the size hint from Max to Unknown for unfoldrN/unfoldrNM is the right thing to do. In addition to the possible heap overflow, it's a poor strategy in cases where the upper limit is used as some sort of safeguard and the vector is usually much smaller. It just wastes memory.

On the other hand, such a change makes no sense for fromListN. It's mostly an optimization for cases when the vector size is known in advance: it allows avoiding reallocation of buffers when growing the vector. Yes, it allows inducing a heap overflow if the size is in the hands of an attacker, but without such preallocation we could just drop fromListN or implement it as fromList . take n. I think it's better to leave the function as it is and just update the documentation explaining the possible dangers.
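The fromList . take n alternative mentioned above can be sketched as follows; safeFromListN is a hypothetical name, not part of vector's API:

    import qualified Data.Vector.Primitive as V

    -- Hedged sketch: honour the documented equation
    -- 'fromListN n xs = fromList (take n xs)' without preallocating n slots,
    -- at the cost of fromList's amortized-doubling buffer growth.
    safeFromListN :: V.Prim a => Int -> [a] -> V.Vector a
    safeFromListN n = V.fromList . take n

    main :: IO ()
    main =
      -- A huge size hint no longer forces a huge allocation:
      print (safeFromListN (maxBound `div` 8) [1, 2, 3, 4, 5 :: Int])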

Shimuuar added a commit to Shimuuar/vector that referenced this issue May 1, 2021
Before, these functions allocated an array of the maximum size immediately. This is a problem when an attacker controls the size parameter and could lead to DoS. It's also a problem when the size parameter is just a limit and the vector is usually shorter. The usual doubling strategy looks more conservative in these functions.

Fixes haskell#301
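The behaviour this commit describes can be approximated from the outside like so; safeUnfoldrN is a hypothetical name, and vector's real fix works on the stream's internal size hint rather than going through a list:

    import Data.List (unfoldr)
    import qualified Data.Vector.Primitive as V

    -- Hedged sketch of unfoldrN with grow-as-needed semantics: the result
    -- is built with fromList's amortized-doubling strategy instead of a
    -- single up-front allocation of n slots.
    safeUnfoldrN :: V.Prim a => Int -> (b -> Maybe (a, b)) -> b -> V.Vector a
    safeUnfoldrN n f = V.fromList . take n . unfoldr f

    main :: IO ()
    main =
      -- The pathological case from this issue now uses O(1) memory:
      print (safeUnfoldrN (maxBound `div` 8) (const Nothing) () :: V.Vector Int)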
@gksato
Contributor

gksato commented May 21, 2021

This may be off-topic, but looking through this conversation, I wish the definition of Size were

data Size = Exact Int
          | DoublingMax { preAlloc :: Int, max :: Int }
          | DoublingUnknown { preAlloc :: Int }

Who would imagine that prefixing take n to a vector would enlarge the possibility of DoS just because n came from the outside world? Who would imagine that the stricter constraint on the size would produce more DoS-prone code?

@gksato
Contributor

gksato commented May 21, 2021

This style of definition of Size closely mirrors Rust's Iterator: in that language we have

trait Iterator {

    ...

    fn size_hint(&self) -> (usize, Option<usize>)
}
unsafe trait TrustedLen: Iterator {}

@Shimuuar
Contributor

> This may be off-topic, but looking through this conversation, I wish the definition of Size were

I think it's a good idea. It allows specifying the allocation strategy more precisely.

@gksato
Contributor

gksato commented May 21, 2021

> specify allocation strategy more precisely

Yes. Since preAlloc only affects time and memory consumption, not memory safety or the visible result, we could even provide a combinator that manipulates preAlloc in the Vector.x (x = Generic, Unboxed, etc.) modules. We could just default preAlloc to the minimum possible size.
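A rough sketch of that combinator idea, assuming the DoublingMax/DoublingUnknown shape of Size proposed earlier in the thread; neither this Size nor withPreAlloc exists in vector, and the max field is renamed maxSize here to avoid shadowing Prelude.max:

    -- Hypothetical size type with a tunable initial allocation.
    data Size
      = Exact Int
      | DoublingMax     { preAlloc :: Int, maxSize :: Int }
      | DoublingUnknown { preAlloc :: Int }

    -- Raise the initial allocation; clamp to the known upper bound, if any.
    -- Exact sizes are left alone since their allocation is already fixed.
    withPreAlloc :: Int -> Size -> Size
    withPreAlloc _ s@(Exact _)         = s
    withPreAlloc p (DoublingMax q m)   = DoublingMax (min m (max p q)) m
    withPreAlloc p (DoublingUnknown q) = DoublingUnknown (max p q)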

Shimuuar added a commit to Shimuuar/vector that referenced this issue Sep 14, 2021
It turns out we don't exercise munstream in the test suite at all. (An easy check is to replace the definition with undefined and run the tests.) This is to check equivalence of all variants.

This is necessary for any changes to the unstream machinery, such as the ones discussed in haskell#301, haskell#388, haskell#406.
Shimuuar added a commit to Shimuuar/vector that referenced this issue Sep 26, 2021
It turns out we don't exercise munstream in the test suite at all. (An easy check is to replace the definition with undefined and run the tests.) This is to check equivalence of all variants.

This is necessary for any changes to the unstream machinery, such as the ones discussed in haskell#301, haskell#388, haskell#406.
Shimuuar added a commit to Shimuuar/vector that referenced this issue May 23, 2022
Before, these functions allocated an array of the maximum size immediately. This is a problem when an attacker controls the size parameter and could lead to DoS. It's also a problem when the size parameter is just a limit and the vector is usually shorter. The usual doubling strategy looks more conservative in these functions.

Fixes haskell#301
Shimuuar added a commit to Shimuuar/vector that referenced this issue Oct 31, 2024
This is a much more precise encoding with both lower and upper bounds. It implements the idea discussed in haskell#388 and, for example, avoids the problems from haskell#301.

However, benchmark results are mixed at best: changes range from 0.75x to 17x. Investigation of the tridiag benchmark (it's not the worst, but one of the simplest) showed that the main loop retained Bundles, allocated closures in the inner loop, and so was quite slow.

It seems that generation of tight loops from vector functions is rather fragile and, what's worse, we have no way to know whether this problem exists for code in the wild and no way to measure it.