
unfoldrN, unfoldrNM and fromListN are dangerous #301

Closed
lehins opened this issue Feb 28, 2020 · 23 comments · Fixed by #387

@lehins
Contributor

lehins commented Feb 28, 2020

The size argument supplied to all three functions unfoldrN, unfoldrNM and fromListN serves as a hint for the upper bound on the length of the vector. The problem is that, regardless of how big the vector actually turns out to be, memory for the supplied upper bound is allocated immediately, even if it is far too big, which can result in the application being killed with an asynchronous HeapOverflow exception.

fromListN.hs:

import qualified Data.Vector.Primitive as V

main = do
  let xs = [1, 2, 3, 4, 5] :: [Int]
  print $ V.fromListN (maxBound `div` 8) xs
$ ghc fromListN.hs && ./fromListN
[1 of 1] Compiling Main             ( fromListN.hs, fromListN.o )
fromListN: Out of memory

unfoldrN.hs

import qualified Data.Vector.Primitive as V

main = print (V.unfoldrN (maxBound `div` 8) (const Nothing) () :: V.Vector Int)
$ ghc unfoldrN.hs && ./unfoldrN
[1 of 1] Compiling Main             ( unfoldrN.hs, unfoldrN.o )
Linking unfoldrN ...
unfoldrN: Out of memory

unfoldrNM.hs

import Control.Exception
import qualified Data.Vector.Primitive as V

main = do
  eRes <- try (V.unfoldrNM (maxBound `div` 8) (const $ pure Nothing) () :: IO (V.Vector Int))
  print (eRes :: Either SomeException (V.Vector Int))
$ stack exec -- ghc unfoldrNM.hs -O2 && ./unfoldrNM
[1 of 1] Compiling Main             ( unfoldrNM.hs, unfoldrNM.o )
Linking unfoldrNM ...
Left heap overflow
@cartazio cartazio changed the title unfoldrN, unfoldrNM and fromListN are dangerous unfoldrN, unfoldrNM and fromListN are *unreasonable* with respect to their size parameters Feb 28, 2020
@cartazio
Contributor

Dangerous might be the wrong word here, but I agree with the intent (and please forgive my edit).

Let's think about when/how we might want it to fail/behave instead! If end users of an application can tickle this, that's certainly a denial of service (for at least the thread doing the calculation).

I actually think it's probably very reasonable for APIs to throw an exception when too much memory is requested (at least if that's not exposed in the normal API results).

But you do raise a really important point: the size here should perhaps be treated as a combination of both a hint and a lint!

I don't think there's a trivial answer here as such, since you CAN (with extreme care) have off-heap memory-mapped arrays that take up terabytes of virtual/physical memory. I think physical memory limits on most platforms are ~48 bits of addressable bytes, but I believe that's orthogonal to CPU-architecture-related virtual memory limits?

@cartazio
Contributor

To be clear: I don't view this as a security problem as such; at that rate, Eq and Ord on vectors are security problems :)

@cartazio
Contributor

cartazio commented Feb 28, 2020

@lehins would you prefer that it errors instead when the list doesn't match the provided size (with one of those amortized doubling schemes plus a force/deep copy at the end)?

@lehins
Contributor Author

lehins commented Feb 28, 2020

Dangerous was a very good word here. An application normally should not recover from AsyncExceptions, so this can be viewed as a security concern.

There is no need to throw any errors in any of the three functions, since they are well behaved. I think the easiest solution would be to set the size to Unknown, and that would handle the problem for all three of them.

@lehins
Contributor Author

lehins commented Feb 28, 2020

I mean replacing Max with an Unknown
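For context, vector tracks stream sizes with an internal size hint. A sketch of the type being discussed, assuming the pre-0.13 shape of Data.Vector.Fusion.Bundle.Size (the exact definition may differ between releases):

    -- Sketch of vector's internal size hint (Data.Vector.Fusion.Bundle.Size);
    -- constructor details may vary between vector versions.
    data Size
      = Exact Int  -- ^ the stream yields exactly this many elements
      | Max Int    -- ^ upper bound: a buffer of this size is allocated up front
      | Unknown    -- ^ no information: the buffer grows by amortized doubling

    -- The change proposed here: have unfoldrN n f report 'Unknown' (grow as
    -- needed) instead of 'Max n' (allocate n elements immediately).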

@cartazio
Contributor

cartazio commented Feb 28, 2020 via email

@cartazio
Contributor

cartazio commented Feb 28, 2020 via email

@lehins
Contributor Author

lehins commented Feb 28, 2020

> Out of curiosity, how'd you hit this sharp edge?

I looked at the code :)

@lehins
Contributor Author

lehins commented Feb 28, 2020

> That particular change sounds analogous to the fusible slice vs exceptions issue we discussed before.

Not exactly. slice can fail by definition. fromListN, on the other hand, should never fail.

@lehins
Contributor Author

lehins commented Feb 28, 2020

> that's certainly a denial of service (for at least the thread doing the calculation).

This is not entirely true, at least for a correct implementation of concurrency. The exception should be rethrown in the main thread:

module Main where

import Control.Concurrent
import Control.Concurrent.Async
import Control.Exception
import qualified Data.Vector.Primitive as V

main = do
  eRes <- try $ concurrently_
    (print =<< (V.unfoldrNM (maxBound `div` 8) (const $ pure Nothing) () :: IO (V.Vector Int)))
    (print "foo" >> threadDelay 1000000 >> print "bar")
  print (eRes :: Either AsyncException ())
$ ghc unfoldrn.hs -O2 && ./unfoldrn
[1 of 1] Compiling Main             ( unfoldrn.hs, unfoldrn.o )
Linking unfoldrn ...
"foo"
Left heap overflow

@cartazio
Contributor

cartazio commented Feb 28, 2020 via email

@lehins
Contributor Author

lehins commented Feb 28, 2020

I don't think I follow what you are suggesting.

All three functions take the size argument as an upper bound; it never has to be exact. That's the whole point of the unfoldrN[M] functions.

As far as fromListN is concerned, the documentation makes it clear that the size is also assumed to be an upper bound:

vector/Data/Vector.hs

Lines 1709 to 1716 in eeb42ad

-- | /O(n)/ Convert the first @n@ elements of a list to a vector
--
-- @
-- fromListN n xs = 'fromList' ('take' n xs)
-- @
fromListN :: Int -> [a] -> Vector a
{-# INLINE fromListN #-}
fromListN = G.fromListN

> Because if you look at some of the patches motivated by folks using vector with compact heap code, fromListN is used with the Exact semantics in a few operations. I think traverse for boxed vectors is one example.

The implementation of Traversable can continue using the dangerous version, because there we know the source vector has the correct size.

@cartazio
Contributor

cartazio commented Feb 28, 2020 via email

@lehins
Contributor Author

lehins commented Feb 28, 2020

Note that the current semantics of fromListN in vector do not follow other libraries, so it could just as well be a bug, but I personally don't really care which way we go. Especially since the IsList documentation states:

> If the given hint does not equal to the input list's length the behaviour of fromListN is not specified.

Other data structures that have an IsList instance (e.g. List, NonEmpty, Set) take the size argument as a hint, and it has no effect on the resulting data structure. But since vector has used it as an upper bound for the longest time, it might not make sense to switch the semantics now.

@cartazio
Contributor

cartazio commented Feb 28, 2020 via email

@lehins
Contributor Author

lehins commented Feb 28, 2020

Documenting it is certainly a good idea. But you can't really grep for it :) Any place where OverloadedLists is used, there is a chance that the function is called. But I don't think it is really important.

@cartazio
Contributor

cartazio commented Feb 28, 2020 via email

@cartazio
Contributor

https://gist.github.com/cartazio/517752ce92b3859a5b86fc396b404f6b (I did a rg fromListN -t haskell over all of current Hackage),
using an old run of cabal list --simple | awk '{ print $1 }' | uniq | xargs -P12 -n1 cabal get from Feb 2019
(which gives us all the extant packages that use fromListN).

I'll look through it and a more recent run I'm preparing,

and then combine that with the positions and perspectives articulated on the libraries list to make a determination about this :)

@Shimuuar Shimuuar added this to the 0.13 milestone Jun 11, 2020
@lehins lehins changed the title unfoldrN, unfoldrNM and fromListN are *unreasonable* with respect to their size parameters unfoldrN, unfoldrNM and fromListN are dangerous Jan 16, 2021
@Shimuuar Shimuuar mentioned this issue Jan 17, 2021
@Shimuuar
Contributor

I think that changing the size hint from Max to Unknown for unfoldrN/unfoldrNM is the right thing to do. In addition to the possible heap overflow, it's a poor strategy in cases where the upper limit is used as some sort of safeguard and the vector is usually much smaller. It just wastes memory.

On the other hand, such a change makes no sense for fromListN. It's mostly an optimization for cases when the vector size is known in advance: it allows avoiding reallocation of buffers when growing the vector. Yes, it allows inducing a heap overflow if the size is in the hands of an attacker, but without such preallocation we could just drop fromListN or implement it as fromList . take n. I think it's better to leave the function as it is and just update the documentation explaining the possible dangers.
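The fromList . take n alternative mentioned above can be sketched as follows; safeFromListN is a hypothetical name, not part of vector's API:

    import qualified Data.Vector.Primitive as V

    -- Hedged sketch: honour the documented equation
    -- 'fromListN n xs = fromList (take n xs)' without preallocating n slots,
    -- at the cost of fromList's amortized-doubling buffer growth.
    safeFromListN :: V.Prim a => Int -> [a] -> V.Vector a
    safeFromListN n = V.fromList . take n

    main :: IO ()
    main =
      -- A huge size hint no longer forces a huge allocation:
      print (safeFromListN (maxBound `div` 8) [1, 2, 3, 4, 5 :: Int])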

Shimuuar added a commit to Shimuuar/vector that referenced this issue May 1, 2021
Before, these functions allocated an array of the maximum size immediately. This is a problem when an attacker controls the size parameter and could lead to DoS. It's also a problem when the size parameter is just a limit and the vector is usually shorter. The usual doubling strategy looks more conservative in these functions.

Fixes haskell#301
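The behaviour this commit describes can be approximated from the outside like so; safeUnfoldrN is a hypothetical name, and vector's real fix works on the stream's internal size hint rather than going through a list:

    import Data.List (unfoldr)
    import qualified Data.Vector.Primitive as V

    -- Hedged sketch of unfoldrN with grow-as-needed semantics: the result
    -- is built with fromList's amortized-doubling strategy instead of a
    -- single up-front allocation of n slots.
    safeUnfoldrN :: V.Prim a => Int -> (b -> Maybe (a, b)) -> b -> V.Vector a
    safeUnfoldrN n f = V.fromList . take n . unfoldr f

    main :: IO ()
    main =
      -- The pathological case from this issue now uses O(1) memory:
      print (safeUnfoldrN (maxBound `div` 8) (const Nothing) () :: V.Vector Int)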
@gksato
Contributor

gksato commented May 21, 2021

This may be off-topic, but looking through this conversation, I wish the definition of Size were

data Size = Exact Int
          | DoublingMax { preAlloc :: Int, max :: Int }
          | DoublingUnknown { preAlloc :: Int }

Who would imagine that prefixing take n to a vector would enlarge the possibility of DoS just because n came from the outside world? Who would imagine that the stricter constraint on the size would produce more DoS-prone code?

@gksato
Contributor

gksato commented May 21, 2021

This style of definition of Size closely mirrors Rust's Iterator: in that language we have

trait Iterator {

    ...

    fn size_hint(&self) -> (usize, Option<usize>)
}
unsafe trait TrustedLen: Iterator {}

@Shimuuar
Contributor

> This may be off-topic, but looking through this conversation, I wish the definition of Size were

I think it's a good idea. It allows specifying the allocation strategy more precisely.

@gksato
Contributor

gksato commented May 21, 2021

> specify allocation strategy more precisely

Yes. Since preAlloc only affects time and memory consumption, not memory safety or the visible result, we could even provide a combinator that manipulates preAlloc in the Vector.x (x = Generic, Unboxed, etc.) modules. We could just default preAlloc to the minimum possible size.
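A rough sketch of that combinator idea, assuming the DoublingMax/DoublingUnknown shape of Size proposed earlier in the thread; neither this Size nor withPreAlloc exists in vector, and the max field is renamed maxSize here to avoid shadowing Prelude.max:

    -- Hypothetical size type with a tunable initial allocation.
    data Size
      = Exact Int
      | DoublingMax     { preAlloc :: Int, maxSize :: Int }
      | DoublingUnknown { preAlloc :: Int }

    -- Raise the initial allocation; clamp to the known upper bound, if any.
    -- Exact sizes are left alone since their allocation is already fixed.
    withPreAlloc :: Int -> Size -> Size
    withPreAlloc _ s@(Exact _)         = s
    withPreAlloc p (DoublingMax q m)   = DoublingMax (min m (max p q)) m
    withPreAlloc p (DoublingUnknown q) = DoublingUnknown (max p q)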

Shimuuar added a commit to Shimuuar/vector that referenced this issue Sep 14, 2021
It turns out we don't exercise munstream in the test suite at all. (An easy check is to replace the definition with undefined and run the tests.) This is to check equivalence of all variants.

This is necessary for any changes to the unstream machinery, such as the ones discussed in haskell#301, haskell#388, haskell#406.
Shimuuar added a commit to Shimuuar/vector that referenced this issue Sep 26, 2021
It turns out we don't exercise munstream in the test suite at all. (An easy check is to replace the definition with undefined and run the tests.) This is to check equivalence of all variants.

This is necessary for any changes to the unstream machinery, such as the ones discussed in haskell#301, haskell#388, haskell#406.
Shimuuar added a commit to Shimuuar/vector that referenced this issue May 23, 2022
Before, these functions allocated an array of the maximum size immediately. This is a problem when an attacker controls the size parameter and could lead to DoS. It's also a problem when the size parameter is just a limit and the vector is usually shorter. The usual doubling strategy looks more conservative in these functions.

Fixes haskell#301
Shimuuar added a commit to Shimuuar/vector that referenced this issue Oct 31, 2024
This is a much more precise encoding with both lower and upper bounds. It implements the idea discussed in haskell#388 and, for example, avoids the problems from haskell#301.

However, benchmark results are mixed at best: changes range from 0.75x to 17x. Investigation of the tridiag benchmark (it's not the worst, but one of the simplest) showed that the main loop retained Bundles, allocated closures in the inner loop, and so was quite slow.

It seems that generation of tight loops from vector functions is rather fragile and, what's worse, we have no way to know whether this problem exists for code in the wild and no way to measure it.