-
Notifications
You must be signed in to change notification settings - Fork 139
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
unfoldrN, unfoldrNM and fromListN are dangerous #301
Comments
dangerous might be the wrong word here, but I agree with the intent (and please forgive my edit). lets think about when/how we might want it to fail/behave instead! i actually think its probably very reasonable for apis to throw an exception when too much memory is requested. (at least if thats not exposed in the normal api results). But you do raise a really important point, size here should perhaps be treated as a combo of both a hint and lint! I dont think theres a trivial answer here as such. since you CAN (with extreme care) have off heap memory mapped arrays that take up terabytes of virtual/physical memory. I think physical memory limits on most platforms are ~48 bits of addressable bytes, but I believe thats orthogonal to cpu architecture related virtual memory limits? |
to be clear: i dont view this as a security problem as such, at that rate, Eq and Ord on vectors are security problems :) |
@lehins would you prefer that it errors instead because the list doesn't match the provided size (with one of those amortized doubling schemes + a force/deep copy at the end?) |
There is no need to throw any errors all three functions, since they are well behaved. I think the easiest solution would be to set the size to |
I mean replacing
|
Ok. I can see the unhandled async rts exception perspective.
That particular change sounds analogous to the fusible slice vs exceptions
issue we discussed before.
Maybe itd be helpful if we do a little markdown table of how we currently
have these operations behave in strange Inputs currently vs their failure
modes under various possible changes that would change either the behavior
or failure mode.
I’m happy to type that table out later tomorrow for grounding the
discussion.
Another option would be to add something like deferred exact to the size
sum type perhaps ?
…On Thu, Feb 27, 2020 at 9:23 PM Alexey Kuleshevich ***@***.***> wrote:
Dangerous was a very good word here. An application normally should not
recover from AsyncExceptions, so it can be viewed as security concern.
There is no need to throw any errors all three functions, since they are
well behaved. I think the easiest solution would be to set the size to
Unknown and that would handle the problem for all three them.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#301?email_source=notifications&email_token=AAABBQTA5KCDU2EKWHXSJATRFBYQ7A5CNFSM4K5HIJ4KYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOENGXJBQ#issuecomment-592278662>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAABBQR376ZQCVA7G77E4M3RFBYQ7ANCNFSM4K5HIJ4A>
.
|
Oh ok. Hrmm. So currently it’s not the Exact flavor.
Out of curiosity how’d you hit this sharp edge ?
On Thu, Feb 27, 2020 at 9:30 PM Carter Schonwald <[email protected]>
wrote:
… Ok. I can see the unhandled async rts exception perspective.
That particular change sounds analogous to the fusible slice vs exceptions
issue we discussed before.
Maybe itd be helpful if we do a little markdown table of how we currently
have these operations behave in strange Inputs currently vs their failure
modes under various possible changes that would change either the behavior
or failure mode.
I’m happy to type that table out later tomorrow for grounding the
discussion.
Another option would be to add something like deferred exact to the size
sum type perhaps ?
On Thu, Feb 27, 2020 at 9:23 PM Alexey Kuleshevich <
***@***.***> wrote:
> Dangerous was a very good word here. An application normally should not
> recover from AsyncExceptions, so it can be viewed as security concern.
>
> There is no need to throw any errors all three functions, since they are
> well behaved. I think the easiest solution would be to set the size to
> Unknown and that would handle the problem for all three them.
>
> —
> You are receiving this because you commented.
> Reply to this email directly, view it on GitHub
> <#301?email_source=notifications&email_token=AAABBQTA5KCDU2EKWHXSJATRFBYQ7A5CNFSM4K5HIJ4KYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOENGXJBQ#issuecomment-592278662>,
> or unsubscribe
> <https://github.com/notifications/unsubscribe-auth/AAABBQR376ZQCVA7G77E4M3RFBYQ7ANCNFSM4K5HIJ4A>
> .
>
|
I looked at the code :)
|
Not exactly.
|
This is not entirely true, at least for a correct implementation of concurrency. Exception should be rethrown in the main thread:
module Main where
import Control.Concurrent
import Control.Concurrent.Async
import Control.Exception
import qualified Data.Vector.Primitive as V
main = do
eRes <- try $ concurrently_
(print =<< (V.unfoldrNM (maxBound `div` 8) (const $ pure Nothing) () :: IO (V.Vector Int)))
(print "foo" >> threadDelay 1000000 >> print "bar")
print (eRes :: Either AsyncException ()) $ ghc unfoldrn.hs -O2 && ./unfoldrn
[1 of 1] Compiling Main ( unfoldrn.hs, unfoldrn.o )
Linking unfoldrn ...
"foo"
Left heap overflow |
Good catch. And it’s def true we can catch async exceptions. In my mind
danger in Haskell means unsafe coerce or other fun segfaults. A prompt rts
managed exception is polite by comparison
I guess I’ve only seen uses of fooN where the size is intended to be exact.
So there’s probably a few cleanups we need to do before we make that
change. Or it needs its own major revision. Because if you look at some of
the patches motivated by folks using vector with compact heap code,
fromListN is used with the Exact semantic in a few operations. I think
traverse for boxed vector is one example.
a key question is whether most users of fromListN expected it to be exact
or just an upper bound.
I suspect that most assume exact.
…On Thu, Feb 27, 2020 at 9:35 PM Alexey Kuleshevich ***@***.***> wrote:
Not exactly. slice can fail by definition. fromListN on the other hand
should never fail
That particular change sounds analogous to the fusible slice vs exceptions
issue we discussed before.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#301?email_source=notifications&email_token=AAABBQSIY4NPPJ7FFSXGKZDRFB2ADA5CNFSM4K5HIJ4KYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOENGYAVA#issuecomment-592281684>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAABBQQAZ4EED2QISCFUPSDRFB2ADANCNFSM4K5HIJ4A>
.
|
I don't think I follow what you are suggesting. All three functions take the size argument as the upper bound, it never has to be exact, that's the whole point of the As far as Lines 1709 to 1716 in eeb42ad
Implementation of traversable can continue using the dangerous version, because we know the source vector has correct size
|
Ok. I think I follow.
I’ll collect my thoughts and followup in the morning!
…On Thu, Feb 27, 2020 at 9:59 PM Alexey Kuleshevich ***@***.***> wrote:
I don't think I follow what you are suggesting.
All three functions take the size argument as the upper bound, it never
has to be exact, that's the whole point of the unfoldrN[M] functions.
As far as fromListN is concerned semantics are clear from documentation
that it is also assumed to be upper bound:
https://github.com/haskell/vector/blob/eeb42ad42aa345ce192086baed80c805bcfc3e72/Data/Vector.hs#L1709-L1716
Implementation of traversable can continue using the *dangerous* version,
because we know the source vector has correct size
Because if you look at some of
the patches motivated by folks using vector with compact heap code,
fromListN is used with the Exact semantic in a few operations. I think
traverse for boxed vector is one example.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#301?email_source=notifications&email_token=AAABBQUISHQ546C7OXVDE3DRFB42HA5CNFSM4K5HIJ4KYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOENGZLLI#issuecomment-592287149>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAABBQVQFR6WF7UGXCIPZL3RFB42HANCNFSM4K5HIJ4A>
.
|
Note that current semantics in
Other data structures that have |
At the very least it’s worth documenting possible in retrospect design
choices and doing a Ye old hackage grep of how it’s used in public codes.
…On Thu, Feb 27, 2020 at 10:10 PM Alexey Kuleshevich < ***@***.***> wrote:
Note that current semantics in vector for fromListN do not follow other
libraries, so it could as well be a bug, but I personally don't really care
which way we go. Especially since IsList documentation states:
If the given hint does not equal to the input list's length the behaviour
of fromListN is not specified.
Other data structures that have IsList instance (eg, List, NonEmpty, Set)
take the size argument as a hint and it has no affect the resulting data
structure. But since vector used it as an upper bound for the longest
time, it might not make sense to switch the semantics now
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#301?email_source=notifications&email_token=AAABBQU7WNE22GHP2TVDHD3RFB6DTA5CNFSM4K5HIJ4KYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOENGZ7CA#issuecomment-592289672>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAABBQRCB2QTJGJQYWOHOZDRFB6DTANCNFSM4K5HIJ4A>
.
|
Documenting is certainly a good idea. But you can't really grep for it :) Any place where |
There’s some tools that make it pretty easy to have a checkout of all of
current hackage. Worth it for impact assessments for explicit calls to the
code.
Do agree won’t change code that calls this via desugaring
…On Thu, Feb 27, 2020 at 10:18 PM Alexey Kuleshevich < ***@***.***> wrote:
Documenting is certainly a good idea. But you can't really grep for it :)
Any place where OverloadedLists is being used there is a chance that
function is called. But I don't think it is really important
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#301?email_source=notifications&email_token=AAABBQXYRI2ISMSIF4R3ZRLRFB7ALA5CNFSM4K5HIJ4KYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOENG2ORA#issuecomment-592291652>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAABBQVKVKYK5VHVSDJ2EP3RFB7ALANCNFSM4K5HIJ4A>
.
|
https://gist.github.com/cartazio/517752ce92b3859a5b86fc396b404f6b (i did a i'll look though it and a more recent run i'm preparing. and then combine that with the positions and perspectives articulated on the library list to make a determinition about this :) |
I think that changing type hint from On other hand such change for |
Before functions allocated array of maximum size immediately. This is problem in case when attacker controls size parameter and could lead to DoS. It's also problem when size parameter is just a limit vector is usually shorter. Usual doubling strategy looks more conservative in this function. Fixes haskell#301
This may be off-topic, but looking through this conversation, I wish the definition of data Size = Exact Int
| DoublingMax { preAlloc :: Int, max :: Int }
| DoublingUnknown { preAlloc :: Int } Who would imagine that prefixing |
This style of definition of trait Iterator {
...
fn size_hint(&self) -> (usize, Option<usize>)
}
unsafe trait TrustedLen: Iterator {} |
I think it's good idea. It allows to specify allocation strategy more precisely |
Yes. Since |
It turns out we don't exercise munstream in the test suite at all. (Easy check is to replace definition with undefined and run test) This is to check equivalence of all variants. This is necessary for any changes to unstream machinery. Such as ones that discussed in haskell#301, haskell#388, haskell#406
It turns out we don't exercise munstream in the test suite at all. (Easy check is to replace definition with undefined and run test) This is to check equivalence of all variants. This is necessary for any changes to unstream machinery. Such as ones that discussed in haskell#301, haskell#388, haskell#406
Before functions allocated array of maximum size immediately. This is problem in case when attacker controls size parameter and could lead to DoS. It's also problem when size parameter is just a limit vector is usually shorter. Usual doubling strategy looks more conservative in this function. Fixes haskell#301
This is much more precise encoding with both lower and upper bound. It implements idea discussed in haskell#388 and for example avoids problems from haskell#301. However benchmarks result are at best mixed: benchmarks change range from 0.75 to 17. Investigation of tridiag benchmark (it's not worst but one of simplest) showed that main loop retained Bundles, allocated closures in inner loop and so were quite slow. It seems that generation of tight loops from vector functions is rather fragile and what worse we have no way to know whether this problem exists for code in the wild and have no way to measure this.
Size argument supplied to all three functions
unfoldrN
,unfoldrNM
andfromListN
serves as hint for upper bound for the length of a vector. The problem is that regardless of how big a vector would have been, the supplied upper bound memory is allocated, even if it too big, which can result in an application being killed with an asynchronous exceptionHeapOverflow
fromListN
.hs:$ ghc fromListN.hs && ./fromListN [1 of 1] Compiling Main ( fromListN.hs, fromListN.o ) fromListN: Out of memory
unfoldrN
.hs$ ghc unfoldrN.hs && ./unfoldrN [1 of 1] Compiling Main ( unfoldrN.hs, unfoldrN.o ) Linking unfoldrN ... unfoldrN: Out of memory
unfoldrNM
.hsThe text was updated successfully, but these errors were encountered: