-
Notifications
You must be signed in to change notification settings - Fork 17.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
runtime: pageAlloc.searchAddr may point to unmapped memory in discontiguous heaps, violating its invariant #40191
Comments
@gopherbot Please open a backport issue to Go 1.14. |
Backport issue(s) opened: #40192 (for 1.14). Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://golang.org/wiki/MinorReleases. |
Why then is there a free page indicated in unmapped memory? How is page i not mapped? |
@randall77 The higher-level summaries describe large portions of the address space, some of which may not be mapped. IIRC each level-0 summary on linux/amd64 represents 128 GiB of address space. If your heap is small (like one arena in size) then one level-0 summary will indicate that pages are free in that 128 GiB region are free, but not all of that 128 GiB need be mapped. You're right that unmapped pages necessarily have Also, d'oh, I forgot to CC you. Sorry about that. |
As an addendum to option two: |
Change https://golang.org/cl/242680 mentions this issue: |
Change https://golang.org/cl/242678 mentions this issue: |
Change https://golang.org/cl/242677 mentions this issue: |
Change https://golang.org/cl/242679 mentions this issue: |
Change https://golang.org/cl/246197 mentions this issue: |
…geAlloc.find Currently pageAlloc.find attempts to find a better estimate for the first free page in the heap, even if the space its looking for isn't necessarily going to be the first free page in the heap (e.g. if npages >= 2). However, in doing so it has the potential to return a searchAddr candidate that doesn't actually correspond to mapped memory, but this candidate might still be adopted. As a result, pageAlloc.alloc's fast path may look at unmapped summary memory and segfault. This case is rare on most operating systems since the heap is kept fairly contiguous, so the chance that the candidate searchAddr discovered is unmapped is fairly low. Even so, this is totally possible and outside the user's control when it happens (in fact, it's likely to happen consistently for a given user on a given system). Fix this problem by ensuring that our candidate always points to mapped memory. We do this by looking at mheap's arenas structure first. If it turns out our candidate doesn't correspond to mapped memory, then we look at inUse to round up the searchAddr to the next mapped address. While we're here, clean up some documentation related to searchAddr. For #40191. Fixes #40192. Change-Id: I759efec78987e4a8fde466ae45aabbaa3d9d4214 Reviewed-on: https://go-review.googlesource.com/c/go/+/242680 Run-TryBot: Michael Knyszek <[email protected]> Reviewed-by: Austin Clements <[email protected]> Reviewed-by: Michael Pratt <[email protected]> TryBot-Result: Gobot Gobot <[email protected]> (cherry picked from commit b56791c) Reviewed-on: https://go-review.googlesource.com/c/go/+/246197 Run-TryBot: Dmitri Shuralyov <[email protected]>
Currently the AddrRange used for testing is defined separately from addrRange in the runtime, making it difficult to test it as well as addrRanges. Redefine AddrRange in terms of addrRange instead. For #40191. Change-Id: I3aa5b8df3e4c9a3c494b46ab802dd574b2488141 Reviewed-on: https://go-review.googlesource.com/c/go/+/242677 Trust: Michael Knyszek <[email protected]> Run-TryBot: Michael Knyszek <[email protected]> TryBot-Result: Go Bot <[email protected]> Reviewed-by: Michael Pratt <[email protected]> Reviewed-by: Austin Clements <[email protected]>
This change adds a test suite for addrRanges.findSucc so we can change the implementation more safely. For #40191. Change-Id: I14a834b6d54836cbc676eb0edb292ba6176705cc Reviewed-on: https://go-review.googlesource.com/c/go/+/242678 Trust: Michael Knyszek <[email protected]> Run-TryBot: Michael Knyszek <[email protected]> TryBot-Result: Go Bot <[email protected]> Reviewed-by: Austin Clements <[email protected]> Reviewed-by: Michael Pratt <[email protected]>
This change modifies addrRanges.findSucc to more efficiently find the successor range in an addrRanges by using a binary search to narrow down large addrRanges and iterate over no more than 8 addrRanges. This change makes the runtime more robust against systems that may aggressively randomize the address space mappings it gives the runtime (e.g. Fuchsia). For #40191. Change-Id: If529df2abd2edb1b1496d8690ddd284ecd7138c2 Reviewed-on: https://go-review.googlesource.com/c/go/+/242679 Trust: Michael Knyszek <[email protected]> Reviewed-by: Austin Clements <[email protected]> Reviewed-by: Michael Pratt <[email protected]>
It's been brought to my attention by the Fuchsia team that in cases where the Go heap is very discontiguous, the documented invariant for
(*pageAlloc).searchAddr
may be violated, leading to errant segfaults.Consider the following case.
Suppose
searchAddr
points into the region described by summaryi-1
, and on the last page allocation, it was exhausted. An allocation then comes in looking for, say, 12 consecutive pages.(*pageAlloc).find
will look at summaryi
, and skip over it, but will set the "free window" to the first page of the region described by summaryi
, which need not be mapped.In a contiguous heap we're unlikely to run into this case because the first page of
i
is likely to be mapped, so the summary would be there.Most operating systems, even when not following the address hints provided to
sysReserve
in the runtime, tend to keep the heap contiguous. Fuchsia specifically goes out of its way to map memory discontinuously unless a flag (ZX_VM_COMPACT
) is passed to the equivalent function.This is a real problem, even for existing systems. Note that the above scenario need not happen for summary level
0
, it can happen at any level (say there's a small hole in the address space due to a region being reserved by the OS or something and we allocate virtual address space around it).Fixing this problem is not straight-forward without some kind of performance consequences (though it's unclear to me how big those consequences will be).
Fundamentally, this issue arises because we try to update
searchAddr
"on the way" to allocating memory which may not be the first free page. One simplifying solution is to simply not do that. It means that there will need to be more iteration when allocating, and we're less likely to take the fast path, but it will be correct.Another option is to actually find the first piece of mapped memory in that region, which would be possible by iterating over
(*pageAlloc).inUse
(inUse
will be large, but the search could be turned into a binary search as per the comment inaddrRanges.findSucc
). We could also find it by walking down the summaries, but this is unnecessary and is basically like doing thefind
operation twice, which is a bit wasteful.A third option is to remove the invariant on
searchAddr
altogether and just be a lot more strict in all the places where we try to usesearchAddr
to access summaries directly (mostly on fast paths in the page allocator and scavenger). The trouble with this is that the only efficient way to verify thatsearchAddr
is valid (short of walking the tree, which we're hoping to avoid in these cases!) is to checkinUse
, but that's not exactly efficient either. In this case, it would likely be better to just always movesearchAddr
to the right location on the slow path by checkinginUse
.My vote is for option 2. It's not terribly complicated, and the change is local to just the
find
operation. Option 1 might be preferred anyway just because it simplifiesfind
a bunch, though.This issue should not be considered a blocking issue for Go 1.15 because the bug was technically introduced in Go 1.14, but it should be fixed and probably backported too, since it can cause failures with no workaround.
CC @aclements @prattmic
The text was updated successfully, but these errors were encountered: