Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd: cap cache size if exceeds reasonable range #16800

Merged
merged 1 commit into from
Jun 4, 2018

Conversation

rjl493456442
Copy link
Member

No description provided.

@rjl493456442 rjl493456442 changed the title WIP cmd: cap cache size if exceeds reasonable range cmd: cap cache size if exceeds reasonable range May 25, 2018
@rjl493456442 rjl493456442 requested review from karalabe and holiman and removed request for karalabe May 25, 2018 02:17
Copy link
Member

@karalabe karalabe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The general solution is ok-ish, but it gets needlessly complicated because you tried to limit the cache too deep inside the code, which causes all kinds of weird corner cases to appear.

A much cleaner solution is to check the cache allowance and limit it when entering main in Geth. That way you only need to update it once, and the code resides in an easy to understand location.

My conversion of your code:

diff --git a/cmd/geth/main.go b/cmd/geth/main.go
index 09d9c493d..57bbbe3b0 100644
--- a/cmd/geth/main.go
+++ b/cmd/geth/main.go
@@ -19,12 +19,16 @@ package main
 
 import (
        "fmt"
+       "math"
        "os"
        "runtime"
+       godebug "runtime/debug"
        "sort"
+       "strconv"
        "strings"
        "time"
 
+       "github.com/elastic/gosigar"
        "github.com/ethereum/go-ethereum/accounts"
        "github.com/ethereum/go-ethereum/accounts/keystore"
        "github.com/ethereum/go-ethereum/cmd/utils"
@@ -188,6 +192,22 @@ func init() {
                if err := debug.Setup(ctx); err != nil {
                        return err
                }
+               // Cap the cache allowance and tune the garbage colelctor
+               var mem gosigar.Mem
+               if err := mem.Get(); err == nil {
+                       allowance := int(mem.Total / 1024 / 1024 / 3)
+                       if cache := ctx.GlobalInt(utils.CacheFlag.Name); cache > allowance {
+                               log.Warn("Sanitizing cache to Go's GC limits", "provided", cache, "updated", allowance)
+                               ctx.GlobalSet(utils.CacheFlag.Name, strconv.Itoa(allowance))
+                       }
+               }
+               // Ensure Go's GC ignores the database cache for trigger percentage
+               cache := ctx.GlobalInt(utils.CacheFlag.Name)
+               gogc := math.Max(20, math.Min(100, 100/(float64(cache)/1024)))
+
+               log.Debug("Sanitizing Go's GC trigger", "percent", int(gogc))
+               godebug.SetGCPercent(int(gogc))
+
                // Start system runtime metrics collection
                go metrics.CollectProcessMetrics(3 * time.Second)

// CapCacheSize returns megabytes of memory allocated to internal caching.
func CapCacheSize(size int) int {
if availableMemory == 0 {
mem := gosigar.Mem{}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's usually cleaner to just declare the variable if it's the default value:

va mem gosigar.Mem

if availableMemory == 0 {
mem := gosigar.Mem{}
err := mem.Get()
if err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use err := mem.Get(); err != nil { for these short scenarios, much more readable and compact.

if err != nil {
return size
}
availableMemory = mem.ActualFree
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mem.ActualFree is a bit dangerous, because Geth's memory allowance will all of a sudden be based on how loaded the machine is. We definitely don't want to do that level of user control. Let's just stick to mem.Total. We can't protect the user from overloading their machine, we just want to protect from going over physical limits.

Another catch with ActualFree is that most OSes use RAM as file system cache. In theory ActualFree also includes these vs. Free. In practice, some filesystems (ZFS) will use RAM themselves for caching, and not rely on the OS cache. As such, ActualFree will almost always report 0 on a ZFS based system.

cacheCapOnce sync.Once // Make sure available memory detection, GOGC setting and warning log printing
// will be executed only once.
availableMemory uint64 // Current system's available memory.
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't use global variables for anything. They can be racey (e.g. you didn't sync access to availableMemory), and there's really no need for it. Requesting the available memory once vs. 3 times is irrelevant compared to what load we're doing here. Cleaner code is preferred over micro optimizations.

return size
}
availableMemory = mem.ActualFree
log.Debug("Available system memory", "size(MB)", availableMemory/1024/1024)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use common.StorageSize(availableMemory), which will correctly format a user friendly string, instead of you having to divide. Also you don't need the (MB), since that's also appended to the size when printed.

ret = allowance
}

cacheCapOnce.Do(func() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This global once kind of highlights that capping the cache allowance is done at the wrong level. If you did it in an outer scope, you wouldn't need globals, neither complex constructs to avoid recalculating multiple times. The correct solution is to limit caches and GC when entering the program. That ensures every code path updates correctly.

percent = 100
} else if percent < 20 {
percent = 20
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could use Math.Min and Math.Max here. Much cleaner and shorter :P

@rjl493456442 rjl493456442 force-pushed the memory_allowance_warining branch from 5fe430d to 59f8d2d Compare May 25, 2018 07:49
@rjl493456442
Copy link
Member Author

@karalabe Just copy your implementation and update

@karalabe
Copy link
Member

We'll probably need to run a fast sync and a full sync benchmark too with this. Seems good to me, but it's not something we want to merge last minute into a release. We can get it in after the next release and have a week or so to test it.

@karalabe karalabe added this to the 1.8.10 milestone May 25, 2018
@karalabe karalabe force-pushed the memory_allowance_warining branch from 59f8d2d to c4f81e5 Compare June 4, 2018 07:59
@karalabe karalabe force-pushed the memory_allowance_warining branch from c4f81e5 to a20cc75 Compare June 4, 2018 10:46
@karalabe
Copy link
Member

karalabe commented Jun 4, 2018

Benchmarks seem completely uninteresting. Merging.

@JEJodesty
Copy link

JEJodesty commented Aug 7, 2018

Hi everyone,
Can someone put this merged PR into perspective for me / explain why a certain cache size is within an un-reasonable range?

@karalabe
Copy link
Member

karalabe commented Aug 7, 2018

It's the way Go's garbage collector works. It always allocates more memory than you allow it, and when it exceeds some threshold, it cleans up the accumulated junk. The default threshold is 2x the useful memory. If you set the cache to 1GB, then you will always have at least 1GB of useful live memory, so Go will happily go and accumulate 2GB junk before it decides to clean up. This resulted in people permitting too generous cache allowances, and then opening bug reports that Geth is crashing with OOM.

@JEJodesty
Copy link

Thanks @karalabe !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants