Fatal: out of memory issue #15428

Closed
johnluan opened this issue Nov 6, 2017 · 27 comments


johnluan commented Nov 6, 2017

System information

Geth version: 1.7.2
OS & Version: Ubuntu 16.04
Commit hash : (if develop)

Expected behaviour

Start geth, sync blocks

Actual behaviour

After syncing for a little while, geth throws an out-of-memory error

Steps to reproduce the behaviour

geth --cache=256 --rpc --rpcapi admin,eth,net,personal --rpcaddr=0.0.0.0 --etherbase 0x085bba56c11be9f235f460195f5bdd940076b034 --verbosity=2 --datadir "/data/geth"

Backtrace

Nov 6 07:51:25 10-9-102-100 geth[3587]: WARN [11-06|07:51:24] Stalling state sync, dropping peer peer=da3a2b4295214d0a
Nov 6 07:51:25 10-9-102-100 geth[3587]: WARN [11-06|07:51:25] Stalling state sync, dropping peer peer=0c74e66a402212ab
Nov 6 07:51:43 10-9-102-100 geth[3587]: WARN [11-06|07:51:43] Stalling state sync, dropping peer peer=5e22b60c2e148310
Nov 6 07:52:00 10-9-102-100 geth[3587]: WARN [11-06|07:52:00] Stalling state sync, dropping peer peer=cd9e12e7b98e5c51
Nov 6 07:52:14 10-9-102-100 geth[3587]: WARN [11-06|07:52:14] Stalling state sync, dropping peer peer=e34f400b179bbfca
Nov 6 07:54:03 10-9-102-100 geth[3587]: WARN [11-06|07:54:03] Stalling state sync, dropping peer peer=baf24807d46f29c7
Nov 6 07:54:06 10-9-102-100 geth[3587]: WARN [11-06|07:54:06] Stalling state sync, dropping peer peer=7d339a8d86268feb
Nov 6 07:54:45 10-9-102-100 geth[3587]: WARN [11-06|07:54:45] Stalling state sync, dropping peer peer=4b4ba5f8f361797b
Nov 6 07:55:01 10-9-102-100 geth[3587]: WARN [11-06|07:55:01] Stalling state sync, dropping peer peer=488640b18d7675cc
Nov 6 07:55:49 10-9-102-100 geth[3587]: WARN [11-06|07:55:49] Stalling state sync, dropping peer peer=39e5d2229a4334e7
Nov 6 07:55:53 10-9-102-100 geth[3587]: WARN [11-06|07:55:53] Stalling state sync, dropping peer peer=988772acb5ba840d
Nov 6 07:56:20 10-9-102-100 geth[3587]: WARN [11-06|07:56:20] Stalling state sync, dropping peer peer=d8e7276840df25e9
Nov 6 07:56:31 10-9-102-100 geth[3587]: fatal error: runtime: out of memory
Nov 6 07:56:31 10-9-102-100 geth[3587]: runtime stack:
Nov 6 07:56:31 10-9-102-100 geth[3587]: runtime.throw(0xf540f7, 0x16)
Nov 6 07:56:31 10-9-102-100 geth[3587]: #11/home/travis/.gimme/versions/go1.9.linux.amd64/src/runtime/panic.go:605 +0x95
Nov 6 07:56:31 10-9-102-100 geth[3587]: runtime.sysMap(0xc4f7d20000, 0x8000000, 0x0, 0x1932878)
Nov 6 07:56:31 10-9-102-100 geth[3587]: #11/home/travis/.gimme/versions/go1.9.linux.amd64/src/runtime/mem_linux.go:216 +0x1d0
Nov 6 07:56:31 10-9-102-100 geth[3587]: runtime.(*mheap).sysAlloc(0x1918fe0, 0x8000000, 0x1)
Nov 6 07:56:31 10-9-102-100 geth[3587]: #11/home/travis/.gimme/versions/go1.9.linux.amd64/src/runtime/malloc.go:470 +0xd7
Nov 6 07:56:31 10-9-102-100 geth[3587]: runtime.(*mheap).grow(0x1918fe0, 0x4000, 0x0)
Nov 6 07:56:31 10-9-102-100 geth[3587]: #11/home/travis/.gimme/versions/go1.9.linux.amd64/src/runtime/mheap.go:887 +0x60
Nov 6 07:56:31 10-9-102-100 geth[3587]: runtime.(*mheap).allocSpanLocked(0x1918fe0, 0x4000, 0x1932888, 0x7f717f786210)
Nov 6 07:56:31 10-9-102-100 geth[3587]: #11/home/travis/.gimme/versions/go1.9.linux.amd64/src/runtime/mheap.go:800 +0x334
Nov 6 07:56:31 10-9-102-100 geth[3587]: runtime.(*mheap).alloc_m(0x1918fe0, 0x4000, 0x7f7199ed0101, 0x7f7199ed4e18)
Nov 6 07:56:31 10-9-102-100 geth[3587]: #11/home/travis/.gimme/versions/go1.9.linux.amd64/src/runtime/mheap.go:666 +0x118
Nov 6 07:56:31 10-9-102-100 geth[3587]: runtime.(*mheap).alloc.func1()
Nov 6 07:56:31 10-9-102-100 geth[3587]: #11/home/travis/.gimme/versions/go1.9.linux.amd64/src/runtime/mheap.go:733 +0x4d
Nov 6 07:56:31 10-9-102-100 geth[3587]: runtime.systemstack(0x7f7199ed4e10)
Nov 6 07:56:31 10-9-102-100 geth[3587]: #11/home/travis/.gimme/versions/go1.9.linux.amd64/src/runtime/asm_amd64.s:360 +0xab
Nov 6 07:56:31 10-9-102-100 geth[3587]: runtime.(*mheap).alloc(0x1918fe0, 0x4000, 0x7f7199010101, 0x41aad4)
Nov 6 07:56:31 10-9-102-100 geth[3587]: #11/home/travis/.gimme/versions/go1.9.linux.amd64/src/runtime/mheap.go:732 +0xa1
Nov 6 07:56:31 10-9-102-100 geth[3587]: runtime.largeAlloc(0x8000000, 0x7f71b7140101, 0x45dd5b)
Nov 6 07:56:31 10-9-102-100 geth[3587]: #11/home/travis/.gimme/versions/go1.9.linux.amd64/src/runtime/malloc.go:827 +0x98
Nov 6 07:56:31 10-9-102-100 geth[3587]: runtime.mallocgc.func1()
Nov 6 07:56:31 10-9-102-100 geth[3587]: #11/home/travis/.gimme/versions/go1.9.linux.amd64/src/runtime/malloc.go:722 +0x46
Nov 6 07:56:31 10-9-102-100 geth[3587]: runtime.systemstack(0xc420017300)
Nov 6 07:56:31 10-9-102-100 geth[3587]: #11/home/travis/.gimme/versions/go1.9.linux.amd64/src/runtime/asm_amd64.s:344 +0x79
Nov 6 07:56:31 10-9-102-100 geth[3587]: runtime.mstart()
Nov 6 07:56:31 10-9-102-100 geth[3587]: #11/home/travis/.gimme/versions/go1.9.linux.amd64/src/runtime/proc.go:1125
Nov 6 07:56:31 10-9-102-100 geth[3587]: goroutine 34170 [running]:
Nov 6 07:56:31 10-9-102-100 geth[3587]: runtime.systemstack_switch()
Nov 6 07:56:31 10-9-102-100 geth[3587]: #11/home/travis/.gimme/versions/go1.9.linux.amd64/src/runtime/asm_amd64.s:298 fp=0xc4293c0be8 sp=0xc4293c0be0 pc=0x4608b0
Nov 6 07:56:31 10-9-102-100 geth[3587]: runtime.mallocgc(0x8000000, 0xd9ca40, 0xdeadbe01, 0xc4569805d0)
Nov 6 07:56:31 10-9-102-100 geth[3587]: #11/home/travis/.gimme/versions/go1.9.linux.amd64/src/runtime/malloc.go:721 +0x7b8 fp=0xc4293c0c90 sp=0xc4293c0be8 pc=0x417158
Nov 6 07:56:31 10-9-102-100 geth[3587]: runtime.makeslice(0xd9ca40, 0x0, 0x8000000, 0x7f71b70e7000, 0xc4c9a88000, 0xc497594d01)
Nov 6 07:56:31 10-9-102-100 geth[3587]: #11/home/travis/.gimme/versions/go1.9.linux.amd64/src/runtime/slice.go:54 +0x77 fp=0xc4293c0cc0 sp=0xc4293c0c90 pc=0x44a087
Nov 6 07:56:31 10-9-102-100 geth[3587]: github.com/ethereum/go-ethereum/vendor/github.com/syndtr/goleveldb/leveldb/memdb.New(0x181cc40, 0xc42028fa20, 0x8000000, 0x0)
Nov 6 07:56:31 10-9-102-100 geth[3587]: #11/home/travis/gopath/src/github.com/ethereum/go-ethereum/vendor/github.com/syndtr/goleveldb/leveldb/memdb/memdb.go:470 +0xfc fp=0xc4293c0d60 sp=0xc4293c0cc0 pc=0x7a106c
Nov 6 07:56:31 10-9-102-100 geth[3587]: github.com/ethereum/go-ethereum/vendor/github.com/syndtr/goleveldb/leveldb.(*DB).mpoolGet(0xc420158780, 0x1d481, 0x0)
Nov 6 07:56:31 10-9-102-100 geth[3587]: #11/home/travis/gopath/src/github.com/ethereum/go-ethereum/vendor/github.com/syndtr/goleveldb/leveldb/db_state.go:90 +0xb5 fp=0xc4293c0da8 sp=0xc4293c0d60 pc=0x7cadd5
Nov 6 07:56:31 10-9-102-100 geth[3587]: github.com/ethereum/go-ethereum/vendor/github.com/syndtr/goleveldb/leveldb.(*DB).newMem(0xc420158780, 0x1d481, 0x0, 0x0, 0x0)
Nov 6 07:56:31 10-9-102-100 geth[3587]: #11/home/travis/gopath/src/github.com/ethereum/go-ethereum/vendor/github.com/syndtr/goleveldb/leveldb/db_state.go:147 +0x235 fp=0xc4293c0e58 sp=0xc4293c0da8 pc=0x7cb355
Nov 6 07:56:31 10-9-102-100 geth[3587]: github.com/ethereum/go-ethereum/vendor/github.com/syndtr/goleveldb/leveldb.(*DB).rotateMem(0xc420158780, 0x1d481, 0x0, 0xc4293c1030, 0x4645b6, 0x223d)
Nov 6 07:56:31 10-9-102-100 geth[3587]: #11/home/travis/gopath/src/github.com/ethereum/go-ethereum/vendor/github.com/syndtr/goleveldb/leveldb/db_write.go:45 +0x81 fp=0xc4293c0eb0 sp=0xc4293c0e58 pc=0x7cea81
Nov 6 07:56:31 10-9-102-100 geth[3587]: github.com/ethereum/go-ethereum/vendor/github.com/syndtr/goleveldb/leveldb.(*DB).flush.func1(0xbe78072bcc7a1900)
Nov 6 07:56:31 10-9-102-100 geth[3587]: #11/home/travis/gopath/src/github.com/ethereum/go-ethereum/vendor/github.com/syndtr/goleveldb/leveldb/db_write.go:101 +0x2db fp=0xc4293c0f50 sp=0xc4293c0eb0 pc=0x7e505b
Nov 6 07:56:31 10-9-102-100 geth[3587]: github.com/ethereum/go-ethereum/vendor/github.com/syndtr/goleveldb/leveldb.(*DB).flush(0xc420158780, 0x1d481, 0xc456638240, 0xc685, 0x0, 0x0)
Nov 6 07:56:31 10-9-102-100 geth[3587]: #11/home/travis/gopath/src/github.com/ethereum/go-ethereum/vendor/github.com/syndtr/goleveldb/leveldb/db_write.go:113 +0x171 fp=0xc4293c1020 sp=0xc4293c0f50 pc=0x7ced51
Nov 6 07:56:31 10-9-102-100 geth[3587]: github.com/ethereum/go-ethereum/vendor/github.com/syndtr/goleveldb/leveldb.(*DB).writeLocked(0xc420158780, 0xc430be8540, 0x0, 0x1, 0x0, 0x0)
Nov 6 07:56:31 10-9-102-100 geth[3587]: #11/home/travis/gopath/src/github.com/ethereum/go-ethereum/vendor/github.com/syndtr/goleveldb/leveldb/db_write.go:150 +0x6c fp=0xc4293c11c8 sp=0xc4293c1020 pc=0x7cf03c


johnluan commented Nov 6, 2017

Tested on Windows 2012; there is no issue there, so I think so far it only exists on Ubuntu 16.04.


johnluan commented Nov 6, 2017

Sorry, I jumped the gun: Windows has the same issue. It has not crashed yet, but the cache setting is 256 and it has taken almost 3 GB so far.

@skykingit

Got the same error on Windows 7; my memory is 8 GB.


holiman commented Nov 10, 2017

@johnluan how much memory does the machine have?
The cache is a hint to the database about how much to cache, but there are other memory allocations going on besides that, so the total memory requirement will be (a lot) larger than the cache setting.
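
For illustration only (assuming a Linux host; the flag values and commands below are not from the original report), one way to see the gap between the configured cache and the real footprint is to lower the cache and watch the process's resident set size while it syncs:

geth --cache=256 --datadir /data/geth
ps -o rss= -p $(pidof geth)    # resident memory in KiB, usually far above the 256 MB cache

The resident size includes the trie cache, LevelDB buffers, peer buffers and Go GC overhead, which is why it exceeds the --cache value.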


johnluan commented Nov 10, 2017 via email

@karalabe

@holiman We have a memory leak of sorts during sync. It's not a full-blown leak, in that it gets cleaned up after sync completes, but there is some dangling reference that prevents certain objects from being freed earlier. This causes memory use to spike during a big sync. I've been trying for a long while to catch it, and so has @fjl, but we haven't yet been able to pin down where the objects retain their references.

@stephenhodgkiss

I had the same issue. After the fast sync failed, I then just continued in normal mode.

I used a cache of 512 and 1024 and each time it ran out of memory.

After reading the comments here, I tried it with no cache parameter, and all is good.


LYY commented Dec 14, 2017

My geth runs on a 4-core, 8 GB Ubuntu 16.04 server, and it crashes every day because of OOM.


0x1F602 commented Dec 25, 2017

Also seeing this issue with an up-to-date Amazon AMI.

@GoodMirek

Same issue with latest master.


shwill commented Jan 26, 2018

geth is chucking RAM like it was Chrome. Same behaviour for me on an 8 GB node running Ubuntu 16.04. I installed it yesterday and started syncing overnight; the systemd unit I created for geth restarted several times, and it takes only a couple of minutes to drain all available memory and make the system swap hard.

top - 08:54:58 up 11:27,  1 user,  load average: 1.89, 3.68, 3.78
Tasks: 126 total,   2 running, 124 sleeping,   0 stopped,   0 zombie
%Cpu(s): 10.9 us,  4.0 sy,  0.0 ni, 72.3 id, 12.4 wa,  0.0 hi,  0.1 si,  0.3 st
KiB Mem :  8174956 total,   144792 free,  3638332 used,  4391832 buff/cache
KiB Swap:        0 total,        0 free,        0 used.  4163596 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 4542 root      20   0 4096548 3.435g  54856 S  61.1 44.1  10:16.78 geth

@zzd1990421

My geth (1.8.7) runs on a 4-core, 8 GB Ubuntu 16.04 server, and it crashes every day because of OOM.

@karalabe

@zzd1990421 We're tracking down a possible issue. The current master branch seems to behave a lot nicer in this respect, both from a memory and CPU consumption standpoint. Might want to try that and see how it performs.

@zzd1990421

@karalabe My geth version is now:
geth-linux-amd64-1.8.7-66432f38
I thought this was supposed to be the latest version of geth?
I use geth in a production environment, so I have less confidence in the master branch than in a stable release.

my command:
geth --datadir /data/geth-data-dir --maxpeers 100 --maxpendpeers 100 --cache 8192 --rpc --rpcapi admin,eth,personal,net --rpcaddr my-ip

The memory monitor over the past week: [memory usage graph]

@karalabe

The latest release is 1.8.8, but that doesn't yet contain the fix for the memory issue. If you don't want to run master, the next stable release should be out around Monday next week. You can track progress on this memory issue in #16728.

@zzd1990421

@karalabe Thanks, I'll follow that issue!

@GoodMirek

@zzd1990421 Running with --cache 8192 on an 8 GB node is not going to work even if there were no leaks, because memory is needed for more than just geth's cache.
I used to run with --cache 2048 on an 8 GB node and it ran out of memory about once a week or so.
The last time I tried running geth was about a month ago; for the time being I am sticking with Parity, as it has fewer memory leaks and performs better than geth on my disk subsystem.

@karalabe

@zzd1990421 Oh wow, ok, definitely don't specify more than 1/3rd of your memory for caching. Go's GC will permit junk to accumulate to twice the useful memory before cleaning up. So with an 8GB allowance, Go will flat out let the memory go up to 16GB before cleaning up. On an 8GB machine 2GB cache seems a good choice. The OS will use the remainder of the memory for disk caching too, so you're not missing out that much.
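
For illustration (this is a property of the Go runtime, not anything geth-specific, and the flag values below are placeholders): the doubling comes from Go's GC target of GOGC=100, meaning a collection is triggered once the heap grows 100% over the live data. On a memory-constrained box the ceiling can be traded against CPU by lowering GOGC when launching geth, for example:

GOGC=50 geth --cache=2048 --datadir /data/geth

With GOGC=50 the heap is collected at roughly 1.5x the live data instead of 2x.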

@zzd1990421
Copy link

@GoodMirek It's my mistake.The mem is actually 16GB.

@karalabe

@zzd1990421 Still, an 8 GB cache is too high, since other parts of geth will use some memory too, so (8 GB cache + some memory) × 2 will overflow the 16 GB. --cache=4096 seems a reasonable choice on that machine.
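
To put rough, assumed numbers on that rule of thumb: with --cache=4096 (about 4 GB) plus, say, 2-3 GB of other allocations during sync, the doubling rule gives roughly (4 + 3) × 2 = 14 GB of peak usage, which still fits in 16 GB; with --cache=8192 the same rule already gives (8 + 3) × 2 = 22 GB, well past the limit.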

@zzd1990421

@karalabe I've changed this parameter.
I'll watch the memory monitor to see whether OOM still happens.

now my command:
geth --datadir /data/geth-data-dir --maxpeers 100 --maxpendpeers 100 --cache 4096 --rpc --rpcapi admin,eth,personal,net --rpcaddr my-ip

And I have to run geth with supervisor ☹
Thanks for your patience anyway!

@locdinh209

@karalabe I've got the same issue, but my memory is 16 GB and the OOM killer still killed my geth process.

When I run geth I don't set --cache, so I think the problem comes from some other cause.
My command:
geth --datadir /home/ubuntu/.ethereum --rpc --rpccorsdomain=* --ws --wsorigins=* --wsaddr 0.0.0.0 --rpcaddr 0.0.0.0

@zzd1990421 Have you fixed your problem now?

Thank you for your support !

@401825317

I had the same problem, with 16 GB of RAM. [screenshot]

@401825317

@karalabe hello, can you help me?

@marcosmartinez7

@karalabe My eth node's memory usage keeps increasing by roughly 10 MB per hour; the memory used is still growing slowly.

I run the geth full node without specifying the cache, so it is using the default value (1 GB).

Is this normal behaviour because of the accumulated junk (since the garbage collector will only clean up at around 2 GB, right)?

Thanks

adamschmideg added this to the Backlog milestone Nov 20, 2018
@icodezjb

I've got the same issue and solved it.

My setup: geth v1.8.27, 16 GB memory, four nodes with the PoA consensus algorithm; one node generates blocks and the other three sync blocks for TPS tests. On the block-generating node, memory use keeps growing as the txpool continually receives lots of transactions.

I used "go tool pprof" to analyze the issue and found the root cause: the AsyncSendTransactions function in peer.go.

Fortunately, PR #19702 fixes it, and I cherry-picked it onto my geth v1.8.27.
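
For anyone wanting to reproduce that kind of analysis, a sketch assuming geth's built-in profiling server (the flag spellings below match the 1.8.x series; later releases renamed them, and the port shown is the default): start the node with profiling enabled, then point the Go pprof tool at its heap endpoint.

geth --datadir /data/geth --pprof --pprofaddr 127.0.0.1 --pprofport 6060
go tool pprof http://127.0.0.1:6060/debug/pprof/heap

Inside the pprof prompt, top shows which call sites retain the most live heap, which is how a function like AsyncSendTransactions can show up.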

@karalabe

karalabe commented Sep 8, 2020

We've merged various memory fixes since this issue was opened; in particular, the next release (1.9.21) contains a leak fix for fast sync. There's not much information to go on in this issue, so I'll close it and ask you to open a fresh one if something still persists.

karalabe closed this as completed Sep 8, 2020