Memory used increasing slowly #17450

Closed · marcosmartinez7 opened this issue Aug 20, 2018 · 32 comments

@marcosmartinez7 (Author)

System information

Geth version: 1.8.12 stable
OS & Version: Linux 16.04

Expected behaviour

Memory usage stays constant

Actual behaviour

Hi @karalabe,

I'm running the node without any --rpcapi modules enabled. The node started 3 days ago using 1.9% of my RAM (8 GB); it is now consuming 2.3% and keeps increasing slowly (roughly 10 MB/h).

I ran the node without specifying the --cache flag, so I assume it is using the default 1 GB.

Is this something I should worry about, or could it be related to garbage collection?

Steps to reproduce the behaviour

I ran the node with this command:

geth --datadir e1/ --syncmode 'full' --port 30357 --rpc --rpcport 8545 --rpccorsdomain '*' --rpcaddr 'server_ip' --ws --wsaddr "server_ip" --wsorigins "some_ip" --wsport 9583 --wsapi 'db,eth,net,web3,txpool,miner' --networkid 21 --gasprice '1'

@karalabe (Member)

Opening up the HTTP/WebSocket interfaces to outside traffic is dangerous because people are actively trying to break into nodes. Are you sure you need remote access to your node via HTTP? That should only be used if you're behind a firewall and can control access. Couldn't you use SSH + IPC to attach to a remote node?

With regard to memory use and RPC, what requests are you making? I can imagine that there might be some leak in our code, but providing some details about your usage could be invaluable to track it down.
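
For reference, a minimal sketch of the SSH + IPC approach suggested above, assuming the remote geth.ipc socket has first been forwarded to the local machine over SSH (the socket paths and host are placeholders, not taken from this issue):

    package main

    import (
        "context"
        "fmt"
        "log"

        "github.com/ethereum/go-ethereum/ethclient"
    )

    func main() {
        // Assumes the remote IPC socket was forwarded locally first, e.g.:
        //   ssh -nNT -L /tmp/geth.ipc:/path/to/datadir/geth.ipc user@server
        // (host and paths are placeholders)
        client, err := ethclient.Dial("/tmp/geth.ipc")
        if err != nil {
            log.Fatalf("dial: %v", err)
        }
        defer client.Close()

        head, err := client.HeaderByNumber(context.Background(), nil)
        if err != nil {
            log.Fatalf("header: %v", err)
        }
        fmt.Println("latest block:", head.Number)
    }

This keeps the HTTP and WS interfaces closed while still allowing remote scripting against the node.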

@marcosmartinez7 (Author) commented Aug 20, 2018

Hi, yes, but I need to hit the smart contract from any origin. Is there any way to accomplish that without opening up the HTTP interface? It is a private network that must be accessible from any origin (MetaMask, MyEtherWallet, any Ethereum wallet...).

Regarding memory, usage increases with every transaction submitted. I thought it might be related to the garbage collector, maybe it isn't executing... I sent 100 transactions and that increased my RAM usage by about 50 MB.

Is there any information I can provide to help identify the possible leak?

I think the problem might be related to the fact that the RPC port is open and anyone could be doing something that allocates memory... the node was started with:

geth --datadir e1/ --syncmode 'full' --port 30357 --rpc --rpcport 8545 --rpccorsdomain '*' --rpcaddr 'server_ip' --ws --wsaddr "server_ip" --wsorigins "some_ip" --wsport 9583 --wsapi 'db,eth,net,web3,txpool,miner' --networkid 21 --gasprice '1'

so there is no exposed --rpcapi, only WS, and only from one specific IP.

Any ideas on how I can troubleshoot this? Memory only increases when I send transactions to the blockchain using MetaMask.

@marcosmartinez7 (Author) commented Aug 20, 2018

Recently I attached a console and sent 400 transactions (hitting a smart contract) in a for loop.

The memory used increased by 400 MB.

Any idea what I can look into to check what is causing this?

After 20-30 minutes it goes back to the previous RAM level, or a few MB more. Is this normal? The tendency is that, as time passes, the memory used increases whenever transactions are submitted.

@marcosmartinez7 (Author) commented Aug 21, 2018

I have tried this on another chain that doesn't expose an RPC endpoint, sending 5000 transactions from the geth console.

Memory started at 1140 MB and after 5000 transactions grew to 1550 MB, so geth takes about 400 MB to process those transactions.

Since the block time is 15 seconds, it will take a while to confirm those transactions, so is it normal to stay at 1550 MB for a while? Also, the cache memory is still increasing.

Is there anything I can share to check whether this behavior is OK? It seems that after 10k transactions the cache used by geth grew by 400 MB and the used memory also increased. The values do not settle back after sending a batch of transactions; maybe it is standard for geth to consume more memory the heavier the chain gets.

Also, the cache usage increases even without RPC interactions.

@cdljsj commented Jan 7, 2019

The problem gets worse in 1.8.20; I have to restart the geth node every half day due to high memory usage.

@hapsody commented Jan 8, 2019

Same problem here. I used the --cache flag (--cache "64") but the problem still occurs.

Version: 1.8.16-stable
Git Commit: 477eb09
Architecture: amd64
Protocol Versions: [63 62]
Network Id: 1
Go Version: go1.11
Operating System: linux
GOPATH=
GOROOT=/home/travis/.gimme/versions/go1.11.linux.amd64

top - 00:58:06 up 21:28, 3 users, load average: 2.39, 2.18, 1.28
Tasks: 129 total, 1 running, 128 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.0 us, 0.2 sy, 0.0 ni, 19.6 id, 79.8 wa, 0.0 hi, 0.0 si, 0.5 st
KiB Mem : 3985340 total, 116412 free, 3826964 used, 41964 buff/cache
KiB Swap: 7812496 total, 5197612 free, 2614884 used. 7448 avail Mem

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3811 ubuntu 20 0 7055996 3.570g 4120 D 2.0 93.9 2:07.48 geth

@hadv (Contributor) commented Jan 26, 2019

I have the same problem. It seems there is a memory leak somewhere. It also seems that a node that is mining with an open RPC endpoint has no problem.

If the node is not mining, then memory increases steadily, as @marcosmartinez7 reported, when sending thousands of transactions continuously via the RPC endpoint.

@marcosmartinez7 (Author) commented Jan 27, 2019

On version 1.8.18 I haven't experienced this problem anymore; the memory still increases, but it reaches a stable maximum value.

Take into account that geth keeps some information in memory that is only written to disk on each epoch, and an epoch is about 20-30k blocks.

So if you're using less than 4 GB of RAM I think this can happen.

@hadv (Contributor) commented Jan 28, 2019

Thank you for the information, @marcosmartinez7.

By the way, I can still reproduce the issue on 1.8.20 and 1.8.21 with 16 GB of RAM by sending 10,000 transactions (2000 txs per sealed block, with the txpool always filled up with ~10,000 txs).

One very strange thing is that a mining node with a public RPC endpoint does not have the issue on the same hardware configuration. So I think there must be a memory leak somewhere.

Mining: [memory usage screenshot]

Non-mining (memory leak): [memory usage screenshot]

@hadv (Contributor) commented Jan 29, 2019

cc: @karalabe @fjl

Goroutine leak on a non-mining RPC node: [screenshot]

Mining node (no leak): [screenshot]

@hadv (Contributor) commented Jan 29, 2019

Almost all of the leaked goroutines are calling feed.Send(), as shown below:

goroutine profile: total 49352
48057 @ 0x44384b 0x4438f3 0x41b15e 0x41ae4b 0x6edc6f 0x472051
#	0x6edc6e	github.com/ethereum/go-ethereum/event.(*Feed).Send+0x12e	/home/admin/gonex/build/_workspace/src/github.com/ethereum/go-ethereum/event/feed.go:133

@hadv (Contributor) commented Jan 29, 2019

The goroutine leak might be the feed.Send() blocking issue reported in #18021.

@fjl fjl self-assigned this Jan 29, 2019
@fjl (Contributor) commented Jan 29, 2019

The issue is not that Feed.Send() is blocking, it's that the send to the feed happens in a background goroutine. Please provide a longer stack trace so we can see which part of the system is trying to send on the feed.
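
To make the failure mode concrete, here is a small self-contained sketch (a simplification, not the actual tx pool code) of what happens when every Feed.Send is fired from its own background goroutine and a subscriber stops draining its channel:

    package main

    import (
        "fmt"
        "runtime"
        "time"

        "github.com/ethereum/go-ethereum/event"
    )

    func main() {
        var feed event.Feed
        ch := make(chan int) // subscriber channel that is never drained
        sub := feed.Subscribe(ch)
        defer sub.Unsubscribe()

        // One sender goroutine per event, mirroring the pattern under discussion.
        for i := 0; i < 1000; i++ {
            go feed.Send(i) // each Send blocks until the subscriber reads, i.e. forever here
        }

        time.Sleep(time.Second)
        // Roughly a thousand goroutines are now parked inside Feed.Send.
        fmt.Println("goroutines:", runtime.NumGoroutine())
    }

A single blocking Send would surface the stuck receiver immediately; spawning a goroutine per Send just hides it behind an ever-growing goroutine count, which matches the profiles above.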

@hadv (Contributor) commented Jan 30, 2019

The issue is not that Feed.Send() is blocking, it's that the send to the feed happens in a background goroutine.

I think the issue happens as @liuzhijun23 figured out in #18021: when the for loop inside feed.Send() is blocked, any other call gets stuck at line 133, <-f.sendLock, because f.sendLock is empty at that point.

Please provide a longer stack trace so we can see which part of the system is trying to send on the feed.

It is almost all from the txpool:

goroutine 1267548 [chan receive, 1 minutes]:
github.com/ethereum/go-ethereum/event.(*Feed).Send(0xc0082560b0, 0xe93bc0, 0xc015639a20, 0xc0706cf0c8)
	/home/admin/gonex/build/_workspace/src/github.com/ethereum/go-ethereum/event/feed.go:133 +0x12f
created by github.com/ethereum/go-ethereum/core.(*TxPool).promoteExecutables
	/home/admin/gonex/build/_workspace/src/github.com/ethereum/go-ethereum/core/tx_pool.go:1098 +0x17f8

@hadv (Contributor) commented Jan 30, 2019

@fjl a clue: on a non-mining node, TrySend() returns false very frequently, which leaks more and more goroutines over time. On a mining node, TrySend() always succeeds.

			if cases[i].Chan.TrySend(rvalue) {
				nsent++
				cases = cases.deactivate(i)
				i--
			} 

@holiman (Contributor) commented Jan 30, 2019

@hadv can you produce a full trace and upload it somewhere? There's a debug_stacks/debug.stacks() method to dump out the trace. It outputs to stdout, so if you run geth .... console > stacks.txt and then later execute debug.stacks(), the dump should be captured in the file.

That should show which particular receiver is bottlenecking the events.

Note: if you redirect with > stacks.txt you won't see the actual console, but it will still work if you type in the command.
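
For anyone who prefers to grab the dump programmatically instead of redirecting the console, a sketch using the Go RPC client, assuming the debug namespace is reachable on the node's HTTP endpoint (the URL is a placeholder):

    package main

    import (
        "context"
        "log"
        "os"

        "github.com/ethereum/go-ethereum/rpc"
    )

    func main() {
        // Placeholder endpoint; the debug API must be exposed there,
        // e.g. by adding debug to --rpcapi on the geth versions discussed here.
        client, err := rpc.Dial("http://127.0.0.1:8545")
        if err != nil {
            log.Fatalf("dial: %v", err)
        }
        defer client.Close()

        var stacks string // debug_stacks returns the goroutine dump as a string
        if err := client.CallContext(context.Background(), &stacks, "debug_stacks"); err != nil {
            log.Fatalf("debug_stacks: %v", err)
        }
        if err := os.WriteFile("stacks.txt", []byte(stacks), 0o644); err != nil {
            log.Fatal(err)
        }
    }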

@hadv (Contributor) commented Jan 30, 2019

@holiman (Contributor) commented Jan 30, 2019

You've got 203 threads stuck on

1: semacquire [Created by http.(*Server).Serve @ server.go:2851]
    sync       sema.go:71                      runtime_SemacquireMutex(*uint32(#1648), bool(#6018))
    sync       rwmutex.go:50                   (*RWMutex).RLock(*RWMutex(#1647))
    miner      worker.go:252                   (*worker).pending(#1646, 0, 0)
    miner      miner.go:155                    (*Miner).Pending(#1676, #28503, #129)
    eth        api_backend.go:92               (*EthAPIBackend).StateAndHeaderByNumber(#1920, #27, #16012, 0xfffffffffffffffe, #45, 0, 0, 0x5208)
    ethapi     api.go:700                      (*PublicBlockChainAPI).doCall(#1121, #27, #16012, #28616, #28747, #244, #22014, 0x551ae, 0, 0, ...)
    ethapi     api.go:791                      (*PublicBlockChainAPI).EstimateGas.func1(0x551ae, #26)
    ethapi     api.go:800                      (*PublicBlockChainAPI).EstimateGas(#1121, #27, #16012, #28616, #28747, #244, #22014, 0x551ae, 0, 0, ...)
    reflect    value.go:447                    Value.call(string(#2177, len=824635791808), []Value(0x13 len=16418845 cap=4), #2846, 0x3, 0x4, 0x0, #2846, ...)
    reflect    value.go:308                    Value.Call([]Value(#2177 len=824635791808 cap=19), #2846, 0x3, 0x4, 0x1, 0x1, 0x0)
    rpc        server.go:309                   (*Server).handle(#3676, #27, #16012, #37, #23615, #2845, #23616, 0, 0xe630e0)
    rpc        server.go:330                   (*Server).exec(#3676, #27, #16012, #37, #23615, #2845)
    rpc        server.go:192                   (*Server).serveRequest(#3676, #29, #27953, #37, #23615, 0xfb1801, 0x1, 0, 0)
    rpc        server.go:223                   (*Server).ServeSingleRequest(#3676, #29, #27953, #37, #23615, 0x1)
    rpc        http.go:257                     (*Server).ServeHTTP(#3676, #24, #12949, #14736)
    cors       cors.go:190                     (*Cors).Handler.func1(#24, #12949, #14736)
    http       server.go:1964                  HandlerFunc.ServeHTTP(ResponseWriter(#3660), *Request(#12949), #14736)
    rpc        http.go:324                     (*virtualHostHandler).ServeHTTP(#3661, #24, #12949, #14736)
    http       server.go:2741                  serverHandler.ServeHTTP(ResponseWriter(#3680), *Request(#12949), #14736)

If you are batch-adding thousands of transactions and doing an estimateGas on each and every one, it will be quite resource-intensive for the node. They will each be competing for the lock to obtain a particular state.
I guess a more batch-friendly method could be used, one that takes the pending state and reuses it for every tx in the batch. But however we do it, it'll be messy -- e.g. do we want to apply transactions on top of each other, or reset the state again after each?
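
As a purely client-side mitigation (a sketch of a workaround, not the batch-friendly node API described above), a sender submitting many identical transactions can estimate gas once and reuse that limit for the whole batch, so only one eth_estimateGas competes for the pending-state lock:

    package main

    import (
        "context"
        "fmt"
        "log"
        "math/big"

        ethereum "github.com/ethereum/go-ethereum"
        "github.com/ethereum/go-ethereum/common"
        "github.com/ethereum/go-ethereum/ethclient"
    )

    func main() {
        // Endpoint and addresses are placeholders.
        client, err := ethclient.Dial("http://127.0.0.1:8545")
        if err != nil {
            log.Fatal(err)
        }
        defer client.Close()

        from := common.HexToAddress("0x0000000000000000000000000000000000000001")
        contract := common.HexToAddress("0x0000000000000000000000000000000000000002")
        calldata := []byte{} // placeholder call data, identical for every tx in the batch

        // Estimate once for a representative call...
        gas, err := client.EstimateGas(context.Background(), ethereum.CallMsg{
            From: from, To: &contract, Value: big.NewInt(0), Data: calldata,
        })
        if err != nil {
            log.Fatal(err)
        }

        // ...and reuse the same limit (plus some headroom) for every
        // transaction in the batch instead of estimating each one.
        gasLimit := gas + gas/10
        fmt.Println("gas limit for the whole batch:", gasLimit)
    }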

@hadv (Contributor) commented Jan 30, 2019

@holiman Can you please explain why only a non-mining node needs to run the code below? That might be the reason why only non-mining nodes face the goroutine leak issue, right? Thank you!

		case ev := <-w.txsCh:
			// Apply transactions to the pending state if we're not mining.
			//
			// Note all transactions received may not be continuous with transactions
			// already included in the current mining block. These transactions will
			// be automatically eliminated.
			if !w.isRunning() && w.current != nil {
				w.mu.RLock()
				coinbase := w.coinbase
				w.mu.RUnlock()

				txs := make(map[common.Address]types.Transactions)
				for _, tx := range ev.Txs {
					acc, _ := types.Sender(w.current.signer, tx)
					txs[acc] = append(txs[acc], tx)
				}
				txset := types.NewTransactionsByPriceAndNonce(w.current.signer, txs)
				w.commitTransactions(txset, coinbase, nil)
				w.updateSnapshot()
			} else {

@holiman (Contributor) commented Jan 30, 2019

I don't know yet. However, there appear to be ~10K goroutines spawned by promoteExecutables that are waiting on the lock in feed.go. One is busy in the loop.

I think an underlying problem is that a better model for the transaction handling would be to use an active object (one thread/goroutine) which receives the data, instead of each sender spawning its own goroutine (https://github.com/ethereum/go-ethereum/blob/master/core/tx_pool.go#L1000). Btw, @hadv, I don't know what code you're running, but the line numbers in your stack do not match up with what's on master now.

I'm not sure if there are any simple solutions to this ticket, since IMO it would probably require a non-trivial rewrite of tx pool internals.
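
For illustration, a rough sketch of the active-object shape suggested above (a simplification, not a proposed patch): a single long-lived goroutine owns the feed sends, and callers such as promoteExecutables would only enqueue batches on a channel instead of spawning their own sender goroutines.

    package txdispatch

    import (
        "github.com/ethereum/go-ethereum/core"
        "github.com/ethereum/go-ethereum/core/types"
        "github.com/ethereum/go-ethereum/event"
    )

    // txEventDispatcher is the "active object": only its loop goroutine
    // ever calls feed.Send, so a slow subscriber blocks at most one goroutine.
    type txEventDispatcher struct {
        feed  event.Feed
        queue chan []*types.Transaction
    }

    func newTxEventDispatcher() *txEventDispatcher {
        d := &txEventDispatcher{queue: make(chan []*types.Transaction, 256)}
        go d.loop()
        return d
    }

    func (d *txEventDispatcher) loop() {
        for txs := range d.queue {
            d.feed.Send(core.NewTxsEvent{Txs: txs})
        }
    }

    // enqueue is what a caller would use instead of spawning `go feed.Send(...)`.
    func (d *txEventDispatcher) enqueue(txs []*types.Transaction) {
        d.queue <- txs
    }

The tradeoff is the one discussed below: if the single sender is blocked, the bounded queue eventually backs up to the callers instead of leaking goroutines.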

@hadv (Contributor) commented Jan 30, 2019

Okay, thank you for the information. As for the code, I'm adding some logging to figure out the issue, so the line numbers may differ from master, but the logic is the same.

@bishaoqing

@holiman Can you please explain why only a non-mining node needs to run the code below? That might be the reason why only non-mining nodes face the goroutine leak issue, right? Thank you!

[quoted: the worker.go snippet from the comment above]

It might want to refresh the pending state so that RPC clients can get the latest pending information, for example the nonce of an account, and users don't have to maintain that information themselves.
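
For context, this is the kind of client call that relies on that refreshed pending state; a sketch using the public ethclient API (endpoint and address are placeholders):

    package main

    import (
        "context"
        "fmt"
        "log"

        "github.com/ethereum/go-ethereum/common"
        "github.com/ethereum/go-ethereum/ethclient"
    )

    func main() {
        client, err := ethclient.Dial("http://127.0.0.1:8545") // placeholder endpoint
        if err != nil {
            log.Fatal(err)
        }
        defer client.Close()

        account := common.HexToAddress("0x0000000000000000000000000000000000000001") // placeholder
        // eth_getTransactionCount with the "pending" tag: served from the node's
        // pending state, so callers don't have to track nonces themselves.
        nonce, err := client.PendingNonceAt(context.Background(), account)
        if err != nil {
            log.Fatal(err)
        }
        fmt.Println("next usable nonce:", nonce)
    }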

@hadv (Contributor) commented Jan 30, 2019

It might want to refresh the pending state so that RPC clients can get the latest pending information, for example the nonce of an account, and users don't have to maintain that information themselves.

That much we understand, of course, but the mining and non-mining nodes apply pending transactions in different ways, and the non-mining node's way leaks goroutines.

@fjl (Contributor) commented Jan 30, 2019

I think a simple fix to try would be removing the go keyword on this line: https://github.com/ethereum/go-ethereum/blob/master/core/tx_pool.go#L1000. There should be no downside to this (if I'm reading it right), because promoteExecutables is called from two places: addTx and reset. Removing the goroutine would just mean that the callers of those methods need to wait until the events have been delivered.
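
Roughly what that suggestion would look like at the event-notification site in promoteExecutables (paraphrased from memory of the 1.8.x code, not an exact diff):

    // Before: the event is delivered from a fresh background goroutine,
    // which piles up whenever a subscriber is slow to receive:
    //
    //     go pool.txFeed.Send(NewTxsEvent{promoted})
    //
    // After: deliver synchronously, so callers of addTx/reset simply wait
    // until the event has been handed to all subscribers.
    if len(promoted) > 0 {
        pool.txFeed.Send(NewTxsEvent{promoted})
    }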

@hadv (Contributor) commented Jan 31, 2019

Removing the goroutine would just mean that the callers of those methods need to wait until the events have been delivered.

Yeah, I'm afraid that if Feed.Send() is blocked, then the whole txpool is blocked as well.

@hadv (Contributor) commented Feb 1, 2019

By the way, I think we have enough information on this issue now, so could you remove the need:more-information label? @fjl Thanks!

@0x234 commented Feb 8, 2019

Here's some data from a geth running 1.8.22 over the past week: https://imgur.com/a/2LJpY8U

The first drop was when the node was upgraded from 1.8.21 to 1.8.22. The second was when the node was restarted with mining enabled.

@hadv (Contributor) commented Feb 8, 2019

By the way, I think we have enough information on this issue now, so could you remove the need:more-information label? @fjl Thanks!

@karalabe @holiman @fjl Will there be any update on this issue in the short term?

@hadv (Contributor) commented Feb 8, 2019

@fjl is there any more information you need for this issue?

@no-response bot commented Feb 28, 2019

This issue has been automatically closed because there has been no response to our request for more information from the original author. With only the information that is currently in the issue, we don't have enough information to take action. Please reach out if you have more relevant information or answers to our questions so that we can investigate further.

no-response bot closed this as completed Feb 28, 2019
@hadv (Contributor) commented Mar 1, 2019

This issue has been automatically closed because there has been no response to our request for more information from the original author. With only the information that is currently in the issue, we don't have enough information to take action. Please reach out if you have more relevant information or answers to our questions so that we can investigate further.

@karalabe @fjl @holiman We have already provided detailed information. Please remove the inappropriate label and re-open this issue. Thank you!

@hadv (Contributor) commented Mar 1, 2019

Opened new issue #19192.
