
Long tx broadcasting delays on geth 1.8.6 #16617

Closed
vogelito opened this issue Apr 30, 2018 · 11 comments
@vogelito

System information

Geth version:

Geth
Version: 1.8.6-stable
Git Commit: 12683feca7483f0b0bf425c3c520e2724f69f2aa
Architecture: amd64
Protocol Versions: [63 62]
Network Id: 1
Go Version: go1.10
Operating System: linux
GOPATH=
GOROOT=/usr/local/go

OS & Version: OSX

Expected behaviour

After calling eth.sendTransaction, the local node should immediately broadcast the transaction to other nodes in the network.

Actual behaviour

After upgrading to 1.8.6 (from 1.8.3) we see that our transactions (sent by calling eth.sendTransaction) are not seen by other nodes for 15-30 minutes. Our node has peers and it is fully synced.

Steps to reproduce the behaviour

Unsure. We upgraded to 1.8.6 and noticed the issue about 32 hours afterwards. We don't have active monitoring for this, so we're not sure whether the condition degraded over time or started happening right after the upgrade.

Backtrace

Attaching debug.stack() output: 2018.04.30_geth_1.8.6_debug_stacks.log

@mtbitcoin

I can confirm that I've observed the same, especially with pending transactions. It takes a while for them to propagate to the other nodes.

@djken2006

djken2006 commented May 3, 2018

When subscribing to pending transactions, I receive transactions that were already mined several minutes ago.
Version: 1.8.6-stable and Version: 1.8.7-stable
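
A rough way to check this, sketched with go-ethereum's Go RPC client (the websocket endpoint below is a placeholder and this is only an illustrative check, not my exact setup): subscribe to newPendingTransactions and look up a receipt for each hash; a receipt only exists once the transaction has been mined, so a hit means the "pending" notification was stale.

```go
package main

import (
	"context"
	"log"

	"github.com/ethereum/go-ethereum/common"
	"github.com/ethereum/go-ethereum/ethclient"
	"github.com/ethereum/go-ethereum/rpc"
)

func main() {
	ctx := context.Background()

	// Placeholder websocket endpoint; subscriptions need ws:// or IPC.
	rpcClient, err := rpc.Dial("ws://127.0.0.1:8546")
	if err != nil {
		log.Fatal(err)
	}
	eth := ethclient.NewClient(rpcClient)

	hashes := make(chan common.Hash, 128)
	sub, err := rpcClient.EthSubscribe(ctx, hashes, "newPendingTransactions")
	if err != nil {
		log.Fatal(err)
	}
	defer sub.Unsubscribe()

	for h := range hashes {
		// A receipt only exists for mined transactions, so a non-error
		// result here means the "pending" notification arrived late.
		if _, err := eth.TransactionReceipt(ctx, h); err == nil {
			log.Printf("got %s as pending, but it is already mined", h.Hex())
		}
	}
}
```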

@ryanschneider
Contributor

Related to #14669.

I have a code change here (ryanschneider@7b4f6c0) that we've been running, and it seems to help.

Notice in the debug.stack output the large number of promoteTx calls blocked on a chan receive for very long times. We were seeing the same issue until we started running with the above commit (and the other changes I mentioned in #14669).

However, of the 3 changes, the one above is definitely the safest, so I'll go ahead and send it as a PR and reference this issue.

@vogelito
Author

vogelito commented May 4, 2018

Thanks @ryanschneider

@holiman
Contributor

holiman commented May 18, 2018

An update on this: we've identified a couple of quirks that affect both memory usage and transaction propagation. Fixes are in progress.

@vogelito
Author

vogelito commented Jun 5, 2018

Thanks @holiman. We have secondary nodes and scripts that rebroadcast transactions to work around this issue. Should we try turning them off to see whether geth is handling things correctly now? We're running geth v1.8.10.
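
For context, a rebroadcast script along these lines can be sketched with go-ethereum's ethclient package. This is only an illustrative sketch, not our actual script; the node URLs and the transaction hash are placeholders:

```go
package main

import (
	"context"
	"log"

	"github.com/ethereum/go-ethereum/common"
	"github.com/ethereum/go-ethereum/ethclient"
)

func main() {
	ctx := context.Background()

	// Placeholder endpoints: the node that accepted the transaction and a
	// secondary node used purely for rebroadcasting.
	primary, err := ethclient.Dial("http://primary-node:8545")
	if err != nil {
		log.Fatal(err)
	}
	secondary, err := ethclient.Dial("http://secondary-node:8545")
	if err != nil {
		log.Fatal(err)
	}

	// Placeholder hash of a transaction originally sent via eth.sendTransaction.
	hash := common.HexToHash("0x0000000000000000000000000000000000000000000000000000000000000000")

	tx, isPending, err := primary.TransactionByHash(ctx, hash)
	if err != nil {
		log.Fatal(err)
	}

	// If the primary node still reports the transaction as pending, push the
	// already-signed transaction to the secondary node so it is broadcast
	// from there as well.
	if isPending {
		if err := secondary.SendTransaction(ctx, tx); err != nil {
			log.Printf("rebroadcast failed: %v", err)
		} else {
			log.Printf("rebroadcast %s via secondary node", hash.Hex())
		}
	}
}
```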

@holiman
Contributor

holiman commented Jun 5, 2018

The memory problems were fixed by changing the broadcast process. Previously, we had a single queue that was broadcast serially to all peers. A problematic peer could jam the broadcast, causing two problems:

  1. Hanging/blocked go-routines, eating memory
  2. Failure to broadcast the transactions to the other peers

We changed that to use N channels for N peers, so a problematic peer does not affect the broadcast to the other peers. Additionally, the channel to that peer starts dropping messages once it becomes full, so messages do not pile up in memory (see the sketch at the end of this comment).

So I believe it is fixed, and it would be great to get confirmation of that: if you disable your custom broadcaster, please check whether propagation works as expected.
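
For illustration, here is a minimal sketch of the per-peer pattern described above. The type and field names are illustrative, not go-ethereum's actual broadcast code:

```go
package main

import "fmt"

type tx struct {
	hash string
}

type peer struct {
	id  string
	txs chan tx // buffered queue, one per peer
}

type broadcaster struct {
	peers []*peer
}

// broadcast fans the transaction out to every peer without ever blocking.
func (b *broadcaster) broadcast(t tx) {
	for _, p := range b.peers {
		select {
		case p.txs <- t: // queued for this peer
		default: // this peer's queue is full: drop instead of blocking
			fmt.Printf("dropping tx %s for slow peer %s\n", t.hash, p.id)
		}
	}
}

func main() {
	p1 := &peer{id: "peer-1", txs: make(chan tx, 2)}
	p2 := &peer{id: "peer-2", txs: make(chan tx, 8)}
	b := &broadcaster{peers: []*peer{p1, p2}}

	// Neither queue is drained here; in a real node a per-peer send loop would
	// read from p.txs and write to the network. peer-1's small queue fills up
	// and starts dropping, while peer-2 keeps receiving and nothing ever blocks.
	for i := 0; i < 4; i++ {
		b.broadcast(tx{hash: fmt.Sprintf("0x%02x", i)})
	}
}
```

The key point is the select with a default case: a full per-peer queue drops the message for that peer only, instead of blocking the sender or accumulating unbounded memory.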

@vogelito
Author

vogelito commented Jun 5, 2018

Sounds good, we'll do that and report back

@stale

stale bot commented Jun 6, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the status:inactive label Jun 6, 2019
@adamschmideg
Contributor

I assume this issue is already resolved in the current version. If you experience a similar problem, please open a new issue.

@vogelito
Author

Sorry for never reporting back. I can confirm that we no longer see this issue. Thanks, as always, for your help.
