Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

main: fix #15858 by setting dropPeerFn in copydb #15992

Merged
merged 1 commit into from
Feb 5, 2018

Conversation

holiman
Copy link
Contributor

@holiman holiman commented Jan 30, 2018

Fixes #15858 by setting the peer-drop-function to non-nil.

@holiman
Copy link
Contributor Author

holiman commented Jan 30, 2018

Test:

INFO [01-30|09:02:23] Imported new block receipts              count=4   elapsed=123.965µs bytes=1096 number=6   hash=1f1aed…6b326e ignored=0
WARN [01-30|09:02:23] Downloader tried to fake-peer; ignoring. peer=local
INFO [01-30|09:02:23] Imported new block headers               count=192 elapsed=29.385ms  number=384 hash=d3d5d5…c79cf3 ignored=0

dl := downloader.New(syncmode, chainDb, new(event.TypeMux), chain, nil, nil)

dl := downloader.New(syncmode, chainDb, new(event.TypeMux), chain, nil, func(id string) {
log.Warn("Downloader tried to fake-peer; ignoring.", "peer", id)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"fake" -> "drop"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...doh

@karalabe
Copy link
Member

All in all I'm a bit unsure about this PR:

  • If we're syncing with ourselves, why do we want to drop? That seems to point to a bug in the downloader and which should rather be fixed.
  • Also my gut feeling is that this PR would break some internal downloader logic which assumes that the peer will be gone and can be cleaned up after it vs. this code just ignoring it. I'm afraid of some unforeseen side effects.

@holiman
Copy link
Contributor Author

holiman commented Jan 30, 2018

If we're syncing with ourselves, why do we want to drop?

In my case, it occurred when I was doing a measurement using @fjl's memsize tool, which stops everything and scans memory for a minute or so. I'm assuming that when it starts again, the downloader suddenly finds that the time has progressed, and that the local peer obviously hasn't responded in time.

For the other cases, I'm assuming disk IO problems.

I'm afraid of some unforeseen side effects.

Well, worse than panic ?

@karalabe
Copy link
Member

For the other cases, I'm assuming disk IO problems.

Fair point.

Well, worse than panic ?

Ok, I can live with this PR if you can "prove" that a noop drop won't corrupt the downloader. I.e. please skim through the downloader and make sure there won't be something going wrong.

@holiman
Copy link
Contributor Author

holiman commented Jan 30, 2018

Ok, I can live with this PR if you can "prove" that a noop drop won't corrupt the downloader. I.e. please skim through the downloader and make sure there won't be something going wrong.

I added a snippet so that the drop-case runs immediately after it receives a packet. It stops importing headers/receipts, and continues only downloading states... myeah it's not great...

@holiman
Copy link
Contributor Author

holiman commented Jan 30, 2018

So if I actually do drop it, we get this:

INFO [01-30|10:21:27] Imported new block headers               count=1344 elapsed=161.532ms number=27648 hash=e3446d…019e62 ignored=0
WARN [01-30|10:21:27] Downloader wants to drop fake-peer       peer=local
WARN [01-30|10:21:27] Node data write error                    err="state node feab52…7a51be failed with all peers (0 tries, 0 peers)"
INFO [01-30|10:21:27] Imported new block headers               count=2048 elapsed=159.505ms number=29696 hash=1d7dd6…675b9f ignored=0
WARN [01-30|10:21:28] Rolled back headers                      count=2048 header=29696->27648 fast=26304->26304 block=0->0
WARN [01-30|10:21:28] Synchronisation failed, dropping peer    peer=local err="no peers available or all tried for download"
WARN [01-30|10:21:28] Downloader wants to drop fake-peer       peer=local
ERROR[01-30|10:21:28] Failed to unregister sync peer           peer=local err="peer is not registered"
no peers available or all tried for download

@holiman
Copy link
Contributor Author

holiman commented Jan 30, 2018

Ok, maybe this is the best solution:

diff --git a/cmd/geth/chaincmd.go b/cmd/geth/chaincmd.go
index e094484..918ce9e 100644
--- a/cmd/geth/chaincmd.go
+++ b/cmd/geth/chaincmd.go
@@ -298,9 +298,7 @@ func copyDb(ctx *cli.Context) error {
 
        syncmode := *utils.GlobalTextMarshaler(ctx, utils.SyncModeFlag.Name).(*downloader.SyncMode)
 
-       dl := downloader.New(syncmode, chainDb, new(event.TypeMux), chain, nil, func(id string) {
-               log.Warn("Downloader tried to drop fake-peer; ignoring.", "peer", id)
-       })
+       dl := downloader.New(syncmode, chainDb, new(event.TypeMux), chain, nil,nil)
 
        // Create a source peer to satisfy downloader requests from
        db, err := ethdb.NewLDBDatabase(ctx.Args().First(), ctx.GlobalInt(utils.CacheFlag.Name), 256)
[~/go/src/github.com/ethereum/go-ethereum]
#git diff eth/downloader/downloader.go
diff --git a/eth/downloader/downloader.go b/eth/downloader/downloader.go
index b338129..1c393d3 100644
--- a/eth/downloader/downloader.go
+++ b/eth/downloader/downloader.go
@@ -853,6 +853,11 @@ func (d *Downloader) fetchHeaders(p *peerConnection, from uint64) error {
                        getHeaders(from)
 
                case <-timeout.C:
+
+                       if d.dropPeer == nil{
+                               p.log.Warn("Downloader wants to drop peer, but peerdrop-function is not set")
+                               break
+                       }
                        // Header retrieval timed out, consider the peer bad and drop
                        p.log.Debug("Header request timed out", "elapsed", ttl)
                        headerTimeoutMeter.Mark(1)

@holiman holiman force-pushed the fix_15858 branch 2 times, most recently from 12950fe to 50e7430 Compare January 30, 2018 09:34
@holiman
Copy link
Contributor Author

holiman commented Jan 30, 2018

Ok, updated the fix to check if function is nil instead, and if so just log a warn and ignore the timeout

Copy link
Member

@karalabe karalabe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally ok. However I'd like to ask you to add a comment above all new log.Warn messages, along the lines of:

// The dropPeer method is nil only for copydb, which can call it if compaction hits at the wrong time

@@ -853,6 +856,11 @@ func (d *Downloader) fetchHeaders(p *peerConnection, from uint64) error {
getHeaders(from)

case <-timeout.C:

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please drop this empty line here.

@karalabe karalabe added this to the 1.8.0 milestone Feb 2, 2018
@holiman
Copy link
Contributor Author

holiman commented Feb 2, 2018

Addressed concerns, and squashed ☑️

Copy link
Member

@karalabe karalabe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@karalabe karalabe merged commit bc0666f into ethereum:master Feb 5, 2018
prestonvanloon pushed a commit to prestonvanloon/go-ethereum that referenced this pull request Apr 2, 2018
mariameda pushed a commit to NiluPlatform/go-nilu that referenced this pull request Aug 23, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants