trie: Fix concurrent map access on trie.dirties #23674

piersy · 2021-10-01T15:14:10Z

The dirties map of trie.Database was not read protected inside the
commit function, since the trie.Database is called from many go-routines
this sometimes lead to concurrent map read and write errors.

The solution is to read lock over trie.dirties in commit

The dirties map of trie.Database was not read protected inside the commit function, since the trie.Database is called from many go-routines this sometimes lead to concurrent map read and write errors. The solution is to read lock over trie.dirties in commit

holiman · 2021-10-01T18:14:52Z

The commit caller should already hold the lock (the docstring says "commit is the private locked version of Commit."). If so, this PR introduces a double-lock.
Do you actually have a stack trace from a read/write error? E.g. running with the race detector?

piersy · 2021-10-04T11:34:03Z

Hi @holiman, I do have a stack trace, it's from https://github.com/celo-org/celo-blockchain (a fork) but I think it is still representative of the go-ethereum behaviour. I've added it below in case that helps.

So I'm not sure whether the comment or the implementation of Commit is at fault here, but if you look at Commit you can see that it does not acquire the lock before calling commit

go-ethereum/trie/database.go

Lines 695 to 720 in b5a3d8e

    
           func (db *Database) Commit(node common.Hash, report bool, callback func(common.Hash)) error { 
        
           	// Create a database batch to flush persistent data out. It is important that 
        
           	// outside code doesn't see an inconsistent state (referenced data removed from 
        
           	// memory cache during commit but not yet in persistent storage). This is ensured 
        
           	// by only uncaching existing data when the database write finalizes. 
        
           	start := time.Now() 
        
           	batch := db.diskdb.NewBatch() 
        
           	// Move all of the accumulated preimages into a write batch 
        
           	if db.preimages != nil { 
        
           		rawdb.WritePreimages(batch, db.preimages) 
        
           		// Since we're going to replay trie node writes into the clean cache, flush out 
        
           		// any batched pre-images before continuing. 
        
           		if err := batch.Write(); err != nil { 
        
           			return err 
        
           		} 
        
           		batch.Reset() 
        
           	} 
        
           	// Move the trie itself into the batch, flushing if enough data is accumulated 
        
           	nodes, storage := len(db.dirties), db.dirtiesSize 
        
           	uncacher := &cleaner{db} 
        
           	if err := db.commit(node, batch, uncacher, callback); err != nil { 
        
           		log.Error("Failed to commit trie from trie database", "err", err) 
        
           		return err 
        
           	}

fatal error: concurrent map read and map write

goroutine 19213 [running]:
runtime.throw(0x1404e87, 0x21)
	/usr/local/go/src/runtime/panic.go:1117 +0x72 fp=0xc006986ef8 sp=0xc006986ec8 pc=0x4def12
runtime.mapaccess2(0x12ab400, 0xc012f84360, 0xc006986fa0, 0x20, 0xc01a5ad5c0)
	/usr/local/go/src/runtime/map.go:469 +0x255 fp=0xc006986f38 sp=0xc006986ef8 pc=0x4b5ff5
github.com/celo-org/celo-blockchain/trie.(*Database).commit(0xc012938f70, 0xb4437b13ba81435f, 0xa8bb0a7fd9a4ecb4, 0xa5876800b1d37e3a, 0xbcd94e42d5940f2f, 0x1707288, 0xc01a5fe540, 0xc015b1e0f0, 0x0, 0x0, ...)
	/home/piersy/projects/celo-blockchain/trie/database.go:781 +0x9f fp=0xc006987030 sp=0xc006986f38 pc=0x7715df
github.com/celo-org/celo-blockchain/trie.(*Database).commit.func1(0xb4437b13ba81435f, 0xa8bb0a7fd9a4ecb4, 0xa5876800b1d37e3a, 0xbcd94e42d5940f2f)
	/home/piersy/projects/celo-blockchain/trie/database.go:788 +0x85 fp=0xc0069870a0 sp=0xc006987030 pc=0x782a85
github.com/celo-org/celo-blockchain/trie.forGatherChildren(0x16f9d70, 0xc018c37ed8, 0xc006987520)
	/home/piersy/projects/celo-blockchain/trie/database.go:209 +0x1f8 fp=0xc006987230 sp=0xc0069870a0 pc=0x76c218
github.com/celo-org/celo-blockchain/trie.forGatherChildren(0x16f9d98, 0xc019096240, 0xc006987520)
	/home/piersy/projects/celo-blockchain/trie/database.go:206 +0xe5 fp=0xc0069873c0 sp=0xc006987230 pc=0x76c105
github.com/celo-org/celo-blockchain/trie.(*cachedNode).forChilds(0xc0170e3f80, 0xc006987520)
	/home/piersy/projects/celo-blockchain/trie/database.go:194 +0x110 fp=0xc006987470 sp=0xc0069873c0 pc=0x76c010
github.com/celo-org/celo-blockchain/trie.(*Database).commit(0xc012938f70, 0x5230d078f0e96fa3, 0xafc9ae992a2ca510, 0x29bdd84110a0a1c5, 0x3a45b8c51e4c1dd3, 0x1707288, 0xc01a5fe540, 0xc015b1e0f0, 0x0, 0x41, ...)
	/home/piersy/projects/celo-blockchain/trie/database.go:786 +0x16e fp=0xc006987568 sp=0xc006987470 pc=0x7716ae
github.com/celo-org/celo-blockchain/trie.(*Database).Commit(0xc012938f70, 0x5230d078f0e96fa3, 0xafc9ae992a2ca510, 0x29bdd84110a0a1c5, 0x3a45b8c51e4c1dd3, 0x1, 0x0, 0x0, 0x0)
	/home/piersy/projects/celo-blockchain/trie/database.go:740 +0x4d1 fp=0xc006987778 sp=0xc006987568 pc=0x770af1
github.com/celo-org/celo-blockchain/core.(*BlockChain).Stop(0xc016113200)
	/home/piersy/projects/celo-blockchain/core/blockchain.go:1021 +0x685 fp=0xc006987a78 sp=0xc006987778 pc=0xb8a245
github.com/celo-org/celo-blockchain/eth.(*Ethereum).Stop(0xc00f184b00, 0xc01a2feb40, 0x4acb85)
	/home/piersy/projects/celo-blockchain/eth/backend.go:592 +0x125 fp=0xc006987ac8 sp=0xc006987a78 pc=0xda0805
github.com/celo-org/celo-blockchain/node.(*Node).stopServices(0xc00f1ef790, 0xc012d3c890, 0x1, 0x1, 0xb00c0b, 0xc00fff1920)
	/home/piersy/projects/celo-blockchain/node/node.go:309 +0xc2 fp=0xc006987b48 sp=0xc006987ac8 pc=0xd66f62
github.com/celo-org/celo-blockchain/node.(*Node).Close(0xc00f1ef790, 0x0, 0x0)
	/home/piersy/projects/celo-blockchain/node/node.go:221 +0x147 fp=0xc006987be8 sp=0xc006987b48 pc=0xd66167
github.com/celo-org/celo-blockchain/test.(*Node).Close(0xc0004fbe40, 0x0, 0x0)
	/home/piersy/projects/celo-blockchain/test/node.go:317 +0x465 fp=0xc006987d20 sp=0xc006987be8 pc=0xdf7345
github.com/celo-org/celo-blockchain/test.Network.Shutdown(0xc01c6091a0, 0x4, 0x4)
	/home/piersy/projects/celo-blockchain/test/node.go:510 +0x6b fp=0xc006987dd8 sp=0xc006987d20 pc=0xdf814b
github.com/celo-org/celo-blockchain/e2e_test_test.TestStartStopValidators(0xc00ff05b00)
	/home/piersy/projects/celo-blockchain/e2e_test/e2e_test.go:171 +0x122d fp=0xc006987f80 sp=0xc006987dd8 pc=0xdfde6d
testing.tRunner(0xc00ff05b00, 0x15fa3a8)
	/usr/local/go/src/testing/testing.go:1193 +0xef fp=0xc006987fd0 sp=0xc006987f80 pc=0x5ce42f
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1371 +0x1 fp=0xc006987fd8 sp=0xc006987fd0 pc=0x519201
created by testing.(*T).Run
	/usr/local/go/src/testing/testing.go:1238 +0x2b3

holiman · 2021-10-04T12:56:13Z

Indeed, the read of dirties during Commit/commit is not RLock-protected!

holiman

I think this is correct. @karalabe PTAL

fjl · 2021-10-08T18:45:56Z

I also think it's correct, but gotta say the locking looks a bit messed-up in trie.Database. The method comment of commit says it's the private 'locked' version of Commit, but that's not true at all. It only really takes the lock while actually writing to the database.

IMHO we should just take the lock once in Commit, it would remove any doubts about correctness here. The lock becoming available intermittently during Commit doesn't seem like a feature to me.

gballet · 2021-10-12T06:49:18Z

trie/database.go

+	db.lock.RLock()
 	node, ok := db.dirties[hash]
 	if !ok {
+		db.lock.RUnlock()
 		return nil
 	}
+	db.lock.RUnlock()


Suggested change

db.lock.RLock()

node, ok := db.dirties[hash]

if !ok {

db.lock.RUnlock()

return nil

}

db.lock.RUnlock()

db.lock.RLock()

node, ok := db.dirties[hash]

db.lock.RUnlock()

if !ok {

return nil

}

Thank you for this @gballet 👍

rjl493456442 · 2021-10-12T08:41:10Z

Btw, the dirty read access in Cap function is also not read lock protected. If we eventually decide to fix it, please fix the Cap too.

karalabe · 2021-10-12T08:43:35Z

This PR doesn't seem right to me.

The trie.Database is meant to be "mutated" sequentially: insert trie nodes; commit, insert trie nodes, commit. This is even mentioned in the type docs:

// Note, the trie Database is **not** thread safe in its mutations, but it **is**
// thread safe in providing individual, independent node access. The rationale
// behind this split design is to provide read access to RPC handlers and sync
// servers even while the trie is executing expensive garbage collection.

If your code crashes, it means that this requirement is violated somewhere. Your crash dump seems to be only half of the dump, it should ideally contain the other goroutine which accessed it at the same time? Perhaps there's something wrong with the shutdown and it doesn't wait for block imports to finish? Is this custom code or stock Geth?

piersy · 2021-10-12T11:51:52Z

Hi @karalabe, the stack trace was from https://github.com/celo-org/celo-blockchain (a geth fork).

Unfortunately golang doesn't guarantee showing both sides of a concurrent map access error in the stack trace (see here) there was only one side in the stack I previously posted.

It may be that the changes in celo-blockchain are violating the use of trie.Database and I can also see some places in the geth code where I think violations are occurring.

For example here commitLoop is executed in its own go routine:

go-ethereum/trie/trie.go

Lines 527 to 555 in 633e7ef

    
           func (t *Trie) Commit(onleaf LeafCallback) (common.Hash, int, error) { 
        
           	if t.db == nil { 
        
           		panic("commit called on trie with nil database") 
        
           	} 
        
           	if t.root == nil { 
        
           		return emptyRoot, 0, nil 
        
           	} 
        
           	// Derive the hash for all dirty nodes first. We hold the assumption 
        
           	// in the following procedure that all nodes are hashed. 
        
           	rootHash := t.Hash() 
        
           	h := newCommitter() 
        
           	defer returnCommitterToPool(h) 
        
           	// Do a quick check if we really need to commit, before we spin 
        
           	// up goroutines. This can happen e.g. if we load a trie for reading storage 
        
           	// values, but don't write to it. 
        
           	if _, dirty := t.root.cache(); !dirty { 
        
           		return rootHash, 0, nil 
        
           	} 
        
           	var wg sync.WaitGroup 
        
           	if onleaf != nil { 
        
           		h.onleaf = onleaf 
        
           		h.leafCh = make(chan *leaf, leafChanSize) 
        
           		wg.Add(1) 
        
           		go func() { 
        
           			defer wg.Done() 
        
           			h.commitLoop(t.db) 
        
           		}() 
        
           	}

And then here in blockchain.Stop trie.Database.Commit is called:

go-ethereum/core/blockchain.go

Lines 1012 to 1065 in 633e7ef

    
           func (bc *BlockChain) Stop() { 
        
           	if !atomic.CompareAndSwapInt32(&bc.running, 0, 1) { 
        
           		return 
        
           	} 
        
           	// Unsubscribe all subscriptions registered from blockchain. 
        
           	bc.scope.Close() 
        
           	// Signal shutdown to all goroutines. 
        
           	close(bc.quit) 
        
           	bc.StopInsert() 
        
           	// Now wait for all chain modifications to end and persistent goroutines to exit. 
        
           	// 
        
           	// Note: Close waits for the mutex to become available, i.e. any running chain 
        
           	// modification will have exited when Close returns. Since we also called StopInsert, 
        
           	// the mutex should become available quickly. It cannot be taken again after Close has 
        
           	// returned. 
        
           	bc.chainmu.Close() 
        
           	bc.wg.Wait() 
        
           	// Ensure that the entirety of the state snapshot is journalled to disk. 
        
           	var snapBase common.Hash 
        
           	if bc.snaps != nil { 
        
           		var err error 
        
           		if snapBase, err = bc.snaps.Journal(bc.CurrentBlock().Root()); err != nil { 
        
           			log.Error("Failed to journal state snapshot", "err", err) 
        
           		} 
        
           	} 
        
           	// Ensure the state of a recent block is also stored to disk before exiting. 
        
           	// We're writing three different states to catch different restart scenarios: 
        
           	//  - HEAD:     So we don't need to reprocess any blocks in the general case 
        
           	//  - HEAD-1:   So we don't do large reorgs if our HEAD becomes an uncle 
        
           	//  - HEAD-127: So we have a hard limit on the number of blocks reexecuted 
        
           	if !bc.cacheConfig.TrieDirtyDisabled { 
        
           		triedb := bc.stateCache.TrieDB() 
        
           		for _, offset := range []uint64{0, 1, TriesInMemory - 1} { 
        
           			if number := bc.CurrentBlock().NumberU64(); number > offset { 
        
           				recent := bc.GetBlockByNumber(number - offset) 
        
           				log.Info("Writing cached state to disk", "block", recent.Number(), "hash", recent.Hash(), "root", recent.Root()) 
        
           				if err := triedb.Commit(recent.Root(), true, nil); err != nil { 
        
           					log.Error("Failed to commit recent state trie", "err", err) 
        
           				} 
        
           			} 
        
           		} 
        
           		if snapBase != (common.Hash{}) { 
        
           			log.Info("Writing snapshot state to disk", "root", snapBase) 
        
           			if err := triedb.Commit(snapBase, true, nil); err != nil { 
        
           				log.Error("Failed to commit recent state trie", "err", err) 
        
           			} 
        
           		}

And blockchain.Stop can be called as a result of geth receiving the SIGTERM signal:

go-ethereum/cmd/utils/cmd.go

Lines 69 to 100 in 633e7ef

    
           func StartNode(ctx *cli.Context, stack *node.Node) { 
        
           	if err := stack.Start(); err != nil { 
        
           		Fatalf("Error starting protocol stack: %v", err) 
        
           	} 
        
           	go func() { 
        
           		sigc := make(chan os.Signal, 1) 
        
           		signal.Notify(sigc, syscall.SIGINT, syscall.SIGTERM) 
        
           		defer signal.Stop(sigc) 
        
           		minFreeDiskSpace := ethconfig.Defaults.TrieDirtyCache 
        
           		if ctx.GlobalIsSet(MinFreeDiskSpaceFlag.Name) { 
        
           			minFreeDiskSpace = ctx.GlobalInt(MinFreeDiskSpaceFlag.Name) 
        
           		} else if ctx.GlobalIsSet(CacheFlag.Name) || ctx.GlobalIsSet(CacheGCFlag.Name) { 
        
           			minFreeDiskSpace = ctx.GlobalInt(CacheFlag.Name) * ctx.GlobalInt(CacheGCFlag.Name) / 100 
        
           		} 
        
           		if minFreeDiskSpace > 0 { 
        
           			go monitorFreeDiskSpace(sigc, stack.InstanceDir(), uint64(minFreeDiskSpace)*1024*1024) 
        
           		} 
        
           		<-sigc 
        
           		log.Info("Got interrupt, shutting down...") 
        
           		go stack.Close() 
        
           		for i := 10; i > 0; i-- { 
        
           			<-sigc 
        
           			if i > 1 { 
        
           				log.Warn("Already shutting down, interrupt more to panic.", "times", i-1) 
        
           			} 
        
           		} 
        
           		debug.Exit() // ensure trace and CPU profile data is flushed. 
        
           		debug.LoudPanic("boom") 
        
           	}() 
        
           }

So it seems that these 2 go routines could be concurrently attempting to modify the trie.Database, commitLoop locks over its call to trie.Database.insert but trie.Commit does not lock before trying to read the dirties map. This seems to be one way that this violation could occur in the current geth code.

Also I just noticed this ticket:

concurrent map read and map write #22892 (which seems to be this issue occurring in stock geth)

fjl · 2021-10-12T13:44:33Z

Regarding BlockChain.Stop, we have already merged another PR that should provide the proper exclusion between updates in Stop and other chain mutations.

@piersy if you found this race during your testing with celo, could you retry with the changes in go-ethereum master? This issue may already be fixed due to the improved synchronization in BlockChain.

piersy · 2021-10-13T09:00:15Z

Hi @fjl I'm assuming you meant #22853, I merged that change into https://github.com/celo-org/celo-blockchain and it does seem to have solved this issue, thank you :)

However it seems that this problem could arise again, all it would take is for some code to try and access some combination methods of trie.Database or trie.Trie concurrently. Given the number of places that trie.Database or trie.Trie are accessed it doesn't seem straightforward to verify that the usage will be safe. What do you think about just making trie.Database thread safe by simply locking over its mutating methods and read locking over its non mutating methods?

karalabe · 2021-10-13T09:08:45Z

Given the number of places that trie.Database or trie.Trie are accessed it doesn't seem straightforward to verify that the usage will be safe.

The only requirement is that you can't execute transactions and commit the state at the same time concurrently. I think that's a legit requirement: one mutates the state while another is saving it. There's no possible scenario where this can meaningfully happen outside of a programming error; so IMHO let it crash, it's a bug that needs fixing.

You also can't just call commit in between two inserts as the ref counts would be invalid (trie node insertions goes from bottom upwards, so you can end up with dangling references until all the trie nodes are inserted). Calling commit in between - even if lock protected - would corrupt the garbage collector's internal state and will lead to losing trie nodes from memory and eventually from disk.

karalabe · 2021-10-13T09:10:22Z

If this issue happens in Geth, sure, we need to fix it, but the fix isn't in the trie database, it's at the callsite.

piersy · 2021-10-13T11:18:39Z

Hi @karalabe thank you for the clarification.

I'm also in favour of crashing for programming errors, but if possible I try to ensure that programming errors are very easy to detect. In this case I ran hundreds of iterations of a test before I encountered the concurrent map modification error. In the case of geth it seems that the programming error responsible for the concurrent map access existed since at least the 18th of May when #22892 was raised.

So I'm wondering is this code safe when it is executed concurrently but the concurrent hash map error is not hit?
Also can you think of a way that this programming error could be made more prominent, so that this programming error does not go unnoticed for a long time?

fjl · 2021-10-13T12:50:07Z

While it would be nice to detect misuse of trie.Database, it isn't really possible. We don't want to add additional locking because the database is really not meant to be mutated concurrently.

I am closing this PR now because it will not be merged in this form. We are very thankful that you pushed us toward fixing the shutdown races with your end-to-end test. Please do let us know if you find similar issues in the future!

piersy mentioned this pull request Oct 1, 2021

Add start stop e2e test celo-org/celo-blockchain#1705

Merged

holiman approved these changes Oct 4, 2021

View reviewed changes

gballet reviewed Oct 12, 2021

View reviewed changes

rjl493456442 added the status:triage label Oct 12, 2021

fjl closed this Oct 13, 2021

ligi removed the status:triage label Oct 13, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

trie: Fix concurrent map access on trie.dirties #23674

trie: Fix concurrent map access on trie.dirties #23674

piersy commented Oct 1, 2021

holiman commented Oct 1, 2021 •

edited

Loading

piersy commented Oct 4, 2021

holiman commented Oct 4, 2021

holiman left a comment

fjl commented Oct 8, 2021 •

edited

Loading

gballet Oct 12, 2021

piersy Oct 12, 2021

rjl493456442 commented Oct 12, 2021

karalabe commented Oct 12, 2021

piersy commented Oct 12, 2021 •

edited

Loading

fjl commented Oct 12, 2021

piersy commented Oct 13, 2021

karalabe commented Oct 13, 2021

karalabe commented Oct 13, 2021

piersy commented Oct 13, 2021

fjl commented Oct 13, 2021

trie: Fix concurrent map access on trie.dirties #23674

trie: Fix concurrent map access on trie.dirties #23674

Conversation

piersy commented Oct 1, 2021

holiman commented Oct 1, 2021 • edited Loading

piersy commented Oct 4, 2021

holiman commented Oct 4, 2021

holiman left a comment

Choose a reason for hiding this comment

fjl commented Oct 8, 2021 • edited Loading

gballet Oct 12, 2021

Choose a reason for hiding this comment

piersy Oct 12, 2021

Choose a reason for hiding this comment

rjl493456442 commented Oct 12, 2021

karalabe commented Oct 12, 2021

piersy commented Oct 12, 2021 • edited Loading

fjl commented Oct 12, 2021

piersy commented Oct 13, 2021

karalabe commented Oct 13, 2021

karalabe commented Oct 13, 2021

piersy commented Oct 13, 2021

fjl commented Oct 13, 2021

holiman commented Oct 1, 2021 •

edited

Loading

fjl commented Oct 8, 2021 •

edited

Loading

piersy commented Oct 12, 2021 •

edited

Loading