-
Notifications
You must be signed in to change notification settings - Fork 20.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
I find a bug in event/feed.go #18021
Comments
I think the send lock is correct because only one send should be running at any time. Send blocks until all channels have received the value. |
But If one send is blocked,all the next send would be blocked.And I do not understand why the send should run in sequence but not in parallel. |
I have modify the send function as follow,it can solve the blocked problem.The code is as follow: func (f *Feed) Send(value interface{}) (nsent int) {
rvalue := reflect.ValueOf(value)
f.once.Do(f.init)
//<-f.sendLock
sendCases := caseList{{Chan: reflect.ValueOf(f.removeSub), Dir: reflect.SelectRecv}}
sendCases = append(sendCases, f.inbox...)
// Set the sent value on all channels.
for i := firstSubSendCase; i < len(sendCases); i++ {
sendCases[i].Send = rvalue
}
// Send until all channels except removeSub have been chosen. 'cases' tracks a prefix
// of sendCases. When a send succeeds, the corresponding case moves to the end of
// 'cases' and it shrinks by one element.
cases := sendCases
//LOOP:
for {
// Fast path: try sending without blocking before adding to the select set.
// This should usually succeed if subscribers are fast enough and have free
// buffer space.
for i := firstSubSendCase; i < len(cases); i++ {
if cases[i].Chan.TrySend(rvalue) {
nsent++
cases = cases.deactivate(i)
i--
}
}
if len(cases) == firstSubSendCase {
break
}
// Select on all the receivers, waiting for them to unblock.
chosen, recv, _ := reflect.Select(cases)
//<-f.removeSub
if chosen == 0 {
index := f.sendCases.find(recv.Interface())
f.sendCases = f.sendCases.delete(index)
if index >= 0 && index < len(cases) {
// Shrink 'cases' too because the removed case was still active.
cases = f.sendCases[:len(cases)-1]
}
} else {
cases = cases.deactivate(chosen)
nsent++
}
}
// Forget about the sent value and hand off the send lock.
for i := firstSubSendCase; i < len(f.sendCases); i++ {
f.sendCases[i].Send = reflect.Value{}
}
//f.sendLock <- struct{}{}
return nsent
}_ The pre send maybe blocked,but it could not block the next send.And I make the public access as less as possible. |
The way to call the send should also very careful.Let me see the function in blockchain.go: func (bc *BlockChain) PostChainEvents(events []interface{}, logs []*types.Log) {
log.Info("lzj-log PostChainEvents", "events len",len(events))
// post event logs for further processing
if logs != nil {
bc.logsFeed.Send(logs)
}
for _, event := range events {
switch ev := event.(type) {
case ChainEvent:
log.Info("lzj-log send ChainEvent")
bc.chainFeed.Send(ev)
case ChainHeadEvent:
log.Info("lzj-log send ChainHeadEvent")
bc.chainHeadFeed.Send(ev)
case ChainSideEvent:
log.Info("lzj-log send ChainSideEvent")
bc.chainSideFeed.Send(ev)
}
}
} This function send three event in the for loop.If the pre send is blocked,the function would be blocked,and the next send would never be sent.We can modify the function as follow: func (bc *BlockChain) PostChainEvents(events []interface{}, logs []*types.Log) {
log.Info("lzj-log PostChainEvents", "events len",len(events))
// post event logs for further processing
if logs != nil {
bc.logsFeed.Send(logs)
}
for _, event := range events {
switch ev := event.(type) {
case ChainEvent:
log.Info("lzj-log send ChainEvent")
go bc.chainFeed.Send(ev)//lzj add go
case ChainHeadEvent:
log.Info("lzj-log send ChainHeadEvent")
go bc.chainHeadFeed.Send(ev)//lzj add go
case ChainSideEvent:
log.Info("lzj-log send ChainSideEvent")
go bc.chainSideFeed.Send(ev)//lzj add go
}
}
} I just put the send function in a separate go routine.so,the event would be sent all. |
@liuzhijun23 Can you make a patch or PR then everyone can confirm your fix. Thank you! |
|
yes, I can reproduce sometimes but still dont know what is the reason. I try to fix the problem as your code but it didnot work properly. Can you make a pull request to summary your fix? @liuzhijun23 Thank you! |
All my code change is shown above.The normal node maybe produce three errors:1,Do not broadcast transaction to the miner.2. Do not synchronize block from miner. 3. Do not update pendding transactions state.The reason of all these problem is that the Feed event blocked.You can add logs in the source codes where you have doubt to trace the program execute.You can find the reason. @hadv |
This is one of my problem resolve records in Chinese: |
Thank you @liuzhijun23 I will investigate more. Even I use separate goroutine for sending events in the PostChainEvents() but I still face the problem when all the network cannot mine any new block. They just propagate the tnx between each others and the new block only be mined when the txpool reach the maximum pool size. |
@hadv I use POA consensus and I set the block period 5s.The block would always be mined with 5s period. |
I'm also using POA but with block period 2s. Yes, normally network will mint new block every ~2s but when making the loading there's a case that I mentioned above at which all the network stuck at a block number and only can continue mint new block when the total pending tnx reach txpool size or when stoping sending more tnx. it seems that all the node busy with tnx execution and postpone the block mining. I think there's bugs somewhere. |
tracing the geth log when the network is stuck, I found that all the miners re-submit mining work continuously at the same block number that means new block cannot be created recommit block every 3s by default due to this PR #17413
|
Are there many miners in your blockchain?You can make only one miner node and one normal node in your blockchain to make it simple. |
yes, I'm running on 54 miners. There should be problem make the miner stuck in |
You can make it simple first to locate the bug. |
@liuzhijun23 your fix still make goroutine leak on this line as I tested on my private chain
|
The original issue is about event.Feed. The design of event.Feed is to send values sequentially like a go channel, but to all subscribers. If mining gets stuck for concurrency reasons, please open a new issue. |
@liuzhijun23 我也遇到了同样的问题,但是我用了你的代码修改了,但是还是不起作用。 |
transaction getting stuck in tx mining pool for more than 1/2 hour , its a private blockchain i am running, and a single node im doing mining, please help me |
try to use |
I read the resource code of event/feed.go in go-ethereum Version 1.8.17,I find there is a bug in Send function。Let me see the code about sendLock:
`func (f *Feed) Send(value interface{}) (nsent int) {
.........
f.once.Do(f.init)
//1. read sendLock chan
<-f.sendLock
}`
The bug is ,if the for loop is blocked,the step 3 would not excute, so the sendLock would result to
the next send operation blocked.Maybe there is no need to use sendLock chan in this function.
The text was updated successfully, but these errors were encountered: