can‘t connect to ws server in v0.7.0 #500

Closed
huahuayu opened this issue May 21, 2022 · 7 comments

huahuayu commented May 21, 2022

Env

cronos: 0.6.5 and 0.7.0

Issue

Can't subscribe to newHead and newTxs:

SubscribeNewHeads / newTxs from golang

dial tcp <my_ip>:8546: connect: connection reset by peer ws://<my_ip>:8546

Subscribe from wscat

wscat -c ws://<my_ip>:8546
error: connect ECONNRESET <my_ip>:8546

Behavior in v0.6.5

After a restart you can connect to the ws server, but it only works for 5-10 minutes; then the error returns and another restart is needed.

Behavior in v0.7.0

Today I upgraded to v0.7.0. The longest working stretch I have recorded is about 1 hour. Within that hour I could subscribe to newHead and newTx, and I tested it many times, so I thought it had been fixed in v0.7.0.

But when I tried again just now, I couldn't connect to the ws server anymore. Even restarting cronosd doesn't help.

The issue is still there.

@yihuang Please help to check, thx.

huahuayu changed the title from "can connect to ws server in v0.7.0" to "can‘t connect to ws server in v0.7.0" on May 21, 2022

huahuayu (Author) commented

What do you think the problem is? I am willing to dig into the issue, so please share your findings.

huahuayu (Author) commented

My findings

tendermint/state/txindex/indexer_service.go uses unbuffered channels for the block header and tx subscriptions, so a send on them can block:

	blockHeadersSub, err := is.eventBus.SubscribeUnbuffered(
		context.Background(),
		subscriber,
		types.EventQueryNewBlockHeader)
	if err != nil {
		return err
	}

	txsSub, err := is.eventBus.SubscribeUnbuffered(context.Background(), subscriber, types.EventQueryTx)
	if err != nil {
		return err
	}

In tendermint/libs/pubsub/pubsub.go I did observe the send of an event message getting blocked; execution reaches the line marked with --> and stays there.

func (state *state) send(msg interface{}, events map[string][]string) error {
	for qStr, clientSubscriptions := range state.subscriptions {
		q := state.queries[qStr].q

		match, err := q.Matches(events)
		if err != nil {
			return fmt.Errorf("failed to match against query %s: %w", q.String(), err)
		}

		if match {
			for clientID, subscription := range clientSubscriptions {
				if cap(subscription.out) == 0 {
					// block on unbuffered channel
-->					subscription.out <- NewMessage(msg, events)
				} else {
					// don't block on buffered channels
					select {
					case subscription.out <- NewMessage(msg, events):
					default:
						state.remove(clientID, qStr, ErrOutOfCapacity)
					}
				}
			}
		}
	}

	return nil
}

Solution

I changed both the blockHeadersSub and txsSub subscriptions from is.eventBus.SubscribeUnbuffered to buffered channels. So far so good; I will keep observing for a while.

huahuayu (Author) commented

Three days have passed and it still works. @yihuang

yihuang (Collaborator) commented May 24, 2022

> Three days passed, still works. @yihuang

Awesome, so the issue is a deadlock on the unbuffered channel? Can you open a PR to tendermint directly?

huahuayu (Author) commented

I don't know how to effectively reproduce the issue and am not sure if there are other side effects.

JayT106 (Collaborator) commented May 24, 2022

Hi @huahuayu, I think blocking on the unbuffered channel was designed for the indexer service in Tendermint: it guarantees that every event is processed by the indexer. If the indexer is under heavy I/O load, it will certainly block the pubsub module temporarily.

What are your experimental_websocket_write_buffer_size and experimental_subscription_buffer_size in config.toml?
They shouldn't be 0, so that you get buffered channel subscriptions.

Do you need the indexer service on the node? If not, maybe you can set it to null and see if this issue still happens.

yihuang (Collaborator) commented Sep 27, 2022

I think the ws server issue will eventually be fixed by this solution: #665

yihuang closed this as completed on Sep 27, 2022