Websocket timeout issues #2026
How big do you expect the returned data-set to be? In megabytes?
Changing this timeout to
Hmm... that looks sketchy! I will investigate.
We're going to make a change so that the 3 second timeout will only happen during unit tests. Websocket connections from the loopback address will get a 30 second timeout.
Sounds good to me.
Looks like the same issue crops up more sporadically, one or two times a day:
Is it because you are requesting data that takes more than 30 seconds? Try changing the timeout to 5 minutes?
After the first 10 or 15 seconds at startup, I've done all the
So could this be that... it actually timed out due to inactivity?
Well, my code got disconnected last time around ledger 27881937. It doesn't seem like there were any large gaps between ledgers around that time, so I assume the stream data should have been resetting the timeout.
If a client goes idle, rippled should issue a websocket ping every so often to keep the connection alive. We are looking into it. Probably a bug in the server.
Try giving this a spin? https://github.com/vinniefalco/rippled/tree/0.50.3 Thanks
Sorry about that, Visual Studio strikes again. I've fixed the error (hopefully), waiting on Travis now. The branch has been rewritten.
Seems to close the connection even more aggressively now (both connections within around 5 seconds). Can you replicate my original issue at your end using 0.50.2?
I know this one works
This one also works, tested it using a browser script. It stays connected even when idle. I am not sure what your issue is. It might be something other than an idle timeout.
I tried your 0.50.3 tag and it disconnected more quickly as described. I'm not sure a browser implementation of a websockets client is going to cover all bases. In my bot, using Go, I'm also pinging every 60 seconds: https://github.com/rubblelabs/ripple/blob/master/websockets/remote.go#L552-L558 This is a regression as 0.40 could keep running for weeks/months at a time :-) If you need a Go repro I can knock one together tomorrow, but you'd have to install Go to build it.
Well, 0.40.0 used websocketpp while 0.50.0 uses Beast. It's not surprising that there's a bug. We'll figure it out!
Dat's da moonshine!
@donovanhide is your Go websocket client responding to pings (I see that it sends them but does it also reply)? rippled will not reset its timer just because it hears a ping. It only resets the timer if it gets a pong in response to a sent ping.
Default behaviour is to reply to all pings with a pong: https://godoc.org/github.com/gorilla/websocket#hdr-Control_Messages I'll try and get a barebones repro built.
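A minimal sketch of what such a barebones repro might look like with gorilla/websocket, as used in the linked bot; the endpoint, port, and intervals below are placeholders rather than anything from this thread:

```go
package main

import (
	"log"
	"time"

	"github.com/gorilla/websocket"
)

func main() {
	// Placeholder local rippled websocket endpoint.
	conn, _, err := websocket.DefaultDialer.Dial("ws://127.0.0.1:6006", nil)
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	// gorilla/websocket answers incoming pings with pongs automatically;
	// log incoming pongs so we can see the server replying to our pings.
	conn.SetPongHandler(func(string) error {
		log.Println("pong from server")
		return nil
	})

	// Ping every 60 seconds, as the bot does, but otherwise stay idle.
	go func() {
		for range time.Tick(60 * time.Second) {
			deadline := time.Now().Add(10 * time.Second)
			if err := conn.WriteControl(websocket.PingMessage, nil, deadline); err != nil {
				log.Println("ping failed:", err)
				return
			}
		}
	}()

	// Block on reads; control frames are processed inside ReadMessage.
	// If the server drops the idle connection, this returns an error.
	for {
		if _, _, err := conn.ReadMessage(); err != nil {
			log.Fatal("disconnected: ", err)
		}
	}
}
```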
See #2032
Is this fix not going to be in 0.60?
There were two versions of the fix: a good one, and a great one. The great one includes improvements to Beast that allow concurrent ping and write operations. It's being sorted out so it can go into some version.
The fix Vinnie describes will go into 0.70.0.
I'm still getting intermittent disconnections with 0.70.0. This sounds like other people experiencing the same: https://www.xrpchat.com/topic/3341-websocket-closing-ripple-lib
@donovanhide Were you testing with 0.70.0-b2? The fix should be 15f969a.
Yep, can't paste in the git log, but that's the exact commit I'm using. Gaps are longer between disconnections, but it still occurs.
"closing slow client" means the Websocket write queue filled up. Writing to the socket outperformed sending. We currently impose a limit of 100 queue items before considering the client too slow and disconnecting. |
Seems a bit aggressive if the plan is to process 1000 tx/sec :-) |
@donovanhide I agree, something ain't right there. The server implementation needs to be redesigned anyway. But in this case i think you are really too slow, it means that the other end is not reading the data fast enough. You should be able to read on your end fast enough to keep the queue empty. |
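One way to keep the client side reading fast enough, sketched in Go with gorilla/websocket: a dedicated goroutine drains the socket into a buffered channel so slow per-message processing never stalls the reads. The helper name and channel capacity are illustrative, not from the thread:

```go
package client

import "github.com/gorilla/websocket"

// drain reads frames as fast as the socket delivers them and hands them off to
// a buffered channel, so slow per-message processing cannot back up the
// server's write queue. If the buffer also fills, reads stall again, so size
// it for the worst burst you expect.
func drain(conn *websocket.Conn) <-chan []byte {
	msgs := make(chan []byte, 1024)
	go func() {
		defer close(msgs)
		for {
			_, data, err := conn.ReadMessage()
			if err != nil {
				return // connection closed or errored
			}
			msgs <- data
		}
	}()
	return msgs
}
```

The consumer then ranges over the returned channel and can take as long as it likes per message without slowing down the socket reads.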
My code has been happily running for over two years, so I'm not sure I want to optimise it just because rippled changed its websocket library.
@donovanhide That is the limit of a single Websocket write queue. It's unusual for a queue to get beyond a few items in my experience. I can work with you to figure out why it happened. @vinniefalco Another possibility is that the server's upstream bandwidth was momentarily saturated.
If rippled queues more than 100 items at a time, it would cause that problem. @donovanhide Are you running your own server? You could increase the queue limit to 10,000 or remove it completely as a test.
Sounds like it should be configurable. My code is the only client of that server. I need to add each item of metaData that I receive into my version of the ledger state. I could buffer them all and then process them, but that seems like quite a lot of work just because the number 100 was hard-coded.
@donovanhide How do you prevent a memory exhaustion attack where someone subscribes to a million streams and then reads only one byte at a time per second on the connection?
Here's a typical snapshot of my time budget for each ledger:
In that situation, you leave the send queue limit at a low value. For private servers, it's not an issue.
Or you put an nginx websocket proxy in front of it and configure the rate limiter :-)
Oh, I like that much better.
@donovanhide Can you please try a large queue limit size to see if it eliminates the problem? It might be a good idea to also log the max size the queue grows to.
Just to be clear, a rate-limited nginx websocket proxy is a solution to attack prevention, not my slow transaction stream message reading problem :-)
@donovanhide Thank you. I will replay that ledger and reproduce on my end. What does your RPC subscription look like?
Actually, I've just reviewed my code (it's been a while...) and seen that I am buffering all received messages in the transaction stream for a ledger before processing them, which suggests that the roughly 70ms it typically takes to receive a ledger is not down to what my code is doing, as far as I can see. It would be interesting to see how long it takes for you to receive a full ledger's worth of transactions. That is, examine TxCount in the ledger message, start a timer, receive TxCount transaction messages, stop the timer. Ledger 28,696,629 was another biggie :-)
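A rough Go sketch of that measurement, assuming the stream frames expose the count as a txn_count field (the JSON field names here are assumptions, not confirmed by the thread), and reusing a channel of raw frames like the drain helper above:

```go
package client

import (
	"encoding/json"
	"log"
	"time"
)

// streamMsg carries only the two fields this measurement needs; the JSON
// field names ("type", "txn_count") are assumptions about the stream shape.
type streamMsg struct {
	Type     string `json:"type"`
	TxnCount int    `json:"txn_count"`
}

// timeLedgers implements the measurement described above: note txn_count when
// a ledgerClosed message arrives, then time how long it takes to read that
// many transaction messages from the same stream of raw frames.
func timeLedgers(msgs <-chan []byte) {
	var (
		pending int       // transaction messages still expected for the current ledger
		started time.Time // when the ledgerClosed message arrived
	)
	for raw := range msgs {
		var m streamMsg
		if err := json.Unmarshal(raw, &m); err != nil {
			continue
		}
		switch m.Type {
		case "ledgerClosed":
			pending = m.TxnCount
			started = time.Now()
		case "transaction":
			if pending > 0 {
				pending--
				if pending == 0 {
					log.Printf("full ledger's transactions received in %v", time.Since(started))
				}
			}
		}
	}
}
```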
Yes, I get that. I would rather rippled focus on what it does best, implementing RCL, and offload security solutions to other products that specialize in it. I don't think rippled can compete with nginx when it comes to offering versatility for managing network resources (nor would I want it to).
Now it seems stable. My bots lost around 100k XRP...but fine...maybe I should stop this "business" :)
Mmmm...it is not true...it is not stable.
@tuloski Are you running your own rippled? If so, which version and what is it hosted on?
No, I'm using Ripple's public servers.
What does your RPC subscription look like?
I'm using Ripple-lib...
@tuloski I would like to figure out why you are getting disconnected. Would you please share with me the relevant sections of your JS script so that I may use them to debug on my end? I can be reached at
Well, it's Ripple-lib 0.12.6: https://github.com/ripple/ripple-lib/tree/0.12.6
This is what I'm getting: Error: not opened. The servers I'm connected to are: wss://s1.ripple.com:443 and wss://s-west.ripple.com:443
@tuloski What error message or code do you receive when you are disconnected?
I don't know...it doesn't report the error.
With 0.50.2, repeated calls to ledger_data whilst also streaming the ledger and transaction streams leads to timeouts in the Server code in rippled with log_level set to trace. Problem is not seen in 0.40.
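For reference, a minimal Go sketch of the access pattern described in this report, built on gorilla/websocket; the endpoint, page limit, and message handling are simplified placeholders rather than the reporter's actual code:

```go
package main

import (
	"encoding/json"
	"log"

	"github.com/gorilla/websocket"
)

func main() {
	// Placeholder local rippled websocket endpoint.
	conn, _, err := websocket.DefaultDialer.Dial("ws://127.0.0.1:6006", nil)
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	// Subscribe to the ledger and transaction streams.
	sub := map[string]interface{}{
		"id":      1,
		"command": "subscribe",
		"streams": []string{"ledger", "transactions"},
	}
	if err := conn.WriteJSON(sub); err != nil {
		log.Fatal(err)
	}

	// Repeatedly page through the ledger with ledger_data on the same
	// connection while the stream messages keep arriving.
	var marker interface{}
	for {
		req := map[string]interface{}{
			"id":           2,
			"command":      "ledger_data",
			"ledger_index": "validated",
			"limit":        2048,
		}
		if marker != nil {
			req["marker"] = marker
		}
		if err := conn.WriteJSON(req); err != nil {
			log.Fatal(err)
		}

		// Simplified: read until the ledger_data response appears, skipping
		// the interleaved stream messages.
		for {
			_, raw, err := conn.ReadMessage()
			if err != nil {
				log.Fatal("disconnected: ", err)
			}
			var resp struct {
				ID     int `json:"id"`
				Result struct {
					Marker interface{} `json:"marker"`
				} `json:"result"`
			}
			if json.Unmarshal(raw, &resp) == nil && resp.ID == 2 {
				// A nil marker means paging finished; the next pass starts over.
				marker = resp.Result.Marker
				break
			}
		}
	}
}
```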