accept4: too many open files; retrying in 1s #2621
Comments
Got this error this morning, even after setting `ulimit -n 65000`. Our RPC endpoints are Arb1 and Alchemy. The node is running on Ubuntu.
I'm not sure which connection it is exactly, but the server does not seem to be reusing some of these connections properly. Increasing the limit might work, but it feels like a temporary band-aid until there are enough single-shot requests to reach that limit as well. You can check the number of open files the next time the error occurs.
I don't know if it could be related, but just before this error appeared I got this error on two of our nodes (combined O/T), so I decided to split them to keep the O running in case the error was coming from the T. After that, the standalone O started being hit by the "too many open files" error.
This error first occurred for me on Friday. Running openSUSE 15.3; my system-wide file limit is ample, but my user-level limit was only 1024. However, after raising it to 70k+ and restarting the service in a new shell, the same error still crops up within 24 hours. Interestingly, each time I've counted the open files mentioning livepeer with `lsof -u root | grep livepeer | wc -l`, it always seems to be around the 1000+ mark. Could there be a per-process file limit as well? Anyhow, that still doesn't explain why it suddenly started hitting any limits, and why it hit so many other O's at around the same time.
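On the per-process question: RLIMIT_NOFILE is enforced per process, and a daemon started by systemd takes its limit from the unit's `LimitNOFILE=` setting rather than from an interactive shell's `ulimit`, which may be why raising the shell limit did not stick; a soft limit of 1024 (a common default) would also line up with the ~1000 open files observed. A minimal Go sketch (Linux-only, standalone, not part of go-livepeer) showing how a process can report the limit it actually sees and how many descriptors it currently holds:

```go
// fdcheck.go — standalone diagnostic sketch, not part of go-livepeer.
package main

import (
	"fmt"
	"log"
	"os"
	"syscall"
)

func main() {
	// RLIMIT_NOFILE is a per-process limit; this is what actually caps the
	// number of descriptors the process may hold, regardless of what an
	// interactive shell's `ulimit -n` was set to.
	var rl syscall.Rlimit
	if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &rl); err != nil {
		log.Fatal(err)
	}
	fmt.Printf("RLIMIT_NOFILE: soft=%d hard=%d\n", rl.Cur, rl.Max)

	// Count the descriptors this process currently has open (Linux only).
	fds, err := os.ReadDir("/proc/self/fd")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("open descriptors: %d\n", len(fds))
}
```

The same numbers for a running livepeer process are visible from outside via /proc/&lt;pid&gt;/limits and /proc/&lt;pid&gt;/fd, and for a systemd service the limit can be raised with `LimitNOFILE=` in the unit file.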
Here's what I believe happens: clients open sessions to the orchestrator over gRPC connections that are never closed, and since there is no timeout on those connections, the idle ones pile up until the process hits its open-file limit.
The simple solution is to add the timeout, and I added it in #2628. Still, technically, you would be able to cause a DoS attack if you send enough requests.
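For illustration of the general idea only (this is not the code from #2628): a Go gRPC server can be told to tear down idle connections via keepalive server parameters, so abandoned sessions eventually give their file descriptors back. The port and the idle duration below are made up for the example.

```go
package main

import (
	"log"
	"net"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/keepalive"
)

func main() {
	lis, err := net.Listen("tcp", ":8935") // port chosen for the example
	if err != nil {
		log.Fatal(err)
	}

	srv := grpc.NewServer(
		grpc.KeepaliveParams(keepalive.ServerParameters{
			// Close any connection that has been idle this long, so clients
			// that open sessions and walk away cannot pin descriptors forever.
			MaxConnectionIdle: 5 * time.Minute,
		}),
	)

	// ...service registrations would go here...

	if err := srv.Serve(lis); err != nil {
		log.Fatal(err)
	}
}
```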
I think I might have accidentally caused this issue: I am using a Node.js gRPC library to collect O performance and capabilities (by simply doing …). Hopefully the timeout is enough, but it is concerning that such a simple script would cause O's to stop functioning. I can imagine that in the future, as more B's start experimenting with the network, this issue will resurface. Is there any way we can pace the number of sessions a B can create and keep open?
Thanks for the surprise stress test @stronk-dev 😄 I was actually discussing with @leszko and @yondonfu whether this was likely caused by someone doing something malicious or accidental! Agreed that something like this shouldn't make Os stop functioning.
Yeah, my goal was to give O's a reliable way to gauge their response times in each region and compare them with other O's. There are some weird results, with response times varying a lot for some O's, so I tried to fix this by forcing a new client to be created, as I suspected they were being reused. Apologies to all the O's which were affected by this.
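The probing script here was Node.js, but the connection hygiene involved looks the same in any gRPC client: bound the dial with a deadline and close the connection when done. A hedged Go sketch (the target address, port, and function are invented for the example, and this does not use livepeer's actual RPC definitions):

```go
package main

import (
	"context"
	"log"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

// probe dials an orchestrator endpoint, does its work, and releases the
// connection instead of leaving it open.
func probe(target string) error {
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	conn, err := grpc.DialContext(ctx, target,
		grpc.WithTransportCredentials(insecure.NewCredentials()),
		grpc.WithBlock(), // fail fast instead of retrying in the background
	)
	if err != nil {
		return err
	}
	// Without this Close, every probe leaves a socket (and a file
	// descriptor) open on both the client and the server.
	defer conn.Close()

	// ...issue the actual RPCs here, each with its own deadline...
	return nil
}

func main() {
	if err := probe("orchestrator.example.com:8935"); err != nil {
		log.Print(err)
	}
}
```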
Describe the bug
A couple of reports of this error happening, putting the node into a frozen state.
Desktop (please complete the following information):
Don't know how to reproduce, but two of my nodes have had this error within 12 hours, and there are other reports as well. No recent updates or upgrades.
Maybe a community node RPC endpoint issue?