Connection reset by peer during long knife solo cook #459
Comments
Have you tried ssh-level keepalive settings? For example I keep this in my
I suspect it's not knife-solo in particular; any long-lived idle ssh connection would probably get cut in your setup.
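The snippet the commenter refers to is not shown in this thread. As a rough sketch of what client-side keepalive settings usually look like in `~/.ssh/config` (the `Host *` pattern and the specific values are assumptions for illustration, not the commenter's actual configuration):

```sshconfig
# ~/.ssh/config (illustrative values, not taken from this thread)
Host *
    # After 60 seconds without data from the server, send a keepalive
    # probe through the encrypted channel
    ServerAliveInterval 60
    # Drop the connection only after 10 consecutive unanswered probes
    ServerAliveCountMax 10
```

Because these probes travel inside the SSH session, they also generate traffic that can stop a NAT gateway or firewall from expiring an otherwise idle connection.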
Thanks for the quick reply and suggestion. Even after trying the options above, I get exactly the same behavior.
Should be fine to increase, though it may not solve the problem. The Python scientific stack should have fairly recent versions available in the package repos for major OSes. If you need your own builds, you can build them once and package them with a tool like fpm (https://github.com/jordansissel/fpm). Installing from a pre-compiled package should make provisioning a lot faster, which may avoid the need to keep SSH open and idle. If that's not an option, you may want to check the server's sshd settings (e.g., that TCPKeepAlive is turned on and the ClientAlive* values aren't set too short), and then any firewall settings between you and the server (since one commenter talks about it here).
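For the server-side check mentioned above, a minimal sketch of the relevant `sshd_config` directives might look like the following. The file path is the usual OpenSSH default and the values are assumptions chosen for illustration. Nothing in this thread confirms what the instance is actually using.

```sshconfig
# /etc/ssh/sshd_config on the target instance (illustrative values)
# Let the TCP layer send its own keepalive probes
TCPKeepAlive yes
# Probe a silent client after 60 seconds without data...
ClientAliveInterval 60
# ...and disconnect only after 10 consecutive unanswered probes
ClientAliveCountMax 10
```

sshd has to be reloaded before changes take effect, and running `sshd -T` as root prints the effective configuration, which is a quick way to see what the server currently applies.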
I don't have experience with either, but in your opinion which option would work better from a scalability/flexibility perspective? a. packer image: perform pip install on an AMI, save the AMI, and use it for further deployment via chef-solo
I tend to prefer a combination of both. The process goes like.
Though I see many people skip steps 4 & 5, since many of the rewards of AMI-based deployments only come once you're using autoscaling groups, and step 5 requires additional infrastructure.
I have a long-running knife solo script (20-30 minutes) that performs the following installation on an AWS instance.
ERROR: Errno::ECONNRESET: Connection reset by peer - recvfrom(2)
knife solo bootstrap ubuntu@<ip_address> nodes/.json
The underlying recipe installs the necessary Python infrastructure from scratch:
a. numpy
b. pandas
c. scipy
d. matplotlib
On the target machine, cc1/cc1plus runs at > 90% CPU for roughly 20+ minutes:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
27230 ubuntu 20 0 633540 584520 9440 R 98.5 57.5 0:45.89 cc1plus
The first run always ends with the connection reset error; after re-running, it completes successfully.
I see that one of the threads suggested to "use proxy settings". I am not sure whether that applies here, as I am running these scripts from the northeast US and my AWS instance is running in US East as well.
I would appreciate any recommendations on alternative approaches or settings to avoid this issue.