Network in containers breaks under bigger network load #140
Comments
+1 I'm seeing the same thing. Same colima, lima and qemu versions. Mac 12.1 Monterey
I tried downgrading to colima 0.3.1 and it didn't seem to help. I did a |
This seems to be related to #137 as well. |
I did more investigation, but I couldn't find the root cause. So far it seems that the problem lies in Lima or QEMU itself. I could reproduce it on machines running raw Lima images, without Colima. I found two issues in the Lima repo that seem to describe the very same problem: |
I believe the problem is not inside Docker or containers, maybe even not in Co(lima)/qemu... Some additional information: Steps to reproduce:
I receive lots of errors like
If last command is run on the host, everything is good:
Before MacOS upgrade (12.0.1 -> 12.2) running P.S. In Docker problem is the same, for example:
|
Another note: Rancher Desktop 1.0.0 works without problems on the same machine, when I run docker command:
|
Hi, When pulling multiple images at the same time with Great if the problem could be addressed soon.
|
That looks like a problem with Alpine itself. I applied the same kind of configuration that lima does, using a yaml file straight to limavm, and the results are the same. That is not reproducible with Ubuntu using the yaml and loading docker as provisioning.
$ colima version 0.3.2 runtime: docker
$ limactl --version
$ qemu-img --version |
I was having this issue and was able to work around it by adding the following to
|
If that's the case, user-configurable dns can be added to the next version. |
not working on my setup :( |
I'm working on a proper fix for configurable dns on each startup. |
The problem doesn't seem to be only DNS related. As the author described, after the VM gets into a "bad" state, ping/connection by IP doesn't work either. As I wrote before, Rancher Desktop (which also uses Alpine Lima images under the hood) works fine on the same machine. Maybe we can try to use the same images? I am ready to experiment with new versions, I just need some guidance. I have M1, M1 Pro and Intel Macs at my disposal to test it. |
This works for me. |
I got some time to try this with my previous |
I actually got a similar experience with Rancher Desktop. It seems to be something specific to Alpine as I could not reproduce with an Ubuntu image. I am still troubleshooting and would prefer not to ditch Alpine. |
I take that back, it looks to be specific to Lima as I have reproduced it with multiple distros. |
I think it is more related to this lima-vm/lima#561. |
I'm seeing this as well, coming up while running |
Hm, well I had thought it was a problem only once or twice a day. However, I just tried starting up a PHP project and
|
@deviantintegral Networking may be broken, but ICMP doesn't really work over the slirp network, so
|
I dug deeper into this issue. I have been able to work around it within lima using PTP based networking as reported in lima-vm/lima#724. It would be nice to be able to make this all work seamlessly without manually managing the colima template or the One half solution is to add in
---
networks:
- vnl: "/tmp/vde.ptp"
  switchPort: 65535
to inject the PTP network into the colima image without changing the template, but it will require manually starting the |
If this provides the best results so far, Colima can be updated to handle this. |
With this workaround, I was able to get the desired result with this test #140 (comment). I will keep an eye on the upstream issue. And in the meantime I will look at implementing this workaround in Colima. |
Just an FYI that I have made notable progress with this. Going with PTP based networking (thanks @elventear) minimised the dependencies required to only vde_vmnet. It then turned out easy to bundle with Colima due to its small size. In addition to fixing this issue (hopefully finally), all VMs also get IP addresses that are reachable from the host, which then fixes #189, #97, #71 and provides a workaround for #135. |
@abiosoft I have noticed that after running for a while echo 'NO_GATEWAY="eth0"' >> /etc/udhcpd.conf Currently testing this. |
@abiosoft the configuration setting didn't seem to be enough (I am not familiar at all with alpine). Looking deeper, it seems the configuration happens via |
@elventear what I do notice is that the default route gets reset on startup. |
@abiosoft I think I got it. The issue seems to be when the DHCP client refreshes the connection (as
echo 'NO_GATEWAY="eth0"' >> /etc/udhcpc/udhcpc.conf
Having this in my provision script seems to be more robust:
mkdir -p /etc/udhcpc
touch /etc/udhcpc/udhcpc.conf
if ! grep -q 'NO_GATEWAY' /etc/udhcpc/udhcpc.conf; then
    echo 'NO_GATEWAY="eth0"' >> /etc/udhcpc/udhcpc.conf
fi
kill -s SIGUSR2 $(cat /var/run/udhcpc.eth0.pid) # force DHCP release
kill -s SIGUSR1 $(cat /var/run/udhcpc.eth0.pid) # force DHCP reconfigure
No need to delete the default route explicitly, You can test that things are correctly configuring from the shell doing The most elegant solution though would be to have |
@elventear kindly install the current development version with Thanks. |
I just gave this a go with HEAD-5e2e413 and initially got the following output during
After reviewing the generated
Edit: See #140 (comment). I had a custom DNS is still using the user mode network which I have found to be unreliable with some DNS-heavy loads, even when all other traffic is routing via lima0. I’m using the following
useHostResolver: false
dns:
- 192.168.106.1
With a couple more optional tweaks I can also get direct IP access to containers from the host:
sudo route -n add -net 172.17.0.0/16 192.168.106.2
colima ssh -- sudo iptables -A FORWARD -i lima0 -j ACCEPT
The following in
"default-address-pools": [
  {
    "base": "172.17.0.0/16",
    "size": 24
  }
] |
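For context on the address pool above, here is a small sketch of the arithmetic, assuming Docker's documented behaviour of carving the `base` network into subnets whose prefix length is `size`. The function name `pool_subnet_count` is made up for illustration and is not part of Docker or Colima.

```shell
# pool_subnet_count BASE_PREFIX SIZE
# Prints how many subnets of prefix length SIZE fit into a base network
# of prefix length BASE_PREFIX (hypothetical helper, illustration only).
pool_subnet_count() {
    base_prefix=$1
    size=$2
    # Each extra prefix bit doubles the number of subnets.
    echo "$(( 1 << (size - base_prefix) ))"
}
```

With the values from the comment, `pool_subnet_count 16 24` gives 256, so the single host route for 172.17.0.0/16 should cover every /24 network allocated from that pool.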
@jasoncodes thanks for troubleshooting that. Are you on Intel or M1 mac? |
I was testing on an Intel Mac running macOS 12.3. I’ll give it a go on an M1 soon. |
@jasoncodes with regards to your scenario, can you kindly answer the following.
Thanks. |
Ah, I see what’s going on. I had manually created a I wonder if it’d be worth outputting a warning if this file exists without an entry for |
Yeah, if the file does exist it should be checked for the entry, and if the entry is missing it should be appended. |
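A minimal sketch of the check-and-append behaviour described above. The thread excerpt does not name the file or the entry, so both are passed in as arguments; `ensure_entry` is a hypothetical helper, not Colima's actual implementation.

```shell
# ensure_entry FILE ENTRY
# Appends ENTRY to FILE only when no identical line is already present.
# Creates FILE (and its parent directory) when missing.
ensure_entry() {
    file=$1
    entry=$2
    mkdir -p "$(dirname "$file")"
    touch "$file"
    # If the file is non-empty and lacks a trailing newline, add one so the
    # new entry does not get glued onto the last existing line.
    if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
        printf '\n' >> "$file"
    fi
    # -x: match the whole line, -F: fixed string, -q: no output
    if ! grep -qxF -- "$entry" "$file"; then
        printf '%s\n' "$entry" >> "$file"
    fi
}
```

Calling it twice with the same arguments leaves a single copy of the entry, which is the property the suggestion relies on. Whether a blind append is appropriate depends on the file's format, which this sketch does not validate.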
I have installed colima from the latest I have some concerns about the privilege setup; this is what I have noticed:
While convenient to embed the tools with the application, I personally use MacPorts where the distribution itself will manage the dependencies for you and also set things up in a more locked down manner. I am wondering if it would be possible to provide an explicit way to manage the installation of the dependencies so that colima doesn't install them but uses something that is provided externally. |
Also, I think IPv6 is broken in the container (I haven't dug into the root cause), but if you have IPv6 it can mess things up for you. In my workaround, I just disable IPv6 until it is clear why routing is not working properly:
sysctl -w net.ipv6.conf.all.disable_ipv6=1
sysctl -w net.ipv6.conf.default.disable_ipv6=1 |
I gave this a go just now using arm64 Homebrew and it fails with the following:
Looking at the |
Oh my! how did I miss this 🙈 . Can you build from source @jasoncodes? If yes, I would appreciate if you can assist with testing directly on a development branch before getting it into main. |
Yes, I’m more than happy to test any development branches you may have. Looking forward to having a release with built-in support for VDE networking. Thanks for your great work. :) Aside: Is there a documented uninstall process anywhere? Prior to this a |
This should be fixed now. You can give it another try on your M1 device.
Yeah, I have considered this as well. |
Looking good on M1 now. 👍 |
Thanks for taking note. These will be tightened and limited to privileged user, should be
It is primarily for convenience since vmnet can only be started by a privileged user. I am very much open to better ideas around this.
Embedding in this case is a decent option as it provides a consistent experience without adding notable size overhead. Just to clarify, I am not against external dependencies. Lima, Qemu are dependencies as well after all. But I think vde_vmnet is better embedded. |
May I enquire as to why The advantage of this is that you can SSH into a machine (such as the M1 machine I am testing on :)) without having a graphical login. I just double checked spinning up a VM directly with Lima using shared networking over SSH and it works well. Colima expectedly fails to start the launch agent. |
@jasoncodes launchd is used mainly to keep it running as a background process. I can borrow from the approach used by Lima or find a way to tie it to the qemu process. Thanks, your feedback has been helpful. |
This should be fixed by now |
Agreed, I haven't had any trouble over the past few weeks running code from |
Network breaks in containers when they start multiple network connections at the same time.
I noticed this behaviour e.g. while downloading Python dependencies. When multiple packages are downloaded at the same time I start getting a Network is unreachable error. Then when I log in to the underlying QEMU machine (limactl shell colima) I can see that it can't reach any network address. I cannot even ping 8.8.8.8. My host computer doesn't have any connection issues.
It gets better after a few moments of inactivity. Restarting the QEMU machine (colima stop && colima start) fixes the network, but the problem comes back when I increase the network load.
This is a problem that I can consistently reproduce. I created a minimum setup to demonstrate it: https://github.com/mjkonarski-b/colima-poc
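The kind of load described above can be approximated with a small harness that starts several copies of a command at once and counts the failures. This is only a sketch and is separate from the linked reproduction repository; `run_concurrently` is a hypothetical helper name.

```shell
# run_concurrently N CMD [ARGS...]
# Starts N copies of CMD in parallel, waits for all of them,
# and prints how many exited with a non-zero status.
run_concurrently() {
    n=$1
    shift
    pids=""
    i=0
    while [ "$i" -lt "$n" ]; do
        # Discard output so only the exit status matters.
        "$@" >/dev/null 2>&1 &
        pids="$pids $!"
        i=$((i + 1))
    done
    failed=0
    for pid in $pids; do
        wait "$pid" || failed=$((failed + 1))
    done
    echo "$failed"
}
```

Inside the VM, something like `run_concurrently 20 curl -fsS -o /dev/null https://example.com/` (a placeholder URL) would print the number of failed downloads; a non-zero count under parallel load but not with `N` set to 1 would match the behaviour reported here.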
I experience that problem on multiple MacBooks, so it doesn't seem to be related to any particular processor or macOS version: