os, internal/poll, runtime: how to use /dev/net/tun on Linux #30426
Comments
So this mostly reverts the switch to Sysconn for Linux. Issue: golang/go#30426
The way that netpoll uses |
Just to be sure, can you please confirm that:
|
Yes.
|
Using the same flags as Go's use of epoll, I'm able to reproduce this in C. Here's the working blocking case as a baseline:
#include <sys/types.h>
#include <sys/epoll.h>
#include <sys/stat.h>
#include <sys/ioctl.h>
#include <net/if.h>
#include <linux/if_tun.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char *argv[])
{
char buf[2000];
ssize_t len;
int tunfd, ret;
struct ifreq ifreq = {
.ifr_name = "cheese",
.ifr_flags = IFF_TUN
};
tunfd = open("/dev/net/tun", O_RDWR);
if (tunfd < 0) {
perror("open(/dev/net/tun)");
return 1;
}
ret = ioctl(tunfd, TUNSETIFF, &ifreq);
if (ret < 0) {
perror("ioctl(IFF_TUN)");
return 1;
}
system("ip link set up cheese && ip a a 192.168.9.2/24 dev cheese");
popen("ping -c 4 -f 192.168.9.1; ip link set down cheese; ip a f dev cheese", "r");
while ((len = read(tunfd, buf, sizeof(buf))) >= 0)
printf("Read %zd bytes\n", len);
return 0;
}
Here's the broken epoll case:
#include <sys/types.h>
#include <sys/epoll.h>
#include <sys/stat.h>
#include <sys/ioctl.h>
#include <net/if.h>
#include <linux/if_tun.h>
#include <fcntl.h>
#include <unistd.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char *argv[])
{
char buf[2000];
ssize_t len;
int tunfd, efd, ret;
struct ifreq ifreq = {
.ifr_name = "cheese",
.ifr_flags = IFF_TUN
};
struct epoll_event event = {
.events = EPOLLIN | EPOLLOUT | EPOLLRDHUP | EPOLLET
};
tunfd = open("/dev/net/tun", O_RDWR);
if (tunfd < 0) {
perror("open(/dev/net/tun)");
return 1;
}
ret = fcntl(tunfd, F_GETFL);
if (ret < 0) {
perror("F_GETFL");
return 1;
}
ret = fcntl(tunfd, F_SETFL, ret | O_NONBLOCK);
if (ret < 0) {
perror("F_SETFL");
return 1;
}
efd = epoll_create1(0);
if (efd < 0) {
perror("epoll_create1");
return 1;
}
ret = epoll_ctl(efd, EPOLL_CTL_ADD, tunfd, &event);
if (ret < 0) {
perror("epoll_ctl");
return 1;
}
ret = ioctl(tunfd, TUNSETIFF, &ifreq);
if (ret < 0) {
perror("ioctl(IFF_TUN)");
return 1;
}
system("ip link set up cheese && ip a a 192.168.9.2/24 dev cheese");
popen("ping -c 4 -f 192.168.9.1; ip link set down cheese; ip a f dev cheese", "r");
for (;;) {
len = read(tunfd, buf, sizeof(buf));
if (len < 0 && errno == EAGAIN) {
ret = epoll_wait(efd, &event, 1, -1);
if (ret < 0) {
perror("epoll_wait");
return 1;
}
continue;
}
if (len < 0)
break;
printf("Read %zd bytes\n", len);
}
return 0;
} |
Interestingly, it appears that removing |
Sounds like you need to find some good way to accommodate either a) blocking I/O with level-triggered notification, or b) non-blocking I/O with edge-triggered notification; the current runtime-integrated network poller is designed for just the latter. Marking a tun/tap device file as non-blocking might make it work with the current runtime-integrated network poller, but that seems unlikely: tun_ring_recv in drivers/net/tun.c always returns EAGAIN when the argument noblock is true. A naive fix might be to make the epoll registration adaptive, based on whether the target file supports non-blocking I/O. |
Are we reading the same source? If a buffer is available, it returns that buffer with the error set to 0, even when noblock is true. Otherwise it returns EAGAIN:
static void *tun_ring_recv(struct tun_file *tfile, int noblock, int *err)
{
DECLARE_WAITQUEUE(wait, current);
void *ptr = NULL;
int error = 0;
ptr = ptr_ring_consume(&tfile->tx_ring);
if (ptr)
goto out;
if (noblock) {
error = -EAGAIN;
goto out;
}
//[...]
out:
*err = error;
return ptr;
} |
If |
A workaround: https://play.golang.org/p/iu6ayVT3Yfe |
@mikioh Is that workaround remotely safe to do? That is, if an fd is in netpoll, and then you manually twiddle it to be nonblocking, won't netpoll tweak out? Or just become really inefficient? Generally the epoll ET pattern is something like:
for (;;) {
while ((ret = read(fd, ...)) >= 0)
...
if (ret < 0 && errno == EAGAIN)
epoll_wait(efd, ...);
}
If you put the fd into blocking mode, the reads will just block forever, and so it'll never return EAGAIN and epoll basically won't be used. This sounds like in theory it would make cancellation very difficult, since that read(fd) call just hangs there until a packet comes in. And if Go thinks it can epoll, it might not spawn a thread for the blocking call, which could then starve other goroutines. Is this analysis correct? Or does Go somehow use epoll internally in a way that makes ET+blocking acceptable? |
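The edge-triggered pattern described above can be sketched in Go against an ordinary pipe rather than a tun device. This is only an illustration of the loop shape (read until EAGAIN, then wait), not how the Go runtime implements netpoll; `drainEdgeTriggered`, `demoPipe`, and the local `epollET` constant are hypothetical names introduced here, and the code is Linux-only.

```go
package main

import (
	"errors"
	"fmt"
	"syscall"
	"time"
)

// epollET is EPOLLET as an unsigned value (assumption for this sketch: the
// syscall package declares EPOLLET as a negative constant on some
// architectures, which does not fit the uint32 Events field).
const epollET = 1 << 31

// drainEdgeTriggered reads from a non-blocking fd until wantBytes have been
// consumed, calling epoll_wait only after read reports EAGAIN.
func drainEdgeTriggered(fd, wantBytes int) (int, error) {
	efd, err := syscall.EpollCreate1(0)
	if err != nil {
		return 0, err
	}
	defer syscall.Close(efd)

	ev := syscall.EpollEvent{Events: syscall.EPOLLIN | epollET, Fd: int32(fd)}
	if err := syscall.EpollCtl(efd, syscall.EPOLL_CTL_ADD, fd, &ev); err != nil {
		return 0, err
	}

	buf := make([]byte, 4)
	events := make([]syscall.EpollEvent, 1)
	total := 0
	for total < wantBytes {
		n, err := syscall.Read(fd, buf)
		switch {
		case err == syscall.EINTR:
			continue
		case err == syscall.EAGAIN:
			// Nothing left to read: only now is it safe to sleep in epoll.
			k, werr := syscall.EpollWait(efd, events, 1000)
			if werr != nil && werr != syscall.EINTR {
				return total, werr
			}
			if werr == nil && k == 0 {
				return total, errors.New("timed out waiting for data")
			}
		case err != nil:
			return total, err
		case n == 0:
			return total, nil // EOF
		default:
			total += n
		}
	}
	return total, nil
}

// demoPipe feeds ten bytes into a non-blocking pipe in two bursts and drains them.
func demoPipe() (int, error) {
	p := make([]int, 2)
	if err := syscall.Pipe2(p, syscall.O_NONBLOCK|syscall.O_CLOEXEC); err != nil {
		return 0, err
	}
	defer syscall.Close(p[0])
	defer syscall.Close(p[1])

	done := make(chan struct{})
	go func() {
		defer close(done)
		syscall.Write(p[1], []byte("hello"))
		time.Sleep(50 * time.Millisecond)
		syscall.Write(p[1], []byte("world"))
	}()
	n, err := drainEdgeTriggered(p[0], 10)
	<-done
	return n, err
}

func main() {
	n, err := demoPipe()
	fmt.Printf("drained %d bytes, err=%v\n", n, err)
}
```

With a blocking fd, the `EAGAIN` branch is never taken, so the loop never reaches `EpollWait`; that is the concern raised in the question above.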
You had some comments the other day about this working, then not working, on the BSDs, but I can't find them now for some reason. What was the verdict on that? In my quick trials with code similar to the OP's, I was able to Close() the file from one goroutine and have the read canceled in the other. I thought this was a decent enough indication that things were working fine on the BSDs. From further inspecting what's going on, though, it looks like all the BSDs examine the file descriptor and then might actually wind up disabling polling under certain conditions. Are we hitting these conditions? But if that's the case, why does the cancellation appear to work? |
It looks like your workaround code actually doesn't work at all. Extend that timeout from 3 seconds to 10 seconds, so that there's time for the broadcast packet stuff to stop happening. That way the file is actually closed during a period when there isn't new data. Then, you'll see the same hang that we had in Go 1.11, which I'm forced to solve with this monstrosity. |
[I deleted my previous comments mentioning BSDs because I was a bit confused, sorry.] I skimmed the Linux kernel code a bit and realized that the byte sequence (or character) interface on a tun device doesn't support epoll: as your example code shows, the first epoll_pwait always returns EPOLLERR regardless of EPOLLET or blocking/non-blocking I/O; see tun_chr_poll in drivers/net/tun.c. I expected vfs_poll in fs/eventpoll.c to handle poll-capable files well, but tun_chr_poll returns EPOLLERR for non-NETREG_REGISTERED devices, /dev/net/tun device files. Right now, I have no good idea how to accommodate device files that are poll-capable but not epoll-capable. So, a workaround would be to use your own poll for such device files: https://play.golang.org/p/D3B8KBeW10y PS: On BSD variants, the tun or similar software interfaces are well integrated with kqueue, which is why I was confused initially; sorry for the confusion. |
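One way to check the observation above on a given kernel is to register a descriptor with the same event flags as the C reproducer and look at what an immediate epoll_wait reports. The following is a minimal sketch, Linux-only; `probeEpoll`, `probePipeWriteEnd`, and the local `epollET` constant are hypothetical names introduced here. It prints whatever the kernel returns for a freshly opened /dev/net/tun (before TUNSETIFF) and does not assume the result.

```go
package main

import (
	"fmt"
	"syscall"
)

// epollET is EPOLLET as an unsigned value (assumption for this sketch: the
// syscall package's EPOLLET is a negative constant on some architectures).
const epollET = 1 << 31

// probeEpoll registers fd with EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET and returns
// the event mask reported by a zero-timeout epoll_wait (0 if nothing is ready).
func probeEpoll(fd int) (uint32, error) {
	efd, err := syscall.EpollCreate1(0)
	if err != nil {
		return 0, err
	}
	defer syscall.Close(efd)

	ev := syscall.EpollEvent{
		Events: syscall.EPOLLIN | syscall.EPOLLOUT | syscall.EPOLLRDHUP | epollET,
		Fd:     int32(fd),
	}
	if err := syscall.EpollCtl(efd, syscall.EPOLL_CTL_ADD, fd, &ev); err != nil {
		return 0, err
	}

	events := make([]syscall.EpollEvent, 1)
	n, err := syscall.EpollWait(efd, events, 0)
	if err != nil {
		return 0, err
	}
	if n == 0 {
		return 0, nil
	}
	return events[0].Events, nil
}

// probePipeWriteEnd is a sanity check on a descriptor known to work with
// epoll: the write end of an empty pipe should report EPOLLOUT.
func probePipeWriteEnd() (uint32, error) {
	p := make([]int, 2)
	if err := syscall.Pipe2(p, syscall.O_NONBLOCK|syscall.O_CLOEXEC); err != nil {
		return 0, err
	}
	defer syscall.Close(p[0])
	defer syscall.Close(p[1])
	return probeEpoll(p[1])
}

func main() {
	ev, err := probePipeWriteEnd()
	fmt.Printf("pipe write end: events=%#x err=%v\n", ev, err)

	fd, err := syscall.Open("/dev/net/tun", syscall.O_RDWR|syscall.O_NONBLOCK|syscall.O_CLOEXEC, 0)
	if err != nil {
		fmt.Println("skipping tun probe:", err)
		return
	}
	defer syscall.Close(fd)

	ev, err = probeEpoll(fd)
	fmt.Printf("/dev/net/tun before TUNSETIFF: events=%#x EPOLLERR=%v err=%v\n",
		ev, ev&syscall.EPOLLERR != 0, err)
}
```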
Please see #22939 |
Oh, nice; that means that calling ioctl w/ IFF_XXX makes the device file NETREG_REGISTERED? |
Nice observation. This seems to work correctly:
package main
import "log"
import "os"
import "unsafe"
import "time"
import "os/exec"
import "sync"
import "golang.org/x/sys/unix"
func main() {
tunfd, err := unix.Open("/dev/net/tun", os.O_RDWR, 0)
if err != nil {
log.Fatal(err)
}
var ifr [unix.IFNAMSIZ + 64]byte
copy(ifr[:], []byte("cheese"))
*(*uint16)(unsafe.Pointer(&ifr[unix.IFNAMSIZ])) = unix.IFF_TUN
_, _, errno := unix.Syscall(
unix.SYS_IOCTL,
uintptr(tunfd),
uintptr(unix.TUNSETIFF),
uintptr(unsafe.Pointer(&ifr[0])),
)
if errno != 0 {
log.Fatal(errno)
}
if err := unix.SetNonblock(tunfd, true); err != nil {
log.Fatal(err)
}
fd := os.NewFile(uintptr(tunfd), "/dev/net/tun")
wait := sync.WaitGroup{}
wait.Add(1)
go func() {
var err error
c := exec.Command("sh", "-c", "ip link set up cheese && ip a a 192.168.9.2/24 dev cheese")
c.Start()
c.Wait()
exec.Command("sh", "-c", "ping -c 4 -f 192.168.9.1; ip link set down cheese; ip a f dev cheese").Start()
b := [2000]byte{}
for {
var n int
n, err = fd.Read(b[:])
if err != nil {
break
}
log.Printf("Read %d bytes", n)
}
log.Print("Read errored: ", err)
wait.Done()
}()
time.Sleep(time.Second * 15)
log.Print("Closing")
err = fd.Close()
if err != nil {
log.Print("Close errored: ", err)
}
wait.Wait()
log.Print("Exiting")
} |
Closing, thanks @crvv for the valuable information. |
Running the tuntap code with Go 1.13 resulted in Interface.Read() returning a "not pollable" error (the runtime's poll.ErrNotPollable). This, it turns out, is due to /dev/net/tun on Linux not being pollable (in the epoll sense) until after the TUNSETIFF ioctl has been done. The right fix, done here, is to open /dev/net/tun as a raw file descriptor, and ioctl it before constructing an *os.File, which gets added to the poll set when Read() is called. See golang/go#30426 and golang/go#30624 and Go source code commit a5fdd58c84b6b0a1ae5a53faebc0550024e3a066, which adds ErrNotPollable and exposes this error, which otherwise was getting silently thrown away. This code works properly on the AP, too (master branch, using Go 1.12.9, but it should work a long way back).
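The ordering described above (raw descriptor first, TUNSETIFF second, *os.File last) can be sketched with only the standard library. This is a hedged illustration, not the code from the referenced commit: `buildIfreq`, `openTun`, and the local `iffTUN` constant are hypothetical names introduced here, the buffer size follows the Go example earlier in the thread, and the program is Linux-only and needs sufficient privileges for the ioctl to succeed.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
	"unsafe"
)

// iffTUN mirrors IFF_TUN from <linux/if_tun.h>; defined locally for this sketch.
const iffTUN = 0x0001

// buildIfreq lays out an ifreq-like buffer: the interface name, followed at
// offset IFNAMSIZ by the flags in native byte order.
func buildIfreq(name string, flags uint16) ([syscall.IFNAMSIZ + 64]byte, error) {
	var ifr [syscall.IFNAMSIZ + 64]byte
	if len(name) >= syscall.IFNAMSIZ {
		return ifr, fmt.Errorf("interface name %q too long", name)
	}
	copy(ifr[:], name)
	*(*uint16)(unsafe.Pointer(&ifr[syscall.IFNAMSIZ])) = flags
	return ifr, nil
}

// openTun opens /dev/net/tun as a raw descriptor, issues TUNSETIFF, and only
// then wraps the descriptor in an *os.File so that it is registered with the
// poller after the device has become pollable.
func openTun(name string) (*os.File, error) {
	ifr, err := buildIfreq(name, iffTUN)
	if err != nil {
		return nil, err
	}
	fd, err := syscall.Open("/dev/net/tun", syscall.O_RDWR|syscall.O_CLOEXEC, 0)
	if err != nil {
		return nil, err
	}
	_, _, errno := syscall.Syscall(
		syscall.SYS_IOCTL,
		uintptr(fd),
		uintptr(syscall.TUNSETIFF),
		uintptr(unsafe.Pointer(&ifr[0])),
	)
	if errno != 0 {
		syscall.Close(fd)
		return nil, errno
	}
	// Non-blocking mode must be set before os.NewFile for the file to use the poller.
	if err := syscall.SetNonblock(fd, true); err != nil {
		syscall.Close(fd)
		return nil, err
	}
	return os.NewFile(uintptr(fd), "/dev/net/tun"), nil
}

func main() {
	f, err := openTun("cheese")
	if err != nil {
		fmt.Println("could not set up tun device:", err)
		return
	}
	defer f.Close()
	fmt.Println("tun device ready:", f.Name())
}
```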
Go 1.12 brought SyscallConn() for os.File. In theory that should let us OpenFile on /dev/net/tun, and then use SyscallConn() to do all of the TUN-specific ioctls for setting up the device, giving it a name, and setting some properties and such. From then on, it's supposed to be a matter of Read, Write, and Close. Since we don't need to call Fd() on the os.File at any point, we gain the benefits of using netpoll (which is epoll behind the scenes).
In addition to allowing the scheduler to make better decisions and not allocating an OS thread for every IO operation, netpoll also lets us call Read in one goroutine and Close in another, and the currently running Read will return immediately with an error saying that it's been closed. This is terrific for shutting down gracefully. To illustrate, here's something that does not work as a consequence of using Fd:
The problem with the above code is that fd.Read(b[:]) never returns after fd.Close() executes, and so the program hangs forever. Thanks to SyscallConn in Go 1.12, we can fix that problem like this:
This works as expected with regard to that fd.Read(b[:]) getting cancelled. (In Go 1.11, I previously worked around this by manually polling on a cancellation pipe and the tun fd with some pretty gnarly ugliness. I've been eagerly awaiting the Go 1.12 release to stop having to play those games.)
There's a big problem, however: netpoll's use of epoll doesn't seem to agree with the Linux tun driver's tun_chr_poll. Consider the following program:
This is supposed to work, but actually the call to Read winds up blocking and not returning any data, and only ever returns upon the call to Close. The above program can be "fixed" by adding fd.Fd() just above the go func() { line, in order to remove fd from netpoll. This, however, incurs the pre-SyscallConn-era problem of Close not being able to cancel a pending Read, and loses the other nice benefits of netpoll.
Anybody familiar with netpoll's particular use of epoll interested in taking a look under the hood?
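The Read/Close cancellation behavior that netpoll is expected to provide can be seen on a descriptor that is known to be pollable. The sketch below uses an os.Pipe in place of the tun device, so it shows the desired behavior only, not the tun problem itself; `readCancelledByClose` is a hypothetical helper introduced here.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"time"
)

// readCancelledByClose parks a Read on the empty read end of a pipe, closes
// that end from another goroutine, and reports whether the Read came back
// with os.ErrClosed. The second result is the error Read returned.
func readCancelledByClose() (bool, error) {
	r, w, err := os.Pipe()
	if err != nil {
		return false, err
	}
	defer w.Close()

	errc := make(chan error, 1)
	go func() {
		buf := make([]byte, 16)
		_, err := r.Read(buf) // blocks: nothing is ever written to w
		errc <- err
	}()

	// Give the reader a moment to park in the poller, then close under it.
	time.Sleep(100 * time.Millisecond)
	if err := r.Close(); err != nil {
		return false, err
	}

	select {
	case err := <-errc:
		return errors.Is(err, os.ErrClosed), err
	case <-time.After(2 * time.Second):
		return false, errors.New("Read was not cancelled by Close")
	}
}

func main() {
	ok, err := readCancelledByClose()
	fmt.Printf("cancelled by Close: %v (Read error: %v)\n", ok, err)
}
```

Calling Fd() on the file before starting the reader switches the descriptor back to blocking mode, which is the situation in which Close can no longer interrupt the pending Read.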