libnative: Convert non-blocking fds to blocking when necessary #13355

lilyball · 2014-04-06T07:43:23Z

When doing a read or write on a file descriptor, if it returns EAGAIN or
EWOULDBLOCK, we need to turn off the O_NONBLOCK flag and try again. This
can happen because TTYs and FIFOs share the file descriptor flags with
other processes.

Fixes #13336.

lilyball · 2014-04-06T07:43:58Z

r? @alexcrichton

I know you aren't sold on this approach, but I think libnative needs to preserve its intended semantics of doing blocking I/O, and I don't think there's a better solution.

huonw · 2014-04-06T07:46:13Z

src/libnative/io/file_unix.rs

@@ -87,6 +92,25 @@ impl FileDesc {
    }
 }

+unsafe fn blocking(fd: fd_t, f: |fd_t| -> i64) -> i64 {


I feel like this needs (at least) one comment, describing what it's for/why it's doing what it does.

lilyball · 2014-04-06T19:39:20Z

@alexcrichton r? I added comments and tweaked it so that if O_NONBLOCK wasn't set in the F_GETFL we bail out.

lilyball · 2014-04-06T23:30:15Z

@alexcrichton I've changed the errno case to just re-call read()/write() and return. But I firmly believe the loop is necessary.

alexcrichton · 2014-04-07T05:07:30Z

I am trying to reproduce this, and I am unable to with your test. I would like to survey what happens in other languages such as ruby/python.

lilyball · 2014-04-07T05:09:21Z

My test reproduces 100% of the time on OS X. I haven't tried other platforms. I am also curious as to how ruby/python handles this, although I'm not sure that really makes a difference.

alexcrichton · 2014-04-07T06:49:33Z

I am very worried about the size of the hammer you are applying to this problem. This is making a significant number of assumptions which are usually dangerous to make when dealing with tricky situations such as this. I do not want to take this lightly.

I am quite surprised that this is only an error on OSX, and not an error on linux. I investigated into other languages to see what other runtimes do:

ruby - works ok. I'm not sure how, but there's a few ioctl calls which may be involved
C++ (using cout) - prints 7 times, sees EAGAIN, and stops printing (but doesn't throw an exception or anything), the program exits normally
Go - same as C++, prints 7 times and then stops printing, no error is thrown
D - throws an exception
python - throws an exception
haskell - works ok, I am seeing invocations of select next to every call to write. I presume this is because they have a libuv-like event loop driving their I/O (which solves this problem for free)

This data does not convince me that the PR here is the right course of action to take. It appears no one handles this in a libnative-like setting except for ruby.

lilyball · 2014-04-07T07:02:12Z

@alexcrichton Those results aren't surprising to me. C++ is behaving correctly here, AFAIK (I don't believe iostream makes any guarantee about being blocking, so obeying the O_NONBLOCK makes sense). Go is the only one here that I think is actually wrong, I think Go should be doing the same thing I'm trying to do here (because AFAIK Go does not expose the underlying fd and instead just exposes a set of behaviors).

Rust's libnative is exposing a blocking I/O API. As such, clients of this API reasonably expect that they never have to deal with EAGAIN and EWOULDBLOCK and therefore any errors they see are legitimate errors. What's more, even if they do look for those errors, there's no API to handle them. We don't expose any equivalent to select() or fcntl() and so the only two possible responses are either busy-waiting on the I/O (which is a terrible idea) or treating it as a hard error (which sucks).

I do not believe that toggling O_NONBLOCK off when we expected a blocking operation is a hammer. I believe it is actually a very targeted approach that is explicitly trying to adjust the state of the file descriptor into that which we want it to be, namely, blocking. This is no more a hammer than automatically retrying when we get EINTR is.

Also, I am very surprised you're not seeing this issue on Linux. I would have expected Linux to behave in the same fashion as OS X.

alexcrichton · 2014-04-07T07:12:51Z

This is no more a hammer than automatically retrying when we get EINTR is.

I do not agree with this. I can precisely explain to why EINTR occurs, and why retrying will always make progress. I cannot do the same for EAGAIN and flipping flags which can be modified/viewed from other processes. EINTR is purely a local solution, while modifying the nonblocking flag is globally affected.

What's more, even if they do look for those errors, there's no API to handle them

Lack of APIs for this does not justify this solution in my opinion. This is quite a rare error to see, and apis for select and such are planned for the future (and possibly with fcntl).

You have stated that these results are not surprising, and you believe that Rust should differ from essentially all of them. Every one of these languages is providing a blocking interface to I/O (just like libnative), yet none of them handle this error (except ruby). I will need further convincing that we need to travel down this path flipping nonblocking flags.

lilyball · 2014-04-07T07:18:04Z

@alexcrichton Ah hah! Linux does reproduce this, it just has a much higher internal buffer. I'm able to write 64k on Linux before it exhausts itself, instead of the 8k on OS X.

This tests rust-lang#13336.

When doing a read or write on a file descriptor, if it returns EAGAIN or EWOULDBLOCK, we need to turn off the O_NONBLOCK flag and try again. This can happen because TTYs and FIFOs share the file descriptor flags with other processes.

lilyball · 2014-04-07T07:22:43Z

@alexcrichton I rebased and tweaked the test to write 1MB instead of 32kB. I haven't run this precise test on linux (because I'm not building rustc from source there) but the equivalent code that I based this test on fails on linux with the same "Resource temporarily unavailable" error.

lilyball · 2014-04-07T07:27:49Z

I do not agree with this. I can precisely explain to why EINTR occurs, and why retrying will always make progress. I cannot do the same for EAGAIN and flipping flags which can be modified/viewed from other processes. EINTR is purely a local solution, while modifying the nonblocking flag is globally affected.

EINTR is because your call was interrupted. Retrying makes absolutely no guarantee that the call will not be interrupted a second time.

EAGAIN/EWOULDBLOCK means the fd is in non-blocking mode and turning that off makes no guarantee that the fd will not be set back to nonblocking by another thread or, in the case of TTYs/FIFIOs, another process, before you reissue the read/write call.

The situations are identical. You just understand the first a lot better than the second (and the first is likely to be a lot more common). In both cases, progress is made and you retry.

Lack of APIs for this does not justify this solution in my opinion. This is quite a rare error to see, and apis for select and such are planned for the future (and possibly with fcntl).

It's an error that has no solution without this change. Being rare does not make it meaningless. This whole problem was discovered by someone who failed to compile Rust because rustc hit the error while emitting build output. When someone is hitting the error while compiling Rust itself, without doing anything special, I think that warrants fixing.

I stated that those results don't surprise me, partially because I believe at least some of those languages do in fact expose a way to tweak the non-blocking nature of I/O, and partially because I expect that most language developers haven't actually run into this problem or given any thought to it. But while I'm not surprised by those results, I don't necessarily think they're correct. If a given language exposes a way to get/set the O_NONBLOCK flag on its I/O primitives, then letting the caller handle it seems reasonable. If a language does not (like Rust), then letting the caller handle it is absolutely wrong because the caller can't. And no, I don't think this is an argument for Rust exposing a way to get/set the non-blocking flag on its I/O. I think we just need to handle the case properly.

This update brings a few months of changes, but primarily a fix for the following situation. When creating a handle to stdin, libuv used to set the stdin handle to nonblocking mode. This would end up affect this stdin handle across all processes that shared it, which mean that stdin become nonblocking for everyone using the same stdin. On linux, this also affected *stdout* because stdin/stdout roughly point at the same thing. This problem became apparent when running the test suite manually on a local computer. The stdtest suite (running with libgreen) would set stdout to nonblocking mode (as described above), and then the next test suite would always fail for a printing failure (because stdout was returning EAGAIN). This has been fixed upstream, joyent/libuv@342e8c, and this update pulls in this fix. This also brings us in line with a recently upstreamed libuv patch. Closes #13336 Closes #13355

This update brings a few months of changes, but primarily a fix for the following situation. When creating a handle to stdin, libuv used to set the stdin handle to nonblocking mode. This would end up affect this stdin handle across all processes that shared it, which mean that stdin become nonblocking for everyone using the same stdin. On linux, this also affected *stdout* because stdin/stdout roughly point at the same thing. This problem became apparent when running the test suite manually on a local computer. The stdtest suite (running with libgreen) would set stdout to nonblocking mode (as described above), and then the next test suite would always fail for a printing failure (because stdout was returning EAGAIN). This has been fixed upstream, joyent/libuv@342e8c, and this update pulls in this fix. This also brings us in line with a recently upstreamed libuv patch. Closes rust-lang#13336 Closes rust-lang#13355

This update brings a few months of changes, but primarily a fix for the following situation. When creating a handle to stdin, libuv used to set the stdin handle to nonblocking mode. This would end up affect this stdin handle across all processes that shared it, which mean that stdin become nonblocking for everyone using the same stdin. On linux, this also affected *stdout* because stdin/stdout roughly point at the same thing. This problem became apparent when running the test suite manually on a local computer. The stdtest suite (running with libgreen) would set stdout to nonblocking mode (as described above), and then the next test suite would always fail for a printing failure (because stdout was returning EAGAIN). This has been fixed upstream, joyent/libuv@342e8c, and this update pulls in this fix. This also brings us in line with a recently upstreamed libuv patch. Closes #12827 Closes #13336 Closes #13355

lilyball · 2014-04-24T18:55:57Z

This is not fixed. #13619 may fix a common reproduction scenario, but the underlying problem still exists and can be reproduced without libgreen's involvement.

alexcrichton · 2014-04-24T22:18:37Z

As I commented the root cause of this bug had noting to with libnative and this is a hugely invasive solution (it was the source of the bugs in libuv).

I personally do not believe that this is something which should be handled by libnative.

lilyball · 2014-04-24T22:24:54Z

I don't think you understand what actually causes this bug. The root cause of this bug is the way file descriptors work in linux/unix for various file types (character devices and FIFOs, not sure about others). I've commented in #13336 about why I feel this bug still needs to be handled in libnative.

For the sake of sanity, I suggest any further discussion be restricted to #13336 so we don't keep copying information back and forth.

thestinger · 2014-05-04T22:38:53Z

There isn't consensus that this is actually a bug, so I think it makes sense to leave this closed until it's agreed that there's something to fix here. I don't know if something this small deserves an RFC, but certainly seems controversial enough to merit one.

lilyball · 2014-08-25T20:41:27Z

This topic came up again in IRC. It's still an issue. servo/servo#2601 is likely a manifestation of it. This PR almost certainly needs a rebase, or maybe even a rewrite, at this point, but I'd like it to be considered again.

The fundamental point here is libnative makes guarantees about blocking behavior, and these guarantees are violated if the fd is O_NONBLOCK. Since fds that refer to certain types of file objects (notably, FIFOs and terminal devices) can have O_NONBLOCK toggled by a separate process, and this has nothing to do with sharing fds with children or inheriting them from parents, this is not something that the Rust program, or anything under the programmer's control, can deal with. The only solution that actually works is for libnative to toggle O_NONBLOCK off again whenever it hits this situation.

lilyball · 2014-08-25T21:01:24Z

As mentioned by me several times in #13336, an alternative approach is to fall back to select() on EAGAIN/EWOULDBLOCK. Nobody voiced any support for that idea. But I'm willing to rewrite this PR for that approach if it would get this fixed.

alexcrichton · 2014-09-08T00:21:57Z

Closing due to inactivity (13 days). I would recommend holding off until the libnative redesign has landed at which point it may be much clearer how to deal with this.

…ason, r=xFrednet Not trigger `duplicated_attributes` on duplicate reasons As at rust-lang#13355 we shoudn't trigger `duplicated_attributes` on duplicate reasons attr changelog: [`duplicated_attributes`]: not trigger `duplicated_attributes` on duplicate reasons

huonw reviewed Apr 6, 2014
View reviewed changes

lilyball added 2 commits April 7, 2014 00:21

Add run-make test for libnative file I/O and O_NONBLOCK

e14a2b6

This tests rust-lang#13336.

alexcrichton mentioned this pull request Apr 19, 2014

Update libuv #13619

Merged

bors closed this in #13619 Apr 24, 2014

lilyball reopened this Apr 24, 2014

thestinger closed this May 4, 2014

lilyball reopened this Aug 25, 2014

alexcrichton closed this Sep 8, 2014

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

libnative: Convert non-blocking fds to blocking when necessary #13355

libnative: Convert non-blocking fds to blocking when necessary #13355

lilyball commented Apr 6, 2014

lilyball commented Apr 6, 2014

huonw Apr 6, 2014

lilyball commented Apr 6, 2014

lilyball commented Apr 6, 2014

alexcrichton commented Apr 7, 2014

lilyball commented Apr 7, 2014

alexcrichton commented Apr 7, 2014

lilyball commented Apr 7, 2014

alexcrichton commented Apr 7, 2014

lilyball commented Apr 7, 2014

lilyball commented Apr 7, 2014

lilyball commented Apr 7, 2014

lilyball commented Apr 24, 2014

alexcrichton commented Apr 24, 2014

lilyball commented Apr 24, 2014

thestinger commented May 4, 2014

lilyball commented Aug 25, 2014

lilyball commented Aug 25, 2014

alexcrichton commented Sep 8, 2014

libnative: Convert non-blocking fds to blocking when necessary #13355

libnative: Convert non-blocking fds to blocking when necessary #13355

Conversation

lilyball commented Apr 6, 2014

lilyball commented Apr 6, 2014

huonw Apr 6, 2014

Choose a reason for hiding this comment

lilyball commented Apr 6, 2014

lilyball commented Apr 6, 2014

alexcrichton commented Apr 7, 2014

lilyball commented Apr 7, 2014

alexcrichton commented Apr 7, 2014

lilyball commented Apr 7, 2014

alexcrichton commented Apr 7, 2014

lilyball commented Apr 7, 2014

lilyball commented Apr 7, 2014

lilyball commented Apr 7, 2014

lilyball commented Apr 24, 2014

alexcrichton commented Apr 24, 2014

lilyball commented Apr 24, 2014

thestinger commented May 4, 2014

lilyball commented Aug 25, 2014

lilyball commented Aug 25, 2014

alexcrichton commented Sep 8, 2014