Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

libnative: Convert non-blocking fds to blocking when necessary #13355

Closed
wants to merge 2 commits into from

Conversation

lilyball
Copy link
Contributor

@lilyball lilyball commented Apr 6, 2014

When doing a read or write on a file descriptor, if it returns EAGAIN or
EWOULDBLOCK, we need to turn off the O_NONBLOCK flag and try again. This
can happen because TTYs and FIFOs share the file descriptor flags with
other processes.

Fixes #13336.

@lilyball
Copy link
Contributor Author

lilyball commented Apr 6, 2014

r? @alexcrichton

I know you aren't sold on this approach, but I think libnative needs to preserve its intended semantics of doing blocking I/O, and I don't think there's a better solution.

@@ -87,6 +92,25 @@ impl FileDesc {
}
}

unsafe fn blocking(fd: fd_t, f: |fd_t| -> i64) -> i64 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like this needs (at least) one comment, describing what it's for/why it's doing what it does.

@lilyball
Copy link
Contributor Author

lilyball commented Apr 6, 2014

@alexcrichton r? I added comments and tweaked it so that if O_NONBLOCK wasn't set in the F_GETFL we bail out.

@lilyball
Copy link
Contributor Author

lilyball commented Apr 6, 2014

@alexcrichton I've changed the errno case to just re-call read()/write() and return. But I firmly believe the loop is necessary.

@alexcrichton
Copy link
Member

I am trying to reproduce this, and I am unable to with your test. I would like to survey what happens in other languages such as ruby/python.

@lilyball
Copy link
Contributor Author

lilyball commented Apr 7, 2014

My test reproduces 100% of the time on OS X. I haven't tried other platforms. I am also curious as to how ruby/python handles this, although I'm not sure that really makes a difference.

@alexcrichton
Copy link
Member

I am very worried about the size of the hammer you are applying to this problem. This is making a significant number of assumptions which are usually dangerous to make when dealing with tricky situations such as this. I do not want to take this lightly.

I am quite surprised that this is only an error on OSX, and not an error on linux. I investigated into other languages to see what other runtimes do:

  • ruby - works ok. I'm not sure how, but there's a few ioctl calls which may be involved
  • C++ (using cout) - prints 7 times, sees EAGAIN, and stops printing (but doesn't throw an exception or anything), the program exits normally
  • Go - same as C++, prints 7 times and then stops printing, no error is thrown
  • D - throws an exception
  • python - throws an exception
  • haskell - works ok, I am seeing invocations of select next to every call to write. I presume this is because they have a libuv-like event loop driving their I/O (which solves this problem for free)

This data does not convince me that the PR here is the right course of action to take. It appears no one handles this in a libnative-like setting except for ruby.

@lilyball
Copy link
Contributor Author

lilyball commented Apr 7, 2014

@alexcrichton Those results aren't surprising to me. C++ is behaving correctly here, AFAIK (I don't believe iostream makes any guarantee about being blocking, so obeying the O_NONBLOCK makes sense). Go is the only one here that I think is actually wrong, I think Go should be doing the same thing I'm trying to do here (because AFAIK Go does not expose the underlying fd and instead just exposes a set of behaviors).

Rust's libnative is exposing a blocking I/O API. As such, clients of this API reasonably expect that they never have to deal with EAGAIN and EWOULDBLOCK and therefore any errors they see are legitimate errors. What's more, even if they do look for those errors, there's no API to handle them. We don't expose any equivalent to select() or fcntl() and so the only two possible responses are either busy-waiting on the I/O (which is a terrible idea) or treating it as a hard error (which sucks).

I do not believe that toggling O_NONBLOCK off when we expected a blocking operation is a hammer. I believe it is actually a very targeted approach that is explicitly trying to adjust the state of the file descriptor into that which we want it to be, namely, blocking. This is no more a hammer than automatically retrying when we get EINTR is.

Also, I am very surprised you're not seeing this issue on Linux. I would have expected Linux to behave in the same fashion as OS X.

@alexcrichton
Copy link
Member

This is no more a hammer than automatically retrying when we get EINTR is.

I do not agree with this. I can precisely explain to why EINTR occurs, and why retrying will always make progress. I cannot do the same for EAGAIN and flipping flags which can be modified/viewed from other processes. EINTR is purely a local solution, while modifying the nonblocking flag is globally affected.

What's more, even if they do look for those errors, there's no API to handle them

Lack of APIs for this does not justify this solution in my opinion. This is quite a rare error to see, and apis for select and such are planned for the future (and possibly with fcntl).

You have stated that these results are not surprising, and you believe that Rust should differ from essentially all of them. Every one of these languages is providing a blocking interface to I/O (just like libnative), yet none of them handle this error (except ruby). I will need further convincing that we need to travel down this path flipping nonblocking flags.

@lilyball
Copy link
Contributor Author

lilyball commented Apr 7, 2014

@alexcrichton Ah hah! Linux does reproduce this, it just has a much higher internal buffer. I'm able to write 64k on Linux before it exhausts itself, instead of the 8k on OS X.

lilyball added 2 commits April 7, 2014 00:21
When doing a read or write on a file descriptor, if it returns EAGAIN or
EWOULDBLOCK, we need to turn off the O_NONBLOCK flag and try again. This
can happen because TTYs and FIFOs share the file descriptor flags with
other processes.
@lilyball
Copy link
Contributor Author

lilyball commented Apr 7, 2014

@alexcrichton I rebased and tweaked the test to write 1MB instead of 32kB. I haven't run this precise test on linux (because I'm not building rustc from source there) but the equivalent code that I based this test on fails on linux with the same "Resource temporarily unavailable" error.

@lilyball
Copy link
Contributor Author

lilyball commented Apr 7, 2014

I do not agree with this. I can precisely explain to why EINTR occurs, and why retrying will always make progress. I cannot do the same for EAGAIN and flipping flags which can be modified/viewed from other processes. EINTR is purely a local solution, while modifying the nonblocking flag is globally affected.

EINTR is because your call was interrupted. Retrying makes absolutely no guarantee that the call will not be interrupted a second time.

EAGAIN/EWOULDBLOCK means the fd is in non-blocking mode and turning that off makes no guarantee that the fd will not be set back to nonblocking by another thread or, in the case of TTYs/FIFIOs, another process, before you reissue the read/write call.

The situations are identical. You just understand the first a lot better than the second (and the first is likely to be a lot more common). In both cases, progress is made and you retry.

Lack of APIs for this does not justify this solution in my opinion. This is quite a rare error to see, and apis for select and such are planned for the future (and possibly with fcntl).

It's an error that has no solution without this change. Being rare does not make it meaningless. This whole problem was discovered by someone who failed to compile Rust because rustc hit the error while emitting build output. When someone is hitting the error while compiling Rust itself, without doing anything special, I think that warrants fixing.

I stated that those results don't surprise me, partially because I believe at least some of those languages do in fact expose a way to tweak the non-blocking nature of I/O, and partially because I expect that most language developers haven't actually run into this problem or given any thought to it. But while I'm not surprised by those results, I don't necessarily think they're correct. If a given language exposes a way to get/set the O_NONBLOCK flag on its I/O primitives, then letting the caller handle it seems reasonable. If a language does not (like Rust), then letting the caller handle it is absolutely wrong because the caller can't. And no, I don't think this is an argument for Rust exposing a way to get/set the non-blocking flag on its I/O. I think we just need to handle the case properly.

@alexcrichton alexcrichton mentioned this pull request Apr 19, 2014
alexcrichton added a commit that referenced this pull request Apr 24, 2014
This update brings a few months of changes, but primarily a fix for the
following situation.

When creating a handle to stdin, libuv used to set the stdin handle to
nonblocking mode. This would end up affect this stdin handle across all
processes that shared it, which mean that stdin become nonblocking for everyone
using the same stdin. On linux, this also affected *stdout* because stdin/stdout
roughly point at the same thing.

This problem became apparent when running the test suite manually on a local
computer. The stdtest suite (running with libgreen) would set stdout to
nonblocking mode (as described above), and then the next test suite would always
fail for a printing failure (because stdout was returning EAGAIN).

This has been fixed upstream, joyent/libuv@342e8c, and this update pulls in this
fix. This also brings us in line with a recently upstreamed libuv patch.

Closes #13336
Closes #13355
alexcrichton added a commit to alexcrichton/rust that referenced this pull request Apr 24, 2014
This update brings a few months of changes, but primarily a fix for the
following situation.

When creating a handle to stdin, libuv used to set the stdin handle to
nonblocking mode. This would end up affect this stdin handle across all
processes that shared it, which mean that stdin become nonblocking for everyone
using the same stdin. On linux, this also affected *stdout* because stdin/stdout
roughly point at the same thing.

This problem became apparent when running the test suite manually on a local
computer. The stdtest suite (running with libgreen) would set stdout to
nonblocking mode (as described above), and then the next test suite would always
fail for a printing failure (because stdout was returning EAGAIN).

This has been fixed upstream, joyent/libuv@342e8c, and this update pulls in this
fix. This also brings us in line with a recently upstreamed libuv patch.

Closes rust-lang#13336
Closes rust-lang#13355
bors added a commit that referenced this pull request Apr 24, 2014
This update brings a few months of changes, but primarily a fix for the
following situation.

When creating a handle to stdin, libuv used to set the stdin handle to
nonblocking mode. This would end up affect this stdin handle across all
processes that shared it, which mean that stdin become nonblocking for everyone
using the same stdin. On linux, this also affected *stdout* because stdin/stdout
roughly point at the same thing.

This problem became apparent when running the test suite manually on a local
computer. The stdtest suite (running with libgreen) would set stdout to
nonblocking mode (as described above), and then the next test suite would always
fail for a printing failure (because stdout was returning EAGAIN).

This has been fixed upstream, joyent/libuv@342e8c, and this update pulls in this
fix. This also brings us in line with a recently upstreamed libuv patch.

Closes #12827
Closes #13336
Closes #13355
@bors bors closed this in #13619 Apr 24, 2014
@lilyball
Copy link
Contributor Author

This is not fixed. #13619 may fix a common reproduction scenario, but the underlying problem still exists and can be reproduced without libgreen's involvement.

@lilyball lilyball reopened this Apr 24, 2014
@alexcrichton
Copy link
Member

As I commented the root cause of this bug had noting to with libnative and this is a hugely invasive solution (it was the source of the bugs in libuv).

I personally do not believe that this is something which should be handled by libnative.

@lilyball
Copy link
Contributor Author

I don't think you understand what actually causes this bug. The root cause of this bug is the way file descriptors work in linux/unix for various file types (character devices and FIFOs, not sure about others). I've commented in #13336 about why I feel this bug still needs to be handled in libnative.

For the sake of sanity, I suggest any further discussion be restricted to #13336 so we don't keep copying information back and forth.

@thestinger
Copy link
Contributor

There isn't consensus that this is actually a bug, so I think it makes sense to leave this closed until it's agreed that there's something to fix here. I don't know if something this small deserves an RFC, but certainly seems controversial enough to merit one.

@thestinger thestinger closed this May 4, 2014
@lilyball
Copy link
Contributor Author

This topic came up again in IRC. It's still an issue. servo/servo#2601 is likely a manifestation of it. This PR almost certainly needs a rebase, or maybe even a rewrite, at this point, but I'd like it to be considered again.

The fundamental point here is libnative makes guarantees about blocking behavior, and these guarantees are violated if the fd is O_NONBLOCK. Since fds that refer to certain types of file objects (notably, FIFOs and terminal devices) can have O_NONBLOCK toggled by a separate process, and this has nothing to do with sharing fds with children or inheriting them from parents, this is not something that the Rust program, or anything under the programmer's control, can deal with. The only solution that actually works is for libnative to toggle O_NONBLOCK off again whenever it hits this situation.

@lilyball lilyball reopened this Aug 25, 2014
@lilyball
Copy link
Contributor Author

As mentioned by me several times in #13336, an alternative approach is to fall back to select() on EAGAIN/EWOULDBLOCK. Nobody voiced any support for that idea. But I'm willing to rewrite this PR for that approach if it would get this fixed.

@alexcrichton
Copy link
Member

Closing due to inactivity (13 days). I would recommend holding off until the libnative redesign has landed at which point it may be much clearer how to deal with this.

bors added a commit to rust-lang-ci/rust that referenced this pull request Sep 24, 2024
…ason, r=xFrednet

Not trigger `duplicated_attributes` on duplicate reasons

As at rust-lang#13355 we shoudn't  trigger `duplicated_attributes` on duplicate reasons attr
changelog: [`duplicated_attributes`]: not trigger `duplicated_attributes` on duplicate reasons
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

libnative does not handle fds with O_NONBLOCK
4 participants