Proposal: New launch policy for eagerly evaluated synchronous future continuations #111
Comments
Looking around further, one possible workaround for the absence of the proposed launch policy is to use a global inline executor. This is what I'm going to use while waiting for your inputs on this proposal. Here are some thoughts on why it seems less optimal than a dedicated launch policy to me:
The alternative would be to have one inline executor per task that can spawn synchronous continuations, which just feels like code duplication.
Thanks for your suggestion. I'll try to comment soon.
You don't need to guess, you can read the documentation. To reduce the cost of threads, one alternative is to use Boost.Fiber, whose fibers are cheaper to create since they live in user space. Another is to use an executor for all those continuations, so that you don't get a new thread for each continuation.
It could be interesting to have a launch::sync policy, and I believe others have suggested this already (see the Boost ML). The question is: why do you need to execute the callback synchronously? This would be a little bit intrusive, as the continuation would run on the producer thread. I believe that we should follow the design of the C++ standard proposals. Do you know if such a policy has been proposed for the standard, or if something missing in Boost.Thread would answer your need? I have no time to work on this now, but if you find something in the standard proposals that can help you, I would appreciate a PR. P.S. Note that you also have the possibility to add a callback on the shared state.
I did look it up, however the behaviour of calling .then() without specifying a launch policy is (rightfully) undefined.
I do not think that I can use them in the context of future continuations. To work efficiently, fibers must return control to the scheduler when blocking, and I have not seen a customization point in .then() that would allow me to do so.
This is what I currently do, as the inline executor is pretty close to what I want. However, it forces me to add either a global executor variable or one inline executor member per class which can potentially spawn synchronous continuations. In addition, the executor checks closed_ on every submission, and since that check is protected by a mutex, it may become a scalability bottleneck when working with many threads and very short tasks. For these reasons, I think a dedicated launch policy could serve this use case better.
I'm not sure we should call it launch::sync, as that would conflict with the previous name of std::launch::deferred in the C++0x draft, a naming convention which I believe Boost still supports when the right combination of flags is set. At least, that's where my attempts at searching for this name led me. I think synchronous callbacks are the right solution in this case, because I want to schedule very tiny pieces of work which are essentially adapters from one part of my processing pipeline to the next. Such small workloads tend to put a lot of stress on the scheduling infrastructure, so the lighter the underlying infrastructure, the better. In addition, since these pieces of work are latency-critical, scheduling them directly on the CPU where the data is in cache would be beneficial. Some concrete examples:
I have not seen any mention of this yet in the C++ standard proposals that I am aware of, but I did not do a very in-depth search (only looking up the TSs on cppreference). Where should I look to get the most up-to-date information on what's been proposed? Longer-term, I would certainly be in favor of proposing such a launch policy for inclusion in the C++ standard, but my experience with standards committees is that they usually want to look at an implementation before accepting a new proposal.
The upcoming week is similarly busy for me, but longer-term I'll see what I can do :)
How does this differ from future::then()?
The behaviour mentioned by @HadrienG2 is the default behaviour of futures in many languages I have worked with: Java's CompletableFuture, Scala's Future, etc. It is also the standard behaviour of Facebook's Folly futures. The reason for this is that it allows for better performance, as well as a futures abstraction that can work independently of the underlying async execution model. Since the C++ standard does not define the thread used when then() is called with no launch policy, we might as well execute the continuation inline. This is consistent with the C++ standard, as well as with all practical futures implementations in use today.
@HadrienG2 Sorry for being so late; I missed this. A dedicated policy that needs a pool of threads would need a global variable, which I don't want to have either, at least not until the standard adopts it. If you have a patch for the additional policy you want, I would be interested in looking at it. You can look for the Executors and concurrency proposals; if that doesn't get you anywhere, I could give you more specific pointers. Have you read the lazy futures documentation?
@debdattabasu The C++ standard doesn't define what thread to use, but the behaviour is as if a new thread were used. Any concrete proposal that is backward compatible and doesn't imply a loss of the current performance is welcome. If backward compatibility cannot be ensured, the alternative would be a new Boost.Thread V5 :( which I have no time to work on today. This doesn't mean that someone else cannot do it.
@viboes Right now the Boost documentation says: "When the launch policy is launch::none the continuation is called on an unspecified thread of execution." Since the thread is unspecified, we can choose any thread we want and still be backwards compatible with the existing library. This should not be a compatibility issue according to the docs. :)
@viboes I had a look at the lazy future documentation. These are pretty close in spirit to the std::launch::deferred launch policy, which allows for lazy continuation evaluation: future continuations are only executed when the host future is awaited. A defining feature of lazy evaluation is that in the following code, the continuation will never be executed:

```cpp
f.then(std::launch::deferred, continuation);
while (true) {}
```

Although lazy evaluation has its use cases, the above behaviour is often undesirable. It makes it worryingly easy to build code which deadlocks, and it hurts processing latency if the scheduling code takes too long to reach the associated .get() or .wait() call for any reason, since evaluation of the continuation will be delayed. Eager evaluation does not have these problems, which is why it is in most cases preferable. With eager evaluation, the only question is which thread should run the continuation. We can spawn one thread per continuation (which is what most std::launch::async implementations do), but that is often wasteful. We can reuse a finite thread pool (which is what HPX and Boost's thread pool executor do), but as you mention, doing this as a standard policy requires a global executor object, which may be undesirable. The only solution left is thus to reuse one of the existing threads: either the one which sets the continuation, or the one which sets the value of the future, whichever comes last. And this is what the proposed inline evaluation policy would do.
Hi, I will have some time to experiment on this. Do you have a PR that I can test? |
Question: am I missing something, or is this (executing continuations in the thread calling .set_value()/.set_exception(), or in the thread calling .then() if the future has already finished) already provided by the
Hi, we will need some tests and documentation as well.
allow sync policy for future then continuation (fix #111)
From a conceptual point of view, future continuations are awesome. They allow the dynamic definition of a processing pipeline, where multiple asynchronous operations are chained transparently without ever blocking for any of them until doing so is actually required.
However, I am finding myself in a situation where none of Boost's continuation launch policies seem to fit, in the sense that all of them result in bad performance or unnecessarily convoluted code. And I would like to propose another launch policy which could address that.
Consider the following processing pipeline, which is a heavily simplified form of my processing chain, where future continuations are attached in several places throughout the codebase.
In this processing pipeline, requestStorage queues an allocation request to a bounded storage mechanism, which will be honored once some storage becomes available. At this point, the future returned by requestStorage will be set with a handle to the corresponding storage location. When that happens, I want to load inputs from some IO resource into the storage system, then call a user-provided function which operates on those inputs in a CPU-intensive manner, and finally write down the outputs. At the end, I get a future or similar that I can use to tell when the whole process has completed.
Now, my goal is to run this processing pipeline for a bunch of user requests without spawning a large number of OS threads, because I know the latter would kill my task scheduling performance. How do I do that?
So, it is possible to implement every piece of work in my processing chain with a constant number of threads and a tiny bit of scheduling work here and there. The problem is that the default behaviour of Boost's future continuations is to spawn all of that scheduling work in extra threads, resulting in a huge number of OS threads being spawned and destroyed per request.
To quote GDB on this matter:
...and, unsurprisingly, looking at performance profiles, pthread_clone ends up being a significant contributor to my request scheduling overhead.
Why does this happen? Well, my first guess is that future.then()'s default launch policy is boost::launch::async, which spawns an extra thread. Definitely not what I want here. So, what are my other options?
None of these seem to be a very good fit here. Since, again, my scheduling work is very small and nonblocking, what I would like to do instead is the following:
An astute reader will notice that this launch policy is essentially a variant of boost::launch::deferred that uses eager evaluation instead of lazy evaluation: we want to run the continuation as soon as the future is ready, and not as soon as its value is requested.
In addition, this launch policy is extremely light on resource requirements: you do not need extra threads, executors, concurrent queues, mutexes or any other kind of heavy-handed infrastructure, you could in principle implement it directly in the future class using nothing but an atomic compare-and-swap in future.then() and an atomic read in promise.set_xyz().
So, what would you think about this proposal?