-
Notifications
You must be signed in to change notification settings - Fork 36
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement event batching in the miniframework #143
base: master
Are you sure you want to change the base?
Conversation
OK, this looks usable now. The next step will be to try and make the CUDA implementation use it ... |
It will take me some time to digest this. Organizationally I'd prefer if this version of |
Sure. Making the PR on top of the existing |
I'm now looking how to apply this to the The first problem I ran into is that the Just to show how much I didn't know about c++:
So I'm using a third approach:
With this approach the only requirement on |
The previous implementation incurred in a huge performance penalty, because the This avoids it: diff --git a/src/cuda/Framework/EDProducer.h b/src/cuda/Framework/EDProducer.h
index 17e7e1be1bdb..76e39e93bf62 100644
--- a/src/cuda/Framework/EDProducer.h
+++ b/src/cuda/Framework/EDProducer.h
@@ -38,27 +38,25 @@ namespace edm {
bool hasAcquire() const { return true; }
void doAcquire(Event const& event, EventSetup const& eventSetup, WaitingTaskWithArenaHolder holder) {
- state_.reset();
- state_.emplace();
- acquire(event, eventSetup, std::move(holder), *state_);
+ acquire(event, eventSetup, std::move(holder), state_);
}
void doProduce(Event& event, EventSetup const& eventSetup) {
- produce(event, eventSetup, *state_);
- state_.reset();
+ produce(event, eventSetup, state_);
}
virtual void acquire(Event const& event,
EventSetup const& eventSetup,
WaitingTaskWithArenaHolder holder,
AsyncState& state) const = 0;
+
virtual void produce(Event& event, EventSetup const& eventSetup, AsyncState& state) = 0;
void doEndJob() { endJob(); }
virtual void endJob() {}
private:
- std::optional<AsyncState> state_;
+ AsyncState state_;
};
template <> I've propagated this change to the branch and pull request, though I'm still not sure that's the best approach. |
Add a template type parameter to the edm::EDProducerExternalWork class. The acquire() method is now const, and it communicates its asynchronous state to the the produce() method via an object of this type, rather than using data members. Implementations that do not need to pass any state between acquire() and produce() can use "void" or leave the template parameter empty.
Reorganise the existing EDProducer tests, and add tests for - EDBatchingProducer - EDBatchingProducerExternalWork
Introduce three new edm classes: - edm::EventBatch implements the ownership of a batch of events, implemented as an std::vector<edm::Event>; used by the Source to provide events to the StreamSchedule. - edm::EventRange implements non-owning, non-const access to a batch of events; wraps a pair of Event* to the begin and end of an event batch; passed (by value) to doProduce() and produce(). - edm::ConstEventRange implements non-owning const access to a batch of events; wraps a pair of Event* to the begin and end of an event batch; passed (by value) to doAcquire() and acquire().
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a reasonable approach for a fixed-size batching and having total number of concurrent events to be "batch size * number of streams".
I found only two rather nitpicking-level comments (which may mean that I didn't digest all in the end). I would not worry too much about the interfaces (as long as they are reasonable) before performance is compared against the current non-batched approach.
|
||
void doProduce(EventRange events, EventSetup const& eventSetup) { produce(events, eventSetup, state_); } | ||
|
||
virtual void produce(EventRange events, EventSetup const& eventSetup, AsyncState& states) = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really minor, but is this a typo?
virtual void produce(EventRange events, EventSetup const& eventSetup, AsyncState& states) = 0; | |
virtual void produce(EventRange events, EventSetup const& eventSetup, AsyncState& state) = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've used both spellings while prototyping, so I don't mind either way: this should capture the "state" or "states" of multiple events:
state
is more consistent withAsyncState
states
is more consistent withevents
Re-thinking a bit more... maybe state
is more correct, since it is one object, not e.g. a vector<AsyncState>
.
EventRange range() { return EventRange(&events_.front(), &events_.back() + 1); } | ||
|
||
ConstEventRange range() const { return ConstEventRange(&events_.front(), &events_.back() + 1); } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not very relevant for prototyping itself, but i think being able to use something along
EventRange range() { return EventRange(&events_.front(), &events_.back() + 1); } | |
ConstEventRange range() const { return ConstEventRange(&events_.front(), &events_.back() + 1); } | |
EventRange range() { return EventRange(events_.begin(), events_.end()); } | |
ConstEventRange range() const { return ConstEventRange(events_.begin(), events_.end()); } |
would be clearer to read.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason I didn't do that was to avoid making EventRange
and ConstEventRange
use std::vector<Event>::iterator
and const_iterator
: then it could only work for a range of events stored in a vector
, while using plain pointers it can work with any consecutive range of events.
An alternative could be to add a constructor like
template <typename IT>
EventRange(IT begin, IT end) : begin_(std::addressof(*begin)), end_(std::addressof((*end)) { ... }
but then one could be tempted to pass non-contiguous iterators.
Maybe one could add something like
assert((end - begin) == (end_ - begin_));
or, once we move to C++20 with concepts, we could use contiguous_iterator
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the purposes of the prototype the current pointers are fine for me. With C++20 maybe even passing the range as std::span
could be an option.
Interesting. The |
For prototyping purposes I think making |
I was surprised as well, and digging into the cause I found it to be I guess we could find a workaround - but avoiding constructing and destructing |
Ah, |
Just to repeat here what I mentioned in the core sw meeting today. I think an object that should live longer than "from And I think I finally realized the issue with that (within this PR)
Therefore Would it be fair to say that the |
Even if that is not the goal of the interface, in practice yes, because it is available (and mutable) in both |
Given that this batching approach needs an "event cache" in any case, how important do you think a separate mechanism to pass data from |
I have the feeling that an "event cache" and a "pass data from acquire to produce" mechanism could be to some extent orthogonal:
If we introduce an event cache, and if the interface for the event cache and the acquire-to-produce data would be the same (i.e. additional template parameter, additional parameter passed to acquire and produce) then of course having both at the same tie would only add complexity. |
Introduce a new way of passing information from
acquire
toproduce
:EDProducerExternalWork
base class, and make it available to inherited classes asAsyncState
;acquire
should use anAsyncState
object hold all the information that needs to be passed toproduce
:acquire
returns anAsyncState
object that is later passed toproduce
; it is also marked asconst
method to guarantee that the information is not passed through some internal state;produce
takes an extraAsyncState
parameter to receive the state information fromacquire
;acquire
toproduce
it can use avoid
or empty template parameter; in this caseacquire
returnsvoid
andproduce
does not take any new parameters (as before);acquire
is still marked asconst
.This change is propaedeutic to the event batching support, but can be reviewed and adopted independently of it.
Add a
doneWaiting
overload that does not take an exception pointer argumentThis is just a simplification for the cases where
doneWaiting(...)
was being called with an empty exception pointer.This change is independent from event batching support, and can be reviewed and adopted independently of it
Implement basic support for event batching, trying to have minimal impact on the existing modules:
Source
is responsible for reading (up to) N events at a time;Event
s are stored in anEventBatch
and passed to eachWorker
as anEventRange
(forproduce()
) orConstEventRange
(foracquire
); the (Const
)EventRange
class allows (const
) iteration and a (const
) range-loop over the events, without any possibility of adding/dropping events;Worker
passes the events to the variousEDProducer
-like base classes in the same way;EDProducer
andEDProducerExternalWork
base classes loop over the events in the range, and callacquire
andproduce
one event at a time; modules that derive from them can be used without any changes;EDBatchingProducer
andEDBatchingProducerExternalWork
base classes that pass the range ofEvent
s to the module'sacquire
andproduce
methods; the module's implementation can then loop over the events, process them in parallel, etc.The framework tests have been reorganised and extended to test all four
EDProdcer
-like base classes:EDProducer
;EDProducerExternalWork
;EDBatchingProducer
;EDBatchingProducerExternalWork
.Notes
About
edm::EventRange
andedm::ConstEventRange
The current
edm::EventRange
implementation is suitable only for the case where all modules will process all events; once the possibility of filtering events or to re-aggregate the ranges are considered, a different implementation will be needed.The interface can mostly stay unchanged, but these objects might better be passed by reference rather than by value.
Alternative approaches for
acquire()
The approach used by
src/fwtest/Framework/EDProducer.h
requires that theAsyncState
type by moveable or cpyable.A different approach is used by
src/cuda/Framework/EDProducer.h
: instead ofacquire
returning a newAsyncState
object by value, theAsyncState
object is default-constructed by theEDProducerExternalWork
base class and passed toacquire
byAsyncState &
, making the interface closer to that ofproduce()
.This waves the requirement that
AsyncState
is moveable or copyable, but adds the requirement thatAsyncState
be default-constructible.A further possibility would be for the base class to only hold an
std::unique_ptr<AsyncState>
orstd::optional<AsyncState>
and pass it by reference toacquire()
.This would wave also the requirement that
AsyncState
is default-constructible; it would be responsibility of the client code to properly construct an object duringacquire()
and assign it to theunique_ptr
oroptional
passed to it.An other downside would be that the interfaces of
acquire()
andproduce()
would diverge again.