-
Notifications
You must be signed in to change notification settings - Fork 698
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DeviceManager interface and CPUDeviceManager #2249
Conversation
@nickgg Quick question: What happens if someone would call |
@opti-mix I think we should keep the interface here as simple as possible, I'm not sure that all devices Glow supports will support being able to interrupt a running inference, so we'd be setting ourselves up for failure by supporting it in the DeviceManger layer. Do you know a use case that would require this? The basic idea here is that we're serializing all calls into a single queue, and so the evict would take place after previous runs but prevent future runs. |
That ASAN failure doesn't seem related, the alloc & overrun both happen within the call to compile(). Does anyone know about this, if not i'll look into it. |
return maxMemoryBytes >= (usedMemoryBytes + estimate); | ||
} | ||
|
||
void CPUDeviceManager::addNetworkImpl(const Module *module, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be Module or Function? We are loading a specific function to be executed. Np?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're loading a set of functions (which we expect to share constants, e.g. the same graph with different batch sizes). The FunctionMap
contains CompiledFunction
s, but to collect the constants we need access back to the Placeholders which are owned by the module.
#include <chrono> | ||
#include <future> | ||
|
||
using namespace glow; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great unit tests!
Added RuntimeTypes, fixed some formatting. Didn't attack the header include part yet. |
4e85dd1
to
b30b3c1
Compare
Pulled out implementations to a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some questions and comments.
|
||
/// Remove (and delete) the provided network and all it's functions, freeing | ||
/// up space on the device. | ||
virtual void evictNetwork(const Module *module) = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how do we know if eviction succeeded? (e.g. stats/logging etc)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could generate the event/log from within the device manager.
|
||
/// Remove (and delete) the provided network and all it's functions, freeing | ||
/// up space on the device. | ||
void evictNetwork(const Module *module); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can module be nullptr? if not const Module &module
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should keep it consistent with addNetwork and the ReadyCB, and since those are used to do comparisons I think better to keep it as pointer (ie. do a ptr compare, not an operator==).
Honestly I think the raw ptrs here should be shared_ptrs.
71760ef
to
addd995
Compare
@rdzhabarov are you satisfied with those explanations or do you want me to change the interface? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Time to unblock this :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nickgg yes, thanks!
Fixed remaining comments and will commit, build permitting. |
@nickgg u_int64_t is not c portable, and work with VS. |
Description: A second prototype of the DeviceManager, which is the interface to a single inference Device. The interface supports three operations:
** addNetwork() - load an existing compiled module (or sub module) onto the device ready to be executed.
** evictNetwork() - unload a module from the device and from memory.
** runFunction() - run a Function (subgraph) with the provided context and callback with results.
The DeviceManager operates in it's own thread, which for the CPU backend implemented here also does compilation and running of the function.
Testing: new tests, parallelized example in a follow up.
Documentation:
Part of the high level Runtime issue #2045.