[WIP][Runtime] Add runtime Executor class #2267

SplitInfinity · 2019-01-15T20:12:13Z

Description
This commit adds Executor, an interface for a component of the Glow
runtime that is capable of running a partitioned directed acyclic Glow
graph and returning the results of that execution. This commit also adds
ThreadPoolExecutor, an implementation of this interface that runs
graphs using a thread pool. The workers essentially do a
concurrent breadth-first search through the node graph to execute all
nodes, repeatedly copying partition components' outputs to their descendants along
the way.

Testing
Tests WIP. I need to write some code to mock DAGs, symbol information and DeviceManager; let me know if you've done this already and can share the code.

Description: This commit adds `Executor`, an interface for a component of the Glow runtime that is capable of running a partitioned directed acyclic Glow graph and returning the results of that execution. This commit also adds `ThreadPoolExecutor`, an implementation of this interface that runs graphs using a thread pool. The workers essentially do a concurrent breadth-first search through the node graph to execute all nodes, repeatedly copying partition components' outputs to their descendants along the way. Testing: It compiles. Documentation: Comments.

SplitInfinity · 2019-01-15T20:15:49Z

Overall, I think this is simpler and cleaner than v1, but there is still some iteration and cleanup required. There is still some complicated locking/unlocking going on, and there is probably a cleaner way to split up the state and responsibilities between ThreadPoolExecutor and ExecutionState. Right now, the latter is a simple container for all the state related to a single run. It lets me avoid creating several instance variables in ThreadPoolExecutor of type std::unordered_map<RuntimeIdTy, std::unordered_map<DAGNode*, T>>, which is annoying to read and reason about.

bertmaher · 2019-01-15T20:41:43Z

Does this PR include the changes in #2249? The device manager stuff looks similar...

nickgg · 2019-01-15T22:35:00Z

@bertmaher it's got the interface and runtime types from that PR yes, ideally we land #2249 first and rebase onto it.

SplitInfinity · 2019-01-15T22:57:40Z

Yep, @nickgg is correct. I will rebase on top of that PR after it lands.

gcatron · 2019-01-17T00:18:35Z

include/glow/Backends/BackendUtils.h

@@ -59,6 +65,8 @@ class RuntimeBundle {
  size_t getValueOffset(const Named *v) const;
  /// Helper function, gets symbol info for \p v.
  RuntimeSymbolInfo getSymbolInfo(const Named *v) const;
+  /// Get a const reference to the symbol table.


Channeling @bertmaher and since I'm a frequent transgressor, remember vertical whitespace between declarations.

nadavrot · 2019-01-17T05:39:47Z

include/glow/Backends/DeviceManager.h

+
+using DeviceNetworkID = size_t;
+
+enum ResultCode { READY, EXECUTED, FAILED, CANCELLED };


The ResultCode enum leaks into the global namespace.
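A minimal sketch of one way to address this, assuming the types can move into a namespace and become a scoped enum. The namespace name `enum_sketch` and the helper `isFailure` are hypothetical and only illustrate usage; the enumerators are the ones from the diff above.

```cpp
#include <cstddef>

namespace enum_sketch {

using DeviceNetworkID = std::size_t;

// Scoped enum: the enumerators must be written as ResultCode::READY etc. and
// are not visible as bare names in the enclosing scope.
enum class ResultCode { READY, EXECUTED, FAILED, CANCELLED };

// Hypothetical helper, present only to show how the scoped names are used.
inline bool isFailure(ResultCode code) {
  return code == ResultCode::FAILED || code == ResultCode::CANCELLED;
}

} // namespace enum_sketch
```

Call sites that already spell `ResultCode::FAILED` keep compiling; only code relying on the bare names (or on implicit conversion to an integer) would need changes.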

nadavrot · 2019-01-17T05:40:13Z

include/glow/Backends/BackendUtils.h

@@ -30,13 +30,19 @@ struct RuntimeSymbolInfo {
  size_t offset;
  /// Type of symbol.
  Type type;
+  /// Is the symbol an input for the function.
+  bool input;


input -> isInput

nadavrot · 2019-01-17T05:40:43Z

include/glow/Backends/DeviceManager.h

+#include "glow/Graph/Graph.h"
+#include "glow/Support/ThreadPool.h"
+
+#include <atomic>


Can you move some of the includes from the header file to the cpp file?

nadavrot · 2019-01-17T05:41:24Z

lib/Runtime/Executor/ThreadPoolExecutor.cpp

+namespace runtime {
+
+/// Forward declaration of getDeviceManager function.
+/// TODO: Talk to gcatron about the details of this.


Please don't leave usernames and todos in the codebase. You can file an issue.

nickgg · 2019-01-17T17:25:42Z

include/glow/Backends/DeviceManager.h

+
+using DeviceNetworkID = size_t;
+
+enum ResultCode { READY, EXECUTED, FAILED, CANCELLED };


These now live in RuntimeTypes.h anyway so can just include that

nickgg · 2019-01-17T17:27:18Z

include/glow/Runtime/Executor/Executor.h

+};
+
+/// Create a executor of kind \p kind.
+Executor *createExecutor(ExecutorKind executorKind);


This could have a default argument since we wouldn't expect to call it more than once in the runtime.
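A sketch of what the default argument could look like. The enumerator name `ExecutorKind::ThreadPool`, the `getKind` accessor, and the stub classes are assumptions made for illustration; the PR's actual definitions are not shown in this thread.

```cpp
namespace factory_sketch {

// Enumerator name is assumed; the diff only shows the enum's name.
enum class ExecutorKind { ThreadPool };

class Executor {
public:
  virtual ~Executor() = default;
  // Hypothetical accessor so the example has something observable.
  virtual ExecutorKind getKind() const = 0;
};

class ThreadPoolExecutor final : public Executor {
public:
  ExecutorKind getKind() const override { return ExecutorKind::ThreadPool; }
};

/// Create an executor of kind \p executorKind. Callers that do not care get
/// the thread pool implementation.
inline Executor *
createExecutor(ExecutorKind executorKind = ExecutorKind::ThreadPool) {
  switch (executorKind) {
  case ExecutorKind::ThreadPool:
    return new ThreadPoolExecutor();
  }
  return nullptr; // Unreachable while the switch covers every enumerator.
}

} // namespace factory_sketch
```

The raw pointer return mirrors the signature in the diff; the caller owns the result and must delete it.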

nickgg · 2019-01-17T17:29:03Z

include/glow/Runtime/Executor/Executor.h

+namespace runtime {
+
+/// This enum lists the available executors.
+enum class ExecutorKind {


Are you sure we need this? I could see maybe we decide on a different executor than a Thread Pool but would we still want to support both?

nickgg · 2019-01-17T17:42:50Z

lib/Runtime/Executor/ThreadPoolExecutor.cpp

+  executionStateLocks_[runId];
+  executionStates_.insert(std::make_pair(
+      runId, std::make_shared<ExecutionState>(runId, std::move(cb))));
+  lock.unlock();


I really prefer using scope to control locking rather than a manual unlock(); it's easy for the next developer to introduce a path that locks or unlocks at the wrong point.
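Roughly what scope-controlled locking could look like here, as a sketch rather than the PR's code. `StateTable`, `createState`, and the trimmed-down `ExecutionState` are hypothetical stand-ins for the executor's members.

```cpp
#include <cstddef>
#include <memory>
#include <mutex>
#include <unordered_map>

namespace scoped_lock_sketch {

using RunIdentifierTy = std::size_t;

// Trimmed-down stand-in for the per-run state object.
struct ExecutionState {
  explicit ExecutionState(RunIdentifierTy id) : runId(id) {}
  RunIdentifierTy runId;
};

class StateTable {
public:
  std::shared_ptr<ExecutionState> createState(RunIdentifierTy runId) {
    auto state = std::make_shared<ExecutionState>(runId);
    {
      // The guard releases the mutex at the closing brace on every path,
      // including early returns and exceptions.
      std::lock_guard<std::mutex> guard(mtx_);
      states_.emplace(runId, state);
    }
    // Work that does not touch states_ continues here without the lock.
    return state;
  }

  std::size_t size() const {
    std::lock_guard<std::mutex> guard(mtx_);
    return states_.size();
  }

private:
  mutable std::mutex mtx_;
  std::unordered_map<RunIdentifierTy, std::shared_ptr<ExecutionState>> states_;
};

} // namespace scoped_lock_sketch
```

The inner braces mark exactly which statements run under the lock, so a later edit cannot forget the matching unlock().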

nickgg · 2019-01-17T17:44:49Z

lib/Runtime/Executor/ThreadPoolExecutor.cpp

+                                                      DAGNode *node,
+                                                      Context *ctx) {
+  // Get the execution state for the run.
+  std::shared_ptr<ExecutionState> executionState = executionStates_[runId];


You have to hold the executionStateLocksMtx_ lock while reading executionStates_ since it may be being modified by another thread. If you want to release the lock during the propagation process I think you should pass the ExecutionState* in rather than the RunIdentifier.
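A sketch of that shape, with hypothetical names (`StateRegistry`, `propagate`, `propagatedNodes`). Note also that `operator[]` on `std::unordered_map` inserts a default element when the key is missing, so the line above can modify the map, not only read it; `find` does not.

```cpp
#include <cstddef>
#include <memory>
#include <mutex>
#include <unordered_map>

namespace lookup_sketch {

using RunIdentifierTy = std::size_t;

struct ExecutionState {
  explicit ExecutionState(RunIdentifierTy id) : runId(id) {}
  RunIdentifierTy runId;
  int propagatedNodes = 0; // Hypothetical per-run bookkeeping.
};

class StateRegistry {
public:
  void add(RunIdentifierTy runId) {
    std::lock_guard<std::mutex> guard(mtx_);
    states_.emplace(runId, std::make_shared<ExecutionState>(runId));
  }

  // Look the state up while holding the map lock and hand back a shared_ptr
  // copy, which stays valid after the lock is released.
  std::shared_ptr<ExecutionState> find(RunIdentifierTy runId) const {
    std::lock_guard<std::mutex> guard(mtx_);
    auto it = states_.find(runId);
    if (it == states_.end()) {
      return nullptr;
    }
    return it->second;
  }

private:
  mutable std::mutex mtx_;
  std::unordered_map<RunIdentifierTy, std::shared_ptr<ExecutionState>> states_;
};

// Takes the state itself rather than the run id, so it never touches the map.
// Synchronizing access to the state's own fields is a separate concern.
inline void propagate(ExecutionState &state) { ++state.propagatedNodes; }

} // namespace lookup_sketch
```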

nickgg · 2019-01-17T17:55:21Z

lib/Runtime/Executor/ThreadPoolExecutor.cpp

+  // Run the node using the DeviceManager.
+  deviceManager->runFunction(
+      node->name, std::move(nodeCtx),
+      [this, runId, node](RunIdentifierTy id, ResultCode resultCode,


id is unused, but it was added specifically for this use case. It looks like you have your own runId. Do we still need the RunIdentifier generated by runFunction?

nickgg · 2019-01-17T17:58:49Z

lib/Runtime/Executor/ThreadPoolExecutor.h

+  /// Locks for the execution state objects. These are used to ensure that
+  /// the shared state contained in the ExecutionState object for a run is
+  /// mutated by only one thread at a time.
+  std::unordered_map<RunIdentifierTy, std::mutex> executionStateLocks_;


The lifetime of these locks appears to be identical to the ExecutionState, why not make the lock a member of the state?
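A sketch of the state owning its own lock. The member names (`markInflight`, `markDone`, `inflight_`) and the use of `int` node ids are hypothetical simplifications; the real state tracks DAGNode pointers.

```cpp
#include <cstddef>
#include <mutex>
#include <set>

namespace state_lock_sketch {

using RunIdentifierTy = std::size_t;

class ExecutionState {
public:
  explicit ExecutionState(RunIdentifierTy runId) : runId_(runId) {}

  RunIdentifierTy getRunId() const { return runId_; }

  void markInflight(int nodeId) {
    std::lock_guard<std::mutex> guard(mtx_);
    inflight_.insert(nodeId);
  }

  // Returns true when no nodes remain inflight after removing nodeId.
  bool markDone(int nodeId) {
    std::lock_guard<std::mutex> guard(mtx_);
    inflight_.erase(nodeId);
    return inflight_.empty();
  }

private:
  RunIdentifierTy runId_;
  std::mutex mtx_;         // Lives and dies with the state it protects.
  std::set<int> inflight_; // Node ids stand in for DAGNode pointers here.
};

} // namespace state_lock_sketch
```

A `std::mutex` member makes the class non-copyable and non-movable, which is compatible with holding the state through a `std::shared_ptr` as the PR already does.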

nickgg · 2019-01-17T19:10:34Z

lib/Runtime/Executor/ThreadPoolExecutor.cpp

+  // Get the execution state for the run.
+  auto executionStateIt = executionStates_.find(runId);
+  if (executionStateIt == executionStates_.end()) {
+    // This should never happen. TODO: Log, assert, something.


Feels safer to start with an assert than not.
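For instance, something along these lines; `getState` and `StateMap` are hypothetical names, and the message text is only a suggestion.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <unordered_map>

namespace assert_sketch {

using RunIdentifierTy = std::size_t;

struct ExecutionState {};

using StateMap =
    std::unordered_map<RunIdentifierTy, std::shared_ptr<ExecutionState>>;

inline std::shared_ptr<ExecutionState> getState(const StateMap &states,
                                                RunIdentifierTy runId) {
  auto it = states.find(runId);
  // Fails loudly in debug builds if the invariant is ever broken.
  assert(it != states.end() && "No execution state found for run");
  if (it == states.end()) {
    // Builds with NDEBUG compile the assert out, so still avoid
    // dereferencing the end iterator.
    return nullptr;
  }
  return it->second;
}

} // namespace assert_sketch
```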

nickgg · 2019-01-17T19:27:27Z

lib/Runtime/Executor/ThreadPoolExecutor.cpp

+
+  std::shared_ptr<ExecutionState> executionState = executionStateIt->second;
+
+  if (resultCode == ResultCode::CANCELLED || resultCode == ResultCode::FAILED) {


nit: This seems clearer as if (resultCode != ResultCode::EXECUTED)

nickgg · 2019-01-17T19:32:05Z

lib/Runtime/Executor/ThreadPoolExecutor.cpp

+    if ((executionState->inflightNodes).empty()) {
+      // If there are no nodes inflight, that means all nodes are done. Call
+      // the callback and erase the state information.
+      executionState->cb(runId, ResultCode::EXECUTED,


Holding a lock while calling the callback is dangerous, this could deadlock if the cb calls back into the Executor.
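One possible pattern, sketched with hypothetical names (`DoneCb`, `addInflight`, `nodeFinished`) and `int` node ids: decide under the lock whether the run is finished, take the callback out of the state, and invoke it only after the lock is released.

```cpp
#include <functional>
#include <mutex>
#include <set>
#include <utility>

namespace callback_sketch {

enum class ResultCode { READY, EXECUTED, FAILED, CANCELLED };

// Simplified callback type; the real one also carries the run id and results.
using DoneCb = std::function<void(ResultCode)>;

class ExecutionState {
public:
  explicit ExecutionState(DoneCb cb) : cb_(std::move(cb)) {}

  void addInflight(int nodeId) {
    std::lock_guard<std::mutex> guard(mtx_);
    inflight_.insert(nodeId);
  }

  void nodeFinished(int nodeId) {
    DoneCb toCall; // Empty unless this call finishes the run.
    {
      std::lock_guard<std::mutex> guard(mtx_);
      inflight_.erase(nodeId);
      if (inflight_.empty()) {
        // Swapping with an empty function leaves cb_ empty, so the callback
        // can be invoked at most once.
        toCall.swap(cb_);
      }
    }
    // The lock is released here, so the callback may safely call back in.
    if (toCall) {
      toCall(ResultCode::EXECUTED);
    }
  }

private:
  std::mutex mtx_;
  std::set<int> inflight_;
  DoneCb cb_;
};

} // namespace callback_sketch
```

With this structure a callback that re-enters the state (for example by calling `addInflight`) acquires the mutex normally instead of deadlocking on a lock its own thread already holds.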

stale · 2019-01-22T20:37:52Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.

stale · 2019-01-27T21:00:14Z

This PR has been automatically closed due to being stale for 5 days. Thank you for your contributions and feel free to reopen it in case of further progress.

SplitInfinity added the WORK IN PROGRESS label Jan 15, 2019

facebook-github-bot added the CLA Signed label Jan 15, 2019

gcatron reviewed Jan 17, 2019

nadavrot reviewed Jan 17, 2019

nickgg reviewed Jan 17, 2019

stale bot added the stale_will_be_closed label Jan 22, 2019

stale bot closed this Jan 27, 2019
