[Glow Runtime] Top Level Task #2045

Open
qcolombet opened this issue Nov 17, 2018 · 1 comment

@qcolombet

This is the top-level issue to track all the work we plan to do to make the Glow runtime support concurrent execution, pipelining, batching, and so on.

At a high level, the idea for the runtime is to be able to:

  • Enqueue inputs: Run input0, then run input1 as soon as the previous run is done, etc.
  • Slice the inputs into batches and run them transparently: Take N inputs and run them sequentially in batches of M (where M is the batch size the model was compiled for and N is the actual run size).
  • Pipeline work across models: Run input1 on model M1, then run the result of M1 on M2 while running input2 on M1, etc.
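
To make the batch slicing concrete, here is a minimal sketch in Python. It is not Glow code: `slice_into_batches` and `run_in_batches` are hypothetical names, and `model` is assumed to be any callable that takes a list of inputs and returns one output per input.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def slice_into_batches(inputs: Sequence[T], batch_size: int) -> List[List[T]]:
    """Split N inputs into consecutive batches of at most batch_size (M) items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [list(inputs[i:i + batch_size])
            for i in range(0, len(inputs), batch_size)]


def run_in_batches(model: Callable[[List[T]], List[R]],
                   inputs: Sequence[T],
                   batch_size: int) -> List[R]:
    """Run the batches one after another and flatten the results back to N outputs."""
    outputs: List[R] = []
    for batch in slice_into_batches(inputs, batch_size):
        outputs.extend(model(batch))
    return outputs
```

Note that the last batch may hold fewer than M items when N is not a multiple of M, which is where padding comes in.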

Among other things, the Glow runtime will have to:

  • Manage input/output queues for each model (and communication with the devices)
  • Manage incoming models
  • Keep track of data dependencies and schedule next tasks to be done
  • Split inputs
  • Pad inputs
  • Dispatch workload on device
  • Keep track of the status of devices

Also, somewhat orthogonal to the runtime but related, Glow will need to:

  • Determine what to run and where (graph partitioning)
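
As an illustration of the "pad inputs" item, here is a small Python sketch, assuming the compiled model only accepts exactly M inputs per run. The names `pad_batch` and `run_padded` and the choice of `pad_value` are hypothetical and not part of Glow.

```python
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def pad_batch(batch: List[T], batch_size: int, pad_value: T) -> Tuple[List[T], int]:
    """Pad a short batch up to batch_size; also return how many entries are real."""
    if len(batch) > batch_size:
        raise ValueError("batch is larger than the compiled batch size")
    valid = len(batch)
    return list(batch) + [pad_value] * (batch_size - valid), valid


def run_padded(model: Callable[[List[T]], List[R]],
               batch: List[T],
               batch_size: int,
               pad_value: T) -> List[R]:
    """Run a padded batch and drop the outputs that correspond to padding."""
    padded, valid = pad_batch(batch, batch_size, pad_value)
    return model(padded)[:valid]
```

The caller never sees the padding: only the outputs for the real inputs are returned.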

As a first step, we are cleanly separating the compilation and runtime stages.
This work is tracked in #2040, #1967, #1953, and #1951.

@gcatron

gcatron commented Dec 5, 2018

Adding #2125 to the list. Work is in progress on the HostManager, which is part of the new runtime design. The design has five major components: HostManager, Partitioner, Provisioner, DeviceManager, and Executor.

Partitioner:
This component is responsible for breaking up the provided network into subnetworks that can be run on multiple devices. It partitions based on hardware constraints and on heuristics that aim to optimize execution time. It outputs a DAG to be used by the other components.
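
One simple way to picture a constraint-driven partitioning is a greedy split by memory. The Python sketch below is only an illustration under stated assumptions, not the Partitioner's actual algorithm: nodes are assumed to arrive in topological order with a known memory footprint, the only hardware constraint is a single per-device `capacity`, and `partition_by_memory` is a hypothetical name.

```python
from typing import List, Tuple


def partition_by_memory(nodes: List[Tuple[str, int]], capacity: int) -> List[List[str]]:
    """Greedily group (name, memory) nodes into partitions that each fit in capacity.

    Nodes must be given in topological order. Because each partition is a
    contiguous slice of that order, a partition can only depend on earlier ones.
    """
    partitions: List[List[str]] = []
    current: List[str] = []
    used = 0
    for name, mem in nodes:
        if mem > capacity:
            raise ValueError(f"node {name} does not fit on a single device")
        # Start a new partition when adding this node would exceed the capacity.
        if current and used + mem > capacity:
            partitions.append(current)
            current, used = [], 0
        current.append(name)
        used += mem
    if current:
        partitions.append(current)
    return partitions
```

A real partitioner would also weigh execution-time heuristics; this sketch only respects the memory limit.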

DeviceManager:
The DeviceManager handles interactions with the device: initializing it, copying constants to it, and preparing it for execution. It also handles unloading networks from the device.

Provisioner:
The Provisioner takes the output from the Partitioner and assigns subnetworks to specific devices, updating the DAG with device assignments. The Provisioner handles the code generation part of compilation and calls into the DeviceManager to load the subnetworks onto the device.
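
The device-assignment step can be sketched as follows in Python. This is an assumption-heavy illustration rather than the Provisioner's real policy: it places the largest subnetworks first on whichever device has the most free memory, and `assign_to_devices`, the size map, and the capacity map are all hypothetical.

```python
from typing import Dict


def assign_to_devices(subnet_sizes: Dict[str, int],
                      device_capacity: Dict[str, int]) -> Dict[str, str]:
    """Map each subnetwork to a device, largest subnetwork first."""
    free = dict(device_capacity)  # remaining memory per device
    assignment: Dict[str, str] = {}
    ordered = sorted(subnet_sizes.items(), key=lambda kv: (-kv[1], kv[0]))
    for subnet, size in ordered:
        # Pick the device with the most free memory; break ties by name.
        device = min(free, key=lambda d: (-free[d], d))
        if free[device] < size:
            raise RuntimeError(f"no device has room for {subnet}")
        assignment[subnet] = device
        free[device] -= size
    return assignment
```

In the design described above, the resulting mapping would be recorded on the DAG before the subnetworks are loaded through the DeviceManager.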

Executor:
The Executor handles the execution of the network. It walks the DAG, executing each subnetwork once its dependencies have been satisfied.
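
Walking a DAG in dependency order can be sketched with Kahn's algorithm. The Python below is a sequential illustration, not the Executor's implementation: `execute_dag` is a hypothetical name, `deps` is assumed to list every subnetwork as a key mapped to the subnetworks it depends on, and `run` stands in for whatever launches a subnetwork on its device.

```python
from collections import deque
from typing import Callable, Dict, List


def execute_dag(deps: Dict[str, List[str]], run: Callable[[str], None]) -> List[str]:
    """Run every subnetwork after all of its dependencies; return the run order."""
    remaining = {node: len(parents) for node, parents in deps.items()}
    children: Dict[str, List[str]] = {node: [] for node in deps}
    for node, parents in deps.items():
        for parent in parents:
            children[parent].append(node)

    # Subnetworks with no dependencies are ready immediately.
    ready = deque(sorted(node for node, count in remaining.items() if count == 0))
    order: List[str] = []
    while ready:
        node = ready.popleft()
        run(node)
        order.append(node)
        for child in children[node]:
            remaining[child] -= 1
            if remaining[child] == 0:
                ready.append(child)

    if len(order) != len(deps):
        raise ValueError("dependency cycle detected")
    return order
```

A concurrent executor could dispatch everything in `ready` at once instead of one subnetwork at a time; the bookkeeping stays the same.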

HostManager:
The HostManager is the container for the other components. It serves as the external interface, handling network init and run requests. The HostManager routes each request through the other components and holds the DAG for each network.
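
To show how a request might flow through the components, here is a rough Python sketch. The class `HostManagerSketch` and its methods `add_network` and `run_network` are hypothetical and do not reflect the real interface; the three callables stand in for the Partitioner, Provisioner, and Executor.

```python
from typing import Any, Callable, Dict


class HostManagerSketch:
    """Holds one provisioned DAG per network and routes init/run requests."""

    def __init__(self,
                 partition: Callable[[Any], Any],
                 provision: Callable[[Any], Any],
                 execute: Callable[[Any, Any], Any]) -> None:
        self._partition = partition
        self._provision = provision
        self._execute = execute
        self._dags: Dict[str, Any] = {}

    def add_network(self, name: str, network: Any) -> None:
        """Init request: partition the network, provision it, and keep its DAG."""
        if name in self._dags:
            raise ValueError(f"network {name} is already loaded")
        dag = self._partition(network)
        self._dags[name] = self._provision(dag)

    def run_network(self, name: str, inputs: Any) -> Any:
        """Run request: look up the stored DAG and hand it to the executor."""
        if name not in self._dags:
            raise KeyError(name)
        return self._execute(self._dags[name], inputs)
```

Keeping the DAGs inside the manager means callers only need a network name to issue run requests.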
