Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Host Manager #2284

Merged
merged 1 commit into from
Feb 7, 2019
Merged

Add Host Manager #2284

merged 1 commit into from
Feb 7, 2019

Conversation

gcatron
Copy link
Contributor

@gcatron gcatron commented Jan 19, 2019

Description: This is a resurrected PR for the HostManager. It is almost ready for merging, once Executor, Partitioner, and Provisioner are merged this will be updated and be ready.
The HostManager will serve as the entry point for execution calls and handle orchestrating the rest of the runtime components.
Testing: Will write Unit Test before merge.
Documentation: N/A
Related to #2045

void HostManager::removeNetwork(std::string networkID) {
// Walk the tree for the given function, calling evict function for each node
// before removing it from networks_ which frees the node.
std::queue<std::string> nodes;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are you using std::string to represent nodes? Use pointers or something that can be uniqued.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strings are being used because that's the key to the map that stores the nodes. I could use pointers instead and then pointer->name to get the iterator to remove. Using a string seemed cleaner.

@nadavrot
Copy link
Contributor

Please make sure to squash the commits.

@jackm321
Copy link
Contributor

@gcatron this is looking good, can't wait to have this interface to the runtime!

include/glow/Runtime/HostManager/HostManager.h Outdated Show resolved Hide resolved
include/glow/Runtime/HostManager/HostManager.h Outdated Show resolved Hide resolved
include/glow/Runtime/HostManager/HostManager.h Outdated Show resolved Hide resolved
include/glow/Runtime/HostManager/HostManager.h Outdated Show resolved Hide resolved
lib/Runtime/HostManager/HostManager.cpp Outdated Show resolved Hide resolved
lib/Runtime/HostManager/HostManager.cpp Show resolved Hide resolved
lib/Runtime/HostManager/HostManager.cpp Outdated Show resolved Hide resolved
lib/Runtime/HostManager/HostManager.cpp Outdated Show resolved Hide resolved
lib/Runtime/HostManager/HostManager.cpp Outdated Show resolved Hide resolved
@gcatron gcatron force-pushed the create_host_manager branch 3 times, most recently from 6c4fcb8 to fe0be49 Compare January 24, 2019 21:48
@gcatron gcatron force-pushed the create_host_manager branch from 961753d to 16042c3 Compare January 24, 2019 23:24
Copy link
Contributor

@rdzhabarov rdzhabarov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CMake changes missing
Need to look into concurrency issues.

include/glow/Runtime/HostManager/HostManager.h Outdated Show resolved Hide resolved
include/glow/Runtime/HostManager/HostManager.h Outdated Show resolved Hide resolved
include/glow/Runtime/HostManager/HostManager.h Outdated Show resolved Hide resolved
include/glow/Runtime/HostManager/HostManager.h Outdated Show resolved Hide resolved
lib/Runtime/HostManager/HostManager.cpp Show resolved Hide resolved
lib/Runtime/HostManager/HostManager.cpp Outdated Show resolved Hide resolved
lib/Runtime/HostManager/HostManager.cpp Outdated Show resolved Hide resolved
lib/Runtime/HostManager/HostManager.cpp Outdated Show resolved Hide resolved
lib/Runtime/HostManager/HostManager.cpp Outdated Show resolved Hide resolved
@gcatron gcatron force-pushed the create_host_manager branch from 62d8b65 to 920a99e Compare January 28, 2019 18:49
include/glow/Runtime/HostManager/HostManager.h Outdated Show resolved Hide resolved
lib/Runtime/HostManager/CMakeLists.txt Outdated Show resolved Hide resolved
tests/unittests/HostManagerTest.cpp Outdated Show resolved Hide resolved
Copy link
Contributor

@rdzhabarov rdzhabarov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We really need multithreaded tests for those methods that can be executed concurrently (potentially enable thread sanitizer to catch issues).

I'm going to unblock the progress here, but need better verification around concurrency here in general.

lib/Runtime/HostManager/HostManager.cpp Show resolved Hide resolved
@gcatron gcatron force-pushed the create_host_manager branch 4 times, most recently from 3a0c660 to b3ab82e Compare February 5, 2019 01:39
Copy link
Contributor

@jackm321 jackm321 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good! How close are we to this being in a working state atm? Asking so I can judge whether I should parallelize some more onnxifi work or just wait til this is in.

include/glow/Runtime/HostManager/HostManager.h Outdated Show resolved Hide resolved
ResultCBTy callback) {

auto currentRun = totalRequestCount_++;
std::lock_guard<std::mutex> networkLock(networkLock_);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this lock need to be held while callback() is being called?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the callback we are good since activeRequestCount_ is atomic. It's needed here since we are accessing roots_.

}

void HostManager::clearHost() {
std::lock_guard<std::mutex> networkLock(networkLock_);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does lock need to be held while stopping devices?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the device no, but this way we can't add a function to a device we are stopping.

lib/Runtime/HostManager/HostManager.cpp Show resolved Hide resolved
@gcatron gcatron force-pushed the create_host_manager branch 5 times, most recently from 7079bf3 to 62aba6b Compare February 6, 2019 18:16
@gcatron
Copy link
Contributor Author

gcatron commented Feb 6, 2019

I currently have the concurrent test disabled. This is because the compiled function is owned by the provisioner not the DeviceManager. Since there's no callback on evictNetwork The HostManager can free the compiledFunction out from under the DeviceManager. We either need to add a callback/make evict blocking, or move ownership of the CompiledFunction to the deviceManager.

@gcatron gcatron force-pushed the create_host_manager branch from 62aba6b to 7ff090d Compare February 6, 2019 18:20
@nickgg
Copy link
Contributor

nickgg commented Feb 6, 2019

@gcatron I think it makes sense to add a callback to the evict, we can probably use the same one from addNetwork.

@gcatron
Copy link
Contributor Author

gcatron commented Feb 6, 2019

@gcatron I think it makes sense to add a callback to the evict, we can probably use the same one from addNetwork.

That sounds good to me. I created #2357 once we do that we should be able to re-enable the concurrent test.

@gcatron gcatron force-pushed the create_host_manager branch 2 times, most recently from a258b1b to b4c3e95 Compare February 6, 2019 20:02
@gcatron gcatron force-pushed the create_host_manager branch 3 times, most recently from c6238d7 to dabe6c3 Compare February 6, 2019 23:52
@gcatron gcatron force-pushed the create_host_manager branch from dabe6c3 to b9d17b9 Compare February 7, 2019 16:33
@gcatron gcatron merged commit f423150 into pytorch:master Feb 7, 2019
@gcatron gcatron deleted the create_host_manager branch February 7, 2019 18:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants