-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[TIR][USMP] Greedy memory planning algorithm #9214
Conversation
01b2055
to
897070d
Compare
897070d
to
ec29b89
Compare
ec29b89
to
ceb9abe
Compare
a friendly ping @mbs-octoml ! this depends on #8468 |
155f7a3
to
313e1ba
Compare
313e1ba
to
4e95aed
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just nits and my usual plea for more comments :-)
* Implementation of greedy by size memory planning algorithm * Added a test case of linear sequence of operators with two pools * Added a test case with residual structures Change-Id: I03b41292eab85ddb43710356c23dd123beb24462
* Adding targets to the PrimFuncs in the tests Change-Id: Ic91947e23cbcc4fc0020eb62f0ed9df26cf919f9
This commit implements greedy algorithms for USMP based on size and number of liveness conflicts. * This includes a slight fix for buffer info extraction where non-linear network buffers owned by the main function should not show sporadic liveness. Change-Id: I957d543e75b3b0bcf5fc1fbc7870705c875c7a03
4e95aed
to
26fffb0
Compare
Thanks! @mbs-octoml . I finally got around to address the comments and did some cleanup. |
There was some remanants of unimplemented python APIs related BufferInfo. They are removed now. Change-Id: I4de1d817eb34187bc20da2ac2b1cb0da5b372833
Change-Id: Ia2d24753398bb388918aecba3b0191100d5100a6
@mbaret Could you take look at this PR when you have some time ? |
src/tir/usmp/algo/greedy.cc
Outdated
*/ | ||
|
||
/*! | ||
* \file tir/analysis/usmp/algo/greedy_by_size.cc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
doesn't match current location
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack
return allocates | ||
|
||
|
||
def assign_poolinfos_to_allocates_in_primfunc(primfunc, pool_infos): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should these also be 'private' i.e. _assign?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack
["algorithm", "fast_memory_size", "slow_memory_size"], | ||
[("greedy_by_size", 200704, 1418528), ("greedy_by_conflicts", 200704, 1418528)], | ||
) | ||
def test_linear(algorithm, fast_memory_size, slow_memory_size): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a general comment on these tests, I'd normally favour a larger number of smaller cases. It's quite hard to parse these complex TIR modules and determine the expected behaviour of the test. Additionally, it's ambiguous what coverage we have other than it works in a particular linear and fan-out case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The unit testing in this level is to make sure the component is functional and ensure the contract of the component. As long as greedy algorithm goes covering a linear and fan-out case covers most aspect conflict graph one could end up.
My personal preference would be to check coverage at the integration level of the USMP using these algorithm rather than using hand crafted TIR snippets at this level. WDYT ?
Alternatively, if you had some smaller cases in mind that you'd like tested here, please do share -- so we could consider adding them here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it becomes a bit harder to argue for testing coverage in integration tests when this algorithm component is 'swappable'. That is, unless, those integration tests will be extended for every new algorithm that's added (which is plausible).
Regarding these specific tests, I think they would be a lot more readable using small synthetic tests rather than network extracts. The linear case has a large amount of complexity that doesn't contribute to the test if topology is the only impactful change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is, unless, those integration tests will be extended for every new algorithm that's added (which is plausible).
Yes, we will be doing this as that would make the most sense as that is the only sensible way to make sure the number workspace size decided by the algorithm, actually turns out to be something that runs.
Regarding these specific tests, I think they would be a lot more readable using small synthetic tests rather than network extracts. The linear case has a large amount of complexity that doesn't contribute to the test if topology is the only impactful change.
I think, in terms of unit testing this algorithm, I ll create BufferInfo object based tests rather than TIR ones -- truly thats the input to the algorithm.
@@ -415,18 +454,30 @@ Map<BufferInfo, tir::Stmt> BufferInfoExtractor::operator()(const PrimFunc& main_ | |||
|
|||
// Traverse the liveness events using a open set to track what | |||
// is live while updating the conflicts through out the linear traversal | |||
std::unordered_set<BufferInfo, ObjectPtrHash, ObjectPtrEqual> open_set; | |||
// std::unordered_set<BufferInfo, ObjectPtrHash, ObjectPtrEqual> open_set; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack.
fusmp_algo_greedy_by_size = tvm.get_global_func(f"tir.usmp.algo.{algorithm}") | ||
buffer_pool_allocations = fusmp_algo_greedy_by_size(buffer_info_arr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The naming here (and fusmp_algo_greedy_by_size) is a bit confusing because 'algorithm' could be by conflict I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Ack -- will change.
src/tir/usmp/algo/greedy.cc
Outdated
return pool_info; | ||
} | ||
} | ||
ICHECK(false) << "TVM USMP Internal Error: no candidate have been selected!"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When would this check be reached (if ever)? It seems like it may be hit if every pool is full, in which case I think the reported error should be quite explicit as it's something the user could reasonably run into. We should also probably test it as I don't think it would be a 'true assert'.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, make sense, will change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @mbaret . See my responses and I ll do the acknowledged changes.
src/tir/usmp/algo/greedy.cc
Outdated
*/ | ||
|
||
/*! | ||
* \file tir/analysis/usmp/algo/greedy_by_size.cc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack
src/tir/usmp/algo/greedy.cc
Outdated
return pool_info; | ||
} | ||
} | ||
ICHECK(false) << "TVM USMP Internal Error: no candidate have been selected!"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, make sense, will change.
} else { | ||
open_set[le_event.buffer_info] -= 1; | ||
} | ||
// open_set.erase(le_event.buffer_info); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
return allocates | ||
|
||
|
||
def assign_poolinfos_to_allocates_in_primfunc(primfunc, pool_infos): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack
["algorithm", "fast_memory_size", "slow_memory_size"], | ||
[("greedy_by_size", 200704, 1418528), ("greedy_by_conflicts", 200704, 1418528)], | ||
) | ||
def test_linear(algorithm, fast_memory_size, slow_memory_size): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The unit testing in this level is to make sure the component is functional and ensure the contract of the component. As long as greedy algorithm goes covering a linear and fan-out case covers most aspect conflict graph one could end up.
My personal preference would be to check coverage at the integration level of the USMP using these algorithm rather than using hand crafted TIR snippets at this level. WDYT ?
Alternatively, if you had some smaller cases in mind that you'd like tested here, please do share -- so we could consider adding them here.
fusmp_algo_greedy_by_size = tvm.get_global_func(f"tir.usmp.algo.{algorithm}") | ||
buffer_pool_allocations = fusmp_algo_greedy_by_size(buffer_info_arr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Ack -- will change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mbaret . I think I've addresssed you comments by now. PTAL when you have some time.
This commits removes commented out lines ,few trivial cleanups and few BufferInfo based tests to check the algorithm. Change-Id: I1a12b6a424370e9e4c4a55563dde0ad698b07ea3
5e75ed3
to
7c93c60
Compare
Changed sorting criteria use alphabetic ordering as opposed to hashes of string as it seemed different accross different platforms. Change-Id: Ia7938d1b0d1374924c3ec7287526ccf374c54eb7
c675bb0
to
6101e61
Compare
src/tir/usmp/algo/greedy.cc
Outdated
return pool_info; | ||
} | ||
} | ||
CHECK(false) << "TVM USMP Error: no candidate have been selected for " << buf_info; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this needs to be more obvious because it'll be directly user-facing. Probably it needs to indicate that TVM has exceeded the pool memory limits and suggest to increase them if you want the compilation to pass. Let's not block the whole patch on this though, happy to take it in a follow-up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Improving the error message Change-Id: Ib59efb172fe10b70f88a24f4874a7891e8a9cde7
Ah the CI decided to run itself again, thus I've changed the error message. |
This is now merged, thanks @manupa-arm, @mbs-octoml! |
This commit implements a greedy memory planning algorithms using the proposed USMP design.There are two greedy algorithms introduced here which use the size and number of conflicts as the criteria. - Adds few test cases checks for fan-out and linear structures. - Added a test case for ResNet sub-structure - Added a test case for MobileNet sub-structure - This includes a slight fix for buffer info extraction where non-linear network buffers owned by the main function should not show sporadic liveness.
This commit implements a greedy memory planning algorithms using the proposed USMP design.There are two greedy algorithms introduced here which use the size and number of conflicts as the criteria. - Adds few test cases checks for fan-out and linear structures. - Added a test case for ResNet sub-structure - Added a test case for MobileNet sub-structure - This includes a slight fix for buffer info extraction where non-linear network buffers owned by the main function should not show sporadic liveness.
This commit implements a greedy memory planning algorithms using the proposed USMP design.There are two greedy algorithms introduced here which use the size and number of conflicts as the criteria. - Adds few test cases checks for fan-out and linear structures. - Added a test case for ResNet sub-structure - Added a test case for MobileNet sub-structure - This includes a slight fix for buffer info extraction where non-linear network buffers owned by the main function should not show sporadic liveness.
This commit implements a greedy memory planning algorithms using the proposed USMP design.There are two greedy algorithms introduced here which use the size and number of conflicts as the criteria. - Adds few test cases checks for fan-out and linear structures. - Added a test case for ResNet sub-structure - Added a test case for MobileNet sub-structure - This includes a slight fix for buffer info extraction where non-linear network buffers owned by the main function should not show sporadic liveness.
This commit implements a greedy memory planning algorithms using the proposed USMP design.There are two greedy algorithms introduced here which use the size and number of conflicts as the criteria. - Adds few test cases checks for fan-out and linear structures. - Added a test case for ResNet sub-structure - Added a test case for MobileNet sub-structure - This includes a slight fix for buffer info extraction where non-linear network buffers owned by the main function should not show sporadic liveness.
This commit implements a greedy memory planning algorithms
using the proposed USMP design.There are two greedy algorithms
introduced here which use the size and number of conflicts as the criteria.
extraction where non-linear network buffers
owned by the main function should not show
sporadic liveness.