Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[TIR][USMP] Greedy memory planning algorithm #9214

Merged
merged 8 commits into from
Dec 2, 2021

Conversation

manupak
Copy link
Contributor

@manupak manupak commented Oct 6, 2021

This commit implements a greedy memory planning algorithms
using the proposed USMP design.There are two greedy algorithms
introduced here which use the size and number of conflicts as the criteria.

  • Adds few test cases checks for fan-out and linear structures.
  • Added a test case for ResNet sub-structure
  • Added a test case for MobileNet sub-structure
  • This includes a slight fix for buffer info
    extraction where non-linear network buffers
    owned by the main function should not show
    sporadic liveness.

@manupak
Copy link
Contributor Author

manupak commented Nov 1, 2021

a friendly ping @mbs-octoml ! this depends on #8468

@areusch areusch self-assigned this Nov 9, 2021
@manupak manupak force-pushed the usmp_greedy_by_size branch 3 times, most recently from 155f7a3 to 313e1ba Compare November 22, 2021 14:49
@manupak manupak marked this pull request as ready for review November 23, 2021 17:37
@manupak manupak requested a review from icemelon as a code owner November 23, 2021 17:37
@manupak manupak force-pushed the usmp_greedy_by_size branch from 313e1ba to 4e95aed Compare November 23, 2021 17:50
Copy link
Contributor

@mbs-octoml mbs-octoml left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just nits and my usual plea for more comments :-)

src/tir/usmp/algo/greedy_by_size.cc Outdated Show resolved Hide resolved
src/tir/usmp/algo/greedy_by_size.cc Outdated Show resolved Hide resolved
src/tir/usmp/algo/greedy_by_size.cc Outdated Show resolved Hide resolved
src/tir/usmp/algo/greedy_by_size.cc Outdated Show resolved Hide resolved
src/tir/usmp/algo/greedy_by_size.cc Outdated Show resolved Hide resolved
src/tir/usmp/algo/greedy_by_size.cc Outdated Show resolved Hide resolved
src/tir/usmp/algo/greedy_by_size.cc Outdated Show resolved Hide resolved
src/tir/usmp/algo/greedy_by_size.cc Outdated Show resolved Hide resolved
src/tir/usmp/algo/greedy_by_size.cc Outdated Show resolved Hide resolved
src/tir/usmp/algo/greedy_by_size.cc Outdated Show resolved Hide resolved
* Implementation of greedy by size memory planning algorithm
* Added a test case of linear sequence of operators with two pools
* Added a test case with residual structures

Change-Id: I03b41292eab85ddb43710356c23dd123beb24462
* Adding targets to the PrimFuncs in the tests

Change-Id: Ic91947e23cbcc4fc0020eb62f0ed9df26cf919f9
This commit implements greedy algorithms for
USMP based on size and number of liveness
conflicts.

* This includes a slight fix for buffer info
  extraction where non-linear network buffers
  owned by the main function should not show
  sporadic liveness.

Change-Id: I957d543e75b3b0bcf5fc1fbc7870705c875c7a03
@manupak manupak force-pushed the usmp_greedy_by_size branch from 4e95aed to 26fffb0 Compare November 29, 2021 19:03
@manupak
Copy link
Contributor Author

manupak commented Nov 29, 2021

Thanks! @mbs-octoml .

I finally got around to address the comments and did some cleanup.

There was some remanants of unimplemented python
APIs related BufferInfo. They are removed now.

Change-Id: I4de1d817eb34187bc20da2ac2b1cb0da5b372833
Change-Id: Ia2d24753398bb388918aecba3b0191100d5100a6
@manupak
Copy link
Contributor Author

manupak commented Nov 30, 2021

@mbaret Could you take look at this PR when you have some time ?

*/

/*!
* \file tir/analysis/usmp/algo/greedy_by_size.cc
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doesn't match current location

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack

return allocates


def assign_poolinfos_to_allocates_in_primfunc(primfunc, pool_infos):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should these also be 'private' i.e. _assign?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack

["algorithm", "fast_memory_size", "slow_memory_size"],
[("greedy_by_size", 200704, 1418528), ("greedy_by_conflicts", 200704, 1418528)],
)
def test_linear(algorithm, fast_memory_size, slow_memory_size):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a general comment on these tests, I'd normally favour a larger number of smaller cases. It's quite hard to parse these complex TIR modules and determine the expected behaviour of the test. Additionally, it's ambiguous what coverage we have other than it works in a particular linear and fan-out case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The unit testing in this level is to make sure the component is functional and ensure the contract of the component. As long as greedy algorithm goes covering a linear and fan-out case covers most aspect conflict graph one could end up.
My personal preference would be to check coverage at the integration level of the USMP using these algorithm rather than using hand crafted TIR snippets at this level. WDYT ?

Alternatively, if you had some smaller cases in mind that you'd like tested here, please do share -- so we could consider adding them here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it becomes a bit harder to argue for testing coverage in integration tests when this algorithm component is 'swappable'. That is, unless, those integration tests will be extended for every new algorithm that's added (which is plausible).

Regarding these specific tests, I think they would be a lot more readable using small synthetic tests rather than network extracts. The linear case has a large amount of complexity that doesn't contribute to the test if topology is the only impactful change.

Copy link
Contributor Author

@manupak manupak Nov 30, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is, unless, those integration tests will be extended for every new algorithm that's added (which is plausible).

Yes, we will be doing this as that would make the most sense as that is the only sensible way to make sure the number workspace size decided by the algorithm, actually turns out to be something that runs.

Regarding these specific tests, I think they would be a lot more readable using small synthetic tests rather than network extracts. The linear case has a large amount of complexity that doesn't contribute to the test if topology is the only impactful change.

I think, in terms of unit testing this algorithm, I ll create BufferInfo object based tests rather than TIR ones -- truly thats the input to the algorithm.

src/tir/usmp/algo/greedy.cc Outdated Show resolved Hide resolved
@@ -415,18 +454,30 @@ Map<BufferInfo, tir::Stmt> BufferInfoExtractor::operator()(const PrimFunc& main_

// Traverse the liveness events using a open set to track what
// is live while updating the conflicts through out the linear traversal
std::unordered_set<BufferInfo, ObjectPtrHash, ObjectPtrEqual> open_set;
// std::unordered_set<BufferInfo, ObjectPtrHash, ObjectPtrEqual> open_set;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack.

tests/python/unittest/test_tir_usmp_algo.py Show resolved Hide resolved
src/tir/usmp/analysis/extract_buffer_info.cc Show resolved Hide resolved
Comment on lines 205 to 206
fusmp_algo_greedy_by_size = tvm.get_global_func(f"tir.usmp.algo.{algorithm}")
buffer_pool_allocations = fusmp_algo_greedy_by_size(buffer_info_arr)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The naming here (and fusmp_algo_greedy_by_size) is a bit confusing because 'algorithm' could be by conflict I think.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Ack -- will change.

src/tir/usmp/algo/greedy.cc Show resolved Hide resolved
return pool_info;
}
}
ICHECK(false) << "TVM USMP Internal Error: no candidate have been selected!";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When would this check be reached (if ever)? It seems like it may be hit if every pool is full, in which case I think the reported error should be quite explicit as it's something the user could reasonably run into. We should also probably test it as I don't think it would be a 'true assert'.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, make sense, will change.

Copy link
Contributor Author

@manupak manupak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @mbaret . See my responses and I ll do the acknowledged changes.

*/

/*!
* \file tir/analysis/usmp/algo/greedy_by_size.cc
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack

return pool_info;
}
}
ICHECK(false) << "TVM USMP Internal Error: no candidate have been selected!";
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, make sense, will change.

src/tir/usmp/algo/greedy.cc Show resolved Hide resolved
src/tir/usmp/algo/greedy.cc Outdated Show resolved Hide resolved
src/tir/usmp/analysis/extract_buffer_info.cc Show resolved Hide resolved
} else {
open_set[le_event.buffer_info] -= 1;
}
// open_set.erase(le_event.buffer_info);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

return allocates


def assign_poolinfos_to_allocates_in_primfunc(primfunc, pool_infos):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack

["algorithm", "fast_memory_size", "slow_memory_size"],
[("greedy_by_size", 200704, 1418528), ("greedy_by_conflicts", 200704, 1418528)],
)
def test_linear(algorithm, fast_memory_size, slow_memory_size):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The unit testing in this level is to make sure the component is functional and ensure the contract of the component. As long as greedy algorithm goes covering a linear and fan-out case covers most aspect conflict graph one could end up.
My personal preference would be to check coverage at the integration level of the USMP using these algorithm rather than using hand crafted TIR snippets at this level. WDYT ?

Alternatively, if you had some smaller cases in mind that you'd like tested here, please do share -- so we could consider adding them here.

Comment on lines 205 to 206
fusmp_algo_greedy_by_size = tvm.get_global_func(f"tir.usmp.algo.{algorithm}")
buffer_pool_allocations = fusmp_algo_greedy_by_size(buffer_info_arr)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Ack -- will change.

tests/python/unittest/test_tir_usmp_algo.py Show resolved Hide resolved
Copy link
Contributor Author

@manupak manupak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mbaret . I think I've addresssed you comments by now. PTAL when you have some time.

src/tir/usmp/algo/greedy.cc Outdated Show resolved Hide resolved
tests/python/unittest/test_tir_usmp_algo.py Show resolved Hide resolved
This commits removes commented out lines
,few trivial cleanups and few BufferInfo
based tests to check the algorithm.

Change-Id: I1a12b6a424370e9e4c4a55563dde0ad698b07ea3
@manupak manupak force-pushed the usmp_greedy_by_size branch from 5e75ed3 to 7c93c60 Compare November 30, 2021 20:09
Changed sorting criteria use alphabetic
ordering as opposed to hashes of string
as it seemed different accross different
platforms.

Change-Id: Ia7938d1b0d1374924c3ec7287526ccf374c54eb7
@manupak manupak force-pushed the usmp_greedy_by_size branch from c675bb0 to 6101e61 Compare December 1, 2021 06:59
return pool_info;
}
}
CHECK(false) << "TVM USMP Error: no candidate have been selected for " << buf_info;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this needs to be more obvious because it'll be directly user-facing. Probably it needs to indicate that TVM has exceeded the pool memory limits and suggest to increase them if you want the compilation to pass. Let's not block the whole patch on this though, happy to take it in a follow-up.

Copy link
Contributor

@mbaret mbaret left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Improving the error message

Change-Id: Ib59efb172fe10b70f88a24f4874a7891e8a9cde7
@manupak
Copy link
Contributor Author

manupak commented Dec 1, 2021

Ah the CI decided to run itself again, thus I've changed the error message.

@mbaret mbaret merged commit 756debc into apache:main Dec 2, 2021
@mbaret
Copy link
Contributor

mbaret commented Dec 2, 2021

This is now merged, thanks @manupa-arm, @mbs-octoml!

ylc pushed a commit to ylc/tvm that referenced this pull request Jan 7, 2022
This commit implements a greedy memory planning algorithms
using the proposed USMP design.There are two greedy algorithms
introduced here which use the size and number of conflicts as the criteria.

- Adds few test cases checks for fan-out and linear structures.
- Added a test case for ResNet sub-structure
- Added a test case for MobileNet sub-structure
- This includes a slight fix for buffer info
  extraction where non-linear network buffers
  owned by the main function should not show
  sporadic liveness.
yangulei pushed a commit to yangulei/tvm that referenced this pull request Jan 11, 2022
This commit implements a greedy memory planning algorithms
using the proposed USMP design.There are two greedy algorithms
introduced here which use the size and number of conflicts as the criteria.

- Adds few test cases checks for fan-out and linear structures.
- Added a test case for ResNet sub-structure
- Added a test case for MobileNet sub-structure
- This includes a slight fix for buffer info
  extraction where non-linear network buffers
  owned by the main function should not show
  sporadic liveness.
yangulei pushed a commit to yangulei/tvm that referenced this pull request Jan 12, 2022
This commit implements a greedy memory planning algorithms
using the proposed USMP design.There are two greedy algorithms
introduced here which use the size and number of conflicts as the criteria.

- Adds few test cases checks for fan-out and linear structures.
- Added a test case for ResNet sub-structure
- Added a test case for MobileNet sub-structure
- This includes a slight fix for buffer info
  extraction where non-linear network buffers
  owned by the main function should not show
  sporadic liveness.
ylc pushed a commit to ylc/tvm that referenced this pull request Jan 13, 2022
This commit implements a greedy memory planning algorithms
using the proposed USMP design.There are two greedy algorithms
introduced here which use the size and number of conflicts as the criteria.

- Adds few test cases checks for fan-out and linear structures.
- Added a test case for ResNet sub-structure
- Added a test case for MobileNet sub-structure
- This includes a slight fix for buffer info
  extraction where non-linear network buffers
  owned by the main function should not show
  sporadic liveness.
qsqqsqqsq-intellif pushed a commit to qsqqsqqsq-intellif/tvm that referenced this pull request Apr 29, 2022
This commit implements a greedy memory planning algorithms
using the proposed USMP design.There are two greedy algorithms
introduced here which use the size and number of conflicts as the criteria.

- Adds few test cases checks for fan-out and linear structures.
- Added a test case for ResNet sub-structure
- Added a test case for MobileNet sub-structure
- This includes a slight fix for buffer info
  extraction where non-linear network buffers
  owned by the main function should not show
  sporadic liveness.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants