
[microNPU] Fix cascade scheduling stability #13428

Merged

Conversation

@Aleksei-grovety (Contributor) commented Nov 18, 2022

For Plans/Proposals, sorting by the number of cycles has been added for the case where the memory used matches.

cc @leandron @ekalda, @NicolaLancellotti
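
A minimal sketch of the tie-breaking this change introduces (hypothetical Python, not the actual C++ cascader code): memory usage stays the primary sort key, and the cycle count is only consulted when memory usage is equal.

```python
from typing import List, NamedTuple

class Candidate(NamedTuple):
    """Stand-in for a cascader Plan/Proposal together with its cost metrics."""
    name: str
    memory_usage: int  # bytes
    cycles: int        # estimated cycle count

def sort_candidates(candidates: List[Candidate]) -> List[Candidate]:
    # Primary key: memory usage; secondary key: cycles, so candidates with
    # equal memory usage are ordered deterministically rather than arbitrarily.
    return sorted(candidates, key=lambda c: (c.memory_usage, c.cycles))

candidates = [
    Candidate("proposal_a", memory_usage=4096, cycles=1200),
    Candidate("proposal_b", memory_usage=4096, cycles=1100),  # same memory, fewer cycles
    Candidate("proposal_c", memory_usage=2048, cycles=2000),
]
print([c.name for c in sort_candidates(candidates)])
# ['proposal_c', 'proposal_b', 'proposal_a']
```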

@tvm-bot (Collaborator) commented Nov 18, 2022

Thanks for contributing to TVM! Please refer to the contributing guidelines https://tvm.apache.org/docs/contribute/ for useful information and tips. Please request code reviews from Reviewers by @-ing them in a comment.

Generated by tvm-bot

github-actions bot requested review from @ekalda and @leandron on November 18, 2022 10:15
@ekalda (Contributor) left a comment

Thanks @alexey-yazev, looks good! :)

What I gather is that the instability in the cascader comes from nondeterministic sorting when two Plans/Proposals have the same memory usage. It makes sense to me then to look at the cycle count as a differentiating metric. However, in the case where we have identical performance and memory use, I can't think of a reason why one of the Plans/Proposals should be advantageous over the other, so I wonder if this could be simplified by just removing one of the Plans or Proposals?

@Aleksei-grovety (Contributor, Author) commented:

> Thanks @alexey-yazev, looks good! :)
>
> What I gather is that the instability in the cascader comes from nondeterministic sorting when two Plans/Proposals have the same memory usage. It makes sense to me then to look at the cycle count as a differentiating metric. However, in the case where we have identical performance and memory use, I can't think of a reason why one of the Plans/Proposals should be advantageous over the other, so I wonder if this could be simplified by just removing one of the Plans or Proposals?

Thanks @ekalda!
I agree that elements with the same metrics have no advantage over each other. It seems that the real problem is in the calculation of the metrics: from launch to launch the resulting proposal is obtained with the same metrics, yet a different amount of memory ends up being allocated. I'll try to figure it out.

@Aleksei-grovety (Contributor, Author) commented:

I suppose checking for the equality of allocated_size and workspace_size in test_networks.py is incorrect: when the cascader is used with striping enabled, a proposal is selected with the condition proposal.memory_usage < workspace_size, and allocated_size and proposal.memory_usage are calculated differently (unified static memory planning is used to calculate allocated_size, while proposal.memory_usage is calculated as the sum of all tensors, taking striping into account for intermediate tensors).
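
A minimal sketch of why the two numbers differ (hypothetical Python; names such as tensor_sizes and stripe_factors are illustrative, not the cascader's API): the cascader's estimate sums tensor sizes with intermediates shrunk by striping, whereas allocated_size comes from the USMP planning the final graph.

```python
def proposal_memory_usage(tensor_sizes, stripe_factors):
    """Sum of all tensor sizes, with intermediate tensors shrunk by their striping factor."""
    total = 0
    for name, size in tensor_sizes.items():
        # With striping, only a stripe-sized slice of an intermediate tensor
        # needs to be live at once, so its contribution is divided accordingly.
        total += size // stripe_factors.get(name, 1)
    return total

tensor_sizes = {"ifm": 16384, "intermediate": 65536, "ofm": 16384}
stripe_factors = {"intermediate": 4}  # intermediate tensor processed in 4 stripes

memory_usage = proposal_memory_usage(tensor_sizes, stripe_factors)
print(memory_usage)  # 49152 -- the estimate compared against workspace_size
```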

@ekalda (Contributor) commented Nov 24, 2022

> > Thanks @alexey-yazev, looks good! :)
> > What I gather is that the instability in the cascader comes from nondeterministic sorting when two Plans/Proposals have the same memory usage. It makes sense to me then to look at the cycle count as a differentiating metric. However, in the case where we have identical performance and memory use, I can't think of a reason why one of the Plans/Proposals should be advantageous over the other, so I wonder if this could be simplified by just removing one of the Plans or Proposals?
>
> Thanks @ekalda! I agree that elements with the same metrics have no advantage over each other. It seems that the real problem is in the calculation of the metrics: from launch to launch the resulting proposal is obtained with the same metrics, yet a different amount of memory ends up being allocated. I'll try to figure it out.

I suppose there can be two kinds of instability there:
(1) Choosing a different Proposal from launch to launch. Even if the Proposals have the same memory and cycle counts according to the cascader, the more accurate memory planner can give differing results for Proposals with different topologies.
(2) We choose an identical Proposal every time, but the memory planner allocates a different amount of memory for the same proposal. That sounds like a memory planner instability.

(A bit of a stab in the dark there)

@ekalda (Contributor) commented Nov 24, 2022

> I suppose checking for the equality of allocated_size and workspace_size in test_networks.py is incorrect: when the cascader is used with striping enabled, a proposal is selected with the condition proposal.memory_usage < workspace_size, and allocated_size and proposal.memory_usage are calculated differently (unified static memory planning is used to calculate allocated_size, while proposal.memory_usage is calculated as the sum of all tensors, taking striping into account for intermediate tensors).

Yes, I think you are right; thinking about it, we can't really check for the equality of allocated_size and workspace_size. I suppose that when we test for allocated_size < workspace_size we are checking that the Proposal we chose (based on workspace_size) still fits into workspace_size once we have done memory planning on the resulting graph.
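
For illustration, a hedged sketch of the kind of check being described (not the actual test_networks.py code; the numbers are made up): only the upper bound is asserted, not equality.

```python
def check_proposal_fits(allocated_size: int, workspace_size: int) -> None:
    # allocated_size comes from USMP on the final graph; workspace_size is the
    # budget the Proposal was selected against. They are computed differently,
    # so only "fits within the budget" can be asserted.
    assert allocated_size <= workspace_size, (
        f"USMP allocated {allocated_size} bytes, exceeding the "
        f"{workspace_size}-byte workspace the Proposal was selected for"
    )

check_proposal_fits(allocated_size=287104, workspace_size=300000)
```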

@Aleksei-grovety (Contributor, Author) commented:

Without running the StorageRewrite pass (the changes were merged in PR #13365), the amount of allocated memory is the same from launch to launch, despite the fact that different proposals are applied.

> I suppose there can be two kinds of instability there:
> (1) Choosing a different Proposal from launch to launch. Even if the Proposals have the same memory and cycle counts according to the cascader, the more accurate memory planner can give differing results for Proposals with different topologies.
> (2) We choose an identical Proposal every time, but the memory planner allocates a different amount of memory for the same proposal. That sounds like a memory planner instability.

It is the first kind, and it happens when the StorageRewrite pass is run.

@Aleksei-grovety (Contributor, Author) commented Nov 28, 2022

For this pull request, will it be enough to add an additional parameter for sorting Plans/Proposals, or do I need to investigate the problem with different memory allocations when running the StorageRewrite pass?

@ekalda (Contributor) commented Dec 1, 2022

> For this pull request, will it be enough to add an additional parameter for sorting Plans/Proposals, or do I need to investigate the problem with different memory allocations when running the StorageRewrite pass?

Sorry for the delay on this - I don't think we should spend much time investigating the instability that results from using StorageRewrite since Ethos-U is intended to be run with the USMP, so debugging the internals of StorageRewrite seems a bit out of scope here.

@Aleksei-grovety force-pushed the ethosu-cascade-scheduling-stability-bugfix branch from a0f521b to 10e8390 on December 2, 2022 06:25
@Aleksei-grovety (Contributor, Author) commented Dec 2, 2022

> Sorry for the delay on this - I don't think we should spend much time investigating the instability that results from using StorageRewrite since Ethos-U is intended to be run with the USMP, so debugging the internals of StorageRewrite seems a bit out of scope here.

Thanks, in the code changes I have kept only the additional sorting conditions.

The reason different amounts of memory were allocated from launch to launch was that, when determining the optimal proposals, the collection can contain elements with the same cost metrics; the first of these is taken as optimal and the rest are discarded. The problem was solved by adding an additional sorting condition on the shapes from the StripeConfigs for the case when the metrics match.
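
A hedged sketch of the final tie-breaking described above (hypothetical Python; the real change lives in the cascader's C++ plan/proposal generation): when both memory usage and cycle count match, the shapes from the StripeConfigs are compared so that the same element is chosen as optimal on every run.

```python
from typing import List, NamedTuple, Tuple

class Candidate(NamedTuple):
    """Stand-in for a Plan/Proposal; stripe_shapes mimics the shapes of its StripeConfigs."""
    memory_usage: int
    cycles: int
    stripe_shapes: Tuple[Tuple[int, ...], ...]

def pick_optimal(candidates: List[Candidate]) -> Candidate:
    # With only (memory_usage, cycles) as the key, equal-cost candidates keep an
    # arbitrary order and "the first one wins" can differ between runs. Adding the
    # stripe shapes as a final key makes the choice deterministic.
    return min(candidates, key=lambda c: (c.memory_usage, c.cycles, c.stripe_shapes))

a = Candidate(4096, 1000, ((1, 4, 16, 16), (1, 4, 4, 8)))
b = Candidate(4096, 1000, ((1, 2, 16, 32), (1, 4, 4, 8)))  # same costs, different stripes
assert pick_optimal([a, b]) == pick_optimal([b, a])  # input order no longer matters
```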
@Aleksei-grovety force-pushed the ethosu-cascade-scheduling-stability-bugfix branch from 10e8390 to 30a4503 on December 2, 2022 18:07
@ekalda (Contributor) left a comment

LGTM!

@ekalda merged commit 012551f into apache:main on Dec 5, 2022
@ekalda (Contributor) commented Dec 5, 2022

Thanks @alexey-yazev!
