Memory Clustering in the Enumerator #73

vangthao95 · 2020-04-23T05:22:27Z

This is the current draft of the memory clustering project that I have been working on. The main goal is to make our enumerator aware of clusterable memory operations, in addition to register pressure and schedule length. The draft also includes a new heuristic called "Cluster", which is discussed later in this description. There is still more work to be done before this is finalized. Most of the current work is in bb_spill and OptSchedDDGWrapperBasic.

To find out which instructions can be clustered together, I copied how LLVM and AMD detect clusters from here and here and implemented it in OptSchedDDGWrapperBasic. Instead of adding edges to the DAG to force the constraint, I only record information for our scheduler: whether the instruction can be clustered, which cluster group it belongs to, and the total number of instructions in that group. From this, we can also calculate the minimum number of cluster blocks possible, which I use as a lower bound for clustering.
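As a rough sketch of how such a lower bound could be derived from the recorded group information: the struct, the function name, and the `maxClusterSize` parameter below are all hypothetical and not taken from OptSched. The sketch assumes a cluster block never mixes groups and holds at most `maxClusterSize` instructions.

```cpp
#include <cassert>
#include <vector>

// Hypothetical per-instruction annotation; names are illustrative only.
struct ClusterInfo {
  bool mayCluster = false; // true if the instruction is a clusterable mem op
  int group = -1;          // cluster group index, or -1 if not clusterable
};

// Lower bound on the number of cluster blocks in any schedule: a group of
// `size` instructions needs at least ceil(size / maxClusterSize) blocks.
int minClusterBlocks(const std::vector<ClusterInfo> &insts, int groupCount,
                     int maxClusterSize) {
  assert(groupCount >= 0 && maxClusterSize > 0);
  std::vector<int> groupSize(groupCount, 0);
  for (const ClusterInfo &ci : insts) {
    if (!ci.mayCluster)
      continue;
    assert(ci.group >= 0 && ci.group < groupCount);
    ++groupSize[ci.group];
  }
  int blocks = 0;
  for (int size : groupSize)
    blocks += (size + maxClusterSize - 1) / maxClusterSize; // ceiling division
  return blocks;
}
```

With two groups of 4 and 2 instructions and a cap large enough to hold either group, this returns 2, one block per group.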

During enumeration, I added a check to see whether an instruction is part of a memory-operation cluster. If it is, I initialize the variables for clustering; otherwise we do nothing. However, if a cluster is already active, we check whether the instruction can be added to it. If it cannot, we store the state of the current cluster (for backtracking purposes) and start a new one if needed. Throughout all of this, I keep track of the current number of cluster blocks plus the optimistic estimate of the remaining cluster blocks. An example of this cost estimate is given below in Example 1.
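The save-and-restore idea can be sketched as follows. This is a simplified illustration and not the code in this PR: `ClusterState` and `ClusterStateStack` are hypothetical names, and the sketch snapshots the state on every step rather than only when a cluster closes.

```cpp
#include <cassert>
#include <vector>

// Hypothetical snapshot of the clustering state at one enumeration step.
struct ClusterState {
  int activeGroup = -1;  // group of the cluster being built, -1 if none
  int activeSize = 0;    // instructions in the active cluster so far
  int blocksStarted = 0; // cluster blocks opened on this path
};

class ClusterStateStack {
public:
  // Enumerator schedules an instruction; `group` is -1 for a non-mem-op.
  void stepForward(int group) {
    history_.push_back(current_); // saved so stepBack() can undo this step
    if (group < 0) {
      current_.activeGroup = -1; // a non-clusterable instruction closes it
      current_.activeSize = 0;
    } else if (group == current_.activeGroup) {
      ++current_.activeSize; // extend the active cluster
    } else {
      current_.activeGroup = group; // close the old cluster, open a new one
      current_.activeSize = 1;
      ++current_.blocksStarted;
    }
  }

  // Enumerator backtracks over the most recent instruction.
  void stepBack() {
    assert(!history_.empty());
    current_ = history_.back();
    history_.pop_back();
  }

  const ClusterState &current() const { return current_; }

private:
  ClusterState current_;
  std::vector<ClusterState> history_;
};
```

Snapshotting every step costs more memory than saving only on cluster close, but it keeps the undo logic trivial, which is why the sketch uses it.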

Example 1.
There are 10 instructions in total (instructions 1 to 10) and 2 possible cluster blocks.
Cluster block 1 contains instructions 1, 3, 5, and 6. Cluster block 2 contains instructions 9 and 10.
Initially, the optimistic estimate of the remaining cluster blocks is 2. Our scheduler then clusters instructions 1 and 3, but chooses to schedule instruction 4, which closes the current cluster. Since the cluster block closed before 5 and 6 were also clustered, 1 is added to the cluster block count, optimistically assuming that 5 and 6 will be clustered together in another cluster block. Note that at this step, the estimate also assumes the scheduler will be able to cluster instructions 9 and 10 together in one block, for a total of 3 blocks. The scheduler then multiplies the total number of blocks (3) by the weight and adds the result to the cost of the current partial schedule.
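The arithmetic of Example 1 can be reproduced with a small tracker. This is a minimal sketch under simplifying assumptions (no cluster size limit, and every non-clusterable instruction closes the active cluster); `ClusterCostTracker` and its members are hypothetical names, not the ones used in bb_spill.

```cpp
#include <cassert>
#include <utility>
#include <vector>

class ClusterCostTracker {
public:
  // groupSizes[g] is the number of instructions in cluster group g.
  explicit ClusterCostTracker(std::vector<int> groupSizes)
      : remaining_(std::move(groupSizes)) {}

  // Record a scheduled instruction; `group` is -1 for a non-mem-op.
  void schedule(int group) {
    if (group < 0) {
      active_ = -1; // closes the active cluster, if any
      return;
    }
    assert(group < static_cast<int>(remaining_.size()));
    assert(remaining_[group] > 0);
    if (group != active_) {
      active_ = group; // a new cluster block is opened
      ++started_;
    }
    --remaining_[group];
  }

  // Blocks opened so far plus one optimistic block for every group that
  // still has unscheduled instructions and is not the active cluster.
  int estimatedBlocks() const {
    int blocks = started_;
    for (int g = 0; g < static_cast<int>(remaining_.size()); ++g)
      if (remaining_[g] > 0 && g != active_)
        ++blocks;
    return blocks;
  }

  // Clustering term added to the cost of the current partial schedule.
  int cost(int weight) const { return estimatedBlocks() * weight; }

private:
  std::vector<int> remaining_; // unscheduled instructions per group
  int active_ = -1;            // group of the active cluster, -1 if none
  int started_ = 0;            // cluster blocks opened so far
};
```

Constructed with group sizes 4 and 2, the estimate starts at 2, stays at 2 after instructions 1 and 3 are scheduled, and becomes 3 once instruction 4 closes the cluster, matching the example.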

To help our enumerator find the instructions in a cluster faster, I also added a dynamic heuristic called "Cluster". Once our scheduler starts a cluster, this heuristic gives priority to the instructions in the ready list that belong to the same cluster group as the active cluster.
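The effect of the heuristic on the ready list can be illustrated like this. The sketch assumes the cluster check is the most significant part of the priority key, which may not match how the heuristic is actually ordered in OptSched; `ReadyInst` and `prioritizeActiveCluster` are hypothetical names.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical ready-list entry.
struct ReadyInst {
  int id;           // instruction number
  int group;        // cluster group, or -1 if not a clusterable mem op
  int basePriority; // priority from the other heuristics (higher is better)
};

// Move instructions of the active cluster group to the front of the ready
// list; within each partition, order by the remaining priority.
void prioritizeActiveCluster(std::vector<ReadyInst> &ready, int activeGroup) {
  std::stable_sort(ready.begin(), ready.end(),
                   [activeGroup](const ReadyInst &a, const ReadyInst &b) {
                     bool aIn = activeGroup >= 0 && a.group == activeGroup;
                     bool bIn = activeGroup >= 0 && b.group == activeGroup;
                     if (aIn != bIn)
                       return aIn; // active-group instructions come first
                     return a.basePriority > b.basePriority;
                   });
}
```

Because the key depends on which cluster is active, the order has to be recomputed whenever a cluster starts or closes, which is what makes the heuristic dynamic.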

This is a currently working implementation of clustering with B&B. There is no hard limit on cluster size when using AMD's shouldClusterMemOps() function, but there is a hard limit of 15 during B&B. I am still debugging history domination.

kerbowa · 2020-08-19T15:35:59Z

lib/CMakeLists.txt

@@ -5,3 +5,4 @@ ELSE()
 ENDIF()

 add_dependencies(OptSched ${OPT_SCHED_TARGET_DEPS})
+target_link_libraries(OptSched -L/home/vang/src/ROCm-2.4/opencl/build/lib/ libamdocl64.so)


Probably shouldn't add this.

Vang Thao and others added 30 commits March 6, 2020 23:35

Added notes on how to possibly start.

215e6f9

Idea on how to implement checking if an instruction is part of a cluster and potential issues.

449456b

Added LLVM's method to check if we should cluster MemOps

a6376ab

Idea for implementation (WIP)

186a1f3

Fixed some compilation issues

c4b0973

Fixed some compiler bugs, and added experimental cost.

75b02f4

Cleaned up debug statements. NFC

00501ae

Added clustering cost to ChkCostFsblty, and added TODOs.

3603da9

Fix typo for variable and disabled terminating enumerator when we find a schedule in the ILP pass

035272b

Debugging statements and reset mem clustering info in InitForSchduling

a2cd231

Added setting or memory clustering in settings. Fixed clustering for cluster to cluster mem-ops. More Debug statements.

760c38d

Fix missing var.

8b5e2cc

Fix memory segmentation

93f01e3

Use an integer instead of a vector for cluster groups.

111d5eb

Fix error with static variable.

298fb0f

Added MEM heuristic priority. Not yet implemented.

30c7d9c

ALso save state for cluster of size 1.

d712460

First implementation of MEM heuristic.

91967ba

Print out ready list and changes to linked list (Vlad)

ed248f0

Extract more information about each cluster to be later used in lower bound estimation.

3664057

Error fixes

b8e4ac5

First implementation of cost function

b519e25

Some code cleanup. No functional changes.

26a89c3

Missed variable to clean up

ec8e0bd

Fix issues with enumerator not updating priorities

f467f83

Added store clustering and debugging statements

7fcb9a4

Fix segmentation fault due to copying ready list when a dynamic heuristic is not used.

cccccc3

Updated comments for easier review.

b4f55af

Merge branch 'master' into memory-clustering-project

9e3c5cc

Fix not accounting for multiple clusters within the same store-chain.

46b9542

Vang Thao and others added 8 commits June 3, 2020 23:29

Copy in dag mutation fix.

4bfbc61

Copy verify schedule bugfix patch for dag mutation fix.

0d80260

Missed a file to copy over.

58978df

Ignore artificial edges for potential clustering and display clusters after sequential scheduler.

ee1d32f

Add option to print cluster information after scheduling and revert changes to upper bound calculation.

decb49f

Added 2nd ILP pass with lower target occupancy

913f83d

Add two conditions for re-scheduling ILP pass; Minimum occupancy and minimum ILP improvements.

b01eeff

kerbowa reviewed Aug 19, 2020

Vang Thao and others added 3 commits August 19, 2020 09:11

Fix ILP Improvement calculation bugs

9bbb91d

Disable heuristic scheduler and B&B enumerator in 3rd ILP pass.

6b28d0d

Fix incorrect statement

527d08f
