Memory Clustering in the Enumerator #73

Draft
wants to merge 41 commits into base: master
Conversation

vangthao95 (Member)

This is the current draft of our memory clustering project, which I have been working on. The main goal of this project is to make our enumerator aware of clusterable memory operations in addition to register pressure and schedule length. This draft also includes the implementation of a new heuristic called "Cluster", which is discussed later in this description. This is still a draft, and there is more work to be done before finalizing it. Most of the current work is in bb_spill and OptSchedDDGWrapperBasic.

To get information about which instructions can be clustered together, I copied how LLVM and AMD detect clusters from here and here and implemented it in OptSchedDDGWrapperBasic. Instead of adding edges to force constraints in the DAG, I simply attach information telling our scheduler that an instruction can be clustered, which cluster group the instruction belongs to, and the total number of instructions in that group. From this, we can also calculate the minimum possible number of cluster blocks, which I use as a lower bound for clustering.
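To make the idea concrete, here is a minimal sketch of the kind of per-instruction record this step could produce and the lower bound derived from it. The names (ClusterInfo, computeMinClusterBlocks) are invented for illustration and are not the actual identifiers used in OptSchedDDGWrapperBasic.

```cpp
#include <set>
#include <vector>

// Hypothetical per-instruction cluster record attached by the DDG wrapper
// instead of adding artificial cluster edges to the DAG.
struct ClusterInfo {
  bool Clusterable = false; // instruction can be clustered with others
  int GroupID = -1;         // which cluster group it belongs to
  int GroupSize = 0;        // total number of instructions in that group
};

// Optimistic lower bound on the number of cluster blocks: assume every
// group can be scheduled as a single contiguous block.
int computeMinClusterBlocks(const std::vector<ClusterInfo> &Insts) {
  std::set<int> Groups;
  for (const ClusterInfo &CI : Insts)
    if (CI.Clusterable)
      Groups.insert(CI.GroupID);
  return static_cast<int>(Groups.size());
}
```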

During enumeration, I added a check to see whether an instruction is part of a memory-op cluster. If it is and no cluster is currently active, I initialize the clustering state; otherwise we do nothing. If a cluster is already active, we check whether the instruction can be added to it. If it cannot, we store the state of the current cluster (for backtracking purposes) and start a new one if needed. Throughout all of this, I keep track of the number of cluster blocks built so far plus an optimistic estimate of the remaining cluster blocks. A rough sketch of this bookkeeping is given right below, and Example 1 then walks through the cost estimate.
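This sketch is illustrative only: the ClusterTracker type and its members are invented here, and the real logic lives in bb_spill.

```cpp
#include <stack>

// Illustrative state for the cluster block currently being built.
struct ClusterState {
  int GroupID = -1;       // group the active block is drawn from
  int InstrsInBlock = 0;  // how many instructions it holds so far
};

struct ClusterTracker {
  ClusterState Active;
  bool HasActive = false;
  int ClusterBlockCount = 0;        // blocks opened so far in this partial schedule
  std::stack<ClusterState> History; // saved states for backtracking

  // Called when the enumerator schedules an instruction.
  void onSchedule(bool Clusterable, int GroupID) {
    if (!Clusterable) {
      // A non-clusterable instruction closes any active cluster block.
      if (HasActive) {
        History.push(Active);
        HasActive = false;
      }
      return;
    }
    if (HasActive && GroupID == Active.GroupID) {
      ++Active.InstrsInBlock; // extend the current block
      return;
    }
    if (HasActive)
      History.push(Active);   // save the old block for backtracking
    Active = {GroupID, 1};    // open a new cluster block
    HasActive = true;
    ++ClusterBlockCount;
  }
};
```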

Example 1.
There are 10 instructions in total (instructions 1 to 10) and a minimum of 2 cluster blocks is possible.
Cluster block 1 contains instructions 1, 3, 5, and 6. Cluster block 2 contains instructions 9 and 10.
Initially, the optimistic estimate of the remaining cluster blocks is 2. Our scheduler then clusters instructions 1 and 3, but chooses to schedule instruction 4 next, closing the current cluster block. Since the block closed before instructions 5 and 6 were also clustered, the estimate adds 1 to the cluster block count, optimistically assuming 5 and 6 will be clustered together in another block. Note that at this step it also assumes the scheduler will be able to cluster instructions 9 and 10 together in one block, for a total of 3 blocks. The scheduler then multiplies the total number of blocks (3) by the weight and adds the result to the cost of the current partial schedule.
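Under the assumptions of Example 1, the optimistic cost term at that point can be computed as in the sketch below. The function name and the ClusterWeight parameter are made up for illustration; they stand in for whatever weight the scheduler actually applies.

```cpp
// Sketch of the optimistic cluster cost term from Example 1.
int clusterCostEstimate(int ClosedBlocks, int GroupsWithRemainingMembers,
                        int ClusterWeight) {
  // Assume every group that still has unscheduled members can be packed
  // into exactly one more cluster block.
  int OptimisticBlocks = ClosedBlocks + GroupsWithRemainingMembers;
  return OptimisticBlocks * ClusterWeight;
}

// Example 1, right after the block {1, 3} is closed by scheduling instruction 4:
//   1 closed block
// + 1 block for the rest of group 1 ({5, 6})
// + 1 block for group 2 ({9, 10})
// = 3 optimistic blocks, so the added cost is clusterCostEstimate(1, 2, Weight) == 3 * Weight.
```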

To help our enumerator find instructions in a cluster more quickly, I also added a dynamic heuristic called "Cluster". When our scheduler starts clustering, this heuristic gives priority to other instructions in the ready list that belong to the same cluster group that our scheduler is currently looking for.
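The effect of the heuristic could be sketched as a priority bump over the ready list, roughly as below. The names are invented; in the actual draft the heuristic is wired into OptSched's existing priority-list machinery.

```cpp
// Illustrative ready-list ordering: while a block from ActiveGroup is being
// built, instructions from that same group sort ahead of everything else;
// ties fall back to the usual heuristic value.
struct ReadyEntry {
  int InstNum;
  int GroupID;       // -1 if the instruction is not clusterable
  int BasePriority;  // value from the existing heuristic list
};

bool higherPriority(const ReadyEntry &A, const ReadyEntry &B, int ActiveGroup) {
  bool AInGroup = (ActiveGroup != -1 && A.GroupID == ActiveGroup);
  bool BInGroup = (ActiveGroup != -1 && B.GroupID == ActiveGroup);
  if (AInGroup != BInGroup)
    return AInGroup; // prefer instructions that extend the active cluster
  return A.BasePriority > B.BasePriority;
}
```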

Vang Thao and others added 30 commits March 6, 2020 23:35
…cluster to cluster mem-ops. More Debug statements.
Vang Thao and others added 8 commits June 3, 2020 23:29
Currently working implementation of clustering with B&B. There are no hard limits on cluster size when using AMD's shouldClusterMemOps() function, but there is a hard limit of 15 during B&B. Currently still debugging history domination.
@@ -5,3 +5,4 @@ ELSE()
ENDIF()

add_dependencies(OptSched ${OPT_SCHED_TARGET_DEPS})
target_link_libraries(OptSched -L/home/vang/src/ROCm-2.4/opencl/build/lib/ libamdocl64.so)
Member
Probably shouldn't add this.
