Memory Clustering in the Enumerator #73
Draft: vangthao95 wants to merge 41 commits into CSUS-LLVM:master from vangthao95:memory-clustering-project
…ter and potential issues.
…d a schedule in the ILP pass
…cluster to cluster mem-ops. More Debug statements.
… bound estimation.
…stic is not used.
Currently working implementation of clustering with B&B. No hard limits on cluster size when using AMD's shouldClusterMemOps() function but there is a hard limit of 15 during B&B. Currently still debugging history domination.
… after sequential scheduler.
…hanges to upper bound calculation.
…minimum ILP improvements.
kerbowa reviewed Aug 19, 2020
lib/CMakeLists.txt (outdated)
@@ -5,3 +5,4 @@ ELSE()
ENDIF()

add_dependencies(OptSched ${OPT_SCHED_TARGET_DEPS})
+target_link_libraries(OptSched -L/home/vang/src/ROCm-2.4/opencl/build/lib/ libamdocl64.so)
Probably shouldn't add this.
This is the current draft of the memory clustering project I have been working on. The main goal of the project is to make our enumerator aware of clusterable memory operations in addition to register pressure and schedule length. This draft also includes a new heuristic called "Cluster", which is discussed later on. It is still only a draft, and more work remains before it can be finalized. Most of the work so far is in `bb_spill` and `OptSchedDDGWrapperBasic`.

To get information about which instructions can be clustered together, I copied how LLVM and AMD detect clusters from here and here and implemented it in `OptSchedDDGWrapperBasic`. Instead of adding edges to the DAG to force the clustering constraints, I simply added information to our scheduler indicating that an instruction can be clustered, which cluster group it belongs to, and the total number of instructions in that group. From this, we can also calculate the minimum possible number of cluster blocks, which I use as a lower bound for clustering.

During enumeration, I added a check for whether an instruction is part of a memory-operation cluster. If it is, I initialize the clustering variables; otherwise, nothing is done. However, if a cluster is already active, we check whether the instruction can be added to the current cluster. If it cannot, we save the state of the current cluster (for backtracking) and start a new one if needed. Throughout this process, I keep track of the current number of cluster blocks plus an optimistic estimate of the remaining cluster blocks. Example 1 below illustrates this cost estimate.
Example 1.
There are 10 instructions in total (instructions 1 to 10) and 2 possible cluster blocks.
Cluster block 1 contains instructions 1, 3, 5, and 6. Cluster block 2 contains instructions 9 and 10.
Initially, the optimistic estimate of the remaining cluster blocks is 2. Our scheduler then clusters instructions 1 and 3, but chooses to schedule instruction 4 next, which closes the current cluster. Since the cluster block was closed before instructions 5 and 6 could be clustered, 1 is added to the cluster block count, optimistically assuming that 5 and 6 will be clustered together in another cluster block. Note that at this step the estimate also assumes that the scheduler will be able to cluster instructions 9 and 10 together in one block, for a total of 3 blocks. The scheduler then multiplies the total number of blocks (3) by the weight and adds the result to the cost of the current partial schedule.
To help our enumerator find the instructions of a cluster faster, I also added a dynamic heuristic called "Cluster". Once the scheduler has started a cluster, this heuristic gives priority to instructions in the ready list that belong to the same cluster group the scheduler is currently clustering.