Commit
Reorder MergeSharedMemoryAllocations in GPU codegen
slyubomirsky committed Mar 19, 2024
1 parent 3684fd3 commit e77ef6e
Showing 1 changed file with 1 addition and 3 deletions.
4 changes: 1 addition & 3 deletions src/driver/driver_api.cc
@@ -590,6 +590,7 @@ transform::Sequential MixedModulePassManager(IRModule mixed_mod, Target target)

   mixed_pass_list.push_back(tir::transform::ThreadSync("shared"));
   mixed_pass_list.push_back(tir::transform::ThreadSync("shared.dyn"));
+  mixed_pass_list.push_back(tir::transform::MergeSharedMemoryAllocations());
   mixed_pass_list.push_back(tir::transform::ThreadSync("warp"));
   mixed_pass_list.push_back(tir::transform::InferFragment());
   mixed_pass_list.push_back(tir::transform::LowerThreadAllreduce());
@@ -607,9 +608,6 @@ transform::Sequential MixedModulePassManager(IRModule mixed_mod, Target target)

   mixed_pass_list.push_back(tir::transform::AnnotateDeviceRegions());
   mixed_pass_list.push_back(tir::transform::SplitHostDevice());
-  // MergeSharedMemoryAllocations must be applied after SplitHostDevice
-  // because the merged allocation site is at the beginning of each device function
-  mixed_pass_list.push_back(tir::transform::MergeSharedMemoryAllocations());

   bool unpacked_api = mixed_mod->GetAttr<relay::Executor>(tvm::attr::kExecutor)
                           .value_or(relay::Executor::Create("graph", {}))
