LightMetal - Initial Replay infra/library for LightMetalBinary and standalone runner (#17039)

 - This is round 6/6 for now. It builds upon the previous 5 merges for LightMetal in the past week
   and enables e2e capture + replay in unit tests now that replay is supported.

 - This brings the replay library/executor for a LightMetalBinary, which handles
   replaying all the commands and traces that a workload captured to the binary. As at
   capture time, complex objects are stored in a map after creation
   and referenced by global_id by the functions that re-use them.

 - Initial infra for the Light Metal standalone CLI runner, which just loads an existing
   binary on disk and executes it using the replay library's ExecuteLightMetalBinary()
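
The global_id bookkeeping described above can be illustrated with a minimal sketch. This is not the code from this commit: `FakeBuffer`, `Command`, and `ReplaySketch` are hypothetical stand-ins, and the real binary format and replay classes are not shown on this page. The sketch only shows the general idea of storing an object in a map when its creation command is replayed, then looking it up by global_id when a later command re-uses it.

```cpp
#include <cstdint>
#include <memory>
#include <stdexcept>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for a complex object such as a buffer or program.
struct FakeBuffer {
    std::vector<std::uint8_t> data;
};

// Hypothetical command record; the real serialized schema is an assumption here.
struct Command {
    enum class Kind { CreateBuffer, WriteBuffer } kind;
    std::uint32_t global_id;   // id assigned to the object at capture time
    std::uint64_t size_bytes;  // used by CreateBuffer
    std::uint8_t fill_value;   // used by WriteBuffer
};

class ReplaySketch {
public:
    // Replays commands in order; returns false if any command fails.
    bool execute(const std::vector<Command>& commands) {
        try {
            for (const Command& cmd : commands) {
                switch (cmd.kind) {
                    case Command::Kind::CreateBuffer: create_buffer(cmd); break;
                    case Command::Kind::WriteBuffer: write_buffer(cmd); break;
                }
            }
        } catch (const std::exception&) {
            return false;
        }
        return true;
    }

    // Returns the object registered under global_id, or nullptr if absent.
    const FakeBuffer* find(std::uint32_t global_id) const {
        auto it = buffers_.find(global_id);
        return it == buffers_.end() ? nullptr : it->second.get();
    }

private:
    // Creation: build the object and store it in the map under its global_id.
    void create_buffer(const Command& cmd) {
        auto buffer = std::make_unique<FakeBuffer>();
        buffer->data.assign(cmd.size_bytes, 0);
        if (!buffers_.emplace(cmd.global_id, std::move(buffer)).second) {
            throw std::runtime_error("global_id already in map");
        }
    }

    // Re-use: look the object up by global_id instead of re-creating it.
    void write_buffer(const Command& cmd) {
        auto it = buffers_.find(cmd.global_id);
        if (it == buffers_.end()) {
            throw std::runtime_error("buffer not in global_id map");
        }
        it->second->data.assign(it->second->data.size(), cmd.fill_value);
    }

    std::unordered_map<std::uint32_t, std::unique_ptr<FakeBuffer>> buffers_;
};
```

In this sketch, replaying a write whose global_id was never created makes `execute()` return false, which mirrors the kind of "not in global_id map" failure mentioned in the fixture's TODO comment.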
kmabeeTT committed Feb 5, 2025
1 parent 9b56336 commit fa33198
Showing 7 changed files with 881 additions and 0 deletions.
19 changes: 19 additions & 0 deletions tests/tt_metal/tt_metal/lightmetal/lightmetal_fixture.hpp
@@ -12,11 +12,13 @@
#include <circular_buffer_constants.h>
#include <tt-metalium/kernel.hpp>
#include <tt-metalium/tt_backend_api_types.hpp>
#include "lightmetal/lightmetal_replay.hpp"
#include "command_queue_fixture.hpp"
#include <lightmetal_binary.hpp>

class SingleDeviceLightMetalFixture : public CommandQueueFixture {
protected:
bool replay_binary_;
std::string trace_bin_path_;
bool write_bin_to_disk_;

@@ -38,6 +40,7 @@ class SingleDeviceLightMetalFixture : public CommandQueueFixture {
}

this->create_device(trace_region_size);
this->replay_binary_ = replay_binary && !tt::parse_env("LIGHTMETAL_DISABLE_RUN", false);
// TODO (kmabee) - revisit placement. CreateDevice() path calls CreateKernel() on programs not
// created with CreateProgram() traced API which leads to "program not in global_id map"
LightMetalBeginCapture();
@@ -58,5 +61,21 @@ class SingleDeviceLightMetalFixture : public CommandQueueFixture {
if (!this->IsSlowDispatch()) {
tt::tt_metal::CloseDevice(this->device_);
}

// We could guard this to not attempt to replay an empty binary and still allow the test to pass,
// but would rather catch the case where the feature gets disabled at compile time.
if (replay_binary_) {
RunLightMetalBinary(std::move(binary));
}
}

// Mimic the light-metal standalone run replay tool by executing the binary.
void RunLightMetalBinary(LightMetalBinary&& binary) {
tt::tt_metal::LightMetalReplay lm_replay(std::move(binary));
if (!lm_replay.execute_binary()) {
FAIL() << "Light Metal Binary failed to execute or encountered errors.";
} else {
log_info(tt::LogMetalTrace, "Light Metal Binary executed successfully!");
}
}
};
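
The fixture above enables replay only when the test asked for it and the `LIGHTMETAL_DISABLE_RUN` environment variable is not set. A minimal standalone sketch of that gating follows. `env_flag` and `should_replay` are hypothetical helpers written for illustration; `env_flag` is not `tt::parse_env`, whose exact parsing rules are not shown in this diff, and the "empty, 0, or false means off" convention is an assumption.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper (NOT tt::parse_env): reads a boolean-like environment variable.
// Assumption: an unset variable yields default_value; "", "0", and "false" mean off;
// any other value means on.
inline bool env_flag(const char* name, bool default_value) {
    const char* raw = std::getenv(name);
    if (raw == nullptr) {
        return default_value;
    }
    const std::string value(raw);
    return !(value.empty() || value == "0" || value == "false");
}

// Replay runs only if the test requested it and the opt-out variable is not enabled.
inline bool should_replay(bool replay_requested) {
    return replay_requested && !env_flag("LIGHTMETAL_DISABLE_RUN", false);
}
```

Under these assumptions, running a test with `LIGHTMETAL_DISABLE_RUN=1` would still capture the binary but skip the replay step.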
1 change: 1 addition & 0 deletions tt_metal/impl/CMakeLists.txt
@@ -52,6 +52,7 @@ set(IMPL_SRC
${CMAKE_CURRENT_SOURCE_DIR}/flatbuffer/buffer_types_to_flatbuffer.cpp
${CMAKE_CURRENT_SOURCE_DIR}/flatbuffer/program_types_from_flatbuffer.cpp
${CMAKE_CURRENT_SOURCE_DIR}/flatbuffer/program_types_to_flatbuffer.cpp
${CMAKE_CURRENT_SOURCE_DIR}/lightmetal/lightmetal_replay.cpp
${CMAKE_CURRENT_SOURCE_DIR}/lightmetal/lightmetal_capture.cpp
${CMAKE_CURRENT_SOURCE_DIR}/lightmetal/lightmetal_capture_utils.cpp
${CMAKE_CURRENT_SOURCE_DIR}/lightmetal/host_api_capture_helpers.cpp