Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LightMetal - Initial Host API + Device Capture infra/library and unit tests (#17039) #17514

Merged
merged 1 commit into from
Feb 5, 2025

Conversation

kmabeeTT
Copy link
Contributor

@kmabeeTT kmabeeTT commented Feb 3, 2025

Ticket

I am breaking up original PR I put up recently into smaller bite-sized chunks as @omilyutin-tt suggested. Here is round 5.

[Feature Request] Add Light Metal capture/replay initial changes to tt-metal for some workloads #17039

Problem description

This is initial/bootstrapping changes for "Light Metal" capture/replay feature that uses Flatbuffers as serialization/deserialization library. See my previous bigger PR (#16573) if you want to see how this will be used by followup merge, or context.

What's changed

  • This is round 5, builds upon previous 4 merges for LightMetal that brought flatbuffer cmake/infra, begin/end APIs, LoadTrace() API, flatbuffer/schema serialization/deserialization

  • This adds light-metal Capture support for and instruments with LIGHT_METAL_TRACE_FUNCTION_CALL() and LIGHT_METAL_TRACE_FUNCTION_ENTRY() to many popular (not exuahstive) APIs used by unit tests. The former TRACE_FUNCTION_ENTRY() is more recent, used to protect against host APIs recursively calling other host APIs (only trace top most level). Two macros not always called back-to-back.

  • Support Capture/Replay of the following ~14 host APIs

    EnqueueTrace(), ReplayTrace(), ReleaseTrace() CreateBuffer(), EnqueueWriteBuffer(), EnqueueReadBuffer(), DeallocateBuffer CreateKernel(), CreateCircularBuffer() SetRuntimeArgs(uint32) SetRuntimeArgs(Kernel,RuntimeArgs) CreateProgram(), EnqueueProgram() Finish()

  • During capture, complex objects like Programs, Kernels, Buffers, CBHandle are assigned unique global_id, and referred to by their global_id in capture when used by functions

  • When "Metal Trace" is enabled, don't capture EnqueueProgram(), instead inject ReplayTrace(), would be used alongside LoadTrace()

  • Can be optionally disabled at compile time using build_metal.sh --disable-light-metal-trace which will set C++ define TT_ENABLE_LIGHT_METAL_TRACE=0 (trace functions become NOP)

  • New Verif APIs LightMetalCompareToCapture() / LightMetalCompareToGolden(). Put them in lightmetal_capture_utils.hpp instead of host_api.hpp since they are purely used at capture time, and not worthy enough to be inside host_api.h since just for verif

  • Test fixture hardcoded to run capture-only and skip replay until replay code is merged next

Checklist

  • Build and docs and new unit tests passing for wh, gs, bh on original PR, will check again
  • Post commit CI passes
  • Blackhole Post commit (if applicable)
  • Model regression CI testing passes (if applicable)
  • Device performance regression CI testing passes (if applicable)
  • (For models and ops writers) Full new models tests passes
  • New/Existing tests provide coverage for changes

@kmabeeTT
Copy link
Contributor Author

kmabeeTT commented Feb 3, 2025

Whoops, had a couple dependencies on lightmetal_replay.hpp (included in next PR) that I forgot to purge, going to force push fixed commit.

Comment on lines +317 to +322
if [ "$light_metal_trace" = "ON" ]; then
cmake_args+=("-DTT_ENABLE_LIGHT_METAL_TRACE=ON")
else
cmake_args+=("-DTT_ENABLE_LIGHT_METAL_TRACE=OFF")
fi

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cmake/C++ define temporary for next little bit, just here as quick way to disable feature from compile time. Plan to remove this in a few weeks once it's settled.

using namespace tt::tt_metal;

namespace tt::tt_metal {
namespace {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General note: applied all previous feedback from parent PR to this test, and renamed it.

tt_metal/impl/lightmetal/lightmetal_capture.hpp Outdated Show resolved Hide resolved
Comment on lines +108 to +110
if (is_in_map(obj)) {
log_warning(tt::LogMetalTrace, "Program already exists in global_id map.");
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pattern performs a double look up in a map, consider getting rid of is_in_map functions (unless we need them to be public?) and doing something like:

void LightMetalCaptureContext::remove_from_map(const T* obj) {
  if (auto it = map_.find(obj); it != map_.end()) {
    map_->erase(it);
  } else {
    log_warning(...);
  }
}
...
void LightMetalCaptureContext::add_to_map(const T* obj) {
  auto [it, inserted] = map_.emplace(obj, next_global_id_ + 1);
  if (inserted) {
    ++next_global_id_;
  } else {
    log_warning(...);
  }
}

As I was writing it, I realized we can have a helper template for these methods.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes is_in_map() does have a public usage today. I'll have to think about this, may defer some of this item to later.

tt_metal/impl/lightmetal/host_api_capture_helpers.cpp Outdated Show resolved Hide resolved
tt_metal/api/tt-metalium/host_api_capture_helpers.hpp Outdated Show resolved Hide resolved
Comment on lines 28 to 43
void CreateDevice(
const size_t trace_region_size, const bool replay_binary = true, const std::string trace_bin_path = "") {
// Skip writing to disk by default, unless user sets env var for local testing
write_bin_to_disk_ = tt::parse_env("LIGHTMETAL_SAVE_BINARY", false);

// If user didn't provide a specific trace bin path, set a default here based on test name
if (trace_bin_path == "") {
const auto test_info = ::testing::UnitTest::GetInstance()->current_test_info();
auto trace_filename = test_info ? std::string(test_info->name()) + ".bin" : "lightmetal_trace.bin";
this->trace_bin_path_ = "/tmp/" + trace_filename;
}

this->create_device(trace_region_size);
LightMetalBeginCapture();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we split this into 1. creation of a device and 2. starting light metal capture? Perhaps move the device creation into SetUp?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like some other gtests I based mine off, prefer to keep CreateDevice() logic seperate from Setup() so I can tweak settings in test (SetUp() not called by tests, happens automatically through gtest framework afaik). I will try to move LightMetalBeginCapture() call to SetUp() to be consistent with how LightMetalEndCapture() is automatically called inTearDown().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Scratch that, I can't move it - I forgot that the placement right now (LightMetalBeginCapture() after CreateDevice() gets called) is intentional. Will add a comment, but basically CreateDevices() calls CreateKernel(program) on a program object that wasn't created from CreateProgram() and so it exposes this failure:

             MetalTrace | TRACE    | /localdev/kmabee/metal1/tt_metal/tt_metal.cpp:1043 - LIGHT_METAL_TRACE_FUNCTION_CALL: CaptureCreateKernel via CreateKernel istracing: true depth: 1
libc++abi: terminating due to uncaught exception of type std::runtime_error: Program not found in global_id map.

A task for later to improve things here, and follow up on fixes.


CommandQueue& command_queue = this->device_->command_queue();
uint32_t num_loops = 5;
bool keep_buffers_alive = true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can remove now?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but prefer to keep, it's something I am playing with in test, and need to revisit in next round of tests.

Comment on lines +20 to +21
std::string trace_bin_path_;
bool write_bin_to_disk_;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can these two be specified by a single env variable that specifies trace_bin_path? If it is empty - don't trace anything, if not - trace and output into the value of the variable?

Alternatively, this can be boolean, and we can always write to a random location in /tmp. It's just the current method requires re-compilation to specify a custom path, and would be great to get away with just one state variable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I iterated on this a bit last year before ending up in current state. It might change in the future as tests evolve, but for now prefer to keep it as is because:

  1. plan to have a test soon that generates multiple binary files with specific names, and runs them all in sequence
  2. trace_bin_path being empty is what triggers meaningful name of output file based on testname, and it is nice currently for debug/dev to have consistent names of the .bin file based testname , plus a binary-to-json format flow (through flatc tool) tends to write to an output file name based on the input file name
  3. Never found a need to run a test or test(s) and force specific path for a run via env-var.

Comment on lines +117 to +120
constexpr bool kBlocking = true;
constexpr bool kNonBlocking = false;
vector<bool> blocking_flags = {kBlocking, kNonBlocking};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now used? Was this for Enqueue* APIs? Consider annotating like:

EnqueueWriteBuffer(command_queue, *buffer, input_data.data(), /*blocking=*/ true);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah right, it was for the Enqueue*() APIs, I need to revisit it in test later, will add a TODO, and update annotations like you mentioned.

// Calculate num uint32_t elements in buffer, and convert golden void* to vector
size_t golden_data_len = buffer_ptr->size() / sizeof(uint32_t);
const uint32_t* golden_data_uint32 = static_cast<const uint32_t*>(golden_data);
std::vector<uint32_t> golden_data_vector(golden_data_uint32, golden_data_uint32 + golden_data_len);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are making a copy here, right? This is interpreted as calling a constructor from a range?

Is it possible that we have something else at top level, instead of void*?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes and yes, flatbuffer API needs it in vector form. Don't think we should change it from void* golden_data because this function is called after/alongside and matches by design the interface of EnqueueReadBuffer(cq, buffer, dst) where dst is void * to give same user experience:

void LightMetalCompareToCapture(
    CommandQueue& cq, 
    const std::variant<std::reference_wrapper<Buffer>, 
    std::shared_ptr<Buffer>>& buffer, 
    void* dst) {
    
<snip>

    EnqueueReadBuffer(cq, buffer, dst, true);  // Blocking read to get golden value.
    LIGHT_METAL_TRACE_FUNCTION_CALL(CaptureLightMetalCompare, cq, buffer, dst, false);
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, these void* we use are really annoying and unsafe, I would really love to modernize this. Thats for later though:)

@kmabeeTT kmabeeTT force-pushed the kmabee/light_metal_capture_and_tests branch from 79be348 to 50da601 Compare February 5, 2025 03:03
@kmabeeTT
Copy link
Contributor Author

kmabeeTT commented Feb 5, 2025

Hi @omilyutin-tt - re-review please - I rebased on newer passing commit, force pushed, and included a commit addressing most of the change requests, but left some for later that will require more time/work/thought, to hopefully get this in. I have a conflict on device.cpp after Joseph's changes in main that I didn't include in rebase yet, will do some build testing on this current rebase, then fix conflict and force push again later.

@kmabeeTT kmabeeTT requested a review from omilyutin-tt February 5, 2025 03:14
tests/tt_metal/tt_metal/lightmetal/lightmetal_fixture.hpp Outdated Show resolved Hide resolved
// Calculate num uint32_t elements in buffer, and convert golden void* to vector
size_t golden_data_len = buffer_ptr->size() / sizeof(uint32_t);
const uint32_t* golden_data_uint32 = static_cast<const uint32_t*>(golden_data);
std::vector<uint32_t> golden_data_vector(golden_data_uint32, golden_data_uint32 + golden_data_len);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, these void* we use are really annoying and unsafe, I would really love to modernize this. Thats for later though:)

@kmabeeTT kmabeeTT force-pushed the kmabee/light_metal_capture_and_tests branch 2 times, most recently from 9b56336 to b936169 Compare February 5, 2025 14:34
@kmabeeTT
Copy link
Contributor Author

kmabeeTT commented Feb 5, 2025

all post-commit run again overnight, just force pushed a fixup to convert a few throw to TT_THROW, just waiting for @tenstorrent/metalium-developers-infra review for cmake change. It was me adding temporary C++ define (enabled by default) to wrap feature and build_metal.sh -> cmake -> c++ define to disable it, as discussed with Wilder/Andrew offline few weeks ago. Plan to remove it in a few weeks if no issues.

Copy link
Contributor

@afuller-TT afuller-TT left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed offline about moving the conditional to an option() in CMake so ingress via paths other than the shell script also default to on, but devs can opt-out if needed.

Approving in anticipation of that change.

… tests (#17039)

 - This is round 5, builds upon previous 4 merges for LightMetal that brought
   flatbuffer cmake/infra, begin/end APIs, LoadTrace() API, flatbuffer/schema
   serialization/deserialization

 - This adds light-metal Capture support for and instruments with LIGHT_METAL_TRACE_FUNCTION_CALL()
   and LIGHT_METAL_TRACE_FUNCTION_ENTRY() to many popular (not exuahstive) APIs used by unit tests.
   The former TRACE_FUNCTION_ENTRY() is more recent, used to protect against host APIs recursively
   calling other host APIs (only trace top most level). Two macros not always called back-to-back.

 - Support Capture/Replay of the following ~14 host APIs

   EnqueueTrace(), ReplayTrace(), ReleaseTrace()
   CreateBuffer(), EnqueueWriteBuffer(), EnqueueReadBuffer(), DeallocateBuffer
   CreateKernel(), CreateCircularBuffer()
   SetRuntimeArgs(uint32) SetRuntimeArgs(Kernel,RuntimeArgs)
   CreateProgram(), EnqueueProgram()
   Finish()

 - During capture, complex objects like Programs, Kernels, Buffers, CBHandle are assigned
   unique global_id, and referred to by their global_id in capture when used by functions

 - When "Metal Trace" is enabled, don't capture EnqueueProgram(), instead
   inject ReplayTrace(), would be used alongside LoadTrace()

 - Can be optionally disabled at compile time using build_metal.sh --disable-light-metal-trace
   which will set C++ define TT_ENABLE_LIGHT_METAL_TRACE=0 (trace functions become NOP)
   Set TT_ENABLE_LIGHT_METAL_TRACE as project cmake option default ON and adjust usage

 - New Verif APIs LightMetalCompareToCapture() / LightMetalCompareToGolden().
   Put them in lightmetal_capture_utils.hpp instead of host_api.hpp since they are purely
   used at capture time, and not worthy enough to be inside host_api.h since just for verif

 - Test fixture runs capture-only right now, will automatically run binary once replay
   support is merged next.
@kmabeeTT kmabeeTT force-pushed the kmabee/light_metal_capture_and_tests branch from b936169 to bea47ea Compare February 5, 2025 17:49
@kmabeeTT
Copy link
Contributor Author

kmabeeTT commented Feb 5, 2025

Thanks guys, made last feedback re cmake, rebased on near TOTT can compile and run my tests locally, should be good, merge in a few.

@kmabeeTT kmabeeTT merged commit 5815e04 into main Feb 5, 2025
11 checks passed
@kmabeeTT kmabeeTT deleted the kmabee/light_metal_capture_and_tests branch February 5, 2025 18:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants