-
Notifications
You must be signed in to change notification settings - Fork 104
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
LightMetal - Initial Host API + Device Capture infra/library and unit tests (#17039) #17514
Conversation
Whoops, had a couple dependencies on |
4d190d1
to
a53cf3f
Compare
if [ "$light_metal_trace" = "ON" ]; then | ||
cmake_args+=("-DTT_ENABLE_LIGHT_METAL_TRACE=ON") | ||
else | ||
cmake_args+=("-DTT_ENABLE_LIGHT_METAL_TRACE=OFF") | ||
fi | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cmake/C++ define temporary for next little bit, just here as quick way to disable feature from compile time. Plan to remove this in a few weeks once it's settled.
using namespace tt::tt_metal; | ||
|
||
namespace tt::tt_metal { | ||
namespace { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
General note: applied all previous feedback from parent PR to this test, and renamed it.
if (is_in_map(obj)) { | ||
log_warning(tt::LogMetalTrace, "Program already exists in global_id map."); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This pattern performs a double look up in a map, consider getting rid of is_in_map
functions (unless we need them to be public?) and doing something like:
void LightMetalCaptureContext::remove_from_map(const T* obj) {
if (auto it = map_.find(obj); it != map_.end()) {
map_->erase(it);
} else {
log_warning(...);
}
}
...
void LightMetalCaptureContext::add_to_map(const T* obj) {
auto [it, inserted] = map_.emplace(obj, next_global_id_ + 1);
if (inserted) {
++next_global_id_;
} else {
log_warning(...);
}
}
As I was writing it, I realized we can have a helper template for these methods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes is_in_map()
does have a public usage today. I'll have to think about this, may defer some of this item to later.
void CreateDevice( | ||
const size_t trace_region_size, const bool replay_binary = true, const std::string trace_bin_path = "") { | ||
// Skip writing to disk by default, unless user sets env var for local testing | ||
write_bin_to_disk_ = tt::parse_env("LIGHTMETAL_SAVE_BINARY", false); | ||
|
||
// If user didn't provide a specific trace bin path, set a default here based on test name | ||
if (trace_bin_path == "") { | ||
const auto test_info = ::testing::UnitTest::GetInstance()->current_test_info(); | ||
auto trace_filename = test_info ? std::string(test_info->name()) + ".bin" : "lightmetal_trace.bin"; | ||
this->trace_bin_path_ = "/tmp/" + trace_filename; | ||
} | ||
|
||
this->create_device(trace_region_size); | ||
LightMetalBeginCapture(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we split this into 1. creation of a device and 2. starting light metal capture? Perhaps move the device creation into SetUp
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Like some other gtests I based mine off, prefer to keep CreateDevice()
logic seperate from Setup()
so I can tweak settings in test (SetUp()
not called by tests, happens automatically through gtest framework afaik). I will try to move LightMetalBeginCapture()
call to SetUp()
to be consistent with how LightMetalEndCapture()
is automatically called inTearDown()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Scratch that, I can't move it - I forgot that the placement right now (LightMetalBeginCapture()
after CreateDevice()
gets called) is intentional. Will add a comment, but basically CreateDevices()
calls CreateKernel(program)
on a program object that wasn't created from CreateProgram()
and so it exposes this failure:
MetalTrace | TRACE | /localdev/kmabee/metal1/tt_metal/tt_metal.cpp:1043 - LIGHT_METAL_TRACE_FUNCTION_CALL: CaptureCreateKernel via CreateKernel istracing: true depth: 1
libc++abi: terminating due to uncaught exception of type std::runtime_error: Program not found in global_id map.
A task for later to improve things here, and follow up on fixes.
|
||
CommandQueue& command_queue = this->device_->command_queue(); | ||
uint32_t num_loops = 5; | ||
bool keep_buffers_alive = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can remove now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but prefer to keep, it's something I am playing with in test, and need to revisit in next round of tests.
std::string trace_bin_path_; | ||
bool write_bin_to_disk_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can these two be specified by a single env variable that specifies trace_bin_path
? If it is empty - don't trace anything, if not - trace and output into the value of the variable?
Alternatively, this can be boolean, and we can always write to a random location in /tmp
. It's just the current method requires re-compilation to specify a custom path
, and would be great to get away with just one state variable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I iterated on this a bit last year before ending up in current state. It might change in the future as tests evolve, but for now prefer to keep it as is because:
- plan to have a test soon that generates multiple binary files with specific names, and runs them all in sequence
- trace_bin_path being empty is what triggers meaningful name of output file based on testname, and it is nice currently for debug/dev to have consistent names of the .bin file based testname , plus a binary-to-json format flow (through flatc tool) tends to write to an output file name based on the input file name
- Never found a need to run a test or test(s) and force specific path for a run via env-var.
constexpr bool kBlocking = true; | ||
constexpr bool kNonBlocking = false; | ||
vector<bool> blocking_flags = {kBlocking, kNonBlocking}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now used? Was this for Enqueue*
APIs? Consider annotating like:
EnqueueWriteBuffer(command_queue, *buffer, input_data.data(), /*blocking=*/ true);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah right, it was for the Enqueue*()
APIs, I need to revisit it in test later, will add a TODO, and update annotations like you mentioned.
// Calculate num uint32_t elements in buffer, and convert golden void* to vector | ||
size_t golden_data_len = buffer_ptr->size() / sizeof(uint32_t); | ||
const uint32_t* golden_data_uint32 = static_cast<const uint32_t*>(golden_data); | ||
std::vector<uint32_t> golden_data_vector(golden_data_uint32, golden_data_uint32 + golden_data_len); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are making a copy here, right? This is interpreted as calling a constructor from a range?
Is it possible that we have something else at top level, instead of void*
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes and yes, flatbuffer API needs it in vector form. Don't think we should change it from void* golden_data
because this function is called after/alongside and matches by design the interface of EnqueueReadBuffer(cq, buffer, dst)
where dst
is void *
to give same user experience:
void LightMetalCompareToCapture(
CommandQueue& cq,
const std::variant<std::reference_wrapper<Buffer>,
std::shared_ptr<Buffer>>& buffer,
void* dst) {
<snip>
EnqueueReadBuffer(cq, buffer, dst, true); // Blocking read to get golden value.
LIGHT_METAL_TRACE_FUNCTION_CALL(CaptureLightMetalCompare, cq, buffer, dst, false);
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, these void*
we use are really annoying and unsafe, I would really love to modernize this. Thats for later though:)
79be348
to
50da601
Compare
Hi @omilyutin-tt - re-review please - I rebased on newer passing commit, force pushed, and included a commit addressing most of the change requests, but left some for later that will require more time/work/thought, to hopefully get this in. I have a conflict on device.cpp after Joseph's changes in main that I didn't include in rebase yet, will do some build testing on this current rebase, then fix conflict and force push again later. |
// Calculate num uint32_t elements in buffer, and convert golden void* to vector | ||
size_t golden_data_len = buffer_ptr->size() / sizeof(uint32_t); | ||
const uint32_t* golden_data_uint32 = static_cast<const uint32_t*>(golden_data); | ||
std::vector<uint32_t> golden_data_vector(golden_data_uint32, golden_data_uint32 + golden_data_len); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, these void*
we use are really annoying and unsafe, I would really love to modernize this. Thats for later though:)
9b56336
to
b936169
Compare
all post-commit run again overnight, just force pushed a fixup to convert a few throw to TT_THROW, just waiting for @tenstorrent/metalium-developers-infra review for cmake change. It was me adding temporary C++ define (enabled by default) to wrap feature and build_metal.sh -> cmake -> c++ define to disable it, as discussed with Wilder/Andrew offline few weeks ago. Plan to remove it in a few weeks if no issues. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed offline about moving the conditional to an option()
in CMake so ingress via paths other than the shell script also default to on, but devs can opt-out if needed.
Approving in anticipation of that change.
… tests (#17039) - This is round 5, builds upon previous 4 merges for LightMetal that brought flatbuffer cmake/infra, begin/end APIs, LoadTrace() API, flatbuffer/schema serialization/deserialization - This adds light-metal Capture support for and instruments with LIGHT_METAL_TRACE_FUNCTION_CALL() and LIGHT_METAL_TRACE_FUNCTION_ENTRY() to many popular (not exuahstive) APIs used by unit tests. The former TRACE_FUNCTION_ENTRY() is more recent, used to protect against host APIs recursively calling other host APIs (only trace top most level). Two macros not always called back-to-back. - Support Capture/Replay of the following ~14 host APIs EnqueueTrace(), ReplayTrace(), ReleaseTrace() CreateBuffer(), EnqueueWriteBuffer(), EnqueueReadBuffer(), DeallocateBuffer CreateKernel(), CreateCircularBuffer() SetRuntimeArgs(uint32) SetRuntimeArgs(Kernel,RuntimeArgs) CreateProgram(), EnqueueProgram() Finish() - During capture, complex objects like Programs, Kernels, Buffers, CBHandle are assigned unique global_id, and referred to by their global_id in capture when used by functions - When "Metal Trace" is enabled, don't capture EnqueueProgram(), instead inject ReplayTrace(), would be used alongside LoadTrace() - Can be optionally disabled at compile time using build_metal.sh --disable-light-metal-trace which will set C++ define TT_ENABLE_LIGHT_METAL_TRACE=0 (trace functions become NOP) Set TT_ENABLE_LIGHT_METAL_TRACE as project cmake option default ON and adjust usage - New Verif APIs LightMetalCompareToCapture() / LightMetalCompareToGolden(). Put them in lightmetal_capture_utils.hpp instead of host_api.hpp since they are purely used at capture time, and not worthy enough to be inside host_api.h since just for verif - Test fixture runs capture-only right now, will automatically run binary once replay support is merged next.
b936169
to
bea47ea
Compare
Thanks guys, made last feedback re cmake, rebased on near TOTT can compile and run my tests locally, should be good, merge in a few. |
Ticket
I am breaking up original PR I put up recently into smaller bite-sized chunks as @omilyutin-tt suggested. Here is round 5.
Problem description
This is initial/bootstrapping changes for "Light Metal" capture/replay feature that uses Flatbuffers as serialization/deserialization library. See my previous bigger PR (#16573) if you want to see how this will be used by followup merge, or context.
What's changed
This is round 5, builds upon previous 4 merges for LightMetal that brought flatbuffer cmake/infra, begin/end APIs, LoadTrace() API, flatbuffer/schema serialization/deserialization
This adds light-metal Capture support for and instruments with LIGHT_METAL_TRACE_FUNCTION_CALL() and LIGHT_METAL_TRACE_FUNCTION_ENTRY() to many popular (not exuahstive) APIs used by unit tests. The former TRACE_FUNCTION_ENTRY() is more recent, used to protect against host APIs recursively calling other host APIs (only trace top most level). Two macros not always called back-to-back.
Support Capture/Replay of the following ~14 host APIs
EnqueueTrace(), ReplayTrace(), ReleaseTrace() CreateBuffer(), EnqueueWriteBuffer(), EnqueueReadBuffer(), DeallocateBuffer CreateKernel(), CreateCircularBuffer() SetRuntimeArgs(uint32) SetRuntimeArgs(Kernel,RuntimeArgs) CreateProgram(), EnqueueProgram() Finish()
During capture, complex objects like Programs, Kernels, Buffers, CBHandle are assigned unique global_id, and referred to by their global_id in capture when used by functions
When "Metal Trace" is enabled, don't capture EnqueueProgram(), instead inject ReplayTrace(), would be used alongside LoadTrace()
Can be optionally disabled at compile time using build_metal.sh --disable-light-metal-trace which will set C++ define TT_ENABLE_LIGHT_METAL_TRACE=0 (trace functions become NOP)
New Verif APIs LightMetalCompareToCapture() / LightMetalCompareToGolden(). Put them in lightmetal_capture_utils.hpp instead of host_api.hpp since they are purely used at capture time, and not worthy enough to be inside host_api.h since just for verif
Test fixture hardcoded to run capture-only and skip replay until replay code is merged next
Checklist