
[Feature Request] Add Light Metal capture/replay initial changes to tt-metal for some workloads #17039

Closed
Tracked by #17037
kmabeeTT opened this issue Jan 23, 2025 · 3 comments
Assignees
Labels
feature-request External feature request lightmetal

Comments

@kmabeeTT
Contributor

Light Metal Feature Parent Ticket: #17037

Request

It's a new feature. Put together initial/bootstrapping changes to:

  • infra changes to add flatbuffers and schema files to the tt-metal environment, build changes, etc.
  • add some initial support for capturing / replaying various tt-metal host APIs
  • show that we can capture/replay them from binary / standalone runner
  • use "metal trace" to replay on-device portion
  • add some initial unit tests that use enough capture/replay to be useful
  • whatever else comes up that can be included in this round-1 issue/PR

I should have opened this ticket last year. This chunk of work is already started/completed and has gone through a few rounds of review; I will reply to this ticket with details and roll out the PR more widely shortly.

@kmabeeTT
Contributor Author

kmabeeTT commented Jan 23, 2025

Here is a high-level overview of the changes I made for this ticket so far, which I will roll out as a PR. It adds limited (unsupported) support for tracing a number of popular APIs, plus a few tests with correctness checking (reading output buffers and comparing them between capture time and replay time, or against expected values). Some of these tests capture a program using "metal trace" and then replay it via the LoadTrace() + ReplayTrace() APIs alongside the replay of all the captured host APIs.

Misc

  • Rebasing regularly has been a bit tough with lots of refactors in the past month or so, so I'm eager to get this initial change merged. Rebased as recently as Jan 22.
  • Attempted to follow Google C++ style where possible (function names in PascalCase) after a brief offline chat with Wilder.

Initial Infra

  • New host APIs LightMetalBeginCapture(), LightMetalEndCapture() and LoadTrace(); see the docs/commit for descriptions. In short, begin/end surround the workload to enable the feature, and LoadTrace() is used during replay to transfer a metal-trace TraceDescriptor from the flatbuffer binary to the device.
  • Introduce flatbuffers as a CPM package (same version UMD uses). Had to add CMAKE_POSITION_INDEPENDENT_CODE=ON to solve linker errors when bringing these changes to the tt-mlir project; ChatGPT said it made sense. Also had to demote a warning for a possible gcc bug seen in the g++12 build.
  • Add a few initial flatbuffer schema files (.fbs files) to represent the flatbuffer binary and host commands, with a few commands as examples.
  • In a follow-up commit, added a compile-time define to guard this feature: enabled by default, disabled with a flag passed from build_metal.sh -> CMake -> C++. If things go well with no unexpected fallout, the compile-time define will be removed; it's mainly here as a quick way to make the tracing macros compile-time NOPs (they are lightweight anyway).
  • Programs traced with "metal trace" are captured in the flatbuffer binary in the order they are traced, and replayed using the existing metal trace feature APIs.
  • Added a standalone runner tool, "lightmetal_runner", as a way to run binaries as they currently exist. Example: ./build/tools/lightmetal_runner /tmp/SingleProgramTraceCapture.bin. The e2e C++ unit tests also run directly from the binary, using the same underlying replay executor as the standalone runner tool.
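To illustrate the begin/end capture flow and the compile-time guard described above, here is a minimal, self-contained C++ sketch. It is not the tt-metal implementation: CaptureContext, SKETCH_TRACE_CALL, CreateBufferSketch and FinishSketch are hypothetical names, and the recorded "binary" is just a list of API names rather than a flatbuffer. Only the TT_ENABLE_LIGHT_METAL_TRACE define name comes from this thread.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Stand-in for the build-time switch described above (build_metal.sh -> CMake
// -> C++ define). Defaults to enabled when not supplied by the build.
#ifndef TT_ENABLE_LIGHT_METAL_TRACE
#define TT_ENABLE_LIGHT_METAL_TRACE 1
#endif

// Hypothetical capture context: records the names of traced host API calls
// made between BeginCapture() and EndCapture().
class CaptureContext {
public:
    static CaptureContext& Get() {
        static CaptureContext instance;  // one process-wide context
        return instance;
    }
    void BeginCapture() {
        records_.clear();
        tracing_ = true;
    }
    // Stops capturing and hands back everything recorded; stands in for the
    // serialized flatbuffer binary.
    std::vector<std::string> EndCapture() {
        tracing_ = false;
        return std::move(records_);
    }
    bool IsTracing() const { return tracing_; }
    void Record(const std::string& api_name) { records_.push_back(api_name); }

private:
    bool tracing_ = false;
    std::vector<std::string> records_;
};

#if TT_ENABLE_LIGHT_METAL_TRACE
// Enabled: record the call only while a capture is in progress.
#define SKETCH_TRACE_CALL(api_name)                 \
    do {                                            \
        if (CaptureContext::Get().IsTracing()) {    \
            CaptureContext::Get().Record(api_name); \
        }                                           \
    } while (0)
#else
// Disabled at compile time: expands to an empty statement (a NOP).
#define SKETCH_TRACE_CALL(api_name) \
    do {                            \
    } while (0)
#endif

// Hypothetical instrumented host APIs.
void CreateBufferSketch() { SKETCH_TRACE_CALL("CreateBuffer"); }
void FinishSketch() { SKETCH_TRACE_CALL("Finish"); }
```

With the define set to 0 the macro compiles away entirely; with it set to 1, calls made outside a Begin/End pair are still ignored at runtime.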

Host APIs Traced

  • Added a TRACE_FUNCTION_CALL() macro to trace host API functions, and later TRACE_FUNCTION_ENTRY() to guard against host API functions that call other host API functions, so only the top-level call is captured.
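The "only capture the top level" rule can be sketched with a thread-local depth counter and an RAII guard. This is an assumption about how such an entry macro could work, not the actual TRACE_FUNCTION_ENTRY() code; HostApiEntryGuard, g_host_api_depth, g_captured and the *Sketch functions are all hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical: number of traced host API calls currently on this thread's stack.
thread_local int g_host_api_depth = 0;
// Hypothetical capture log.
std::vector<std::string> g_captured;

// RAII guard playing the role of an "entry" macro: bumps the depth on entry
// and restores it on exit (also when an exception propagates).
class HostApiEntryGuard {
public:
    HostApiEntryGuard() { ++g_host_api_depth; }
    ~HostApiEntryGuard() { --g_host_api_depth; }
    HostApiEntryGuard(const HostApiEntryGuard&) = delete;
    HostApiEntryGuard& operator=(const HostApiEntryGuard&) = delete;
    // True only for the outermost host API call.
    bool IsOutermost() const { return g_host_api_depth == 1; }
};

void SetRuntimeArgsSketch(const std::vector<int>& args) {
    HostApiEntryGuard guard;
    if (guard.IsOutermost()) g_captured.push_back("SetRuntimeArgs");
    (void)args;  // the real work would happen here
}

// A host API that internally calls another host API.
void CreateProgramWithArgsSketch() {
    HostApiEntryGuard guard;
    if (guard.IsOutermost()) g_captured.push_back("CreateProgramWithArgs");
    SetRuntimeArgsSketch({1, 2, 3});  // nested call: depth is 2, so not recorded
}
```

Replaying the recorded top-level call re-executes the nested one anyway, which is why recording it too would duplicate work.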

Here is the list from command.fbs; drop the Command suffix and these are pretty much the matching host APIs from host_api.hpp. With this came several new enums, structs and tables in types_*.fbs files, and their corresponding serialization/deserialization functions in the form of ToFlatbuffer() and FromFlatbuffer().

    ReplayTraceCommand,
    EnqueueTraceCommand,
    LoadTraceCommand,
    ReleaseTraceCommand,
    CreateBufferCommand,
    DeallocateBufferCommand,
    EnqueueWriteBufferCommand,
    EnqueueReadBufferCommand,
    FinishCommand,
    CreateProgramCommand,
    EnqueueProgramCommand,
    CreateKernelCommand,
    SetRuntimeArgsUint32Command,
    SetRuntimeArgsCommand,
    CreateCircularBufferCommand,
    LightMetalCompareCommand,

Note that when Metal Trace is enabled, EnqueueProgram() is not captured; ReplayTrace() is injected instead, to be used alongside the new LoadTrace() API.
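A hedged sketch of that substitution: while a metal trace is active, EnqueueProgram() is not recorded and a ReplayTrace command is recorded in its place. The command structs, CommandRecorder, BeginMetalTrace()/EndMetalTrace(), and the choice to inject one ReplayTrace per trace (rather than one per EnqueueProgram()) are assumptions for illustration; the real commands are flatbuffer tables in command.fbs.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <variant>
#include <vector>

// Hypothetical, simplified command records.
struct CreateProgramCmd { uint32_t program_id; };
struct EnqueueProgramCmd { uint32_t program_id; };
struct ReplayTraceCmd { uint32_t trace_id; };
struct FinishCmd {};

using Command = std::variant<CreateProgramCmd, EnqueueProgramCmd, ReplayTraceCmd, FinishCmd>;

class CommandRecorder {
public:
    // Hypothetical hooks marking the start/end of a metal trace capture.
    void BeginMetalTrace(uint32_t trace_id) {
        active_trace_ = trace_id;
        injected_ = false;
    }
    void EndMetalTrace() { active_trace_.reset(); }

    void OnCreateProgram(uint32_t program_id) { commands_.push_back(CreateProgramCmd{program_id}); }

    // Inside a metal trace, the device work lives in the trace itself, so the
    // EnqueueProgram() is skipped and a single ReplayTrace is recorded instead.
    void OnEnqueueProgram(uint32_t program_id) {
        if (active_trace_) {
            if (!injected_) {
                commands_.push_back(ReplayTraceCmd{*active_trace_});
                injected_ = true;
            }
            return;
        }
        commands_.push_back(EnqueueProgramCmd{program_id});
    }
    void OnFinish() { commands_.push_back(FinishCmd{}); }
    const std::vector<Command>& commands() const { return commands_; }

private:
    std::optional<uint32_t> active_trace_;
    bool injected_ = false;
    std::vector<Command> commands_;
};
```

Using std::variant keeps the recorded stream a flat, ordered list, loosely mirroring how a flatbuffer union of command tables is laid out.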

Testing

  • Added new C++ unit test files tt_metal/lightmetal/lightmetal_fixture.hpp and lightmetal_sanity.cpp, which helped me test several features during development
  • A few very basic tests, all of which call the light-metal begin/end capture APIs in setup/teardown respectively to capture a LightMetalBinary, then replay it using the same replay library that the standalone runner calls.
  • Have buffer write/read tests, a data movement program test, a compute program test, and compute program tests with metal trace using a single program and two programs. More to come in the future.
  • Added APIs, mostly for verification, called LightMetalCompareToCapture() and LightMetalCompareToGolden(). Tests use them to record a buffer's value at capture time, or a user-provided expected value, into the binary and compare the same buffer against it at replay time (from the light metal binary).
  • The 6 tests added run capture + replay end-to-end and pass on Grayskull, Wormhole and Blackhole.
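The compare-at-replay idea behind LightMetalCompareToCapture() / LightMetalCompareToGolden() can be sketched as follows. All names here (CompareRecord, CompareToCaptureSketch, CompareToGoldenSketch, CheckAtReplay) are hypothetical, and buffers are modeled as plain vectors keyed by an id rather than as device buffers.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

using BufferData = std::vector<uint32_t>;

// Hypothetical record stored in the binary: which buffer to check and what it
// should contain at replay time.
struct CompareRecord {
    uint32_t buffer_global_id;
    BufferData expected;
    bool is_golden;  // false: snapshot read back at capture time
};

// Capture side: snapshot the buffer's current contents.
CompareRecord CompareToCaptureSketch(uint32_t id, const BufferData& current) {
    return CompareRecord{id, current, false};
}

// Capture side: store a user-provided expected value instead.
CompareRecord CompareToGoldenSketch(uint32_t id, BufferData golden) {
    return CompareRecord{id, std::move(golden), true};
}

// Replay side: read the same buffer again and compare element-wise.
bool CheckAtReplay(const CompareRecord& rec, const std::map<uint32_t, BufferData>& replay_buffers) {
    auto it = replay_buffers.find(rec.buffer_global_id);
    return it != replay_buffers.end() && it->second == rec.expected;
}
```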

Caveats/Restrictions

  • Covers only tt-metal for now; no ttnn support yet (ttnn has issues: it bypasses host_api in several cases, which needs follow-up).
  • Left a few TODO items with my username in the code for things not well covered; will address them in a follow-up ticket. These mainly relate to not properly capturing/replaying the same device/CQ configuration, and to hardcoding things there, including the CreateDevices() call.

kmabeeTT added a commit that referenced this issue Jan 27, 2025
…17039)

 - Mostly unused here, used by light metal capture/replay coming in follow up PR
 - Add CMAKE_POSITION_INDEPENDENT_CODE=ON (-fPIC) for flatbuffer to avoid
   linking errors when lifting to tt-mlir project.
 - Use -Wno-restrict for flatbuffers compile to suppress warning in
   g++12 build
kmabeeTT added a commit that referenced this issue Jan 28, 2025
…17039)

 - Mostly unused here, used by light metal capture/replay coming in follow up PR
 - Add CMAKE_POSITION_INDEPENDENT_CODE=ON globally (-fPIC) following discussion
   (needed for flatbuffer to avoid linking errors when lifting to tt-mlir project).
 - Use -Wno-restrict for flatbuffers compile to suppress warning in g++12 build,
   though don't know why they were fine previously with flatbuffers from UMD
 - Misc PR feedback about PUBLIC vs PRIVATE, etc.
@kmabeeTT
Contributor Author

kmabeeTT commented Jan 28, 2025

Splitting this into several smaller PRs to make review easier, by request.

kmabeeTT added a commit that referenced this issue Jan 28, 2025
kmabeeTT added a commit that referenced this issue Jan 28, 2025
…re() and docs (#17039)

 - Unused for now, empty stubs, will be filled in by subsequent PR (baby steps)
   to denote start and end of capturing host+device workload
kmabeeTT added a commit that referenced this issue Jan 29, 2025
kmabeeTT added a commit that referenced this issue Jan 29, 2025
kmabeeTT added a commit that referenced this issue Jan 29, 2025
…il namespace (#17039)

 - Will be used by Light Metal replay after upcoming PR, when executing
   a metal-trace traced program from binary, TraceDescriptor is
   extracted from flatbuffer binary and loaded to device through this API.

 - Unrelated: change trace_buffer.hpp to forward-declare Buffer instead of including
   buffer.hpp, to reduce dependencies for users of trace_buffer.hpp
kmabeeTT added a commit that referenced this issue Jan 29, 2025
kmabeeTT added a commit that referenced this issue Jan 29, 2025
kmabeeTT added a commit that referenced this issue Jan 30, 2025
…or various types (#17039)

 - Code is compiled, but not used by anything yet, will be used after
   subsequent merge of light metal capture/replay libraries.
 - Make to/from_flatbuffer() friend of CircularBufferConfig to access internals
 - Remove default cases from switch statements over enums so that when
   new enum values are added, a compile error forces updates.
kmabeeTT added a commit that referenced this issue Jan 30, 2025
kmabeeTT added a commit that referenced this issue Jan 30, 2025
kmabeeTT added a commit that referenced this issue Jan 30, 2025
williamlyTT pushed a commit that referenced this issue Jan 30, 2025
williamlyTT pushed a commit that referenced this issue Jan 30, 2025
williamlyTT pushed a commit that referenced this issue Jan 30, 2025
yieldthought pushed a commit that referenced this issue Jan 31, 2025
yieldthought pushed a commit that referenced this issue Jan 31, 2025
yieldthought pushed a commit that referenced this issue Jan 31, 2025
kmabeeTT added a commit that referenced this issue Jan 31, 2025
kmabeeTT added a commit that referenced this issue Feb 1, 2025
kmabeeTT added a commit that referenced this issue Feb 2, 2025
…or various types (#17039)

 - Code is compiled, but not used by anything yet, will be used after
   subsequent merge of light metal capture/replay libraries.
 - Make to/from_flatbuffer() friend of CircularBufferConfig to access internals
 - Remove default cases from switch statements over enums so that when
   new enum values are added, a compile error forces updates.
 - All the PR feedback implemented. Add throws after switch statements for gcc12
kmabeeTT added a commit that referenced this issue Feb 2, 2025
…or various types (#17039)

 - Code is compiled, but not used by anything yet, will be used after
   subsequent merge of light metal capture/replay libraries.
 - Remove default cases from switch statements over enums so that when
   new enum values are added, a compile error forces updates.
 - All the PR feedback implemented. Add throws after switch statements for gcc12
 - CircularBufferConfig needed tweaks, add accessors for 3 private
   members for capture, and new constructor to set all private members
   for replay, following lengthy PR/offline discussion (avoid friend).
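The "no default case" pattern, plus the throw after the switch, looks roughly like this. The enums and ToFlatbufferSketch() are simplified, hypothetical stand-ins, not the actual tt-metal or schema types.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Hypothetical runtime enum and its serialized (schema-side) counterpart.
enum class DataMovementProcessor { RISCV_0, RISCV_1 };
enum class FbDataMovementProcessor : uint8_t { RISCV_0 = 0, RISCV_1 = 1 };

FbDataMovementProcessor ToFlatbufferSketch(DataMovementProcessor p) {
    // No `default:` label: -Wswitch (part of -Wall) warns, and -Werror turns it
    // into an error, when a new enumerator is added but not handled here.
    switch (p) {
        case DataMovementProcessor::RISCV_0: return FbDataMovementProcessor::RISCV_0;
        case DataMovementProcessor::RISCV_1: return FbDataMovementProcessor::RISCV_1;
    }
    // Reached only for out-of-range values; also keeps gcc from warning that
    // control reaches the end of a non-void function.
    throw std::invalid_argument("Unsupported DataMovementProcessor value");
}
```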
kmabeeTT added a commit that referenced this issue Feb 2, 2025
nikileshx pushed a commit to nikileshx/tt-metal that referenced this issue Feb 3, 2025
nikileshx pushed a commit to nikileshx/tt-metal that referenced this issue Feb 3, 2025
kmabeeTT added a commit that referenced this issue Feb 3, 2025
… tests (#17039)

 - This is round 4, builds upon previous 3 merges for LightMetal that brought
   flatbuffer cmake/infra, new APIs, flatbuffer/schema serialization/deserialization

 - This adds light-metal Capture support for, and instruments with LIGHT_METAL_TRACE_FUNCTION_CALL()
   and LIGHT_METAL_TRACE_FUNCTION_ENTRY(), many popular (not exhaustive) APIs used by unit tests.
   The latter (formerly TRACE_FUNCTION_ENTRY()) is more recent, used to protect against host APIs recursively
   calling other host APIs (only the top-most level is traced). The two macros are not always called back-to-back.

 - Support Capture/Replay of the following ~14 host APIs

   EnqueueTrace(), ReplayTrace(), ReleaseTrace()
   CreateBuffer(), EnqueueWriteBuffer(), EnqueueReadBuffer(), DeallocateBuffer()
   CreateKernel(), CreateCircularBuffer()
   SetRuntimeArgs(uint32), SetRuntimeArgs(Kernel, RuntimeArgs)
   CreateProgram(), EnqueueProgram()
   Finish()

 - During capture, complex objects like Programs, Kernels, Buffers and CBHandles are assigned
   a unique global_id, and are referred to by that global_id when used by later captured functions

 - When "Metal Trace" is enabled, don't capture EnqueueProgram(); instead
   inject ReplayTrace(), to be used alongside LoadTrace()

 - Can be optionally disabled at compile time using build_metal.sh --disable-light-metal-trace
   which will set C++ define TT_ENABLE_LIGHT_METAL_TRACE=0 (trace functions become NOP)

 - New Verif APIs LightMetalCompareToCapture() / LightMetalCompareToGolden().
   Put them in lightmetal_capture_utils.hpp instead of host_api.hpp since they are used
   purely at capture time and only for verification

 - Test fixture hardcoded to run capture-only and skip replay until replay code is merged next
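A minimal sketch of capture-side global_id assignment, assuming objects can be identified by address while they are alive. GlobalIdRegistry is hypothetical; it only shows the idea of mapping each object to a stable integer id that later commands reference instead of a pointer.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Hypothetical registry: the first time an object is seen it gets the next
// free id; later lookups return the same id.
class GlobalIdRegistry {
public:
    uint32_t GetOrAssign(const void* object) {
        auto [it, inserted] = ids_.try_emplace(object, next_id_);
        if (inserted) ++next_id_;
        return it->second;
    }
    bool Contains(const void* object) const { return ids_.count(object) != 0; }
    // Drop the entry when the object is destroyed (e.g. on DeallocateBuffer()),
    // since its address may later be reused by a different object.
    void Remove(const void* object) { ids_.erase(object); }

private:
    uint32_t next_id_ = 0;
    std::unordered_map<const void*, uint32_t> ids_;
};
```

Keying by address keeps the sketch short; a real implementation could instead store the id on the object itself.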
kmabeeTT added a commit that referenced this issue Feb 3, 2025
… tests (#17039)

 - This is round 5, builds upon previous 4 merges for LightMetal that brought
   flatbuffer cmake/infra, begin/end APIs, LoadTrace() API, flatbuffer/schema
   serialization/deserialization

 - This adds light-metal Capture support for, and instruments with LIGHT_METAL_TRACE_FUNCTION_CALL()
   and LIGHT_METAL_TRACE_FUNCTION_ENTRY(), many popular (not exhaustive) APIs used by unit tests.
   The latter (formerly TRACE_FUNCTION_ENTRY()) is more recent, used to protect against host APIs recursively
   calling other host APIs (only the top-most level is traced). The two macros are not always called back-to-back.

 - Support Capture/Replay of the following ~14 host APIs

   EnqueueTrace(), ReplayTrace(), ReleaseTrace()
   CreateBuffer(), EnqueueWriteBuffer(), EnqueueReadBuffer(), DeallocateBuffer()
   CreateKernel(), CreateCircularBuffer()
   SetRuntimeArgs(uint32), SetRuntimeArgs(Kernel, RuntimeArgs)
   CreateProgram(), EnqueueProgram()
   Finish()

 - During capture, complex objects like Programs, Kernels, Buffers and CBHandles are assigned
   a unique global_id, and are referred to by that global_id when used by later captured functions

 - When "Metal Trace" is enabled, don't capture EnqueueProgram(); instead
   inject ReplayTrace(), to be used alongside LoadTrace()

 - Can be optionally disabled at compile time using build_metal.sh --disable-light-metal-trace
   which will set C++ define TT_ENABLE_LIGHT_METAL_TRACE=0 (trace functions become NOP)

 - New Verif APIs LightMetalCompareToCapture() / LightMetalCompareToGolden().
   Put them in lightmetal_capture_utils.hpp instead of host_api.hpp since they are used
   purely at capture time and only for verification

 - Test fixture runs capture-only right now, will automatically run binary once replay
   support is merged next.
kmabeeTT added a commit that referenced this issue Feb 3, 2025
…andalone runner (#17039)

 - This is round 6/6 for now, builds upon previous 5 merges for LightMetal in past week
   and enables e2e capture + replay in unit tests now that replay is supported.

 - This brings the replay library/executor for a LightMetalBinary, which handles
   replaying all the commands and traces the workload captured into the binary. As at
   capture time, complex objects are stored in a map after creation
   and referenced by global_id by the functions that reuse them.

 - Light Metal standalone CLI runner initial infra, which just loads an existing
   binary on disk and executes it using the replay library's ExecuteLightMetalBinary()
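On the replay side, the same global_id becomes the key into a map of freshly created objects, and the standalone runner first has to read the binary from disk. The sketch below assumes hypothetical ReplayBuffer/ReplayState types and a hypothetical ReadBinaryFile() helper; it does not reproduce ExecuteLightMetalBinary() or the flatbuffer parsing.

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <iterator>
#include <memory>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical replay-side object.
struct ReplayBuffer {
    std::vector<uint32_t> data;
};

// Hypothetical replay state: objects created during replay are stored under
// the global_id they had at capture, so later commands can look them up.
class ReplayState {
public:
    void AddBuffer(uint32_t global_id, std::shared_ptr<ReplayBuffer> buf) {
        buffers_[global_id] = std::move(buf);
    }
    std::shared_ptr<ReplayBuffer> GetBuffer(uint32_t global_id) const {
        auto it = buffers_.find(global_id);
        if (it == buffers_.end()) {
            throw std::out_of_range("unknown buffer global_id " + std::to_string(global_id));
        }
        return it->second;
    }
    // Mirrors a captured DeallocateBuffer(): forget the object.
    void RemoveBuffer(uint32_t global_id) { buffers_.erase(global_id); }

private:
    std::unordered_map<uint32_t, std::shared_ptr<ReplayBuffer>> buffers_;
};

// What a standalone runner might do first: read the whole binary into memory.
std::vector<uint8_t> ReadBinaryFile(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) throw std::runtime_error("cannot open " + path);
    return std::vector<uint8_t>(std::istreambuf_iterator<char>(in), std::istreambuf_iterator<char>());
}
```

Throwing on an unknown global_id makes a malformed or out-of-order binary fail loudly instead of replaying against the wrong object.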
kmabeeTT added a commit that referenced this issue Feb 4, 2025
kmabeeTT added a commit that referenced this issue Feb 5, 2025
kmabeeTT added a commit that referenced this issue Feb 5, 2025
kmabeeTT added a commit that referenced this issue Feb 5, 2025
kmabeeTT added a commit that referenced this issue Feb 5, 2025
kmabeeTT added a commit that referenced this issue Feb 5, 2025
… tests (#17039)

 - This is round 5, builds upon previous 4 merges for LightMetal that brought
   flatbuffer cmake/infra, begin/end APIs, LoadTrace() API, flatbuffer/schema
   serialization/deserialization

 - This adds light-metal Capture support for, and instruments with LIGHT_METAL_TRACE_FUNCTION_CALL()
   and LIGHT_METAL_TRACE_FUNCTION_ENTRY(), many popular (not exhaustive) APIs used by unit tests.
   The latter (formerly TRACE_FUNCTION_ENTRY()) is more recent, used to protect against host APIs recursively
   calling other host APIs (only the top-most level is traced). The two macros are not always called back-to-back.

 - Support Capture/Replay of the following ~14 host APIs

   EnqueueTrace(), ReplayTrace(), ReleaseTrace()
   CreateBuffer(), EnqueueWriteBuffer(), EnqueueReadBuffer(), DeallocateBuffer()
   CreateKernel(), CreateCircularBuffer()
   SetRuntimeArgs(uint32), SetRuntimeArgs(Kernel, RuntimeArgs)
   CreateProgram(), EnqueueProgram()
   Finish()

 - During capture, complex objects like Programs, Kernels, Buffers and CBHandles are assigned
   a unique global_id, and are referred to by that global_id when used by later captured functions

 - When "Metal Trace" is enabled, don't capture EnqueueProgram(); instead
   inject ReplayTrace(), to be used alongside LoadTrace()

 - Can be optionally disabled at compile time using build_metal.sh --disable-light-metal-trace
   which will set C++ define TT_ENABLE_LIGHT_METAL_TRACE=0 (trace functions become NOP)
   Make TT_ENABLE_LIGHT_METAL_TRACE a project CMake option (default ON) and adjust usage

 - New Verif APIs LightMetalCompareToCapture() / LightMetalCompareToGolden().
   Put them in lightmetal_capture_utils.hpp instead of host_api.hpp since they are used
   purely at capture time and only for verification

 - Test fixture runs capture-only right now, will automatically run binary once replay
   support is merged next.
kmabeeTT added a commit that referenced this issue Feb 5, 2025
… tests (#17039)

 - This is round 5, builds upon previous 4 merges for LightMetal that brought
   flatbuffer cmake/infra, begin/end APIs, LoadTrace() API, flatbuffer/schema
   serialization/deserialization

 - This adds Light Metal capture support, instrumenting many popular (not exhaustive) APIs used by unit tests
   with LIGHT_METAL_TRACE_FUNCTION_CALL() and LIGHT_METAL_TRACE_FUNCTION_ENTRY().
   The latter, LIGHT_METAL_TRACE_FUNCTION_ENTRY(), was added more recently. It guards against host APIs
   that recursively call other host APIs, so only the top-most call is traced. The two macros are not
   always called back-to-back.

 - Support capture/replay of the following ~14 host APIs:

   EnqueueTrace(), ReplayTrace(), ReleaseTrace()
   CreateBuffer(), EnqueueWriteBuffer(), EnqueueReadBuffer(), DeallocateBuffer()
   CreateKernel(), CreateCircularBuffer()
   SetRuntimeArgs(uint32), SetRuntimeArgs(Kernel, RuntimeArgs)
   CreateProgram(), EnqueueProgram()
   Finish()

 - During capture, complex objects such as Programs, Kernels, Buffers and CBHandles are assigned
   a unique global_id, and the capture refers to them by that global_id whenever a later function uses them.

 - When "Metal Trace" is enabled, EnqueueProgram() is not captured; instead a ReplayTrace() call is
   injected, to be used alongside LoadTrace() at replay time.

 - Tracing can optionally be disabled at compile time with build_metal.sh --disable-light-metal-trace,
   which sets the C++ define TT_ENABLE_LIGHT_METAL_TRACE=0 (the trace functions become no-ops).
   TT_ENABLE_LIGHT_METAL_TRACE is also a project CMake option (default ON), with its usages adjusted accordingly.

 - New verification APIs LightMetalCompareToCapture() / LightMetalCompareToGolden().
   These live in lightmetal_capture_utils.hpp rather than host_api.hpp, since they are only used
   at capture time for verification and don't belong in the host API.

 - The test fixture currently runs capture only; it will automatically run the captured binary once
   replay support is merged in the next round.
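The global_id scheme can be illustrated with a small registry that hands out an id when an object is created and looks it up when the object is passed to a later API. This is a sketch under assumptions: keying by object address, and the names GlobalIdRegistry, CaptureContext, CapturedCommand and the Buffer stand-in, are all hypothetical and not taken from the tt-metal sources.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a tt-metal Buffer; only its address matters here.
struct Buffer {
    std::size_t size;
};

// Maps live host objects to the global_id recorded in the capture.
class GlobalIdRegistry {
public:
    // Called when an object is created: assign the next free global_id.
    std::uint32_t add(const void* obj) {
        auto [it, inserted] = ids_.emplace(obj, next_id_);
        if (inserted) ++next_id_;
        return it->second;
    }
    // Called when a later API uses the object: look up its global_id.
    std::uint32_t get(const void* obj) const {
        auto it = ids_.find(obj);
        if (it == ids_.end()) throw std::out_of_range("object was never captured");
        return it->second;
    }
    // Called on deallocation so a recycled address is not mistaken for the old object.
    void remove(const void* obj) {
        [[maybe_unused]] std::size_t erased = ids_.erase(obj);
        assert(erased == 1);
    }

private:
    std::unordered_map<const void*, std::uint32_t> ids_;
    std::uint32_t next_id_ = 0;
};

// A captured command refers to objects only by global_id, never by pointer.
struct CapturedCommand {
    std::string api;
    std::vector<std::uint32_t> arg_ids;
};

struct CaptureContext {
    GlobalIdRegistry ids;
    std::vector<CapturedCommand> commands;

    void on_create_buffer(const Buffer& b) { commands.push_back({"CreateBuffer", {ids.add(&b)}}); }
    void on_enqueue_write_buffer(const Buffer& b) {
        commands.push_back({"EnqueueWriteBuffer", {ids.get(&b)}});
    }
    void on_deallocate_buffer(const Buffer& b) {
        commands.push_back({"DeallocateBuffer", {ids.get(&b)}});
        ids.remove(&b);
    }
};
```

Keying by address is only one possible choice; it requires dropping the entry on deallocation, since the allocator may later hand the same address to an unrelated object.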
kmabeeTT added a commit that referenced this issue Feb 5, 2025
…andalone runner (#17039)

 - This is round 6/6 for now. It builds on the previous 5 LightMetal merges from the past week
   and enables end-to-end capture + replay in unit tests now that replay is supported.

 - This brings the replay library/executor for a LightMetalBinary, which replays all the
   commands and traces that a workload captured to the binary. As at capture time, complex
   objects are stored in a map after creation and referenced by global_id by the functions that reuse them.

 - Initial infra for the Light Metal standalone CLI runner, which simply loads an existing
   binary from disk and executes it using the replay library's ExecuteLightMetalBinary().
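On the replay side the map works in the opposite direction: each creation command stores the newly built object under its captured global_id, and later commands fetch it by that id. The sketch below assumes a std::variant of already-decoded commands and a host-memory FakeBuffer in place of device buffers; the command structs and ReplayExecutor are hypothetical, and this is not the actual ExecuteLightMetalBinary() implementation, which also covers programs, kernels and metal traces.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <variant>
#include <vector>

// Hypothetical command records, as they might look after decoding a LightMetalBinary.
struct CreateBufferCmd { std::uint32_t global_id; std::size_t num_words; };
struct WriteBufferCmd { std::uint32_t global_id; std::vector<std::uint32_t> data; };
struct ReadBufferCmd { std::uint32_t global_id; };
struct DeallocateBufferCmd { std::uint32_t global_id; };
using Command = std::variant<CreateBufferCmd, WriteBufferCmd, ReadBufferCmd, DeallocateBufferCmd>;

// Host-memory stand-in for a device buffer.
struct FakeBuffer { std::vector<std::uint32_t> words; };

class ReplayExecutor {
public:
    // Replay commands in captured order, dispatching on the command type.
    void execute(const std::vector<Command>& cmds) {
        for (const Command& c : cmds) {
            std::visit([this](const auto& cmd) { this->run(cmd); }, c);
        }
    }
    const std::vector<std::vector<std::uint32_t>>& reads() const { return reads_; }

private:
    // Creation stores the new object in the map under its captured global_id.
    void run(const CreateBufferCmd& c) {
        buffers_[c.global_id] = FakeBuffer{std::vector<std::uint32_t>(c.num_words, 0)};
    }
    // Later commands reuse objects by looking up their global_id.
    void run(const WriteBufferCmd& c) {
        FakeBuffer& buf = lookup(c.global_id);
        assert(c.data.size() == buf.words.size());
        buf.words = c.data;
    }
    void run(const ReadBufferCmd& c) { reads_.push_back(lookup(c.global_id).words); }
    void run(const DeallocateBufferCmd& c) { buffers_.erase(c.global_id); }

    FakeBuffer& lookup(std::uint32_t id) {
        auto it = buffers_.find(id);
        if (it == buffers_.end()) throw std::runtime_error("unknown global_id " + std::to_string(id));
        return it->second;
    }

    std::unordered_map<std::uint32_t, FakeBuffer> buffers_;
    std::vector<std::vector<std::uint32_t>> reads_;
};
```

Dispatching with std::visit keeps each command's handler separate, and adding a command type without a matching run() overload fails to compile rather than being silently skipped.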
kmabeeTT added a commit that referenced this issue Feb 5, 2025
…andalone runner (#17039)

 - This is round 6/6 for now. It builds on the previous 5 LightMetal merges from the past week
   and enables end-to-end capture + replay in unit tests now that replay is supported.

 - This brings the replay library/executor for a LightMetalBinary, which replays all the
   commands and traces that a workload captured to the binary. As at capture time, complex
   objects are stored in a map after creation and referenced by global_id by the functions that reuse them.

 - Initial infra for the Light Metal standalone CLI runner, which simply loads an existing
   binary from disk and executes it using the replay library's ExecuteLightMetalBinary().

 - Some PR feedback: update asserts, remove default cases, add more comments, etc.
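The standalone runner essentially reads the binary from disk and hands the bytes to the replay library. Below is a minimal sketch, assuming a single command-line argument and a caller-supplied Executor callback in place of the real ExecuteLightMetalBinary(), whose exact signature is not shown in this thread; run_cli, read_binary_file and Executor are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <fstream>
#include <functional>
#include <iterator>
#include <stdexcept>
#include <string>
#include <vector>

// Read a whole file into memory as raw bytes (the serialized binary).
std::vector<std::uint8_t> read_binary_file(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) throw std::runtime_error("cannot open " + path);
    return std::vector<std::uint8_t>(std::istreambuf_iterator<char>(in),
                                     std::istreambuf_iterator<char>());
}

// Hypothetical hook standing in for the replay library's executor entry point.
using Executor = std::function<bool(const std::vector<std::uint8_t>&)>;

// Minimal runner: expects exactly one argument (the binary path) and returns a process exit code.
int run_cli(int argc, const char* const argv[], const Executor& execute) {
    assert(execute);
    if (argc != 2) {
        std::fprintf(stderr, "usage: %s <path/to/binary>\n", argc > 0 ? argv[0] : "runner");
        return 1;
    }
    try {
        const std::vector<std::uint8_t> blob = read_binary_file(argv[1]);
        return execute(blob) ? 0 : 1;
    } catch (const std::exception& e) {
        std::fprintf(stderr, "error: %s\n", e.what());
        return 1;
    }
}
```

Keeping the executor behind a callback keeps the runner trivial and lets the same loading path be exercised in tests without a device.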
@kmabeeTT
Contributor Author

kmabeeTT commented Feb 5, 2025

This is now closed. These initial Light Metal capture/replay changes were merged across 6 PRs (see the list in the comment above) over the last 10 days, after incorporating a lot of feedback, for around 3600 lines of additions as reported by git. Special thanks to Oleg for the thorough review. More work will be added to the parent ticket next; it is being scoped out now, and some of it has already begun.
