
[Exegesis] Add support to serialize/deserialize object files into benchmarks #121993

Open
mshockwave wants to merge 4 commits into main from patch/exegesis/serialize-benchmarks

Conversation

mshockwave
Member

This patch adds support for serializing the assembled object files into the benchmark results, so that they can be deserialized later to run the measurements. This is useful when the overhead of end-to-end execution (snippet generation + benchmark measurement) is too high and we want to split the process into two stages.
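As a rough sketch of the intended two-stage workflow (adapted from the new serialize-obj-file.test; the benchmarks file name is arbitrary, and the test-only flags such as --dry-run-measurement, --use-dummy-perf-counters, and --dump-object-to-disk are omitted):

# Stage 1: generate and assemble the snippet; the serialized object file is
# attached to the resulting benchmark record.
llvm-exegesis -mtriple=riscv64 -mcpu=sifive-p470 --opcode-name=SH3ADD --mode=latency \
    --benchmark-phase=assemble-measured-code --benchmarks-file=benchmarks.yaml

# Stage 2: deserialize the benchmarks file and resume with the measurement phase.
llvm-exegesis -mtriple=riscv64 -mcpu=sifive-p470 --mode=latency \
    --run-measurement=benchmarks.yaml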

The object file is compressed and serialized into a base64 string. The compression ratio is excellent because the file contains lots of (nearly) identical instructions.
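To illustrate, a serialized benchmark record would carry an entry roughly like the one below. The field names (object_file, compression, original_size, compressed_bytes) come from the YAML mapping added in this patch; the size value and payload shown here are placeholders:

object_file:
  compression:      zstd
  original_size:    9216
  compressed_bytes: '<base64 string of the zstd-compressed object file>'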

Currently this patch can only resume right before the measure phase. It also does not support repetition modes that require more than one snippet (i.e. min and middle-half-loop/duplicate).


This PR stacks on top of #121991

@llvmbot
Member

llvmbot commented Jan 7, 2025

@llvm/pr-subscribers-llvm-support

@llvm/pr-subscribers-llvm-binary-utilities

Author: Min-Yih Hsu (mshockwave)

Changes

This patch adds support for serializing the assembled object files into the benchmark results, so that they can be deserialized later to run the measurements. This is useful when the overhead of end-to-end execution (snippet generation + benchmark measurement) is too high and we want to split the process into two stages.

The object file is compressed and serialized into a base64 string. The compression ratio is excellent because the file contains lots of (nearly) identical instructions.

Currently this patch can only resume right before the measure phase. It also does not support repetition modes that require more than one snippet (i.e. min and middle-half-loop/duplicate).


This PR stacks on top of #121991


Patch is 31.41 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/121993.diff

8 Files Affected:

  • (modified) llvm/docs/CommandGuide/llvm-exegesis.rst (+15-1)
  • (added) llvm/test/tools/llvm-exegesis/RISCV/serialize-obj-file.test (+33)
  • (added) llvm/test/tools/llvm-exegesis/dry-run-measurement.test (+11)
  • (modified) llvm/tools/llvm-exegesis/lib/BenchmarkResult.cpp (+93-2)
  • (modified) llvm/tools/llvm-exegesis/lib/BenchmarkResult.h (+20)
  • (modified) llvm/tools/llvm-exegesis/lib/BenchmarkRunner.cpp (+62-5)
  • (modified) llvm/tools/llvm-exegesis/lib/BenchmarkRunner.h (+9-2)
  • (modified) llvm/tools/llvm-exegesis/llvm-exegesis.cpp (+159-97)
diff --git a/llvm/docs/CommandGuide/llvm-exegesis.rst b/llvm/docs/CommandGuide/llvm-exegesis.rst
index 8266d891a5e6b1..c3580cdecab7b9 100644
--- a/llvm/docs/CommandGuide/llvm-exegesis.rst
+++ b/llvm/docs/CommandGuide/llvm-exegesis.rst
@@ -299,9 +299,18 @@ OPTIONS
   However, it is possible to stop at some stage before measuring. Choices are:
   * ``prepare-snippet``: Only generate the minimal instruction sequence.
   * ``prepare-and-assemble-snippet``: Same as ``prepare-snippet``, but also dumps an excerpt of the sequence (hex encoded).
-  * ``assemble-measured-code``: Same as ``prepare-and-assemble-snippet``. but also creates the full sequence that can be dumped to a file using ``--dump-object-to-disk``.
+  * ``assemble-measured-code``: Same as ``prepare-and-assemble-snippet``. but
+    also creates the full sequence that can be dumped to a file using ``--dump-object-to-disk``.
+    If either zlib or zstd is available and we're using either duplicate or
+    loop repetition mode, this phase generates benchmarks with a serialized
+    snippet object file attached to it.
   * ``measure``: Same as ``assemble-measured-code``, but also runs the measurement.
 
+.. option:: --run-measurement=<benchmarks file>
+
+  Given a benchmarks file generated after the ``assembly-measured-code`` phase,
+  resume the measurement phase from it.
+
 .. option:: --x86-lbr-sample-period=<nBranches/sample>
 
   Specify the LBR sampling period - how many branches before we take a sample.
@@ -449,6 +458,11 @@ OPTIONS
  crash when hardware performance counters are unavailable and for
  debugging :program:`llvm-exegesis` itself.
 
+.. option:: --dry-run-measurement
+  If set, llvm-exegesis runs everything except the actual snippet execution.
+  This is useful if we want to test some part of the code without actually
+  running on native platforms.
+
 .. option:: --execution-mode=[inprocess,subprocess]
 
   This option specifies what execution mode to use. The `inprocess` execution
diff --git a/llvm/test/tools/llvm-exegesis/RISCV/serialize-obj-file.test b/llvm/test/tools/llvm-exegesis/RISCV/serialize-obj-file.test
new file mode 100644
index 00000000000000..befd16699bef1a
--- /dev/null
+++ b/llvm/test/tools/llvm-exegesis/RISCV/serialize-obj-file.test
@@ -0,0 +1,33 @@
+# RUN: llvm-exegesis -mtriple=riscv64 -mcpu=sifive-p470 --opcode-name=SH3ADD --benchmark-phase=assemble-measured-code --mode=latency --benchmarks-file=%t.yaml
+# RUN: FileCheck --input-file=%t.yaml %s --check-prefixes=CHECK,SERIALIZE
+# RUN: llvm-exegesis -mtriple=riscv64 -mcpu=sifive-p470 --run-measurement=%t.yaml --mode=latency --dry-run-measurement --use-dummy-perf-counters \
+# RUN:    --dump-object-to-disk=%t.o | FileCheck %s --check-prefixes=CHECK,DESERIALIZE
+# RUN: llvm-objdump -d %t.o | FileCheck %s --check-prefix=OBJDUMP
+# RUN: llvm-exegesis -mtriple=riscv64 -mcpu=sifive-p470 --opcode-name=SH3ADD --mode=latency --dry-run-measurement --use-dummy-perf-counters | \
+# RUN:    FileCheck %s --check-prefix=NO-SERIALIZE
+# RUN: llvm-exegesis -mtriple=riscv64 -mcpu=sifive-p470 --opcode-name=SH3ADD --mode=latency --benchmark-phase=assemble-measured-code --repetition-mode=min | \
+# RUN:    FileCheck %s --check-prefix=NO-SERIALIZE
+# RUN: llvm-exegesis -mtriple=riscv64 -mcpu=sifive-p470 --opcode-name=SH3ADD --mode=latency --benchmark-phase=assemble-measured-code --repetition-mode=middle-half-loop | \
+# RUN:    FileCheck %s --check-prefix=NO-SERIALIZE
+# RUN: llvm-exegesis -mtriple=riscv64 -mcpu=sifive-p470 --opcode-name=SH3ADD --mode=latency --benchmark-phase=assemble-measured-code --repetition-mode=middle-half-duplicate | \
+# RUN:    FileCheck %s --check-prefix=NO-SERIALIZE
+# REQUIRES: zlib || zstd
+
+# A round-trip test for serialize/deserialize benchmarks.
+
+# CHECK: mode: latency
+# CHECK:  instructions:
+# CHECK-NEXT: - 'SH3ADD X{{.*}} X{{.*}} X{{.*}}'
+# CHECK: cpu_name:        sifive-p470
+# CHECK-NEXT: llvm_triple:     riscv64
+# CHECK-NEXT: min_instructions: 10000
+# CHECK-NEXT: measurements:    []
+# SERIALIZE: error: actual measurements skipped.
+# DESERIALIZE: error:           ''
+# CHECK: info:            Repeating a single explicitly serial instruction
+
+# OBJDUMP: sh3add
+
+# Negative tests: we shouldn't serialize object files in some scenarios.
+
+# NO-SERIALIZE-NOT: object_file:
diff --git a/llvm/test/tools/llvm-exegesis/dry-run-measurement.test b/llvm/test/tools/llvm-exegesis/dry-run-measurement.test
new file mode 100644
index 00000000000000..82857e7998b5e6
--- /dev/null
+++ b/llvm/test/tools/llvm-exegesis/dry-run-measurement.test
@@ -0,0 +1,11 @@
+# RUN: llvm-exegesis --mtriple=riscv64 --mcpu=sifive-p470 --mode=latency --opcode-name=ADD --use-dummy-perf-counters --dry-run-measurement | FileCheck %s
+# REQUIRES: riscv-registered-target
+
+# This test makes sure that llvm-exegesis doesn't execute "cross-compiled" snippets in the presence of
+# --dry-run-measurement. RISC-V was chosen simply because most of the time we run tests on X86 machines.
+
+# Should not contain misleading results.
+# CHECK: measurements:    []
+
+# Should not contain error messages like "snippet crashed while running: Segmentation fault".
+# CHECK: error:           ''
diff --git a/llvm/tools/llvm-exegesis/lib/BenchmarkResult.cpp b/llvm/tools/llvm-exegesis/lib/BenchmarkResult.cpp
index 84dc23b343c6c0..eff5a6d547cbda 100644
--- a/llvm/tools/llvm-exegesis/lib/BenchmarkResult.cpp
+++ b/llvm/tools/llvm-exegesis/lib/BenchmarkResult.cpp
@@ -15,10 +15,13 @@
 #include "llvm/ADT/StringRef.h"
 #include "llvm/ADT/bit.h"
 #include "llvm/ObjectYAML/YAML.h"
+#include "llvm/Support/Base64.h"
+#include "llvm/Support/CommandLine.h"
 #include "llvm/Support/Errc.h"
 #include "llvm/Support/FileOutputBuffer.h"
 #include "llvm/Support/FileSystem.h"
 #include "llvm/Support/Format.h"
+#include "llvm/Support/Timer.h"
 #include "llvm/Support/raw_ostream.h"
 
 static constexpr const char kIntegerPrefix[] = "i_0x";
@@ -27,6 +30,12 @@ static constexpr const char kInvalidOperand[] = "INVALID";
 
 namespace llvm {
 
+static cl::opt<compression::Format> ForceObjectFileCompressionFormat(
+    "exegesis-force-obj-compress-format", cl::Hidden,
+    cl::desc("Force to use this compression format for object files."),
+    cl::values(clEnumValN(compression::Format::Zstd, "zstd", "Using Zstandard"),
+               clEnumValN(compression::Format::Zlib, "zlib", "Using LibZ")));
+
 namespace {
 
 // A mutable struct holding an LLVMState that can be passed through the
@@ -278,6 +287,13 @@ template <> struct ScalarTraits<exegesis::RegisterValue> {
   static const bool flow = true;
 };
 
+template <> struct ScalarEnumerationTraits<compression::Format> {
+  static void enumeration(IO &Io, compression::Format &Format) {
+    Io.enumCase(Format, "zstd", compression::Format::Zstd);
+    Io.enumCase(Format, "zlib", compression::Format::Zlib);
+  }
+};
+
 template <> struct MappingContextTraits<exegesis::BenchmarkKey, YamlContext> {
   static void mapping(IO &Io, exegesis::BenchmarkKey &Obj,
                       YamlContext &Context) {
@@ -288,6 +304,33 @@ template <> struct MappingContextTraits<exegesis::BenchmarkKey, YamlContext> {
   }
 };
 
+template <> struct MappingTraits<exegesis::Benchmark::ObjectFile> {
+  struct NormalizedBase64Binary {
+    std::string Base64Str;
+
+    NormalizedBase64Binary(IO &) {}
+    NormalizedBase64Binary(IO &, const std::vector<uint8_t> &Data)
+        : Base64Str(llvm::encodeBase64(Data)) {}
+
+    std::vector<uint8_t> denormalize(IO &) {
+      std::vector<char> Buffer;
+      if (Error E = llvm::decodeBase64(Base64Str, Buffer))
+        report_fatal_error(std::move(E));
+
+      StringRef Data(Buffer.data(), Buffer.size());
+      return std::vector<uint8_t>(Data.bytes_begin(), Data.bytes_end());
+    }
+  };
+
+  static void mapping(IO &Io, exegesis::Benchmark::ObjectFile &Obj) {
+    Io.mapRequired("compression", Obj.CompressionFormat);
+    Io.mapRequired("original_size", Obj.UncompressedSize);
+    MappingNormalization<NormalizedBase64Binary, std::vector<uint8_t>>
+        ObjFileString(Io, Obj.CompressedBytes);
+    Io.mapRequired("compressed_bytes", ObjFileString->Base64Str);
+  }
+};
+
 template <> struct MappingContextTraits<exegesis::Benchmark, YamlContext> {
   struct NormalizedBinary {
     NormalizedBinary(IO &io) {}
@@ -325,9 +368,11 @@ template <> struct MappingContextTraits<exegesis::Benchmark, YamlContext> {
     Io.mapRequired("error", Obj.Error);
     Io.mapOptional("info", Obj.Info);
     // AssembledSnippet
-    MappingNormalization<NormalizedBinary, std::vector<uint8_t>> BinaryString(
+    MappingNormalization<NormalizedBinary, std::vector<uint8_t>> SnippetString(
         Io, Obj.AssembledSnippet);
-    Io.mapOptional("assembled_snippet", BinaryString->Binary);
+    Io.mapOptional("assembled_snippet", SnippetString->Binary);
+    // ObjectFile
+    Io.mapOptional("object_file", Obj.ObjFile);
   }
 };
 
@@ -364,6 +409,52 @@ Benchmark::readTriplesAndCpusFromYamls(MemoryBufferRef Buffer) {
   return Result;
 }
 
+Error Benchmark::setObjectFile(StringRef RawBytes) {
+  SmallVector<uint8_t> CompressedBytes;
+  llvm::compression::Format CompressionFormat;
+
+  auto isFormatAvailable = [](llvm::compression::Format F) -> bool {
+    switch (F) {
+    case compression::Format::Zstd:
+      return compression::zstd::isAvailable();
+    case compression::Format::Zlib:
+      return compression::zlib::isAvailable();
+    }
+  };
+  if (ForceObjectFileCompressionFormat.getNumOccurrences() > 0) {
+    CompressionFormat = ForceObjectFileCompressionFormat;
+    if (!isFormatAvailable(CompressionFormat))
+      return make_error<StringError>(
+          "The designated compression format is not available.",
+          inconvertibleErrorCode());
+  } else if (isFormatAvailable(compression::Format::Zstd)) {
+    // Try newer compression algorithm first.
+    CompressionFormat = compression::Format::Zstd;
+  } else if (isFormatAvailable(compression::Format::Zlib)) {
+    CompressionFormat = compression::Format::Zlib;
+  } else {
+    return make_error<StringError>(
+        "None of the compression methods is available.",
+        inconvertibleErrorCode());
+  }
+
+  switch (CompressionFormat) {
+  case compression::Format::Zstd:
+    compression::zstd::compress({RawBytes.bytes_begin(), RawBytes.bytes_end()},
+                                CompressedBytes);
+    break;
+  case compression::Format::Zlib:
+    compression::zlib::compress({RawBytes.bytes_begin(), RawBytes.bytes_end()},
+                                CompressedBytes);
+    break;
+  }
+
+  ObjFile = {CompressionFormat,
+             RawBytes.size(),
+             {CompressedBytes.begin(), CompressedBytes.end()}};
+  return Error::success();
+}
+
 Expected<Benchmark> Benchmark::readYaml(const LLVMState &State,
                                         MemoryBufferRef Buffer) {
   yaml::Input Yin(Buffer);
diff --git a/llvm/tools/llvm-exegesis/lib/BenchmarkResult.h b/llvm/tools/llvm-exegesis/lib/BenchmarkResult.h
index 3c09a8380146e5..a5217566204a14 100644
--- a/llvm/tools/llvm-exegesis/lib/BenchmarkResult.h
+++ b/llvm/tools/llvm-exegesis/lib/BenchmarkResult.h
@@ -21,6 +21,7 @@
 #include "llvm/ADT/StringRef.h"
 #include "llvm/MC/MCInst.h"
 #include "llvm/MC/MCInstBuilder.h"
+#include "llvm/Support/Compression.h"
 #include "llvm/Support/YAMLTraits.h"
 #include <limits>
 #include <set>
@@ -76,6 +77,11 @@ struct BenchmarkKey {
   uintptr_t SnippetAddress = 0;
   // The register that should be used to hold the loop counter.
   unsigned LoopRegister;
+
+  bool operator==(const BenchmarkKey &RHS) const {
+    return Config == RHS.Config &&
+           Instructions[0].getOpcode() == RHS.Instructions[0].getOpcode();
+  }
 };
 
 struct BenchmarkMeasure {
@@ -122,6 +128,16 @@ struct Benchmark {
   std::string Error;
   std::string Info;
   std::vector<uint8_t> AssembledSnippet;
+
+  struct ObjectFile {
+    llvm::compression::Format CompressionFormat;
+    size_t UncompressedSize = 0;
+    std::vector<uint8_t> CompressedBytes;
+
+    bool isValid() const { return UncompressedSize && CompressedBytes.size(); }
+  };
+  std::optional<ObjectFile> ObjFile;
+
   // How to aggregate measurements.
   enum ResultAggregationModeE { Min, Max, Mean, MinVariance };
 
@@ -132,6 +148,10 @@ struct Benchmark {
   Benchmark &operator=(const Benchmark &) = delete;
   Benchmark &operator=(Benchmark &&) = delete;
 
+  // Compress raw object file bytes and assign the result and compression type
+  // to CompressedObjectFile and ObjFileCompression, respectively.
+  class Error setObjectFile(StringRef RawBytes);
+
   // Read functions.
   static Expected<Benchmark> readYaml(const LLVMState &State,
                                                  MemoryBufferRef Buffer);
diff --git a/llvm/tools/llvm-exegesis/lib/BenchmarkRunner.cpp b/llvm/tools/llvm-exegesis/lib/BenchmarkRunner.cpp
index a7771b99e97b1a..3bca6ed13d8fc8 100644
--- a/llvm/tools/llvm-exegesis/lib/BenchmarkRunner.cpp
+++ b/llvm/tools/llvm-exegesis/lib/BenchmarkRunner.cpp
@@ -53,6 +53,12 @@
 namespace llvm {
 namespace exegesis {
 
+static cl::opt<bool>
+    DryRunMeasurement("dry-run-measurement",
+                      cl::desc("Run every steps in the measurement phase "
+                               "except executing the snippet."),
+                      cl::init(false), cl::Hidden);
+
 BenchmarkRunner::BenchmarkRunner(const LLVMState &State, Benchmark::ModeE Mode,
                                  BenchmarkPhaseSelectorE BenchmarkPhaseSelector,
                                  ExecutionModeE ExecutionMode,
@@ -140,13 +146,21 @@ class InProcessFunctionExecutorImpl : public BenchmarkRunner::FunctionExecutor {
     Scratch->clear();
     {
       auto PS = ET.withSavedState();
+      // We can't directly capture DryRunMeasurement in the lambda below.
+      bool DryRun = DryRunMeasurement;
       CrashRecoveryContext CRC;
       CrashRecoveryContext::Enable();
-      const bool Crashed = !CRC.RunSafely([this, Counter, ScratchPtr]() {
-        Counter->start();
-        this->Function(ScratchPtr);
-        Counter->stop();
-      });
+      const bool Crashed =
+          !CRC.RunSafely([this, Counter, ScratchPtr, DryRun]() {
+            if (DryRun) {
+              Counter->start();
+              Counter->stop();
+            } else {
+              Counter->start();
+              this->Function(ScratchPtr);
+              Counter->stop();
+            }
+          });
       CrashRecoveryContext::Disable();
       PS.reset();
       if (Crashed) {
@@ -610,6 +624,7 @@ Expected<SmallString<0>> BenchmarkRunner::assembleSnippet(
 Expected<BenchmarkRunner::RunnableConfiguration>
 BenchmarkRunner::getRunnableConfiguration(
     const BenchmarkCode &BC, unsigned MinInstructions, unsigned LoopBodySize,
+    Benchmark::RepetitionModeE RepetitionMode,
     const SnippetRepetitor &Repetitor) const {
   RunnableConfiguration RC;
 
@@ -654,12 +669,54 @@ BenchmarkRunner::getRunnableConfiguration(
                         LoopBodySize, GenerateMemoryInstructions);
     if (Error E = Snippet.takeError())
       return std::move(E);
+    // There is no need to serialize/deserialize the object file if we're
+    // simply running end-to-end measurements.
+    // Same goes for any repetition mode that requires more than a single
+    // snippet.
+    if (BenchmarkPhaseSelector < BenchmarkPhaseSelectorE::Measure &&
+        (RepetitionMode == Benchmark::Loop ||
+         RepetitionMode == Benchmark::Duplicate)) {
+      if (Error E = BenchmarkResult.setObjectFile(*Snippet))
+        return std::move(E);
+    }
     RC.ObjectFile = getObjectFromBuffer(*Snippet);
   }
 
   return std::move(RC);
 }
 
+Expected<BenchmarkRunner::RunnableConfiguration>
+BenchmarkRunner::getRunnableConfiguration(Benchmark &&B) const {
+  assert(B.ObjFile.has_value() && B.ObjFile->isValid() &&
+         "No serialized obejct file is attached?");
+  const Benchmark::ObjectFile &ObjFile = *B.ObjFile;
+  SmallVector<uint8_t> DecompressedObjFile;
+  switch (ObjFile.CompressionFormat) {
+  case compression::Format::Zstd:
+    if (!compression::zstd::isAvailable())
+      return make_error<StringError>("zstd is not available for decompression.",
+                                     inconvertibleErrorCode());
+    if (Error E = compression::zstd::decompress(ObjFile.CompressedBytes,
+                                                DecompressedObjFile,
+                                                ObjFile.UncompressedSize))
+      return std::move(E);
+    break;
+  case compression::Format::Zlib:
+    if (!compression::zlib::isAvailable())
+      return make_error<StringError>("zlib is not available for decompression.",
+                                     inconvertibleErrorCode());
+    if (Error E = compression::zlib::decompress(ObjFile.CompressedBytes,
+                                                DecompressedObjFile,
+                                                ObjFile.UncompressedSize))
+      return std::move(E);
+    break;
+  }
+
+  StringRef Buffer(reinterpret_cast<const char *>(DecompressedObjFile.begin()),
+                   DecompressedObjFile.size());
+  return RunnableConfiguration{std::move(B), getObjectFromBuffer(Buffer)};
+}
+
 Expected<std::unique_ptr<BenchmarkRunner::FunctionExecutor>>
 BenchmarkRunner::createFunctionExecutor(
     object::OwningBinary<object::ObjectFile> ObjectFile,
diff --git a/llvm/tools/llvm-exegesis/lib/BenchmarkRunner.h b/llvm/tools/llvm-exegesis/lib/BenchmarkRunner.h
index e688b814d1c83d..ef9446bdd5bbe8 100644
--- a/llvm/tools/llvm-exegesis/lib/BenchmarkRunner.h
+++ b/llvm/tools/llvm-exegesis/lib/BenchmarkRunner.h
@@ -54,18 +54,25 @@ class BenchmarkRunner {
     RunnableConfiguration &operator=(RunnableConfiguration &&) = delete;
     RunnableConfiguration &operator=(const RunnableConfiguration &) = delete;
 
+    Benchmark BenchmarkResult;
+    object::OwningBinary<object::ObjectFile> ObjectFile;
+
   private:
     RunnableConfiguration() = default;
 
-    Benchmark BenchmarkResult;
-    object::OwningBinary<object::ObjectFile> ObjectFile;
+    RunnableConfiguration(Benchmark &&B,
+                          object::OwningBinary<object::ObjectFile> &&OF)
+        : BenchmarkResult(std::move(B)), ObjectFile(std::move(OF)) {}
   };
 
   Expected<RunnableConfiguration>
   getRunnableConfiguration(const BenchmarkCode &Configuration,
                            unsigned MinInstructions, unsigned LoopUnrollFactor,
+                           Benchmark::RepetitionModeE RepetitionMode,
                            const SnippetRepetitor &Repetitor) const;
 
+  Expected<RunnableConfiguration> getRunnableConfiguration(Benchmark &&B) const;
+
   std::pair<Error, Benchmark>
   runConfiguration(RunnableConfiguration &&RC,
                    const std::optional<StringRef> &DumpFile,
diff --git a/llvm/tools/llvm-exegesis/llvm-exegesis.cpp b/llvm/tools/llvm-exegesis/llvm-exegesis.cpp
index fa37e05956be8c..a21f3bdb5fba5f 100644
--- a/llvm/tools/llvm-exegesis/llvm-exegesis.cpp
+++ b/llvm/tools/llvm-exegesis/llvm-exegesis.cpp
@@ -114,8 +114,7 @@ static cl::opt<bool> BenchmarkMeasurementsPrintProgress(
 
 static cl::opt<BenchmarkPhaseSelectorE> BenchmarkPhaseSelector(
     "benchmark-phase",
-    cl::desc(
-        "it is possible to stop the benchmarking process after some phase"),
+    cl::desc("Stop the benchmarking process after some phase"),
     cl::cat(BenchmarkOptions),
     cl::values(
         clEnumValN(BenchmarkPhaseSelectorE::PrepareSnippet, "prepare-snippet",
@@ -135,6 +134,13 @@ static cl::opt<BenchmarkPhaseSelectorE> BenchmarkPhaseSelector(
             "(default)")),
     cl::init(BenchmarkPhaseSelectorE::Measure));
 
+static cl::opt<std::string> RunMeasurement(
+    "run-measurement",
+    cl::desc(
+        "Run measurement phase with a benchmarks file generated previously"),
+    cl::cat(BenchmarkOptions), cl::value_desc("<benchmarks file>"),
+    cl::init(""));
+
 static cl::opt<bool>
     UseDummyPerfCounters("use-dummy-perf-counters",
                          cl::desc("Do not read real performance counters, use "
@@ -397,11 +403,55 @@ generateSnippets(const LLVMState &State, unsigned Opcode,
   return Benchmarks;
 }
 
-static void runBenchmarkConfigurations(
-    const LLVMState &State, ArrayRef<BenchmarkCode> Configurations,
+static void deserializeRunnableConfigurations(
+    std::vector<Benchmark> &Benchmarks, const BenchmarkRunner &Runner,
+    std::ve...
[truncated]

@llvmbot
Member

llvmbot commented Jan 7, 2025

@llvm/pr-subscribers-tools-llvm-exegesis


@boomanaiden154 (Contributor) left a comment

Some initial comments. I still need to go through the changes in llvm-exegesis.cpp.

@@ -278,6 +287,13 @@ template <> struct ScalarTraits<exegesis::RegisterValue> {
static const bool flow = true;
};

template <> struct ScalarEnumerationTraits<compression::Format> {
Contributor

I don't think we're supposed to create these for types that aren't owned by us (or at least that's what I was told when I attempted something similar). This might need to go where the type is defined.

Member Author

Fixed.

// Same goes for any repetition mode that requires more than a single
// snippet.
if (BenchmarkPhaseSelector < BenchmarkPhaseSelectorE::Measure &&
(RepetitionMode == Benchmark::Loop ||
Contributor

Why are you only selecting for these two repetition modes here?

Is it significantly more complicated to support the other repetition modes?

Member Author

mshockwave commented Jan 13, 2025

Why are you only selecting for these two repetition modes here?

loop and duplicate are the two that contain only a single snippet / object file per benchmark. Others need at least two different snippets / object files.

Is it significantly more complicated to support the other repetition modes?

Sort of. Take --repetition-mode=min, which requires two snippets, as an example; there are two ways to do it:

  1. Store both the loop and duplicate snippets in the same benchmark YAML record. In this case we also have to store the repetition mode, but more importantly, we either have to "re-split" each benchmark record into separate benchmark instances, or teach both the benchmark runner and the result aggregator about this.
  2. Store the loop and duplicate snippets in separate benchmark YAML records. In this case we might need to figure out which of these benchmarks belong to the same "group" (same opcode and same configuration).

Option (2) would probably be easier, but even that would, I think, make this patch too big, so I'd rather add this feature incrementally.


# Negative tests: we shouldn't serialize object files in some scenarios.

# NO-SERIALIZE-NOT: object_file:
Contributor

Should we be throwing an error here instead of just not emitting the object file snippet? This seems like it could be a bit confusing.

Member Author

mshockwave commented Jan 13, 2025

I now throw an error when users try to use --serialize-benchmarks with unsupported repetition modes.

@@ -0,0 +1,33 @@
# RUN: llvm-exegesis -mtriple=riscv64 -mcpu=sifive-p470 --opcode-name=SH3ADD --benchmark-phase=assemble-measured-code --mode=latency --benchmarks-file=%t.yaml
# RUN: FileCheck --input-file=%t.yaml %s --check-prefixes=CHECK,SERIALIZE
Contributor

Can you add some more comments in the test on what exactly you're testing for in the different sections?

Member Author

Done

@@ -0,0 +1,33 @@
# RUN: llvm-exegesis -mtriple=riscv64 -mcpu=sifive-p470 --opcode-name=SH3ADD --benchmark-phase=assemble-measured-code --mode=latency --benchmarks-file=%t.yaml
Contributor

I'm thinking we might want to add a new flag that controls the emission of object_file? I would prefer not to have it when messing around with exegesis.

I've been wanting to do the same thing with the assembled_snippet output for a while given I don't find it particularly useful and it reduces performance (another round trip through the MCJIT/assembler), but that's a separate thing.

Member Author

I'm thinking we might want to add a new flag that controls the emission of object_file? I would prefer not to have it when messing around with exegesis.

Previously I turned the serialization off if the user was running an end-to-end measurement, but I agree that we should make the serialization optional. A new flag has been added.
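Assuming the flag behaves as the cl::opt quoted later in this thread suggests, the serializing stage would presumably be invoked with something like this (hypothetical command line; the file name is arbitrary):

llvm-exegesis -mtriple=riscv64 -mcpu=sifive-p470 --opcode-name=SH3ADD --mode=latency \
    --benchmark-phase=assemble-measured-code --serialize-benchmarks --benchmarks-file=benchmarks.yaml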

I've been wanting to do the same thing with the assembled_snippet output for a while given I don't find it particularly useful

It's useful for checking whether I generated the desired shape of snippets when they show up in the inconsistency reports. But maybe that's just for RISC-V, where ill-formed snippets are more common.

mshockwave force-pushed the patch/exegesis/serialize-benchmarks branch from 74da08e to 619a4c3 on January 13, 2025 at 22:38
# RUN: --dump-object-to-disk=%t.o | FileCheck %s --check-prefixes=CHECK,DESERIALIZE
# RUN: llvm-objdump -d %t.o | FileCheck %s --check-prefix=OBJDUMP

# We should not serialie benchmarks by default.
Collaborator

serialize*

CompressionFormat = compression::Format::Zlib;
} else {
return make_error<StringError>(
"None of the compression methods is available.",
Collaborator

is -> are

SerializeBenchmarks("serialize-benchmarks",
cl::desc("Generate fully-serialized benchmarks "
"that can later be deserialized and "
"resuming the measurement."),
Collaborator

resuming -> resume

Configurations = ExitOnErr(readSnippets(State, SnippetsFile));
for (const auto &Configuration : Configurations) {
if (ExecutionMode != BenchmarkRunner::ExecutionModeE::SubProcess &&
(Configuration.Key.MemoryMappings.size() != 0 ||
Collaborator

!empty()?
