
Add ROCm SDXL testing and benchmarking. #17183

Merged
saienduri merged 11 commits into main from sdxl-rocm on Apr 26, 2024
Conversation

@saienduri (Collaborator) commented:

ci-exactly: build_packages, regression_test_cpu, regression_test_amdgpu_vulkan, regression_test_amdgpu_rocm, regression_test_nvidiagpu_vulkan, regression_test_nvidiagpu_cuda

@ScottTodd (Member) left a comment:

For future work in this repo, please either use a username-prefixed branch name or a fork of the repo. Unprefixed branch names in the main repo are (should be) reserved for long-lived, shared branches.

Also nit: please capitalize the PR title and use sentence style like Add ROCm SDXL testing and benchmarking.

Comment on lines 78 to 79:

    linux_x86_64_rocm_models:
      name: Linux (x86_64) AMDGPU Rocm Model Testing + Benchmark
@ScottTodd (Member):

Put MI250 in these names.

Consider these names as they appear in checks and on the 'run logs' page [screenshots omitted]:

    PkgCI / Regression Test AMDGPU-Vulkan / Linux (x86_64) Model Testing
    PkgCI / Regression Test AMDGPU-ROCm / Linux (x86_64) AMDGPU Rocm Model Testing + Benchmark

The names are long and have some redundant information.

How about Linux MI250 - Models here, for PkgCI / Regression Test AMDGPU-ROCm / Linux MI250 - Models? Or just MI250 - Models.
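For illustration only, that rename would look roughly like this in the workflow file (job key kept from the snippet above; only the display name changes):

    linux_x86_64_rocm_models:
      name: Linux MI250 - Models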

Comment on lines 2 to 5:

    "config_name": "gpu_rocm",
    "iree_compile_flags" : [
        "--iree-hal-target-backends=rocm",
        "--iree-rocm-target-chip=gfx90a",
@ScottTodd (Member):

ROCm compilation is tied to a specific chip, so we should call out that chip (either by identifier like gfx90a, or by name like MI250). That should be reflected in both the config_name and the file name. If we run model tests on w7900, that will use a different chip (and thus config file).
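As a sketch of that suggestion (key names taken from the snippet above; the gfx90a-suffixed file name follows the naming scheme proposed later in this thread), the config might become something like sdxl_scheduled_unet_gpu_rocm_gfx90a.json:

    {
      "config_name": "gpu_rocm_gfx90a",
      "iree_compile_flags": [
        "--iree-hal-target-backends=rocm",
        "--iree-rocm-target-chip=gfx90a"
      ]
    }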

@saienduri (Collaborator, Author):

Makes sense. Multiple machines have the same chip, so we'll go with the chip identifier.

Comment on lines 6 to 7:

    "--iree-opt-const-eval=false",
    "--iree-codegen-transform-dialect-library=attention_and_matmul_spec.mlir"
@ScottTodd (Member):

This might be a good time to test both default flags and some "additional flags" (can choose a better name for that).

If default flags fail compilation, that's good to test with XFAIL.
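As a rough sketch of the XFAIL idea (the function and fixture names here are hypothetical, not the test suite's actual API):

    import pytest

    # Hypothetical example: compile with only the default flags, but mark the
    # known failure as expected so CI stays green while the gap is tracked.
    @pytest.mark.xfail(
        reason="SDXL does not compile with default ROCm flags yet",
        strict=True,  # becomes a hard failure once compilation starts passing
    )
    def test_sdxl_compile_default_flags(compile_sdxl_with_flags):
        compile_sdxl_with_flags([])  # no extra flags beyond the defaults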

Comment on lines +126 to +134:

    -n 4 \
    -rpfE \
    -k real_weights \
    --no-skip-tests-missing-files \
    --capture=no \
    --timeout=1200 \
    --retries 2 \
    --retry-delay 5 \
    --durations=0 \
@ScottTodd (Member):

As we're repeating flags in these workflows, we could move some of them to config files: https://docs.pytest.org/en/stable/reference/customize.html
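For example, a minimal pytest.ini could absorb the invocation-independent flags, leaving only test selection (like -k real_weights) on the command line; this assumes pytest-xdist, pytest-timeout, and the retry plugin are installed, as the workflow step above already implies:

    [pytest]
    addopts = -n 4 -rpfE --capture=no --timeout=1200 --retries 2 --retry-delay 5 --durations=0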

Comment on lines +10 to +17:

    "--parameters=model=real_weights.irpa",
    "--module=sdxl_scheduled_unet_pipeline_fp16_cpu.vmfb",
    "--input=1x4x128x128xf16=@inference_input.0.bin",
    "--input=2x64x2048xf16=@inference_input.1.bin",
    "--input=2x1280xf16=@inference_input.2.bin",
    "--input=1xf16=@inference_input.3.bin",
    "--expected_output=1x4x128x128xf16=@inference_output.0.bin",
    "--expected_f16_threshold=0.8f"
@ScottTodd (Member):

These flags usually live with the model itself and not as part of the config file, but the "scheduled" part of this model / test case is a bit special. Let's keep an eye on that and aim to refactor later. Could file an issue in the test suite repo.
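For context, these flags end up on an iree-run-module invocation along these lines (the --device value is an assumption based on the CPU/llvm-task config this file appears to belong to; everything else is copied from the snippet above):

    # Sketch; --device=local-task is assumed, not taken from the config file.
    iree-run-module \
      --device=local-task \
      --parameters=model=real_weights.irpa \
      --module=sdxl_scheduled_unet_pipeline_fp16_cpu.vmfb \
      --input=1x4x128x128xf16=@inference_input.0.bin \
      --input=2x64x2048xf16=@inference_input.1.bin \
      --input=2x1280xf16=@inference_input.2.bin \
      --input=1xf16=@inference_input.3.bin \
      --expected_output=1x4x128x128xf16=@inference_output.0.bin \
      --expected_f16_threshold=0.8f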

Comment on lines 2 to 5 (another config file):

    "config_name": "gpu_rocm",
    "iree_compile_flags" : [
        "--iree-hal-target-backends=rocm",
        "--iree-rocm-target-chip=gfx90a",
@ScottTodd (Member):

MI250? (anything specific to a particular device should call that out in the file name, so we can add other devices without name conflicts)

Comment on lines 8 to 15:

    "--iree-global-opt-propagate-transposes=true",
    "--iree-opt-outer-dim-concat=true",
    "--iree-vm-target-truncate-unsupported-floats",
    "--iree-llvmgpu-enable-prefetch=true",
    "--iree-opt-data-tiling=false",
    "--iree-codegen-gpu-native-math-precision=true",
    "--iree-codegen-llvmgpu-use-vector-distribution",
    "--iree-preprocessing-pass-pipeline=builtin.module(iree-preprocessing-transpose-convolution-pipeline, util.func(iree-preprocessing-pad-to-intrinsics))"
@ScottTodd (Member):

Default flags (with XFAIL as needed) and "additional" flags would be good to cover here too.
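As a sketch of that split (file names borrowed from the naming discussion below; the flag lists shown are illustrative subsets of the snippet above, not a definitive partition):

gpu_rocm_models_gfx90a.json, with defaults only:

    {
      "config_name": "gpu_rocm_models_gfx90a",
      "iree_compile_flags": [
        "--iree-hal-target-backends=rocm",
        "--iree-rocm-target-chip=gfx90a"
      ]
    }

gpu_rocm_models_additional_flags_gfx90a.json, layering on the tuning flags:

    {
      "config_name": "gpu_rocm_models_additional_flags_gfx90a",
      "iree_compile_flags": [
        "--iree-hal-target-backends=rocm",
        "--iree-rocm-target-chip=gfx90a",
        "--iree-llvmgpu-enable-prefetch=true",
        "--iree-codegen-llvmgpu-use-vector-distribution"
      ]
    }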

Comment on lines +148 to +151:

    - name: "Running SDXL rocm pipeline benchmark"
      run: |
        source ${VENV_DIR}/bin/activate
        bash SHARK-TestSuite/iree_tests/benchmarks/benchmark_sdxl_rocm.sh
@ScottTodd (Member):

Let's keep an eye on this and explore ways to improve the ergonomics.

Brainstorming:

@saienduri (Collaborator, Author):

Yeah, sure. There are definitely improvements to be made here.

@ScottTodd added the infrastructure/benchmark (Relating to benchmarking infrastructure) label on Apr 26, 2024
@saienduri changed the title from "add in rocm sdxl testing + benchmarking" to "Add ROCm SDXL testing and benchmarking." on Apr 26, 2024
@saienduri requested a review from @ScottTodd on April 26, 2024 at 20:55
@saienduri (Collaborator, Author):

Should be good to go for now.

@ScottTodd (Member):

It doesn't have to be in this PR, but let's also keep an eye on config file naming and consider creating some subfolders:

gpu_rocm_models_additional_flags_gfx90a.json
gpu_rocm_models_gfx90a.json
onnx_cpu_llvm_sync.json
onnx_gpu_cuda.json
onnx_gpu_rocm_rdna3.json
onnx_gpu_vulkan.json
pytorch_cpu_llvm_task.json
pytorch_models_cpu_llvm_task.json
pytorch_models_gpu_vulkan.json
sdxl_scheduled_unet_cpu_llvm_task.json
sdxl_scheduled_unet_gpu_rocm_gfx90a.json

We should pick a naming scheme like [framework]_[device]_[options].json that allows for intuitive sorting and stick to it.

@benvanik (Collaborator):

+1 to naming things better going forward (our current situation is bad, so not hypothetical :)
A good litmus test for a format is "could someone write a regex for this?"
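A quick sketch of that litmus test against the proposed [framework]_[device]_[options].json scheme (the pattern and field names here are hypothetical):

    import re

    # One capture group per field of [framework]_[device]_[options].json;
    # the options field absorbs any remaining underscore-separated tokens.
    CONFIG_NAME = re.compile(
        r"^(?P<framework>[a-z0-9]+)_(?P<device>[a-z0-9]+)_(?P<options>[a-z0-9_]+)\.json$"
    )

    match = CONFIG_NAME.match("onnx_gpu_rocm_rdna3.json")
    assert match and match.group("framework") == "onnx"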

@saienduri (Collaborator, Author):

Yeah, sure. Let's come up with a universal way to name these config files in another PR.

@saienduri merged commit 8735b2e into main on Apr 26, 2024 (54 checks passed)
@saienduri deleted the sdxl-rocm branch on April 26, 2024 at 22:48
LLITCHEV pushed a commit to LLITCHEV/iree that referenced this pull request on Jul 30, 2024:

    ci-exactly: build_packages, regression_test_cpu, regression_test_amdgpu_vulkan, regression_test_amdgpu_rocm, regression_test_nvidiagpu_vulkan, regression_test_nvidiagpu_cuda

    Signed-off-by: Lubo Litchev <[email protected]>