
Add ROCm SDXL testing and benchmarking. #17183

Merged
saienduri merged 11 commits into main from sdxl-rocm on Apr 26, 2024
Conversation

@saienduri (Collaborator) commented:

ci-exactly: build_packages, regression_test_cpu, regression_test_amdgpu_vulkan, regression_test_amdgpu_rocm, regression_test_nvidiagpu_vulkan, regression_test_nvidiagpu_cuda

@ScottTodd (Member) left a comment:

For future work in this repo, please either use a username-prefixed branch name or a fork of the repo. Unprefixed branch names in the main repo are (should be) reserved for long-lived, shared branches.

Also nit: please capitalize the PR title and use sentence style like Add ROCm SDXL testing and benchmarking.

Comment on lines 78 to 79:

    linux_x86_64_rocm_models:
      name: Linux (x86_64) AMDGPU Rocm Model Testing + Benchmark
@ScottTodd (Member):

Put MI250 in these names.

Consider these names as they appear in checks and on the 'run logs' page [screenshots omitted]:

    PkgCI / Regression Test AMDGPU-Vulkan / Linux (x86_64) Model Testing
    PkgCI / Regression Test AMDGPU-ROCm / Linux (x86_64) AMDGPU Rocm Model Testing + Benchmark

The names are long and have some redundant information.

How about Linux MI250 - Models here, for PkgCI / Regression Test AMDGPU-ROCm / Linux MI250 - Models? Or just MI250 - Models.
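For illustration only, that rename would look roughly like this in the workflow file (job key kept from the snippet above; only the display name changes):

    linux_x86_64_rocm_models:
      name: Linux MI250 - Models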

Comment on lines 2 to 5:

    "config_name": "gpu_rocm",
    "iree_compile_flags" : [
        "--iree-hal-target-backends=rocm",
        "--iree-rocm-target-chip=gfx90a",
@ScottTodd (Member):

ROCm compilation is tied to a specific chip, so we should call out that chip (either by identifier like gfx90a, or by name like MI250). That should be reflected in both the config_name and the file name. If we run model tests on w7900, that will use a different chip (and thus config file).
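As a sketch of that suggestion (key names taken from the snippet above; the gfx90a-suffixed file name follows the naming scheme proposed later in this thread), the config might become something like sdxl_scheduled_unet_gpu_rocm_gfx90a.json:

    {
      "config_name": "gpu_rocm_gfx90a",
      "iree_compile_flags": [
        "--iree-hal-target-backends=rocm",
        "--iree-rocm-target-chip=gfx90a"
      ]
    }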

@saienduri (Collaborator, Author):

Makes sense. Multiple machines have the same chip, so we'll go with the chip identifier.

Comment on lines 6 to 7:

    "--iree-opt-const-eval=false",
    "--iree-codegen-transform-dialect-library=attention_and_matmul_spec.mlir"
@ScottTodd (Member):

This might be a good time to test both default flags and some "additional flags" (can choose a better name for that).

If default flags fail compilation, that's good to test with XFAIL.
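As a rough sketch of the XFAIL idea (the function and fixture names here are hypothetical, not the test suite's actual API):

    import pytest

    # Hypothetical example: compile with only the default flags, but mark the
    # known failure as expected so CI stays green while the gap is tracked.
    @pytest.mark.xfail(
        reason="SDXL does not compile with default ROCm flags yet",
        strict=True,  # becomes a hard failure once compilation starts passing
    )
    def test_sdxl_compile_default_flags(compile_sdxl_with_flags):
        compile_sdxl_with_flags([])  # no extra flags beyond the defaults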

Comment on lines +126 to +134:

    -n 4 \
    -rpfE \
    -k real_weights \
    --no-skip-tests-missing-files \
    --capture=no \
    --timeout=1200 \
    --retries 2 \
    --retry-delay 5 \
    --durations=0 \
@ScottTodd (Member):

As we're repeating flags in these workflows, we could move some of them to config files: https://docs.pytest.org/en/stable/reference/customize.html
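For example, a minimal pytest.ini could absorb the invocation-independent flags, leaving only test selection (like -k real_weights) on the command line; this assumes pytest-xdist, pytest-timeout, and the retry plugin are installed, as the workflow step above already implies:

    [pytest]
    addopts = -n 4 -rpfE --capture=no --timeout=1200 --retries 2 --retry-delay 5 --durations=0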

Comment on lines +10 to +17:

    "--parameters=model=real_weights.irpa",
    "--module=sdxl_scheduled_unet_pipeline_fp16_cpu.vmfb",
    "--input=1x4x128x128xf16=@inference_input.0.bin",
    "--input=2x64x2048xf16=@inference_input.1.bin",
    "--input=2x1280xf16=@inference_input.2.bin",
    "--input=1xf16=@inference_input.3.bin",
    "--expected_output=1x4x128x128xf16=@inference_output.0.bin",
    "--expected_f16_threshold=0.8f"
@ScottTodd (Member):

These flags usually live with the model itself and not as part of the config file, but the "scheduled" part of this model / test case is a bit special. Let's keep an eye on that and aim to refactor later. Could file an issue in the test suite repo.
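For context, these flags end up on an iree-run-module invocation along these lines (the --device value is an assumption based on the CPU/llvm-task config this file appears to belong to; everything else is copied from the snippet above):

    # Sketch; --device=local-task is assumed, not taken from the config file.
    iree-run-module \
      --device=local-task \
      --parameters=model=real_weights.irpa \
      --module=sdxl_scheduled_unet_pipeline_fp16_cpu.vmfb \
      --input=1x4x128x128xf16=@inference_input.0.bin \
      --input=2x64x2048xf16=@inference_input.1.bin \
      --input=2x1280xf16=@inference_input.2.bin \
      --input=1xf16=@inference_input.3.bin \
      --expected_output=1x4x128x128xf16=@inference_output.0.bin \
      --expected_f16_threshold=0.8f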

Comment on lines 2 to 5 (another config file):

    "config_name": "gpu_rocm",
    "iree_compile_flags" : [
        "--iree-hal-target-backends=rocm",
        "--iree-rocm-target-chip=gfx90a",
@ScottTodd (Member):

MI250? (anything specific to a particular device should call that out in the file name, so we can add other devices without name conflicts)

Comment on lines 8 to 15:

    "--iree-global-opt-propagate-transposes=true",
    "--iree-opt-outer-dim-concat=true",
    "--iree-vm-target-truncate-unsupported-floats",
    "--iree-llvmgpu-enable-prefetch=true",
    "--iree-opt-data-tiling=false",
    "--iree-codegen-gpu-native-math-precision=true",
    "--iree-codegen-llvmgpu-use-vector-distribution",
    "--iree-preprocessing-pass-pipeline=builtin.module(iree-preprocessing-transpose-convolution-pipeline, util.func(iree-preprocessing-pad-to-intrinsics))"
@ScottTodd (Member):

Default flags (with XFAIL as needed) and "additional" flags would be good to cover here too.
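As a sketch of that split (file names borrowed from the naming discussion below; the flag lists shown are illustrative subsets of the snippet above, not a definitive partition):

gpu_rocm_models_gfx90a.json, with defaults only:

    {
      "config_name": "gpu_rocm_models_gfx90a",
      "iree_compile_flags": [
        "--iree-hal-target-backends=rocm",
        "--iree-rocm-target-chip=gfx90a"
      ]
    }

gpu_rocm_models_additional_flags_gfx90a.json, layering on the tuning flags:

    {
      "config_name": "gpu_rocm_models_additional_flags_gfx90a",
      "iree_compile_flags": [
        "--iree-hal-target-backends=rocm",
        "--iree-rocm-target-chip=gfx90a",
        "--iree-llvmgpu-enable-prefetch=true",
        "--iree-codegen-llvmgpu-use-vector-distribution"
      ]
    }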

Comment on lines +148 to +151:

    - name: "Running SDXL rocm pipeline benchmark"
      run: |
        source ${VENV_DIR}/bin/activate
        bash SHARK-TestSuite/iree_tests/benchmarks/benchmark_sdxl_rocm.sh
@ScottTodd (Member):

Let's keep an eye on this and explore ways to improve the ergonomics.

Brainstorming:

@saienduri (Collaborator, Author):

Yeah, sure. There are definitely improvements to be made here.

@ScottTodd added the infrastructure/benchmark (Relating to benchmarking infrastructure) label on Apr 26, 2024
@saienduri changed the title from "add in rocm sdxl testing + benchmarking" to "Add ROCm SDXL testing and benchmarking." on Apr 26, 2024
@saienduri requested a review from @ScottTodd on April 26, 2024 at 20:55
@saienduri (Collaborator, Author):

Should be good to go for now.

@ScottTodd (Member):

It doesn't have to be in this PR, but let's also keep an eye on config file naming and consider creating some subfolders:

gpu_rocm_models_additional_flags_gfx90a.json
gpu_rocm_models_gfx90a.json
onnx_cpu_llvm_sync.json
onnx_gpu_cuda.json
onnx_gpu_rocm_rdna3.json
onnx_gpu_vulkan.json
pytorch_cpu_llvm_task.json
pytorch_models_cpu_llvm_task.json
pytorch_models_gpu_vulkan.json
sdxl_scheduled_unet_cpu_llvm_task.json
sdxl_scheduled_unet_gpu_rocm_gfx90a.json

We should pick a naming scheme like [framework]_[device]_[options].json that allows for intuitive sorting and stick to it.

@benvanik (Collaborator):

+1 to naming things better going forward (our current situation is bad, so not hypothetical :)
A good litmus test for a format is "could someone write a regex for this?"
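A quick sketch of that litmus test against the proposed [framework]_[device]_[options].json scheme (the pattern and field names here are hypothetical):

    import re

    # One capture group per field of [framework]_[device]_[options].json;
    # the options field absorbs any remaining underscore-separated tokens.
    CONFIG_NAME = re.compile(
        r"^(?P<framework>[a-z0-9]+)_(?P<device>[a-z0-9]+)_(?P<options>[a-z0-9_]+)\.json$"
    )

    match = CONFIG_NAME.match("onnx_gpu_rocm_rdna3.json")
    assert match and match.group("framework") == "onnx"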

@saienduri (Collaborator, Author):

Yeah, sure. Let's come up with a universal way to name these config files in another PR.

@saienduri merged commit 8735b2e into main on Apr 26, 2024 (54 checks passed)
@saienduri deleted the sdxl-rocm branch on April 26, 2024 at 22:48
LLITCHEV pushed a commit to LLITCHEV/iree that referenced this pull request on Jul 30, 2024:

    ci-exactly: build_packages, regression_test_cpu, regression_test_amdgpu_vulkan, regression_test_amdgpu_rocm, regression_test_nvidiagpu_vulkan, regression_test_nvidiagpu_cuda

    Signed-off-by: Lubo Litchev <[email protected]>