generated from mlcommons/template
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge pull request #80 from woonyee28/mlperf-inference-results-scc24
Results on system scc7 NTUHPC
Showing 28 changed files with 13,611 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
TBD |
3 changes: 3 additions & 0 deletions
...asurements/Coffeepot-nvidia_original-gpu-tensorrt-vdefault-scc24-main/README.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
| Model | Scenario | Accuracy | Throughput | Latency (in ms) | | ||
|---------------------|------------|-----------------------|--------------|-------------------| | ||
| stable-diffusion-xl | offline | (16.50375, 232.23582) | 4.188 | - | |
7 changes: 7 additions & 0 deletions
...able-diffusion-xl/offline/Coffeepot-nvidia_original-gpu-tensorrt-vdefault-scc24-main.json
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
{ | ||
"starting_weights_filename": "https://github.com/mlcommons/cm4mlops/blob/main/script/get-ml-model-stable-diffusion/_cm.json#L174", | ||
"retraining": "no", | ||
"input_data_types": "int32", | ||
"weight_data_types": "int8", | ||
"weight_transformations": "quantization, affine fusion" | ||
} |
61 changes: 61 additions & 0 deletions
...original-gpu-tensorrt-vdefault-scc24-main/stable-diffusion-xl/offline/README.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,61 @@ | ||
This experiment is generated using the [MLCommons Collective Mind automation framework (CM)](https://github.com/mlcommons/cm4mlops). | ||
|
||
*Check [CM MLPerf docs](https://docs.mlcommons.org/inference) for more details.* | ||
|
||
## Host platform | ||
|
||
* OS version: Linux-6.5.0-27-generic-x86_64-with-glibc2.29 | ||
* CPU version: x86_64 | ||
* Python version: 3.8.10 (default, Sep 11 2024, 16:02:53) | ||
[GCC 9.4.0] | ||
* MLCommons CM version: 3.4.1 | ||
|
||
## CM Run Command | ||
|
||
See [CM installation guide](https://docs.mlcommons.org/inference/install/). | ||
|
||
```bash | ||
pip install -U cmind | ||
|
||
cm rm cache -f | ||
|
||
cm pull repo mlcommons@cm4mlops --checkout=636343e1980e79ff6f3820e66b6b2f08add3ce46 | ||
|
||
cm run script \ | ||
--tags=run-mlperf,inference,_r4.1-dev,_short,_scc24-main \ | ||
--model=sdxl \ | ||
--implementation=nvidia \ | ||
--framework=tensorrt \ | ||
--category=datacenter \ | ||
--scenario=Offline \ | ||
--execution_mode=test \ | ||
--device=cuda \ | ||
--quiet \ | ||
--target_qps=4.9 \ | ||
--offline_target_qps=4.9 \ | ||
--batch_size=8 \ | ||
--test_query_count=500 \ | ||
--clean | ||
``` | ||
*Note that if you want to use the [latest automation recipes](https://docs.mlcommons.org/inference) for MLPerf (CM scripts), | ||
you should simply reload mlcommons@cm4mlops without checkout and clean CM cache as follows:* | ||
|
||
```bash | ||
cm rm repo mlcommons@cm4mlops | ||
cm pull repo mlcommons@cm4mlops | ||
cm rm cache -f | ||
|
||
``` | ||
|
||
## Results | ||
|
||
Platform: Coffeepot-nvidia_original-gpu-tensorrt-vdefault-scc24-main | ||
|
||
Model Precision: int8 | ||
|
||
### Accuracy Results | ||
`CLIP_SCORE`: `16.50375`, Required accuracy for closed division `>= 31.68632` and `<= 31.81332` | ||
`FID_SCORE`: `232.23582`, Required accuracy for closed division `>= 23.01086` and `<= 23.95008` | ||
|
||
### Performance Results | ||
`Samples per second`: `4.18764` |
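
As a quick sanity check on the numbers above, the reported scores can be compared against the closed-division ranges quoted in this README. The sketch below is illustrative only: the helper name `within_closed_division` and the dictionary layout are assumptions made for this example and are not part of the MLPerf or CM tooling. The values are copied from the README text.

```python
# Reported scores and closed-division ranges, copied from the README above.
REPORTED = {"CLIP_SCORE": 16.50375, "FID_SCORE": 232.23582}
REQUIRED = {
    "CLIP_SCORE": (31.68632, 31.81332),
    "FID_SCORE": (23.01086, 23.95008),
}


def within_closed_division(metric, value):
    """Hypothetical helper: True if value lies inside the required range."""
    low, high = REQUIRED[metric]
    return low <= value <= high


# Evaluate every reported metric against its range.
results = {m: within_closed_division(m, v) for m, v in REPORTED.items()}
for metric, ok in results.items():
    status = "in range" if ok else "out of range"
    print(f"{metric}: {REPORTED[metric]} -> {status}")
```

With the values listed in this README, both scores fall outside the quoted closed-division ranges for this short test run.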
96 changes: 96 additions & 0 deletions
...riginal-gpu-tensorrt-vdefault-scc24-main/stable-diffusion-xl/offline/accuracy_console.out
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,96 @@ | ||
[2024-11-18 09:47:54,310 systems.py:197 INFO] Found unknown device in GPU connection topology: NIC0. Skipping. | ||
[2024-11-18 09:47:54,311 systems.py:197 INFO] Found unknown device in GPU connection topology: NIC1. Skipping. | ||
[2024-11-18 09:47:54,311 systems.py:197 INFO] Found unknown device in GPU connection topology: NIC2. Skipping. | ||
[2024-11-18 09:47:54,422 main.py:229 INFO] Detected system ID: KnownSystem.newFourH100 | ||
/home/cmuser/.local/lib/python3.8/site-packages/torchvision/datapoints/__init__.py:12: UserWarning: The torchvision.datapoints and torchvision.transforms.v2 namespaces are still Beta. While we do not expect major breaking changes, some APIs may still change according to user feedback. Please submit any feedback you may have in this issue: https://github.com/pytorch/vision/issues/6753, and you can also check out https://github.com/pytorch/vision/issues/7319 to learn more about the APIs that we suspect might involve future changes. You can silence this warning by calling torchvision.disable_beta_transforms_warning(). | ||
warnings.warn(_BETA_TRANSFORMS_WARNING) | ||
/home/cmuser/.local/lib/python3.8/site-packages/torchvision/transforms/v2/__init__.py:54: UserWarning: The torchvision.datapoints and torchvision.transforms.v2 namespaces are still Beta. While we do not expect major breaking changes, some APIs may still change according to user feedback. Please submit any feedback you may have in this issue: https://github.com/pytorch/vision/issues/6753, and you can also check out https://github.com/pytorch/vision/issues/7319 to learn more about the APIs that we suspect might involve future changes. You can silence this warning by calling torchvision.disable_beta_transforms_warning(). | ||
warnings.warn(_BETA_TRANSFORMS_WARNING) | ||
[2024-11-18 09:47:56,249 generate_conf_files.py:107 INFO] Generated measurements/ entries for newFourH100_TRT/stable-diffusion-xl/Offline | ||
[2024-11-18 09:47:56,250 __init__.py:46 INFO] Running command: python3 -m code.stable-diffusion-xl.tensorrt.harness --logfile_outdir="/home/cmuser/CM/repos/local/cache/6712b485075c4fe8/test_results/1b41d1041a1b-nvidia_original-gpu-tensorrt-vdefault-scc24-main/stable-diffusion-xl/offline/accuracy" --logfile_prefix="mlperf_log_" --performance_sample_count=5000 --test_mode="AccuracyOnly" --gpu_batch_size=8 --mlperf_conf_path="/home/cmuser/CM/repos/local/cache/02144893f8ce40a0/inference/mlperf.conf" --tensor_path="build/preprocessed_data/coco2014-tokenized-sdxl/5k_dataset_final/" --use_graphs=false --user_conf_path="/home/cmuser/CM/repos/mlcommons@cm4mlops/script/generate-mlperf-inference-user-conf/tmp/6216a4579b7f4250b03d916b96039c13.conf" --gpu_inference_streams=1 --gpu_copy_streams=1 --gpu_engines="./build/engines/newFourH100/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIP-Offline-gpu-b8-fp16.custom_k_99_MaxP.plan,./build/engines/newFourH100/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIPWithProj-Offline-gpu-b8-fp16.custom_k_99_MaxP.plan,./build/engines/newFourH100/stable-diffusion-xl/Offline/stable-diffusion-xl-UNetXL-Offline-gpu-b8-int8.custom_k_99_MaxP.plan,./build/engines/newFourH100/stable-diffusion-xl/Offline/stable-diffusion-xl-VAE-Offline-gpu-b8-fp32.custom_k_99_MaxP.plan" --scenario Offline --model stable-diffusion-xl | ||
[2024-11-18 09:47:56,250 __init__.py:53 INFO] Overriding Environment | ||
[2024-11-18 09:47:58,985 systems.py:197 INFO] Found unknown device in GPU connection topology: NIC0. Skipping. | ||
[2024-11-18 09:47:58,985 systems.py:197 INFO] Found unknown device in GPU connection topology: NIC1. Skipping. | ||
[2024-11-18 09:47:58,985 systems.py:197 INFO] Found unknown device in GPU connection topology: NIC2. Skipping. | ||
/home/cmuser/.local/lib/python3.8/site-packages/torchvision/datapoints/__init__.py:12: UserWarning: The torchvision.datapoints and torchvision.transforms.v2 namespaces are still Beta. While we do not expect major breaking changes, some APIs may still change according to user feedback. Please submit any feedback you may have in this issue: https://github.com/pytorch/vision/issues/6753, and you can also check out https://github.com/pytorch/vision/issues/7319 to learn more about the APIs that we suspect might involve future changes. You can silence this warning by calling torchvision.disable_beta_transforms_warning(). | ||
warnings.warn(_BETA_TRANSFORMS_WARNING) | ||
/home/cmuser/.local/lib/python3.8/site-packages/torchvision/transforms/v2/__init__.py:54: UserWarning: The torchvision.datapoints and torchvision.transforms.v2 namespaces are still Beta. While we do not expect major breaking changes, some APIs may still change according to user feedback. Please submit any feedback you may have in this issue: https://github.com/pytorch/vision/issues/6753, and you can also check out https://github.com/pytorch/vision/issues/7319 to learn more about the APIs that we suspect might involve future changes. You can silence this warning by calling torchvision.disable_beta_transforms_warning(). | ||
warnings.warn(_BETA_TRANSFORMS_WARNING) | ||
[2024-11-18 09:48:00,997 backend.py:71 INFO] Loading TensorRT engine: ./build/engines/newFourH100/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIP-Offline-gpu-b8-fp16.custom_k_99_MaxP.plan. | ||
[2024-11-18 09:48:01,144 backend.py:71 INFO] Loading TensorRT engine: ./build/engines/newFourH100/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIPWithProj-Offline-gpu-b8-fp16.custom_k_99_MaxP.plan. | ||
[2024-11-18 09:48:01,856 backend.py:71 INFO] Loading TensorRT engine: ./build/engines/newFourH100/stable-diffusion-xl/Offline/stable-diffusion-xl-UNetXL-Offline-gpu-b8-int8.custom_k_99_MaxP.plan. | ||
[2024-11-18 09:48:03,677 backend.py:71 INFO] Loading TensorRT engine: ./build/engines/newFourH100/stable-diffusion-xl/Offline/stable-diffusion-xl-VAE-Offline-gpu-b8-fp32.custom_k_99_MaxP.plan. | ||
[2024-11-18 09:48:06,073 backend.py:71 INFO] Loading TensorRT engine: ./build/engines/newFourH100/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIP-Offline-gpu-b8-fp16.custom_k_99_MaxP.plan. | ||
[2024-11-18 09:48:06,206 backend.py:71 INFO] Loading TensorRT engine: ./build/engines/newFourH100/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIPWithProj-Offline-gpu-b8-fp16.custom_k_99_MaxP.plan. | ||
[2024-11-18 09:48:06,915 backend.py:71 INFO] Loading TensorRT engine: ./build/engines/newFourH100/stable-diffusion-xl/Offline/stable-diffusion-xl-UNetXL-Offline-gpu-b8-int8.custom_k_99_MaxP.plan. | ||
[2024-11-18 09:48:08,694 backend.py:71 INFO] Loading TensorRT engine: ./build/engines/newFourH100/stable-diffusion-xl/Offline/stable-diffusion-xl-VAE-Offline-gpu-b8-fp32.custom_k_99_MaxP.plan. | ||
[2024-11-18 09:48:11,036 backend.py:71 INFO] Loading TensorRT engine: ./build/engines/newFourH100/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIP-Offline-gpu-b8-fp16.custom_k_99_MaxP.plan. | ||
[2024-11-18 09:48:11,172 backend.py:71 INFO] Loading TensorRT engine: ./build/engines/newFourH100/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIPWithProj-Offline-gpu-b8-fp16.custom_k_99_MaxP.plan. | ||
[2024-11-18 09:48:11,884 backend.py:71 INFO] Loading TensorRT engine: ./build/engines/newFourH100/stable-diffusion-xl/Offline/stable-diffusion-xl-UNetXL-Offline-gpu-b8-int8.custom_k_99_MaxP.plan. | ||
[2024-11-18 09:48:13,622 backend.py:71 INFO] Loading TensorRT engine: ./build/engines/newFourH100/stable-diffusion-xl/Offline/stable-diffusion-xl-VAE-Offline-gpu-b8-fp32.custom_k_99_MaxP.plan. | ||
[2024-11-18 09:48:15,958 backend.py:71 INFO] Loading TensorRT engine: ./build/engines/newFourH100/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIP-Offline-gpu-b8-fp16.custom_k_99_MaxP.plan. | ||
[2024-11-18 09:48:16,091 backend.py:71 INFO] Loading TensorRT engine: ./build/engines/newFourH100/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIPWithProj-Offline-gpu-b8-fp16.custom_k_99_MaxP.plan. | ||
[2024-11-18 09:48:16,803 backend.py:71 INFO] Loading TensorRT engine: ./build/engines/newFourH100/stable-diffusion-xl/Offline/stable-diffusion-xl-UNetXL-Offline-gpu-b8-int8.custom_k_99_MaxP.plan. | ||
[2024-11-18 09:48:18,551 backend.py:71 INFO] Loading TensorRT engine: ./build/engines/newFourH100/stable-diffusion-xl/Offline/stable-diffusion-xl-VAE-Offline-gpu-b8-fp32.custom_k_99_MaxP.plan. | ||
[2024-11-18 09:48:20,157 harness.py:207 INFO] Start Warm Up! | ||
[2024-11-18 09:49:19,420 harness.py:209 INFO] Warm Up Done! | ||
[2024-11-18 09:49:19,420 harness.py:211 INFO] Start Test! | ||
[2024-11-18 09:49:34,322 backend.py:801 INFO] [Server] Received 50 total samples | ||
[2024-11-18 09:49:34,322 backend.py:809 INFO] [Device 0] Reported 8 samples | ||
[2024-11-18 09:49:34,322 backend.py:809 INFO] [Device 1] Reported 16 samples | ||
[2024-11-18 09:49:34,322 backend.py:809 INFO] [Device 2] Reported 16 samples | ||
[2024-11-18 09:49:34,322 backend.py:809 INFO] [Device 3] Reported 10 samples | ||
[2024-11-18 09:49:34,322 harness.py:214 INFO] Test Done! | ||
[2024-11-18 09:49:34,322 harness.py:216 INFO] Destroying SUT... | ||
[2024-11-18 09:49:34,322 harness.py:219 INFO] Destroying QSL... | ||
benchmark : Benchmark.SDXL | ||
buffer_manager_thread_count : 0 | ||
data_dir : /home/cmuser/CM/repos/local/cache/5a62a909e14a4c17/data | ||
gpu_batch_size : 8 | ||
gpu_copy_streams : 1 | ||
gpu_inference_streams : 1 | ||
input_dtype : int32 | ||
input_format : linear | ||
log_dir : /home/cmuser/CM/repos/local/cache/bcfcc3f269b147a7/repo/closed/NVIDIA/build/logs/2024.11.18-09.47.48 | ||
mlperf_conf_path : /home/cmuser/CM/repos/local/cache/02144893f8ce40a0/inference/mlperf.conf | ||
model_path : /home/cmuser/CM/repos/local/cache/5a62a909e14a4c17/models/SDXL/ | ||
offline_expected_qps : 4.0 | ||
precision : int8 | ||
preprocessed_data_dir : /home/cmuser/CM/repos/local/cache/5a62a909e14a4c17/preprocessed_data | ||
scenario : Scenario.Offline | ||
system : SystemConfiguration(host_cpu_conf=CPUConfiguration(layout={CPU(name='AMD EPYC 9654 96-Core Processor', architecture=<CPUArchitecture.x86_64: AliasedName(name='x86_64', aliases=(), patterns=())>, core_count=96, threads_per_core=1): 1}), host_mem_conf=MemoryConfiguration(host_memory_capacity=Memory(quantity=1.584936672, byte_suffix=<ByteSuffix.TB: (1000, 4)>, _num_bytes=1584936672000), comparison_tolerance=0.05), accelerator_conf=AcceleratorConfiguration(layout=defaultdict(<class 'int'>, {GPU(name='NVIDIA H100 PCIe', accelerator_type=<AcceleratorType.Discrete: AliasedName(name='Discrete', aliases=(), patterns=())>, vram=Memory(quantity=79.6474609375, byte_suffix=<ByteSuffix.GiB: (1024, 3)>, _num_bytes=85520809984), max_power_limit=350.0, pci_id='0x233110DE', compute_sm=90): 4})), numa_conf=NUMAConfiguration(numa_nodes={}, num_numa_nodes=2), system_id='newFourH100') | ||
tensor_path : build/preprocessed_data/coco2014-tokenized-sdxl/5k_dataset_final/ | ||
test_mode : AccuracyOnly | ||
use_graphs : False | ||
user_conf_path : /home/cmuser/CM/repos/mlcommons@cm4mlops/script/generate-mlperf-inference-user-conf/tmp/6216a4579b7f4250b03d916b96039c13.conf | ||
system_id : newFourH100 | ||
config_name : newFourH100_stable-diffusion-xl_Offline | ||
workload_setting : WorkloadSetting(HarnessType.Custom, AccuracyTarget.k_99, PowerSetting.MaxP) | ||
optimization_level : plugin-enabled | ||
num_profiles : 1 | ||
config_ver : custom_k_99_MaxP | ||
accuracy_level : 99% | ||
inference_server : custom | ||
skip_file_checks : False | ||
power_limit : None | ||
cpu_freq : None | ||
[I] Loading bytes from ./build/engines/newFourH100/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIP-Offline-gpu-b8-fp16.custom_k_99_MaxP.plan | ||
[I] Loading bytes from ./build/engines/newFourH100/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIPWithProj-Offline-gpu-b8-fp16.custom_k_99_MaxP.plan | ||
[I] Loading bytes from ./build/engines/newFourH100/stable-diffusion-xl/Offline/stable-diffusion-xl-UNetXL-Offline-gpu-b8-int8.custom_k_99_MaxP.plan | ||
[I] Loading bytes from ./build/engines/newFourH100/stable-diffusion-xl/Offline/stable-diffusion-xl-VAE-Offline-gpu-b8-fp32.custom_k_99_MaxP.plan | ||
[I] Loading bytes from ./build/engines/newFourH100/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIP-Offline-gpu-b8-fp16.custom_k_99_MaxP.plan | ||
[I] Loading bytes from ./build/engines/newFourH100/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIPWithProj-Offline-gpu-b8-fp16.custom_k_99_MaxP.plan | ||
[I] Loading bytes from ./build/engines/newFourH100/stable-diffusion-xl/Offline/stable-diffusion-xl-UNetXL-Offline-gpu-b8-int8.custom_k_99_MaxP.plan | ||
[I] Loading bytes from ./build/engines/newFourH100/stable-diffusion-xl/Offline/stable-diffusion-xl-VAE-Offline-gpu-b8-fp32.custom_k_99_MaxP.plan | ||
[I] Loading bytes from ./build/engines/newFourH100/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIP-Offline-gpu-b8-fp16.custom_k_99_MaxP.plan | ||
[I] Loading bytes from ./build/engines/newFourH100/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIPWithProj-Offline-gpu-b8-fp16.custom_k_99_MaxP.plan | ||
[I] Loading bytes from ./build/engines/newFourH100/stable-diffusion-xl/Offline/stable-diffusion-xl-UNetXL-Offline-gpu-b8-int8.custom_k_99_MaxP.plan | ||
[I] Loading bytes from ./build/engines/newFourH100/stable-diffusion-xl/Offline/stable-diffusion-xl-VAE-Offline-gpu-b8-fp32.custom_k_99_MaxP.plan | ||
[I] Loading bytes from ./build/engines/newFourH100/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIP-Offline-gpu-b8-fp16.custom_k_99_MaxP.plan | ||
[I] Loading bytes from ./build/engines/newFourH100/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIPWithProj-Offline-gpu-b8-fp16.custom_k_99_MaxP.plan | ||
[I] Loading bytes from ./build/engines/newFourH100/stable-diffusion-xl/Offline/stable-diffusion-xl-UNetXL-Offline-gpu-b8-int8.custom_k_99_MaxP.plan | ||
[I] Loading bytes from ./build/engines/newFourH100/stable-diffusion-xl/Offline/stable-diffusion-xl-VAE-Offline-gpu-b8-fp32.custom_k_99_MaxP.plan | ||
[2024-11-18 09:49:37,158 run_harness.py:166 INFO] Result: Accuracy run detected. | ||
|
||
======================== Result summaries: ======================== | ||
|
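
The console output above reports a server-side total and a per-device sample count. A minimal sketch of how one might cross-check those figures follows; the helper names `device_sample_counts` and `server_total` are hypothetical, and the log excerpt is copied from the output above with timestamps trimmed.

```python
import re

# Log lines copied from the accuracy console output (timestamps trimmed).
LOG = """\
backend.py:801 INFO] [Server] Received 50 total samples
backend.py:809 INFO] [Device 0] Reported 8 samples
backend.py:809 INFO] [Device 1] Reported 16 samples
backend.py:809 INFO] [Device 2] Reported 16 samples
backend.py:809 INFO] [Device 3] Reported 10 samples
"""


def device_sample_counts(log_text):
    """Hypothetical helper: map device index to samples reported in the log."""
    pattern = re.compile(r"\[Device (\d+)\] Reported (\d+) samples")
    return {int(dev): int(n) for dev, n in pattern.findall(log_text)}


def server_total(log_text):
    """Hypothetical helper: total sample count the server says it received."""
    match = re.search(r"\[Server\] Received (\d+) total samples", log_text)
    return int(match.group(1)) if match else None


counts = device_sample_counts(LOG)
total = server_total(LOG)
print(counts, sum(counts.values()) == total)
```

For this excerpt the four per-device counts add up to the server total of 50 samples.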