[MetaSchedule] Add Script for TorchBench Model Tuning & Benchmarking #12914
Conversation
```bash
pip3 install --pre \
    --extra-index-url https://download.pytorch.org/whl/nightly/cu116 \
    torch==1.13.0.dev20220926 \
    torchvision==0.14.0.dev20220926 \
    torchtext==0.14.0.dev20220926
```
Quick question: how long are the PyTorch nightly wheels guaranteed to persist at that URL? In other words, will this instruction expire at some point because of server cleanup?
The earliest wheel on that index was built on 07/30, so I guess the retention is 60 days. Those wheels are pretty large (1 GB). I added a message in the comment string suggesting to try the latest nightly if this one cannot be found.
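For reference, a minimal sketch of that fallback, assuming the same package set and CUDA index as the pinned command above, just without the dated dev versions so pip resolves the newest available nightly:

```bash
# Fall back to the newest nightly wheels if the pinned dev builds have been cleaned up
pip3 install --pre \
    --extra-index-url https://download.pytorch.org/whl/nightly/cu116 \
    torch torchvision torchtext
```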
```python
mod.run()
result = [torch.from_dlpack(mod.get_output(i)) for i in range(mod.get_num_outputs())]
if IS_CUDA:
    torch.cuda.synchronize()
```
Let's think twice about the synchronization here. Two questions:
- do we want `torch.cuda.synchronize` or `tvm.cuda(0).sync()`?
- do we want to move the sync one line up, before `mod.get_output()`?
My thoughts are:
- We should call torch's synchronize before the computation here, then call tvm's synchronize afterward, letting each library's sync wait on the computation from its own side.
- We probably want to move the sync before `get_output`. I believe it doesn't matter as long as `torch.from_dlpack(mod.get_output(i))` is zero-copy, but it doesn't hurt to move the sync before that line (see the sketch below).
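A minimal sketch of that ordering, assuming a graph-executor-style `mod` with the `set_input`/`run`/`get_output` calls quoted in this thread and a TVM device such as `tvm.cuda(0)`; the helper name and signature are illustrative only:

```python
import torch
import tvm


def run_with_sync(mod, device, args):
    # Torch-side sync: wait for the PyTorch CUDA work that produced the inputs.
    torch.cuda.synchronize()
    for idx, arg in enumerate(args):
        mod.set_input(f"inp_{idx}", tvm.nd.from_dlpack(arg.contiguous()))
    mod.run()
    # TVM-side sync: wait for TVM's kernels before exporting outputs via DLPack.
    device.sync()
    return [torch.from_dlpack(mod.get_output(i)) for i in range(mod.get_num_outputs())]
```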
```python
def forward(*args):
    if IS_CUDA:
        torch.cuda.synchronize()
    args = [arg.contiguous() for arg in args]
```
qq: is this going to incur an unnecessary extra copy if `arg` is already contiguous, or is it a no-op?
It's a no-op if `arg` is already contiguous.
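A quick way to check this in a REPL; when the tensor is already contiguous, `contiguous()` returns `self` rather than allocating a copy:

```python
>>> import torch
>>> t = torch.randn(4, 4)
>>> t.contiguous() is t      # already contiguous: returns self, no copy
True
>>> v = t.t()                # transposed view is non-contiguous
>>> v.contiguous() is v      # a new, copied tensor is allocated here
False
```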
High-level comment: per-method documentation is needed.
Generally looking good. Would you please address my comments and add some type annotations? Thanks.
"--benchmark-repeat", | ||
type=int, | ||
default=10, | ||
help="The number of times to repeat the benchmark measurement.", |
Just curious, can we customize other benchmarking details like warm-up rounds, time between measurements, etc.?
In torchdynamo there are warm-up rounds. Adding this could be an option here.
The benchmark runner from TorchDynamo doesn't expose these parameters (its args can be found at https://github.com/pytorch/torchdynamo/blob/main/benchmarks/common.py#L1363). But we can still implement this customization if needed.
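If added, the option could look roughly like the existing `--benchmark-repeat` flag; the flag name and default below are assumptions, not part of the current script:

```python
import argparse

parser = argparse.ArgumentParser()
# Hypothetical option mirroring torchdynamo's warm-up behaviour.
parser.add_argument(
    "--benchmark-warmup-rounds",
    type=int,
    default=3,
    help="The number of warm-up runs to perform before the timed measurements.",
)
```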
machine than the one that executes tuning.
```bash
python python/tvm/meta_schedule/testing/torchbench/run.py \
    --mode eval \
```
NIT: As no perf evaluation will be done with `--tuning`, I feel like we should combine `tuning` and `all`, if perf evaluation doesn't really take much time.
This option was created to support running this script on a machine without the target GPU. For example, the tuning can be done with the help of RPC runners on a machine with a 3070 while targeting an A100.
We still require the host machine to have a GPU because the model provided by TorchBench could potentially differ between CPU and CUDA. If we implement remote task extraction (we probably will), we will even be able to run this script on a machine without a GPU to tune the model.
"--benchmark-repeat", | ||
type=int, | ||
default=10, | ||
help="The number of times to repeat the benchmark measurement.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In torchdynamo there is warm up rounds. Adding this could be an option here.
```python
for idx, arg in enumerate(args, 0):
    mod.set_input(
        f"inp_{idx}",
        tvm.nd.from_dlpack(arg),
    )
```
I think this could potentially be a problem, and that is the reason why in torchdynamo's TVM backend a `torch.Tensor` is converted to numpy and then to `tvm.nd.NDArray`. And if the arg is typed `torch.Tensor`, you may need `torch.utils.dlpack.to_dlpack(arg)` for this approach as well.
That TVM backend actually doesn't work on CUDA; converting a tensor to a numpy array will fail if it's on CUDA:

```python
>>> torch.zeros((5, 5), device="cuda").numpy()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.
```

I guess the only problem here is from boolean tensors (and maybe also tensors with unaligned memory). I will create a follow-up PR which uses Yaoda's work on the TVM PyTorch integration to replace this code. That integration can handle these edge cases with a minimal number of data copies.
`to_dlpack` is considered a legacy interface (https://pytorch.org/docs/stable/dlpack.html#torch.utils.dlpack.to_dlpack). The new approach is to have a `__dlpack__` method on the object (like `torch.Tensor`), which the importing function (like `tvm.nd.from_dlpack`) can call to get the capsule.
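To illustrate the two paths (a sketch assuming a CUDA-enabled environment and a TVM build whose `tvm.nd.from_dlpack` accepts objects implementing `__dlpack__`, as this PR relies on):

```python
import torch
import tvm
from torch.utils.dlpack import to_dlpack

t = torch.zeros((5, 5), device="cuda")

# Legacy interface: build the DLPack capsule explicitly, then import it.
a = tvm.nd.from_dlpack(to_dlpack(t))

# Newer protocol: torch.Tensor implements __dlpack__, so the importer can
# consume the tensor directly. Both paths are zero-copy.
b = tvm.nd.from_dlpack(t)
```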
Looking good to me, would you please fix the CI?
```python
from enum import Enum
from typing import Callable, List, Tuple

import numpy as np  # type: ignore
```
Not sure if we need `# type: ignore` here for imported libraries, any particular reason?
They are for suppressing mypy errors like `Cannot find implementation or library stub for module named "torch"` (https://ci.tlcpack.ai/blue/organizations/jenkins/tvm/detail/PR-12914/7/pipeline#step-97-log-74).
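For context, the suppression sits on the import lines themselves, matching the snippet above (the `torch` line is the analogous case the CI error points at):

```python
import numpy as np  # type: ignore
import torch  # type: ignore
```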
This PR adds a script to tune and benchmark TorchBench models, using torchdynamo and the PyTorch importer in TVM.
cc @junrushao @zxybazh