Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

benchmark_time includes cluster startup time #238

Closed
gjoseph92 opened this issue Aug 10, 2022 · 4 comments · Fixed by #239
Closed

benchmark_time includes cluster startup time #238

gjoseph92 opened this issue Aug 10, 2022 · 4 comments · Fixed by #239

Comments

@gjoseph92
Copy link
Contributor

The benchmark_time fixture simply times how long it takes the entire test to run:
https://github.com/coiled/coiled-runtime/blob/54c49ccac9546188acc5bc1099ab1b31e3ccae1a/conftest.py#L206-L212

However, that doesn't distinguish between cluster startup time (and any test initialization time) and actual runtime of the core workload.

  1. Cluster startup is somewhat availability-dependent and we should expect to be a little variable
  2. Improvements/regressions to cluster startup time might look like improvements/regressions in dask performance, and visa versa

Both are valuable metrics to track, but they shouldn't be combined into one.

cc @ian-r-rose @jrbourbeau

@ian-r-rose
Copy link
Contributor

That's right, this is a very crude metric, but a few notes:

  1. It's a function-scoped benchmark, so tests which include a module-scoped cluster fixture (as most of them do) will not include the cluster startup, unless they specifically include a cluster spinup in the test, like test_default_cluster_spinup_time(). Other potentially costly things like transferring large benchmarking data from the scheduler to the client, uploading performance reports, restarting the client, etc might get caught up in this, however.
  2. An important metric that we'll want to include are timing data for task groups, including compute time, disk spill time, transfer time, etc. I have some WIP work for that, but it hasn't landed yet (though note there are spots in the benchmark schema form them already). I think that these are complementary to a "test wall-clock" time fixture.

I'm happy to hear suggestions on how to improve this fixture, or what additional metrics to collect.

@gjoseph92
Copy link
Contributor Author

It's a function-scoped benchmark, so tests which include a module-scoped cluster fixture (as most of them do) will not include the cluster startup

Okay, maybe that's the pattern we need to use then. For the benchmarks I'm adding in #191 though, I was planning to use a number of different cluster sizes and configurations, which makes using a single cluster fixture hard. Just creating the cluster inline in the test is easier to read for those cases IMO.

Maybe benchmark_time should be a contextmanager, so you can time just the relevant block, and there could be an auto_benchmark_time fixture to time the whole test that you can use when appropriate?

@ian-r-rose
Copy link
Contributor

Maybe benchmark_time should be a contextmanager, so you can time just the relevant block, and there could be an auto_benchmark_time fixture to time the whole test that you can use when appropriate?

Yes, this is very doable -- a number of other fixtures here are structured as context managers, including the memory sampler: https://github.com/coiled/coiled-runtime/blob/80ffc2e2564e19ea4d623b4f5b52975dc9845ef6/conftest.py#L217-L231

@ian-r-rose
Copy link
Contributor

Something like the following would probably do the trick:

diff --git a/conftest.py b/conftest.py
index a9ae89a..9528444 100644
--- a/conftest.py
+++ b/conftest.py
@@ -201,17 +201,27 @@ def test_run_benchmark(benchmark_db_session, request, testrun_uid):
         benchmark_db_session.commit()
 
 
-@pytest.fixture(scope="function", autouse=True)
+@pytest.fixture(scope="function")
 def benchmark_time(test_run_benchmark):
-    if not test_run_benchmark:
-        yield
-    else:
-        start = time.time()
+    @contextlib.contextmanager
+    def _benchmark_time():
+        if not test_run_benchmark:
+            yield
+        else:
+            start = time.time()
+            yield
+            end = time.time()
+            test_run_benchmark.duration = end - start
+            test_run_benchmark.start = datetime.datetime.utcfromtimestamp(start)
+            test_run_benchmark.end = datetime.datetime.utcfromtimestamp(end)
+
+    return _benchmark_time()
+
+
+@pytest.fixture(scope="function")
+def auto_benchmark_time(benchmark_time):
+    with benchmark_time:
         yield
-        end = time.time()
-        test_run_benchmark.duration = end - start
-        test_run_benchmark.start = datetime.datetime.utcfromtimestamp(start)
-        test_run_benchmark.end = datetime.datetime.utcfromtimestamp(end)
 

Note that the current benchmark_time is currently an autouse fixture, so we'd need to switch it to be explicitly included where appropriate (or autoused at the module scope).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants