[CHORE] Refactor RayRunner so that we can add tracing #3163

jaychia · 2024-11-01T22:23:11Z

This PR refactors the RayRunner so that it is easier to add tracing.

I also added more docstrings to make it clearer what is happening.

Code Changes

I've highlighted the code changes below for easier review.

I removed the next_step state and instead return a new has_next value from self._construct_dispatch_batch. This cleans up the code because iteration over the physical plan now happens ONLY inside self._construct_dispatch_batch, instead of being scattered across the scheduling loop.
I cleaned up the logic in self._await_tasks: it first calls ray.wait with num_returns=1 and timeout=None to block until any task completes, and then calls ray.wait again on all tasks with timeout=0.01 to retrieve the tasks that are ready.
I pulled self._is_active and self._place_in_queue out into methods, instead of them being locally defined functions.

codspeed-hq · 2024-11-01T22:34:10Z

CodSpeed Performance Report

Merging #3163 will not alter performance

Comparing jay/rayrunner-refactor (850f3db) with main (3cef614)

Summary

✅ 17 untouched benchmarks

jaychia · 2024-11-01T22:33:15Z

daft/runners/ray_runner.py

-                        if dispatches_allowed == 0 or next_step is None:
+                        # Break the dispatch batching/dispatch loop if no more dispatches allowed, or physical plan
+                        # needs work for forward progress
+                        if dispatches_allowed == 0 or not has_next:


NOTE: behavior change here. We now use a has_next value, returned from self._construct_dispatch_batch, to decide whether we should break out of the DispatchBatching -> Dispatch loop.

Previously the scheduling loop had to keep track of next_step, which was super weird and unwieldy and caused us to call next(tasks) in a bunch of random places scattered throughout the loop.
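To illustrate the has_next contract, here is a minimal Python sketch. It assumes the physical plan can be modeled as a plain iterator that yields tasks, yields None when it needs in-flight work to finish before it can make forward progress, and raises StopIteration when exhausted. The names construct_dispatch_batch and dispatch_loop, the string "tasks", and the batch-size handling are hypothetical simplifications, not the actual RayRunner code.

```python
from typing import Iterator, List, Optional, Tuple

# Hypothetical stand-in: a "task" is just a string in this sketch.
Task = str


def construct_dispatch_batch(
    tasks: Iterator[Optional[Task]], max_batch_size: int
) -> Tuple[List[Task], bool]:
    """Pull up to max_batch_size tasks from the plan.

    This is the only place that calls next(tasks). has_next is False when the plan
    is exhausted or yields None (it needs in-flight work to finish first); True means
    the batch filled up and the plan may still have ready tasks.
    """
    batch: List[Task] = []
    while len(batch) < max_batch_size:
        try:
            task = next(tasks)
        except StopIteration:
            return batch, False
        if task is None:
            return batch, False
        batch.append(task)
    return batch, True


def dispatch_loop(
    tasks: Iterator[Optional[Task]], dispatches_allowed: int, max_batch_size: int
) -> List[Task]:
    dispatched: List[Task] = []
    while True:
        batch, has_next = construct_dispatch_batch(
            tasks, min(max_batch_size, dispatches_allowed)
        )
        dispatched.extend(batch)  # stand-in for submitting the batch to Ray
        dispatches_allowed -= len(batch)
        # Break if no more dispatches are allowed, or the plan needs work for forward progress
        if dispatches_allowed == 0 or not has_next:
            break
    return dispatched


plan = iter(["a", "b", "c", None, "d"])
print(dispatch_loop(plan, dispatches_allowed=10, max_batch_size=2))  # ['a', 'b', 'c']
```

Keeping every next(tasks) call inside construct_dispatch_batch means the outer loop only has to look at has_next and its own dispatch budget; "d" stays in the plan until a later iteration of the scheduling loop.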

jaychia · 2024-11-01T22:35:18Z

daft/runners/ray_runner.py

+            num_returns=1,
+            timeout=None,
+            fetch_local=False,
+        )


NOTE: slight behavior change here compared to previous awaiting logic

I call ray.wait here with timeout=None but discard its outputs, just to block until one item is ready.

Then I subsequently call ray.wait again with a 0.01 timeout to actually retrieve a batch of ready tasks.

I think this logic is a little easier to follow, and it gets rid of the weird loop over ("next_one", "next_batch") that we had earlier. It also shouldn't have much of a performance impact.
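The two-phase wait can be sketched without a Ray cluster by letting the standard library's concurrent.futures.wait stand in for ray.wait (a swapped-in API, chosen only so the sketch runs locally). Phase 1 blocks with no timeout until any one task finishes, mirroring the num_returns=1, timeout=None call above; phase 2 does a short 0.01-second wait over every in-flight task to collect whatever is already done. The await_tasks name is hypothetical.

```python
import time
from concurrent.futures import FIRST_COMPLETED, Future, ThreadPoolExecutor, wait
from typing import List, Set, Tuple


def await_tasks(inflight: List[Future]) -> Tuple[Set[Future], Set[Future]]:
    """Block until at least one task finishes, then collect every task that is ready."""
    if not inflight:
        return set(), set()
    # Phase 1: wait with no timeout until any single task completes, and discard the
    # result (analogous to ray.wait(..., num_returns=1, timeout=None)).
    wait(inflight, timeout=None, return_when=FIRST_COMPLETED)
    # Phase 2: a short wait over all tasks to gather the batch that is ready by now
    # (analogous to the second ray.wait with timeout=0.01).
    done, not_done = wait(inflight, timeout=0.01)
    return set(done), set(not_done)


with ThreadPoolExecutor(max_workers=2) as pool:
    fast = pool.submit(str, "fast")
    slow = pool.submit(time.sleep, 0.5)
    done, not_done = await_tasks([fast, slow])
    print([f.result() for f in done])
```

The first wait guarantees progress (at least one task is done), while the short second wait batches up any other tasks that finished at around the same time, so the scheduler does not wake up once per task.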

jaychia · 2024-11-01T22:35:54Z

daft/runners/ray_runner.py

-                    self.results_by_df[result_uuid].put(item, timeout=0.1)
-                    break
-                except Full:
-                    pass


Got rid of these weird locally defined callbacks by making them methods, so that I can call them elsewhere without having to pass them around.
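A hypothetical sketch of the closure-to-method change, modeled on the removed lines above: _place_in_queue retries put(item, timeout=0.1) for as long as _is_active reports that the consumer is still listening. The SchedulerSketch class, its active_by_df dict, and the start/stop helpers are assumptions for illustration, not the actual RayRunner fields.

```python
import queue
from typing import Any, Dict


class SchedulerSketch:
    """Hypothetical, simplified stand-in for the scheduler's per-DataFrame state."""

    def __init__(self, maxsize: int = 1) -> None:
        self.results_by_df: Dict[str, queue.Queue] = {}
        self.active_by_df: Dict[str, bool] = {}
        self._maxsize = maxsize

    def start(self, result_uuid: str) -> None:
        self.results_by_df[result_uuid] = queue.Queue(maxsize=self._maxsize)
        self.active_by_df[result_uuid] = True

    def stop(self, result_uuid: str) -> None:
        self.active_by_df[result_uuid] = False

    def _is_active(self, result_uuid: str) -> bool:
        return self.active_by_df.get(result_uuid, False)

    def _place_in_queue(self, result_uuid: str, item: Any) -> None:
        # Retry with a short timeout so that we stop trying (instead of blocking forever)
        # once the consumer is no longer active.
        while self._is_active(result_uuid):
            try:
                self.results_by_df[result_uuid].put(item, timeout=0.1)
                break
            except queue.Full:
                pass


s = SchedulerSketch(maxsize=1)
s.start("df1")
s._place_in_queue("df1", "a")  # queue had room, so this returns immediately
s.stop("df1")
s._place_in_queue("df1", "b")  # queue is full, but the consumer is inactive: returns without blocking
```

Because both helpers are methods, any part of the scheduling loop can call them with just a result_uuid rather than receiving closures as arguments.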

codecov · 2024-11-01T23:00:36Z

Codecov Report

Attention: Patch coverage is 91.42857% with 6 lines in your changes missing coverage. Please review.

Project coverage is 78.61%. Comparing base (3cef614) to head (850f3db).
Report is 5 commits behind head on main.

Files with missing lines	Patch %	Lines
daft/runners/ray_runner.py	91.42%	6 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #3163      +/-   ##
==========================================
- Coverage   79.00%   78.61%   -0.40%     
==========================================
  Files         634      634              
  Lines       76943    77961    +1018     
==========================================
+ Hits        60789    61289     +500     
- Misses      16154    16672     +518

Files with missing lines	Coverage Δ
daft/runners/ray_runner.py	80.95% <91.42%> (-0.31%)	⬇️

... and 7 files with indirect coverage changes

kevinzwang

LGTM

[CHORE] Refactor RayRunner so that we can add tracing

0e30701

github-actions bot added the chore label Nov 1, 2024

More docstrings

6ca2901

jaychia commented Nov 1, 2024


jaychia requested review from kevinzwang and colin-ho and removed request for colin-ho November 1, 2024 22:36

Little nit docstrings

850f3db

kevinzwang approved these changes Nov 5, 2024


jaychia merged commit 64e35f8 into main Nov 5, 2024
42 checks passed

jaychia deleted the jay/rayrunner-refactor branch November 5, 2024 22:39
