Added Scheduler and Worker memory tracking #2847

Closed
wants to merge 5 commits into from


TomAugspurger
Member

Part 1 of #2602.

This adds

  • Scheduler.task_net_nbytes
  • Worker.net_nbytes

modeled after task_duration / durations (with the primary difference that Worker.net_nbytes is tracked at the prefix level, rather than the task level).

The basic idea is to learn a per-prefix measure of the net memory usage of running a task. We already know

  1. the memory usage of a task's output
  2. the memory usage of each dependency

Now we just do a bit of arithmetic when we complete a task and when we release dependencies.
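
As a rough illustration of that arithmetic (a sketch only, not the code in this PR; `NetBytesTracker` and its method names are hypothetical), the per-prefix bookkeeping could look something like this:

```python
from collections import defaultdict


class NetBytesTracker:
    """Hypothetical per-prefix tracker; not the PR's actual implementation."""

    def __init__(self):
        self.total = defaultdict(int)  # prefix -> summed net bytes
        self.count = defaultdict(int)  # prefix -> number of completed tasks

    def task_finished(self, prefix, output_nbytes):
        # Completing a task adds its output to memory.
        self.total[prefix] += output_nbytes
        self.count[prefix] += 1

    def dependency_released(self, prefix, dep_nbytes):
        # A dependency released because a `prefix` task completed frees memory.
        self.total[prefix] -= dep_nbytes

    def net_nbytes(self, prefix):
        # Average net memory change per completed task with this prefix.
        completed = self.count.get(prefix, 0)
        if completed == 0:
            return 0
        return self.total[prefix] / completed
```

A negative value would suggest that tasks with that prefix tend to free more memory than they produce.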

This information is passed to workers to aid with scheduling, but that part will be done in a separate PR.

@mrocklin
Member

Thinking about this from a diagnostics perspective, I wonder if the thing to track might instead be the expected size in bytes of each key prefix rather than the delta. My guess is that this will be more generally useful for a variety of applications.

We might then derive the delta information from this information on the fly for the specific use case of memory-aware scheduling.

Sometimes when thinking about scheduler state I find myself also being motivated by what would look good in a dashboard. This is a rather shallow objective on its own, but the information for good automated task scheduling can look surprisingly like the information for good human consumption.

Thoughts @TomAugspurger ?

@TomAugspurger
Member Author

I think tracking prefix-level memory usage, and then deriving the delta, was my initial path. It's been a little while, but IIRC that ran into a problem with knowing whether completing a task actually freed the memory of that task's dependencies.

        a1
       / |
      b1 |
       \ |
        c1

It's not clear (to me) how we would know that completing b isn't responsible for freeing a. We could probably track some kind of task_frees: Dict[Tuple[prefix_1, prefix_2], int] that keeps a count of how often completing a task with prefix_1 caused the release of a task with prefix_2? I'm not sure it's clearer (though it would likely be more useful for diagnostics).
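
To make the task_frees idea concrete, here is a minimal sketch under some assumptions: the graph is given as a plain dict of dependencies, a key is released as soon as its last dependent completes, and `count_frees` and `prefix_of` are hypothetical helpers rather than anything in distributed.

```python
from collections import Counter


def count_frees(dependencies, order, prefix_of):
    """Count how often completing a task of one prefix releases another.

    dependencies: dict mapping each key to the set of keys it depends on
    order: the order in which keys complete
    prefix_of: function mapping a key to its prefix (assumed helper)
    """
    # Number of dependents that still need each key.
    waiting = Counter()
    for deps in dependencies.values():
        for dep in deps:
            waiting[dep] += 1

    task_frees = Counter()
    for key in order:
        for dep in dependencies.get(key, ()):
            waiting[dep] -= 1
            if waiting[dep] == 0:
                # `key` was the last dependent, so it triggers the release.
                task_frees[prefix_of(key), prefix_of(dep)] += 1
    return task_frees


# The a1 / b1 / c1 graph from above, executed top to bottom.
graph = {"a1": set(), "b1": {"a1"}, "c1": {"a1", "b1"}}
frees = count_frees(graph, ["a1", "b1", "c1"], lambda k: k.rstrip("0123456789"))
```

In this graph only c1 releases anything, since a1 is still needed by c1 after b1 completes, so the pair (b, a) is never counted.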

@mrocklin
Member

I think I mentioned this earlier but have since forgotten what the response was. For any particular task we could probably include a fractional hit for every dependency, based on the number of its dependents.

So in the case above, b would be half responsible for freeing a (assuming top-to-bottom execution), so when we compute whether a task is memory-freeing, b would get half of the weight of a.
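
A small sketch of that fractional weighting, assuming we know each key's size in bytes and its dependencies (`fractional_net_nbytes` is a hypothetical name, and the byte sizes are made up for illustration):

```python
from collections import Counter


def fractional_net_nbytes(dependencies, nbytes):
    """Net bytes per key, crediting each dependent an equal share of a release.

    dependencies: dict mapping each key to the set of keys it depends on
    nbytes: dict mapping each key to the size of its output in bytes
    """
    # How many dependents share responsibility for freeing each key.
    n_dependents = Counter()
    for deps in dependencies.values():
        for dep in deps:
            n_dependents[dep] += 1

    net = {}
    for key, deps in dependencies.items():
        # Each dependency contributes its size divided by its dependent count.
        freed = sum(nbytes[dep] / n_dependents[dep] for dep in deps)
        net[key] = nbytes[key] - freed
    return net


# The a1 / b1 / c1 graph from above, with illustrative sizes.
graph = {"a1": set(), "b1": {"a1"}, "c1": {"a1", "b1"}}
sizes = {"a1": 100, "b1": 10, "c1": 10}
net = fractional_net_nbytes(graph, sizes)
```

With these sizes, b1 gets credit for half of a1 and comes out as memory-freeing, even though in practice a1 is only released once c1 completes.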

@TomAugspurger
Member Author

I'm not actively working on this at the moment. Closing to clear the backlog.
