Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add HTML Repr for ProcessInterface Class and all its subclasses #5181

Merged
merged 22 commits into from
Aug 17, 2021

Conversation

freyam
Copy link
Contributor

@freyam freyam commented Aug 5, 2021

  • Closes #xxxx
  • Tests added / passed
  • Passes black distributed / flake8 distributed / isort distributed

I have added the HTML Repr for ProcessInterface Class and all its subclasses.

Before

proc would call the __repr__().

image

After

image

image

/cc @martindurant @GenevieveBuckley

@freyam
Copy link
Contributor Author

freyam commented Aug 5, 2021

This is a Draft PR as of now as I have not figured out some key visual details in this representation.

I am working on the following at the moment:

  1. Getting more examples of this class objects and it's real world usage.
  2. Show an SVG image depending on the status
  3. Add more relevant data
  4. Maybe also add information about the client + cluster as well (good idea?)

distributed/deploy/spec.py Outdated Show resolved Hide resolved
distributed/deploy/spec.py Outdated Show resolved Hide resolved
distributed/deploy/spec.py Outdated Show resolved Hide resolved
@freyam
Copy link
Contributor Author

freyam commented Aug 5, 2021

@jrbourbeau Could you share an example code of ProcessInterface object with which I could play around?
I want to see the address and the external_address filled and different stages of status?

@GenevieveBuckley
Copy link
Contributor

GenevieveBuckley commented Aug 5, 2021

Feedback

The most important thing about this class is whether the process is created, running, or closed. It might be worth highlighting this information with colours and/or icons, maybe as a line before the table starts. (Also, this class doesn't have a long list of attributes, so you don't have to stick with a table at all if you don't want to).

Example code

One good strategy for finding examples is to look at the test cases for the thing you're interested in (eg: search for ProcessInterface across the repository and look at where this appears in tests. These won't necessarily always be very realistic examples, but it will give you a good starting point.

async def test_spec_process():
proc = ProcessInterface()
assert proc.status == Status.created
await proc
assert proc.status == Status.running
await proc.close()
assert proc.status == Status.closed

For this particular class, we can see that the process gets created, starts running, then is closed. You could make up your own example that moves through each stage of that process, and looks at the HMTL repr between each stage.

Optional fun, extending to ssh.py

(These suggestions are probably best to go in a new PR, which depend on this PR being merged first)

Also, you might want to do the same thing with the Process class in ssh.py

class Process(ProcessInterface):

You can try to use tricks like super()._repr_html_() and then build extra stuff around that html. You can see me do this kind of trick for the high level graph html reprs here: https://github.com/dask/dask/pull/7763/files

There are two tricks to pay attention to:

  1. I use super()._repr_html() so we don't have to duplicate code to represent the Layers
  2. I also moved the layer_info_dict into its own function, because I wanted an easier way to be able to add/remove information before converting it to a HTML string.

Potentially, you could also look at some of the other classes that inherit from Process, like Worker or Scheduler although these will be a bit more complicated with more information.

@freyam
Copy link
Contributor Author

freyam commented Aug 7, 2021

How about I colour the box next to the title based on the current value of the status?

It also matches the HTML Representations of the other classes such as client, cluster, worker, etc.
I followed @jacobtomlinson's work over at #4857 to add the box.

Color Guide

if self.status == Status.created: # Green
    status = "Created"
    bg_color = "#c7f9cc"
    border_color = "#78c6a3"
elif self.status == Status.running: # Blue
    status = "Running"
    bg_color = "#caf0f8"
    border_color = "#48cae4"
elif self.status == Status.closed: # Red
    status = "Closed"
    bg_color = "#ffbfad"
    border_color = "#ff6132"

Demo

image
image

@freyam
Copy link
Contributor Author

freyam commented Aug 7, 2021

Now, I am trying to figure out a way to represent the Process, Worker, and Scheduler (in a different PR) using the steps followed in #7763 and #4857.

It would be like a dropdown menu and it would list all the Workers and Schedulers and they would have their own _repr_html_s.

Any suggestions for this PR till then?

@freyam freyam marked this pull request as ready for review August 7, 2021 20:24
@freyam
Copy link
Contributor Author

freyam commented Aug 8, 2021

Would it make more sense if:

  1. Created: Blue
  2. Running: Green
  3. Closed Red
    instead?

@freyam
Copy link
Contributor Author

freyam commented Aug 8, 2021

Demo

image

@GenevieveBuckley
Copy link
Contributor

Those colours look great!

@GenevieveBuckley
Copy link
Contributor

Any suggestions for this PR till then?

From your screenshots above I think we've lost the address and external_address information, can you try to add that in a tiny table below?

Other than that no, I don't have more suggestions. It looks good!

@freyam
Copy link
Contributor Author

freyam commented Aug 9, 2021

New

image

Copy link
Member

@jacobtomlinson jacobtomlinson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good start @freyam. I really like the colours used to represent the state.

The ProcessInterface object will never be created directly, it is an abstract base class. When it is implemented there is usually one subclass that contains the logic for starting the process and then Worker and Scheduler subclasses on top of that. So we need to put some thought into the actual classes that users will want to inspect.

In SSHCluster it will be distributed.deploy.ssh.Process which starts a subprocess that calls SSH. For the worker and scheduler the SSH commands will be different, but they will have consistent properties like the process PID.

In a package like dask-cloudprovider you will have AzureVMCluster where the process interface implementations will be dask_cloudprovider.azure.azurevm.AzureVM. This class represents a virtual machine running on Azure which will have a whole bunch of properties such as an ID, instance type, disk size, admin username and password, etc.

So for all the objects that use ProcessInterface as a base they will either be a scheduler or a worker and will have some number of properties that we will want to be displayed.

position: absolute;"
></div>
<div style="margin-left: 48px">
<h3 style="margin-bottom: 0px">Process Interface</h3>
Copy link
Member

@jacobtomlinson jacobtomlinson Aug 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This class is an interface, it will never be created directly. So we probably want to ensure the name here is the name of the class and not Process Interface as that phrase means nothing to the user.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haven't checked, but I think you should be able to grab that information with self.__class__.__name__ @freyam

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, if we will never directly use a ProcessInterface class, but instead always use it through the Worker or Scheduler classes, then perhaps it doesn't make sense to have a heading here.
Instead we might want a html string that can be returned by super()._repr_html_() and incorporated into the other class representations (#4857)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For clarity when I say Scheduler and Worker classes I don't mean distributed.worker.Worker. I mean something like distributed.deploy.ssh.Worker which is something different.

distributed/deploy/spec.py Show resolved Hide resolved
@GenevieveBuckley
Copy link
Contributor

@jacobtomlinson Freyam and I are a little confused still about where the access point for the HTML representations is likely to be. Do you intend that the distributed client _repr_html_ should show information about the process there? This seems like the main source of information for most people about their system #4857

@jacobtomlinson
Copy link
Member

All SpecCluster based objects have a scheduler and workers properties which contain the objects that we are discussing here.

So for a SpecCluster implementation like SSHCluster I would expect something like this.

image

from dask.distributed import LocalCluster, Client, SSHCluster
cluster = SSHCluster(["127.0.0.1", "127.0.0.1", "127.0.0.1"])
cluster

cluster.scheduler

cluster.workers

cluster.workers[0]

@jacobtomlinson
Copy link
Member

jacobtomlinson commented Aug 10, 2021

So to answer your question no I wouldn't expect these to be a part of the client/cluster/scheduler info reprs.

I agree this is a little confusing though.

If we think about a single worker in this example there are four different places where worker info exists.

  • On the worker system that I am SSHing into there is an instance of distributed.worker.Worker.
  • On my local system there is an instance of distributed.deploy.ssh.Worker which manages the SSH subprocess. This is a subclass of ProcessInterface which is being discussed in this PR.
  • On the scheduler system there is an instance of distributed.scheduler.Scheduler which manages a dictionary of scheduler_info that is kept in sync by the worker heartbeat.
  • On my local system both the SSHCluster and Client objects have a copy of the scheduler_info dictionary from the Scheduler via the RPC.

Today the HTML repr which shows scheduler and worker info is on that scheduler_info object. This is because the scheduler_info is the simplest way of getting and showing this information to the user.

There are some things to think about here:

  • Do the Worker and ProcessInterface objects have all the same information about the workers that scheduler_info does? (I think no)
  • Should distributed.worker.Worker and distributed.deploy.ssh.Worker have reprs that look like the worker dropdowns in scheduler_info?
  • Should Cluster and Client reuse the reprs from the Worker objects instead of creating its own representation for them?

@freyam freyam marked this pull request as draft August 12, 2021 09:42
@GenevieveBuckley
Copy link
Contributor

Ok, here's what we're going to do.

@jacobtomlinson could you open a new issue to discuss consolidating the worker information #5181 (comment)

@freyam (if time permits) is going to use super()._repr_html() to make this show up for the distributed.deploy.ssh.Worker class. That will be the example with cluster.workers[0] in Jacob's notebook here #5181 (comment)

@freyam
Copy link
Contributor Author

freyam commented Aug 13, 2021

Added HTML Reprs for distributed.deploy.ssh.Worker and distributed.deploy.ssh.Scheduler by calling super()._repr_html().

@freyam
Copy link
Contributor Author

freyam commented Aug 16, 2021

Added ✔️

@freyam
Copy link
Contributor Author

freyam commented Aug 17, 2021

Added ✅

@freyam freyam changed the title Add HTML Repr for ProcessInterface Class and all its subclasses. Add HTML Repr for ProcessInterface Class and all its subclasses Aug 17, 2021
@freyam
Copy link
Contributor Author

freyam commented Aug 17, 2021

Sorry for the messy commit history. It should all be fine now. ✔️

Copy link
Member

@jacobtomlinson jacobtomlinson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome!

@jacobtomlinson
Copy link
Member

CI failures appear unrelated. I'm going to get this in.

@jacobtomlinson jacobtomlinson merged commit 3d8ed5f into dask:main Aug 17, 2021
@freyam
Copy link
Contributor Author

freyam commented Aug 17, 2021

💛

@freyam freyam deleted the process-interface-html branch August 17, 2021 20:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants