
Dask backend: Create a separate cluster for group automatically #4521

Closed
10 tasks done
Tracked by #4159
sanderegg opened this issue Jul 21, 2023 · 2 comments
Labels
a:dask-service (Any of the dask services: dask-scheduler/sidecar or worker)
a:director-v2 (issue related with the director-v2 service)

@sanderegg
Member

sanderegg commented Jul 21, 2023

Currently one has to manually create a separate machine in AWS where:

  • a docker swarm is created
  • an osparc-dask-gateway is started

Once this is done, the user can register the osparc-dask-gateway in the oSparc GUI and send computational jobs to that cluster.
After use, the cluster is terminated manually.

This needs to be automated. Ideally, the cluster should remain up for some time after its last use.

Tasks

  1. a:clusters-keeper
    sanderegg
  2. a:clusters-keeper
    sanderegg
  3. a:services-library
    sanderegg
  4. a:services-library
    sanderegg
  5. a:clusters-keeper a:director-v2
    sanderegg
  6. a:clusters-keeper a:services-library
    sanderegg
  7. a:database
    sanderegg
  8. a:webserver
    sanderegg
  9. a:clusters-keeper
    sanderegg
  10. a:apiserver
    sanderegg
sanderegg changed the title from "Create a separate cluster for group automatically" to "Dask backend: Create a separate cluster for group automatically" on Jul 21, 2023
sanderegg self-assigned this on Jul 21, 2023
sanderegg added the a:director-v2 (issue related with the director-v2 service) and a:dask-service (Any of the dask services: dask-scheduler/sidecar or worker) labels on Jul 21, 2023
sanderegg added this to the Sundae milestone on Jul 21, 2023
@sanderegg
Member Author

sanderegg commented Jul 21, 2023

Proposal

  • Create a clusters-manager service in osparc-simcore

    • responsible for creating and terminating external clusters in AWS
    • a cluster may be auto-managed (with autoscaling) or of fixed size?
    • a cluster is made of at least 1 computer
    • a cluster is a docker swarm
    • a cluster has a dask-gateway running on the computer
    • a cluster shall have access to the osparc registry and dockerhub (need the correct network setup)
    • a cluster shall be shut down when it is no longer used (heartbeat? check whether it is still in use?)
  • the oSparc GUI shall allow creating the cluster (in the current clusters preferences?)

    • the user shall choose the instance type (small, medium, big) and the number of instances (or the maximum number of instances?)
    • the user can share the cluster (already available, I think)
    • the ID of the cluster shall be visible somewhere, for use with the PublicAPI
  • similar options shall be available in the PublicAPI
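The proposal leaves several options open: fixed versus autoscaled size, instance type, sharing, and a visible cluster ID. One way to pin them down is a request model such as the minimal sketch below. It only encodes the choices listed above; every name in it (`ClusterRequest`, `InstanceSize`, `ScalingMode`, `machine_range`, `shared_with_groups`) is hypothetical and not taken from osparc-simcore.

```python
from dataclasses import dataclass, field
from enum import Enum
import uuid


class InstanceSize(Enum):
    # Hypothetical size tiers mirroring "small, medium, big" from the proposal.
    SMALL = "small"
    MEDIUM = "medium"
    BIG = "big"


class ScalingMode(Enum):
    FIXED = "fixed"            # always run exactly `instances` machines
    AUTOSCALED = "autoscaled"  # scale between 1 and `instances` machines


@dataclass(frozen=True)
class ClusterRequest:
    """Hypothetical payload the GUI or the PublicAPI could send to the clusters-manager."""

    owner_id: int
    instance_size: InstanceSize
    scaling_mode: ScalingMode
    instances: int = 1  # FIXED: exact machine count; AUTOSCALED: upper bound
    shared_with_groups: frozenset[int] = frozenset()
    # Shown to the user so the cluster can be targeted from the PublicAPI.
    cluster_id: str = field(default_factory=lambda: str(uuid.uuid4()))

    def __post_init__(self) -> None:
        # A cluster is made of at least 1 machine (the docker swarm manager).
        if self.instances < 1:
            raise ValueError("a cluster needs at least 1 machine")

    def machine_range(self) -> tuple[int, int]:
        """Return the (min, max) number of machines the manager may run."""
        if self.scaling_mode is ScalingMode.FIXED:
            return (self.instances, self.instances)
        return (1, self.instances)


if __name__ == "__main__":
    request = ClusterRequest(
        owner_id=42,
        instance_size=InstanceSize.MEDIUM,
        scaling_mode=ScalingMode.AUTOSCALED,
        instances=4,
    )
    print(request.machine_range())  # (1, 4)
```

Using a single `instances` field whose meaning depends on the scaling mode is one possible answer to the open "number of instances? or max number of instances?" question; separate fields would work equally well.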

@sanderegg
Member Author

The implementation was completed during The Nameless sprint.

For now, one cluster is created per user/wallet combination. A cluster works as follows:

  • a single machine is created on EC2
  • the machine is created through an RPC call to the clusters-keeper service
  • the machine is the manager of a docker swarm
  • the machine currently runs a dask-scheduler and a dask-worker
  • the clusters-keeper service terminates the machine once it has been unused for more than 5 minutes
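The clusters-keeper code itself is not shown in this issue. The create, use and terminate cycle described above can be illustrated with the minimal in-memory sketch below, assuming that "unused" is tracked as a last-used timestamp per user/wallet pair. EC2 and RPC calls are left out, and all names (`ClustersKeeperSketch`, `get_or_create_cluster`, `heartbeat`, `terminate_idle`) are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone

# One cluster per (user_id, wallet_id) combination.
ClusterKey = tuple[int, int]


@dataclass
class ClustersKeeperSketch:
    """Hypothetical in-memory model of the create / use / terminate cycle.

    Real EC2 calls and RPC plumbing are deliberately left out.
    """

    idle_timeout: timedelta = timedelta(minutes=5)
    last_used: dict[ClusterKey, datetime] = field(default_factory=dict)

    def get_or_create_cluster(self, user_id: int, wallet_id: int, now: datetime) -> ClusterKey:
        # Stands in for the RPC handler: the first request for a user/wallet
        # pair would launch a machine; later requests reuse it. Either way,
        # the request counts as usage.
        key = (user_id, wallet_id)
        self.last_used[key] = now
        return key

    def heartbeat(self, key: ClusterKey, now: datetime) -> None:
        # Record that a known cluster is still in use (e.g. it has running jobs).
        if key in self.last_used:
            self.last_used[key] = now

    def terminate_idle(self, now: datetime) -> list[ClusterKey]:
        # Periodic sweep: forget ("terminate") every cluster unused for longer than the timeout.
        idle = [key for key, seen in self.last_used.items() if now - seen > self.idle_timeout]
        for key in idle:
            del self.last_used[key]
        return idle


if __name__ == "__main__":
    t0 = datetime(2023, 7, 21, 12, 0, tzinfo=timezone.utc)
    keeper = ClustersKeeperSketch()
    a = keeper.get_or_create_cluster(user_id=1, wallet_id=10, now=t0)
    keeper.get_or_create_cluster(user_id=1, wallet_id=11, now=t0)
    keeper.heartbeat(a, now=t0 + timedelta(minutes=4))
    print(keeper.terminate_idle(now=t0 + timedelta(minutes=6)))  # [(1, 11)]
```

Passing `now` explicitly keeps the idle check deterministic and easy to test; a real service would read the clock and would need some concrete signal of usage, such as the heartbeat or "is it still in use?" check mentioned in the proposal.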
