Comp backend/clusters creation functionalities #4602

Merged
114 commits
57b9b75
@GitHK review: upper casing and others
sanderegg Aug 9, 2023
268414d
initial RPC calls
sanderegg Aug 10, 2023
e2dce5b
added parse
sanderegg Aug 10, 2023
d97f0e5
added tags for ec2 instances
sanderegg Aug 10, 2023
11eb526
test crating instances over rpc
sanderegg Aug 10, 2023
885809d
added code for background task checking for clusters
sanderegg Aug 10, 2023
f440056
start cluster cleaning task in background
sanderegg Aug 10, 2023
bb72dc7
added default tags
sanderegg Aug 10, 2023
f8e770b
force public IP
sanderegg Aug 10, 2023
e0afe68
better logs
sanderegg Aug 10, 2023
1e4aa90
ensure termination of instances with the correct tags
sanderegg Aug 10, 2023
521de3f
prepare grounds to keep last heartbeat as EC2 tag
sanderegg Aug 10, 2023
0da83fc
ensure rpc is there
sanderegg Aug 11, 2023
d4f2c8e
improved test
sanderegg Aug 11, 2023
d88224c
upgrade aio-pika
sanderegg Aug 11, 2023
2fdb45c
ensure the test passes
sanderegg Aug 11, 2023
0e2490a
fix usage of test
sanderegg Aug 11, 2023
65fa037
test passing again
sanderegg Aug 11, 2023
5ac5957
test ec2 utils complete
sanderegg Aug 14, 2023
060cb15
refactor
sanderegg Aug 14, 2023
9aa004f
more refactor
sanderegg Aug 14, 2023
30a2973
added test for heartbeats
sanderegg Aug 14, 2023
8e577ff
add test for background task
sanderegg Aug 14, 2023
f86db84
separating?
sanderegg Aug 14, 2023
4483019
change of plan
sanderegg Aug 14, 2023
bf05289
adding tests
sanderegg Aug 15, 2023
ee9b993
refactor
sanderegg Aug 15, 2023
c84072f
more refactor and check for last heartbeat
sanderegg Aug 15, 2023
e81e8df
testing of clusters_api
sanderegg Aug 15, 2023
d1eee66
refactor
sanderegg Aug 15, 2023
76dbed6
typo
sanderegg Aug 15, 2023
302f42b
cleanup
sanderegg Aug 15, 2023
397d046
add test
sanderegg Aug 15, 2023
f3f43c1
passing tests
sanderegg Aug 15, 2023
b7a8a81
updated reqs
sanderegg Aug 18, 2023
0d29a30
skip some lines for coverage
sanderegg Aug 18, 2023
40e7bbc
refactor
sanderegg Aug 18, 2023
7845c19
new settings
sanderegg Aug 18, 2023
db70add
100%
sanderegg Aug 18, 2023
e502197
return any cluster
sanderegg Aug 18, 2023
9356e0f
ongoing changes
sanderegg Aug 24, 2023
de529e5
we now have rpc routes
sanderegg Aug 24, 2023
7ae85c2
100%
sanderegg Aug 24, 2023
ccc0c52
ruff
sanderegg Aug 24, 2023
3913b2d
docker-compose to run local against AWS
sanderegg Aug 24, 2023
1eaa13d
makefile for testing
sanderegg Aug 24, 2023
d6482e9
fixing ports
sanderegg Aug 24, 2023
31f65b5
use local image
sanderegg Aug 24, 2023
dcea106
create new model
sanderegg Aug 24, 2023
e445a0f
preparing new rpc entrypoint
sanderegg Aug 24, 2023
720f120
new function
sanderegg Aug 25, 2023
ba26869
ruff
sanderegg Aug 25, 2023
206dbd4
added clusters-keeper for debugging
sanderegg Aug 25, 2023
f72d1d5
added port for live debugging
sanderegg Aug 25, 2023
73898cc
add startup script to initialize the swarm
sanderegg Aug 25, 2023
7d355cc
use docker devel
sanderegg Aug 25, 2023
5e9a12a
improve logging
sanderegg Aug 25, 2023
0c12345
added remote debugging capabilities
sanderegg Aug 25, 2023
046e84d
refactor
sanderegg Aug 25, 2023
64f9dda
revert
sanderegg Aug 25, 2023
5dae981
fix tests
sanderegg Aug 25, 2023
d214427
renaming
sanderegg Aug 25, 2023
8346aff
correct entry
sanderegg Aug 25, 2023
424e56b
fix decorator
sanderegg Aug 25, 2023
2503132
refactor
sanderegg Aug 25, 2023
d9ccda7
ip address not always available
sanderegg Aug 25, 2023
f89f1c8
cleanup
sanderegg Aug 25, 2023
2fa8e1a
refactoring
sanderegg Aug 25, 2023
dd10d4d
cleanup
sanderegg Aug 25, 2023
82fbbb6
renaming
sanderegg Aug 25, 2023
f9626f2
test remote debug
sanderegg Aug 25, 2023
0d159c7
refactor
sanderegg Aug 25, 2023
2ef7d93
additional test
sanderegg Aug 25, 2023
2fd5516
auto jsonize response
sanderegg Aug 25, 2023
3256882
set a heartbeat when asking for a cluster
sanderegg Aug 25, 2023
170d24d
return secretstr correctly
sanderegg Aug 25, 2023
8540048
use settings
sanderegg Aug 25, 2023
f42f2e2
moved startup script to utils
sanderegg Aug 25, 2023
e94b9e9
orjson
sanderegg Aug 25, 2023
0bd430a
fix test
sanderegg Aug 28, 2023
fa366c3
adding logic
sanderegg Aug 28, 2023
2fb1b5b
dependencies upgrade + added dask[distributed]
sanderegg Aug 29, 2023
9d28fcb
added gateway
sanderegg Aug 29, 2023
0ca31b5
added dask module
sanderegg Aug 29, 2023
ea4e659
add doc and use ping
sanderegg Aug 29, 2023
6936e30
add env
sanderegg Aug 29, 2023
07e04ec
simplify
sanderegg Aug 29, 2023
98abd09
ping dask gateway
sanderegg Aug 29, 2023
0fd9ace
ensure latest is correct
sanderegg Aug 29, 2023
eb449f4
we are now able to ping and set a specific password in the started ga…
sanderegg Aug 29, 2023
be9aab2
now the heartbeat is directly linked with the clusters instead of bei…
sanderegg Aug 29, 2023
cd3a458
a bit faster interval for developing
sanderegg Aug 29, 2023
7b9951a
temporary stuff
sanderegg Aug 29, 2023
11bb013
check the mock
sanderegg Aug 29, 2023
32d6114
removed 2 exposed unused rpcs
sanderegg Aug 29, 2023
7b8cd99
use new separate client and rpc router
sanderegg Aug 30, 2023
6cce731
ruff
sanderegg Aug 30, 2023
e843c3a
improve logs
sanderegg Sep 2, 2023
73934e5
improve logs
sanderegg Sep 2, 2023
4c3b7cc
remove testit
sanderegg Sep 2, 2023
afbbf42
add markers
sanderegg Sep 2, 2023
6f4d772
add same ignores in pytest-simcore
sanderegg Sep 2, 2023
2bf2023
added dask-gateway-server for tests
sanderegg Sep 2, 2023
58c1523
moved dask-gateway-server fixture to pytest-simcore
sanderegg Sep 2, 2023
13763b0
moved more fixture
sanderegg Sep 2, 2023
7b5483d
cleanup
sanderegg Sep 2, 2023
d09bf3b
refactor
sanderegg Sep 2, 2023
b660665
test for dask module
sanderegg Sep 2, 2023
94d0549
refactor
sanderegg Sep 2, 2023
66f4599
auth
sanderegg Sep 2, 2023
90a6ca6
authentication better defined
sanderegg Sep 2, 2023
2b3741d
mypy
sanderegg Sep 3, 2023
fcee95f
@pcrespov review: remove useless call
sanderegg Sep 4, 2023
dda0460
@pcrespov review: sort
sanderegg Sep 4, 2023
2 changes: 1 addition & 1 deletion .ruff.toml
@@ -51,7 +51,7 @@ target-version = "py310"


[per-file-ignores]
-"**/tests/**" = [
+"{**/{tests, pytest_simcore}/**}" = [
"T201", # print found
"ARG001", # unused function argument
"PT019", # user pytest.mark.usefixture
13 changes: 13 additions & 0 deletions .vscode/launch.template.json
@@ -46,6 +46,19 @@
      "name": "Python: Remote Attach api-server",
      "type": "python",
      "request": "attach",
      "port": 3015,
      "host": "127.0.0.1",
      "pathMappings": [
        {
          "localRoot": "${workspaceFolder}",
          "remoteRoot": "/devel"
        }
      ]
    },
    {
      "name": "Python: Remote Attach clusters-keeper",
      "type": "python",
      "request": "attach",
      "port": 3006,
      "host": "127.0.0.1",
      "pathMappings": [
104 changes: 104 additions & 0 deletions packages/pytest-simcore/src/pytest_simcore/dask_gateway.py
@@ -0,0 +1,104 @@
# pylint: disable=unused-argument
# pylint: disable=redefined-outer-name

from collections.abc import Callable
from typing import AsyncIterator, NamedTuple

import pytest
import traitlets.config
from dask_gateway import Gateway, GatewayCluster, auth
from dask_gateway_server.app import DaskGateway
from dask_gateway_server.backends.local import UnsafeLocalBackend
from distributed import Client


@pytest.fixture
def local_dask_gateway_server_config(
    unused_tcp_port_factory: Callable,
) -> traitlets.config.Config:
    c = traitlets.config.Config()
    assert isinstance(c.DaskGateway, traitlets.config.Config)
    assert isinstance(c.ClusterConfig, traitlets.config.Config)
    assert isinstance(c.Proxy, traitlets.config.Config)
    assert isinstance(c.SimpleAuthenticator, traitlets.config.Config)
    c.DaskGateway.backend_class = UnsafeLocalBackend
    c.DaskGateway.address = f"127.0.0.1:{unused_tcp_port_factory()}"
    c.Proxy.address = f"127.0.0.1:{unused_tcp_port_factory()}"
    c.DaskGateway.authenticator_class = "dask_gateway_server.auth.SimpleAuthenticator"
    c.SimpleAuthenticator.password = "qweqwe"  # noqa: S105
    c.ClusterConfig.worker_cmd = [
        "dask-worker",
        "--resources",
        f"CPU=12,GPU=1,RAM={16e9}",
    ]
    # NOTE: This must be set such that the local unsafe backend creates a worker with enough cores/memory
    c.ClusterConfig.worker_cores = 12
    c.ClusterConfig.worker_memory = "16G"
    c.ClusterConfig.cluster_max_workers = 3

    c.DaskGateway.log_level = "DEBUG"
    return c
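
A small detail of the `worker_cmd` above is easier to see when evaluated: `16e9` is a float literal, so the f-string renders the RAM amount with a trailing `.0`. The sketch below shows the rendered `--resources` value and splits it back into a dict. `parse_resources` is a hypothetical helper written for this illustration; it is not part of dask or of this PR.

```python
def parse_resources(spec: str) -> dict[str, float]:
    """Split a 'KEY=value,KEY=value' string into a dict of floats (hypothetical helper)."""
    resources: dict[str, float] = {}
    for item in spec.split(","):
        key, _, value = item.partition("=")
        resources[key.strip()] = float(value)
    return resources


# Same expression as in the fixture: 16e9 is a float, so it is rendered as "16000000000.0"
resources_spec = f"CPU=12,GPU=1,RAM={16e9}"
parsed = parse_resources(resources_spec)
print(resources_spec)
print(parsed)
```
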


class DaskGatewayServer(NamedTuple):
    address: str
    proxy_address: str
    password: str
    server: DaskGateway


@pytest.fixture
async def local_dask_gateway_server(
    local_dask_gateway_server_config: traitlets.config.Config,
) -> AsyncIterator[DaskGatewayServer]:
    print("--> creating local dask gateway server")
    dask_gateway_server = DaskGateway(config=local_dask_gateway_server_config)
    dask_gateway_server.initialize([])  # that is a shitty one!
    print("--> local dask gateway server initialized")
    await dask_gateway_server.setup()
    await dask_gateway_server.backend.proxy._proxy_contacted  # pylint: disable=protected-access

    print("--> local dask gateway server setup completed")
    yield DaskGatewayServer(
        f"http://{dask_gateway_server.backend.proxy.address}",
        f"gateway://{dask_gateway_server.backend.proxy.tcp_address}",
        local_dask_gateway_server_config.SimpleAuthenticator.password,  # type: ignore
        dask_gateway_server,
    )
    print("--> local dask gateway server switching off...")
    await dask_gateway_server.cleanup()
    print("...done")


@pytest.fixture
async def dask_gateway(
    local_dask_gateway_server: DaskGatewayServer,
) -> Gateway:
    async with Gateway(
        local_dask_gateway_server.address,
        local_dask_gateway_server.proxy_address,
        asynchronous=True,
        auth=auth.BasicAuth("pytest_user", local_dask_gateway_server.password),
    ) as gateway:
        print(f"--> {gateway=} created")
        cluster_options = await gateway.cluster_options()
        gateway_versions = await gateway.get_versions()
        clusters_list = await gateway.list_clusters()
        print(f"--> {gateway_versions=}, {cluster_options=}, {clusters_list=}")
        for option in cluster_options.items():
            print(f"--> {option=}")
        return gateway


@pytest.fixture
async def dask_gateway_cluster(dask_gateway: Gateway) -> AsyncIterator[GatewayCluster]:
    async with dask_gateway.new_cluster() as cluster:
        yield cluster


@pytest.fixture
async def dask_gateway_cluster_client(
    dask_gateway_cluster: GatewayCluster,
) -> AsyncIterator[Client]:
    async with dask_gateway_cluster.get_client() as client:
        yield client
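
Most fixtures in this file `yield` their resource, so teardown runs after the test finishes, while `dask_gateway` hands its `Gateway` back with `return`, which means the `async with` block has already exited by the time a test receives the object. The standard-library sketch below illustrates the general difference between the two patterns. `FakeResource` is a hypothetical stand-in, and whether a real `Gateway` remains usable after its context exits is not verified here.

```python
import asyncio
from collections.abc import AsyncIterator
from contextlib import asynccontextmanager


class FakeResource:
    """Hypothetical stand-in for a client that is closed when its context exits."""

    def __init__(self) -> None:
        self.closed = False

    async def __aenter__(self) -> "FakeResource":
        return self

    async def __aexit__(self, *exc_info: object) -> None:
        self.closed = True


async def returned_resource() -> FakeResource:
    # 'return' leaves the 'async with' block, so __aexit__ runs before the caller gets the object
    async with FakeResource() as resource:
        return resource


@asynccontextmanager
async def yielded_resource() -> AsyncIterator[FakeResource]:
    # 'yield' suspends inside the block, so __aexit__ runs only once the caller is done
    async with FakeResource() as resource:
        yield resource


async def main() -> tuple[bool, bool]:
    returned = await returned_resource()
    closed_when_returned = returned.closed
    async with yielded_resource() as yielded:
        closed_when_yielded = yielded.closed
    return closed_when_returned, closed_when_yielded


closed_when_returned, closed_when_yielded = asyncio.run(main())
print(closed_when_returned, closed_when_yielded)
```
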
21 changes: 21 additions & 0 deletions services/clusters-keeper/.env-devel
@@ -0,0 +1,21 @@
CLUSTERS_KEEPER_DEBUG=true
CLUSTERS_KEEPER_LOGLEVEL=INFO
CLUSTERS_KEEPER_MAX_MISSED_HEARTBEATS_BEFORE_CLUSTER_TERMINATION=60
CLUSTERS_KEEPER_TASK_INTERVAL=30
EC2_ACCESS_KEY_ID=XXXXXXXXXX
EC2_INSTANCES_ALLOWED_TYPES="[\"t2.micro\"]"
EC2_INSTANCES_AMI_ID=XXXXXXXXXX
EC2_INSTANCES_KEY_NAME=XXXXXXXXXX
EC2_INSTANCES_SECURITY_GROUP_IDS=XXXXXXXXXX
EC2_INSTANCES_SUBNET_ID=XXXXXXXXXX
EC2_SECRET_ACCESS_KEY=XXXXXXXXXX
LOG_FORMAT_LOCAL_DEV_ENABLED=True
RABBIT_HOST=rabbit
RABBIT_PASSWORD=test
RABBIT_PORT=5672
RABBIT_SECURE=false
RABBIT_USER=test
REDIS_HOST=redis
REDIS_PORT=6379
SC_BOOT_MODE=debug-ptvsd
SC_BUILD_TARGET=development
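
As a rough illustration of how two of these values might be consumed: `EC2_INSTANCES_ALLOWED_TYPES` is a JSON-encoded list, and the missed-heartbeats limit suggests a staleness check on each cluster's last heartbeat. The sketch below is an assumption, not the service's code: `is_cluster_stale` and `heartbeat_interval` are hypothetical names, and the excerpt does not show which interval the service actually multiplies by the limit.

```python
import json
from datetime import datetime, timedelta, timezone

# Values copied from .env-devel (surrounding quotes and escapes already removed by the env loader)
env = {
    "CLUSTERS_KEEPER_MAX_MISSED_HEARTBEATS_BEFORE_CLUSTER_TERMINATION": "60",
    "EC2_INSTANCES_ALLOWED_TYPES": '["t2.micro"]',
}

# List-valued settings are written as JSON, so they decode with json.loads
allowed_types: list[str] = json.loads(env["EC2_INSTANCES_ALLOWED_TYPES"])
max_missed = int(env["CLUSTERS_KEEPER_MAX_MISSED_HEARTBEATS_BEFORE_CLUSTER_TERMINATION"])


def is_cluster_stale(
    last_heartbeat: datetime,
    now: datetime,
    heartbeat_interval: timedelta,
    max_missed_heartbeats: int,
) -> bool:
    """ASSUMED rule: stale once more than max_missed_heartbeats intervals have elapsed."""
    return now - last_heartbeat > max_missed_heartbeats * heartbeat_interval


now = datetime(2023, 8, 29, 12, 0, tzinfo=timezone.utc)
heartbeat_interval = timedelta(seconds=5)  # hypothetical value, not taken from the PR
stale = is_cluster_stale(now - timedelta(minutes=10), now, heartbeat_interval, max_missed)
fresh = is_cluster_stale(now - timedelta(minutes=1), now, heartbeat_interval, max_missed)
print(allowed_types, stale, fresh)
```
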
14 changes: 14 additions & 0 deletions services/clusters-keeper/Makefile
@@ -3,3 +3,17 @@
#
include ../../scripts/common.Makefile
include ../../scripts/common-service.Makefile

.env: .env-devel ## creates .env file from defaults in .env-devel
	$(if $(wildcard $@), \
	@echo "WARNING ##### $< is newer than $@ ####"; diff -uN $@ $<; false;,\
	@echo "WARNING ##### $@ does not exist, cloning $< as $@ ############"; cp $< $@)


.PHONY: test-local
up-devel: .env ## starts local test application (running bare metal against AWS)
	# setting up dependencies
	@docker compose up

down: .env ## stops local test app dependencies (running bare metal against AWS)
	-@docker compose down
51 changes: 51 additions & 0 deletions services/clusters-keeper/docker-compose.yml
@@ -0,0 +1,51 @@
version: "3.8"
services:
  rabbit:
    image: itisfoundation/rabbitmq:3.11.2-management
    init: true
    ports:
      - "5672:5672"
      - "15672:15672"
      - "15692"
    environment:
      - RABBITMQ_DEFAULT_USER=${RABBIT_USER}
      - RABBITMQ_DEFAULT_PASS=${RABBIT_PASSWORD}
    healthcheck:
      # see https://www.rabbitmq.com/monitoring.html#individual-checks for info about health-checks available in rabbitmq
      test: rabbitmq-diagnostics -q status
      interval: 5s
      timeout: 30s
      retries: 5
      start_period: 5s

  redis:
    image: "redis:6.2.6@sha256:4bed291aa5efb9f0d77b76ff7d4ab71eee410962965d052552db1fb80576431d"
    init: true
    ports:
      - "6379:6379"
    healthcheck:
      test: [ "CMD", "redis-cli", "ping" ]
      interval: 5s
      timeout: 30s
      retries: 50

  redis-commander:
    image: rediscommander/redis-commander:latest
    init: true
    ports:
      - "18081:8081"
    environment:
      - REDIS_HOSTS=resources:${REDIS_HOST}:${REDIS_PORT}:0,locks:${REDIS_HOST}:${REDIS_PORT}:1,validation_codes:${REDIS_HOST}:${REDIS_PORT}:2,scheduled_maintenance:${REDIS_HOST}:${REDIS_PORT}:3,user_notifications:${REDIS_HOST}:${REDIS_PORT}:4,announcements:${REDIS_HOST}:${REDIS_PORT}:5
      # If you add/remove a db, do not forget to update the --databases entry in the docker-compose.yml

  clusters-keeper:
    image: local/clusters-keeper:development
    init: true
    ports:
      - "8010:8000"
      - "3015:3000"
    env_file:
      - .env
    volumes:
      - ./:/devel/services/clusters-keeper
      - ../../packages:/devel/packages
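
The `REDIS_HOSTS` value passed to redis-commander above is a comma-separated list of `label:host:port:db_index` entries, one per Redis database. Since the compose file asks that the list be kept in sync when a database is added or removed, a small generator can make the pattern explicit. This is only a sketch: `build_redis_hosts` is a hypothetical helper, not something this PR adds.

```python
def build_redis_hosts(host: str, port: int, db_labels: list[str]) -> str:
    """Build a REDIS_HOSTS string; the list position of each label is its database index."""
    return ",".join(
        f"{label}:{host}:{port}:{index}" for index, label in enumerate(db_labels)
    )


# Labels in the same order as in the compose file above
labels = [
    "resources",
    "locks",
    "validation_codes",
    "scheduled_maintenance",
    "user_notifications",
    "announcements",
]
redis_hosts = build_redis_hosts("redis", 6379, labels)
print(redis_hosts)
```
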
2 changes: 1 addition & 1 deletion services/clusters-keeper/docker/entrypoint.sh
@@ -65,7 +65,7 @@ fi

if [ "${SC_BOOT_MODE}" = "debug-ptvsd" ]; then
  # NOTE: production does NOT pre-installs ptvsd
-  pip install --no-cache-dir ptvsd
+  pip install --no-cache-dir debugpy
fi

# Appends docker group if socket is mounted
4 changes: 4 additions & 0 deletions services/clusters-keeper/requirements/_base.in
@@ -4,6 +4,7 @@
# NOTE: ALL version constraints MUST be commented
--constraint ../../../requirements/constraints.txt
--constraint ./constraints.txt
--constraint ../../../services/dask-sidecar/requirements/_dask-distributed.txt

# intra-repo required dependencies
--requirement ../../../packages/models-library/requirements/_base.in
@@ -13,7 +14,10 @@
--requirement ../../../packages/service-library/requirements/_fastapi.in



aioboto3
dask[distributed]
dask-gateway
fastapi
packaging
types-aiobotocore[ec2]