Add lancium site adapter #267

Merged: 16 commits, merged on Nov 24, 2022

2 changes: 1 addition & 1 deletion .gitignore
@@ -113,7 +113,7 @@ test_scripts/
*.db

# Ignore configurations
-cobald.yml
+cobald*.yml
*tardis.yml

#Ignore cloudinit files
1 change: 1 addition & 0 deletions CONTRIBUTORS
@@ -15,6 +15,7 @@ Alexander Haas <[email protected]>
mschnepf <[email protected]>
Matthias J. Schnepf <[email protected]>
Matthias Schnepf <[email protected]>
LGTM Migrator <[email protected]>
Matthias Schnepf <[email protected]>
PSchuhmacher <[email protected]>
Peter Wienemann <[email protected]>
63 changes: 61 additions & 2 deletions docs/source/adapters/site.rst
@@ -298,6 +298,66 @@ Available adapter configuration options
The ``Arguments`` entry contains the following command line arguments: ``--cores``, ``--memory``, ``--disk`` and
``--uuid``.

Lancium Site Adapter
--------------------

.. content-tabs:: left-col

    The :py:class:`~tardis.adapters.sites.lancium.LanciumAdapter` implements an interface to the `Lancium`_ Compute API.
    The following general adapter configuration options are available.

    .. _Lancium: https://lancium.github.io

Available adapter configuration options
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. content-tabs:: left-col

    +---------------------+------------------------------------------------------------------------------+-----------------+
    | Option              | Short Description                                                            | Requirement     |
    +=====================+==============================================================================+=================+
    | api_url             | The endpoint of the Lancium API to contact.                                  | **Required**    |
    +---------------------+------------------------------------------------------------------------------+-----------------+
    | api_key             | API token used to access the Lancium API.                                    | **Required**    |
    +---------------------+------------------------------------------------------------------------------+-----------------+
    | max_age             | The output of the ``show_jobs`` API call is cached for ``max_age`` minutes.  | **Required**    |
    +---------------------+------------------------------------------------------------------------------+-----------------+

    All entries in the ``MachineTypeConfiguration`` section of a machine type are added
    directly to the body of the Lancium API ``create_job`` call. All available options are
    described in the `Lancium documentation`_.

    .. _Lancium documentation: https://lancium.github.io/compute-api-docs/api.html#tag/Jobs/operation/create_job
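
A minimal sketch of this merge, assuming plain dictionaries in place of the adapter's configuration objects. ``build_job_spec`` is a hypothetical helper that mirrors the steps of ``LanciumAdapter.deploy_resource`` in this PR: it fills ``resources`` from ``MachineMetaData``, exports the drone environment as ``TardisDrone<Key>`` variables and finally applies ``dict.update`` with the ``MachineTypeConfiguration`` entries. The drone UUID and environment keys in the usage lines are illustrative values only, not necessarily what TARDIS generates.

```python
from typing import Any, Dict, List


def build_job_spec(
    drone_uuid: str,
    meta_data: Dict[str, int],
    environment: Dict[str, Any],
    machine_type_configuration: Dict[str, Any],
) -> Dict[str, Any]:
    """Hypothetical helper mirroring the steps of LanciumAdapter.deploy_resource."""
    spec: Dict[str, Any] = {"name": drone_uuid}
    # MachineMetaData is translated into the Lancium resource request
    spec["resources"] = {
        "core_count": meta_data["Cores"],
        "memory": meta_data["Memory"],
        "scratch": meta_data["Disk"],
    }
    # The drone environment is exported as TardisDrone<Key> variables
    variables: List[Dict[str, str]] = [
        {"variable": f"TardisDrone{key}", "value": str(value)}
        for key, value in environment.items()
    ]
    spec["environment"] = variables
    # MachineTypeConfiguration entries are merged last with a shallow update,
    # so they are passed through verbatim and win over keys of the same name
    spec.update(machine_type_configuration)
    return spec


# Illustrative values modelled on the example configuration
spec = build_job_spec(
    drone_uuid="lancium-0123456789",
    meta_data={"Cores": 2, "Memory": 4, "Disk": 20},
    environment={"Cores": 2, "Memory": 4, "Disk": 20, "Uuid": "lancium-0123456789"},
    machine_type_configuration={
        "qos": "high",
        "image": "lancium/ubuntu",
        "command_line": "sleep 500",
        "max_run_time": 600,
    },
)
print(spec["resources"])  # {'core_count': 2, 'memory': 4, 'scratch': 20}
```

Because the update is shallow and applied last, a ``MachineTypeConfiguration`` key such as ``resources`` would replace the generated value as a whole instead of being merged into it.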

.. content-tabs:: right-col

    .. rubric:: Example configuration

    .. code-block:: yaml

        Sites:
          - name: Lancium
            adapter: Lancium
            quota: 1 # CPU core quota

        Lancium:
          api_url: https://portal.lancium.com/api/v1/
          api_key: "top_secret"
          max_age: 1
          MachineTypes:
            - m1.small
          MachineTypeConfiguration:
            m1.small:
              qos: "high"
              image: "lancium/ubuntu"
              command_line: "sleep 500"
              max_run_time: 600
          MachineMetaData:
            m1.small:
              Cores: 2
              Memory: 4
              Disk: 20
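
The ``max_age`` value is given in minutes; the adapter converts it to seconds (``max_age * 60``) for its ``AsyncCacheMap``, so with ``max_age: 1`` as in the example above a single ``show_jobs`` response is reused for up to a minute rather than querying the Lancium API on every status check. The following is a minimal sketch of that idea using only ``asyncio``; ``TimedCache`` and ``fake_show_jobs`` are hypothetical and do not reproduce the TARDIS ``AsyncCacheMap`` implementation.

```python
import asyncio
import time
from typing import Any, Awaitable, Callable, Dict, Optional


class TimedCache:
    """Hypothetical cache re-running an update coroutine at most every max_age seconds."""

    def __init__(
        self,
        update_coroutine: Callable[[], Awaitable[Dict[Any, Any]]],
        max_age: float,
    ) -> None:
        self._update_coroutine = update_coroutine
        self._max_age = max_age
        self._data: Dict[Any, Any] = {}
        self._last_update: Optional[float] = None
        self._lock = asyncio.Lock()

    async def update_status(self) -> None:
        # The lock ensures concurrent callers trigger at most one refresh
        async with self._lock:
            now = time.monotonic()
            if self._last_update is None or now - self._last_update > self._max_age:
                self._data = await self._update_coroutine()
                self._last_update = now

    def __getitem__(self, key: Any) -> Any:
        return self._data[key]


async def main() -> int:
    calls = 0

    async def fake_show_jobs() -> Dict[int, Dict[str, Any]]:
        # Stand-in for the show_jobs API call, keyed by job id
        nonlocal calls
        calls += 1
        return {42: {"id": 42, "name": "drone-a", "status": "running"}}

    cache = TimedCache(fake_show_jobs, max_age=1 * 60)  # max_age: 1 minute
    for _ in range(3):
        await cache.update_status()
    print(cache[42]["status"], calls)  # running 1
    return calls


asyncio.run(main())
```

Serialising refreshes with a lock is a design choice of this sketch, so that status checks of many drones running concurrently result in a single API call per expiry period.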

Moab Site Adapter
-----------------

@@ -621,5 +681,4 @@ Available machine type configuration options
.. content-tabs:: left-col

Your favorite site is currently not supported?
-Please, have a look at
-:ref:`how to contribute.<ref_contribute_site_adapter>`
+Please, have a look at how to contribute.
7 changes: 7 additions & 0 deletions docs/source/api/tardis.adapters.sites.lancium.rst
@@ -0,0 +1,7 @@
tardis.adapters.sites.lancium module
====================================

.. automodule:: tardis.adapters.sites.lancium
   :members:
   :undoc-members:
   :show-inheritance:
1 change: 1 addition & 0 deletions docs/source/api/tardis.adapters.sites.rst
@@ -16,6 +16,7 @@ Submodules
   tardis.adapters.sites.fakesite
   tardis.adapters.sites.htcondor
   tardis.adapters.sites.kubernetes
   tardis.adapters.sites.lancium
   tardis.adapters.sites.moab
   tardis.adapters.sites.openstack
   tardis.adapters.sites.slurm
7 changes: 7 additions & 0 deletions docs/source/api/tardis.plugins.auditor.rst
@@ -0,0 +1,7 @@
tardis.plugins.auditor module
=============================

.. automodule:: tardis.plugins.auditor
   :members:
   :undoc-members:
   :show-inheritance:
1 change: 1 addition & 0 deletions docs/source/api/tardis.plugins.rst
@@ -12,6 +12,7 @@ Submodules
.. toctree::
   :maxdepth: 4

   tardis.plugins.auditor
   tardis.plugins.elasticsearchmonitoring
   tardis.plugins.prometheusmonitoring
   tardis.plugins.sqliteregistry
3 changes: 2 additions & 1 deletion docs/source/api/tardis.rest.app.routers.rst
@@ -12,5 +12,6 @@ Submodules
.. toctree::
   :maxdepth: 4

   tardis.rest.app.routers.login
   tardis.rest.app.routers.resources
   tardis.rest.app.routers.types
   tardis.rest.app.routers.user
7 changes: 7 additions & 0 deletions docs/source/api/tardis.rest.app.routers.types.rst
@@ -0,0 +1,7 @@
tardis.rest.app.routers.types module
====================================

.. automodule:: tardis.rest.app.routers.types
   :members:
   :undoc-members:
   :show-inheritance:
7 changes: 7 additions & 0 deletions docs/source/api/tardis.rest.app.routers.user.rst
@@ -0,0 +1,7 @@
tardis.rest.app.routers.user module
===================================

.. automodule:: tardis.rest.app.routers.user
   :members:
   :undoc-members:
   :show-inheritance:
1 change: 1 addition & 0 deletions docs/source/api/tardis.rest.app.rst
@@ -23,4 +23,5 @@ Submodules
   tardis.rest.app.crud
   tardis.rest.app.database
   tardis.rest.app.main
   tardis.rest.app.scopes
   tardis.rest.app.security
7 changes: 7 additions & 0 deletions docs/source/api/tardis.rest.app.scopes.rst
@@ -0,0 +1,7 @@
tardis.rest.app.scopes module
=============================

.. automodule:: tardis.rest.app.scopes
   :members:
   :undoc-members:
   :show-inheritance:
1 change: 0 additions & 1 deletion docs/source/api/tardis.rest.rst
@@ -14,7 +14,6 @@ Subpackages

   tardis.rest.app
   tardis.rest.hash_credentials
   tardis.rest.token_generator

Submodules
----------
5 changes: 3 additions & 2 deletions docs/source/changelog.rst
@@ -1,12 +1,12 @@
-.. Created by changelog.py at 2022-11-11, command
+.. Created by changelog.py at 2022-11-23, command
'/Users/giffler/.cache/pre-commit/repor6pnmwlm/py_env-python3.10/bin/changelog docs/source/changes compile --output=docs/source/changelog.rst'
based on the format of 'https://keepachangelog.com/'

#########
CHANGELOG
#########

-[Unreleased] - 2022-11-11
+[Unreleased] - 2022-11-23
=========================

Added
@@ -15,6 +15,7 @@ Added
* Introduce a TARDIS REST API to query the state of resources from SqlRegistry
* Added support for manual draining of drones using the REST API
* Add support for passing environment variables as executable arguments to support HTCondor grid universe
* Added a new site adapter to use Lancium compute as resource provider

Changed
-------
6 changes: 6 additions & 0 deletions docs/source/changes/267.add_lancium_site_adapter.yaml
@@ -0,0 +1,6 @@
category: added
summary: "Added a new site adapter to use Lancium compute as resource provider"
description: |
  A new Lancium compute site adapter has been added to `TARDIS` to use resources provided by the Lancium compute cluster.
pull requests:
- 267
3 changes: 1 addition & 2 deletions docs/source/plugins/plugins.rst
@@ -220,5 +220,4 @@ Available configuration options
.. content-tabs:: left-col

Your favorite monitoring is currently not supported?
-Please, have a look at
-:ref:`how to contribute.<ref_contribute_plugin>`
+Please, have a look at how to contribute.
1 change: 1 addition & 0 deletions setup.py
@@ -98,6 +98,7 @@ def get_cryptography_version():
        "python-auditor>=0.0.5",
        "pytz",
        "tzlocal",
        "aiolancium",
    ],
    extras_require={
        "docs": [
141 changes: 141 additions & 0 deletions tardis/adapters/sites/lancium.py
@@ -0,0 +1,141 @@
from aiolancium.client import Authenticator, LanciumClient

from ...exceptions.tardisexceptions import TardisError, TardisResourceStatusUpdateFailed
from ...interfaces.siteadapter import SiteAdapter, ResourceStatus
from ...utilities.attributedict import AttributeDict, convert_to_attribute_dict
from ...utilities.asynccachemap import AsyncCacheMap
from ...utilities.staticmapping import StaticMapping

from contextlib import contextmanager
from datetime import datetime
from functools import partial
from typing import Dict

import logging

logger = logging.getLogger("cobald.runtime.tardis.adapters.sites.lancium")


async def lancium_status_updater(client: LanciumClient) -> Dict:
    response = await client.jobs.show_jobs()
    logger.debug(f"Show jobs returned {response}")
    return {job["id"]: job for job in response["jobs"]}


class LanciumAdapter(SiteAdapter):
    # A key containing a space ("delete pending") cannot be written as a literal
    # keyword argument, hence the ** dict expansion when building
    # `translator_functions` in `__init__`
    resource_status_translation = {
        "created": ResourceStatus.Booting,
        "submitted": ResourceStatus.Booting,
        "queued": ResourceStatus.Booting,
        "ready": ResourceStatus.Booting,
        "running": ResourceStatus.Running,
        "error": ResourceStatus.Error,
        "finished": ResourceStatus.Stopped,
        "delete pending": ResourceStatus.Stopped,
        "deleted": ResourceStatus.Deleted,
    }

    def __init__(self, machine_type: str, site_name: str):
        self._machine_type = machine_type
        self._site_name = site_name

        auth = Authenticator(api_key=self.configuration.api_key)
        self.client = LanciumClient(api_url=self.configuration.api_url, auth=auth)

        key_translator = StaticMapping(
            remote_resource_uuid="id",
            drone_uuid="name",
            resource_status="status",
        )

        translator_functions = StaticMapping(
            status=lambda x, translator=StaticMapping(
                **self.resource_status_translation
            ): translator[x],
            id=int,
            name=str,
        )

        self.handle_response = partial(
            self.handle_response,
            key_translator=key_translator,
            translator_functions=translator_functions,
        )

        self._lancium_status = AsyncCacheMap(
            update_coroutine=partial(lancium_status_updater, self.client),
            max_age=self.configuration.max_age * 60,
        )

    async def deploy_resource(
        self, resource_attributes: AttributeDict
    ) -> AttributeDict:
        specs = dict(name=resource_attributes.drone_uuid)
        specs["resources"] = dict(
            core_count=self.machine_meta_data.Cores,
            memory=self.machine_meta_data.Memory,
            scratch=self.machine_meta_data.Disk,
        )
        specs["environment"] = [
            {"variable": f"TardisDrone{key}", "value": str(value)}
            for key, value in self.drone_environment(
                resource_attributes.drone_uuid,
                resource_attributes.obs_machine_meta_data_translation_mapping,
            ).items()
        ]
        specs.update(self.machine_type_configuration)
        create_response = await self.client.jobs.create_job(job=specs)
        logger.debug(f"{self.site_name} create job returned {create_response}")
        submit_response = await self.client.jobs.submit_job(
            id=create_response["job"]["id"]
        )
        logger.debug(f"{self.site_name} submit job returned {submit_response}")
        return self.handle_response(create_response["job"])

    async def resource_status(
        self, resource_attributes: AttributeDict
    ) -> AttributeDict:
        await self._lancium_status.update_status()
        # If the resource was created after the last update of the AsyncCacheMap,
        # its absence from the map says nothing about its current state, since
        # the map is updated asynchronously.
        try:
            resource_uuid = resource_attributes.remote_resource_uuid
            resource_status = self._lancium_status[resource_uuid]
        except KeyError as err:
            if (
                self._lancium_status.last_update - resource_attributes.created
            ).total_seconds() < 0:
                raise TardisResourceStatusUpdateFailed from err
            else:
                resource_status = {
                    "id": resource_attributes.remote_resource_uuid,
                    "status": "deleted",
                }
        logger.debug(f"{self.site_name} has status {resource_status}.")
        resource_attributes["updated"] = datetime.now()
        return convert_to_attribute_dict(
            {**resource_attributes, **self.handle_response(resource_status)}
        )

    async def stop_resource(self, resource_attributes: AttributeDict):
        response = await self.client.jobs.terminate_job(
            id=resource_attributes.remote_resource_uuid
        )
        logger.debug(f"{self.site_name} stop resource returned {response}")
        return response

    async def terminate_resource(self, resource_attributes: AttributeDict):
        response = await self.client.jobs.delete_job(
            id=resource_attributes.remote_resource_uuid
        )
        logger.debug(f"{self.site_name} terminate resource returned {response}")
        return response

    @contextmanager
    def handle_exceptions(self):
        try:
            yield
        except Exception as ex:
            raise TardisError from ex
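
The fallback in ``resource_status`` is easier to follow as a pure function. The sketch below restates it under the assumption that the cached job list is a plain ``dict`` keyed by job id and that timestamps are ``datetime`` objects; ``decide_status``, ``StatusUpdateFailed`` and the string results are hypothetical stand-ins for the TARDIS types (``ResourceStatus``, ``TardisResourceStatusUpdateFailed``).

```python
from datetime import datetime, timedelta
from typing import Any, Dict

# Lancium job states mapped to TARDIS resource states as in the adapter's
# resource_status_translation (strings stand in for ResourceStatus members)
STATUS_TRANSLATION = {
    "created": "Booting",
    "submitted": "Booting",
    "queued": "Booting",
    "ready": "Booting",
    "running": "Running",
    "error": "Error",
    "finished": "Stopped",
    "delete pending": "Stopped",
    "deleted": "Deleted",
}


class StatusUpdateFailed(Exception):
    """Hypothetical stand-in for TardisResourceStatusUpdateFailed."""


def decide_status(
    jobs: Dict[int, Dict[str, Any]],
    last_update: datetime,
    job_id: int,
    created: datetime,
) -> str:
    try:
        job = jobs[job_id]
    except KeyError as err:
        # The cache predates the job, so its absence proves nothing yet
        if (last_update - created).total_seconds() < 0:
            raise StatusUpdateFailed(job_id) from err
        # The cache is newer than the job but does not list it: treat as deleted
        job = {"id": job_id, "status": "deleted"}
    return STATUS_TRANSLATION[job["status"]]


now = datetime.now()
jobs = {7: {"id": 7, "status": "queued"}}
print(decide_status(jobs, now, 7, now - timedelta(minutes=5)))  # Booting
print(decide_status(jobs, now, 8, now - timedelta(minutes=5)))  # Deleted
```

Raising instead of guessing for jobs newer than the cache lets the caller retry later, while jobs that are missing from a fresher cache are treated as gone.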