Auditor accounting plugin #263

Merged
merged 4 commits into from
Sep 16, 2022
56 changes: 56 additions & 0 deletions docs/source/plugins/plugins.rst
@@ -161,6 +161,62 @@ Available configuration options
index: cobald_tardis
meta: instance1

Auditor Accounting
------------------

.. content-tabs:: left-col

    The :py:class:`~tardis.plugins.auditor.Auditor` plugin implements an interface to push
    accounting-relevant information about the drones to an `Auditor <https://alu-schumacher.github.io/AUDITOR/>`_ instance.
    The plugin takes the components to be accounted for from the ``MachineMetaData`` in the configuration.
    Scores, which make resources of the same kind but different performance comparable, can be configured as well.
    Scores are configured per ``MachineType`` and component, and a component can carry multiple scores.
    An Auditor record requires a ``site_id``, a ``user_id`` and a ``group_id``. The latter two can be set in the
    ``Auditor`` plugin configuration (both default to ``tardis`` if omitted). The ``site_id`` is taken from the ``Sites`` in
    the TARDIS configuration.

Available configuration options
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. content-tabs:: left-col

    +----------------+------------------------------------------------------------------------------------------------------+-----------------+
    | Option         | Short Description                                                                                    | Requirement     |
    +================+======================================================================================================+=================+
    | host           | Hostname or IP address of the Auditor instance.                                                      | **Required**    |
    +----------------+------------------------------------------------------------------------------------------------------+-----------------+
    | port           | Port on which the Auditor instance is listening.                                                     | **Required**    |
    +----------------+------------------------------------------------------------------------------------------------------+-----------------+
    | user           | User name added to the record. Defaults to ``tardis``.                                               | **Optional**    |
    +----------------+------------------------------------------------------------------------------------------------------+-----------------+
    | group          | Group name added to the record. Defaults to ``tardis``.                                              | **Optional**    |
    +----------------+------------------------------------------------------------------------------------------------------+-----------------+
    | components     | Configuration of the components per ``MachineType``. Used to attach scores to individual components. | **Optional**    |
    +----------------+------------------------------------------------------------------------------------------------------+-----------------+

.. content-tabs:: right-col

    .. rubric:: Example configuration

    .. code-block:: yaml

        Plugins:
          Auditor:
            host: "127.0.0.1"
            port: 8000
            user: "some-user"
            group: "some-group"
            components:
              machinetype_1:
                Cores:
                  HEPSPEC06: 1.2
                  OTHERBENCHMARK: 1.4
              machinetype_2:
                Cores:
                  HEPSPEC06: 1.0
                Memory:
                  PRECIOUSMEMORY: 2.0

.. content-tabs:: left-col

Your favorite monitoring is currently not supported?
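
A note on the documented configuration layout: the sketch below illustrates how the resources from ``MachineMetaData`` might be paired with the optional scores from ``components``. It uses plain dictionaries instead of the TARDIS ``Configuration`` object. The helper name ``collect_components`` and the sample amounts are assumptions made for this illustration and are not part of the PR.

```python
# Hypothetical stand-ins for the TARDIS configuration (amounts are made up).
machine_meta_data = {
    "machinetype_1": {"Cores": 8, "Memory": 16},
    "machinetype_2": {"Cores": 4, "Memory": 32},
}
score_config = {
    "machinetype_1": {"Cores": {"HEPSPEC06": 1.2, "OTHERBENCHMARK": 1.4}},
    "machinetype_2": {
        "Cores": {"HEPSPEC06": 1.0},
        "Memory": {"PRECIOUSMEMORY": 2.0},
    },
}


def collect_components(meta_data, scores):
    """Pair every resource amount with its scores; missing scores become {}."""
    result = {}
    for machine_type, resources in meta_data.items():
        per_type = scores.get(machine_type, {})
        result[machine_type] = {
            resource: {
                "amount": amount,
                "scores": dict(per_type.get(resource, {})),
            }
            for resource, amount in resources.items()
        }
    return result


components = collect_components(machine_meta_data, score_config)
# Memory of machinetype_1 has no configured score, so its score mapping is empty.
print(components["machinetype_1"]["Memory"])
```

Resources without a configured score still end up as components, just with an empty score mapping, which matches the ``.get(resource, {})`` fallback in the plugin.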
3 changes: 3 additions & 0 deletions setup.py
@@ -94,6 +94,9 @@ def get_cryptography_version():
        "asyncstdlib",
        "typing_extensions",
        "backports.cached_property",
        "python-auditor>=0.0.5",
        "pytz",
        "tzlocal",
    ],
    extras_require={
        "docs": [
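
The two new time zone dependencies are used by the plugin to turn the naive, local drone timestamps into UTC before they are sent to Auditor. The sketch below shows that conversion with the standard library only. The fixed UTC+02:00 offset and the helper name ``to_utc`` are assumptions for this illustration; the plugin itself asks ``tzlocal`` for the real local zone.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the host's local zone is a fixed UTC+02:00 offset.
local_zone = timezone(timedelta(hours=2))


def to_utc(naive_local, zone=local_zone):
    """Attach ``zone`` to a naive timestamp and convert the result to UTC."""
    return naive_local.replace(tzinfo=zone).astimezone(timezone.utc)


updated = datetime(2022, 9, 16, 12, 0, 0)  # naive timestamp, as stored by TARDIS
print(to_utc(updated).isoformat())  # 2022-09-16T10:00:00+00:00
```

One caveat worth checking in review: ``replace(tzinfo=...)`` is only safe for zone objects that compute their offset lazily. For ``pytz`` zone objects, the ``pytz`` documentation recommends ``localize()`` instead, so the behaviour may depend on which kind of object the installed ``tzlocal`` version returns.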
117 changes: 117 additions & 0 deletions tardis/plugins/auditor.py
@@ -0,0 +1,117 @@
from ..configuration.configuration import Configuration
from ..interfaces.plugin import Plugin
from ..interfaces.state import State
from ..utilities.attributedict import AttributeDict
from ..resources.dronestates import AvailableState, DownState

import pyauditor

import logging
import pytz
from tzlocal import get_localzone


class Auditor(Plugin):
    """
    The :py:class:`~tardis.plugins.auditor.Auditor` plugin is a collector for the
    accounting tool Auditor. It sends accounting information of individual drones to an
    Auditor instance. The records contain information about the provided resources of
    the drones as well as start and stop times. When a drone enters ``AvailableState``,
    a record with the start time set to the time it went into this state is stored in
    the Auditor database. The stop time remains empty until the drone goes into
    ``DownState``. The Auditor plugin does not keep any state.
    """

    def __init__(self):
        self.logger = logging.getLogger("cobald.runtime.tardis.plugins.auditor")
        config = Configuration()
        config_auditor = config.Plugins.Auditor
        # ``components`` is optional: fall back to "no scores" if it is omitted.
        config_components = getattr(config_auditor, "components", {})

        self._resources = {}
        self._components = {}
        for site in config.Sites:
            self._resources[site.name] = {}
            self._components[site.name] = {}
            for machine_type in getattr(config, site.name).MachineTypes:
                self._resources[site.name][machine_type] = {}
                self._components[site.name][machine_type] = {}
                # Scores configured for this machine type (may be empty).
                machine_scores = config_components.get(machine_type, {})
                for resource in getattr(config, site.name).MachineMetaData[
                    machine_type
                ]:
                    self._resources[site.name][machine_type][resource] = getattr(
                        config, site.name
                    ).MachineMetaData[machine_type][resource]
                    self._components[site.name][machine_type][
                        resource
                    ] = machine_scores.get(resource, {})

        self._user = getattr(config_auditor, "user", "tardis")
        self._group = getattr(config_auditor, "group", "tardis")
        auditor_timeout = getattr(config_auditor, "timeout", 30)
        self._local_timezone = get_localzone()
        self._client = (
            pyauditor.AuditorClientBuilder()
            .address(config_auditor.host, config_auditor.port)
            .timeout(auditor_timeout)
            .build()
        )

    async def notify(self, state: State, resource_attributes: AttributeDict) -> None:
        """
        Pushes a record to an Auditor instance when the drone is in state
        ``AvailableState`` or ``DownState``.

        :param state: New state of the Drone
        :type state: State
        :param resource_attributes: Contains all meta-data of the Drone (created and
            updated timestamps, dns name, unique id, site_name, machine_type, etc.)
        :type resource_attributes: AttributeDict
        :return: None
        """
        self.logger.debug(
            f"Drone: {str(resource_attributes)} has changed state to {state}"
        )

        if isinstance(state, AvailableState):
            # New drone: create the record with its start time.
            record = self.construct_record(resource_attributes)
            await self._client.add(record)
        elif isinstance(state, DownState):
            # Drone is gone: update the existing record with its stop time.
            record = self.construct_record(resource_attributes)
            record.with_stop_time(
                resource_attributes["updated"]
                .replace(tzinfo=self._local_timezone)
                .astimezone(pytz.utc)
            )
            await self._client.update(record)

    def construct_record(self, resource_attributes: AttributeDict):
        """
        Constructs a record from ``resource_attributes``.

        :param resource_attributes: Contains all meta-data of the Drone (created and
            updated timestamps, dns name, unique id, site_name, machine_type, etc.)
        :type resource_attributes: AttributeDict
        :return: Record
        """
        record = pyauditor.Record(
            resource_attributes["drone_uuid"],
            resource_attributes["site_name"],
            self._user,
            self._group,
            resource_attributes["updated"]
            .replace(tzinfo=self._local_timezone)
            .astimezone(pytz.utc),
        )

        for (resource, amount) in self._resources[resource_attributes["site_name"]][
            resource_attributes["machine_type"]
        ].items():
            component = pyauditor.Component(resource, amount)
            for score_name, score_val in self._components[
                resource_attributes["site_name"]
            ][resource_attributes["machine_type"]][resource].items():
                component = component.with_score(pyauditor.Score(score_name, score_val))

            record = record.with_component(component)

        return record
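
The state dispatch in ``notify`` can be tried without a running Auditor instance. The following sketch is an illustration only: the state classes are empty stand-ins for the TARDIS drone states, ``RecordingClient`` is a hypothetical replacement for the pyauditor client that just logs its calls, and the record is reduced to the drone id.

```python
import asyncio


class State:
    """Stand-in for the TARDIS drone state base class."""


class AvailableState(State):
    pass


class DownState(State):
    pass


class BootingState(State):
    pass


class RecordingClient:
    """Hypothetical client that only remembers which calls were made."""

    def __init__(self):
        self.calls = []

    async def add(self, record):
        self.calls.append(("add", record))

    async def update(self, record):
        self.calls.append(("update", record))


async def notify(client, state, drone_uuid):
    """Add a record on AvailableState, update it on DownState, else do nothing."""
    if isinstance(state, AvailableState):
        await client.add(drone_uuid)
    elif isinstance(state, DownState):
        await client.update(drone_uuid)


client = RecordingClient()
for state in (BootingState(), AvailableState(), DownState()):
    asyncio.run(notify(client, state, "drone-1"))
print(client.calls)  # [('add', 'drone-1'), ('update', 'drone-1')]
```

All other states fall through both branches, which is why the plugin itself does not need to keep any state between notifications.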
165 changes: 165 additions & 0 deletions tests/plugins_t/test_auditor.py
@@ -0,0 +1,165 @@
from tardis.plugins.auditor import Auditor
from tardis.utilities.attributedict import AttributeDict
from tardis.interfaces.siteadapter import ResourceStatus
from tardis.resources.dronestates import (
    AvailableState,
    DownState,
    BootingState,
    CleanupState,
    ShutDownState,
    ShuttingDownState,
    DrainState,
    DrainingState,
    IntegrateState,
    DisintegrateState,
)

from datetime import datetime
from unittest import TestCase
from unittest.mock import patch

from ..utilities.utilities import async_return
from ..utilities.utilities import run_async


class TestAuditor(TestCase):
    @classmethod
    def setUpClass(cls):
        cls.mock_config_patcher = patch("tardis.plugins.auditor.Configuration")
        cls.mock_auditorclientbuilder_patcher = patch(
            "tardis.plugins.auditor.pyauditor.AuditorClientBuilder"
        )

        cls.mock_config = cls.mock_config_patcher.start()
        cls.mock_auditorclientbuilder = cls.mock_auditorclientbuilder_patcher.start()

    @classmethod
    def tearDownClass(cls):
        cls.mock_config_patcher.stop()
        cls.mock_auditorclientbuilder_patcher.stop()

    def setUp(self):
        self.address = "127.0.0.1"
        self.port = 8000
        self.timeout = 20
        self.user = "user-1"
        self.group = "group-1"
        self.site = "testsite"
        self.cores = 12
        self.memory = 100
        self.drone_uuid = "test-drone"
        self.machine_type = "test_machine_type"
        config = self.mock_config.return_value
        config.Plugins.Auditor = AttributeDict(
            host=self.address,
            port=self.port,
            timeout=self.timeout,
            user=self.user,
            group=self.group,
            components=AttributeDict(
                test_machine_type=AttributeDict(
                    Cores=AttributeDict(HEPSPEC=1.2, BENCHMARK=3.0),
                    Memory=AttributeDict(BLUBB=1.4),
                )
            ),
        )
        config.Sites = [AttributeDict(name=self.site)]
        config.testsite.MachineTypes = [self.machine_type]
        config.testsite.MachineMetaData = AttributeDict(
            test_machine_type=AttributeDict(Cores=self.cores, Memory=self.memory)
        )

        self.test_param = AttributeDict(
            site_name=self.site,
            machine_type=self.machine_type,
            created=datetime.now(),
            updated=datetime.now(),
            resource_status=ResourceStatus.Booting,
            drone_uuid=self.drone_uuid,
        )

        builder = self.mock_auditorclientbuilder.return_value
        builder = builder.address.return_value
        builder = builder.timeout.return_value
        self.client = builder.build.return_value
        self.client.add.return_value = async_return()
        self.client.update.return_value = async_return()

        self.config = config
        self.plugin = Auditor()

    def test_default_fields(self):
        # Potential future race condition ahead.
        # Since this is the only test modifying self.config, this is
        # fine. However, when adding further tests care must be taken
        # when config changes are involved.
        del self.config.Plugins.Auditor.user
        del self.config.Plugins.Auditor.group
        # Needed in order to reload config.
        plugin = Auditor()
        self.assertEqual(plugin._user, "tardis")
        self.assertEqual(plugin._group, "tardis")

    def test_notify(self):
        self.mock_auditorclientbuilder.return_value.address.assert_called_with(
            self.address,
            self.port,
        )
        self.mock_auditorclientbuilder.return_value.address.return_value.timeout.assert_called_with(
            self.timeout,
        )
        run_async(
            self.plugin.notify,
            state=AvailableState(),
            resource_attributes=self.test_param,
        )
        self.assertEqual(self.client.add.call_count, 1)
        self.assertEqual(self.client.update.call_count, 0)

        run_async(
            self.plugin.notify,
            state=DownState(),
            resource_attributes=self.test_param,
        )
        self.assertEqual(self.client.add.call_count, 1)
        self.assertEqual(self.client.update.call_count, 1)

        # test for no-op
        for state in [
            BootingState(),
            IntegrateState(),
            CleanupState(),
            ShutDownState(),
            ShuttingDownState(),
            DrainState(),
            DrainingState(),
            DisintegrateState(),
        ]:
            run_async(
                self.plugin.notify,
                state=state,
                resource_attributes=self.test_param,
            )
            self.assertEqual(self.client.add.call_count, 1)
            self.assertEqual(self.client.update.call_count, 1)

    def test_construct_record(self):
        record = self.plugin.construct_record(resource_attributes=self.test_param)

        self.assertEqual(record.record_id, self.drone_uuid)
        self.assertEqual(record.site_id, self.site)
        self.assertEqual(record.user_id, self.user)
        self.assertEqual(record.group_id, self.group)
        self.assertEqual(len(record.components), 2)
        self.assertEqual(record.components[0].name, "Cores")
        self.assertEqual(record.components[0].amount, 12)
        self.assertEqual(len(record.components[0].scores), 2)
        self.assertEqual(record.components[0].scores[0].name, "HEPSPEC")
        self.assertEqual(record.components[0].scores[0].factor, 1.2)
        self.assertEqual(record.components[0].scores[1].name, "BENCHMARK")
        self.assertEqual(record.components[0].scores[1].factor, 3.0)
        self.assertEqual(record.components[1].name, "Memory")
        self.assertEqual(record.components[1].amount, 100)
        self.assertEqual(len(record.components[1].scores), 1)
        self.assertEqual(record.components[1].scores[0].name, "BLUBB")
        self.assertEqual(record.components[1].scores[0].factor, 1.4)
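
The ``setUp`` method relies on how ``unittest.mock`` handles chained builder calls: every call on a ``MagicMock`` returns the same ``return_value``, so the object produced by ``build()`` can be fetched before the code under test runs. The sketch below isolates that pattern. ``builder_cls`` and ``make_client`` are hypothetical names for this illustration and stand in for the patched ``AuditorClientBuilder`` and the plugin constructor.

```python
from unittest.mock import MagicMock

# Hypothetical stand-in for the patched builder class.
builder_cls = MagicMock(name="ClientBuilder")

# Walk the chain once to get the object that build() will return later.
expected_client = (
    builder_cls.return_value.address.return_value.timeout.return_value.build.return_value
)


def make_client(host, port, timeout):
    """Mimics a builder-style construction: address, then timeout, then build."""
    return builder_cls().address(host, port).timeout(timeout).build()


built = make_client("127.0.0.1", 8000, 20)

# Each link of the chain records its own call arguments.
builder_cls.return_value.address.assert_called_with("127.0.0.1", 8000)
builder_cls.return_value.address.return_value.timeout.assert_called_with(20)
print(built is expected_client)  # True
```

Because the chain is fixed by call order, a test like this also breaks if the order of ``address`` and ``timeout`` in the production code is swapped, which may or may not be desired.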