Commit

Merge pull request #242 from giffels/release-0.7.0
Release 0.7.0
giffels authored Feb 27, 2023
2 parents 61f6f54 + b7792d1 commit 44725f8
Showing 39 changed files with 175 additions and 145 deletions.
2 changes: 1 addition & 1 deletion CONTRIBUTORS
Original file line number Diff line number Diff line change
@@ -2,8 +2,8 @@ Contributors ordered by number of commits:
==========================================
Manuel Giffels <[email protected]>
Max Fischer <[email protected]>
Alexander Haas <[email protected]>
Stefan Kroboth <[email protected]>
Alexander Haas <[email protected]>
Eileen Kuehn <[email protected]>
matthias.schnepf <[email protected]>
ubdsv <[email protected]>
4 changes: 2 additions & 2 deletions docs/source/adapters/site.rst
@@ -199,11 +199,11 @@ Available adapter configuration options
| Option | Short Description | Requirement |
+================+===================================================================================+=================+
| max_age | The result of the `condor_status` call is cached for `max_age` in minutes. | **Required** |
+================+===================================================================================+=================+
+----------------+-----------------------------------------------------------------------------------+-----------------+
| bulk_size | Maximum number of jobs to handle per bulk invocation of a condor tool. | **Optional** |
+ + + +
| | Default: 100 | |
+================+===================================================================================+=================+
+----------------+-----------------------------------------------------------------------------------+-----------------+
| bulk_delay | Maximum duration in seconds to wait per bulk invocation of a condor tool. | **Optional** |
+ + + +
| | Default: 1.0 | |

This file was deleted.

15 changes: 0 additions & 15 deletions docs/source/api/tardis.rest.token_generator.rst

This file was deleted.

15 changes: 12 additions & 3 deletions docs/source/changelog.rst
@@ -1,18 +1,19 @@
.. Created by changelog.py at 2023-02-23, command
.. Created by changelog.py at 2023-02-24, command
'/Users/giffler/.cache/pre-commit/repor6pnmwlm/py_env-python3.10/bin/changelog docs/source/changes compile --output=docs/source/changelog.rst'
based on the format of 'https://keepachangelog.com/'
#########
CHANGELOG
#########

[Unreleased] - 2023-02-23
=========================
[0.7.0] - 2023-02-24
====================

Added
-----

* Introduce a TARDIS REST API to query the state of resources from SqlRegistry
* Ensure python3.10 compatibility
* Added support for manual draining of drones using the REST API
* Add support for passing environment variables as executable arguments to support HTCondor grid universe
* Added support for application credentials of the OpenStack site adapter
Expand All @@ -21,15 +22,23 @@ Added
Changed
-------

* Adjust ElasticSearch plugin to support client versions >=7.17,<8.0.0
* Remove granularity in Standardiser to enable earlier creation of new drones
* Introduced Bulk Executor and HTCondor Bulk Operations
* SSHExecutor respects the remote MaxSessions via queueing
* Remove minimum core limit (Standardiser) from pool factory
* Change drone state initialisation and notification of plugins
* REST API cookie authentication and refactoring
* Adjust Prometheus plugin to the latest aioprometheus version 21.9.0

Fixed
-----

* Unique constraints in database schema have been fixed to allow same machine_type and remote_resource_uuid on multiple sites
* Update the remote_resource_uuid in sqlite registry on each update
* REST API does not suppress KeyboardInterrupt
* Fixing recurrent cancellation of jobs TIMEOUTED in Slurm
* Fixed state transition for stopped workers

[0.6.0] - 2021-08-09
====================
@@ -1,6 +1,5 @@
category: fixed
summary: "The `CleanupState` is now taking into account the status of the resource\
\ for state transitions"
summary: "The `CleanupState` is now taking into account the status of the resource for state transitions"
description: |
The `CleanupState` is now taking into account the status of the resource and two missing state transitions have been
added to the transition dictionary.
5 changes: 3 additions & 2 deletions docs/source/changes/126.fix_config_translation.yaml
@@ -1,7 +1,8 @@
category: fixed
summary: "Fix the translation of cloud init scripts into base64 encoded strings"
description: "The translation of cloud init scripts into base64 encoded strings has\
\ been fixed. Reason was changing a data structure\nwhile iterating over it. \n"
description: |
The translation of cloud init scripts into base64 encoded strings has
been fixed. Reason was changing a data structure while iterating over it.
pull requests:
- 126
version: 0.3.0
@@ -1,6 +1,5 @@
category: changed
summary: "The SLURM adapter can now be configured to use different startup commands\
\ for each machine type."
summary: "The SLURM adapter can now be configured to use different startup commands for each machine type."
description: |
The SLURM adapter can now be configured to use different startup commands for each machine type. The old behaviour of
providing one startup command is still supported, but will be deprecated in the next major release 0.4.0.
@@ -1,6 +1,5 @@
category: changed
summary: "The Moab adapter can now be configured to use different startup commands\
\ for each machine type."
summary: "The Moab adapter can now be configured to use different startup commands for each machine type."
description: |
The Moab adapter can now be configured to use different startup commands for each machine type. The old behaviour of
providing one startup command is still supported, but will be deprecated in the next major release 0.4.0.
3 changes: 1 addition & 2 deletions docs/source/changes/145.add_ssh_connection_sharing.yaml
@@ -1,6 +1,5 @@
category: added
summary: "Add ssh connection sharing to `SSHExecutor` in order to re-use existing\
\ connection"
summary: "Add ssh connection sharing to `SSHExecutor` in order to re-use existing connection"
description: |
The `SSHExecutor` is now re-using existing connections. Closed connections are automatically reestablished. This will
avoid connection problems when bothering a remote ssh server with too many requests in too short intervals.
3 changes: 1 addition & 2 deletions docs/source/changes/146.improve_logging.yaml
@@ -1,6 +1,5 @@
category: changed
summary: "Added log channels and adjusted log levels according to the conventions\
\ in `COBalD` documentation"
summary: "Added log channels and adjusted log levels according to the conventions in `COBalD` documentation"
description: |
Added log channels and adjusted log levels according to the conventions in the `COBalD` documentation.
This will improve the user's ability to filter log messages according to their needs.
3 changes: 1 addition & 2 deletions docs/source/changes/166.add_drone_heartbeat_interval.yaml
@@ -1,6 +1,5 @@
category: added
summary: "An optional and per site configurable drone heartbeat interval has been\
\ added"
summary: "An optional and per site configurable drone heartbeat interval has been added"
description: |
Add an optional and per site configurable drone heartbeat interval to TARDIS. The heartbeat interval is defined as
the time between two consecutive calls of the drones run method. The heartbeat interval defaults to 60s.
3 changes: 1 addition & 2 deletions docs/source/changes/169.fix_drone_lifetime.yaml
@@ -1,6 +1,5 @@
category: fixed
summary: "Fixes a bug that the drone_minimum_lifetime parameter is not working as\
\ described in the documentation"
summary: "Fixes a bug that the drone_minimum_lifetime parameter is not working as described in the documentation"
description: |
The `drone_minimum_lifetime` parameter is not working as expected and described in the documentation.
`drone_minimum_lifetime` is meant to be a generic site parameter. However the code is trying to look it up in the
@@ -1,6 +1,5 @@
category: fixed
summary: "Fixes a bug in the HTCondor Site Adapter which leads to wrong requirements\
\ when using non HTCondor OBS"
summary: "Fixes a bug in the HTCondor Site Adapter which leads to wrong requirements when using non HTCondor OBS"
description: |
The HTCondor Site Adapter takes a wrong `machine_meta_data_translation_mapping` into account in some circumstances.
Due to a bug introduced in #157, the HTCondor Site Adapter uses the `machine_meta_data_translation_mapping` of the
6 changes: 4 additions & 2 deletions docs/source/changes/183.add_rest_api.yaml
@@ -1,8 +1,10 @@
category: added
summary: "Introduce a TARDIS REST API to query the state of resources from SqlRegistry"
description: |
Introduction of a REST API to query the state of resources from the SqlRegistry. The REST API is using the FastApi
framework in combination with an uvicorn ASGI server. JSON Web Token and OAuth2 scopes are supported for
Introduction of a REST API to query the state of resources from the
SqlRegistry. The REST API is using the FastApi framework in combination with
an uvicorn ASGI server. JSON Web Token and OAuth2 scopes are supported for
authentication and authorization.
pull requests:
- 183
version: 0.7.0
9 changes: 9 additions & 0 deletions docs/source/changes/209.change_resource_granularity.yaml
@@ -0,0 +1,9 @@
category: changed
summary: "Remove granularity in Standardiser to enable earlier creation of new drones"
description: |
With granularity new drones are requested when `demand>supply+granularity`.
Remove granularity in Standardiser to enable earlier creation of new drones
when `demand>supply`.
pull requests:
- 209
version: 0.7.0
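
The two conditions named in this description can be compared in a short sketch. The function name `needs_new_drone` and the sample numbers are illustrative assumptions and are not taken from the TARDIS or COBalD code.

```python
def needs_new_drone(demand: float, supply: float, granularity: float = 0.0) -> bool:
    """Return True when additional drones should be requested.

    Hypothetical helper: with granularity > 0 this models the old condition
    `demand > supply + granularity`, with the default 0.0 the new `demand > supply`.
    """
    return demand > supply + granularity


# Old behaviour: a granularity of 1 delays the request until demand exceeds 2.
old_behaviour = needs_new_drone(demand=1.5, supply=1.0, granularity=1.0)

# New behaviour: the same demand triggers a request as soon as it exceeds supply.
new_behaviour = needs_new_drone(demand=1.5, supply=1.0)

print(old_behaviour, new_behaviour)
```

Dropping the granularity term means a request goes out as soon as demand exceeds supply, which matches the "earlier creation of new drones" mentioned in the summary.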
@@ -0,0 +1,9 @@
category: changed
summary: "Adjust Prometheus plugin to the latest aioprometheus version 21.9.0"
description: |
`aioprometheus` has changed its API so that metrics are automatically
registered when they are created. Prometheus plugin has been changed
accordingly and the installation of `aioprometheus>=21.9.0` has been enforced.
pull requests:
- 211
version: 0.7.0
9 changes: 9 additions & 0 deletions docs/source/changes/213.add_python_3.10_compatibilty.yaml
@@ -0,0 +1,9 @@
category: added
summary: "Ensure python3.10 compatibility"
description: |
Ensure python3.10 compatibility. Enable Python 3.10 unittests, adds
3.10 as supported release into setup.py and fixes a few deprecation warnings
occurring when executing unittests under 3.10.
pull requests:
- 213
version: 0.7.0
5 changes: 3 additions & 2 deletions docs/source/changes/218.respect_ssh_maxsessions.yaml
@@ -4,7 +4,8 @@ description: |
The SSHExecutor now is aware of sshd MaxSessions, which is a limit on the concurrent
operations per connection. If more operations are to be run at once, operations are
queued until a session becomes available.
pull requests:
- 218
issues:
- 217
pull requests:
- 218
version: 0.7.0
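
The queueing behaviour described here can be sketched with an `asyncio.Semaphore`. This is a minimal illustration under stated assumptions, not the actual `SSHExecutor` code: the class `SessionLimiter`, the coroutine `fake_command` and the limit of 2 are hypothetical.

```python
import asyncio


class SessionLimiter:
    """Hypothetical helper that caps concurrent operations on one connection."""

    def __init__(self, max_sessions: int):
        self._semaphore = asyncio.Semaphore(max_sessions)
        self.active = 0  # operations currently holding a session
        self.peak = 0    # highest number of concurrent sessions observed

    async def run(self, operation):
        # Operations beyond the limit wait here until a session is released.
        async with self._semaphore:
            self.active += 1
            self.peak = max(self.peak, self.active)
            try:
                return await operation()
            finally:
                self.active -= 1


async def fake_command():
    # Stand-in for a remote command; no SSH connection is opened.
    await asyncio.sleep(0.01)
    return "done"


async def main():
    limiter = SessionLimiter(max_sessions=2)
    results = await asyncio.gather(*(limiter.run(fake_command) for _ in range(5)))
    return results, limiter.peak


results, peak = asyncio.run(main())
print(results, peak)
```

All five operations complete, but never more than two hold a session at the same time; the remaining ones are queued by the semaphore.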
5 changes: 2 additions & 3 deletions docs/source/changes/220.fix_unique_constraints_db_schema.yaml
@@ -3,7 +3,6 @@ summary: "Unique constraints in database schema have been fixed to allow same ma
description: |
The unique constraints in the database schema have been relaxed to allow the same machine_type and the same
remote_resource_uuid to be used on multiple sites. In addition, the unit tests of the SqliteRegistry have been improved.
pull_requests:
- 220
issues:
- 219
- 219
version: 0.7.0
13 changes: 13 additions & 0 deletions docs/source/changes/224.changed_htcondor_bulk_operations.yaml
@@ -0,0 +1,13 @@
category: changed
summary: "Introduced Bulk Executor and HTCondor Bulk Operations"
description: |
Introduced bulk execution to HTCondor SiteAdapter including generic
AsyncBulkCall framework class for collecting tasks to execute in bulk.
HTCondorAdapter uses bulk executions for its commands `deploy resource`,
`stop resource` and `terminate resource`. Changes the Resource UUID
format used by the HTCondor Site adapter to `ClusterId.ProcId`.
issues:
- 223
pull requests:
- 224
version: 0.7.0
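
The idea of collecting tasks and executing them in bulk can be sketched as follows. This is a simplified illustration and not the `AsyncBulkCall` implementation from TARDIS: the class `BulkCollector`, the coroutine `fake_condor_rm` and the chosen `size` and `delay` values are assumptions made for the example. The job identifiers follow the `ClusterId.ProcId` format named in the description.

```python
import asyncio


class BulkCollector:
    """Hypothetical helper: gathers single tasks and runs them as one bulk command."""

    def __init__(self, command, size=100, delay=1.0):
        self._command = command  # async callable: list of tasks -> list of results
        self._size = size        # flush as soon as this many tasks are pending
        self._delay = delay      # or after this many seconds, whichever is first
        self._pending = []       # (task, future) pairs waiting for the next bulk
        self._timer = None
        self._running = set()    # keep references so bulk runs are not collected

    async def __call__(self, task):
        loop = asyncio.get_running_loop()
        future = loop.create_future()
        self._pending.append((task, future))
        if len(self._pending) >= self._size:
            self._flush()
        elif self._timer is None:
            self._timer = loop.call_later(self._delay, self._flush)
        return await future

    def _flush(self):
        if self._timer is not None:
            self._timer.cancel()
            self._timer = None
        batch, self._pending = self._pending, []
        if batch:
            run = asyncio.ensure_future(self._run(batch))
            self._running.add(run)
            run.add_done_callback(self._running.discard)

    async def _run(self, batch):
        try:
            results = await self._command([task for task, _ in batch])
        except Exception as err:
            # A failed bulk command fails every task that was part of it.
            for _, future in batch:
                if not future.done():
                    future.set_exception(err)
        else:
            for (_, future), result in zip(batch, results):
                if not future.done():
                    future.set_result(result)


calls = []


async def fake_condor_rm(job_ids):
    # Stand-in for one invocation of a condor tool with several job ids.
    calls.append(list(job_ids))
    return [f"removed {job_id}" for job_id in job_ids]


async def main():
    bulk_rm = BulkCollector(fake_condor_rm, size=3, delay=0.05)
    return await asyncio.gather(*(bulk_rm(f"{i}.0") for i in range(5)))


results = asyncio.run(main())
print(calls)
```

Five single requests end up as two invocations: the first is triggered by reaching the size limit, the second by the delay expiring. Each caller still receives only its own result.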
12 changes: 12 additions & 0 deletions docs/source/changes/230.change_support_new_es_client.yaml
@@ -0,0 +1,12 @@
category: changed
summary: "Adjust ElasticSearch plugin to support client versions >=7.17,<8.0.0"
description: |
The latest versions of the Elasticsearch client have a compatibility
mode which can also be used for newer server versions. Includes also a fix
for the case where `resource_status` is not part of `resource_attributes`,
which occasionally caused crashes. Newer versions of the Elasticsearch client
require the `scheme` parameter to be set. By setting this already, it will
be easier to eventually transition to client version 8.
pull requests:
- 230
version: 0.7.0
@@ -0,0 +1,11 @@
category: fixed
summary: "Fixed state transition for stopped workers"
description: |
Fixes an unexpected behaviour for Drones in AvailableState in case the
HTCondor daemon on nodes is shut down automatically, causing the machine status
to be NotAvailable, while the resource status continues to be Running. In that
case the drone state is re-set to IntegratingState. Since HTCondor is not
restarted, the Drone remains in this state forever.
pull requests:
- 234
version: 0.7.0
@@ -0,0 +1,12 @@
category: fixed
summary: "Fixing recurrent cancellation of jobs TIMEOUTED in Slurm"
description: |
Fixed a problem where Slurm jobs in status TIMEOUT are not handled correctly.
The Slurm TIMEOUT state was handled as `ResourceStatus.Error`, causing TARDIS to
repeatedly clean up the job from the batch system using `scancel`. Now timed-out
drones in Slurm are handled as `ResourceStatus.Deleted` instead.
issues:
- 240
pull requests:
- 241
version: 0.7.0
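
The effect of this fix can be pictured as a change in a status translation table. The sketch below is an assumption-laden illustration: the enum is a local stand-in for the `ResourceStatus` named above, and apart from the `TIMEOUT` entry the mapping and the fallback are examples, not the adapter's actual table.

```python
from enum import Enum


class ResourceStatus(Enum):
    # Local stand-in for the ResourceStatus referenced in the description.
    Booting = 1
    Running = 2
    Deleted = 3
    Error = 4


# Illustrative mapping from Slurm job states to resource states.
SLURM_STATUS_TRANSLATION = {
    "PENDING": ResourceStatus.Booting,
    "RUNNING": ResourceStatus.Running,
    "COMPLETED": ResourceStatus.Deleted,
    "FAILED": ResourceStatus.Error,
    # Per the description: handled as Deleted instead of Error, so the job is
    # no longer cancelled again and again with `scancel`.
    "TIMEOUT": ResourceStatus.Deleted,
}


def translate(slurm_state: str) -> ResourceStatus:
    """Translate a Slurm job state; unknown states fall back to Error (assumption)."""
    return SLURM_STATUS_TRANSLATION.get(slurm_state, ResourceStatus.Error)


print(translate("TIMEOUT"))
```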
@@ -1,9 +1,10 @@
category: added
summary: "Add support for passing environment variables as executable arguments to support HTCondor grid universe"
description: |
In order to properly identify started drones in the overlay batch system and to limit the amount of resources
(CPU cores, memory, disk) announced to be available, a set of environment variables needs to be set inside the drone.
In case of the HTCondor grid universe such an environment is usually dropped by the Grid Compute Element. Therefore
the possibility to pass the environment variables using executable arguments has been added.
pull_requests:
- 224
In order to properly identify started drones in the overlay batch system
and to limit the amount of resources (CPU cores, memory, disk) announced to
be available, a set of environment variables needs to be set inside the drone.
In case of the HTCondor grid universe such an environment is usually dropped
by the Grid Compute Element. Therefore the possibility to pass the environment
variables using executable arguments has been added.
version: 0.7.0
16 changes: 9 additions & 7 deletions docs/source/changes/247.change_drone_state_initialisation.yaml
@@ -1,10 +1,12 @@
category: changed
summary: "Change drone state initialisation and notification of plugins"
description: |
The initialisation procedure and the notification of the plugins is changed to fix a bug occurring on restarts of
Drones. A newly created Drone is now initialised with ``state = None`` and all plugins are notified of the first state
change ``None`` -> ``RequestState``. The Drone is now inserted in the `SqliteRegistry` when its state changes to
``RequestState`` and all subsequent changes are DB updates. So, failing duplicated inserts due to the unique
requirement of the ``drone_uuid`` are prevented in case a Drone changes back to ``BootingState`` again.
pull_requests:
- 247
The initialisation procedure and the notification of the plugins is
changed to fix a bug occurring on restarts of Drones. A newly created Drone
is now initialised with ``state = None`` and all plugins are notified of the first
state change ``None`` -> ``RequestState``. The Drone is now inserted in the
`SqliteRegistry` when its state changes to ``RequestState`` and all
subsequent changes are DB updates. So, failing duplicated inserts due to the
unique requirement of the ``drone_uuid`` are prevented in case a Drone
changes back to ``BootingState`` again.
version: 0.7.0
@@ -1,10 +1,10 @@
category: fixed
summary: "Update the remote_resource_uuid in sqlite registry on each update"
description: |
The change drone state initialisation update revealed a bug in TARDIS. The ``remote_resource_uuid`` in the
``SqliteRegistry`` plugin is not updated at all. As a result, TARDIS keeps crashing on restarts due to the missing
``remote_resource_uuid`` until the DB has been removed.
pull_requests:
- 249
The change drone state initialisation update revealed a bug in TARDIS.
The ``remote_resource_uuid`` in the ``SqliteRegistry`` plugin is not
updated at all. As a result, TARDIS keeps crashing on restarts due to the
missing ``remote_resource_uuid`` until the DB has been removed.
issues:
- 248
- 248
version: 0.7.0
8 changes: 8 additions & 0 deletions docs/source/changes/250.change_rest_api_auth.yaml
@@ -0,0 +1,8 @@
category: changed
summary: "REST API cookie authentication and refactoring"
description: |
The authentication method was changed to cookie authentication with jwt
tokens. The login router was completely changed and moved to /user.
pull requests:
- 250
version: 0.7.0
@@ -5,5 +5,4 @@ description: |
requested. This overwrites all Standardisers using the minimum parameter in the COBalD pipeline. It turns out that
the ``Standardiser`` is not needed anymore, since the ``utilisation`` and ``allocation`` is always 1.0 when no drone
is running, so that automatically one is requested.
pull_requests:
- 252
version: 0.7.0
9 changes: 9 additions & 0 deletions docs/source/changes/259.fix_rest_api_keyboard_interupt.yaml
@@ -0,0 +1,9 @@
category: fixed
summary: "REST API does not suppress KeyboardInterrupt"
description: |
Implements a workaround for a bug in uvicorn that suppresses
KeyboardInterrupt. It allows clean shutdown in both the current cobald
release and master version.
pull requests:
- 259
version: 0.7.0
8 changes: 5 additions & 3 deletions docs/source/changes/260.add_remote_drone_draining.yaml
@@ -1,8 +1,10 @@
category: added
summary: "Added support for manual draining of drones using the REST API"
description: |
Added limited support to synchronize the state stored in the ``SqliteRegistry`` with the current state of the drone.
Only implemented for drones in ``AvailableState`` which can transition to ``DrainState`` via a remote update of the
``SqliteRegistry``, i.e. using the REST API.
Added limited support to synchronize the state stored in the ``SqliteRegistry``
with the current state of the drone. Only implemented for drones in ``AvailableState``
which can transition to ``DrainState`` via a remote update of the ``SqliteRegistry``,
i.e. using the REST API.
pull requests:
- 260
version: 0.7.0
7 changes: 7 additions & 0 deletions docs/source/changes/263.add_auditor_accounting_plugin,yaml
@@ -0,0 +1,7 @@
category: added
summary: "Added Auditor accounting plugin"
description: |
Added Auditor (Accounting Data Handling Toolbox For Opportunistic Resources)
plugin.
pull requests:
- 263
1 change: 1 addition & 0 deletions docs/source/changes/267.add_lancium_site_adapter.yaml
@@ -4,3 +4,4 @@ description: |
A new Lancium compute site adapter has been added to `TARDIS` to use resources provided by the Lancium compute cluster.
pull requests:
- 267
version: 0.7.0