Release 0.7.0 #242

Merged
merged 17 commits into from
Feb 27, 2023
Changes from 15 commits
2 changes: 1 addition & 1 deletion CONTRIBUTORS
Original file line number Diff line number Diff line change
@@ -2,8 +2,8 @@ Contributors ordered by number of commits:
==========================================
Manuel Giffels <[email protected]>
Max Fischer <[email protected]>
Alexander Haas <[email protected]>
Stefan Kroboth <[email protected]>
Alexander Haas <[email protected]>
Eileen Kuehn <[email protected]>
matthias.schnepf <[email protected]>
ubdsv <[email protected]>
4 changes: 2 additions & 2 deletions docs/source/adapters/site.rst
@@ -199,11 +199,11 @@ Available adapter configuration options
| Option | Short Description | Requirement |
+================+===================================================================================+=================+
| max_age | The result of the `condor_status` call is cached for `max_age` in minutes. | **Required** |
+================+===================================================================================+=================+
+----------------+-----------------------------------------------------------------------------------+-----------------+
Review comment (Member, Author): Corrected the multiple head/body row separators in the HTCondor site adapter documentation. A reStructuredText grid table allows only one `=` separator row (between header and body); the extra ones caused an error when building the docs, so the table was missing from the rendered output.

| bulk_size | Maximum number of jobs to handle per bulk invocation of a condor tool. | **Optional** |
+ + + +
| | Default: 100 | |
+================+===================================================================================+=================+
+----------------+-----------------------------------------------------------------------------------+-----------------+
| bulk_delay | Maximum duration in seconds to wait per bulk invocation of a condor tool. | **Optional** |
+ + + +
| | Default: 1.0 | |
15 changes: 12 additions & 3 deletions docs/source/changelog.rst
@@ -1,18 +1,19 @@
.. Created by changelog.py at 2023-02-23, command
.. Created by changelog.py at 2023-02-24, command
'/Users/giffler/.cache/pre-commit/repor6pnmwlm/py_env-python3.10/bin/changelog docs/source/changes compile --output=docs/source/changelog.rst'
based on the format of 'https://keepachangelog.com/'

#########
CHANGELOG
#########

[Unreleased] - 2023-02-23
=========================
[0.7.0] - 2023-02-24
====================

Added
-----

* Introduce a TARDIS REST API to query the state of resources from SqlRegistry
* Ensure python3.10 compatibility
* Added support for manual draining of drones using the REST API
* Add support for passing environment variables as executable arguments to support HTCondor grid universe
* Added support for application credentials of the OpenStack site adapter
@@ -21,15 +22,23 @@ Added
Changed
-------

* Adjust ElasticSearch plugin to support client versions >=7.17,<8.0.0
* Remove granularity in Standardiser to enable earlier creation of new drones
* Introduced Bulk Executor and HTCondor Bulk Operations
* SSHExecutor respects the remote MaxSessions via queueing
* Remove minimum core limit (Standardiser) from pool factory
* Change drone state initialisation and notification of plugins
* REST API cookie authentication and refactoring
* Adjust Prometheus plugin to the latest aioprometheus version 21.9.0

Fixed
-----

* Unique constraints in database schema have been fixed to allow same machine_type and remote_resource_uuid on multiple sites
* Update the remote_resource_uuid in sqlite registry on each update
* REST API does not suppress KeyboardInterrupt
* Fixing recurrent cancellation of jobs TIMEOUTED in Slurm
* Fixed state transition for stopped workers

[0.6.0] - 2021-08-09
====================
9 changes: 5 additions & 4 deletions docs/source/changes/183.add_rest_api.yaml
@@ -1,8 +1,9 @@
category: added
summary: "Introduce a TARDIS REST API to query the state of resources from SqlRegistry"
description: |
Introduction of a REST API to query the state of resources from the SqlRegistry. The REST API is using the FastApi
framework in combination with an uvicorn ASGI server. JSON Web Token and OAuth2 scopes are supported for
authentication and authorization.
description: "Introduction of a REST API to query the state of resources from the\
\ SqlRegistry. The REST API is using the FastApi \nframework in combination with\
\ an uvicorn ASGI server. JSON Web Token and OAuth2 scopes are supported for \n\
authentication and authorization.\n"
pull requests:
- 183
version: 0.7.0
8 changes: 8 additions & 0 deletions docs/source/changes/209.change_resource_granularity.yaml
@@ -0,0 +1,8 @@
category: changed
summary: "Remove granularity in Standardiser to enable earlier creation of new drones"
description: "With granularity new drones are requested when `demand>supply+granularity`.\
\ Remove granularity in Standardiser to \nenable earlier creation of new drones\
\ when `demand>supply`.\n"
pull requests:
- 209
version: 0.7.0
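
The two request conditions named in the description above can be compared in a minimal sketch. The function names and the sample numbers are hypothetical and are not taken from the TARDIS or COBalD code; only the inequalities `demand>supply+granularity` and `demand>supply` come from the changelog entry.

```python
def requests_drone_with_granularity(
    demand: float, supply: float, granularity: float
) -> bool:
    """Old condition (hypothetical helper): demand must exceed supply by more than the granularity."""
    return demand > supply + granularity


def requests_drone_without_granularity(demand: float, supply: float) -> bool:
    """New condition (hypothetical helper): any excess demand requests a drone."""
    return demand > supply


if __name__ == "__main__":
    # Illustrative numbers only: 8 units supplied, 9 demanded, granularity of 8.
    print(requests_drone_with_granularity(9, 8, 8))
    print(requests_drone_without_granularity(9, 8))
```

With these sample values the old condition does not trigger a request while the new one does, which is the "earlier creation of new drones" the entry refers to.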
@@ -0,0 +1,8 @@
category: changed
summary: "Adjust Prometheus plugin to the latest aioprometheus version 21.9.0"
description: "`aioprometheus` has changed its API so that metrics are automatically\
\ registered when they are created. Prometheus \nplugin has been changed accordingly\
\ and the installation of `aioprometheus>=21.9.0` has been enforced.\n"
pull requests:
- 211
version: 0.7.0
8 changes: 8 additions & 0 deletions docs/source/changes/213.add_python_3.10_compatibilty.yaml
@@ -0,0 +1,8 @@
category: added
summary: "Ensure python3.10 compatibility"
description: "Ensure python3.10 compatibility. Enable Python 3.10 unittests, adds\
\ 3.10 as supported release into setup.py and \nfixes a few deprecation warnings\
\ occurring when executing unittests under 3.10.\n"
pull requests:
- 213
version: 0.7.0
5 changes: 3 additions & 2 deletions docs/source/changes/218.respect_ssh_maxsessions.yaml
@@ -4,7 +4,8 @@ description: |
The SSHExecutor now is aware of sshd MaxSessions, which is a limit on the concurrent
operations per connection. If more operations are to be run at once, operations are
queued until a session becomes available.
pull requests:
- 218
issues:
- 217
pull requests:
- 218
version: 0.7.0
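
The queueing behaviour described above can be sketched with an `asyncio.Semaphore`. This is an illustration under stated assumptions and not the SSHExecutor implementation: the function names are hypothetical, the operations are dummy sleeps instead of SSH commands, and the limit is passed in by the caller rather than discovered from the remote sshd.

```python
import asyncio


async def run_with_session_limit(n_operations: int, max_sessions: int) -> int:
    """Run dummy operations, at most `max_sessions` at a time.

    Returns the highest number of operations that were active at once.
    """
    sessions = asyncio.Semaphore(max_sessions)
    active = 0
    peak = 0

    async def operation() -> None:
        nonlocal active, peak
        # Operations beyond the limit wait here until a session is released.
        async with sessions:
            active += 1
            peak = max(peak, active)
            await asyncio.sleep(0.01)  # stand-in for a remote command
            active -= 1

    await asyncio.gather(*(operation() for _ in range(n_operations)))
    return peak


def peak_sessions(n_operations: int, max_sessions: int) -> int:
    """Synchronous wrapper around the coroutine above (hypothetical helper)."""
    return asyncio.run(run_with_session_limit(n_operations, max_sessions))


if __name__ == "__main__":
    print(peak_sessions(25, 10))
```

The semaphore keeps excess operations queued rather than failing them, which matches the behaviour the entry describes: operations wait until a session becomes available.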
8 changes: 4 additions & 4 deletions docs/source/changes/220.fix_unique_constraints_db_schema.yaml
@@ -1,9 +1,9 @@
category: fixed
summary: "Unique constraints in database schema have been fixed to allow same machine_type and remote_resource_uuid on multiple sites"
summary: "Unique constraints in database schema have been fixed to allow same machine_type\
\ and remote_resource_uuid on multiple sites"
description: |
The unique constraints in the database schema have been relaxed to allow the same machine_type and the same
remote_resource_uuid to be used on multiple sites. In addition, the unittest of the SqliteRegistry have been improved.
pull_requests:
- 220
issues:
- 219
- 219
version: 0.7.0
12 changes: 12 additions & 0 deletions docs/source/changes/224.changed_htcondor_bulk_operations.yaml
@@ -0,0 +1,12 @@
category: changed
summary: "Introduced Bulk Executor and HTCondor Bulk Operations"
description: "Introduced bulk execution to HTCondor SiteAdapter including generic\
\ AsyncBulkCall framework class for collecting \ntasks to execute in bulk. HTCondorAdapter\
\ uses bulk executions for its commands `deploy resource`, `stop resource` and\n\
`terminate resource`. Changes the Resource UUID format used by the HTCondor Site\
\ adapter to `ClusterId.ProcId`.\n"
issues:
- 223
pull requests:
- 224
version: 0.7.0
11 changes: 11 additions & 0 deletions docs/source/changes/230.change_support_new_es_client.yaml
@@ -0,0 +1,11 @@
category: changed
summary: "Adjust ElasticSearch plugin to support client versions >=7.17,<8.0.0"
description: "The latest versions of the Elasticsearch client have a compatibility\
\ mode which can also be used for newer server \nversions. Includes also a fix for\
\ the case where `resource_status` is not part of `resource_attributes`, which \n\
occasionally caused crashes. Newer versions of the Elasticsearch client require\
\ the `scheme` parameter to be set. \nBy setting this already, it will be easier\
\ to eventually transition to client version 8.\n"
pull requests:
- 230
version: 0.7.0
@@ -0,0 +1,10 @@
category: fixed
summary: "Fixed state transition for stopped workers"
description: "Fixes an unexpected behaviour for Drones in AvailableState in case the\
\ HTCondor daemon on nodes is shutdown\nautomatically causing the machine status\
\ to be NotAvailable, while the resource status continues to be Running. In \nthat\
\ case the drone state is re-set to IntegratingState. Since HTCondor is not restarted,\
\ the Drone remains in this \nstate forever.\n"
pull requests:
- 234
version: 0.7.0
@@ -0,0 +1,11 @@
category: fixed
summary: "Fixing recurrent cancellation of jobs TIMEOUTED in Slurm"
description: "Fixed a problem where Slurm jobs in status TIMEOUT are not handled correctly.\
\ Slurm TIMEOUT state were handled as \n`ResourceStatus.Error` causing TARDIS to\
\ repeatedly cleanup the job from the batch system using `scancel`. Now \ntimeouted\
\ drones in Slurm are handled as `ResourceStatus.Deleted` instead.\n"
issues:
- 240
pull requests:
- 241
version: 0.7.0
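
The status translation described above can be sketched as a lookup table. The enum below is a hypothetical stand-in for the TARDIS `ResourceStatus` class, and every mapping entry except `TIMEOUT` is an illustrative assumption; only the change from `ResourceStatus.Error` to `ResourceStatus.Deleted` for timed-out jobs comes from the changelog entry.

```python
from enum import Enum


class ResourceStatus(Enum):
    """Hypothetical stand-in for the TARDIS ResourceStatus enum."""

    Booting = 1
    Running = 2
    Deleted = 3
    Error = 4


# Only the TIMEOUT entry reflects the fix; the others are assumed for illustration.
SLURM_STATE_TO_STATUS = {
    "PENDING": ResourceStatus.Booting,
    "RUNNING": ResourceStatus.Running,
    "COMPLETED": ResourceStatus.Deleted,
    "TIMEOUT": ResourceStatus.Deleted,  # previously handled as ResourceStatus.Error
}


def translate_slurm_state(state: str) -> ResourceStatus:
    """Map a Slurm job state to a resource status, defaulting to Error."""
    return SLURM_STATE_TO_STATUS.get(state, ResourceStatus.Error)


if __name__ == "__main__":
    print(translate_slurm_state("TIMEOUT"))
```

Treating a timed-out job as already deleted means no further cleanup is attempted, so the repeated `scancel` calls mentioned in the entry no longer occur.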
@@ -1,9 +1,10 @@
category: added
summary: "Add support for passing environment variables as executable arguments to support HTCondor grid universe"
description: |
In order to properly identify started drones in the overlay batch system and to limit the amount of resources
(CPU cores, memory, disk) announced to be available, a set of environment variables needs to be set inside the drone.
In case of the HTCondor grid universe such an environment is usually dropped by the Grid Compute Element. Therefore
the possibility to pass the environment variables using executable arguments has been added.
pull_requests:
- 224
summary: "Add support for passing environment variables as executable arguments to\
\ support HTCondor grid universe"
description: "In order to properly identify started drones in the overlay batch system\
\ and to limit the amount of resources \n(CPU cores, memory, disk) announced to\
\ be available, a set of environment variables needs to be set inside the drone.\
\ \nIn case of the HTCondor grid universe such an environment is usually dropped\
\ by the Grid Compute Element. Therefore \nthe possibility to pass the environment\
\ variables using executable arguments has been added.\n"
version: 0.7.0
17 changes: 9 additions & 8 deletions docs/source/changes/247.change_drone_state_initialisation.yaml
@@ -1,10 +1,11 @@
category: changed
summary: "Change drone state initialisation and notification of plugins"
description: |
The initialisation procedure and the notification of the plugins is changed to fix a bug occurring on restarts of
Drones. A newly created Drone is now initialised with ``state = None`` and all plugins are notified first state
change ``None`` -> ``RequestState``. The Drone is now inserted in the `SqliteRegistry` when it state changes to
``RequestState`` and all subsequent changes are DB updates. So, failing duplicated inserts due to the unique
requirement of the ``drone_uuid`` are prevented in case a Drone changes back to ``BootingState`` again.
pull_requests:
- 247
description: "The initialisation procedure and the notification of the plugins is\
\ changed to fix a bug occurring on restarts of \nDrones. A newly created Drone\
\ is now initialised with ``state = None`` and all plugins are notified first state\n\
change ``None`` -> ``RequestState``. The Drone is now inserted in the `SqliteRegistry`\
\ when it state changes to \n``RequestState`` and all subsequent changes are DB\
\ updates. So, failing duplicated inserts due to the unique \nrequirement of the\
\ ``drone_uuid`` are prevented in case a Drone changes back to ``BootingState``\
\ again.\n"
version: 0.7.0
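
The insert-once, update-afterwards behaviour described above can be sketched with the standard `sqlite3` module. This is an illustration under stated assumptions and not the `SqliteRegistry` code: the table layout and all function names are hypothetical; only the `drone_uuid` uniqueness, the initial `None` state and the state names come from the entry.

```python
import sqlite3
from typing import Optional


def make_registry() -> sqlite3.Connection:
    """Create an in-memory table with a unique drone_uuid (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE drones (drone_uuid TEXT UNIQUE, state TEXT)")
    return conn


def notify(
    conn: sqlite3.Connection, drone_uuid: str, old_state: Optional[str], new_state: str
) -> None:
    """Insert on the first transition (None -> new_state), update on all later ones."""
    if old_state is None:
        conn.execute(
            "INSERT INTO drones (drone_uuid, state) VALUES (?, ?)",
            (drone_uuid, new_state),
        )
    else:
        conn.execute(
            "UPDATE drones SET state = ? WHERE drone_uuid = ?",
            (new_state, drone_uuid),
        )
    conn.commit()


def current_state(conn: sqlite3.Connection, drone_uuid: str) -> Optional[str]:
    """Return the stored state of a drone, or None if it is unknown."""
    row = conn.execute(
        "SELECT state FROM drones WHERE drone_uuid = ?", (drone_uuid,)
    ).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    registry = make_registry()
    notify(registry, "drone-1", None, "RequestState")
    notify(registry, "drone-1", "RequestState", "BootingState")
    # A drone returning to BootingState is an update, not a second insert.
    notify(registry, "drone-1", "AvailableState", "BootingState")
    print(current_state(registry, "drone-1"))
```

Because only the `None` transition inserts a row, a drone that changes back to `BootingState` never triggers a second insert against the unique `drone_uuid` column.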
@@ -1,10 +1,9 @@
category: fixed
summary: "Update the remote_resource_uuid in sqlite registry on each update"
description: |
The change drone state initialisation update revealed a bug in TARDIS. The ``remote_resource_uuid`` in the
``SqliteRegistry`` plugin is not updated at all. As a result, TARDIS keeps crashing on restarts due to the missing
``remote_resource_uuid`` until the DB has been removed.
pull_requests:
- 249
description: "The change drone state initialisation update revealed a bug in TARDIS.\
\ The ``remote_resource_uuid`` in the \n``SqliteRegistry`` plugin is not updated\
\ at all. As a result, TARDIS keeps crashing on restarts due to the missing \n``remote_resource_uuid``\
\ until the DB has been removed.\n"
issues:
- 248
- 248
version: 0.7.0
8 changes: 8 additions & 0 deletions docs/source/changes/250.change_rest_api_auth.yaml
@@ -0,0 +1,8 @@
category: changed
summary: "REST API cookie authentication and refactoring"
description: |
The authentication method was changed to cookie authentication with jwt
tokens. The login router was completely changed and moved to /user.
pull requests:
- 250
version: 0.7.0
@@ -5,5 +5,4 @@ description: |
requested. This overwrites all Standardisers using the minimum parameter in the COBalD pipeline. It turns out that
the ``Standardiser`` is not needed anymore, since the ``utilisation`` and ``allocation`` is always 1.0 when no drone
is running, so that automatically one is requested.
pull_requests:
- 252
version: 0.7.0
9 changes: 9 additions & 0 deletions docs/source/changes/259.fix_rest_api_keyboard_interupt.yaml
@@ -0,0 +1,9 @@
category: fixed
summary: "REST API does not suppress KeyboardInterrupt"
description: |
Implements a workaround for a bug in uvicorn that suppresses
KeyboardInterrupt. It allows clean shutdown in both the current cobald
release and master version.
pull requests:
- 259
version: 0.7.0
9 changes: 5 additions & 4 deletions docs/source/changes/260.add_remote_drone_draining.yaml
@@ -1,8 +1,9 @@
category: added
summary: "Added support for manual draining of drones using the REST API"
description: |
Added limited support to synchronize the state stored in the ``SqliteRegistry`` with the current state of the drone.
Only implemented for drones in ``AvailableState`` which can transition to ``DrainState`` via a remote update of the
``SqliteRegistry``, i.e. using the REST API.
description: "Added limited support to synchronize the state stored in the ``SqliteRegistry``\
\ with the current state of the drone. \nOnly implemented for drones in ``AvailableState``\
\ which can transition to ``DrainState`` via a remote update of the \n``SqliteRegistry``,\
\ i.e. using the REST API.\n"
pull requests:
- 260
version: 0.7.0
7 changes: 7 additions & 0 deletions docs/source/changes/263.add_auditor_accounting_plugin,yaml
@@ -0,0 +1,7 @@
category: added
summary: "Added Auditor accounting plugin"
description: |
Added Auditor (Accounting Data Handling Toolbox For Opportunistic Resources)
plugin.
pull requests:
- 263
1 change: 1 addition & 0 deletions docs/source/changes/267.add_lancium_site_adapter.yaml
@@ -4,3 +4,4 @@ description: |
A new Lancium compute site adapter has been added to `TARDIS` to use resources provided by the Lancium compute cluster.
pull requests:
- 267
version: 0.7.0
@@ -1,9 +1,10 @@
category: added
summary: "Added support for application credentials of the OpenStack site adapter"
description: |
Newer versions of OpenStack support the creation of application credentials for specific projects. So, the pair of
application_credential_id and application_credential_secret is only valid of a specific project. The OpenStack site
adapter now fully supports the utilization of application credentials to authenticate against the OpenStack API
endpoint.
description: "Newer versions of OpenStack support the creation of application credentials\
\ for specific projects. So, the pair of\napplication_credential_id and application_credential_secret\
\ is only valid of a specific project. The OpenStack site\nadapter now fully supports\
\ the utilization of application credentials to authenticate against the OpenStack\
\ API \nendpoint.\n"
pull requests:
- 274
version: 0.7.0
2 changes: 2 additions & 0 deletions docs/source/changes/versions.yaml
Original file line number Diff line number Diff line change
@@ -1,3 +1,5 @@
- semver: 0.7.0
date: '2023-02-24'
- semver: 0.6.0
date: '2021-08-09'
- semver: 0.5.0