Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change drone state initialisation and notification of plugins #247

Merged

Conversation

giffels
Copy link
Member

@giffels giffels commented May 11, 2022

Currently, newly created drones are initialised with state=RequestState(). Plugins like the SqliteRegistry will be notified on the first state change RequestState -> BootingState. As a consequence, the SqliteRegistry is inserting the Drone on the change to BootingState. Sometimes the first startup of a Drone is not successful, which requeues the job associated with the Drone in case of using the HTCondorSiteAdapter. So, the following pattern is occuring in the log.

The Drone is in ResourceState.Running for a short moment before it goes to ResourceState.Booting again due to requeing. So the state of the Drone changes to IntegratingState.

cobald.runtime.tardis.resources.dronestates: 2022-05-11 08:29:53 Drone {'site_name': 'TOPAS-CPU', 'machine_type': 'singlecore', 'obs_machine_meta_data_translation_mapping': {'Cores': 1, 'Memory': 1024, 'Disk': 1048576}, 'remote_resource
_uuid': '8712562.0', 'created': datetime.datetime(2022, 5, 11, 6, 20, 45, 487443), 'updated': datetime.datetime(2022, 5, 11, 6, 26, 45, 517969), 'drone_uuid': 'topas-cpu-863f5ee763'} in IntegratingState

Shortly afterwards the drone is ResourceState.Booting again and the Drone changes back to BootingState again.

cobald.runtime.tardis.plugins.sqliteregistry: 2022-05-11 08:29:53 Drone: {'site_name': 'TOPAS-CPU', 'machine_type': 'singlecore', 'obs_machine_meta_data_translation_mapping': {'Cores': 1, 'Memory': 1024, 'Disk': 1048576}, 'remote_resour
ce_uuid': '8712562.0', 'created': datetime.datetime(2022, 5, 11, 6, 20, 45, 487443), 'updated': datetime.datetime(2022, 5, 11, 8, 29, 53, 163734), 'drone_uuid': 'topas-cpu-863f5ee763', 'resource_status': <ResourceStatus.Booting: 1>} has
 changed state to BootingState

This in turn leads to a re-insertion of the Drone in the SqliteRegistry causing the following error.

sqlite3.IntegrityError: column drone_uuid is not unique

This pull request changes the initialisation procedure and the notfication of the plugins, so that the above behaviour is not occuring anymore. A newly created Drone is now initialised with state = None and all plugins are notified first state change None -> RequestState. The Drone is now inserted when it state changes to RequestState and all subsequent changes are DB updates.

@giffels giffels added the bug Something isn't working label May 11, 2022
@giffels giffels added this to the 0.7.0 - Release milestone May 11, 2022
@codecov-commenter
Copy link

codecov-commenter commented May 11, 2022

Codecov Report

Merging #247 (9293713) into master (631b396) will decrease coverage by 0.04%.
The diff coverage is 50.00%.

@@            Coverage Diff             @@
##           master     #247      +/-   ##
==========================================
- Coverage   99.39%   99.34%   -0.05%     
==========================================
  Files          54       54              
  Lines        2144     2145       +1     
==========================================
  Hits         2131     2131              
- Misses         13       14       +1     
Impacted Files Coverage Δ
tardis/plugins/sqliteregistry.py 100.00% <ø> (ø)
tardis/resources/poolfactory.py 100.00% <ø> (ø)
tardis/resources/drone.py 98.80% <50.00%> (-1.20%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 631b396...9293713. Read the comment docs.

@@ -126,26 +126,24 @@ def test_create_composite(
@patch("tardis.resources.poolfactory.BatchSystemAgent")
@patch("tardis.resources.poolfactory.SiteAgent")
def test_create_drone(self, mock_site_agent, mock_batch_system_agent, mock_drone):
self.assertEqual(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is not directly related to the reason for the pull request. Instead it fixes a broken unittest. Actually, the test_create_drone test did not test anything.

@giffels giffels requested review from maxfischer2781, a team and RHofsaess and removed request for a team May 11, 2022 12:58
@giffels giffels marked this pull request as ready for review May 11, 2022 12:59
Copy link
Member

@maxfischer2781 maxfischer2781 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pedantic man strikes again! Request for change! :wq!

tardis/resources/drone.py Outdated Show resolved Hide resolved
@giffels giffels requested a review from maxfischer2781 May 13, 2022 15:26
Copy link
Member

@maxfischer2781 maxfischer2781 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fabulous idea, if I may say so. 🧐

Copy link

@RHofsaess RHofsaess left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants