Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🎨Clusters keeper/use ssm (🚨change in private clusters) #6361

Merged
merged 14 commits into from
Sep 20, 2024

Conversation

sanderegg
Copy link
Member

@sanderegg sanderegg commented Sep 13, 2024

What do these changes do?

⚠️ This should go in master AFTER the release

Currently when private clusters are created, a new EC2 is created called the primary or manager instance. The current workflow is as follows:

  • EC2 launch,
  • EC2 configuration through custom boot scripts, (through custom bash script on EC2 launch)
  • pulling of dask TLS data from AWS SSM, (through custom bash script on EC2 launch)
  • docker swarm initialization, (through custom bash script on EC2 launch)
  • docker stack deployment with all necessary ENV setup (through custom bash script on EC2 launch),

All these custom bash scripts are passed to the EC2 as so-called UserData (special text that is run on start). The UserData is limited to 16 KB and we are hitting this limit.

This PR changes this behaviour by splitting the part in UserData and the part run via AWS SSM service:
AFTER:

  • EC2 launch,
  • EC2 configuration through custom boot scripts, (through custom bash script on EC2 launch)
  • pulling of dask TLS data from AWS SSM, (via SSM command)
  • docker swarm initialization, (via SSM command)
  • docker stack deployment with all necessary ENV setup (via SSM command),

In order to make this work, the clusters-keeper now waits for the EC2 to connect to the SSM server, and then send the SSM command. This might slightly reduce performances but is expected to be not noticeable.

⚠️ SSM Access is now necessary for clusters-keeper as well

NOTE:

  • SSM command size is limited to 64KB,
  • in case it becomes too large, the commands might be split in several commands.

NOTE2: This will allow to correctly setup graylog in computational clusters,
NOTE3: This will allow to correctly setup registry mirroring in computational clusters,
NOTE4: This will allow to add more machine types without breaking the limit,

Related issue/s

How to test

Dev-ops checklist

new ENV:

@sanderegg sanderegg added this to the Doppelbock milestone Sep 13, 2024
@sanderegg sanderegg self-assigned this Sep 13, 2024
Copy link

codecov bot commented Sep 13, 2024

Codecov Report

Attention: Patch coverage is 98.98990% with 1 line in your changes missing coverage. Please review.

Project coverage is 88.2%. Comparing base (cafbf96) to head (8d03db6).
Report is 563 commits behind head on master.

Files with missing lines Patch % Lines
.../simcore_service_clusters_keeper/utils/clusters.py 87.5% 1 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff            @@
##           master   #6361      +/-   ##
=========================================
+ Coverage    84.5%   88.2%    +3.6%     
=========================================
  Files          10    1507    +1497     
  Lines         214   62485   +62271     
  Branches       25    2070    +2045     
=========================================
+ Hits          181   55140   +54959     
- Misses         23    7025    +7002     
- Partials       10     320     +310     
Flag Coverage Δ
integrationtests 64.6% <ø> (?)
unittests 86.2% <98.9%> (+1.6%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
...ages/models-library/src/models_library/clusters.py 100.0% <100.0%> (ø)
.../src/simcore_service_clusters_keeper/api/health.py 100.0% <100.0%> (ø)
...r/src/simcore_service_clusters_keeper/constants.py 100.0% <100.0%> (ø)
...imcore_service_clusters_keeper/core/application.py 100.0% <100.0%> (ø)
...c/simcore_service_clusters_keeper/core/settings.py 100.0% <100.0%> (ø)
...imcore_service_clusters_keeper/modules/clusters.py 100.0% <ø> (ø)
...lusters_keeper/modules/clusters_management_core.py 100.0% <100.0%> (ø)
...lusters_keeper/modules/clusters_management_task.py 100.0% <ø> (ø)
...src/simcore_service_clusters_keeper/modules/ssm.py 100.0% <100.0%> (ø)
...r/src/simcore_service_clusters_keeper/utils/ec2.py 100.0% <100.0%> (ø)
... and 1 more

... and 1446 files with indirect coverage changes

@sanderegg sanderegg force-pushed the clusters-keeper/use-ssm branch 2 times, most recently from be264e8 to f72d38c Compare September 17, 2024 12:31
@sanderegg sanderegg changed the title 🎨Clusters keeper/use ssm 🎨Clusters keeper/use ssm (🚨change in private clusters) Sep 17, 2024
@sanderegg sanderegg marked this pull request as ready for review September 17, 2024 12:37
Copy link
Member

@mrnicegyu11 mrnicegyu11 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

read the description not the code, ask if I shall read through it (serious offer).

Also, kudos on going from the 16k limit to the 64k limit 🍡

@sanderegg sanderegg force-pushed the clusters-keeper/use-ssm branch from f72d38c to 3d2ed1b Compare September 17, 2024 13:03
Copy link
Member Author

@sanderegg sanderegg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Blocked until release is done

Copy link
Member

@pcrespov pcrespov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thx!

@sanderegg sanderegg force-pushed the clusters-keeper/use-ssm branch from 3d2ed1b to 8d03db6 Compare September 20, 2024 14:11
Copy link

Quality Gate Failed Quality Gate failed

Failed conditions
19.9% Duplication on New Code (required ≤ 3%)

See analysis details on SonarCloud

@sanderegg sanderegg merged commit c5028b2 into ITISFoundation:master Sep 20, 2024
56 of 57 checks passed
@sanderegg sanderegg deleted the clusters-keeper/use-ssm branch September 20, 2024 14:54
mrnicegyu11 pushed a commit to mrnicegyu11/osparc-simcore that referenced this pull request Oct 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Clusters-keeper: cannot cope with long custom scripts
4 participants