
Make baremetal targets more resilient to re-running #1004

Open · wants to merge 1 commit into base: main

Conversation

Contributor

@slagle slagle commented Jan 28, 2025

This refactors some of the baremetal targets and scripts so that the
targets can be re-used without completely removing everything (resources
and instances) created on previous runs. This enables:

- Creating different sets of edpm compute nodes, some preprovisioned and
  some baremetal, without the baremetal target completely deleting all
  edpm instances that existed previously.
- Adding additional baremetal nodes to create different
  NodeSets/Deployments without deleting all previous resources.
- Setting specific baremetal node prefixes and suffixes to control the
  naming, so that specific new nodes can be added without overwriting
  others.
- Removing over-eager `rm` invocations, such as for out/edpm, which
  completely wipe out anything edpm-related already deployed.

Signed-off-by: James Slagle [email protected]
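The prefix/suffix naming control described above can be sketched like this. The variable names (BM_NODE_NAME_PREFIX, BM_NODE_NAME_SUFFIX, BM_NODE_COUNT) are illustrative stand-ins, not necessarily the ones the PR introduces:

```shell
# Hypothetical sketch: per-run prefix/suffix so new nodes never collide
# with ones created by a previous run. Variable names are assumptions.
PREFIX=${BM_NODE_NAME_PREFIX:-edpm-compute}
SUFFIX=${BM_NODE_NAME_SUFFIX:-}
NODE_COUNT=${BM_NODE_COUNT:-2}
for INDEX in $(seq 0 $((NODE_COUNT - 1))); do
    # a second run with a different PREFIX or SUFFIX adds nodes
    # without overwriting the first run's names
    echo "${PREFIX}-${INDEX}${SUFFIX}"
done
```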

@openshift-ci openshift-ci bot requested review from gibizer and viroel January 28, 2025 21:15

openshift-ci bot commented Jan 28, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: slagle

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/757fa2e5999c44e9a3c2177d929abf44

✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 37m 09s
❌ install-yamls-crc-podified-edpm-baremetal FAILURE in 1h 23m 47s
❌ podified-multinode-edpm-deployment-crc FAILURE in 1h 23m 50s


Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/16ea423fe18a4050b9204a9afda9feed

✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 43m 17s
❌ install-yamls-crc-podified-edpm-baremetal FAILURE in 1h 30m 42s
❌ podified-multinode-edpm-deployment-crc FAILURE in 1h 26m 37s

@slagle slagle force-pushed the baremetal-cleanup branch from 4b68adf to 0f01f20 on January 29, 2025 at 19:21

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/4ec49ccf729540e99379e7b8bd274d09

❌ openstack-k8s-operators-content-provider FAILURE in 12m 10s
⚠️ install-yamls-crc-podified-edpm-baremetal SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
⚠️ podified-multinode-edpm-deployment-crc SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
⚠️ adoption-standalone-to-crc-ceph-provider SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
⚠️ adoption-standalone-to-crc-no-ceph-provider SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider

@slagle slagle force-pushed the baremetal-cleanup branch from 0f01f20 to ecad344 on January 29, 2025 at 19:35
@slagle slagle changed the title from "EDPM baremetal changes" to "Make baremetal targets more resilient to re-running" on Jan 29, 2025
@slagle slagle requested review from abays, bshephar, hjensas, rabi and steveb and removed request for viroel January 29, 2025 19:44

@steveb steveb left a comment


/lgtm


Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/aa440636c1db4dabb5d478ff97b3f811

✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 58m 04s
✔️ install-yamls-crc-podified-edpm-baremetal SUCCESS in 1h 33m 33s
✔️ podified-multinode-edpm-deployment-crc SUCCESS in 1h 14m 10s
❌ adoption-standalone-to-crc-ceph-provider FAILURE in 1h 39m 52s
❌ adoption-standalone-to-crc-no-ceph-provider FAILURE in 1h 44m 39s

```make
scripts/edpm-compute-baremetal.sh --cleanup
pushd .. && rm -Rf out/edpm || true && popd
make bmaas_sushy_emulator_cleanup || true
if [ "${NODE_COUNT}" = "1" ]; then \
```
Contributor


As we're changing the names of the VMs and BMHs, cleanup won't work for resources created with an earlier version of the script. They have to be deleted manually.

Contributor Author


I think that would still be possible by using the env vars to set the names. However, I don't think we need to maintain backwards compatibility here anyway.
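The manual cleanup mentioned above might look like the following. The old name prefix (`crc-bmaas`) and the namespace are assumptions, and the `virsh`/`oc` commands are only echoed as a dry run, since they need a live environment:

```shell
# Dry-run sketch of manually deleting nodes created by the earlier
# script version. OLD_PREFIX and the namespace are assumptions.
OLD_PREFIX=${OLD_PREFIX:-crc-bmaas}
NODE_COUNT=${NODE_COUNT:-2}
for INDEX in $(seq 1 "${NODE_COUNT}"); do
    echo "virsh destroy ${OLD_PREFIX}-${INDEX}"
    echo "virsh undefine --remove-all-storage ${OLD_PREFIX}-${INDEX}"
    echo "oc delete bmh ${OLD_PREFIX}-${INDEX} -n openstack"
done
```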

```make
edpm_baremetal_compute: export BMAAS_NODE_COUNT=${BM_NODE_COUNT}
edpm_baremetal_compute: export NODE_COUNT=${BM_NODE_COUNT}
edpm_baremetal_compute: export DEPLOY_DIR=../out/bmaas
edpm_baremetal_compute: ## Create virtual baremetal for the dataplane
	$(eval $(call vars))
	make bmaas_virtual_bms
	make bmaas_sushy_emulator_wait
	bash scripts/edpm-compute-baremetal.sh --create
	if [ "${NODE_COUNT}" = "1" ]; then \
```
Contributor


Though you can now re-run to create additional BMHs, since we clean up the existing OpenStackDataPlaneNodeSet/OpenStackDataPlaneDeployment resources with edpm_deploy_cleanup [1], that would deprovision all the existing BMHs as well before provisioning them again.

[1] https://github.com/openstack-k8s-operators/install_yamls/blob/main/Makefile#L922

Contributor Author


True. I would actually drop the cleanup deps from edpm_baremetal_deploy and edpm_deploy. I usually take the existing resources already written out, customize them as needed, and use them with oc directly. I guess it depends how far we want to take it in terms of making the targets work without cleanup versus making them re-runnable.
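The workflow described above could be sketched as follows. The file names under out/edpm are illustrative, and the `oc` commands are echoed as a dry run:

```shell
# Sketch: reuse the already-rendered resources instead of regenerating
# them. File names below are assumptions, not the repo's actual output.
OUT_DIR=${OUT_DIR:-../out/edpm}
for f in openstackdataplanenodeset.yaml openstackdataplanedeployment.yaml; do
    # customize the rendered YAML as needed, then apply it directly
    echo "oc apply -f ${OUT_DIR}/${f}"
done
```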

```make
if [ ${NODE_COUNT} -eq 1 ]; then \
	scripts/bmaas/vbm-setup.sh --create ; \
else \
	for INDEX in $(shell seq 0 $$((${NODE_COUNT} -1))) ; do \
```
Contributor


Calling these scripts as many times as the number of nodes is possibly a little inefficient, though it probably doesn't matter, as no one would be deploying a large number of nodes with install_yamls.

Contributor


Why do we need to iterate in the Makefile, though? vbm-setup.sh takes a NODE_COUNT env var and iterates within the script itself:
https://github.com/openstack-k8s-operators/install_yamls/blob/main/devsetup/scripts/bmaas/vbm-setup.sh#L166-L170

https://github.com/openstack-k8s-operators/install_yamls/blob/main/devsetup/scripts/bmaas/vbm-setup.sh#L145-L147

```make
if [ ${NODE_COUNT} -eq 1 ]; then \
	scripts/bmaas/vbm-setup.sh --create ; \
else \
	scripts/bmaas/vbm-setup.sh --create --num-nodes ${NODE_COUNT} ; \
fi
```

Then we can just handle the NODE_NAME_SUFFIX in the script?

That just avoids re-running the script, as Rabi mentions.
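The in-script iteration being suggested here can be sketched minimally. The `create_node` function and the naming below are illustrative stand-ins, not vbm-setup.sh's actual internals:

```shell
# Sketch: the script takes the node count itself and loops internally,
# so the caller invokes it once. create_node is a hypothetical stand-in
# for the real virt-install/virsh work.
create_node() {
    echo "defining VM ${1}"
}
NUM_NODES=${NUM_NODES:-2}
NODE_NAME_PREFIX=${NODE_NAME_PREFIX:-crc-bmaas}
for i in $(seq 1 "${NUM_NODES}"); do
    create_node "${NODE_NAME_PREFIX}-${i}"
done
```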

Contributor


I guess the whole check isn't really needed in that scenario, actually, because NODE_COUNT would just be handled in the script anyway.

Contributor Author


This PR changes vbm-setup.sh to drop --num-nodes, so that it manages just a single VM; that change is already included in this PR. Are you asking me to drop that change from the PR? The only inefficient part of calling the script twice would be setting up the libvirt logging, which isn't really something I'd optimize in a dev repo, but we can go back to the way it was if that's wanted.

gen-edpm-node.sh manages one VM at a time. It's already used for the networker and bootc VMs, with the Makefile controlling the looping, so I was changing the baremetal scripts to work in a similar way.

gen-edpm-node-bgp.sh is hardcoded to manage three VMs at a time, so that's yet another approach.

Whichever way it's done, I'd like to see these things be consistent.
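The caller-controlled pattern the author describes looks roughly like this. The EDPM_COMPUTE_SUFFIX variable name is an assumption about how such per-node scripts are parameterized, and the script call is echoed as a dry run:

```shell
# Sketch: one script invocation per node, with the loop in the caller
# (the Makefile) rather than inside the script. EDPM_COMPUTE_SUFFIX is
# a hypothetical parameter name.
NODE_COUNT=${NODE_COUNT:-2}
for INDEX in $(seq 0 $((NODE_COUNT - 1))); do
    echo "EDPM_COMPUTE_SUFFIX=${INDEX} scripts/gen-edpm-node.sh"
done
```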
