
Make baremetal targets more resilient to re-running #1004

Open · wants to merge 1 commit into base: main

Conversation

Contributor

@slagle slagle commented Jan 28, 2025

This refactors some of the baremetal targets and scripts so that the
targets can be re-used without completely removing everything (resources
and instances) created on previous runs. This enables:

- Creating different sets of edpm compute nodes, some preprovisioned and
  some baremetal, without the baremetal target completely deleting all
  edpm instances that existed previously.
- Adding additional baremetal nodes to create different
  NodeSets/Deployments without deleting all previous resources.
- Setting specific baremetal node prefixes and suffixes to control the
  naming, so that specific new nodes can be added without overwriting
  others.
- Removing over-eager `rm` invocations, such as for out/edpm, which
  completely wipe out anything edpm-related already deployed.

Signed-off-by: James Slagle [email protected]
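The prefix/suffix naming control described above can be sketched like this. The variable names (BM_NODE_NAME_PREFIX, BM_NODE_NAME_SUFFIX, BM_NODE_COUNT) are illustrative stand-ins, not necessarily the ones the PR introduces:

```shell
# Hypothetical sketch: per-run prefix/suffix so new nodes never collide
# with ones created by a previous run. Variable names are assumptions.
PREFIX=${BM_NODE_NAME_PREFIX:-edpm-compute}
SUFFIX=${BM_NODE_NAME_SUFFIX:-}
NODE_COUNT=${BM_NODE_COUNT:-2}
for INDEX in $(seq 0 $((NODE_COUNT - 1))); do
    # a second run with a different PREFIX or SUFFIX adds nodes
    # without overwriting the first run's names
    echo "${PREFIX}-${INDEX}${SUFFIX}"
done
```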

@openshift-ci openshift-ci bot requested review from gibizer and viroel January 28, 2025 21:15

openshift-ci bot commented Jan 28, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: slagle

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/757fa2e5999c44e9a3c2177d929abf44

✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 37m 09s
❌ install-yamls-crc-podified-edpm-baremetal FAILURE in 1h 23m 47s
❌ podified-multinode-edpm-deployment-crc FAILURE in 1h 23m 50s


Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/16ea423fe18a4050b9204a9afda9feed

✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 43m 17s
❌ install-yamls-crc-podified-edpm-baremetal FAILURE in 1h 30m 42s
❌ podified-multinode-edpm-deployment-crc FAILURE in 1h 26m 37s

@slagle slagle force-pushed the baremetal-cleanup branch from 4b68adf to 0f01f20 on January 29, 2025 at 19:21

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/4ec49ccf729540e99379e7b8bd274d09

❌ openstack-k8s-operators-content-provider FAILURE in 12m 10s
⚠️ install-yamls-crc-podified-edpm-baremetal SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
⚠️ podified-multinode-edpm-deployment-crc SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
⚠️ adoption-standalone-to-crc-ceph-provider SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
⚠️ adoption-standalone-to-crc-no-ceph-provider SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider

@slagle slagle force-pushed the baremetal-cleanup branch from 0f01f20 to ecad344 on January 29, 2025 at 19:35
@slagle slagle changed the title from "EDPM baremetal changes" to "Make baremetal targets more resilient to re-running" on Jan 29, 2025
@slagle slagle requested review from abays, bshephar, hjensas, rabi and steveb and removed request for viroel January 29, 2025 19:44

@steveb steveb left a comment


/lgtm


Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/aa440636c1db4dabb5d478ff97b3f811

✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 58m 04s
✔️ install-yamls-crc-podified-edpm-baremetal SUCCESS in 1h 33m 33s
✔️ podified-multinode-edpm-deployment-crc SUCCESS in 1h 14m 10s
❌ adoption-standalone-to-crc-ceph-provider FAILURE in 1h 39m 52s
❌ adoption-standalone-to-crc-no-ceph-provider FAILURE in 1h 44m 39s

```make
scripts/edpm-compute-baremetal.sh --cleanup
pushd .. && rm -Rf out/edpm || true && popd
make bmaas_sushy_emulator_cleanup || true
if [ "${NODE_COUNT}" = "1" ]; then \
```
Contributor


As we're changing the names of the VMs and BMHs, cleanup won't work for resources created with an earlier version of the script. They have to be deleted manually.

Contributor Author


I think that would still be possible by using the env vars to set the names. However, I don't think we need to maintain backwards compatibility here anyway.
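The manual cleanup mentioned above might look like the following. The old name prefix (`crc-bmaas`) and the namespace are assumptions, and the `virsh`/`oc` commands are only echoed as a dry run, since they need a live environment:

```shell
# Dry-run sketch of manually deleting nodes created by the earlier
# script version. OLD_PREFIX and the namespace are assumptions.
OLD_PREFIX=${OLD_PREFIX:-crc-bmaas}
NODE_COUNT=${NODE_COUNT:-2}
for INDEX in $(seq 1 "${NODE_COUNT}"); do
    echo "virsh destroy ${OLD_PREFIX}-${INDEX}"
    echo "virsh undefine --remove-all-storage ${OLD_PREFIX}-${INDEX}"
    echo "oc delete bmh ${OLD_PREFIX}-${INDEX} -n openstack"
done
```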

```make
edpm_baremetal_compute: export BMAAS_NODE_COUNT=${BM_NODE_COUNT}
edpm_baremetal_compute: export NODE_COUNT=${BM_NODE_COUNT}
edpm_baremetal_compute: export DEPLOY_DIR=../out/bmaas
edpm_baremetal_compute: ## Create virtual baremetal for the dataplane
	$(eval $(call vars))
	make bmaas_virtual_bms
	make bmaas_sushy_emulator_wait
	bash scripts/edpm-compute-baremetal.sh --create
	if [ "${NODE_COUNT}" = "1" ]; then \
```
Contributor


Though you can now re-run to create additional BMHs, since we clean up the existing OpenStackDataPlaneNodeSet/OpenStackDataPlaneDeployment resources with edpm_deploy_cleanup [1], that would deprovision all the existing BMHs as well before provisioning them again.

[1] https://github.com/openstack-k8s-operators/install_yamls/blob/main/Makefile#L922

Contributor Author


True. I would actually drop the cleanup deps from edpm_baremetal_deploy and edpm_deploy. I usually take the existing resources already written out, customize them as needed, and use them with oc directly. I guess it depends how far we want to take it in terms of making the targets work without cleanup versus making them re-runnable.
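The workflow described above could be sketched as follows. The file names under out/edpm are illustrative, and the `oc` commands are echoed as a dry run:

```shell
# Sketch: reuse the already-rendered resources instead of regenerating
# them. File names below are assumptions, not the repo's actual output.
OUT_DIR=${OUT_DIR:-../out/edpm}
for f in openstackdataplanenodeset.yaml openstackdataplanedeployment.yaml; do
    # customize the rendered YAML as needed, then apply it directly
    echo "oc apply -f ${OUT_DIR}/${f}"
done
```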

```make
if [ ${NODE_COUNT} -eq 1 ]; then \
	scripts/bmaas/vbm-setup.sh --create ; \
else \
	for INDEX in $(shell seq 0 $$((${NODE_COUNT} -1))) ; do \
```
Contributor


Calling these scripts as many times as the number of nodes is possibly a little inefficient, though it probably doesn't matter, as no one would be deploying a large number of nodes with install_yamls.

Contributor


Why do we need to iterate in the Makefile, though? vbm-setup.sh takes a NODE_COUNT env var and iterates within the script itself:
https://github.com/openstack-k8s-operators/install_yamls/blob/main/devsetup/scripts/bmaas/vbm-setup.sh#L166-L170

https://github.com/openstack-k8s-operators/install_yamls/blob/main/devsetup/scripts/bmaas/vbm-setup.sh#L145-L147

```make
if [ ${NODE_COUNT} -eq 1 ]; then \
	scripts/bmaas/vbm-setup.sh --create ; \
else \
	scripts/bmaas/vbm-setup.sh --create --num-nodes ${NODE_COUNT} ; \
fi
```

Then we can just handle the NODE_NAME_SUFFIX in the script?

That just avoids re-running the script, as Rabi mentions.
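The in-script iteration being suggested here can be sketched minimally. The `create_node` function and the naming below are illustrative stand-ins, not vbm-setup.sh's actual internals:

```shell
# Sketch: the script takes the node count itself and loops internally,
# so the caller invokes it once. create_node is a hypothetical stand-in
# for the real virt-install/virsh work.
create_node() {
    echo "defining VM ${1}"
}
NUM_NODES=${NUM_NODES:-2}
NODE_NAME_PREFIX=${NODE_NAME_PREFIX:-crc-bmaas}
for i in $(seq 1 "${NUM_NODES}"); do
    create_node "${NODE_NAME_PREFIX}-${i}"
done
```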

Contributor


I guess the whole check isn't really needed in that scenario, actually, because NODE_COUNT would just be handled in the script anyway.

Contributor Author


This PR changes vbm-setup.sh to drop --num-nodes, so that it manages just a single VM; that change is already included in this PR. Are you asking me to drop that change from the PR? The only inefficient part of calling the script twice would be setting up the libvirt logging, which isn't really something I'd optimize in a dev repo, but we can go back to the way it was if that's wanted.

gen-edpm-node.sh manages one VM at a time. It's already used for the networker and bootc VMs, with the Makefile controlling the looping, so I was changing the baremetal scripts to work in a similar way.

gen-edpm-node-bgp.sh is hardcoded to manage three VMs at a time, so that's yet another approach.

Whichever way it's done, I'd like to see these things be consistent.
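The caller-controlled pattern the author describes looks roughly like this. The EDPM_COMPUTE_SUFFIX variable name is an assumption about how such per-node scripts are parameterized, and the script call is echoed as a dry run:

```shell
# Sketch: one script invocation per node, with the loop in the caller
# (the Makefile) rather than inside the script. EDPM_COMPUTE_SUFFIX is
# a hypothetical parameter name.
NODE_COUNT=${NODE_COUNT:-2}
for INDEX in $(seq 0 $((NODE_COUNT - 1))); do
    echo "EDPM_COMPUTE_SUFFIX=${INDEX} scripts/gen-edpm-node.sh"
done
```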
