CAPT changes related to kubified tinkerbell stack #160

panktishah26 · 2022-04-26T16:41:15Z

Description

This PR contains the CAPT changes needed for the kubified Tinkerbell stack. The Tink PostgreSQL dependency is removed from CAPT, so the CAPT controllers only have to communicate with the underlying Kubernetes client.

Signed-off-by: Pankti Shah [email protected]

Why is this needed

We have kubified all the services of the Tinkerbell stack, such as Tink-apis, Boots, Hegel, and Rufio. CAPT needed to change as well so that the whole kubified stack can create a bare-metal cluster.

Fixes: #

How Has This Been Tested?

To test these changes, I:

Set up the kubified Tinkerbell stack by creating containers such as tink-server, boots, hegel, tink-controller, registry, and k3s, all containing the Kubernetes-related changes.
Removed the PostgreSQL tink-api dependencies from the CAPT repo.
Accommodated all the tink-api v1alpha1 structure changes related to hardware, template, and workflow.
Ran CAPT on a k3s cluster using the clusterctl API.
Fed the hardware YAML files and the deployment file to the running cluster, which successfully created a cluster with 1 control plane node and 1 worker node.

Sample yaml file for new hardware CRDs

apiVersion: "tinkerbell.org/v1alpha1"
kind: Hardware
metadata:
  name: node-1
  namespace: default
spec:
  disks:
    - device: /dev/sda
  metadata:
    facility:
      facility_code: onprem
    instance:
      userdata: ""
      hostname: "node-1"
      id: "xx:xx:xx:xx:xx:xx"
      operating_system:
        distro: "ubuntu"
        os_slug: "ubuntu_20_04"
        version: "20.04"
  interfaces:
    - dhcp:
        arch: x86_64
        hostname: node-1
        ip:
          address: 0.0.0.0
          gateway: 0.0.0.1
          netmask: 255.255.255.0
        lease_time: 86400
        mac: "xx:xx:xx:xx:xx:xx"
        name_servers:
          - 8.8.8.8
        uefi: true
      netboot:
        allowPXE: true
        allowWorkflow: true

How are existing users impacted? What migration steps/scripts do we need?

Existing users may need to upgrade to the latest version of each Tinkerbell service to create a cluster.

Checklist:

I have:

updated the documentation and/or roadmap (if required)
added unit or e2e tests
provided instructions on how to upgrade

micahhausler

Thanks for putting out what you've got so far! I have just a few suggestions

config/default/manager_image_patch.yaml

controllers/machine.go

config/default/manager_pull_policy.yaml

controllers/machine.go

controllers/tinkerbellmachine_controller.go

config/default/manager_image_patch.yaml

config/default/manager_pull_policy.yaml

micahhausler

LGTM! Great work @panktishah26!

micahhausler

Thanks for updating the docs! Just some minor changes

docs/README.md

docs/QUICK-START.md

micahhausler

LGTM!

Modified tink hardware crd example

Signed-off-by: panktishah26 <[email protected]>

## Description

Modified tink hardware crd example.

## Why is this needed

I have updated the tink hardware yaml file example which will help customers to create a cluster using CAPT.

Fixes: #

## How Has This Been Tested?

I have tested this functionality with CAPT changes related to kubified stack in this [PR](tinkerbell/cluster-api-provider-tinkerbell#160).

## Checklist:

I have:

- [ ] updated the documentation and/or roadmap (if required)
- [ ] added unit or e2e tests
- [ ] provided instructions on how to upgrade

detiber

Love seeing this work coming together, just a few questions and suggestions. Please excuse my ignorance if anything I brought up has already been discussed while I've been offline.

go.mod

detiber · 2022-05-10T20:38:44Z

controllers/machine.go

@@ -501,7 +507,7 @@ func (mrc *machineReconcileContext) createWorkflow() error {
 		},
 		Spec: tinkv1.WorkflowSpec{
 			TemplateRef: mrc.tinkerbellMachine.Name,
-			HardwareRef: mrc.tinkerbellMachine.Spec.HardwareName,
+			HardwareMap: map[string]string{"device_1": hardware.Spec.Metadata.Instance.ID},


Still reading through the changes, but initial reaction here is: should this be a well defined data type rather than a map[string]string here?

Is the assumption here that hardware.Spec.Metadata.Instance.ID is going to match the MAC address?

I'm wondering if there is a better way to handle this mapping that provides a clearer mapping between Hardware and Workflow.

Maybe something like this (simplified yaml):

hardwareRefs:
  - placeholder: "device_1",
  - hardwareName: mrc.tinkerbellMachine.Spec.HardwareName,
    <some way to specify interface index and whether to use IP or MAC for matching>

This would potentially give the same flexibility as the current map without losing an easy way to match the hardware associated with a given workflow (especially for the purpose of displaying it in kubectl get).

> Still reading through the changes, but initial reaction here is: should this be a well defined data type rather than a map[string]string here?

I think this could use a better data type, but that will necessitate changes to the API now defined in Tinkerbell.

> Is the assumption here that hardware.Spec.Metadata.Instance.ID is going to match the MAC address?

Yes. We should come up with a better way to encode this, but it's probably outside the scope of this PR.

Cleaning this up in the future sounds good to me
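As an illustration of the typed alternative discussed in this thread, here is a minimal Go sketch. Everything in it is hypothetical and not part of the Tinkerbell API: the `HardwareRef` and `MatchBy` types, the `Resolver` callback, and `ToHardwareMap` are names invented for this example. It only shows how a typed reference could keep the hardware name visible while still producing the placeholder-to-address map that the workflow currently takes.

```go
package main

import (
	"errors"
	"fmt"
)

// MatchBy names the hardware attribute a placeholder resolves to.
// Hypothetical type for illustration only.
type MatchBy string

const (
	MatchByMAC MatchBy = "mac"
	MatchByIP  MatchBy = "ip"
)

// HardwareRef is a hypothetical typed replacement for one
// map[string]string entry: it keeps the Hardware object name
// alongside the template placeholder.
type HardwareRef struct {
	Placeholder    string
	HardwareName   string
	InterfaceIndex int
	MatchBy        MatchBy
}

// Validate rejects references that could not be resolved unambiguously.
func (r HardwareRef) Validate() error {
	if r.Placeholder == "" {
		return errors.New("placeholder must not be empty")
	}
	if r.HardwareName == "" {
		return errors.New("hardwareName must not be empty")
	}
	if r.InterfaceIndex < 0 {
		return errors.New("interfaceIndex must not be negative")
	}
	switch r.MatchBy {
	case MatchByMAC, MatchByIP:
		return nil
	default:
		return fmt.Errorf("unsupported matchBy %q", r.MatchBy)
	}
}

// Resolver returns the address (MAC or IP) for a reference. A real
// controller would look up the Hardware object; here it is injected.
type Resolver func(ref HardwareRef) (string, error)

// ToHardwareMap flattens typed references into the
// placeholder -> address map form used in the diff above.
func ToHardwareMap(refs []HardwareRef, resolve Resolver) (map[string]string, error) {
	out := make(map[string]string, len(refs))
	for _, ref := range refs {
		if err := ref.Validate(); err != nil {
			return nil, fmt.Errorf("invalid reference %q: %w", ref.Placeholder, err)
		}
		if _, dup := out[ref.Placeholder]; dup {
			return nil, fmt.Errorf("duplicate placeholder %q", ref.Placeholder)
		}
		addr, err := resolve(ref)
		if err != nil {
			return nil, fmt.Errorf("resolving %q: %w", ref.HardwareName, err)
		}
		out[ref.Placeholder] = addr
	}
	return out, nil
}

func main() {
	refs := []HardwareRef{
		{Placeholder: "device_1", HardwareName: "node-1", MatchBy: MatchByMAC},
	}
	// Stand-in resolver returning a made-up MAC address.
	lookup := func(ref HardwareRef) (string, error) {
		return "00:00:00:00:00:01", nil
	}
	m, err := ToHardwareMap(refs, lookup)
	fmt.Println(m, err)
}
```

With a shape like this, the hardware name stays on the object for display purposes, and the flat map becomes a derived value rather than the stored one.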

detiber · 2022-05-11T02:02:58Z

controllers/machine.go

@@ -345,7 +349,7 @@ func (mrc *machineReconcileContext) ensureHardware() (*tinkv1.Hardware, error) {
 	}

 	mrc.tinkerbellMachine.Spec.HardwareName = hardware.Name
-	mrc.tinkerbellMachine.Spec.ProviderID = fmt.Sprintf("tinkerbell://%s", hardware.Spec.ID)
+	mrc.tinkerbellMachine.Spec.ProviderID = fmt.Sprintf("tinkerbell://%s", hardware.UID)


Use of the UID here would cause issues if types are ever backed up and restored (or if resources are "pivoted" to a different Kubernetes cluster). This wasn't an issue with Tinkerbell when using the Postgres backend because users were specifying the UUID when creating the hardware.

Before recommending an alternative approach, I have a few questions to narrow down what supported permutations we expect to exist.

Would we expect to be able to reference hardware defined in a different namespace to be used for a given Tinkerbell Machine?

Would we expect the use of hardware defined on a different k8s cluster to be used?

Would we expect to be able to reference hardware defined using Tinkerbell with the postgresql backend still?

> Would we expect to be able to reference hardware defined in a different namespace to be used for a given Tinkerbell Machine?

I don't think so. Even though Tinkerbell types are namespaced, the Tink stack really only operates in a single namespace. Machines aren't aware of the namespace in how they communicate with the Tink components (tink server/worker over gRPC with a MAC ID as the identifier, hardware with Boots over DHCP with a MAC address, tink-worker to Hegel over HTTP with an IP address as the identifier).

> Would we expect the use of hardware defined on a different k8s cluster to be used?

Not at this point. We could certainly wire up CAPT to have two K8s clients for separate clusters, one for CAPI and one for Tink, but it doesn't operate this way today. For now, it would probably be unneeded complexity.

> Would we expect to be able to reference hardware defined using Tinkerbell with the postgresql backend still?

No, as this PR rips out all of the Tink API client code that uses the Postgres backend.

We should probably use tinkerbell://{namespace}/{name} for repeatability across clusters, as you brought up with the pivot point.

Should we make this change in this PR, or should I create another PR?

@panktishah26 it's up to you

@micahhausler thanks for the clarification, it really helps.

I don't think we need to block the PR on updating the providerID, especially since it would require additional changes to the controllers to support the new pattern.

I would consider getting things updated to use tinkerbell://{namespace}/{name} a release blocker, though, to avoid users hitting issues related to backup/restore and pivot.

Thanks @micahhausler and @detiber. Based on our discussion, it would be good to address this in a different PR, since this change requires changes to the controllers.
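For reference, the tinkerbell://{namespace}/{name} pattern proposed in this thread could look roughly like the Go sketch below. This is not code from this PR: `FormatProviderID` and `ParseProviderID` are hypothetical helper names, and the validation rules are assumptions. The point is only that a namespace/name pair survives backup/restore and pivot, whereas an object UID is reassigned when the object is recreated.

```go
package main

import (
	"fmt"
	"strings"
)

// providerIDScheme is the prefix already used for provider IDs in the diff above.
const providerIDScheme = "tinkerbell://"

// FormatProviderID builds tinkerbell://{namespace}/{name}.
// Hypothetical helper; the empty and slash checks are assumptions.
func FormatProviderID(namespace, name string) (string, error) {
	if namespace == "" || name == "" {
		return "", fmt.Errorf("namespace and name must not be empty")
	}
	if strings.Contains(namespace, "/") || strings.Contains(name, "/") {
		return "", fmt.Errorf("namespace and name must not contain '/'")
	}
	return providerIDScheme + namespace + "/" + name, nil
}

// ParseProviderID splits a provider ID back into namespace and name.
// IDs in the older single-segment form (an ID or UID) are rejected.
func ParseProviderID(id string) (namespace, name string, err error) {
	if !strings.HasPrefix(id, providerIDScheme) {
		return "", "", fmt.Errorf("provider ID %q does not start with %q", id, providerIDScheme)
	}
	parts := strings.Split(strings.TrimPrefix(id, providerIDScheme), "/")
	if len(parts) != 2 || parts[0] == "" || parts[1] == "" {
		return "", "", fmt.Errorf("provider ID %q is not of the form %s{namespace}/{name}", id, providerIDScheme)
	}
	return parts[0], parts[1], nil
}

func main() {
	id, err := FormatProviderID("default", "node-1")
	if err != nil {
		panic(err)
	}
	fmt.Println(id) // prints "tinkerbell://default/node-1"

	ns, name, err := ParseProviderID(id)
	fmt.Println(ns, name, err)
}
```

Because the ID is derived only from fields the user controls, restoring the same Hardware manifest on another cluster yields the same provider ID.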

CAPT changes related to kubified tinkerbell stack CAPT changes related to kubified tinkerbell stack CAPT changes related to kubified tinkerbell stack CI related changes Signed-off-by: panktishah26 <[email protected]>

panktishah26 · 2022-05-12T16:58:40Z

I have reverted the go.mod file change. Kindly let me know if any other changes need to be made. @micahhausler @detiber

micahhausler

LGTM, awesome work @panktishah26!

@micahhausler

My team has been actively working on this project. My approver role will help expedite the PR review/approve process and help unblock any open issues or feature requests.

Org request issue: tinkerbell/org#24

Requirements:
* I have reviewed the [community membership guidelines](https://github.com/tinkerbell/proposals/blob/main/proposals/0024/GOVERNANCE.md)
* I have [enabled 2FA on my GitHub account](https://github.com/settings/security)
* I have subscribed to the [tinkerbell-contributors e-mail list](https://groups.google.com/g/tinkerbell-contributors)
* I am actively contributing to 1 or more Tinkerbell subprojects

PRs contributions on CAPT:
- #130
- #176

PRs reviewed on CAPT:
- #160
- #184
- #182

Sponsors: @micahhausler @chrisdoherty4 @jacobweinstock

micahhausler suggested changes Apr 26, 2022

panktishah26 force-pushed the capt-kubified-stack branch 5 times, most recently from c183fc3 to fc9077d Compare May 5, 2022 21:10

abhinavmpandey08 reviewed May 5, 2022

controllers/machine.go (Outdated)

panktishah26 force-pushed the capt-kubified-stack branch 2 times, most recently from 246bf1c to c2a03f5 Compare May 5, 2022 22:42

micahhausler suggested changes May 5, 2022

controllers/tinkerbellmachine_controller.go (Outdated)

panktishah26 force-pushed the capt-kubified-stack branch from c2a03f5 to f9e524b Compare May 5, 2022 23:17

micahhausler reviewed May 6, 2022

config/default/manager_image_patch.yaml (Outdated)

micahhausler reviewed May 6, 2022

config/default/manager_pull_policy.yaml (Outdated)

panktishah26 force-pushed the capt-kubified-stack branch from f9e524b to 045b178 Compare May 10, 2022 00:15

panktishah26 changed the title ~~[WIP] CAPT changes related to kubified tinkerbell stack~~ CAPT changes related to kubified tinkerbell stack May 10, 2022

micahhausler approved these changes May 10, 2022

panktishah26 mentioned this pull request May 10, 2022

Modified tink hardware crd example tinkerbell/tink#621

Merged

3 tasks

panktishah26 force-pushed the capt-kubified-stack branch from 045b178 to 2a2eaef Compare May 10, 2022 21:59

micahhausler suggested changes May 10, 2022

docs/README.md (Outdated)

docs/QUICK-START.md

panktishah26 force-pushed the capt-kubified-stack branch from 2a2eaef to 417b0f4 Compare May 10, 2022 22:52

micahhausler approved these changes May 10, 2022

detiber reviewed May 11, 2022

CAPT changes related to kubified tinkerbell stack

2bea178

panktishah26 force-pushed the capt-kubified-stack branch from 417b0f4 to 2bea178 Compare May 12, 2022 01:35

micahhausler approved these changes May 12, 2022

micahhausler added the ready-to-merge label (Signal to Mergify to merge the PR.) May 12, 2022

mergify bot merged commit 77df39b into tinkerbell:main May 12, 2022

abhinavmpandey08 mentioned this pull request Aug 15, 2022

[Organization/member]: request for abhinavmpandey08 tinkerbell/org#24

Closed

7 tasks

abhinavmpandey08 mentioned this pull request Aug 16, 2022

Request approver role for @abhinavmpandey08 #197

Merged

displague added this to the v0.2 milestone Aug 24, 2022

detiber May 10, 2022

detiber May 11, 2022

micahhausler May 11, 2022

detiber May 11, 2022

detiber May 11, 2022

micahhausler May 11, 2022

panktishah26 May 11, 2022

micahhausler May 11, 2022

detiber May 11, 2022

panktishah26 May 12, 2022
