Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CAPT changes related to kubified tinkerbell stack #160

Merged
merged 1 commit into from
May 12, 2022

Conversation

panktishah26
Copy link
Contributor

@panktishah26 panktishah26 commented Apr 26, 2022

Description

This PR contains changes related to kubified tinkerbell stack. Tink PostgreSQL dependency is removed from the CAPT and CAPT controllers will only have to communicate with underlying Kubernetes client.

Signed-off-by: Pankti Shah [email protected]

Why is this needed

We have kubified all the services of Tinkerbell stack like Tink-apis, Boots, Hegel, Rufio etc. We needed to change CAPT as well to make the whole kubified stack work to create a BareMetal cluster.

Fixes: #

How Has This Been Tested?

To test these changes,

  1. I set up Kubified Tinkerbell stack by creating containers such as tink-server, boots, hegel, tink-controller, registry and k3s containing Kubernetes related changes.
  2. Removed PostgreSQL tink-api dependencies from the CAPT repo.
  3. Accommodated all the tink-api v1alpha1 structure changes related to hardware, template and workflow.
  4. Ran CAPT on k3s cluster using clusterctl api.
  5. Fed hardware yaml files and deployment file to the running cluster which created a cluster with 1 control plane and 1 worker successfully.

Sample yaml file for new hardware CRDs

apiVersion: "tinkerbell.org/v1alpha1"
kind: Hardware
metadata:
  name: node-1
  namespace: default
spec:
  disks:
    - device: /dev/sda
  metadata:
    facility:
      facility_code: onprem
    instance:
      userdata: ""
      hostname: "node-1"
      id: "xx:xx:xx:xx:xx:xx"
      operating_system:
        distro: "ubuntu"
        os_slug: "ubuntu_20_04"
        version: "20.04"
  interfaces:
    - dhcp:
        arch: x86_64
        hostname: node-1
        ip:
          address: 0.0.0.0
          gateway: 0.0.0.1
          netmask: 255.255.255.0
        lease_time: 86400
        mac: xx:xx:xx:xx:xx:
        name_servers:
          - 8.8.8.8
        uefi: true
      netboot:
        allowPXE: true
        allowWorkflow: true

How are existing users impacted? What migration steps/scripts do we need?

Existing users might have to take the latest versions of each Tinkerbell services to created a cluster.

Checklist:

I have:

  • updated the documentation and/or roadmap (if required)
  • added unit or e2e tests
  • provided instructions on how to upgrade

Copy link
Contributor

@micahhausler micahhausler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for putting out what you've got so far! I have just a few suggestions

config/default/manager_image_patch.yaml Outdated Show resolved Hide resolved
controllers/machine.go Outdated Show resolved Hide resolved
controllers/machine.go Outdated Show resolved Hide resolved
config/default/manager_pull_policy.yaml Outdated Show resolved Hide resolved
controllers/machine.go Outdated Show resolved Hide resolved
controllers/machine.go Outdated Show resolved Hide resolved
@panktishah26 panktishah26 force-pushed the capt-kubified-stack branch 5 times, most recently from c183fc3 to fc9077d Compare May 5, 2022 21:10
@panktishah26 panktishah26 force-pushed the capt-kubified-stack branch 2 times, most recently from 246bf1c to c2a03f5 Compare May 5, 2022 22:42
@panktishah26 panktishah26 force-pushed the capt-kubified-stack branch from c2a03f5 to f9e524b Compare May 5, 2022 23:17
@panktishah26 panktishah26 force-pushed the capt-kubified-stack branch from f9e524b to 045b178 Compare May 10, 2022 00:15
@panktishah26 panktishah26 changed the title [WIP] CAPT changes related to kubified tinkerbell stack CAPT changes related to kubified tinkerbell stack May 10, 2022
Copy link
Contributor

@micahhausler micahhausler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Great work @panktishah26!

Copy link
Contributor

@micahhausler micahhausler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for updating the docs! Just some minor changes

docs/README.md Outdated Show resolved Hide resolved
docs/QUICK-START.md Show resolved Hide resolved
@panktishah26 panktishah26 force-pushed the capt-kubified-stack branch from 2a2eaef to 417b0f4 Compare May 10, 2022 22:52
Copy link
Contributor

@micahhausler micahhausler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

mergify bot added a commit to tinkerbell/tink that referenced this pull request May 10, 2022
Modified tink hardware crd example

Signed-off-by: panktishah26 <[email protected]>

## Description
Modified tink hardware crd example.

## Why is this needed
I have updated the tink hardware yaml file example which will help customers to create a cluster using CAPT.

Fixes: #

## How Has This Been Tested?
I have tested this functionality with CAPT changes related to kubified stack in this [PR](tinkerbell/cluster-api-provider-tinkerbell#160).


## Checklist:

I have:

- [ ] updated the documentation and/or roadmap (if required)
- [ ] added unit or e2e tests
- [ ] provided instructions on how to upgrade
Copy link
Contributor

@detiber detiber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love seeing this work coming together, just a few questions and suggestions. Please excuse my ignorance if anything I brought up has already been discussed while I've been offline.

go.mod Outdated Show resolved Hide resolved
go.mod Outdated Show resolved Hide resolved
@@ -501,7 +507,7 @@ func (mrc *machineReconcileContext) createWorkflow() error {
},
Spec: tinkv1.WorkflowSpec{
TemplateRef: mrc.tinkerbellMachine.Name,
HardwareRef: mrc.tinkerbellMachine.Spec.HardwareName,
HardwareMap: map[string]string{"device_1": hardware.Spec.Metadata.Instance.ID},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still reading through the changes, but initial reaction here is: should this be a well defined data type rather than a map[string]string here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the assumption here that hardware.Spec.Metadata.Instance.ID is going to match the MAC address?

I'm wondering if there is a better way to handle this mapping that provides a clearer mapping between Hardware and Workflow.

Maybe something like this (simplified yaml):

hardwareRefs:
- placeholder: "device_1",
- hardwareName: mrc.tinkerbellMachine.Spec.HardwareName,
<some way to specify interface index and whether to use IP or MAC for matching>

This would potentially give the same flexibility as the current map without losing an easy way to match the hardware associated with a given workflow (especially for the purposes in displaying in kubectl get)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still reading through the changes, but initial reaction here is: should this be a well defined data type rather than a map[string]string here?

I think this could use a better data type, but that will necessitate changes to the API now defined in Tinkerbell.

Is the assumption here that hardware.Spec.Metadata.Instance.ID is going to match the MAC address?

Yes. We should come up with a better way to encode this, but its probably outside the scope of this PR

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cleaning this up in the future sounds good to me

@@ -345,7 +349,7 @@ func (mrc *machineReconcileContext) ensureHardware() (*tinkv1.Hardware, error) {
}

mrc.tinkerbellMachine.Spec.HardwareName = hardware.Name
mrc.tinkerbellMachine.Spec.ProviderID = fmt.Sprintf("tinkerbell://%s", hardware.Spec.ID)
mrc.tinkerbellMachine.Spec.ProviderID = fmt.Sprintf("tinkerbell://%s", hardware.UID)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use of the UID here would cause issues if types are ever backed up and restored (or if resources are "pivoted" to a different Kubernetes cluster). This wasn't an issue with the Tinkerbell when using the Postgres backend because users because users were specifying the UUID when creating the hardware.

Before recommending an alternative approach, I have a few questions to narrow down what supported permutations we expect to exist.

  • Would we expect to be able to reference hardware defined in a different namespace to be used for a given Tinkerbell Machine?
  • Would we expect the use of hardware defined on a different k8s cluster to be used?
  • Would we expect to be able to reference hardware defined using Tinkerbell with the postgresql backend still?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would we expect to be able to reference hardware defined in a different namespace to be used for a given Tinkerbell Machine?

I don't think so, even though Tinkerbell types are namespaced, the Tink stack really only operates in a single namespace. Machines aren't aware of the namespace with how they communicate to the tink components (tink server/worker over gRPC with a MAC ID as the identifier, hardware with boots over DHCP with MAC address, tink-worker to hegel over http with an IP address as the identifier)

Would we expect the use of hardware defined on a different k8s cluster to be used?

Not at this point. We could certainly wire up CAPT to have two K8s clients to separate clusters, one for CAPI and one for Tink, but it doesn't operate this way today. For now, it would probably be unneeded complexity.

Would we expect to be able to reference hardware defined using Tinkerbell with the postgresql backend still?

No, as this PR rips out all the Tink API client that uses the Postgres backend.

We should probably use tinkerbell://{namespace}/{name} for repeatability across clusters, as you brought up with the pivot point.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we make this change in this PR or should I create an another PR?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@panktishah26 its up to you

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@micahhausler thanks for the clarification, it really helps.

I don't think we need to. block the PR on updating the providerID, especially since it would require additional changes to the controllers to support the new pattern.

I would consider getting things updated to use tinkerbell//{namespace}/{name} a release blocker, though to avoid users hitting issues related to backup/restore and pivot.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @micahhausler and @detiber, based on our discussion, it would be good if we address this in the different PR since this change require changes to the controllers.

CAPT changes related to kubified tinkerbell stack

CAPT changes related to kubified tinkerbell stack

CAPT changes related to kubified tinkerbell stack

CI related changes

Signed-off-by: panktishah26 <[email protected]>
@panktishah26 panktishah26 force-pushed the capt-kubified-stack branch from 417b0f4 to 2bea178 Compare May 12, 2022 01:35
@panktishah26
Copy link
Contributor Author

I have reverted go.mod file change. Kindly let me know if there is any other changes needs to be done. @micahhausler @detiber

Copy link
Contributor

@micahhausler micahhausler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, awesome work @panktishah26!

@micahhausler micahhausler added the ready-to-merge Signal to Mergify to merge the PR. label May 12, 2022
@mergify mergify bot merged commit 77df39b into tinkerbell:main May 12, 2022
mergify bot added a commit that referenced this pull request Aug 23, 2022
My team has been actively working on this project. My approver role will help expedite the PR review/approve process and help unblock any open issues or feature requests.

Org request issue: tinkerbell/org#24

Requirements:

* I have reviewed the [community membership guidelines](https://github.com/tinkerbell/proposals/blob/main/proposals/0024/GOVERNANCE.md)
* I have [enabled 2FA on my GitHub account](https://github.com/settings/security)
* I have subscribed to the [tinkerbell-contributors e-mail list](https://groups.google.com/g/tinkerbell-contributors)
* I am actively contributing to 1 or more Tinkerbell subprojects

PRs contributions on CAPT:
- #130
- #176

PRs reviewed on CAPT:
- #160
- #184
- #182

Sponsors:
@micahhausler @chrisdoherty4 @jacobweinstock
@displague displague added this to the v0.2 milestone Aug 24, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ready-to-merge Signal to Mergify to merge the PR.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants