waiter.py Add ClusterOperator Test #879

Merged

stratus-ss
Contributor

SUMMARY

Fixes #869

During an OpenShift installation, one of the checks that the cluster is ready for further configuration is to confirm that every Cluster Operator is in the Available: True, Degraded: False, Progressing: False state. The k8s_info module can already return this data as JSON, but the result has to be iterated over several times to extract the relevant status.

This PR adds functionality to waiter.py that loops over all ClusterOperator resource instances. If any of them is not ready, the waiter returns False and the task fails. If the task succeeds, you can assume that all the cluster operators are healthy.
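
As a rough illustration of the check described above (not the code merged in this PR), the readiness test can be sketched as a predicate over the ClusterOperator status conditions. The names `EXPECTED`, `operator_is_ready` and `all_operators_ready` are hypothetical, and the sketch assumes each resource is a plain dict with the usual `status.conditions` list of `type`/`status` pairs.

```python
from typing import Any, Dict, Iterable

# Expected status per condition type, taken from the PR description.
# (Hypothetical constant name; not from waiter.py.)
EXPECTED = {"Available": "True", "Degraded": "False", "Progressing": "False"}


def operator_is_ready(operator: Dict[str, Any]) -> bool:
    """Return True when one ClusterOperator reports every expected condition."""
    conditions = operator.get("status", {}).get("conditions", [])
    # Collapse the list of conditions into {type: status} for easy lookup.
    observed = {c.get("type"): c.get("status") for c in conditions}
    return all(observed.get(kind) == want for kind, want in EXPECTED.items())


def all_operators_ready(operators: Iterable[Dict[str, Any]]) -> bool:
    """Return True only if every operator in the iterable is ready."""
    return all(operator_is_ready(op) for op in operators)
```

In this sketch a missing condition counts as not ready, and an empty list of operators counts as ready; a real waiter may want to treat the empty case differently.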

ISSUE TYPE
  • Feature Pull Request
COMPONENT NAME

waiter.py

ADDITIONAL INFORMATION

A simple playbook triggers waiter.py to watch the ClusterOperator objects:


---
- name: get operators
  hosts: localhost
  gather_facts: false
  tasks:
    - name: Get cluster operators
      kubernetes.core.k8s_info:
        api_version: v1
        kind: ClusterOperator
        kubeconfig: "/home/ocp/one/auth/kubeconfig"
        wait: true
        wait_timeout: 30
      register: cluster_operators

This produces the following output if everything is functioning properly:

PLAY [get operators] *************************************************************************************************

TASK [Get cluster operators] *****************************************************************************************
ok: [localhost]

PLAY RECAP ***********************************************************************************************************
localhost                  : ok=1    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   

If the timeout is reached:

PLAY [get operators] *************************************************************************************************

TASK [Get cluster operators] *****************************************************************************************
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: ansible_collections.kubernetes.core.plugins.module_utils.k8s.exceptions.CoreException: Failed to gather information about ClusterOperator(s) even after waiting for 30 seconds
fatal: [localhost]: FAILED! => {"changed": false, "msg": "Failed to gather information about ClusterOperator(s) even after waiting for 30 seconds"}

PLAY RECAP ***********************************************************************************************************
localhost                  : ok=0    changed=0    unreachable=0    failed=1    skipped=0    rescued=0    ignored=0   

UNSOLVED: How to know which Operators are failing
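
One possible way to approach this, sketched here outside the scope of the PR: fetch the operators without `wait` so that the task returns its `resources` list, then report which operators deviate from the expected conditions. The function name `failing_operators` and the constant `EXPECTED` are hypothetical, and the input is assumed to be a list of plain ClusterOperator dicts.

```python
from typing import Any, Dict, List

# Expected status per condition type, as stated in the PR summary.
# (Hypothetical constant name.)
EXPECTED = {"Available": "True", "Degraded": "False", "Progressing": "False"}


def failing_operators(resources: List[Dict[str, Any]]) -> Dict[str, List[str]]:
    """Map each unhealthy operator name to the conditions that are off."""
    failures: Dict[str, List[str]] = {}
    for operator in resources:
        name = operator.get("metadata", {}).get("name", "<unknown>")
        conditions = operator.get("status", {}).get("conditions", [])
        observed = {c.get("type"): c.get("status") for c in conditions}
        # Record every condition whose status differs from the expected one.
        problems = [
            f"{kind}={observed.get(kind, 'missing')}"
            for kind, want in EXPECTED.items()
            if observed.get(kind) != want
        ]
        if problems:
            failures[name] = problems
    return failures
```

The returned mapping could then be printed or included in a failure message, so that a timeout names the operators that were still unhealthy.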

Contributor

@mandar242 mandar242 left a comment

could you please also add a changelog fragment? thanks!

@stratus-ss
Contributor Author

> could you please also add a changelog fragment? thanks!

It should be there now

@mandar242 mandar242 requested review from gravesm and abikouo February 12, 2025 19:16
Contributor

@abikouo abikouo left a comment

The code looks good; you need to fix the CI issues

@stratus-ss
Contributor Author

> The code looks good; you need to fix the CI issues

I am having a small problem with the linter. I copied the deployments.yml fixture, which seems to contain multiple documents. However, when I run the linter, it gets tripped up:

ERROR: tests/unit/module_utils/fixtures/clusteroperator.yml:1:1: multiple-yaml-documents: expected a single document in the stream

I'm not exactly sure how to fix this other than by removing use cases.

@mandar242 mandar242 requested a review from abikouo February 25, 2025 16:01
@mandar242
Contributor

> The code looks good; you need to fix the CI issues
>
> I am having a small problem with the linter. I copied the deployments.yml fixture, which seems to contain multiple documents. However, when I run the linter, it gets tripped up:
>
> ERROR: tests/unit/module_utils/fixtures/clusteroperator.yml:1:1: multiple-yaml-documents: expected a single document in the stream
>
> I'm not exactly sure how to fix this other than by removing use cases.

Hi @stratus-ss, for this, could you please add the line below to the files in https://github.com/stratus-ss/kubernetes.core/tree/ClusterOperator/tests/sanity

tests/unit/module_utils/fixtures/clusteroperator.yml yamllint!skip

similar to

tests/unit/module_utils/fixtures/definitions.yml yamllint!skip
tests/unit/module_utils/fixtures/deployments.yml yamllint!skip
tests/integration/targets/k8s_delete/files/deployments.yaml yamllint!skip

@mandar242
Contributor

@stratus-ss you'll need to add the change to all ignore files here https://github.com/stratus-ss/kubernetes.core/tree/ClusterOperator/tests/sanity

ignore-2.14.txt
ignore-2.15.txt
ignore-2.16.txt
ignore-2.17.txt
ignore-2.18.txt
ignore-2.19.txt
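
Adding the same entry to every ignore file by hand is easy to get wrong, so here is a minimal sketch of a helper that does it. The function name `add_ignore_entry` is hypothetical, and the sketch assumes it is pointed at a local checkout's tests/sanity directory; it simply appends the entry to each ignore-*.txt file that does not already contain it.

```python
from pathlib import Path
from typing import List

# The entry requested in the review comment above.
ENTRY = "tests/unit/module_utils/fixtures/clusteroperator.yml yamllint!skip"


def add_ignore_entry(sanity_dir: Path, entry: str = ENTRY) -> List[str]:
    """Append `entry` to every ignore-*.txt in `sanity_dir` that lacks it.

    Returns the names of the files that were modified.
    """
    updated: List[str] = []
    for path in sorted(sanity_dir.glob("ignore-*.txt")):
        lines = path.read_text().splitlines()
        if entry in lines:
            continue  # already present, leave the file untouched
        lines.append(entry)
        path.write_text("\n".join(lines) + "\n")
        updated.append(path.name)
    return updated
```

For example, `add_ignore_entry(Path("tests/sanity"))` run from the repository root would update each listed file once; running it a second time changes nothing.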

Contributor

@abikouo abikouo left a comment

LGTM!

Build succeeded (gate pipeline).
https://ansible.softwarefactory-project.io/zuul/buildset/3459216024124eb9aacbb2b341809e54

✔️ ansible-galaxy-importer SUCCESS in 4m 20s
✔️ build-ansible-collection SUCCESS in 5m 19s

@softwarefactory-project-zuul softwarefactory-project-zuul bot merged commit 7cdf0d0 into ansible-collections:main Feb 26, 2025
42 checks passed
Successfully merging this pull request may close these issues.

OpenShift & k8s_info: Support Cluster Operator Info Gathering
3 participants