
Installation fails when waiting for k8s apiserver due to variable overlap #8387

Closed
unai-ttxu opened this issue Jan 7, 2022 · 0 comments · Fixed by #8388
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments


unai-ttxu commented Jan 7, 2022

Environment:

  • OS (printf "$(uname -srm)\n$(cat /etc/os-release)\n"):
Linux 4.18.0-240.10.1.el8_3.x86_64 x86_64
NAME="CentOS Linux"
VERSION="8"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="8"
PLATFORM_ID="platform:el8"
PRETTY_NAME="CentOS Linux 8"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:8"
HOME_URL="https://centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"
CENTOS_MANTISBT_PROJECT="CentOS-8"
CENTOS_MANTISBT_PROJECT_VERSION="8"
  • Version of Ansible (ansible --version):
ansible 2.10.15
  config file = /stratio/ansible/ansible.cfg
  configured module search path = ['/root/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python3.8/site-packages/ansible
  executable location = /usr/bin/ansible
  python version = 3.8.10 (default, May  6 2021, 00:05:59) [GCC 10.2.1 20201203]
  • Version of Python (python --version):
Python 3.8.10

Kubespray version (commit) (git rev-parse --short HEAD):

92f25bf267ffd3393f6caffa588169d3a44a799c -> v2.18.0

Network plugin used:

calico

Command used to invoke ansible:

ansible-playbook cluster.yml

Output of ansible run:

TASK [kubernetes/control-plane : Create kubeadm ControlPlane config] **********************************************************************************************************************************************
Friday 07 January 2022  12:49:32 +0000 (0:00:00.103)       0:36:48.467 ****** 
skipping: [kube-control-plane-gx03l92v]

TASK [kubernetes/control-plane : Wait for k8s apiserver] **********************************************************************************************************************************************************
Friday 07 January 2022  12:49:32 +0000 (0:00:00.038)       0:36:48.505 ****** 
fatal: [kube-control-plane-gx03l92v]: FAILED! => changed=false 
  elapsed: 200
  msg: Timeout when waiting for kube-control-plane-gx03l92v:6443

NO MORE HOSTS LEFT ************************************************************************************************************************************************************************************************

PLAY RECAP ********************************************************************************************************************************************************************************************************
kube-control-plane-gx03l92v : ok=524  changed=117  unreachable=0    failed=1    skipped=693  rescued=0    ignored=2   
kube-node-dz6jnyww         : ok=391  changed=91   unreachable=0    failed=0    skipped=471  rescued=0    ignored=1   
kube-node-e14m35zv         : ok=391  changed=91   unreachable=0    failed=0    skipped=471  rescued=0    ignored=1   
kube-node-ey04mxwj         : ok=391  changed=91   unreachable=0    failed=0    skipped=472  rescued=0    ignored=1   
kube-node-g04l3809         : ok=391  changed=91   unreachable=0    failed=0    skipped=471  rescued=0    ignored=1   
localhost                  : ok=34   changed=11   unreachable=0    failed=0    skipped=30   rescued=0    ignored=0 

Explanation:

As we can see in the Ansible output log, the following task, located in roles/kubernetes/control-plane/tasks/kubeadm-secondary.yml, is failing:

- name: Wait for k8s apiserver
  wait_for:
    host: "{{ kubeadm_discovery_address.split(':')[0] }}"
    port: "{{ kubeadm_discovery_address.split(':')[1] }}"
    timeout: 180

This is because kubeadm_discovery_address is set to kube-control-plane-gx03l92v:6443, where kube-control-plane-gx03l92v is the inventory_hostname of the first and only kube_control_plane.
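To make the failure concrete, here is what the task above effectively resolves to with the overlapping value (host and port are taken from the log above; this is an illustration, not code from the repo):

- name: Wait for k8s apiserver
  wait_for:
    host: "kube-control-plane-gx03l92v"  # an inventory_hostname, not an address
    port: "6443"
    timeout: 180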

I think this failure is caused by the changes in commit 27ab364. That commit sets the fact first_kube_control_plane to ensure that tasks are delegated to a working kube_control_plane instance, which is achieved by refactoring groups['kube_control_plane']|first to first_kube_control_plane.

This main goal works as intended, but the commit also refactors uses of first_kube_master to first_kube_control_plane. This is incorrect, since first_kube_master should be an address, not an inventory_hostname. Sometimes the two are the same, but we can't rely on that. So when roles/kubernetes/control-plane/tasks/define-first-kube-control.yml is included, the value of the first_kube_control_plane fact changes from an address to an inventory_hostname, causing the failure above. A minimal sketch of the overlap follows.
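Simplified sketch of the two conflicting definitions (the exact expressions shown here are assumptions; the real ones live in roles/kubespray-defaults/defaults/main.yaml and roles/kubernetes/control-plane/tasks/define-first-kube-control.yml):

# roles/kubespray-defaults/defaults/main.yaml (assumed shape): the variable
# starts out holding an address of the first control-plane node
first_kube_control_plane: "{{ hostvars[groups['kube_control_plane'][0]]['access_ip'] | default(hostvars[groups['kube_control_plane'][0]]['ip']) }}"

# roles/kubernetes/control-plane/tasks/define-first-kube-control.yml (simplified):
# set_fact takes precedence over role defaults, so after this task the same
# name holds an inventory_hostname instead of an address
- name: Set fact first_kube_control_plane
  set_fact:
    first_kube_control_plane: "{{ groups['kube_control_plane'] | first }}"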

I think one option is to rename the first_kube_control_plane variable defined in roles/kubespray-defaults/defaults/main.yaml to first_kube_control_plane_address, in order to avoid the variable overlap.
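Sketched out, the rename would look something like this (hypothetical naming per the suggestion above; the actual change is in the linked PR, and the default expression is assumed):

# roles/kubespray-defaults/defaults/main.yaml: rename so the address-valued
# default no longer collides with the set_fact shown earlier
first_kube_control_plane_address: "{{ hostvars[groups['kube_control_plane'][0]]['access_ip'] | default(hostvars[groups['kube_control_plane'][0]]['ip']) }}"

# Consumers that need an address (such as kubeadm_discovery_address) would then
# reference first_kube_control_plane_address, while task delegation keeps using
# the inventory_hostname-valued first_kube_control_plane fact.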

I will open a PR to fix this ASAP.
