Changes to the master cloud tags are removing them from the Load Balancer and leaving the cluster unreachable #9862

Closed
marianomirabelli opened this issue Sep 2, 2020 · 7 comments
Labels
lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale.

Comments

@marianomirabelli

1. What kops version are you running? The command kops version will display this information.

We are using kops version 1.17.0.

2. What Kubernetes version are you running? kubectl version will print the
version if a cluster is running or provide the Kubernetes version specified as
a kops flag.

We are using Kubernetes 1.16.9.

3. What cloud provider are you using?

We are using AWS.

4. What commands did you run? What is the simplest way to reproduce this issue?

We are using kops and terraform, so the commands we run follow this sequence:

First, we generate the kops-spec.yml file:

kops toolbox template --name ${cluster-name} --values cluster-vars.json --template cluster-template.yml --format-yaml > kops-spec.yml

Then we replace the cluster spec in the S3 state bucket:

kops replace -f kops-spec.yml --state ${bucket-name} --kops-state --name ${cluster-name} --force

Next, we run kops update as follows:

kops update cluster ${cluster-name} --state=${bucket-name} --out=terraform/ --target=terraform

Then we navigate to the terraform folder and execute:

terraform plan

Finally, we do:

terraform apply

5. What happened after the commands executed?

When we make a change to the master nodes, such as adding a new cloudLabel, the masters are detached from the load balancer, so when we then try to run the rolling-update command the cluster becomes unreachable.

6. What did you expect to happen?

We expect that a change to the master nodes applied through terraform does not make the cluster unreachable for the rolling update and subsequent kubectl operations.

7. Please provide your cluster manifest. Execute
kops get --name my.example.com -o yaml to display your cluster manifest.
You may want to remove your cluster name and other sensitive information.

apiVersion: kops.k8s.io/v1alpha2
kind: Cluster
metadata:
  creationTimestamp: null
  name: pipeline-test.test.almundo.io
spec:
  api:
    loadBalancer:
      type: Internal
      useForInternalApi: true
  authorization:
    rbac: {}
  channel: stable
  cloudProvider: aws
  configBase: s3://pipeline-test.test.almundo.io--kops-state/pipeline-test.test.almundo.io
  etcdClusters:
  - etcdMembers:
    - instanceGroup: master-us-east-1a
      name: us-east-1a
    - instanceGroup: master-us-east-1b
      name: us-east-1b
    - instanceGroup: master-us-east-1c
      name: us-east-1c
    name: main
    version: 3.3.13
  - etcdMembers:
    - instanceGroup: master-us-east-1a
      name: us-east-1a
    - instanceGroup: master-us-east-1b
      name: us-east-1b
    - instanceGroup: master-us-east-1c
      name: us-east-1c
    name: events
    version: 3.3.13
  iam:
    allowContainerRegistry: true
    legacy: false
  kubelet:
    anonymousAuth: false
    authenticationTokenWebhook: true
    authorizationMode: Webhook
  kubernetesApiAccess:
  - 0.0.0.0/0
  kubernetesVersion: 1.16.8
  masterInternalName: api.internal.pipeline-test.test.almundo.io
  masterPublicName: api.pipeline-test.test.almundo.io
  networkCIDR: 10.5.0.0/16
  networkID: vpc-00e1e798cb482b198
  networking:
    calico:
      crossSubnet: true
      majorVersion: v3
      mtu: 8912
  nonMasqueradeCIDR: 100.64.0.0/10
  subnets:
  - cidr: 10.5.10.0/24
    id: subnet-049ff35a1c5756450
    name: utility-us-east-1a
    type: Utility
    zone: us-east-1a
  - cidr: 10.5.20.0/24
    id: subnet-069cadd0600707792
    name: utility-us-east-1b
    type: Utility
    zone: us-east-1b
  - cidr: 10.5.30.0/24
    id: subnet-07a637c1d8ae8de7e
    name: utility-us-east-1c
    type: Utility
    zone: us-east-1c
  - cidr: 10.5.110.0/24
    id: subnet-083e778d6dc8c03be
    name: us-east-1a
    type: Private
    zone: us-east-1a
  - cidr: 10.5.120.0/24
    id: subnet-0c30bbfa43ceb6dfd
    name: us-east-1b
    type: Private
    zone: us-east-1b
  - cidr: 10.5.130.0/24
    id: subnet-019ee23ee6a8efa90
    name: us-east-1c
    type: Private
    zone: us-east-1c
  topology:
    dns:
      type: Public
    masters: public
    nodes: public

---

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2020-09-02T14:37:46Z"
  generation: 5
  labels:
    kops.k8s.io/cluster: pipeline-test.test.almundo.io
  name: master-us-east-1a
spec:
  cloudLabels:
    bar: test2
    cluster: pipeline-test.test.almundo.io
    env: dv
    foo: test
    k8s-type: master
    solution: k8s
    zem: test3
  image: kope.io/k8s-1.16-debian-stretch-amd64-hvm-ebs-2020-07-20
  machineType: t3.medium
  maxSize: 1
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: master-us-east-1a
  role: Master
  subnets:
  - us-east-1a

---

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2020-09-02T14:37:46Z"
  generation: 5
  labels:
    kops.k8s.io/cluster: pipeline-test.test.almundo.io
  name: master-us-east-1b
spec:
  cloudLabels:
    bar: test2
    cluster: pipeline-test.test.almundo.io
    env: dv
    foo: test
    k8s-type: master
    solution: k8s
    zem: test3
  image: kope.io/k8s-1.16-debian-stretch-amd64-hvm-ebs-2020-07-20
  machineType: t3.medium
  maxSize: 1
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: master-us-east-1a
  role: Master
  subnets:
  - us-east-1b

---

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2020-09-02T14:37:46Z"
  generation: 5
  labels:
    kops.k8s.io/cluster: pipeline-test.test.almundo.io
  name: master-us-east-1c
spec:
  cloudLabels:
    bar: test2
    cluster: pipeline-test.test.almundo.io
    env: dv
    foo: test
    k8s-type: master
    solution: k8s
    zem: test3
  image: kope.io/k8s-1.16-debian-stretch-amd64-hvm-ebs-2020-07-20
  machineType: t3.medium
  maxSize: 1
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: master-us-east-1a
  role: Master
  subnets:
  - us-east-1c

---

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2020-09-02T14:37:46Z"
  generation: 1
  labels:
    kops.k8s.io/cluster: pipeline-test.test.almundo.io
  name: nodes
spec:
  cloudLabels:
    env: dv
    solution: k8s
  image: kope.io/k8s-1.16-debian-stretch-amd64-hvm-ebs-2020-07-20
  machineType: t3.medium
  maxSize: 3
  minSize: 2
  nodeLabels:
    kops.k8s.io/instancegroup: nodes
  role: Node
  subnets:
  - us-east-1a
  - us-east-1b
  - us-east-1c

8. Please run the commands with the most verbose logging by adding the -v 10 flag.
Paste the logs into this report, or in a gist and provide the gist link here.

We have attached the log files to this issue:

kops-replace-output.txt
kops-update-output.txt
terraform-plan-log.txt

@bertux

bertux commented Oct 30, 2020

Hello @marianomirabelli, I had the same problem even with the latest versions, Kops 1.18.2 and Kubernetes 1.18.10, but I solved it by reading "NOTE on Auto Scaling Groups and ASG Attachments" at https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/autoscaling_attachment and applying it to the kubernetes.tf file generated by kops.
More precisely, I added the following to every "aws_autoscaling_group" resource (bastions, masters, and nodes):

  lifecycle {
    ignore_changes = [
      # Ignore changes because of attachment above
      load_balancers,
      target_group_arns,
    ]
  }

I also saw a longer delay when detaching/attaching the Target Groups to the nodes, so I added target_group_arns as well.
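For reference, here is a minimal sketch of where this lifecycle block ends up inside an aws_autoscaling_group resource in the kops-generated kubernetes.tf; the resource name and attribute values below are illustrative, not the exact output kops emits:

  resource "aws_autoscaling_group" "master-us-east-1a-masters" {
    # Illustrative values; the real resource keeps whatever settings kops renders.
    name                 = "master-us-east-1a.masters.pipeline-test.test.almundo.io"
    launch_configuration = aws_launch_configuration.master-us-east-1a-masters.id
    min_size             = 1
    max_size             = 1
    vpc_zone_identifier  = [aws_subnet.us-east-1a.id]

    lifecycle {
      # Leave the load balancer / target group membership alone so that terraform
      # does not detach the masters when unrelated attributes (e.g. tags) change.
      ignore_changes = [
        load_balancers,
        target_group_arns,
      ]
    }
  }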
I would be interested in preparing a Pull Request that fixes this in the source code, but I don't use Go often yet, so any pointers on where to look would help me get started.

@bmelbourne
Contributor

@marianomirabelli @bertux
I believe this issue has been addressed in PR #9794, and the fix will be included in Kops v1.19.

@bertux

bertux commented Nov 13, 2020

Thank you @bmelbourne for pointing me to this PR, which addresses the problem. However, I'm using AWS ACM for the SSL certificates of the load balancers, and the Kops 1.18 release notes say that I need to enable basic authentication for the API load balancer, which will no longer be possible in Kops 1.19, so I will see what is possible once Kops 1.19 is released.

@marianomirabelli
Author

Thanks @bmelbourne for notifying me about this PR!

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label on Feb 11, 2021
@bertux

bertux commented Feb 11, 2021

/close

@k8s-ci-robot
Contributor

@bertux: Closing this issue.

In response to this:

/close

