

openstack cloud provider fails to reauthenticate #44461

Closed
bradbeam opened this issue Apr 13, 2017 · 10 comments · Fixed by #45545
Labels
area/provider/openstack Issues or PRs related to openstack provider

Comments

@bradbeam
Contributor

Is this a request for help? (If yes, you should use our troubleshooting guide and community support channels, see http://kubernetes.io/docs/troubleshooting/.):
no

What keywords did you search in Kubernetes issues before filing this one? (If you have found any duplicates, you should instead reply there.):

cinder / openstack / reauth

Is this a BUG REPORT or FEATURE REQUEST? (choose one):
Bug Report

Kubernetes version (use kubectl version):

```
Client Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.1+coreos.0", GitCommit:"9212f77ed8c169a0afa02e58dce87913c6387b3e", GitTreeState:"clean", BuildDate:"2017-04-04T00:32:53Z", GoVersion:"go1.7.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.1+coreos.0", GitCommit:"9212f77ed8c169a0afa02e58dce87913c6387b3e", GitTreeState:"clean", BuildDate:"2017-04-04T00:32:53Z", GoVersion:"go1.7.5", Compiler:"gc", Platform:"linux/amd64"}
```

Environment:

  • Cloud provider or hardware configuration: OpenStack
  • OS (e.g. from /etc/os-release):
NAME="Ubuntu"
VERSION="16.04.1 LTS (Xenial Xerus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.04.1 LTS"
VERSION_ID="16.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
UBUNTU_CODENAME=xenial
  • Kernel (e.g. uname -a):
Linux mgmt-01 4.4.0-31-generic #50-Ubuntu SMP Wed Jul 13 00:07:12 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
  • Install tools:
    Kargo
  • Others:
    Kubelet running in rkt via a systemd service; Docker still used for Kubernetes workloads

What happened:
When kubelet first starts up, it authenticates to OpenStack successfully and the OpenStack integration works well (dynamic provisioning with Cinder, pulling instance data, etc.). However, after 12h (our token TTL), kubelet (and other Kubernetes components) attempt to reauthenticate and fail:

```
Apr 11 16:01:36 control-01 rkt[24916]: I0411 16:01:36.851732   24916 openstack_instances.go:80] Claiming to support Instances
Apr 11 16:01:38 control-01 rkt[24916]: I0411 16:01:38.892420   24916 container_manager_linux.go:439] Discovered runtime cgroups name: /system.slice/docker.service
Apr 11 16:01:42 control-01 rkt[24916]: I0411 16:01:42.974724   24916 server.go:778] GET /metrics: (60.65808ms) 200 [[Go-http-client/1.1] 10.232.255.10:57764]
Apr 11 16:01:47 control-01 rkt[24916]: W0411 16:01:47.858585   24916 openstack_instances.go:75] Failed to find compute flavors: Successfully re-authenticated, but got error executing request: Invalid requst
```

Once reauthentication fails, Cinder persistent volumes can no longer be created.

We ran some packet captures and found that during kubelet startup it sends a correctly scoped request:

```json
{
    "auth": {
        "identity": {
            "methods": [
                "password"
            ], 
            "password": {
                "user": {
                    "domain": {
                        "name": "domainname"
                    }, 
                    "name": "username", 
                    "password": "password"
                }
            }
        }, 
        "scope": {
            "project": {
                "id": "9dedeb58b40044c5a93946bfec716429"
            }
        }
    }
}
```

During reauthentication, however, the scope is missing from the request:

```json
{
    "auth": {
        "identity": {
            "methods": [
                "password"
            ], 
            "password": {
                "user": {
                    "domain": {
                        "name": "domainname"
                    }, 
                    "name": "username", 
                    "password": "password"
                }
            }
        }
    }
}
```

As a result, OpenStack responds with:

```json
{
    "badRequest": {
        "code": 400, 
        "message": "Malformed request URL: URL's project_id '9dedeb58b40044c5a93946bfec716429' doesn't match Context's project_id 'None'"
    }
}
```

What you expected to happen:
Kubernetes components can reauthenticate with OpenStack as necessary.

How to reproduce it (as minimally and precisely as possible):
Provision an environment with the OpenStack cloud provider and leave it running for 12h (or longer, depending on the configured session expiry). You should then see log messages in kubelet and controller-manager about failing to connect to OpenStack. You can also try to provision a PV/PVC using Cinder and watch it get stuck in the Pending state.

Anything else we need to know:
I believe this was working correctly in 1.5.x.
It looks like gophercloud was replaced/updated between 1.5 and 1.6. I suspect this is related, but I'm not positive.

@bradbeam
Contributor Author

@xrl just posted gophercloud/gophercloud#255 in Slack, which looks closely related. Testing locally to see whether it fixes this.

@bradbeam
Contributor Author

Looks like that patch fixed it.

@xsgordon

xsgordon commented Apr 17, 2017

/cc @kubernetes/sig-openstack-bugs

@idvoretskyi can I get you to tag this with the sig/openstack label please.

@idvoretskyi idvoretskyi added the area/provider/openstack Issues or PRs related to openstack provider label Apr 19, 2017
@mikebryant
Contributor

This is a significant regression in 1.6 for us. Can this be prioritised and, once fixed, cherry-picked into a 1.6 patch release?

@stuart-warren
Contributor

I assume we just need to bump the gophercloud/gophercloud dependency.

Master (as of today): gophercloud/gophercloud@b06120d...ce1e02c - 80 files changed

Related fixes only: gophercloud/gophercloud@b06120d...0bf921d - 74 files changed

@mikebryant
Contributor

I think this is also breaking the kubelet:

```
Jun 09 10:44:42 kubernetes-cit-kubernetes-cr1-2-1495722764 bash[5121]: E0609 10:44:42.555324    5121 kubelet.go:1535] Failed creating a mirror pod for "pod-checkpointer-kubernetes-cit-kubernetes-cr1-2-1495722764_kube-system(82fe97d64f905684fc168629eced8637)": pods "pod-checkpointer-kubernetes-cit-kubernetes-cr1-2-1495722764" already exists
Jun 09 10:44:43 kubernetes-cit-kubernetes-cr1-2-1495722764 bash[5121]: W0609 10:44:43.889177    5121 openstack_instances.go:48] Failed to find compute endpoint: No suitable endpoint could be found in the service catalog.
Jun 09 10:44:43 kubernetes-cit-kubernetes-cr1-2-1495722764 bash[5121]: W0609 10:44:43.889214    5121 kubelet_node_status.go:899] Failed to set some node status fields: failed to get instances from cloud provider
Jun 09 10:44:53 kubernetes-cit-kubernetes-cr1-2-1495722764 bash[5121]: W0609 10:44:53.914174    5121 openstack_instances.go:48] Failed to find compute endpoint: No suitable endpoint could be found in the service catalog.
Jun 09 10:44:53 kubernetes-cit-kubernetes-cr1-2-1495722764 bash[5121]: W0609 10:44:53.914267    5121 kubelet_node_status.go:899] Failed to set some node status fields: failed to get instances from cloud provider
Jun 09 10:45:03 kubernetes-cit-kubernetes-cr1-2-1495722764 bash[5121]: W0609 10:45:03.935040    5121 openstack_instances.go:48] Failed to find compute endpoint: No suitable endpoint could be found in the service catalog.
Jun 09 10:45:03 kubernetes-cit-kubernetes-cr1-2-1495722764 bash[5121]: W0609 10:45:03.935101    5121 kubelet_node_status.go:899] Failed to set some node status fields: failed to get instances from cloud provider
Jun 09 10:45:13 kubernetes-cit-kubernetes-cr1-2-1495722764 bash[5121]: W0609 10:45:13.959051    5121 openstack_instances.go:48] Failed to find compute endpoint: No suitable endpoint could be found in the service catalog.
```

@dchen1107
Member

dchen1107 commented Jun 9, 2017

I'm inclined to accept this for 1.7 because:

  1. it is a regression,
  2. it has a significant impact on users, and
  3. the change looks unlikely to affect anything else (need to verify that no other pieces use that new revision).

@idvoretskyi WDYT? I marked this for 1.7, but it requires your approval. Thanks!

cc @kubernetes/kubernetes-release-managers

@dchen1107 dchen1107 added this to the v1.7 milestone Jun 9, 2017
@xrl

xrl commented Jun 9, 2017

@mikebryant that error is from an unpatched kubelet, correct? That is the error which prompted me to fix the bug upstream.

@mikebryant
Contributor

@xrl Yeah (well, it's the CoreOS hyperkube distribution, but I'm not aware of them patching anything).

@dims
Member

dims commented Jun 9, 2017

@dchen1107 On behalf of the @kubernetes/sig-openstack-misc SIG: yes, please let us merge the referenced PR #45545.

dims pushed a commit to dims/kubernetes that referenced this issue Feb 8, 2018
…cloud-bump

Automatic merge from submit-queue (batch tested with PRs 46678, 45545, 47375)

update gophercloud/gophercloud dependency

**What this PR does / why we need it**:

**Which issue this PR fixes** 
fixes kubernetes#44461

**Special notes for your reviewer**:

**Release note**:

```release-note
update gophercloud/gophercloud dependency for reauthentication fixes
```

8 participants