This repository has been archived by the owner on Jul 27, 2023. It is now read-only.

Update vagrant to include kubeworkers and refactor edge, worker loop #1365

Closed
wants to merge 3 commits

Conversation

andreijs
Contributor

Hey guys,

Here is the updated Vagrantfile for the kubeworker role.

  • [done] Installs cleanly on a fresh build of the most recent master branch
  • [done] Upgrades cleanly from the most recent release
  • [done] Updates documentation relevant to the changes

@@ -26,6 +30,36 @@ else
config_hash = config_hash.merge(YAML.load(File.read(config_path)))
end

def spin_up(config_hash:, config:, server_array:, hostvars:, hosts:, server_type:)
Contributor

This function could use an explanatory comment
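
Not the PR's actual implementation, just a hypothetical sketch of the kind of comment (and body) that would answer this, assuming spin_up defines one Vagrant machine per server_array entry and records its address and role for the Ansible inventory; the :name/:ip entry keys and the "<role>_memory"/"<role>_cpus" config keys are illustrative:

# spin_up: defines a group of VMs of a single role (worker, edge,
# kubeworker, ...), gives each a private IP, and collects the host
# variables Ansible needs to provision it from the control node.
def spin_up(config_hash:, config:, server_array:, hostvars:, hosts:, server_type:)
  server_array.each do |server|
    config.vm.define server[:name] do |node|
      node.vm.hostname = server[:name]
      node.vm.network "private_network", ip: server[:ip]
      node.vm.provider "virtualbox" do |vb|
        vb.memory = config_hash["#{server_type}_memory"]
        vb.cpus   = config_hash["#{server_type}_cpus"]
      end
    end
    # Inventory bookkeeping: one hostvars entry per machine, grouped by role.
    hostvars[server[:name]] = {
      "ansible_ssh_host" => server[:ip],
      "private_ipv4"     => server[:ip],
      "public_ipv4"      => server[:ip],
      "role"             => server_type
    }
    (hosts[server_type] ||= []) << server[:name]
  end
end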

@langston-barrett
Contributor

langston-barrett commented Apr 20, 2016

This looks great, thanks @andreijs! I left just a few comments

"ansible_ssh_host" => ip,
"private_ipv4" => ip,
"public_ipv4" => ip,
"role" => server_type
Contributor

server_type could just be renamed to "role" for consistency
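
For example (hypothetical; hostvars_for is not a function in this diff, just a way to show the rename so the keyword argument and the hash key read the same):

# Build the per-host inventory variables for a machine of a given role.
def hostvars_for(ip:, role:)
  {
    "ansible_ssh_host" => ip,
    "private_ipv4"     => ip,
    "public_ipv4"      => ip,
    "role"             => role
  }
end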

@langston-barrett
Contributor

We should also probably add a line for kubeworkers here:
https://github.com/CiscoCloud/mantl/pull/1365/files#diff-23b6f443c01ea2efcb4f36eedfea9089L141

@andreimc
Contributor

@siddharthist thanks for your feedback, I have made the changes you pointed out 👍

consul_package: consul-0.6.3
consul_ui_package: consul-ui-0.6.3
consul_package: consul-0.6.4
consul_ui_package: consul-ui-0.6.4
Contributor

Can we put this in a separate PR? Doesn't seem directly related.

Contributor

Sorry, I didn't realize I committed this

@langston-barrett
Contributor

I got this error on vagrant up:

TASK: [kubernetes-addons | create/update skydns replication controller] ******* 
failed: [control-01] => {"failed": true}
msg: error running kubectl (/bin/kubectl --server=http://localhost:8085/ --namespace=kube-system create --filename=/etc/kubernetes/manifests/skydns-rc.yaml) command (rc=1): Error from server: error when creating "/etc/kubernetes/manifests/skydns-rc.yaml": namespaces "kube-system" not found


FATAL: all hosts have already failed -- aborting

@andreimc
Contributor

I sometimes get this one; I haven't seen the one you got, @siddharthist:

TASK: [kubernetes-addons | create/update elasticsearch service] ***************
failed: [control-01] => {"failed": true}
msg: error running kubectl (/bin/kubectl --server=http://localhost:8085/ --namespace=kube-system create --filename=/etc/kubernetes/manifests/es-svc.yaml) command (rc=1): Error from server: error when creating "/etc/kubernetes/manifests/es-svc.yaml": Internal error occurred: failed to allocate a serviceIP: cannot allocate resources of type serviceipallocations at this time

@andreimc
Contributor

Everything works for me apart from the Kube UI and nginx-consul starting on the kubeworker (ref #1346):

PLAY RECAP ********************************************************************
mesos | wait for zookeeper service to be registered -------------------- 38.28s
common | install system utilities -------------------------------------- 37.09s
consul | wait for leader ----------------------------------------------- 30.54s
kubernetes | pull hyperkube docker image ------------------------------- 24.81s
kubernetes-master | wait for apiserver to come up ---------------------- 16.20s
etcd | restart skydns -------------------------------------------------- 14.56s
kubernetes-master | download kubernetes binaries ----------------------- 14.36s
mantlui | ensure nginx-mantlui docker image is present ----------------- 12.17s
kubernetes-node | download kubernetes binaries ------------------------- 12.07s
zookeeper | install zookeepercli package ------------------------------- 10.47s
control-01                 : ok=271  changed=206  unreachable=0    failed=0
edge-001                   : ok=122  changed=91   unreachable=0    failed=0
kubeworker-001             : ok=152  changed=106  unreachable=0    failed=0
localhost                  : ok=0    changed=0    unreachable=0    failed=0
worker-001                 : ok=123  changed=88   unreachable=0    failed=0

➜  mantl (master) ✔ vagrant ssh kubeworker-001
No vagrant-config.yml found, using defaults
Last login: Wed Apr 20 23:54:41 2016 from control-01
[vagrant@kubeworker-001 ~]$

@langston-barrett
Contributor

I got another error this time:

TASK: [kubernetes-addons | create/update grafana service] ********************* 
failed: [control-01] => {"failed": true}
msg: error running kubectl (/bin/kubectl --server=http://localhost:8085/ --namespace=kube-system create --filename=/etc/kubernetes/manifests/grafana-service.yaml) command (rc=1): Error from server: error when creating "/etc/kubernetes/manifests/grafana-service.yaml": Internal error occurred: failed to allocate a serviceIP: cannot allocate resources of type serviceipallocations at this time


FATAL: all hosts have already failed -- aborting

@andreimc
Contributor

@siddharthist can you give me your machine specs: OS, Ansible version, etc.?

@langston-barrett
Contributor

@andreimc I have Vagrant 1.8.1 and Oracle VM VirtualBox Manager 5.0.16_OSE. My host's version of Ansible doesn't/shouldn't affect anything; the VMs are provisioned from the control node.

@andreimc
Contributor

@siddharthist I really don't know why it fails; a co-worker and I both tried to spin it up and it worked fine. Vagrantfile updates should not really cause Ansible to fail ... maybe get someone else to try it.

@ryane ryane modified the milestone: 1.1 Apr 22, 2016
@ryane
Contributor

ryane commented Apr 26, 2016

I'm also having a lot of trouble getting kubernetes to run on Vagrant. Provisioning fails intermittently with various errors. Here are a couple I have seen repeatedly:

TASK: [kubernetes-addons | create or update dashboard] ************************
failed: [control-01] => {"failed": true}
msg: error running kubectl (/bin/kubectl --server=http://localhost:8085/ --namespace=kube-system create --filename=/etc/kubernetes/manifests/kubernetes-dashboard.yaml) command (rc=1): You have exposed your service on an external port on all nodes in your
cluster.  If you want to expose this service to the external internet, you may
need to set up firewall rules for the service port(s) (tcp:30000) to serve traffic.

See http://releases.k8s.io/release-1.2/docs/user-guide/services-firewalls.md for more details.
service "kubernetes-dashboard" created
TASK: [kubernetes-addons | create or update dashboard] ************************
failed: [control-01] => {"failed": true}
msg: error running kubectl (/bin/kubectl --server=http://localhost:8085/ --namespace=kube-system create --filename=/etc/kubernetes/manifests/kubernetes-dashboard.yaml) command (rc=1): replicationcontroller "kubernetes-dashboard" created

Repeated provisioning attempts might ultimately complete, but I'm still seeing various problems with Kubernetes:

  1. UI not accessible

  2. No nodes registered

    $ kubectl get nodes
    
    # no results
    
  3. Errors running kubectl

    $ kubectl get po
    Error from server: an error on the server has prevented the request from succeeding
    

@BrianHicks @Zogg any ideas on this?

@SillyMoo

I get the same issue, but if I re-run with a 'vagrant provision' it all springs to life. Looks like a timing issue to me (I know that the ansible scripts wait for hyperkube to be pulled, but do they wait for it to actually be up and listening?).

@SillyMoo

OK, I tell a bit of a lie. Ansible finishes OK, hyperkube is running, and I see a node in kubectl. However, I can't actually get Kubernetes to pull any images (the pod just sits there, no image pull events, and no sign on the kubeworker that any images are being pulled).

@andreimc
Contributor

With the latest master merged in, it fails to restart skydns. I'm not sure what would be causing it; it just hangs for a while, then I get the following error message:

NOTIFIED: [dnsmasq | restart dnsmasq] *****************************************
changed: [control-01]
changed: [kubeworker-001]
changed: [edge-001]

PLAY [role=worker] ************************************************************

TASK: [mesos | install mesos packages] ****************************************
FATAL: no hosts matched or all hosts have already failed -- aborting

Not sure why.

@stevendborrelli
Contributor

Docker fails on this:

TASK: [docker | enable docker] ************************************************ 
failed: [control-01] => {"failed": true}
msg: Job for docker.service failed because a configured resource limit was exceeded. See "systemctl status docker.service" and "journalctl -xe" for details.

But this is due to the new Docker implementation not creating an /etc/sysconfig/mantl-storage file on non-LVM systems, not a problem with this PR.

@andreimc
Contributor

andreimc commented May 4, 2016

The problems with this will be fixed after #1409 and #1410 are merged in.

@langston-barrett
Contributor

@andreijs @andreimc Can you rebase this? Both those PRs have been merged.

@andreimc
Contributor

andreimc commented May 4, 2016

@siddharthist up to date.

@ryane
Contributor

ryane commented May 4, 2016

Had a successful build but I am back to

Internal Server Error (500)

Get https://10.254.0.1:443/api/v1/replicationcontrollers: dial tcp 10.254.0.1:443: getsockopt: connection refused

when trying to access the Kubernetes UI. 10.254.0.1 is the cluster IP for the kubernetes service.

kubectl get svc --namespace=default
NAME         CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
kubernetes   10.254.0.1   <none>        443/TCP   1h

Logs from the kubernetes-dashboard pod's container:

2016/05/04 12:50:26 Incoming HTTP/1.0 GET /api/v1/replicationcontrollers request from 10.10.99.1:33787
2016/05/04 12:50:26 Getting list of all replication controllers in the cluster
2016/05/04 12:50:26 Get https://10.254.0.1:443/api/v1/replicationcontrollers: dial tcp 10.254.0.1:443: getsockopt: connection refused
2016/05/04 12:50:26 Outcoming response to 10.10.99.1:33787 with 500 status code

@andreimc
Contributor

andreimc commented May 7, 2016

Hey guys, I had some time to spin this up in Vagrant. I still get a 502 for the Kube UI :(, although Ansible ran OK.

@andreimc
Contributor

andreimc commented May 7, 2016

I get the following on control-01 when I try to list pods:

[vagrant@control-01 ~]$ kubectl --namespace kube-system get pods
Error from server: an error on the server has prevented the request from succeeding

@ryane ryane modified the milestones: 1.2, 1.1 May 12, 2016
@manishrajkarnikar

@andreimc @siddharthist curious: how is the file groups_var/all/kubernetes_vars.yml read in a Vagrant run? Or is it required at all?

@Zogg
Contributor

Zogg commented May 16, 2016

@manishrajkarnikar groups_var/all/kubernetes_vars.yml usage in a local Ansible run shouldn't be different from the remote case.
As far as I remember, yes, kubernetes_vars.yml was mandatory to have the Kubernetes roles play nice.

@manishrajkarnikar

@Zogg I don't see it being mentioned in the Vagrantfile. I added it in my Vagrantfile as a raw parameter and was able to get a multi-node K8s cluster up and running. I couldn't get a single master and slave node going, though, probably because of the bug reported in the Vagrantfile.
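
For reference, a hypothetical way to pass that file explicitly from a Vagrantfile via the standard ansible provisioner's raw arguments; the playbook name and the group_vars/ path are assumptions, and since Mantl's Vagrant setup provisions from the control node, the exact hook may differ:

Vagrant.configure("2") do |config|
  config.vm.provision "ansible" do |ansible|
    ansible.playbook = "sample.yml"  # assumed playbook name
    # Pass the vars file explicitly instead of relying on group_vars discovery.
    ansible.raw_arguments = ["--extra-vars", "@group_vars/all/kubernetes_vars.yml"]
  end
end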

@andreijs
Contributor Author

Opening a new PR from a branch, closing this one.
