Calico with plain containerd on Windows does not work #4334

Closed
jnummelin opened this issue Jan 22, 2021 · 10 comments

Comments

@jnummelin

I'm trying to set up Calico on Windows with plain containerd as the CRI (no Docker on top). Calico fails to configure networking for the pods.

Expected Behavior

Calico should be able to create networks for plain containerd-based pods.

Current Behavior

Calico and CNI seem to be ready:

time="2021-01-22 13:37:52" level=info msg="I0122 13:37:52.924273    6260 kubelet.go:2160] Container runtime status: Runtime Conditions: RuntimeReady=true reason: message:, NetworkReady=true reason: message:" component=kubelet.exe

But pod creation fails:

time="2021-01-22 13:38:26" level=info msg="E0122 13:38:26.729265    6260 remote_runtime.go:116] RunPodSandbox from runtime service failed: rpc error: code = Unknown desc = failed to setup network for sandbox \"032dbbc728e4758824cc91472b9295cee9bd6f4eec5cf1a275045083d3d16216\": Endpoint 032dbbc728e4758824cc91472b9295cee9bd6f4eec5cf1a275045083d3d16216_Calico not found" component=kubelet.exe
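The endpoint the runtime is looking for appears to be named after the sandbox ID with `_Calico` appended. A minimal Python sketch for pulling both values out of a line like the one above, assuming the message format stays as shown; the regular expression and the `parse_sandbox_error` helper are illustrative only and are not part of kubelet, containerd, or Calico:

```python
import re

# Hypothetical pattern, derived only from the single log line quoted above;
# it is not a documented kubelet log format.
SANDBOX_ERROR = re.compile(
    r'failed to setup network for sandbox \\?"(?P<sandbox>[0-9a-f]+)\\?": '
    r'Endpoint (?P<endpoint>\S+) not found'
)


def parse_sandbox_error(line):
    """Return (sandbox_id, endpoint_name), or None if the line does not match."""
    match = SANDBOX_ERROR.search(line)
    if match is None:
        return None
    return match.group("sandbox"), match.group("endpoint")


# Message text copied from the kubelet log line above (escaped quotes kept).
line = (
    'RunPodSandbox from runtime service failed: rpc error: code = Unknown '
    'desc = failed to setup network for sandbox '
    '\\"032dbbc728e4758824cc91472b9295cee9bd6f4eec5cf1a275045083d3d16216\\": '
    'Endpoint 032dbbc728e4758824cc91472b9295cee9bd6f4eec5cf1a275045083d3d16216_Calico '
    'not found'
)

sandbox_id, endpoint = parse_sandbox_error(line)
# Check whether the missing endpoint is named "<sandbox id>_Calico".
print(endpoint == sandbox_id + "_Calico")
```

Grouping retries by sandbox ID this way makes it easier to see that every attempt fails on the same kind of endpoint lookup.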

Containerd is configured to look for the CNI binaries and config where the Calico installer script puts them:

    [plugins."io.containerd.grpc.v1.cri".cni]
      bin_dir = "C:\\k\\cni"
      conf_dir = "C:\\k\\cni\\config"

Kubelet started as:

C:\var\lib\k0s\bin\kubelet.exe --root-dir=C:\var\lib\k0s\kubelet --v=5 --container-runtime-endpoint=npipe:////.//pipe//containerd-containerd --cni-bin-dir=C:\k\cni --hairpin-mode=promiscuous-bridge --cert-dir=C:\var\lib\k0s\kubelet_certs --cgroups-per-qos=false --pod-infra-container-image=mcr.microsoft.com/oss/kubernetes/pause:1.4.1 --resolv-conf= --config=C:\var\lib\k0s\kubelet-config.yaml --bootstrap-kubeconfig=C:\var\lib\k0s\kubelet-bootstrap.conf --container-runtime=remote --kubeconfig=C:\var\lib\k0s\kubelet.conf --hostname-override=ip-172-31-41-146.eu-north-1.compute.internal --network-plugin=cni --cni-conf-dir=C:\k\cni\config --cluster-domain=cluster.local --enforce-node-allocatable=
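To rule out a simple path mismatch, the CNI directories in the containerd config can be compared with the kubelet flags. A rough Python sketch, with the values copied by hand from the config and command line above; `parse_flags` and `cni_dirs_match` are hypothetical helper names, and the comparison is case-insensitive on the assumption that these are ordinary Windows paths:

```python
# Values copied from the containerd config and kubelet command line above.
CONTAINERD_CNI = {"bin_dir": r"C:\k\cni", "conf_dir": r"C:\k\cni\config"}
KUBELET_ARGS = [
    r"--cni-bin-dir=C:\k\cni",
    r"--cni-conf-dir=C:\k\cni\config",
    "--network-plugin=cni",
    "--container-runtime=remote",
]


def parse_flags(args):
    """Turn ['--name=value', ...] into {'name': 'value', ...}."""
    flags = {}
    for arg in args:
        name, _, value = arg.lstrip("-").partition("=")
        flags[name] = value
    return flags


def cni_dirs_match(containerd_cni, kubelet_flags):
    """Compare the two pairs of CNI directories, ignoring case."""
    pairs = [("bin_dir", "cni-bin-dir"), ("conf_dir", "cni-conf-dir")]
    return all(
        containerd_cni[c_key].lower() == kubelet_flags.get(k_key, "").lower()
        for c_key, k_key in pairs
    )


print(cni_dirs_match(CONTAINERD_CNI, parse_flags(KUBELET_ARGS)))
```

For the values in this report the two sides agree, so the directories themselves do not look like the problem.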

Calico node logs:

Kubelet has (re)started, (re)initialising the node...
2021-01-22 13:13:57.521 [INFO][5200] startup/startup.go 379: Early log level set to info
2021-01-22 13:13:57.556 [INFO][5200] startup/startup.go 395: Using NODENAME environment for node name
2021-01-22 13:13:57.557 [INFO][5200] startup/startup.go 407: Determined node name: ip-172-31-41-146.eu-north-1.compute.internal
2021-01-22 13:13:57.559 [INFO][5200] startup/startup.go 114: Skipping datastore connection test
2021-01-22 13:13:57.592 [INFO][5200] startup/startup.go 759: Using autodetected IPv4 address on interface vEthernet (Ethernet 2): 172.31.41.146/20
2021-01-22 13:13:57.592 [INFO][5200] startup/startup.go 836: No AS number configured on node resource, using global value
2021-01-22 13:13:57.605 [INFO][5200] startup/startup.go 951: Selected default IP pool is '192.168.0.0/16'
2021-01-22 13:13:57.612 [INFO][5200] startup/startup.go 217: Using node name: ip-172-31-41-146.eu-north-1.compute.internal
2021-01-22 13:13:57.612 [INFO][5200] startup/startup_windows.go 66: Backend networking is vxlan, ensure vxlan network.
2021-01-22 13:13:57.613 [INFO][5200] startup/ipam.go 1804: Ensure block for host ip-172-31-41-146.eu-north-1.compute.internal, ipv4 attr &{3 1 windows-reserved-ipam-handle windows host rsvd} ipv6 attr <nil>
2021-01-22 13:13:57.615 [INFO][5200] startup/ipam.go 1875: Looking up existing affinities for host host="ip-172-31-41-146.eu-north-1.compute.internal"
2021-01-22 13:13:57.620 [INFO][5200] startup/ipam.go 346: Looking up existing affinities for host host="ip-172-31-41-146.eu-north-1.compute.internal"
2021-01-22 13:13:57.623 [INFO][5200] startup/ipam.go 429: Trying affinity for 10.244.11.128/26 host="ip-172-31-41-146.eu-north-1.compute.internal"
2021-01-22 13:13:57.625 [INFO][5200] startup/ipam.go 140: Attempting to load block cidr=10.244.11.128/26 host="ip-172-31-41-146.eu-north-1.compute.internal"
2021-01-22 13:13:57.627 [INFO][5200] startup/ipam.go 217: Affinity is confirmed and block has been loaded cidr=10.244.11.128/26 host="ip-172-31-41-146.eu-north-1.compute.internal"
2021-01-22 13:13:57.627 [INFO][5200] startup/ipam.go 1901: Host's block '10.244.11.128/26'  host="ip-172-31-41-146.eu-north-1.compute.internal"
2021-01-22 13:13:57.639 [INFO][5200] startup/dataplane_windows.go 326: Attempting to create HNSNetwork {"Name":"Calico","Type":"Overlay","Subnets":[{"AddressPrefix":"10.244.11.128/26","GatewayAddress":"10.244.11.129","Policies":[{"Type":"VSID","VSID":4096}]}]} subnet="10.244.11.128/26"
2021-01-22 13:13:57.697 [INFO][5200] startup/dataplane_windows.go 334: Waiting to get ManagementIP from HNSNetwork Calico subnet="10.244.11.128/26"
2021-01-22 13:13:58.201 [INFO][5200] startup/dataplane_windows.go 345: Waiting to get net interface for HNSNetwork Calico (172.31.41.146) subnet="10.244.11.128/26"
2021-01-22 13:13:59.017 [INFO][5200] startup/dataplane_windows.go 354: Created HNSNetwork Calico subnet="10.244.11.128/26"
2021-01-22 13:13:59.030 [INFO][5200] startup/dataplane_windows.go 528: Created HNSEndpoint Calico_ep subnet="10.244.11.128/26"
2021-01-22 13:13:59.030 [INFO][5200] startup/startup_windows.go 108: Ensure network is done.
Calico node initialisation succeeded; monitoring kubelet for restarts...
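As a sanity check on the HNSNetwork parameters in the log, the subnet and gateway can be verified with Python's standard `ipaddress` module. This is only a sketch using the two values copied from the "Attempting to create HNSNetwork" line; it says nothing about how Calico itself picks the gateway:

```python
import ipaddress

# Values copied from the "Attempting to create HNSNetwork" log line above.
subnet = ipaddress.ip_network("10.244.11.128/26")
gateway = ipaddress.ip_address("10.244.11.129")

# First usable host address in the block (network address excluded).
first_host = next(subnet.hosts())

print(gateway in subnet)      # is the gateway inside the block?
print(gateway == first_host)  # is it the first usable address?
print(subnet.num_addresses)   # total addresses in the block
```

The network-level setup on the node looks consistent, which fits the log ending with "Ensure network is done."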

Attached is a longer kubelet log at the --v=5 log level, from around the time a pod is submitted to the node.
calico.log

Possible Solution

N/A

Steps to Reproduce (for bugs)

  1. Install & run containerd
  2. Install Calico with a slightly adapted installer script, at: https://github.com/k0sproject/k0s/blob/main/pkg/component/worker/calico_installer_windows.go#L156
    (mostly just dropping the Docker image build and similar steps, which are unneeded for containerd)
  3. Run k0s (it manages the kubelet and other processes)

Your Environment

  • Calico version
PS C:\Users\Administrator> C:\CalicoWindows\calico-node.exe -v
v3.17.0
  • Orchestrator version (e.g. kubernetes, mesos, rkt):
Kubernetes 1.20.0
  • Operating System and version:
PS C:\Users\Administrator> Get-WmiObject -class Win32_OperatingSystem


SystemDirectory : C:\Windows\system32
Organization    : Amazon.com
BuildNumber     : 17763
RegisteredUser  : EC2
SerialNumber    : 00430-00000-00000-AA611
Version         : 10.0.17763
@song-jiang song-jiang self-assigned this Jan 22, 2021
@song-jiang
Member

song-jiang commented Jan 22, 2021

Thanks @jnummelin for the logs. It seems the log contains information for pod deletion (not pod creation). I am wondering if you can rerun your test and attach the entire kubelet log from the very beginning?

@jnummelin
Author

The third line of the log says: Status Manager: adding pod: ..., which is the point where I submitted an IIS pod onto the node. There are a lot of pod/sandbox deletions as kubelet retries creating the pod.

@jnummelin
Author

jnummelin commented Jan 22, 2021

k0s-calico.log

@song-jiang this log is from the start of everything (containerd, kubelet etc.)

@song-jiang
Member

I'm not sure what is going on, but the log shows no sandbox creation being processed. The Calico IPAM CNI plugin is not being called, and the Calico CNI plugin is called for pod deletion.

@jnummelin
Author

jnummelin commented Jan 25, 2021

@song-jiang is there any more info I could dig out from the env? I can bump up the log levels if needed too.

This feels like it could be related: containerd/containerd#4851

Kubelet is started with proper CNI flags:

--cni-bin-dir=C:\k\cni --cni-conf-dir=C:\k\cni\config

The CNI config is identical to that of a neighbouring node running Docker, which works perfectly:

PS C:\Users\Administrator> Get-Content -Path C:\k\cni\config\10-calico.conf
{
  "name": "Calico",
  "windows_use_single_network": true,

  "cniVersion": "0.3.1",
  "type": "calico",
  "mode": "vxlan",

  "vxlan_mac_prefix":  "0E-2A",
  "vxlan_vni": 4096,

  "policy": {
    "type": "k8s"
  },

  "log_level": "debug",

  "capabilities": {"dns": true},

  "DNS":  {
    "Nameservers":  ["10.96.0.10"],
    "Search":  [
      "svc.cluster.local"
    ]
  },

  "nodename_file": "C:\\CalicoWindows\\libs\\calico\\..\\..\\nodename",

  "datastore_type": "kubernetes",

  "etcd_endpoints": "<your etcd endpoints>",
  "etcd_key_file": "<your etcd key>",
  "etcd_cert_file": "<your etcd cert>",
  "etcd_ca_cert_file": "<your etcd ca cert>",

  "kubernetes": {
    "kubeconfig": "C:\\CalicoWindows\\calico-kube-config"
  },

  "ipam": {
    "type": "calico-ipam",
    "subnet": "usePodCidr"
  },

  "policies":  [
    {
      "Name":  "EndpointPolicy",
      "Value":  {
        "Type":  "OutBoundNAT",
        "ExceptionList":  [
          "10.96.0.0/12"
        ]
      }
    },
    {
      "Name":  "EndpointPolicy",
      "Value":  {
        "Type":  "ROUTE",
        "DestinationPrefix":  "10.96.0.0/12",
        "NeedEncap":  true
      }
    }
  ]
}
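A quick way to check that the addresses in this config are internally consistent is sketched below in Python. It uses a hand-trimmed copy of the config above (only the `DNS` and `policies` keys), and the `policy` helper is a made-up name for illustration. Treating 10.96.0.0/12 as the cluster's service CIDR is an assumption on my part, based on the DNS nameserver falling inside it:

```python
import ipaddress
import json

# Hand-trimmed copy of the relevant parts of 10-calico.conf shown above.
CONFIG = json.loads("""
{
  "name": "Calico",
  "DNS": {"Nameservers": ["10.96.0.10"]},
  "policies": [
    {"Name": "EndpointPolicy",
     "Value": {"Type": "OutBoundNAT", "ExceptionList": ["10.96.0.0/12"]}},
    {"Name": "EndpointPolicy",
     "Value": {"Type": "ROUTE", "DestinationPrefix": "10.96.0.0/12",
               "NeedEncap": true}}
  ]
}
""")


def policy(config, policy_type):
    """Return the Value dict of the first policy with the given Type."""
    for entry in config["policies"]:
        if entry["Value"]["Type"] == policy_type:
            return entry["Value"]
    raise KeyError(policy_type)


route_prefix = ipaddress.ip_network(policy(CONFIG, "ROUTE")["DestinationPrefix"])
nat_exceptions = [
    ipaddress.ip_network(cidr)
    for cidr in policy(CONFIG, "OutBoundNAT")["ExceptionList"]
]
dns_servers = [ipaddress.ip_address(a) for a in CONFIG["DNS"]["Nameservers"]]

# Every DNS server should be reachable through the ROUTE policy prefix,
# and that prefix should be exempt from outbound NAT.
dns_in_route = all(addr in route_prefix for addr in dns_servers)
route_excluded_from_nat = route_prefix in nat_exceptions
print(dns_in_route, route_excluded_from_nat)
```

Both checks pass for this config, which matches the observation that the same file works on the Docker-based node.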

@song-jiang
Member

@jnummelin I don't think I would need more information from your cluster. It looks like there is no quick fix for containerd support. Thanks!

@jnummelin
Author

@song-jiang so can you confirm that this is the same containerd issue I linked?

@song-jiang
Member

I would think so.

@jnummelin jnummelin changed the title from "Calico with plain containerd on Windows does not" to "Calico with plain containerd on Windows does not work" Feb 22, 2021
@lmm
Contributor

lmm commented May 20, 2021

@jnummelin We've added containerd support to v3.19.0 🙂 https://docs.projectcalico.org/release-notes/#windows-data-plane-support-for-containerd

Please try it out with k0s again and let us know if you run into any more issues, thanks!

@lmm lmm closed this as completed May 20, 2021
@jnummelin
Author

@lmm that's good news. I'll put it on our to-do list to try it out.
