Updating Windows KEP for GA #729

Merged
merged 10 commits into from
Jan 25, 2019
80 changes: 69 additions & 11 deletions keps/sig-windows/20190103-windows-node-support.md
authors:
- "@astrieanna"
- "@benmoss"
- "@patricklang"
- "@michmike"
owning-sig: sig-windows
participating-sigs:
- sig-architecture
- sig-node
reviewers:
- sig-architecture
- sig-node
- sig-testing
- sig-release
approvers:
- "@bgrant0607"
- "@michmike"
- "@patricklang"
- "@spiffxp"
editor: TBD
creation-date: 2018-11-29
last-updated: 2019-01-25
status: provisional
---


## Motivation

Windows-based workloads still account for a significant portion of the enterprise software space. While containerization technologies emerged first in the UNIX ecosystem, Microsoft has made investments in recent years to enable support for containers in its Windows OS. As users of Windows increasingly turn to containers as the preferred abstraction for running software and modernizing existing applications, the Kubernetes ecosystem stands to benefit by becoming a cross-platform cluster manager.

### Goals

- Enable users to run nodes on Windows servers
- Enable users to schedule Windows Server containers in Kubernetes through the introduction of support for Windows compute nodes
- Document the differences and limitations compared to Linux
- Create a test suite in testgrid to maintain high quality for this feature and prevent regression of functionality

### Non-Goals

- Adding Windows support to all projects in the Kubernetes ecosystem (Cluster Lifecycle, etc.)
- Enable the Kubernetes master components to run on Windows
Member

Is supporting LCOW a non goal?

Contributor Author

for now yes. i will clarify


## Proposal

As of 2018-11-29, much of the work for enabling Windows nodes has already been completed. Both `kubelet` and `kube-proxy` have been adapted to work on Windows Server, so the first goal of this KEP is largely complete.

### What works today
- Windows-based containers can be created by kubelet, [provided the host OS version matches the container base image](https://docs.microsoft.com/en-us/virtualization/windowscontainers/deploy-containers/version-compatibility)
- ConfigMap, Secrets: as environment variables or volumes
- Pod (single or multiple containers per Pod with process isolation), Deployment, ReplicaSet
Member

It's confusing to mention Deployment and ReplicaSet here, and DaemonSet and StatefulSet below. Please discuss all the workload controllers adjacent to one another.

Do Job and CronJob have any issues? If not, please list them with ReplicaSet and Deployment.

- Service types NodePort, ClusterIP, LoadBalancer, and ExternalName
Member

Headless services?

Are there any DNS differences?

- Resource limits
- Pod & container metrics
- Horizontal Pod Autoscaling
Member

Are system OOMs reported?

Are there notable differences in Pod Status fields?

- Windows Server 2019 is the only Windows operating system we will support in the GA timeframe. As noted above, the host operating system version and the container base image version need to match. This is a Windows limitation we cannot overcome.
- Customers can deploy a heterogeneous cluster, with Windows and Linux compute nodes side by side, and schedule Docker containers on both operating systems. Windows Server containers have to be scheduled on Windows nodes and Linux containers on Linux nodes.
- Out-of-tree Pod networking with [Azure-CNI](https://github.com/Azure/azure-container-networking/blob/master/docs/cni.md), [OVN-Kubernetes](https://github.com/openvswitch/ovn-kubernetes), [two CNI meta-plugins](https://github.com/containernetworking/plugins), [Flannel (VXLAN and Host-Gateway)](https://github.com/coreos/flannel)
Contributor

Isn't VXLAN support only in 1903 currently?

Contributor Author

@astrieanna by the time we GA, it will be supported for Server 2019

- Dockershim CRI
- Many<sup id="a1">[1]</sup> of the e2e conformance tests when run with [alternate Windows-based images](https://hub.docker.com/r/e2eteam/) which are being moved to [kubernetes-sigs/windows-testing](https://www.github.com/kubernetes-sigs/windows-testing)
- Persistent storage: FlexVolume with [SMB + iSCSI](https://github.com/Microsoft/K8s-Storage-Plugins/tree/master/flexvolume/windows), and in-tree AzureFile and AzureDisk providers
Member

@ddebroy, Jan 27, 2019

Some questions/notes from the storage perspective:

  1. Is there some in-tree code that specifically allows AzureDisk to work with Windows that is not present for other similar existing in-tree block/disk backed storage plugins like GCE PD/AWS EBS/etc?

  2. If GCE PD/AWS EBS and others are known to work with Windows workers, can they also be added here (along with Azure Disk) please for clarity?

  3. In the context of the CSI Migration initiative (the effort to have in-tree plugins shim out to CSI versions of the in-tree plugins over a couple of releases so that eventually the in-tree plugin code can be removed), lack of support for CSI node plugins for Windows 2019 may have an impact if EBS/GCE-PD in-tree works with Windows workers today but their CSI counterparts will not in the future (until Windows OS enhancements to support CSI node plugins like mount propagation, privileged containers, etc. are in).

  4. While SMB based storage will be available (through the Flexvolume plugin and AzureFile), can the support for NFS based storage be clarified? For example, are there any plans for a NFS Flexvolume plugin for Windows?

Member

For NFS, just came across kubernetes/kubernetes#56188 (comment). So sounds like NFS [#4 above] is beyond scope.

Contributor Author

we will try to get answers to your questions

- Windows Server containers can take advantage of StatefulSet functionality for stateful applications and distributed systems
- Windows Pods can take advantage of DaemonSet, with the exception that privileged containers are not supported on Windows (more on that below)
Member

Above you mentioned "Windows server containers" and here "Windows pods". Is there any difference in meaning between the two?

Contributor Author

no difference. i will update the naming to be consistent.
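As a minimal sketch of the DaemonSet case above, assuming Windows Server 2019 nodes carrying the default `beta.kubernetes.io/os=windows` label, a DaemonSet can be pinned to Windows nodes with a nodeSelector and must not request a privileged security context. The name `example-windows-agent` and the placeholder PowerShell loop are hypothetical; the image tag has to match the host OS version.

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: example-windows-agent   # hypothetical name
spec:
  selector:
    matchLabels:
      app: example-windows-agent
  template:
    metadata:
      labels:
        app: example-windows-agent
    spec:
      # Restrict the DaemonSet to Windows nodes; without this it would also
      # try to start Pods on Linux nodes, where a Windows image cannot run.
      nodeSelector:
        beta.kubernetes.io/os: windows
      containers:
        - name: agent
          # Base image version must match the host OS (Windows Server 2019 / ltsc2019)
          image: mcr.microsoft.com/windows/servercore:ltsc2019
          # Placeholder workload that only keeps the container running
          command: ["powershell.exe", "-Command", "while ($true) { Start-Sleep -Seconds 3600 }"]
          # No securityContext.privileged here: privileged containers are not supported on Windows
```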


### What will work eventually
- Group Managed Service Accounts, a way to assign an Active Directory identity to a Windows container, is forthcoming with KEP `Windows Group Managed Service Accounts for Container Identity`
- `kubectl port-forward` hasn't been implemented due to lack of an `nsenter` equivalent to run a process inside a network namespace.
- CRIs other than Dockershim: CRI-containerd support is forthcoming
- Some kubeadm work was done in the past to add Windows nodes to Kubernetes, but that effort has been dormant since. We will need to revisit that work and complete it in the future.
- Calico CNI for Pod networking
- Hyper-V isolation (Currently this is limited to 1 container per Pod and is an alpha feature)
- It is unclear if the RuntimeClass proposal from sig-node will simplify scheduling Windows containers. We will work with sig-node on this.
Member

If this is still not well understood I don't think it needs to be included here

Contributor Author

folks from sig-architecture will likely ask about this, which is why i included here. indicating we will do more work on this.

Member

sounds good

Contributor

My meta-point here is that Windows stable shouldn't require supporting an alpha or beta feature. We should continue working on a plan for this alongside SIG-Node. I think this is ok as-is

Contributor

I think it might be clearer if the section was renamed to "Windows Node Roadmap" to make it explicit that the eventually is beyond the scope of GA

Member

I asked about RuntimeClass back in November. :-)

@craiglpeters has a good point. I assume "eventually" is post-GA for all of these?


### What will never work (without underlying OS changes)
- Certain Pod functionality
- Privileged containers and other Pod security context privilege and access control settings
Member

I asked a bunch of other questions on the original KEP PR:
#676 (comment)
#676 (comment)
#676 (comment)
#676 (comment)
#676 (comment)

Contributor Author

@bgrant0607 , which linux capabilities specifically do you mean? these ones? https://kubernetes.io/docs/tasks/configure-pod-container/security-context/

- Reservations are not enforced by the OS, but overprovisioning could be blocked with `--enforce-node-allocatable=pods` (pending: tests needed)
- Certain volume mappings
- Single file & subpath volume mounting
- readOnly root filesystem. Mapped volumes still support readOnly
- Termination Message - these require single file mappings
- CSI plugins, which require privileged containers
- Host networking is not available in Windows
- [Some parts of the V1 API](https://github.com/kubernetes/kubernetes/issues/70604)
- Overlay networking support in Windows Server 1803 is not fully functional using the `win-overlay` CNI plugin. Specifically, service IPs do not work on Windows nodes. This is currently specific to `win-overlay`; other CNI plugins (OVS, AzureCNI) work. Since Windows Server 1803 is not supported for GA, this is mostly not applicable. We left it here since it impacts beta.
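A sketch of how the `--enforce-node-allocatable=pods` setting from the reservations item above might be expressed in a kubelet configuration file, assuming the node's kubelet is started with a `kubelet.config.k8s.io/v1beta1` KubeletConfiguration. The reservation sizes are hypothetical placeholders, and, as stated above, this behavior on Windows is still pending tests.

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Config-file equivalent of --enforce-node-allocatable=pods
enforceNodeAllocatable:
  - pods
# Hypothetical reservations; size them for the actual Windows node
systemReserved:
  cpu: 500m
  memory: 1Gi
kubeReserved:
  cpu: 500m
  memory: 1Gi
```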

### Relevant resources/conversations


**Second class support**: Kubernetes contributors are likely to be thinking of Linux-based solutions to problems, as Linux remains the primary OS supported. Keeping Windows support working will be an ongoing burden potentially limiting the pace of development.

**User experience**: Users today will need to use some combination of taints and node selectors in order to keep Linux and Windows workloads separated. In the best case this imposes a burden only on Windows users, but this is still less than ideal. The recommended approach is outlined below.

#### Ensuring OS-specific workloads land on appropriate container host
As described below, we plan to document how Windows containers can be scheduled on the appropriate host using nodeSelectors, Taints, and Tolerations. All nodes today have the following default labels:
- beta.kubernetes.io/os = [windows|linux]
Member

It's worth noting the promotion of these to stable:
kubernetes/kubernetes#72929

- beta.kubernetes.io/arch = [amd64|arm64|...]

If a deployment does not specify a nodeSelector like `"beta.kubernetes.io/os": windows`, it is possible the Pods can be scheduled on any host, Windows or Linux. This can be problematic since a Windows container can only run on Windows and a Linux container can only run on Linux. The best practice we will recommend is to use a nodeSelector.
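The nodeSelector best practice above might look like this in a complete Deployment, assuming Windows Server 2019 nodes carrying the default `beta.kubernetes.io/os=windows` label. The Deployment name and the IIS image are illustrative only; whichever image is used, its base version has to match the host OS version.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-windows-app   # hypothetical name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: example-windows-app
  template:
    metadata:
      labels:
        app: example-windows-app
    spec:
      # Only nodes labeled as Windows are eligible for these Pods
      nodeSelector:
        beta.kubernetes.io/os: windows
      containers:
        - name: web
          # Illustrative ltsc2019 image to match Windows Server 2019 hosts
          image: mcr.microsoft.com/windows/servercore/iis:windowsservercore-ltsc2019
          ports:
            - containerPort: 80
```

Linux workloads in the same heterogeneous cluster would set `beta.kubernetes.io/os: linux` in the same position.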

However, we understand that in certain cases customers have a large number of pre-existing deployments for Linux containers. Since they will not want to change all deployments to add nodeSelectors, the alternative is to use Taints. Because the kubelet can set Taints during registration, it could easily be modified to automatically add a taint when running on Windows only (`--register-with-taints='os=Win1809:NoSchedule'`). By adding a taint to all Windows nodes, nothing will be scheduled on them (that includes existing Linux Pods). In order for a Windows Pod to be scheduled on a Windows node, it would need both the nodeSelector to choose Windows and a toleration.
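For illustration, a Windows node registered with the `--register-with-taints='os=Win1809:NoSchedule'` kubelet flag above would be expected to carry a taint like the following in its Node object. This is an excerpt only, and the node name is hypothetical. The flag's `key=value:effect` form maps onto the `key`, `value`, and `effect` fields of `spec.taints`.

```yaml
# Excerpt of a Node object (hypothetical name), not a full manifest
apiVersion: v1
kind: Node
metadata:
  name: example-windows-node
  labels:
    beta.kubernetes.io/os: windows
spec:
  taints:
    # Repels every Pod that lacks a matching toleration, Linux Pods included
    - key: os
      value: Win1809
      effect: NoSchedule
```

An already-registered node can be given the same taint with `kubectl taint nodes <node-name> os=Win1809:NoSchedule`.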
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not just deployments, but also ecosystem off-the-shelf configurations, such as community Helm charts, and programmatic pod generation cases, such as with Operators. I think taints are going to be needed in most cases.

```yaml
nodeSelector:
  "beta.kubernetes.io/os": windows
tolerations:
  - key: "os"
    operator: "Equal"
    value: "Win1809"
    effect: "NoSchedule"
```
Contributor

Because Windows containers are specific to the os version, does it make sense to have the taint/toleration include the windows version? While only 2019 is supported at GA, eventually there will be more versions of windows support (as new Windows versions are released). A version-specific taint could help containers land on the right nodes.

Contributor Author

We were going to add that in the docs, but i made the change here as well for additional clarity


## Graduation Criteria
- All features and functionality under `What works today` are fully tested and vetted to be working by SIG-Windows
Member

Is this section complete, or is @craiglpeters still working on it?

My previous comment:
#676 (comment)

Contributor Author

i made some more edits that i will be pushing through

- SIG-Windows has high confidence in the stability and reliability of Windows Server containers on Kubernetes
- 100% green/passing conformance tests that are applicable to Windows (see the Testing Plan section for details on these tests)
- Comprehensive documentation, residing at https://kubernetes.io/docs, that includes but is not limited to the following sections:
1. Outline of Windows Server containers on Kubernetes
2. Getting Started Guide, including Prerequisites
3. How to deploy Windows nodes in Kubernetes
4. Overview of Networking on Windows
5. Links to documentation on how to deploy and use CNI plugins for Windows (example for OVN - https://github.com/openvswitch/ovn-kubernetes/tree/master/contrib)
6. Links to documentation on how to deploy Windows nodes for public cloud providers or other Kubernetes distributions (example for Rancher - https://rancher.com/docs//rancher/v2.x/en/cluster-provisioning/rke-clusters/windows-clusters/)
7. How to schedule Windows Server containers, including examples
8. Advanced: How to use metrics and the Horizontal Pod Autoscaler
9. Advanced: How to use Group Managed Service Accounts
10. Advanced: How to use Taints and Tolerations for a heterogeneous compute cluster (Windows + Linux)
11. Advanced: How to use Hyper-V isolation (not a stable feature yet)
12. Advanced: How to build Kubernetes for Windows from source
13. Supported functionality (with examples where appropriate)
14. Known Limitations
Member

Are there any node addons work, such as node problem detector?

15. Unsupported functionality
16. Resources for contributing and getting help - Includes troubleshooting help and links to additional troubleshooting guides like https://docs.microsoft.com/en-us/virtualization/windowscontainers/kubernetes/common-problems

## Implementation History
- Alpha was released with Kubernetes v1.5
- Beta was released with Kubernetes v1.9

## Testing Plan
