From 5839bd349bc34ee73a2bec9f079abe08042f8847 Mon Sep 17 00:00:00 2001
From: Aldo Culquicondor <1299064+alculquicondor@users.noreply.github.com>
Date: Wed, 24 Aug 2022 17:15:50 -0400
Subject: [PATCH] Enhance and update documentation (#351)

Change-Id: I25de039eea653e95a712f9d8450f14e77f16452f
---
 CHANGELOG/CHANGELOG-0.2.md              |  21 ++++-
 README.md                               |  34 +++++--
 docs/concepts/README.md                 |  11 +--
 docs/concepts/cluster_queue.md          | 113 +++++++++++++++++++-----
 docs/concepts/local_queue.md            |  17 ++++
 docs/concepts/queue.md                  |   9 --
 docs/concepts/workload.md               |  13 ++-
 docs/setup/install.md                   |  14 +++
 docs/tasks/administer_cluster_quotas.md |   8 +-
 docs/tasks/run_jobs.md                  |   2 +
 10 files changed, 188 insertions(+), 54 deletions(-)
 create mode 100644 docs/concepts/local_queue.md
 delete mode 100644 docs/concepts/queue.md

diff --git a/CHANGELOG/CHANGELOG-0.2.md b/CHANGELOG/CHANGELOG-0.2.md
index 5634b0d8e6..ad7ad2b1f3 100644
--- a/CHANGELOG/CHANGELOG-0.2.md
+++ b/CHANGELOG/CHANGELOG-0.2.md
@@ -2,8 +2,25 @@
 
 Changes since `v0.1.0`:
 
-- Fixed bug in a BestEffortFIFO ClusterQueue where a workload might not be
-  retried after a transient error.
+### Features
 - Bumped the API version from v1alpha1 to v1alpha2. v1alpha1 is no longer supported and Queue is now named LocalQueue.
+- Add webhooks to validate and add defaults to all Kueue APIs.
+- Support [codependent resources](/docs/concepts/cluster_queue.md#codepedent-resources)
+  by assigning the same flavor to codependent resources in a pod set.
+- Support [pod overhead](https://kubernetes.io/docs/concepts/scheduling-eviction/pod-overhead/)
+  in Workload pod sets.
+- Default requests to limits if requests are not set in a Workload pod set, to
+  match internal defaulting for k8s Pods.
 - Added [prometheus metrics](/docs/reference/metrics.md) to monitor health of
   the system and the status of ClusterQueues.
+
+### Bug fixes
+
+- Prevent Workloads that don't match the ClusterQueue's namespaceSelector from
+  blocking other Workloads in a StrictFIFO ClusterQueue.
+- Fixed number of pending workloads in a BestEffortFIFO ClusterQueue.
+- Fixed bug in a BestEffortFIFO ClusterQueue where a workload might not be
+  retried after a transient error.
+- Fixed requeuing of an out-of-date workload when Kueue failed to admit it.
+- Fixed bug in a BestEffortFIFO ClusterQueue where unadmissible workloads
+  were not removed from the ClusterQueue when removing the corresponding Queue.
diff --git a/README.md b/README.md
index 51f966e520..db60a047df 100644
--- a/README.md
+++ b/README.md
@@ -5,6 +5,20 @@ Kueue is a set of APIs and controller for [job](docs/concepts/workload.md)
 a job should be [admitted](docs/concepts#admission) to start (as in pods can be
 created) and when it should stop (as in active pods should be deleted).
 
+## Why use Kueue
+
+Kueue is a lean controller that you can install on top of a vanilla Kubernetes
+cluster without replacing any components. It is compatible with cloud
+environments where:
+- Nodes and other compute resources can be scaled up and down.
+- Compute resources are heterogeneous (in architecture, availability, price, etc.).
+
+Kueue APIs allow you to express:
+- Quotas and policies for fair sharing among tenants.
+- Resource fungibility: if a [resource flavor](docs/concepts/cluster_queue.md#resourceflavor-object)
+  is fully utilized, run the [job](docs/concepts/workload.md) using a different
+  flavor.
+
 The main design principle for Kueue is to avoid duplicating mature functionality
 in [Kubernetes components](https://kubernetes.io/docs/concepts/overview/components/)
 and well-established third-party controllers. Autoscaling, pod-to-node scheduling and
@@ -12,14 +26,6 @@ job lifecycle management are the responsibility of cluster-autoscaler,
 kube-scheduler and kube-controller-manager, respectively. Advanced
 admission control can be delegated to controllers such as [gatekeeper](https://github.com/open-policy-agent/gatekeeper).
 
-<!-- TODO(#64) Remove links to google docs once the contents have been migrated to this repo -->
-Learn more by reading the design docs:
-- [bit.ly/kueue-apis](https://bit.ly/kueue-apis) (please join the [mailing list](https://groups.google.com/a/kubernetes.io/g/wg-batch)
-to get access) discusses the API proposal and a high-level description of how it
-operates.
-- [bit.ly/kueue-controller-design](https://bit.ly/kueue-controller-design)
-presents the detailed design of the controller.
-
 ## Installation
 
 **Requires Kubernetes 1.22 or newer**.
@@ -52,6 +58,18 @@ Learn more about:
 - Kueue [concepts](docs/concepts).
 - Common and advanced [tasks](docs/tasks).
 
+## Architecture
+
+<!-- TODO(#64) Remove links to google docs once the contents have been migrated to this repo -->
+
+Learn more about the architecture of Kueue in the design docs:
+
+- [bit.ly/kueue-apis](https://bit.ly/kueue-apis) (please join the [mailing list](https://groups.google.com/a/kubernetes.io/g/wg-batch)
+to get access) discusses the API proposal and a high-level description of how it
+operates.
+- [bit.ly/kueue-controller-design](https://bit.ly/kueue-controller-design)
+presents the detailed design of the controller.
+
 ## Community, discussion, contribution, and support
 
 Learn how to engage with the Kubernetes community on the [community page](http://kubernetes.io/community/).
diff --git a/docs/concepts/README.md b/docs/concepts/README.md
index 5526ce770e..0a2c9bcbff 100644
--- a/docs/concepts/README.md
+++ b/docs/concepts/README.md
@@ -10,7 +10,7 @@ abstractions that Kueue uses to represent your cluster and workloads.
 A cluster-scoped resource that governs a pool of resources, defining usage
 limits and fair sharing rules.
 
-### [Queue](queue.md)
+### [Local Queue](local_queue.md)
 
 A namespaced resource that groups closely related workloads belonging to a
 single tenant.
@@ -30,11 +30,12 @@ models, etc.
 
 ### Admission
 
-The process of admitting a workload to start (pods to be created). A workload
+The process of admitting a Workload to start (pods to be created). A Workload
 is admitted by a ClusterQueue according to the available resources and gets
-resource flavors assigned for each requested resource. Sometimes referred to
-as _workload scheduling_ or _job scheduling_ (not to be confused with
-[pod scheduling](https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/)).
+resource flavors assigned for each requested resource.
+
+Sometimes referred to as _workload scheduling_ or _job scheduling_
+(not to be confused with [pod scheduling](https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/)).
 
 ### [Cohort](cluster_queue.md#cohort)
 
diff --git a/docs/concepts/cluster_queue.md b/docs/concepts/cluster_queue.md
index edd0b22c34..3c612f9cee 100644
--- a/docs/concepts/cluster_queue.md
+++ b/docs/concepts/cluster_queue.md
@@ -1,8 +1,9 @@
 # Cluster Queue
 
-A `ClusterQueue` is a cluster-scoped object that governs a pool of resources
+A ClusterQueue is a cluster-scoped object that governs a pool of resources
 such as CPU, memory and hardware accelerators. A `ClusterQueue` defines:
-- The resource _flavors_ that it manages, with usage limits and order of consumption.
+- The [resource _flavors_](#resourceflavor-object) that it manages, with usage
+  limits and order of consumption.
 - Fair sharing rules across the tenants of the cluster.
 
 Only [cluster administrators](/docs/tasks#batch-administrator) should create `ClusterQueue` objects.
@@ -35,6 +36,74 @@ This ClusterQueue admits [workloads](workload.md) if and only if:
 
 You can specify the quota as a [quantity](https://kubernetes.io/docs/reference/kubernetes-api/common-definitions/quantity/).
 
+## Resources
+
+In a ClusterQueue, you can define quotas for multiple [compute resources](https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#resource-types)
+(cpu, memory, GPUs, etc.).
+
+For each resource, you can define quotas for multiple _flavors_. A
+flavor represents different variations of a resource. The variations can be
+defined in a [ResourceFlavor object](#resourceflavor-object).
+
+In a process called [admission](.#admission), Kueue assigns each
+[Workload pod set](workload.md#pod-sets) a flavor for each resource it requests.
+Kueue assigns the first flavor in the ClusterQueue's `.spec.resources[*].flavors`
+list that has enough unused `min` quota in the ClusterQueue or the
+ClusterQueue's [cohort](#cohort).
+
+### Codepedent resources
+
+It is possible that multiple resources are tied to the same flavors. This is
+typical for `cpu` and `memory`, where the flavors are generally tied to a
+machine family or availability guarantees.
+
+If this is the case, the resources in the ClusterQueue must list the same
+flavors in the same order. When two or more resources match their flavors,
+they are said to be codependent. During admission, for each pod set in a
+Workload, Kueue assigns the same flavor to the codependent resources that the
+pod set requests.
+
+An example of a ClusterQueue with codependent resources looks like the following:
+
+```yaml
+apiVersion: kueue.x-k8s.io/v1alpha2
+kind: ClusterQueue
+metadata:
+  name: cluster-total
+spec:
+  namespaceSelector: {}
+  resources:
+  - name: "cpu"
+    flavors:
+    - name: spot
+      quota:
+        min: 18
+    - name: on-demand
+      quota:
+        min: 9
+  - name: "memory"
+    flavors:
+    - name: spot
+      quota:
+        min: 72Gi
+    - name: on-demand
+      quota:
+        min: 36Gi
+  - name: "gpu"
+    flavors:
+    - name: vendor1
+      quota:
+        min: 10
+    - name: vendor2
+      quota:
+        min: 10
+```
+
+In the example above, `cpu` and `memory` are codependent resources, while `gpu`
+is independent.
+
+If two resources are not codependent, they must not have any flavors in common.
+
 ## Namespace selector
 
 You can limit which namespaces can have workloads admitted in the ClusterQueue
@@ -81,7 +150,7 @@ Resources in a cluster are typically not homogeneous. Resources could differ in:
 - architecture (ex: x86 vs ARM CPUs)
 - brands and models (ex: Radeon 7000 vs Nvidia A100 vs T4 GPUs)
 
-A `ResourceFlavor` is an object that represents these variations and allows you
+A ResourceFlavor is an object that represents these variations and allows you
 to associate them with node labels and taints.
 
 **Note**: If your cluster is homogeneous, you can use an [empty ResourceFlavor](#empty-resourceflavor)
@@ -102,13 +171,8 @@ taints:
   value: "true"
 ```
 
-You can use the `.metadata.name` to reference a flavor from a ClusterQueue in
-the `.spec.resources[*].flavors[*].name` field.
-
-For each resource of each [pod set](workload.md#pod-sets) in a Workload, Kueue
-assigns the first flavor in the `.spec.resources[*].flavors`
-list that has enough unused quota in the ClusterQueue or the ClusterQueue's
-[cohort](#cohort).
+You can use the `.metadata.name` to reference a ResourceFlavor from a
+ClusterQueue in the `.spec.resources[*].flavors[*].name` field.
 
 ### ResourceFlavor labels
 
@@ -132,9 +196,9 @@ steps:
    didn't specify them already.
 
    For example, for a [batch/v1.Job](https://kubernetes.io/docs/concepts/workloads/controllers/job/),
-   Kueue adds the labels to `.spec.template.spec.nodeSelector`. This guarantees
-   that the workload Pods run on the nodes associated to the flavor that Kueue
-   decided that the workload should use.
+   Kueue adds the labels to the `.spec.template.spec.nodeSelector` field. This
+   guarantees that the workload Pods run on the nodes associated with the flavor
+   that Kueue selected for the workload.
 
 ### ResourceFlavor taints
 
@@ -143,8 +207,9 @@ with taints.
 
 Taints on the ResourceFlavor work similarly to [node taints](https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/).
 For Kueue to admit a workload to use the ResourceFlavor, the PodSpecs in the
-workload should have a toleration for it. As opposed to ResourceFlavor labels,
-Kueue will not add tolerations for the flavor taints.
+workload should have a toleration for it. As opposed to the behavior for
+[ResourceFlavor labels](#resourceflavor-labels), Kueue will not add tolerations
+for the flavor taints.
 
 ### Empty ResourceFlavor
 
@@ -173,18 +238,18 @@ ClusterQueue.
 
 ### Flavors and borrowing semantics
 
-When borrowing, Kueue satisfies the following semantics:
+When borrowing, Kueue satisfies the following admission semantics:
 
-- When assigning flavors, Kueue goes through the list of flavors in
-  `.spec.resources[*].flavors`. For each flavor, Kueue attempts to
-  fit the workload using the min quota of the ClusterQueue or the unused
-  min quota of other ClusterQueues in the cohort, up to the max quota of the
-  ClusterQueue. If the workload doesn't fit, Kueue proceeds evaluating the next
+- When assigning flavors, Kueue goes through the list of flavors in the
+  ClusterQueue's `.spec.resources[*].flavors`. For each flavor, Kueue attempts
+  to fit a Workload's pod set using the `min` quota of the ClusterQueue or the
+  unused `min` quota of other ClusterQueues in the cohort, up to the `max` quota
+  of the ClusterQueue. If the workload doesn't fit, Kueue proceeds to evaluate the next
   flavor in the list.
-- Borrowing happens per-flavor. A ClusterQueue can only borrow quota of flavors
-  it defines.
+- A ClusterQueue can only borrow quota of flavors it defines and it can only
+  borrow quota for one flavor.
 
-### Example
+### Borrowing example
 
 Assume you created the following two ClusterQueues:
 
diff --git a/docs/concepts/local_queue.md b/docs/concepts/local_queue.md
new file mode 100644
index 0000000000..de984334b3
--- /dev/null
+++ b/docs/concepts/local_queue.md
@@ -0,0 +1,17 @@
+# Local Queue
+
+A `LocalQueue` is a namespaced object that groups closely related workloads
+belonging to a single tenant. A `LocalQueue` points to one [`ClusterQueue`](cluster_queue.md)
+from which resources are allocated to run its workloads.
+
+Users submit jobs to a `LocalQueue`, instead of directly to a `ClusterQueue`.
+Tenants can discover which queues they can submit jobs to by listing the
+local queues in their namespace. The command looks similar to the following:
+
+```sh
+kubectl get -n my-namespace localqueues
+# Alternatively, use the alias `queue` or `queues`
+kubectl get -n my-namespace queues
+```
+
+`queue` and `queues` are aliases for `localqueue`.
diff --git a/docs/concepts/queue.md b/docs/concepts/queue.md
deleted file mode 100644
index d71ad26cfd..0000000000
--- a/docs/concepts/queue.md
+++ /dev/null
@@ -1,9 +0,0 @@
-# Queue
-
-A `Queue` is a namespaced object that groups closely related workloads
-belonging to a single tenant. A `Queue` points to one [`ClusterQueue`](cluster_queue.md)
-from which resources are allocated to run its workloads.
-
-Users submit jobs to a `Queue`, instead of directly to a `ClusterQueue`. This
-allows tenants to discover which queues they can submit jobs to by listing the
-queues in their namespace.
diff --git a/docs/concepts/workload.md b/docs/concepts/workload.md
index 08449d02dc..0b4168831e 100644
--- a/docs/concepts/workload.md
+++ b/docs/concepts/workload.md
@@ -23,6 +23,7 @@ metadata:
   name: sample-job
   namespace: default
 spec:
+  queueName: user-queue
   podSets:
   - count: 3
     name: main
@@ -36,9 +37,13 @@ spec:
             cpu: "1"
             memory: 200Mi
       restartPolicy: Never
-  queueName: user-queue
 ```
 
+## Queue name
+
+To indicate in which [LocalQueue](local_queue.md) you want your Workload to be
+enqueued, set the name of the LocalQueue in the `.spec.queueName` field.
+
 ## Pod sets
 
 A Workload might be composed of multiple Pods with different pod specs.
@@ -63,4 +68,8 @@ of the Job's pod template.
 
 As described previously, Kueue has built-in support for workloads created with
 the Job API. But any custom workload API can integrate with Kueue by
-creating a corresponding Workload object for it.
\ No newline at end of file
+creating a corresponding Workload object for it.
+
+## What's next
+
+- Learn how to [run jobs](/docs/tasks/run_jobs.md).
\ No newline at end of file
diff --git a/docs/setup/install.md b/docs/setup/install.md
index 4577d9d195..75c48de8ac 100644
--- a/docs/setup/install.md
+++ b/docs/setup/install.md
@@ -39,6 +39,20 @@ to scrape metrics from kueue components, run the following command:
 kubectl apply -f https://github.com/kubernetes-sigs/kueue/releases/download/$VERSION/prometheus.yaml
 ```
 
+### Uninstall
+
+To uninstall a released version of Kueue from your cluster, run the following command:
+
+```shell
+VERSION=v0.1.1
+kubectl delete -f https://github.com/kubernetes-sigs/kueue/releases/download/$VERSION/manifests.yaml
+```
+
+### Upgrading from 0.1 to 0.2
+
+Upgrading from `0.1.x` to `0.2.y` is not supported due to breaking API changes.
+To install Kueue `0.2.y`, [uninstall](#uninstall) the older version first.
+
 ## Install a custom-configured released version
 
 To install a custom-configured released version of Kueue in your cluster, execute the following steps:
diff --git a/docs/tasks/administer_cluster_quotas.md b/docs/tasks/administer_cluster_quotas.md
index afdfef433c..43a8225b20 100644
--- a/docs/tasks/administer_cluster_quotas.md
+++ b/docs/tasks/administer_cluster_quotas.md
@@ -93,7 +93,7 @@ kubectl apply -f default-flavor.yaml
 The `.metadata.name` matches the `.spec.resources[*].flavors[0].resourceFlavor`
 field in the ClusterQueue.
 
-### 3. Create [Queues](/docs/concepts/queue.md)
+### 3. Create [LocalQueues](/docs/concepts/local_queue.md)
 
 Users cannot directly send [workloads](/docs/concepts/workload.md) to
 ClusterQueues. Instead, users need to send their workloads to a Queue in their
@@ -101,12 +101,12 @@ namespace.
 Thus, for the queuing system to be complete, you need to create a Queue in
 each namespace that needs access to the ClusterQueue.
 
-Write the manifest for the Queue. It should look similar to the following:
+Write the manifest for the LocalQueue. It should look similar to the following:
 
 ```yaml
 # default-user-queue.yaml
 apiVersion: kueue.x-k8s.io/v1alpha1
-kind: Queue
+kind: LocalQueue
 metadata:
   namespace: default
   name: user-queue
@@ -114,7 +114,7 @@ spec:
   clusterQueue: cluster-total
 ```
 
-To create the Queue, run the following command:
+To create the LocalQueue, run the following command:
 
 ```shell
 kubectl apply -f default-user-queue.yaml
diff --git a/docs/tasks/run_jobs.md b/docs/tasks/run_jobs.md
index e846e3c822..b8153b833d 100644
--- a/docs/tasks/run_jobs.md
+++ b/docs/tasks/run_jobs.md
@@ -18,6 +18,8 @@ Make sure the following conditions are met:
 Run the following command to list the Queues available in your namespace.
 
 ```shell
+kubectl -n default get localqueues
+# Or use the 'queues' alias.
 kubectl -n default get queues
 ```