kubectl scale implementation for core workload controllers #49504
This should allow …
Currently, we have ScaleInterface being implemented by each group's client. So, extending it to arbitrary types would need a new generic …
+1 for generic Scale interface.
There have been longer discussions of this in the past, but there are two
factors at play:
* no matter what, the schema version of Scale as exposed by the sub
resource needs to be locked in time and follow the versioning rules of its
group (no breaking changes)
* we believe that the content negotiation mechanism that was part of 1.7 is
the appropriate way to allow generic clients to request a known Scale type
( Proposal: Alternate API representations for resource
<kubernetes/community#123>)
What we haven't done is settle on where generic types like Scale should go
- a "server" group (is there only one Scale interface in all of Kube?), or
a net new group per use case. The former is easier, but could lead to
bloat in the core Kube API and suffers from the rate limit imposed on the
fundamental API. The latter is more complicated.
I lean slightly towards adding "Scale" to another group (like "autoscaling"
or a net new empty group)
cc: @DirectXMan12
I think scale should live in autoscaling. I look at Scale like an interface, and groups like packages in this case -- it makes sense to me for the "interfaces" required for autoscaling to live in the autoscaling "package" with the "concrete types" (HPA).
For future reference, there's already a copy of scale there ;-).
Doesn't sound right. Autoscaling is only one way of scaling an object.
Sure, but the scale subresource is mainly used for autoscaling -- that's mainly why it exists: so "automated" processes can scale different resources without having specific knowledge of them.
What must be done here? The proposal for subresources for CustomResources (kubernetes/community#913) is blocked on where a unified Scale type will live.
@DirectXMan12 any update on polymorphic scale?
I think there is still open discussion on placement. I would propose that we should have one representation of scale for serving and stateful workloads. This should live in the apps/v1 group version and should have the existing signature.

```go
// ScaleSpec describes the attributes of a scale subresource.
type ScaleSpec struct {
	// desired number of instances for the scaled object.
	// +optional
	Replicas int32 `json:"replicas,omitempty" protobuf:"varint,1,opt,name=replicas"`
}

// ScaleStatus represents the current status of a scale subresource.
type ScaleStatus struct {
	// actual number of observed instances of the scaled object.
	Replicas int32 `json:"replicas" protobuf:"varint,1,opt,name=replicas"`

	// label query over pods that should match the replicas count. More info: http://kubernetes.io/docs/user-guide/labels#label-selectors
	// +optional
	Selector map[string]string `json:"selector,omitempty" protobuf:"bytes,2,rep,name=selector"`

	// label selector for pods that should match the replicas count. This is a serialized
	// version of both map-based and more expressive set-based selectors. This is done to
	// avoid introspection in the clients. The string will be in the same format as the
	// query-param syntax. If the target type only supports map-based selectors, both this
	// field and the map-based selector field are populated.
	// More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/#label-selectors
	// +optional
	TargetSelector string `json:"targetSelector,omitempty" protobuf:"bytes,3,opt,name=targetSelector"`
}

// Scale represents a scaling request for a resource.
type Scale struct {
	metav1.TypeMeta `json:",inline"`
	// Standard object metadata; More info: https://git.k8s.io/community/contributors/devel/api-conventions.md#metadata.
	// +optional
	metav1.ObjectMeta `json:"metadata,omitempty" protobuf:"bytes,1,opt,name=metadata"`
	// defines the behavior of the scale. More info: https://git.k8s.io/community/contributors/devel/api-conventions.md#spec-and-status.
	// +optional
	Spec ScaleSpec `json:"spec,omitempty" protobuf:"bytes,2,opt,name=spec"`
	// current status of the scale. More info: https://git.k8s.io/community/contributors/devel/api-conventions.md#spec-and-status. Read-only.
	// +optional
	Status ScaleStatus `json:"status,omitempty" protobuf:"bytes,3,opt,name=status"`
}
```

We moved everything that is a workload (DaemonSet, Deployment, ReplicaSet, and StatefulSet) to apps, and for workloads that have a notion of scale (Deployment, ReplicaSet, and StatefulSet), they all have an implemented scale resource. I would argue that anything that is a scalable serving or stateful workload should use the same interface.

There has also been a proposal for scalable batch workloads. Scaling a Job has a different meaning than scaling a serving or stateful workload. When you scale a serving or stateful workload, you are definitively requesting or releasing capacity (mem, compute, or storage), in general in response to QPS demands. When you scale a batch workload (as represented by Job), you are actually asking for potentially more capacity, to increase parallelism, and to thereby decrease the time to completion. Depending on the number of already completed Tasks, scaling a Job may not actually require more capacity. There is a proposed API for this type of scaling, and it looks something like this.

```go
// ScaleSpec describes the attributes of a scale subresource for batch workloads.
type ScaleSpec struct {
	// Specifies the maximum desired number of pods the job should
	// run at any given time. The actual number of pods running in steady state will
	// be less than this number when ((.spec.completions - .status.successful) < .spec.parallelism),
	// i.e. when the work left to do is less than max parallelism.
	// +optional
	Parallelism int32
}

// ScaleStatus represents the current status of a scale subresource.
type ScaleStatus struct {
	// Specifies the desired number of successfully finished pods the
	// job should be run with. Setting to nil means that the success of any
	// pod signals the success of all pods, and allows parallelism to have any positive
	// value. Setting to 1 means that parallelism is limited to 1 and the success of that
	// pod signals the success of the job.
	// +optional
	Completions int32

	// The number of actively running pods.
	// +optional
	Active int32

	// The current max parallelism of the batch resource.
	Parallelism int32

	// label query over pods that should match the completions + active count.
	// More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/#label-selectors
	// +optional
	Selector *metav1.LabelSelector
}

// Scale represents a scaling request for a resource.
type Scale struct {
	metav1.TypeMeta
	// Standard object metadata; More info: http://releases.k8s.io/HEAD/docs/devel/api-conventions.md#metadata.
	// +optional
	metav1.ObjectMeta
	// defines the behavior of the scale. More info: http://releases.k8s.io/HEAD/docs/devel/api-conventions.md#spec-and-status.
	// +optional
	Spec ScaleSpec
	// current status of the scale. More info: http://releases.k8s.io/HEAD/docs/devel/api-conventions.md#spec-and-status. Read-only.
	// +optional
	Status ScaleStatus
}
```

As the status of the scale of a batch Job considers the number of complete Jobs, the number of active Jobs, and the max parallelism, and as the spec is really changing the parallelism rather than requesting a fixed capacity increase or decrease, in my opinion it warrants having a different signature for the Scale sub resource. I'd argue this sub resource should live in batch/v1.
Note that not all workload or batch APIs will live in batch or apps, so it's not enough to define a sub resource there. If we create multiple Scale resource subtypes, then Scale is not generic anymore. And I don't think that we have clearly articulated why that is superior.
If we go with a generic implementation over specialized implementations (within reason), aren't we just aggregating the bloat into a single location and causing confusion?
Perhaps apps and batch might not be the right home, but if we are moving toward an architecture where the nucleus is the core and most workloads are in the applications layer, apps might be a good place. I think, as it stands now, we could have a general implementation for any controller that launches Pods with a required restart policy of Always, and another for any controller that launches Pods that will eventually complete as successful or failed. I'm not sure I see a benefit to attempting to unify them when the semantics are so different.
This is a dupe of #29698 |
Do we feel like the scale subresource has worked well for autoscaling, at least? As with conditions (#7856), I am starting to feel that subresources are overly complicated and cumbersome just for polymorphism. I am now in favor of duck typing, which has been extremely successful for metadata. We could scale anything with a .status.replicas field. I'm in favor of creating a convention for selectors in status, also: #3676. I also want to do this for client-side resource manipulation, anyway.

Duck-typed .status.replicas doesn't work for Job, but that's acceptable IMO. Job scaling requires policies that take into account at least the amount of queued work and the deadline. Why do we need a Scale flavor for Job? Do we expect to have more than one Job-like API?

With respect to the Scale subresource type: @DirectXMan12 accurately explained why Scale was put into the autoscaling group. Is there a concrete mechanical problem with that? Why can't every controller that implements Scale use its own copy? When encoded, all copies will be structurally indistinguishable. Go typing is not helping us here, but I don't understand why we care. The type is very small.
@bgrant0607 If we don't do a generic scale sub resource, we will need to find another mechanism to integrate pod disruption budgets and custom resource definitions. Currently PDBs will work based on the replicas of known controller objects only.
@kow3ns By "generic", you just mean structurally equivalent?
The duck typing philosophy sounds interesting to me, although it puts more pressure on fixing some related issues. One of the reasons we moved to /scale for some controllers was because it helped paper over version skew problems with get/roundtrip/update on the main resource. We need to fix that anyway though.
Yes, we need to fix patch/merge anyway. We need to figure out how to deal with protobuf tags. Honestly, proto's tagging scheme is not a great fit for us given that we have API versioning and discovery via OpenAPI. Maybe we can paper over it using OpenAPI. I also wonder if we could autogenerate the tags by hashing the json paths. We could also use extra OpenAPI / JSON schema attributes for APIs that deviated from the convention -- so convention over configuration, but with configuration as a backup.
Don't the existing workload APIs support the version of scale from the autoscaling API group? We have a way to independently version the Scale API, so I'm fine with using the one from the autoscaling group until there is an obviously better answer.
Currently apps uses the extensions representation internally, with each group version having a separate, versioned sub resource.
- replicationcontrollers/scale uses the autoscaling Scale
- all the rest (in the extensions/v1beta1, apps/v1beta1, and apps/v1beta2 API groups) advertise extensions/v1beta1 Scale as their scale subresource type
- however, they return content that does not match the version advertised in discovery
@liggitt is correct. While each group version defines a scale sub-resource, the REST implementation wires in the extensions scale sub-resource.
more context here
Well, extensions is just wrong, for any new API.
Agree.
Eeeek. That needs to be removed before going stable. We shouldn't have a version of Scale in every external API version. It kind of defeats the point.
SGTM
The other reason for subresources was so that autoscalers couldn't mutate other fields on the object.

Re: proto tags, we have an IDL; we could in theory identify some tags, but in general the structure of JSON and proto diverges on a few types.
allow */subresource in rbac policy rules

Automatic merge from submit-queue.

xref #29698 #38756 #49504 #38810

Allow `*/subresource` format in RBAC policy rules to support polymorphic subresources like `*/scale` for HPA. @DirectXMan12 fyi

```release-note
RBAC PolicyRules now allow resource=`*/<subresource>` to cover `any-resource/<subresource>`. For example, `*/scale` covers `replicationcontroller/scale`.
```
Polymorphic Scale Client

Automatic merge from submit-queue (batch tested with PRs 53743, 53564).

This PR introduces a polymorphic scale client based on discovery information that's able to scale scalable resources in arbitrary group-versions, as long as they present the scale subresource in their discovery information. Currently, it supports `extensions/v1beta1.Scale` and `autoscaling/v1.Scale`, but supporting other versions of scale if/when we produce them should be fairly trivial. It also updates the HPA to use this client, meaning the HPA will now work on any scalable resource, not just things in the `extensions/v1beta1` API group.

```release-note
Introduces a polymorphic scale client, allowing HorizontalPodAutoscalers to properly function on scalable resources in any API group.
```

Unblocks #29698. Unblocks #38756. Unblocks #49504. Fixes #38810.
Switch internal scale type to autoscaling, enable apps/v1 scale subresources

Automatic merge from submit-queue (batch tested with PRs 53047, 54861, 55413, 55395, 55308).

xref kubernetes#49504

* Switch workload internal scale type to autoscaling.Scale (internal-only change)
* Enable scale subresources for apps/v1 deployments, replicasets, statefulsets
Issues go stale after 90d of inactivity. Prevent issues from auto-closing with a /remove-lifecycle stale comment. If this issue is safe to close now please do so with /close. Send feedback to sig-testing and/or kubernetes/test-infra.
/remove-lifecycle stale |
/close |
This issue covers the proposal for the implementation of kubectl scale for the core controllers.
This is relevant to StatefulSet, ReplicaSet and Deployment.
Currently, kubectl uses GET/PUT on the resource to update scale, and should instead use the scale endpoint.
The scale endpoint exists in RS and Deployment, and will be added to SS as well shortly.
In order to keep compatibility with older server versions, the proposed change to the kubectl scale implementation is as follows: …

cc @liggitt @kubernetes/sig-cli-misc @kubernetes/sig-apps-misc