kubectl scale implementation for core workload controllers #49504

Closed
foxish opened this issue Jul 24, 2017 · 35 comments
Labels
sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/autoscaling Categorizes an issue or PR as relevant to SIG Autoscaling. sig/cli Categorizes an issue or PR as relevant to SIG CLI.

Comments

@foxish
Contributor

foxish commented Jul 24, 2017

This issue covers the proposal for the implementation of kubectl scale for the core controllers.
This is relevant to StatefulSet, ReplicaSet and Deployment.

Currently, kubectl uses a GET/PUT on the resource itself to update the replica count; it should instead use the scale subresource endpoint.
The scale endpoint already exists for ReplicaSet and Deployment, and will shortly be added to StatefulSet as well.

In order to keep compatibility with older server versions, the proposed change to the kubectl scale implementation is as follows:

  1. Use the discovery API to check for the presence of a scale sub-resource
  2. If scale sub-resource is present, use it for updating the number of replicas
  3. If it is absent, fall back to the existing method of GET/PUT on the resource
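
A minimal sketch of the discovery check in step 1, assuming client-go's discovery client (the function name is hypothetical and this is illustrative only, not the code from the eventual change):

```go
import "k8s.io/client-go/discovery"

// hasScaleSubresource reports whether "<resource>/scale" is advertised in the
// discovery document for the given group-version (step 1). If it is, kubectl
// can PUT the scale subresource (step 2); otherwise it falls back to GET/PUT
// on the resource itself (step 3).
func hasScaleSubresource(d discovery.DiscoveryInterface, groupVersion, resource string) (bool, error) {
	list, err := d.ServerResourcesForGroupVersion(groupVersion)
	if err != nil {
		return false, err
	}
	for _, r := range list.APIResources {
		if r.Name == resource+"/scale" {
			return true, nil
		}
	}
	return false, nil
}
```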

cc @liggitt @kubernetes/sig-cli-misc @kubernetes/sig-apps-misc

@k8s-ci-robot k8s-ci-robot added sig/cli Categorizes an issue or PR as relevant to SIG CLI. sig/apps Categorizes an issue or PR as relevant to SIG Apps. labels Jul 24, 2017
@liggitt
Member

liggitt commented Jul 24, 2017

  1. Use the discovery API to check for the presence of a scale sub-resource
  2. If scale sub-resource is present, use it for updating the number of replicas

This should allow kubectl scale to work against arbitrary resources if they include a /scale subresource, right?

@foxish
Contributor Author

foxish commented Jul 24, 2017

Currently, we have ScaleInterface being implemented by each group's client. So, extending it to arbitrary types would need a new generic ScaleInterface.
From what I understand, we'd need such a Scale type and interface to exist in metav1 if we want to implement a generic interface of this nature.
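
For illustration, a group-agnostic interface might look roughly like the sketch below. This is only a shape for discussion, assuming the generic Scale type ends up in a common group such as autoscaling/v1; the names here are not authoritative:

```go
import (
	autoscalingv1 "k8s.io/api/autoscaling/v1"
	"k8s.io/apimachinery/pkg/runtime/schema"
)

// ScalesGetter hands out a namespaced, resource-agnostic scale client.
type ScalesGetter interface {
	Scales(namespace string) ScaleInterface
}

// ScaleInterface reads and writes the /scale subresource of any resource,
// identified only by its group and resource name rather than a concrete type.
type ScaleInterface interface {
	Get(resource schema.GroupResource, name string) (*autoscalingv1.Scale, error)
	Update(resource schema.GroupResource, scale *autoscalingv1.Scale) (*autoscalingv1.Scale, error)
}
```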

@crimsonfaith91
Contributor

+1 for generic Scale interface.

@smarterclayton
Contributor

smarterclayton commented Jul 25, 2017 via email

@smarterclayton
Contributor

smarterclayton commented Jul 25, 2017 via email

@mwielgus mwielgus added the sig/autoscaling Categorizes an issue or PR as relevant to SIG Autoscaling. label Jul 31, 2017
@mwielgus
Contributor

cc: @DirectXMan12

@DirectXMan12
Contributor

I think scale should live in autoscaling. I look at Scale like an interface, and groups like packages in this case -- it makes sense to me for the "interfaces" required for autoscaling to live in the autoscaling "package" with the "concrete types" (HPA).

@DirectXMan12
Contributor

I lean slightly towards adding "Scale" to another group (like "autoscaling" or a net new empty group)

For future reference, there's already a copy of scale there ;-).

@adohe-zz

adohe-zz commented Aug 6, 2017

and scale for jobs here.

@sttts
Contributor

sttts commented Aug 7, 2017

I think scale should live in autoscaling

Doesn't sound right. Autoscaling is only one way of scaling an object.

@DirectXMan12
Contributor

sure, but the scale subresource is mainly used for autoscaling -- that's mainly why it exists: so "automated" processes can scale different resources without having specific knowledge of them.

@nikhita
Member

nikhita commented Sep 11, 2017

What must be done here? The proposal for subresources for CustomResources (kubernetes/community#913) is blocked on where a unified Scale object should live.

cc @deads2k @liggitt @enisoc

@sttts
Contributor

sttts commented Oct 4, 2017

@DirectXMan12 any update on polymorphic scale?

@kow3ns
Member

kow3ns commented Oct 4, 2017

I think there is still open discussion on placement. I propose that we have one representation of Scale for serving and stateful workloads. It should live in the apps/v1 group version and keep the existing signature.

// ScaleSpec describes the attributes of a scale subresource
type ScaleSpec struct {
	// desired number of instances for the scaled object.
	// +optional
	Replicas int32 `json:"replicas,omitempty" protobuf:"varint,1,opt,name=replicas"`
}

// ScaleStatus represents the current status of a scale subresource.
type ScaleStatus struct {
	// actual number of observed instances of the scaled object.
	Replicas int32 `json:"replicas" protobuf:"varint,1,opt,name=replicas"`

	// label query over pods that should match the replicas count. More info: http://kubernetes.io/docs/user-guide/labels#label-selectors
	// +optional
	Selector map[string]string `json:"selector,omitempty" protobuf:"bytes,2,rep,name=selector"`

	// label selector for pods that should match the replicas count. This is a serialized
	// version of both map-based and more expressive set-based selectors. This is done to
	// avoid introspection in the clients. The string will be in the same format as the
	// query-param syntax. If the target type only supports map-based selectors, both this
	// field and map-based selector field are populated.
	// More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/#label-selectors
	// +optional
	TargetSelector string `json:"targetSelector,omitempty" protobuf:"bytes,3,opt,name=targetSelector"`
}

// Scale represents a scaling request for a resource.
type Scale struct {
	metav1.TypeMeta `json:",inline"`
	// Standard object metadata; More info: https://git.k8s.io/community/contributors/devel/api-conventions.md#metadata.
	// +optional
	metav1.ObjectMeta `json:"metadata,omitempty" protobuf:"bytes,1,opt,name=metadata"`

	// defines the behavior of the scale. More info: https://git.k8s.io/community/contributors/devel/api-conventions.md#spec-and-status.
	// +optional
	Spec ScaleSpec `json:"spec,omitempty" protobuf:"bytes,2,opt,name=spec"`

	// current status of the scale. More info: https://git.k8s.io/community/contributors/devel/api-conventions.md#spec-and-status. Read-only.
	// +optional
	Status ScaleStatus `json:"status,omitempty" protobuf:"bytes,3,opt,name=status"`
}

We moved everything that is a workload (DaemonSet, Deployment, ReplicaSet, and StatefulSet) to apps, and the workloads that have a notion of scale (Deployment, ReplicaSet, and StatefulSet) all have an implemented scale subresource. I would argue that anything that is a scalable serving or stateful workload should use the same interface.

There has also been a proposal for scalable batch workloads. Scaling a Job has a different meaning than scaling a serving or stateful workload. When you scale a serving or stateful workload, you are definitively requesting or releasing capacity (memory, compute, or storage), generally in response to QPS demands. When you scale a batch workload (as represented by Job), you are asking for potentially more capacity in order to increase parallelism and thereby decrease the time to completion. Depending on the number of already completed tasks, scaling a Job may not actually require more capacity. The proposed API for this type of scaling looks something like this:

type ScaleSpec struct {
	// Specifies the maximum desired number of pods the job should
	// run at any given time. The actual number of pods running in steady state will
	// be less than this number when ((.spec.completions - .status.successful) < .spec.parallelism),
	// i.e. when the work left to do is less than max parallelism.
	// +optional
	Parallelism int32
}

// represents the current status of a scale subresource.
type ScaleStatus struct {
	// Specifies the desired number of successfully finished pods the
	// job should be run with.  Setting to nil means that the success of any
	// pod signals the success of all pods, and allows parallelism to have any positive
	// value.  Setting to 1 means that parallelism is limited to 1 and the success of that
	// pod signals the success of the job.
	// +optional
	Completions int32

	// The number of actively running pods.
	// +optional
	Active int32

	// The current max parallelism of the batch resource.
	Parallelism int32

	// label query over pods that should match the completions + active count.
	// More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/#label-selectors
	// +optional
	Selector *metav1.LabelSelector
}

// represents a scaling request for a resource.
type Scale struct {
	metav1.TypeMeta
	// Standard object metadata; More info: http://releases.k8s.io/HEAD/docs/devel/api-conventions.md#metadata.
	// +optional
	metav1.ObjectMeta

	// defines the behavior of the scale. More info: http://releases.k8s.io/HEAD/docs/devel/api-conventions.md#spec-and-status.
	// +optional
	Spec ScaleSpec

	// current status of the scale. More info: http://releases.k8s.io/HEAD/docs/devel/api-conventions.md#spec-and-status. Read-only.
	// +optional
	Status ScaleStatus
}

As the scale status of a batch Job considers completions, the number of active Pods, and the max parallelism, and as the spec really changes the parallelism rather than requesting a fixed capacity increase or decrease, in my opinion it warrants a different signature for the Scale subresource. I'd argue this subresource should live in batch/v1.

@smarterclayton
Contributor

smarterclayton commented Oct 4, 2017 via email

@kow3ns
Member

kow3ns commented Oct 4, 2017

If we go with a generic implementation over specialized implementations (within reason), aren't we just aggregating the bloat into a single location and causing confusion?

  1. If I want to extend the existing Scale to support completions in its status, I have to add a field that now applies to serving workloads, where it probably doesn't make sense.
  2. The existing replicas field doesn't really communicate what scaling a Job does: increasing parallelism may or may not increase the number of Pods.

Perhaps apps and batch might not be the right home, but if we are moving toward an architecture where the nucleus is the core and most workloads are in the applications layer, apps might be a good place.

I think, as it stands now, we could have a general implementation for any controller that launches Pods with a required restart policy of always, and another for any controller that launches Pods that will eventually complete as successful or failed. I'm not sure I see a benefit to attempting to unify them when the semantics are so different.

@bgrant0607
Member

This is a dupe of #29698

@bgrant0607
Member

Do we feel like the scale subresource has worked well for autoscaling, at least?

As with conditions (#7856), I am starting to feel that subresources are overly complicated and cumbersome just for polymorphism. I am now in favor of duck typing, which has been extremely successful for metadata. We could scale anything with a .status.replicas field. I'm in favor of creating a convention for selectors in status, also: #3676.

I also want to do this for client-side resource manipulation anyway. If we use subresources for kubectl scale, I'll also want a generic `kubectl set replicas` that works based on the field path alone.

#29698 (comment)
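
As a sketch of what that duck typing could look like on the client side, using the unstructured helpers (illustrative only; the helper names here are mine, not a committed design):

```go
import "k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"

// setDesiredReplicas writes the conventional .spec.replicas field of an
// arbitrary object, relying only on the field path rather than the type.
func setDesiredReplicas(obj *unstructured.Unstructured, replicas int64) error {
	return unstructured.SetNestedField(obj.Object, replicas, "spec", "replicas")
}

// observedReplicas reads the conventional .status.replicas field; the bool is
// false when the object does not follow the convention.
func observedReplicas(obj *unstructured.Unstructured) (int64, bool, error) {
	return unstructured.NestedInt64(obj.Object, "status", "replicas")
}
```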

.status.replicas doesn't work for Job, but that's acceptable IMO. Job scaling requires policies that take into account at least the amount of queued work and the deadline. Why do we need a Scale flavor for Job? Do we expect to have more than one Job-like API?

With respect to the Scale subresource type:

@DirectXMan12 accurately explained why Scale was put into the autoscaling group. Is there a concrete mechanical problem with that? Why can't every controller that implements Scale use its own copy? When encoded, all copies will be structurally indistinguishable. Go typing is not helping us here, but I don't understand why we care. The type is very small.

@kow3ns
Member

kow3ns commented Oct 6, 2017

@bgrant0607 If we don't do a generic scale subresource, we will need to find another mechanism to integrate pod disruption budgets and custom resource definitions. Currently PDBs only work against the replicas of known controller objects.

@bgrant0607
Member

@kow3ns By "generic", you just mean structurally equivalent?

@enisoc
Member

enisoc commented Oct 6, 2017

We could scale anything with a .status.replicas field.

The duck typing philosophy sounds interesting to me, although it puts more pressure on fixing some related issues. One of the reasons we moved to /scale for some controllers was because it helped paper over version skew problems with get/roundtrip/update on the main resource. We need to fix that anyway though.

@bgrant0607
Member

Yes, we need to fix patch/merge anyway.

We need to figure out how to deal with protobuf tags. Honestly, proto's tagging scheme is not a great fit for us given that we have API versioning and discovery via OpenAPI.

Maybe we can paper over it using OpenAPI. I also wonder if we could autogenerate the tags by hashing the json paths.

We could also use extra OpenAPI / JSON schema attributes for APIs that deviated from the convention -- so convention over configuration, but with configuration as a backup.

@bgrant0607
Member

Don't the existing workload APIs support the version of scale from the autoscaling API group?

We have a way to independently version the Scale API, so I'm fine with using the one from the autoscaling group until there is an obviously better answer.

@kow3ns
Member

kow3ns commented Oct 9, 2017

Currently, apps uses the extensions representation internally, with each group version having a separate, versioned subresource.
While we may one day scale things other than the kinds that we have decided to move to apps, we don't currently. In particular, HPA works for ReplicationControllers and Deployments, but, with the addition of custom metrics, it may be possible to get it to work well for StatefulSets as well. Also, we would like to use the subresource to implement kubectl scale so that we stop round-tripping resources and stripping data due to version skew between kubectl and the API server.

@liggitt
Member

liggitt commented Oct 9, 2017

Don't the existing workload APIs support the version of scale from the autoscaling API group?

replicationcontrollers/scale uses autoscaling Scale:

$ kubectl get --raw /api/v1 | jq . | grep --color scale -A 5
      "name": "replicationcontrollers/scale",
      "singularName": "",
      "namespaced": true,
      "group": "autoscaling",
      "version": "v1",
      "kind": "Scale",

all the rest (in extensions/v1beta1, apps/v1beta1, and apps/v1beta2 API groups) advertise extensions/v1beta1 Scale as their scale subresource type:

$ kubectl get --raw /apis/extensions/v1beta1 | jq . | grep --color scale -A 5
      "name": "deployments/scale",
      "singularName": "",
      "namespaced": true,
      "group": "extensions",
      "version": "v1beta1",
      "kind": "Scale",
--
      "name": "replicasets/scale",
      "singularName": "",
      "namespaced": true,
      "group": "extensions",
      "version": "v1beta1",
      "kind": "Scale",
--
      "name": "replicationcontrollers/scale",
      "singularName": "",
      "namespaced": true,
      "kind": "Scale",

$ kubectl get --raw /apis/apps/v1beta1 | jq . | grep --color scale -A 5
      "name": "deployments/scale",
      "singularName": "",
      "namespaced": true,
      "group": "extensions",
      "version": "v1beta1",
      "kind": "Scale",
--
      "name": "statefulsets/scale",
      "singularName": "",
      "namespaced": true,
      "group": "extensions",
      "version": "v1beta1",
      "kind": "Scale",

$ kubectl get --raw /apis/apps/v1beta2 | jq . | grep --color scale -A 5
      "name": "deployments/scale",
      "singularName": "",
      "namespaced": true,
      "group": "extensions",
      "version": "v1beta1",
      "kind": "Scale",
--
      "name": "replicasets/scale",
      "singularName": "",
      "namespaced": true,
      "group": "extensions",
      "version": "v1beta1",
      "kind": "Scale",
--
      "name": "statefulsets/scale",
      "singularName": "",
      "namespaced": true,
      "group": "extensions",
      "version": "v1beta1",
      "kind": "Scale",

however, they return content that does not match the version advertised in discovery.

@kow3ns
Member

kow3ns commented Oct 9, 2017

@liggitt is correct. While each group version defines a scale subresource, the REST implementation wires in the extensions scale subresource.

@kow3ns
Member

kow3ns commented Oct 9, 2017

more context here

@bgrant0607
Member

Well, extensions is just wrong, for any new API.

@liggitt
Member

liggitt commented Oct 12, 2017

Well, extensions is just wrong, for any new API.

Agree. Scale in autoscaling/v1 is the closest thing we have to a uniform scale object (it is supported by replication controllers, and it is defined by the API group related to scaling rather than by a particular workload group). It makes sense to me to coalesce around that as the common scale subresource kind.
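
For reference, the autoscaling/v1 Scale kind is tiny; abridged here (tags and most field comments omitted):

```go
import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

// Abridged from k8s.io/api/autoscaling/v1.
type Scale struct {
	metav1.TypeMeta
	metav1.ObjectMeta
	Spec   ScaleSpec
	Status ScaleStatus
}

type ScaleSpec struct {
	Replicas int32 // desired number of instances for the scaled object
}

type ScaleStatus struct {
	Replicas int32  // observed number of instances of the scaled object
	Selector string // label selector, serialized in query-param syntax
}
```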

@DirectXMan12
Contributor

While each group version defines a scale sub-resource,

Eeeek. That needs to be removed before going stable. We shouldn't have a version of Scale in every external API version. It kind of defeats the point.

@kow3ns
Member

kow3ns commented Oct 16, 2017

SGTM

@smarterclayton
Contributor

smarterclayton commented Oct 16, 2017 via email

k8s-github-robot pushed a commit that referenced this issue Oct 18, 2017
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions [here](https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md).

allow */subresource in rbac policy rules

xref #29698
xref #38756
xref #49504
xref #38810

Allow `*/subresource` format in RBAC policy rules to support polymorphic subresources like `*/scale` for HPA.
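
As an example of the kind of rule this enables, a policy granting access to the /scale subresource of every resource might look like the sketch below, written with the rbac/v1 Go types (the rule itself is hypothetical, not one shipped by this PR):

```go
import rbacv1 "k8s.io/api/rbac/v1"

// scaleAllRule lets a bound subject read and update the /scale subresource of
// any resource in any API group, e.g. for a horizontal autoscaler.
var scaleAllRule = rbacv1.PolicyRule{
	Verbs:     []string{"get", "update"},
	APIGroups: []string{"*"},
	Resources: []string{"*/scale"},
}
```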

@DirectXMan12 fyi

```release-note
RBAC PolicyRules now allow resource=`*/<subresource>` to cover `any-resource/<subresource>`.   For example, `*/scale` covers `replicationcontroller/scale`.
```
k8s-github-robot pushed a commit that referenced this issue Oct 23, 2017
…-client

Automatic merge from submit-queue (batch tested with PRs 53743, 53564). If you want to cherry-pick this change to another branch, please follow the instructions [here](https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md).

Polymorphic Scale Client

This PR introduces a polymorphic scale client based on discovery information that's able to scale scalable resources in arbitrary group-versions, as long as they present the scale subresource in their discovery information.

Currently, it supports `extensions/v1beta1.Scale` and `autoscaling/v1.Scale`, but supporting other versions of scale if/when we produce them should be fairly trivial.

It also updates the HPA to use this client, meaning the HPA will now work on any scalable resource, not just things in the `extensions/v1beta1` API group.

**Release note**:
```release-note
Introduces a polymorphic scale client, allowing HorizontalPodAutoscalers to properly function on scalable resources in any API group.
```

Unblocks #29698
Unblocks #38756
Unblocks #49504 
Fixes #38810
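
A rough usage sketch of such a polymorphic client, assuming the ScalesGetter/ScaleInterface shape discussed above (method signatures in client-go may differ; error handling is abbreviated):

```go
import (
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/scale"
)

// scaleWorkload sets the desired replica count of any scalable resource via
// its /scale subresource, without knowing the concrete workload type.
func scaleWorkload(c scale.ScalesGetter, ns, name string, gr schema.GroupResource, replicas int32) error {
	s, err := c.Scales(ns).Get(gr, name)
	if err != nil {
		return err
	}
	s.Spec.Replicas = replicas
	_, err = c.Scales(ns).Update(gr, s)
	return err
}
```

For a Deployment, for example, gr would be schema.GroupResource{Group: "apps", Resource: "deployments"}.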
vdemeester pushed a commit to vdemeester/kubernetes that referenced this issue Nov 10, 2017
Automatic merge from submit-queue (batch tested with PRs 53047, 54861, 55413, 55395, 55308). If you want to cherry-pick this change to another branch, please follow the instructions [here](https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md).

Switch internal scale type to autoscaling, enable apps/v1 scale subresources

xref kubernetes#49504

* Switch workload internal scale type to autoscaling.Scale (internal-only change)
* Enable scale subresources for apps/v1 deployments, replicasets, statefulsets

```release-note
NONE
```
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

Prevent issues from auto-closing with an /lifecycle frozen comment.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 14, 2018
@nikhita
Member

nikhita commented Jan 15, 2018

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 15, 2018
@kow3ns
Member

kow3ns commented Feb 27, 2018

/close
