
Add /scale subresource to CRD and replicas to various parts of CRD. #1633

Merged
merged 14 commits into from
Apr 6, 2020
3 changes: 2 additions & 1 deletion .gitignore
Original file line number Diff line number Diff line change
@@ -209,4 +209,5 @@ examples/ambassador/headers/ambassador_headers.py
examples/ambassador/shadow/ambassador_shadow.py
examples/models/metrics/metrics.py
examples/models/custom_metrics/customMetrics.py
examples/models/tracing/tracing.py
examples/models/tracing/tracing.py
examples/models/autoscaling/autoscaling_example.py
6 changes: 4 additions & 2 deletions doc/source/examples/notebooks.rst
@@ -79,20 +79,22 @@ MLOps: Scaling and Monitoring and Observability

.. toctree::
:titlesonly:


Autoscaling Example <autoscaling_example>
Request Payload Logging with ELK <payload_logging>
Custom Metrics with Grafana & Prometheus <metrics>
Distributed Tracing with Jaeger <tracing>
CI / CD with Jenkins Classic <jenkins_classic>
CI / CD with Jenkins X <jenkins_x>
Replica control <scale>


Production Configurations and Integrations
------------------------------------------

.. toctree::
:titlesonly:

Autoscaling Example <autoscaling_example>
Custom Endpoints <custom_endpoints>
Example Helm Deployments <helm_examples>
Max gRPC Message Size <max_grpc_msg_size>
3 changes: 3 additions & 0 deletions doc/source/examples/scale.nblink
@@ -0,0 +1,3 @@
{
"path": "../../../notebooks/scale.ipynb"
}
97 changes: 0 additions & 97 deletions doc/source/graph/autoscaling.md

This file was deleted.

154 changes: 154 additions & 0 deletions doc/source/graph/scaling.md
@@ -0,0 +1,154 @@
# Scaling Replicas

## Replica Settings

Replica settings can be provided at several levels, with the most specific taking precedence. From most general to most specific:

* `.spec.replicas`
* `.spec.predictors[].replicas`
* `.spec.predictors[].componentSpecs[].replicas`

If you use the annotation `seldon.io/engine-separate-pod` you can also set the number of replicas for the service orchestrator in:

* `.spec.predictors[].svcOrchSpec.replicas`
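
The precedence rules above can be sketched in Python. This is a hypothetical illustration of the documented resolution order, not the operator's actual code; the function name and dict layout are assumptions:

```python
def effective_replicas(spec, predictor, component_spec=None):
    """Resolve a replica count, with the most specific setting winning.

    `spec`, `predictor` and `component_spec` are plain dicts mirroring
    the SeldonDeployment fields listed above.
    """
    # Most specific: .spec.predictors[].componentSpecs[].replicas
    if component_spec and component_spec.get("replicas") is not None:
        return component_spec["replicas"]
    # Next: .spec.predictors[].replicas
    if predictor.get("replicas") is not None:
        return predictor["replicas"]
    # Most general: .spec.replicas, which itself defaults to 1
    return spec.get("replicas", 1)


spec = {"replicas": 1}
print(effective_replicas(spec, {"replicas": 2}))                   # 2 (predictor level)
print(effective_replicas(spec, {"replicas": 2}, {"replicas": 3}))  # 3 (componentSpec level)
print(effective_replicas(spec, {}))                                # 1 (falls back to spec)
```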

As an illustration, here is a contrived example combining several of these options:

```yaml
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
name: test-replicas
spec:
replicas: 1
predictors:
- componentSpecs:
- spec:
containers:
- image: seldonio/mock_classifier_rest:1.3
name: classifier
- spec:
containers:
- image: seldonio/mock_classifier_rest:1.3
name: classifier2
replicas: 3
graph:
endpoint:
type: REST
name: classifier
type: MODEL
children:
- name: classifier2
type: MODEL
endpoint:
type: REST
name: example
replicas: 2
traffic: 50
- componentSpecs:
- spec:
containers:
- image: seldonio/mock_classifier_rest:1.3
name: classifier3
graph:
children: []
endpoint:
type: REST
name: classifier3
type: MODEL
name: example2
traffic: 50

```

* `classifier` will have a Deployment with 2 replicas, as specified by the predictor it is defined within
* `classifier2` will have a Deployment with 3 replicas, as specified in its componentSpec
* `classifier3` will have 1 replica, taking its value from `.spec.replicas`

For more details see [a worked example for the above replica settings](../examples/scale.html).

## Scale replicas

It is possible to use the `kubectl scale` command to set the `replicas` value of a SeldonDeployment. For simple inference graphs this is an easy way to scale them up and down. For example:

```yaml
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
name: seldon-scale
spec:
replicas: 1
predictors:
- componentSpecs:
- spec:
containers:
- image: seldonio/mock_classifier_rest:1.3
name: classifier
graph:
children: []
endpoint:
type: REST
name: classifier
type: MODEL
name: example
```

One can scale this Seldon Deployment up using the command:

```bash
kubectl scale --replicas=2 sdep/seldon-scale
```
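
`kubectl scale` only works because the CustomResourceDefinition exposes a `/scale` subresource (the change this PR introduces). A minimal sketch of what such a declaration looks like on a CRD follows; the exact JSONPaths used by Seldon here are an assumption, based on the standard Kubernetes scale-subresource fields:

```yaml
# Fragment of a CustomResourceDefinition spec (sketch):
subresources:
  status: {}
  scale:
    # kubectl scale reads and writes the field at this path
    specReplicasPath: .spec.replicas
    # reported back on the scale subresource's status
    statusReplicasPath: .status.replicas
```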

For more details you can follow [a worked example of scaling](../examples/scale.html).

## Autoscaling Seldon Deployments

To autoscale your Seldon Deployment resources you can add a Horizontal Pod Autoscaler (HPA) specification to the Pod Template Specifications you create. There are two steps:

1. Ensure you have a resource request for the metric you want to scale on if it is a standard metric such as CPU or memory.
1. Add an HPA spec referring to this Deployment. (We presently support the v1beta1 version of the Kubernetes HPA metrics spec.)

To illustrate this we have an example Seldon Deployment below:

```yaml
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
name: seldon-model
spec:
name: test-deployment
predictors:
- componentSpecs:
- hpaSpec:
maxReplicas: 3
metrics:
- resource:
name: cpu
targetAverageUtilization: 70
type: Resource
minReplicas: 1
spec:
containers:
- image: seldonio/mock_classifier_rest:1.3
imagePullPolicy: IfNotPresent
name: classifier
resources:
requests:
cpu: '0.5'
terminationGracePeriodSeconds: 1
graph:
children: []
endpoint:
type: REST
name: classifier
type: MODEL
name: example
```

The key points here are:

* We define a CPU request for our container. This is required to allow us to utilize CPU autoscaling in Kubernetes.
* We define an HPA associated with our componentSpec which scales on CPU when average CPU utilization is above 70%, up to a maximum of 3 replicas.
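
The HPA's scaling decision for a resource metric follows the standard Kubernetes formula, sketched below. This is assumed generic HPA behaviour, not Seldon-specific code, and ignores details such as stabilization windows and tolerance:

```python
import math

def desired_replicas(current_replicas, current_utilization, target_utilization,
                     min_replicas=1, max_replicas=3):
    """Standard HPA rule: desired = ceil(current * currentMetric / targetMetric),
    clamped to the [minReplicas, maxReplicas] range from the hpaSpec."""
    desired = math.ceil(current_replicas * current_utilization / target_utilization)
    return max(min_replicas, min(max_replicas, desired))

# With the spec above (target 70% CPU, maxReplicas 3):
print(desired_replicas(1, 140, 70))  # 2: utilization is double the target
print(desired_replicas(2, 200, 70))  # 3: capped at maxReplicas
print(desired_replicas(2, 30, 70))   # 1: scaled back down
```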


For a worked example see [this notebook](../examples/autoscaling_example.html).
2 changes: 1 addition & 1 deletion doc/source/index.rst
@@ -94,7 +94,7 @@ Documentation Index
Metrics with Prometheus <analytics/analytics.md>
Payload Logging with ELK <analytics/logging.md>
Distributed Tracing with Jaeger <graph/distributed-tracing.md>
Autoscaling in Kubernetes <graph/autoscaling.md>
Replica scaling <graph/scaling.md>

.. toctree::
:maxdepth: 1
19 changes: 10 additions & 9 deletions doc/source/python/api/seldon_core.proto.rst
@@ -2,9 +2,9 @@ seldon\_core.proto package
==========================

.. automodule:: seldon_core.proto
:members:
:undoc-members:
:show-inheritance:
:members:
:undoc-members:
:show-inheritance:

Submodules
----------
@@ -13,15 +13,16 @@ seldon\_core.proto.prediction\_pb2 module
-----------------------------------------

.. automodule:: seldon_core.proto.prediction_pb2
:members:
:undoc-members:
:show-inheritance:
:members:
:undoc-members:
:show-inheritance:

seldon\_core.proto.prediction\_pb2\_grpc module
-----------------------------------------------

.. automodule:: seldon_core.proto.prediction_pb2_grpc
:members:
:undoc-members:
:show-inheritance:
:members:
:undoc-members:
:show-inheritance:

