
Add /scale subresource to CRD and replicas to various parts of CRD. #1633

Merged
merged 14 commits into from
Apr 6, 2020
3 changes: 2 additions & 1 deletion .gitignore
Original file line number Diff line number Diff line change
@@ -209,4 +209,5 @@ examples/ambassador/headers/ambassador_headers.py
examples/ambassador/shadow/ambassador_shadow.py
examples/models/metrics/metrics.py
examples/models/custom_metrics/customMetrics.py
examples/models/tracing/tracing.py
examples/models/tracing/tracing.py
examples/models/autoscaling/autoscaling_example.py
6 changes: 4 additions & 2 deletions doc/source/examples/notebooks.rst
@@ -79,20 +79,22 @@ MLOps: Scaling and Monitoring and Observability

.. toctree::
:titlesonly:


Autoscaling Example <autoscaling_example>
Request Payload Logging with ELK <payload_logging>
Custom Metrics with Grafana & Prometheus <metrics>
Distributed Tracing with Jaeger <tracing>
CI / CD with Jenkins Classic <jenkins_classic>
CI / CD with Jenkins X <jenkins_x>
Replica control <scale>


Production Configurations and Integrations
------------------------------------------

.. toctree::
:titlesonly:

Autoscaling Example <autoscaling_example>
Custom Endpoints <custom_endpoints>
Example Helm Deployments <helm_examples>
Max gRPC Message Size <max_grpc_msg_size>
3 changes: 3 additions & 0 deletions doc/source/examples/scale.nblink
@@ -0,0 +1,3 @@
{
"path": "../../../notebooks/scale.ipynb"
}
97 changes: 0 additions & 97 deletions doc/source/graph/autoscaling.md

This file was deleted.

154 changes: 154 additions & 0 deletions doc/source/graph/scaling.md
@@ -0,0 +1,154 @@
# Scaling Replicas

## Replica Settings

Replica settings can be provided at several levels, with the most specific taking precedence. From most general to most specific:

* `.spec.replicas`
* `.spec.predictors[].replicas`
* `.spec.predictors[].componentSpecs[].replicas`

If you use the annotation `seldon.io/engine-separate-pod` you can also set the number of replicas for the service orchestrator in:

* `.spec.predictors[].svcOrchSpec.replicas`
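
The precedence rules above can be sketched in Python. This is a hypothetical illustration of the documented resolution order, not the operator's actual code; the function name and dict layout are assumptions:

```python
def effective_replicas(spec, predictor, component_spec=None):
    """Resolve a replica count, with the most specific setting winning.

    `spec`, `predictor` and `component_spec` are plain dicts mirroring
    the SeldonDeployment fields listed above.
    """
    # Most specific: .spec.predictors[].componentSpecs[].replicas
    if component_spec and component_spec.get("replicas") is not None:
        return component_spec["replicas"]
    # Next: .spec.predictors[].replicas
    if predictor.get("replicas") is not None:
        return predictor["replicas"]
    # Most general: .spec.replicas, which itself defaults to 1
    return spec.get("replicas", 1)


spec = {"replicas": 1}
print(effective_replicas(spec, {"replicas": 2}))                   # 2 (predictor level)
print(effective_replicas(spec, {"replicas": 2}, {"replicas": 3}))  # 3 (componentSpec level)
print(effective_replicas(spec, {}))                                # 1 (falls back to spec)
```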

As an illustration, here is a contrived example combining several of these options:

```yaml
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
name: test-replicas
spec:
replicas: 1
predictors:
- componentSpecs:
- spec:
containers:
- image: seldonio/mock_classifier_rest:1.3
name: classifier
- spec:
containers:
- image: seldonio/mock_classifier_rest:1.3
name: classifier2
replicas: 3
graph:
endpoint:
type: REST
name: classifier
type: MODEL
children:
- name: classifier2
type: MODEL
endpoint:
type: REST
name: example
replicas: 2
traffic: 50
- componentSpecs:
- spec:
containers:
- image: seldonio/mock_classifier_rest:1.3
name: classifier3
graph:
children: []
endpoint:
type: REST
name: classifier3
type: MODEL
name: example2
traffic: 50

```

* `classifier` will have a Deployment with 2 replicas, as specified by the predictor it is defined within
* `classifier2` will have a Deployment with 3 replicas, as specified in its componentSpec
* `classifier3` will have 1 replica, taking its value from `.spec.replicas`

For more details see [a worked example for the above replica settings](../examples/scale.html).

## Scale replicas

It is possible to use the `kubectl scale` command to set the `replicas` value of a SeldonDeployment. For simple inference graphs this is an easy way to scale them up and down. For example:

```yaml
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
name: seldon-scale
spec:
replicas: 1
predictors:
- componentSpecs:
- spec:
containers:
- image: seldonio/mock_classifier_rest:1.3
name: classifier
graph:
children: []
endpoint:
type: REST
name: classifier
type: MODEL
name: example
```

One can scale this Seldon Deployment up using the command:

```bash
kubectl scale --replicas=2 sdep/seldon-scale
```
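
`kubectl scale` only works because the CustomResourceDefinition exposes a `/scale` subresource (the change this PR introduces). A minimal sketch of what such a declaration looks like on a CRD follows; the exact JSONPaths used by Seldon here are an assumption, based on the standard Kubernetes scale-subresource fields:

```yaml
# Fragment of a CustomResourceDefinition spec (sketch):
subresources:
  status: {}
  scale:
    # kubectl scale reads and writes the field at this path
    specReplicasPath: .spec.replicas
    # reported back on the scale subresource's status
    statusReplicasPath: .status.replicas
```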

For more details you can follow [a worked example of scaling](../examples/scale.html).

## Autoscaling Seldon Deployments

To autoscale your Seldon Deployment resources you can add a Horizontal Pod Autoscaler (HPA) specification to the Pod Template Specifications you create. There are two steps:

1. Ensure you have a resource request for the metric you want to scale on if it is a standard metric such as CPU or memory.
1. Add an HPA spec referring to this Deployment. (We presently support the v1beta1 version of the Kubernetes HPA metrics spec.)

To illustrate this we have an example Seldon Deployment below:

```yaml
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
name: seldon-model
spec:
name: test-deployment
predictors:
- componentSpecs:
- hpaSpec:
maxReplicas: 3
metrics:
- resource:
name: cpu
targetAverageUtilization: 70
type: Resource
minReplicas: 1
spec:
containers:
- image: seldonio/mock_classifier_rest:1.3
imagePullPolicy: IfNotPresent
name: classifier
resources:
requests:
cpu: '0.5'
terminationGracePeriodSeconds: 1
graph:
children: []
endpoint:
type: REST
name: classifier
type: MODEL
name: example
```

The key points here are:

* We define a CPU request for our container. This is required to allow us to utilize CPU autoscaling in Kubernetes.
* We define an HPA associated with our componentSpec which scales on CPU when average CPU utilization is above 70%, up to a maximum of 3 replicas.
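
The HPA's scaling decision for a resource metric follows the standard Kubernetes formula, sketched below. This is assumed generic HPA behaviour, not Seldon-specific code, and ignores details such as stabilization windows and tolerance:

```python
import math

def desired_replicas(current_replicas, current_utilization, target_utilization,
                     min_replicas=1, max_replicas=3):
    """Standard HPA rule: desired = ceil(current * currentMetric / targetMetric),
    clamped to the [minReplicas, maxReplicas] range from the hpaSpec."""
    desired = math.ceil(current_replicas * current_utilization / target_utilization)
    return max(min_replicas, min(max_replicas, desired))

# With the spec above (target 70% CPU, maxReplicas 3):
print(desired_replicas(1, 140, 70))  # 2: utilization is double the target
print(desired_replicas(2, 200, 70))  # 3: capped at maxReplicas
print(desired_replicas(2, 30, 70))   # 1: scaled back down
```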


For a worked example see [this notebook](../examples/autoscaling_example.html).
2 changes: 1 addition & 1 deletion doc/source/index.rst
@@ -94,7 +94,7 @@ Documentation Index
Metrics with Prometheus <analytics/analytics.md>
Payload Logging with ELK <analytics/logging.md>
Distributed Tracing with Jaeger <graph/distributed-tracing.md>
Autoscaling in Kubernetes <graph/autoscaling.md>
Replica scaling <graph/scaling.md>

.. toctree::
:maxdepth: 1
19 changes: 10 additions & 9 deletions doc/source/python/api/seldon_core.proto.rst
@@ -2,9 +2,9 @@ seldon\_core.proto package
==========================

.. automodule:: seldon_core.proto
:members:
:undoc-members:
:show-inheritance:
:members:
:undoc-members:
:show-inheritance:

Submodules
----------
@@ -13,15 +13,16 @@ seldon\_core.proto.prediction\_pb2 module
-----------------------------------------

.. automodule:: seldon_core.proto.prediction_pb2
:members:
:undoc-members:
:show-inheritance:
:members:
:undoc-members:
:show-inheritance:

seldon\_core.proto.prediction\_pb2\_grpc module
-----------------------------------------------

.. automodule:: seldon_core.proto.prediction_pb2_grpc
:members:
:undoc-members:
:show-inheritance:
:members:
:undoc-members:
:show-inheritance:

