Add grpc docs and missing OIP docs for some runtimes (#306)
Add open inference protocol and grpc docs

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>
sivanantha321 authored Oct 16, 2023
1 parent 74693a7 commit 2257489
Showing 12 changed files with 1,529 additions and 102 deletions.
32 changes: 17 additions & 15 deletions docs/developer/developer.md
@@ -37,7 +37,8 @@ they can be installed via the [directions here](https://github.com/knative/docs/

* If you already have `Istio` or `Knative` (e.g. from a Kubeflow install) then you don't need to install them explicitly, as long as version dependencies are satisfied.

> **_NOTE:_** Note: On a local environment, when using `minikube` or `kind` as Kubernetes cluster, there has been a reported issue that [knative quickstart](https://knative.dev/docs/install/quickstart-install/) bootstrap does not work as expected. It is recommended to follow the installation manual from knative using [yaml](https://knative.dev/docs/install/yaml-install/) or using [knative operator](https://knative.dev/docs/install/operator/knative-with-operators/) for a better result.
!!! Note
On a local environment, when using `minikube` or `kind` as Kubernetes cluster, there has been a reported issue that [knative quickstart](https://knative.dev/docs/install/quickstart-install/) bootstrap does not work as expected. It is recommended to follow the installation manual from knative using [yaml](https://knative.dev/docs/install/yaml-install/) or using [knative operator](https://knative.dev/docs/install/operator/knative-with-operators/) for a better result.

### Setup your environment

@@ -152,12 +153,12 @@ make deploy
make deploy
```

==**Expected Output**==
```console
$ kubectl get pods -n kserve -l control-plane=kserve-controller-manager
NAME READY STATUS RESTARTS AGE
kserve-controller-manager-0 2/2 Running 0 13m
```
!!! success "Expected Output"
```console
$ kubectl get pods -n kserve -l control-plane=kserve-controller-manager
NAME READY STATUS RESTARTS AGE
kserve-controller-manager-0 2/2 Running 0 13m
```
!!! Note
By default, it installs to the `kserve` namespace with the published controller manager image from the master branch.

@@ -177,12 +178,12 @@ make deploy-dev-xgb
```

Run the following command to deploy the explainer with your local change.
```
```bash
make deploy-dev-alibi
```

Run the following command to deploy the storage initializer with your local change.
```
```bash
make deploy-dev-storageInitializer
```

@@ -204,11 +205,12 @@ You should see model serving deployment running under default or your specified
$ kubectl get pods -n default -l serving.kserve.io/inferenceservice=flower-sample
```

==**Expected Output**==
```
NAME READY STATUS RESTARTS AGE
flower-sample-default-htz8r-deployment-8fd979f9b-w2qbv 3/3 Running 0 10s
```
!!! success "Expected Output"
```
NAME READY STATUS RESTARTS AGE
flower-sample-default-htz8r-deployment-8fd979f9b-w2qbv 3/3 Running 0 10s
```

## Running unit/integration tests
`kserve-controller-manager` has a few integration tests which require a mock apiserver
and etcd; these are installed along with [`kubebuilder`](https://book.kubebuilder.io/quick-start.html#installation).
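
A minimal sketch for running the suite locally, assuming the repository's standard `make test` target (which relies on those kubebuilder-provided test binaries):

```bash
# Run the unit and integration tests from the repository root.
# Assumes kubebuilder's test binaries (mock apiserver and etcd) are installed.
make test
```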
@@ -227,7 +229,7 @@ To setup from local code, do:
3. `make deploy-dev`

Go to `python/kserve` and install kserve python sdk deps
```
```bash
pip3 install -e .[test]
```
Then go to `test/e2e`.
69 changes: 55 additions & 14 deletions docs/modelserving/v1beta1/lightgbm/README.md
@@ -129,13 +129,13 @@ curl -v -H "Host: ${SERVICE_HOSTNAME}" http://${INGRESS_HOST}:${INGRESS_PORT}/v1
{"predictions": [[0.9, 0.05, 0.05]]}
```

## Deploy the model with [Open Inference Protocol](https://github.com/kserve/kserve/tree/master/docs/predict-api/v2)
## Deploy the model with [Open Inference Protocol](https://github.com/kserve/open-inference-protocol/)

### Test the model locally
Once you have serialized your model as `model.bst`, you can use the [KServe LightGBM Server](https://github.com/kserve/kserve/tree/master/python/lgbserver) to create a local model server.

!!! Note
This step is optional and just meant for testing, feel free to jump straight to [deploying with InferenceService](#deploy-with-inferenceservice).
This step is optional and just meant for testing, feel free to jump straight to [deploying with InferenceService](#deploy-inferenceservice-with-rest-endpoint).

#### Pre-requisites

@@ -162,7 +162,7 @@ The `lgbserver` package takes three arguments.
With the `lgbserver` runtime package installed locally, you should now be ready to start the server as:

```bash
python3 lgbserver --model_dir /path/to/model_dir --model_name lightgbm-iris
python3 lgbserver --model_dir /path/to/model_dir --model_name lightgbm-v2-iris
```
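
To sanity-check the local server, you can send a request against it. This is a hedged sketch, assuming the server listens on the default HTTP port 8080 and exposes the Open Inference Protocol REST endpoint for the model name passed above (`iris-input-v2.json` refers to the sample payload shown later in this guide):

```bash
# Hypothetical local smoke test against the lgbserver started above.
# Assumes the default port 8080 and the Open Inference Protocol (v2) REST path.
curl -H "Content-Type: application/json" \
  -d @./iris-input-v2.json \
  http://localhost:8080/v2/models/lightgbm-v2-iris/infer
```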

### Deploy InferenceService with REST endpoint
@@ -205,7 +205,7 @@ kubectl apply -f lightgbm-v2.yaml
You can now test your deployed model by sending a sample request.

Note that this request **needs to follow the [V2 Dataplane protocol](https://github.com/kserve/kserve/tree/master/docs/predict-api/v2)**.
You can see an example payload below:
You can see an example payload below. Create a file named `iris-input-v2.json` with the sample input.

```json
{
@@ -263,13 +263,35 @@ curl -v \
### Create the InferenceService with gRPC endpoint
Create the inference service yaml and expose the gRPC port.

=== "Yaml"
!!! Note
Currently, KServe only supports exposing either the HTTP or the gRPC port. By default, the HTTP port is exposed.

=== "Serverless"
```yaml
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
name: "lightgbm-v2-iris"
name: "lightgbm-v2-iris-grpc"
spec:
predictor:
model:
modelFormat:
name: lightgbm
protocolVersion: v2
runtime: kserve-lgbserver
storageUri: "gs://kfserving-examples/models/lightgbm/v2/iris"
ports:
- name: h2c # knative expects grpc port name to be 'h2c'
protocol: TCP
containerPort: 8081
```

=== "RawDeployment"
```yaml
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
name: "lightgbm-v2-iris-grpc"
spec:
predictor:
model:
@@ -279,7 +301,7 @@
runtime: kserve-lgbserver
storageUri: "gs://kfserving-examples/models/lightgbm/v2/iris"
ports:
- name: h2c
- name: grpc-port # Istio requires the port name to be in the format <protocol>[-<suffix>]
protocol: TCP
containerPort: 8081
```
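
Apply the manifest to create the gRPC `InferenceService` (assuming the yaml above is saved as `lightgbm-v2-grpc.yaml`, matching the example file in this repository):

```bash
kubectl apply -f lightgbm-v2-grpc.yaml
```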
@@ -299,22 +321,22 @@ After the gRPC `InferenceService` becomes ready, [grpcurl](https://github.com/fu

```bash
# download the proto file
curl -O https://raw.githubusercontent.com/kserve/kserve/master/docs/predict-api/v2/grpc_predict_v2.proto
curl -O https://raw.githubusercontent.com/kserve/open-inference-protocol/main/specification/protocol/open_inference_grpc.proto
INPUT_PATH=iris-input-v2-grpc.json
PROTO_FILE=grpc_predict_v2.proto
SERVICE_HOSTNAME=$(kubectl get inferenceservice lightgbm-v2-iris -o jsonpath='{.status.url}' | cut -d "/" -f 3)
PROTO_FILE=open_inference_grpc.proto
SERVICE_HOSTNAME=$(kubectl get inferenceservice lightgbm-v2-iris-grpc -o jsonpath='{.status.url}' | cut -d "/" -f 3)
```

The gRPC APIs follow the KServe [prediction V2 protocol](https://github.com/kserve/kserve/tree/master/docs/predict-api/v2).

[Determine the ingress IP and port](../../../get_started/first_isvc.md#4-determine-the-ingress-ip-and-ports) and set `INGRESS_HOST` and `INGRESS_PORT`. Now, you can use `grpcurl` to send gRPC inference requests.
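
One common way to set these, sketched here under the assumption that your cluster exposes an Istio ingress gateway of type `LoadBalancer` in the `istio-system` namespace:

```bash
# Assumes an Istio ingress gateway exposed through a LoadBalancer service.
INGRESS_HOST=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
INGRESS_PORT=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.spec.ports[?(@.name=="http2")].port}')
```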
The gRPC APIs follow the KServe [prediction V2 protocol / Open Inference Protocol](https://github.com/kserve/kserve/tree/master/docs/predict-api/v2).
For example, `ServerReady` API can be used to check if the server is ready:

```bash
grpcurl \
-plaintext \
-proto ${PROTO_FILE} \
-authority ${SERVICE_HOSTNAME}" \
-authority ${SERVICE_HOSTNAME} \
${INGRESS_HOST}:${INGRESS_PORT} \
inference.GRPCInferenceService.ServerReady
```
@@ -326,6 +348,25 @@ grpcurl \
}
```
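
Similarly, the `ModelReady` API can be used to check whether the deployed model is ready to serve. A sketch, assuming the same proto file and the `name` field of the `ModelReadyRequest` message:

```bash
# Check model readiness over gRPC; assumes the variables set above.
grpcurl \
  -plaintext \
  -proto ${PROTO_FILE} \
  -authority ${SERVICE_HOSTNAME} \
  -d '{"name": "lightgbm-v2-iris-grpc"}' \
  ${INGRESS_HOST}:${INGRESS_PORT} \
  inference.GRPCInferenceService.ModelReady
```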

You can test the deployed model by sending a sample request with the below payload.
Notice that the input format differs from that in the previous `REST endpoint` example.
Prepare the inference input in a file named `iris-input-v2-grpc.json`.
```json
{
"model_name": "lightgbm-v2-iris-grpc",
"inputs": [
{
"name": "input-0",
"shape": [2, 4],
"datatype": "FP32",
"contents": {
"fp32_contents": [6.8, 2.8, 4.8, 1.4, 6.0, 3.4, 4.5, 1.6]
}
}
]
}
```

`ModelInfer` API takes input following the `ModelInferRequest` schema defined in the `open_inference_grpc.proto` file. Notice that the input file differs from the one used in the previous `curl` example.

```bash
Expand Down Expand Up @@ -364,7 +405,7 @@ grpcurl \
Response contents:
{
"modelName": "lightgbm-v2-iris",
"modelName": "lightgbm-v2-iris-grpc",
"outputs": [
{
"name": "predict",
2 changes: 1 addition & 1 deletion docs/modelserving/v1beta1/lightgbm/iris-input-v2-grpc.json
@@ -1,5 +1,5 @@
{
"model_name": "lightgbm-v2-iris",
"model_name": "lightgbm-v2-iris-grpc",
"inputs": [
{
"name": "input-0",
4 changes: 2 additions & 2 deletions docs/modelserving/v1beta1/lightgbm/lightgbm-v2-grpc.yaml
@@ -1,7 +1,7 @@
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
name: "lightgbm-v2-iris"
name: "lightgbm-v2-iris-grpc"
spec:
predictor:
model:
@@ -12,4 +12,4 @@ spec:
ports:
- name: h2c
protocol: TCP
containerPort: 9000
containerPort: 8081
276 changes: 264 additions & 12 deletions docs/modelserving/v1beta1/paddle/README.md

Large diffs are not rendered by default.

13 changes: 13 additions & 0 deletions docs/modelserving/v1beta1/paddle/jay-v2-grpc.json

Large diffs are not rendered by default.

12 changes: 12 additions & 0 deletions docs/modelserving/v1beta1/paddle/jay-v2.json

Large diffs are not rendered by default.
