This example introduces the development and deployment of a whole-flow confidential inference service (Tensorflow-serving). By following this framework, application developers get the benefits below.
- Directly convert an existing application into an Occlum TEE application.
- Get whole-flow sensitive data protection without developing any SGX remote attestation code.
- Whole-flow sensitive data protection by utilizing the Occlum GRPC RATLS solution.
- Directly generate an inference service (Tensorflow-serving) running in TEE from the Docker image (tensorflow/serving) without modification.
- A way to build a minimum-size Docker container image based on the Occlum package.
- To deploy the Occlum Confidential Inference Service on Kubernetes, please refer to kubernetes.
The GRPC-RATLS server holds sensitive data, so it is usually deployed in a secure environment. The application that consumes the sensitive data can be deployed in a general environment, such as an SGX2 instance provided by a cloud service vendor. The inference requester has no SGX hardware requirement. In this example, everything runs on one SGX2 instance.
The GRPC-RATLS server starts and waits for secret requests on a GRPC channel, which is localhost:50051 by default in this example.
In this example, two secrets need to be protected.
- ssl_config
It is the SSL config file that tensorflow-serving requires to set up a secure gRPC channel. It is generated by combining server.key and server.crt. server.key is a private key and server.crt is a self-signed certificate; both are generated by openssl. For details, please refer to the script generate_ssl_config.sh.
- image_key
It is used to encrypt and decrypt the Occlum application RootFS image, which is Tensorflow-serving in this example. It is generated by the command occlum gen-image-key image_key. The image is then encrypted with occlum build --image-key image_key. With this encryption, everything saved in the RootFS is protected.
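The "combining" step for ssl_config can be sketched in Python as follows. This is only an illustration, not the project's script: it assumes the config uses the text format of TensorFlow Serving's SSLConfig message with the fields server_key, server_cert, custom_ca and client_verify, and the helper name build_ssl_config is hypothetical. The repository's own script remains the authoritative reference.

```python
def build_ssl_config(server_key_pem: str, server_cert_pem: str) -> str:
    """Return a text-format SSL config embedding the key and the certificate.

    Assumption: field names follow TensorFlow Serving's SSLConfig message.
    """

    def quote(pem: str) -> str:
        # Escape backslashes and double quotes, and encode real newlines as
        # the two characters "\n" so that each field stays on a single line.
        escaped = (
            pem.replace("\\", "\\\\")
            .replace('"', '\\"')
            .replace("\n", "\\n")
        )
        return f'"{escaped}"'

    lines = [
        f"server_key: {quote(server_key_pem)}",
        f"server_cert: {quote(server_cert_pem)}",
        'custom_ca: ""',          # no extra CA in this sketch
        "client_verify: false",   # clients are not asked for a certificate
    ]
    return "\n".join(lines) + "\n"
```

In practice the two arguments would be the contents of server.key and server.crt, and the returned string would be written to the config file that is later handed out as the ssl_config secret.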
The application starts. First it starts the init process. This customized init requests ssl_config and image_key from the GRPC-RATLS server through a secure GRPC RATLS connection. It then uses image_key to decrypt the RootFS where the real application is located, mounts the RootFS, and saves ssl_config to /etc/tf_ssl.cfg in the RootFS.
For a detailed description of the Init-RA operation in the above two steps, please refer to Init-RA.
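The ordering of the init steps can be summarized with a small Python sketch. It is purely illustrative: the real init is not written this way, and fetch_secret, decrypt_and_mount and write_file are hypothetical callables standing in for the GRPC RATLS request, the RootFS decryption and mount, and the file write.

```python
from typing import Callable, List


def run_init(
    fetch_secret: Callable[[str], str],
    decrypt_and_mount: Callable[[str], None],
    write_file: Callable[[str, str], None],
) -> None:
    """Mimic the order of operations of the customized init."""
    # 1. Request both secrets over the secure channel.
    ssl_config = fetch_secret("ssl_config")
    image_key = fetch_secret("image_key")
    # 2. The RootFS must be decrypted and mounted before anything is written.
    decrypt_and_mount(image_key)
    # 3. Save the SSL config where the real application expects it.
    write_file("/etc/tf_ssl.cfg", ssl_config)


def demo() -> List[str]:
    """Run the flow against in-memory fakes and return the recorded events."""
    secrets = {"ssl_config": "dummy-ssl-config", "image_key": "dummy-image-key"}
    events: List[str] = []

    def fetch_secret(name: str) -> str:
        events.append(f"fetch {name}")
        return secrets[name]

    def decrypt_and_mount(key: str) -> None:
        events.append("mount")

    def write_file(path: str, content: str) -> None:
        events.append(f"write {path}")

    run_init(fetch_secret, decrypt_and_mount, write_file)
    return events
```

Calling demo() returns the recorded events in the order fetch, fetch, mount, write, which mirrors the sequence described above.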
The real application tensorflow_model_server starts with tf_ssl.cfg and the prefetched model, and serves the inference service through a secure GRPC channel, which is localhost:9000 in this example.
An extra model_key could be added to protect the models if necessary (not included in this demo).
Users can now send inference requests using the server certificate (server.crt).
Prebuilt docker images are available for this example, for use either with docker as described below or with kubernetes. Users can pull them directly and try the example.
docker pull occlum/init_ra_server:0.29.5-ubuntu20.04
docker pull occlum/tf_demo:0.29.5-ubuntu20.04
docker pull occlum/tf_demo_client:0.29.5-ubuntu20.04
To build or customize the images, follow the steps below.
Our target is to deploy the demo in separate container images, so a docker build step is necessary. Thanks to the docker run in docker method, the build for this example can be done in the Occlum development container image.
First, please make sure docker is installed successfully on your host. Then start the Occlum container (using version latest-ubuntu20.04 as an example) as below.
$ sudo docker run --rm -itd --network host \
-v $(which docker):/usr/bin/docker -v /var/run/docker.sock:/var/run/docker.sock \
occlum/occlum:latest-ubuntu20.04
All the following steps run in the above container.
This step prepares all the content and builds the Occlum images.
# ./build_content.sh
Below are the two Occlum images.
- occlum_server
It acts as the GRPC-RATLS server.
Its primary content comes from the ra_tls toolchain.
- occlum_tf
It acts as both Init-RA and tensorflow-serving.
There is no need to rebuild tensorflow-serving from source; the one from the docker image tensorflow/serving is used directly. This example combines docker image export with the Occlum copy_bom tool to generate a workable tensorflow-serving Occlum image. For details, please refer to the script build_content.sh.
Once all the content is ready, the runtime container images can be built.
This step builds two container images, init_ra_server and tf_demo.
# ./build_container_images.sh <registry> <tag>
<registry>
is the docker registry prefix for the generated container images.
For example, using demo here generates the container images:
demo/init_ra_server
demo/tf_demo
To minimize the size of the container images, only the necessary SGX libraries and the Occlum runtime RPM are installed, plus the packaged Occlum image. The build script and Dockerfile are in the directory container.
Once the container images are ready, the demo can be started on the host.
The script run_container.sh is provided to run the container images one by one.
$ ./run_container.sh -h
Run container images init_ra_server and tf_demo on background.
usage: run_container.sh [OPTION]...
-s <GRPC Server Domain> default localhost.
-p <GRPC Server port> default 50051.
-u <PCCS URL> default https://localhost:8081/sgx/certification/v3/.
-r <registry prefix> the registry for this demo container images.
-g <image tag> the container images tag, default it is "latest".
-h <usage> usage help
For example, to use the PCCS service from Aliyun:
$ sudo ./run_container.sh -s localhost -p 50051 -u https://sgx-dcap-server.cn-shanghai.aliyuncs.com/sgx/certification/v3/ -r demo -g <tag>
If everything goes well, the tensorflow serving service will be available on the secure GRPC channel localhost:9000.
An example Python-based inference client sends a picture to the tensorflow serving service for inference, using the previously generated server certificate.
Install the dependent python packages.
# pip3 install -r client/requirements.txt
Start the inference request.
# cd client
# python3 resnet_client_grpc.py --server=localhost:9000 --crt ../ssl_configure/server.crt --image cat.jpg
If everything goes well, you will get the most likely prediction class (an int value; the mapping can be found at https://storage.googleapis.com/download.tensorflow.org/data/ImageNetLabels.txt) and its probability.
Alternatively, use the demo client container image to run the inference test.
$ docker run --rm --network host <registry>/tf_demo_client:<tag> python3 resnet_client_grpc.py --server=localhost:9000 --crt server.crt --image cat.jpg
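To turn the int class value printed by the client into a readable name, the label file mentioned above can be used as a lookup table. The sketch below is not part of the demo client; parse_labels and class_to_label are hypothetical helpers. It assumes one label per line, and since the label file includes a leading background entry, whether the model's class ids line up with the line positions is an assumption that the offset parameter lets you adjust.

```python
from typing import List


def parse_labels(text: str) -> List[str]:
    """Split the label file content into a list, one label per non-empty line."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def class_to_label(class_id: int, labels: List[str], offset: int = 0) -> str:
    """Map a predicted class id to its label.

    offset shifts the id in case the model's numbering and the file's
    line positions do not line up (an assumption to verify for your model).
    """
    index = class_id + offset
    if not 0 <= index < len(labels):
        raise IndexError(f"class id {class_id} with offset {offset} is out of range")
    return labels[index]
```

A typical use would be to download ImageNetLabels.txt once, pass its content to parse_labels, and then call class_to_label with the class value returned by the inference request.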