diff --git a/community/rfcs/24-05-17-001-OPEA-Deployment-Design.md b/community/rfcs/24-05-17-001-OPEA-Deployment-Design.md
index 96b80493..817edb79 100644
--- a/community/rfcs/24-05-17-001-OPEA-Deployment-Design.md
+++ b/community/rfcs/24-05-17-001-OPEA-Deployment-Design.md
@@ -1,6 +1,6 @@
 **Author**
 
-[ftian1](https://github.com/ftian1), [lvliang-intel](https://github.com/lvliang-intel), [hshen14](https://github.com/hshen14), **Edit Here to add your id**
+[ftian1](https://github.com/ftian1), [lvliang-intel](https://github.com/lvliang-intel), [hshen14](https://github.com/hshen14), [irisdingbj](https://github.com/irisdingbj), [KfreeZ](https://github.com/kfreez), [zhlsunshine](https://github.com/zhlsunshine) **Edit Here to add your id**
 
 **Status**
 
@@ -8,7 +8,7 @@ Under Review
 
 **Objective**
 
-Have a clear and good design for users to deploy their own GenAI applications on-premis or cloud environment.
+Provide a clear, well-defined design for users to deploy their own GenAI applications in a Docker or Kubernetes environment.
 
 **Motivation**
 
@@ -27,67 +27,155 @@ The proposed OPEA deployment workflow is
 
 For GenAI applications, we provides two interfaces for deployment
 
-1. on-premis deployment by python
+1. Docker deployment by Python
 
 For example, constructing RAG (Retrieval-Augmented Generation) application with python code is something like:
 
 ```python
 from comps import MicroService, ServiceOrchestrator
 
 class ChatQnAService:
-     def __init__(self, port=8080):
-         self.service_builder = ServiceOrchestrator(port=port, endpoint="/v1/chatqna")
-     def add_remote_service(self):
-         embedding = MicroService(
-             name="embedding", port=6000, expose_endpoint="/v1/embeddings", use_remote_service=True
-         )
-         retriever = MicroService(
-             name="retriever", port=7000, expose_endpoint="/v1/retrieval", use_remote_service=True
-         )
-         rerank = MicroService(
-             name="rerank", port=8000, expose_endpoint="/v1/reranking", use_remote_service=True
-         )
-         llm = MicroService(
-             name="llm", port=9000, expose_endpoint="/v1/chat/completions", use_remote_service=True
-         )
-         self.service_builder.add(embedding).add(retriever).add(rerank).add(llm)
-         self.service_builder.flow_to(embedding, retriever)
-         self.service_builder.flow_to(retriever, rerank)
-         self.service_builder.flow_to(rerank, llm)
+    def __init__(self, port=8080):
+        self.service_builder = ServiceOrchestrator(port=port, endpoint="/v1/chatqna")
+    def add_remote_service(self):
+        embedding = MicroService(
+            name="embedding", port=6000, expose_endpoint="/v1/embeddings", use_remote_service=True
+        )
+        retriever = MicroService(
+            name="retriever", port=7000, expose_endpoint="/v1/retrieval", use_remote_service=True
+        )
+        rerank = MicroService(
+            name="rerank", port=8000, expose_endpoint="/v1/reranking", use_remote_service=True
+        )
+        llm = MicroService(
+            name="llm", port=9000, expose_endpoint="/v1/chat/completions", use_remote_service=True
+        )
+        self.service_builder.add(embedding).add(retriever).add(rerank).add(llm)
+        self.service_builder.flow_to(embedding, retriever)
+        self.service_builder.flow_to(retriever, rerank)
+        self.service_builder.flow_to(rerank, llm)
 ```
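+
+The following is a minimal usage sketch for the class above. It assumes the composed mega service is already being served on port 8080 (this RFC does not define the start/serve API of `ServiceOrchestrator`); the `requests` client and the payload schema are illustrative assumptions only:
+
+```python
+# Minimal usage sketch, not a normative API: build the pipeline defined above,
+# then query the composed endpoint once the mega service is serving.
+import requests  # illustrative client only, not part of comps
+
+chatqna = ChatQnAService(port=8080)
+chatqna.add_remote_service()
+
+# Assumption: the mega service listens on the port/endpoint passed to
+# ServiceOrchestrator above; the request schema is an assumption as well.
+response = requests.post(
+    "http://localhost:8080/v1/chatqna",
+    json={"query": "What is OPEA?"},
+    timeout=120,
+)
+print(response.json())
+```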
 
-2. cloud deployment by yaml
+2. Kubernetes deployment by YAML
 
 For example, constructing RAG (Retrieval-Augmented Generation) application with yaml is something like:
 
 ```yaml
 opea_micro_services:
-   embedding:
-     endpoint: /v1/embeddings
+  embedding:
+    endpoint: /v1/embeddings
     port: 6000
-   retrieval:
-     endpoint: /v1/retrieval
+  retrieval:
+    endpoint: /v1/retrieval
     port: 7000
-   reranking:
-     endpoint: /v1/reranking
+  reranking:
+    endpoint: /v1/reranking
     port: 8000
-   llm:
-     endpoint: /v1/chat/completions
+  llm:
+    endpoint: /v1/chat/completions
     port: 9000
-   
+
 opea_mega_service:
-   port: 8080
-   mega_flow:
-     - embedding >> retrieval >> reranking >> llm
+  port: 8080
+  mega_flow:
+    - embedding >> retrieval >> reranking >> llm
 ```
 
-When user wants to deploy the GenAI application to clould environment, such yaml configuration file should be defined and coverted to `docker composer` or `kubernetes manifest` or `kubernetes helm chart` files.
+When a user wants to deploy the GenAI application to a Kubernetes environment, such a YAML configuration file should be defined and converted to `docker compose` or [GenAI Microservice Connector (GMC)](https://github.com/opea-project/GenAIInfra/tree/main/microservices-connector) Custom Resource files.
+
+A sample GMC Custom Resource is shown below:
+
+```yaml
+apiVersion: gmc.opea.io/v1alpha3
+kind: GMConnector
+metadata:
+  labels:
+    app.kubernetes.io/name: gmconnector
+  name: chatqna
+  namespace: gmcsample
+spec:
+  routerConfig:
+    name: router
+    serviceName: router-service
+  nodes:
+    root:
+      routerType: Sequence
+      steps:
+      - name: Embedding
+        internalService:
+          serviceName: embedding-service
+          config:
+            endpoint: /v1/embeddings
+      - name: TeiEmbedding
+        internalService:
+          serviceName: tei-embedding-service
+          config:
+            gmcTokenSecret: gmc-tokens
+            hostPath: /root/GMC/data/tei
+            modelId: BAAI/bge-base-en-v1.5
+            endpoint: /embed
+          isDownstreamService: true
+      - name: Retriever
+        data: $response
+        internalService:
+          serviceName: retriever-redis-server
+          config:
+            RedisUrl: redis-vector-db
+            IndexName: rag-redis
+            tei_endpoint: tei-embedding-service
+            endpoint: /v1/retrieval
+      - name: VectorDB
+        internalService:
+          serviceName: redis-vector-db
+          isDownstreamService: true
+      - name: Reranking
+        data: $response
+        internalService:
+          serviceName: reranking-service
+          config:
+            tei_reranking_endpoint: tei-reranking-service
+            gmcTokenSecret: gmc-tokens
+            endpoint: /v1/reranking
+      - name: TeiReranking
+        internalService:
+          serviceName: tei-reranking-service
+          config:
+            gmcTokenSecret: gmc-tokens
+            hostPath: /root/GMC/data/rerank
+            modelId: BAAI/bge-reranker-large
+            endpoint: /rerank
+          isDownstreamService: true
+      - name: Llm
+        data: $response
+        internalService:
+          serviceName: llm-service
+          config:
+            tgi_endpoint: tgi-service
+            gmcTokenSecret: gmc-tokens
+            endpoint: /v1/chat/completions
+      - name: Tgi
+        internalService:
+          serviceName: tgi-service
+          config:
+            gmcTokenSecret: gmc-tokens
+            hostPath: /root/GMC/data/tgi
+            modelId: Intel/neural-chat-7b-v3-3
+            endpoint: /generate
+          isDownstreamService: true
+```
+
+There should be an available `gmconnectors.gmc.opea.io` CR named `chatqna` in the `gmcsample` namespace, as shown below:
+
+```bash
+$ kubectl get gmconnectors.gmc.opea.io -n gmcsample
+NAME      URL                                                       READY     AGE
+chatqna   http://router-service.gmcsample.svc.cluster.local:8080    Success   3m
+```
+
+The user can then access the application pipeline via the value of the `URL` field shown above.
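+
+As a rough illustration (not defined by this RFC), a client running inside the cluster could then send a request to that URL. The JSON payload below follows a `text`/`parameters` style and is an assumption; the actual GMC router request schema may differ:
+
+```python
+# Illustrative only: query the GMC router from inside the cluster.
+# The URL is taken from the `kubectl get gmconnectors` output above;
+# the payload schema is an assumption, not the documented GMC contract.
+import requests
+
+url = "http://router-service.gmcsample.svc.cluster.local:8080"
+payload = {
+    "text": "What is OPEA?",
+    "parameters": {"max_new_tokens": 128, "do_sample": True},
+}
+
+response = requests.post(url, json=payload, timeout=300)
+print(response.text)
+```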
 
 The whole deployment process illustrated by the diagram below.
 
-![Deployment Process](opea_deploy_process.png)
+![Deployment Process](opea_deploy_process_v1.png)
@@ -108,3 +196,4 @@ n/a
 - [ ] k8s GMC with istio
+
diff --git a/community/rfcs/opea_deploy_process.png b/community/rfcs/opea_deploy_process_v0.png
similarity index 100%
rename from community/rfcs/opea_deploy_process.png
rename to community/rfcs/opea_deploy_process_v0.png
diff --git a/community/rfcs/opea_deploy_process_v1.png b/community/rfcs/opea_deploy_process_v1.png
new file mode 100644
index 00000000..6b5cf4af
Binary files /dev/null and b/community/rfcs/opea_deploy_process_v1.png differ