From 6f2188f444d620df1e33483255c512d490272877 Mon Sep 17 00:00:00 2001 From: ZePan110 Date: Mon, 20 Jan 2025 09:13:08 +0800 Subject: [PATCH 1/7] Optimize output prompt words (#271) The original output was unclear, this optimization ensures that the user can find the source of the path check CI failure. Signed-off-by: ZePan110 --- .github/workflows/pr-path-detection.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.github/workflows/pr-path-detection.yml b/.github/workflows/pr-path-detection.yml index aceb81d6..0dfca05d 100644 --- a/.github/workflows/pr-path-detection.yml +++ b/.github/workflows/pr-path-detection.yml @@ -143,7 +143,7 @@ jobs: fi fi else - echo "$check_path does not exist $png_line" + echo "Invalid reference path from $refer_path, reference path: $(echo $png_line | cut -d ']' -f2)" fail="TRUE" fi done From c145d47b2c8944c51fc3be29486b079a1e23bb06 Mon Sep 17 00:00:00 2001 From: Yi Yao Date: Fri, 24 Jan 2025 13:36:19 +0800 Subject: [PATCH 2/7] Update release notes/v1.2 based on dmsuehir's comments Co-authored-by: Dina Suehiro Jones --- release_notes/v1.2.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/release_notes/v1.2.md b/release_notes/v1.2.md index 15bad56c..a28b2118 100644 --- a/release_notes/v1.2.md +++ b/release_notes/v1.2.md @@ -53,7 +53,7 @@ Additionally, OPEA supports manual deployment on virtual servers across `AWS`, ` #### Enhanced GenAI Examples - ChatQnA: Enabled [embedding and reranking on vLLM](https://github.com/opea-project/GenAIExamples/issues/1203), and [Jaeger UI and OpenTelemetry tracing](https://github.com/opea-project/GenAIExamples/pull/1316) for TGI serving on HPU. - AgentQnA: Added [SQL worker agent](https://github.com/opea-project/GenAIExamples/pull/1370) and introduced a [Svelte-based GUI](https://github.com/opea-project/GenAIExamples/pull/1389) for ChatCompletion API for non-streaming interactions. -- MultimodalQnA: Supported [PDF](https://github.com/opea-project/GenAIExamples/pull/1381) and [audio](https://github.com/opea-project/GenAIExamples/pull/1225) inputs. +- MultimodalQnA: Added support for [PDF](https://github.com/opea-project/GenAIExamples/pull/1381) ingestion, and [image](https://github.com/opea-project/GenAIExamples/pull/1381)/[audio](https://github.com/opea-project/GenAIExamples/pull/1225) queries. - EdgeCraftRAG: Supported image/url data retrieval and display, display of LLM-used context sources in UI, pipeline remove operation in RESTful API and UI, RAG pipeline performance benchmark and display in UI. ([#GenAIExamples/1324](https://github.com/opea-project/GenAIExamples/pull/1324)) - DocSum: Added [URL summary option](https://github.com/opea-project/GenAIExamples/pull/1248) to Gradio-based UI. - DocIndexRetriever: Add the pipeline without Reranking. 
From 3586d4e1e14ae6900089352baf622de1e7a99392 Mon Sep 17 00:00:00 2001 From: Neo Zhang Jianyu Date: Fri, 24 Jan 2025 14:32:53 +0800 Subject: [PATCH 3/7] support building history release online doc, provide the help for process to build histoy release (#285) Co-authored-by: ZhangJianyu --- scripts/hist_rel.sh | 98 +++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 98 insertions(+) create mode 100755 scripts/hist_rel.sh diff --git a/scripts/hist_rel.sh b/scripts/hist_rel.sh new file mode 100755 index 00000000..6b4773b7 --- /dev/null +++ b/scripts/hist_rel.sh @@ -0,0 +1,98 @@ +#!/bin/bash + +if [[ $# < 1 ]]; then + echo "Miss parameter" + echo "$0 [version]" + echo " like: 1.2, which is defined in html_context.versions of conf.py" + echo "" + echo "How to build online doc for history release?" + echo "" + echo " Prepare: add tag in all repos with format 'v*.*', like v1.2" + echo "" + echo " 1. Add history release version (like 1.2) in html_context.versions of conf.py." + echo " 2. Execute this script with release version (like $0 1.2). Build the history release document and output to release folder, like 1.2." + echo " 3. Execute scripts\build.sh. Update the 'latest' to add new release link in 'Document Versions'." + echo " 4. Git push the content of opea-project.github.io." + exit 1 +fi + +version=$1 +TAG="v${version}" + +echo "TAG=${TAG}" +pwd +cd scripts + +#add "f" to force create env +bash setup_env.sh $1 +cd ../.. + +ENV_NAME=env_sphinx +pwd +source $ENV_NAME/bin/activate + +#clone repos +for repo_name in docs GenAIComps GenAIEval GenAIExamples GenAIInfra opea-project.github.io; do + echo "prepare for $repo_name" + + if [[ "$1" == "f" ]]; then + echo "force to clone rep ${repo_name}" + rm -rf ${repo_name} + fi + + if [ ! -d ${repo_name} ]; then + URL=https://github.com/opea-project/${repo_name}.git + echo "git clone $URL" + git clone $URL + retval=$? + if [ $retval -ne 0 ]; then + echo "git clone ${repo_name} is wrong, try again!" + rm -rf ${repo_name} + exit 1 + fi + sleep 10 + else + echo "repo ${repo_name} exists, skipping cloning" + fi + cd ${repo_name} + echo "checkout ${TAG} in ${repo_name}" + pwd + git checkout ${TAG} + cd .. +done + +echo "Build HTML" +cd docs +make clean +make DOC_TAG=release RELEASE=${version} html +#make DOC_TAG=release RELEASE=${version} publish +retval=$? +echo "result = $retval" +if [ $retval -ne 0 ]; then + echo "make html is error" + exit 1 +else + echo "Done" +fi + +if [ ! -d _build/html ]; then + echo "Build online doc is wrong!" + exit 1 +else + echo "Build online doc done!" 
+fi + +echo "Update github.io" + +RELEASE_FOLDER=../opea-project.github.io +BUILDDIR=_build +PUBLISHDIR=${RELEASE_FOLDER}/${version} + +echo "Clear all content in ${PUBLISHDIR}" + +mkdir -p ${PUBLISHDIR} +rm -rf ${PUBLISHDIR}/* +echo "Copy html content to ${PUBLISHDIR}" +cp -r ${BUILDDIR}/html/* ${PUBLISHDIR} + +echo "Copied html content to ${PUBLISHDIR}" From 2c66be9daa5ec91f736df1a2f18d6bf27dc9664d Mon Sep 17 00:00:00 2001 From: sri-intel <108247623+srinarayan-srikanthan@users.noreply.github.com> Date: Fri, 24 Jan 2025 01:33:17 -0500 Subject: [PATCH 4/7] codegen xeon update (#282) * codegen xeon update Signed-off-by: Srinarayan Srikanthan * typo fix Signed-off-by: Srinarayan Srikanthan --------- Signed-off-by: Srinarayan Srikanthan Co-authored-by: Ying Hu --- examples/CodeGen/CodeGen_Guide.rst | 3 +- examples/CodeGen/deploy/xeon.md | 369 +++++++++++++++++++++++++++++ 2 files changed, 371 insertions(+), 1 deletion(-) create mode 100644 examples/CodeGen/deploy/xeon.md diff --git a/examples/CodeGen/CodeGen_Guide.rst b/examples/CodeGen/CodeGen_Guide.rst index 2b1da1bc..33a8a6af 100644 --- a/examples/CodeGen/CodeGen_Guide.rst +++ b/examples/CodeGen/CodeGen_Guide.rst @@ -41,5 +41,6 @@ Here are some deployment options, depending on your hardware and environment: .. toctree:: :maxdepth: 1 - + + Intel® Xeon® Scalable processor Gaudi AI Accelerator diff --git a/examples/CodeGen/deploy/xeon.md b/examples/CodeGen/deploy/xeon.md new file mode 100644 index 00000000..7aff3718 --- /dev/null +++ b/examples/CodeGen/deploy/xeon.md @@ -0,0 +1,369 @@ +# Single node on-prem deployment with TGI on Xeon + +This deployment section covers single-node on-prem deployment of the CodeGen +example with OPEA comps to deploy using the TGI service. We will be showcasing how +to build an e2e CodeGen solution with the Qwen2.5-Coder-7B-Instruct, +deployed on Intel® Xeon® Scalable processors. To quickly learn about OPEA in just 5 minutes and set up the required hardware and software, please follow the instructions in the +[Getting Started](https://opea-project.github.io/latest/getting-started/README.html) section. + +## Overview + +The CodeGen use case uses a single microservice called LLM. In this tutorial, we +will walk through the steps on how on enable it from OPEA GenAIComps to deploy on +a single node TGI megaservice solution. + +The solution is aimed to show how to use the Qwen2.5-Coder-7B-Instruct model on the Intel® +Xeon® Scalable processors. We will go through how to setup docker containers to start +the microservice and megaservice. The solution will then take text input as the +prompt and generate code accordingly. It is deployed with a UI with 2 modes to +choose from: + +1. Basic UI +2. React-Based UI + +The React-based UI is optional, but this feature is supported in this example if you +are interested in using it. + +Below is the list of content we will be covering in this tutorial: + +1. Prerequisites +2. Prepare (Building / Pulling) Docker images +3. Use case setup +4. Deploy the use case +5. Interacting with CodeGen deployment + +## Prerequisites + +The first step is to clone the GenAIExamples and GenAIComps. GenAIComps are +fundamental necessary components used to build examples you find in +GenAIExamples and deploy them as microservices. + +```bash +git clone https://github.com/opea-project/GenAIComps.git +git clone https://github.com/opea-project/GenAIExamples.git +export TAG=1.2 +``` + +The examples utilize model weights from HuggingFace and langchain. 
+ +Setup your [HuggingFace](https://huggingface.co/) account and generate +[user access token](https://huggingface.co/docs/transformers.js/en/guides/private#step-1-generating-a-user-access-token). + +Setup the HuggingFace token +``` +export HUGGINGFACEHUB_API_TOKEN="Your_Huggingface_API_Token" +``` + +The example requires you to set the `host_ip` to deploy the microservices on +endpoint enabled with ports. Set the host_ip env variable +``` +export host_ip=$(hostname -I | awk '{print $1}') +``` + +Make sure to setup Proxies if you are behind a firewall +``` +export no_proxy=${your_no_proxy},$host_ip +export http_proxy=${your_http_proxy} +export https_proxy=${your_http_proxy} +``` + +## Prepare (Building / Pulling) Docker images + +This step will involve building/pulling relevant docker +images with step-by-step process along with sanity check in the end. For +CodeGen, the following docker images will be needed: LLM with TGI. +Additionally, you will need to build docker images for the +CodeGen megaservice, and UI (React UI is optional). In total, +there are **3 required docker images** and an optional docker image. + +### Build/Pull Microservice image + +::::::{tab-set} + +:::::{tab-item} Pull +:sync: Pull + +If you decide to pull the docker containers and not build them locally, +you can proceed to the next step where all the necessary containers will +be pulled in from dockerhub. + +::::: +:::::{tab-item} Build +:sync: Build + +From within the `GenAIComps` folder, checkout the release tag. +``` +cd GenAIComps +git checkout tags/v${TAG} +``` + +#### Build LLM Image + +```bash +docker build -t opea/llm-textgen:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/llms/src/text-generation/Dockerfile . +``` + +### Build Mega Service images + +The Megaservice is a pipeline that channels data through different +microservices, each performing varied tasks. The LLM microservice and +flow of data are defined in the `codegen.py` file. You can also add or +remove microservices and customize the megaservice to suit your needs. + +Build the megaservice image for this use case + +```bash +cd .. +cd GenAIExamples +git checkout tags/v${TAG} +cd CodeGen +``` + +```bash +docker build -t opea/codegen:${TAG} --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f Dockerfile . +cd ../.. +``` + +### Build the UI Image + +You can build 2 modes of UI + +*Svelte UI* + +```bash +cd GenAIExamples/CodeGen/ui/ +docker build -t opea/codegen-ui:${TAG} --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f ./docker/Dockerfile . +cd ../../.. +``` + +*React UI (Optional)* +If you want a React-based frontend. + +```bash +cd GenAIExamples/CodeGen/ui/ +docker build --no-cache -t opea/codegen-react-ui:${TAG} --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f ./docker/Dockerfile.react . +cd ../../.. 
+``` + +### Sanity Check +Check if you have the following set of docker images by running the command `docker images` before moving on to the next step: + +* `opea/llm-tgi:${TAG}` +* `opea/codegen:${TAG}` +* `opea/codegen-ui:${TAG}` +* `opea/codegen-react-ui:${TAG}` (optional) + +::::: +:::::: + +## Use Case Setup + +The use case will use the following combination of GenAIComps and tools + +|Use Case Components | Tools | Model | Service Type | +|---------------- |--------------|-----------------------------|-------| +|LLM | TGI | meta-llama/CodeLlama-7b-hf | OPEA Microservice | +|UI | | NA | Gateway Service | + +Tools and models mentioned in the table are configurable either through the +environment variables or `compose.yaml` file. + +Set the necessary environment variables to setup the use case case by running the `set_env.sh` script. +Here is where the environment variable `LLM_MODEL_ID` is set, and you can change it to another model +by specifying the HuggingFace model card ID. + +```bash +cd GenAIExamples/CodeGen/docker_compose/ +source ./set_env.sh +cd ../../.. +``` + +## Deploy the Use Case + +In this tutorial, we will be deploying via docker compose with the provided +YAML file. The docker compose instructions should be starting all the +above mentioned services as containers. + +```bash +cd GenAIExamples/CodeGen/docker_compose/intel/cpu/xeon +docker compose up -d +``` + + +### Checks to Ensure the Services are Running +#### Check Startup and Env Variables +Check the start up log by running `docker compose logs` to ensure there are no errors. +The warning messages print out the variables if they are **NOT** set. + +Here are some sample messages if proxy environment variables are not set: + + WARN[0000] The "no_proxy" variable is not set. Defaulting to a blank string. + WARN[0000] The "https_proxy" variable is not set. Defaulting to a blank string. + WARN[0000] The "http_proxy" variable is not set. Defaulting to a blank string. + WARN[0000] The "no_proxy" variable is not set. Defaulting to a blank string. + WARN[0000] The "https_proxy" variable is not set. Defaulting to a blank string. + WARN[0000] The "http_proxy" variable is not set. Defaulting to a blank string. + WARN[0000] The "no_proxy" variable is not set. Defaulting to a blank string. + WARN[0000] The "http_proxy" variable is not set. Defaulting to a blank string. + WARN[0000] The "https_proxy" variable is not set. Defaulting to a blank string. + WARN[0000] The "no_proxy" variable is not set. Defaulting to a blank string. + WARN[0000] The "http_proxy" variable is not set. Defaulting to a blank string. + WARN[0000] The "https_proxy" variable is not set. Defaulting to a blank string. + +#### Check the Container Status + +Check if all the containers launched via docker compose has started. + +The CodeGen example starts 4 docker containers. Check that these docker +containers are all running, i.e, all the containers `STATUS` are `Up`. +You can do this with the `docker ps -a` command. 
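As a quick scripted variant (a sketch only; it scans every container on the host, not just the ones from this example), you can surface anything that is not in the `Up` state:

```bash
# List any container whose STATUS is not "Up"; print a confirmation if everything is running.
docker ps -a --format '{{.Names}}\t{{.Status}}' | grep -v 'Up' || echo "All containers are Up"
```

A healthy deployment looks similar to the following `docker ps -a` output: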
+ +``` +CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES +bbd235074c3d opea/codegen-ui:latest "docker-entrypoint.s…" About a minute ago Up About a minute 0.0.0.0:5173->5173/tcp, :::5173->5173/tcp codegen-xeon-ui-server +8d3872ca66fa opea/codegen:latest "python codegen.py" About a minute ago Up About a minute 0.0.0.0:7778->7778/tcp, :::7778->7778/tcp codegen-xeom-backend-server +b9fc39f51cdb opea/llm-tgi:latest "bash entrypoint.sh" About a minute ago Up About a minute 0.0.0.0:9000->9000/tcp, :::9000->9000/tcp llm-tgi-xeon-server +39994e007f15 ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu "text-generation-lau…" About a minute ago Up About a minute 0.0.0.0:8028->80/tcp, :::8028->80/tcp tgi-server +``` + +## Interacting with CodeGen for Deployment + +This section will walk you through the different ways to interact with +the microservices deployed. After a couple minutes, rerun `docker ps -a` +to ensure all the docker containers are still up and running. Then proceed +to validate each microservice and megaservice. + +### TGI Service + +```bash +curl http://${host_ip}:8028/generate \ + -X POST \ + -d '{"inputs":"Implement a high-level API for a TODO list application. The API takes as input an operation request and updates the TODO list in place. If the request is invalid, raise an exception.","parameters":{"max_new_tokens":256, "do_sample": true}}' \ + -H 'Content-Type: application/json' +``` + +Here is the output: + +``` +{"generated_text":"Start with a user story. We will add story tests later. In this case, we'll choose a story about adding a TODO:\n ```ruby\n as a user,\n i want to add a todo,\n so that i can get a todo list.\n\n conformance:\n - a new todo is added to the list\n - if the todo text is empty, raise an exception\n ```\n\n1. Write the first test:\n ```ruby\n feature Testing the addition of a todo to the list\n\n given a todo list empty list\n when a user adds a todo\n the todo should be added to the list\n\n inputs:\n when_values: [[\"A\"]]\n\n output validations:\n - todo_list contains { text:\"A\" }\n ```\n\n1. Write the first step implementation in any programming language you like. In this case, we will choose Ruby:\n ```ruby\n def add_"} +``` + +### LLM Microservice + +```bash +curl http://${host_ip}:9000/v1/chat/completions\ + -X POST \ + -d '{"query":"Implement a high-level API for a TODO list application. The API takes as input an operation request and updates the TODO list in place. If the request is invalid, raise an exception.","max_tokens":256,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":true}' \ + -H 'Content-Type: application/json' +``` + +The output is given one character at a time. It is too long to show +here but the last item will be +``` +data: [DONE] +``` + +### MegaService + +```bash +curl http://${host_ip}:7778/v1/codegen -H "Content-Type: application/json" -d '{ + "messages": "Implement a high-level API for a TODO list application. The API takes as input an operation request and updates the TODO list in place. If the request is invalid, raise an exception." + }' +``` + +The output is given one character at a time. It is too long to show +here but the last item will be +``` +data: [DONE] +``` + +## Launch UI +### Svelte UI +To access the frontend, open the following URL in your browser: http://{host_ip}:5173. By default, the UI runs on port 5173 internally. 
If you prefer to use a different host port to access the frontend, you can modify the port mapping in the `compose.yaml` file as shown below: +```bash + codegen-xeon-ui-server: + image: ${REGISTRY:-opea}/codegen-ui:${TAG:-latest} + ... + ports: + - "5173:5173" +``` + +### React-Based UI (Optional) +To access the React-based frontend, modify the UI service in the `compose.yaml` file. Replace `codegen-xeon-ui-server` service with the codegen-xeon-react-ui-server service as per the config below: +```bash +codegen-xeon-react-ui-server: + image: ${REGISTRY:-opea}/codegen-react-ui:${TAG:-latest} + container_name: codegen-xeon-react-ui-server + environment: + - no_proxy=${no_proxy} + - https_proxy=${https_proxy} + - http_proxy=${http_proxy} + - APP_CODE_GEN_URL=${BACKEND_SERVICE_ENDPOINT} + depends_on: + - codegen-xeon-backend-server + ports: + - "5174:80" + ipc: host + restart: always +``` +Once the services are up, open the following URL in your browser: http://{host_ip}:5174. By default, the UI runs on port 80 internally. If you prefer to use a different host port to access the frontend, you can modify the port mapping in the `compose.yaml` file as shown below: +```bash + codegen-xeon-react-ui-server: + image: ${REGISTRY:-opea}/codegen-react-ui:${TAG:-latest} + ... + ports: + - "80:80" +``` + +## Check Docker Container Logs + +You can check the log of a container by running this command: + +```bash +docker logs -t +``` + +You can also check the overall logs with the following command, where the +`compose.yaml` is the megaservice docker-compose configuration file. + +Assumming you are still in this directory `GenAIExamples/CodeGen/docker_compose/intel/cpu/xeon`, +run the following command to check the logs: +```bash +docker compose -f compose.yaml logs +``` + +View the docker input parameters in `./CodeGen/docker_compose/intel/cpu/xeon/compose.yaml` + +```yaml + tgi-service: + image: ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu + container_name: tgi-server + ports: + - "8028:80" + volumes: + - "./data:/data" + environment: + no_proxy: ${no_proxy} + http_proxy: ${http_proxy} + https_proxy: ${https_proxy} + HABANA_VISIBLE_DEVICES: all + OMPI_MCA_btl_vader_single_copy_mechanism: none + HF_TOKEN: ${HUGGINGFACEHUB_API_TOKEN} + runtime: habana + cap_add: + - SYS_NICE + ipc: host + command: --model-id ${LLM_MODEL_ID} --max-input-length 1024 --max-total-tokens 2048 +``` + +The input `--model-id` is `${LLM_MODEL_ID}`. Ensure the environment variable `LLM_MODEL_ID` +is set correctly. Check spelling. Whenever this is changed, restart the containers to use +the newly selected model. 
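For reference, here is a minimal sketch of switching models, assuming `set_env.sh` has already been sourced in the current shell and that your HuggingFace token has access to the chosen model card:

```bash
# Override the model served by TGI, then recreate the containers so the change takes effect.
# The model card ID below is only an example; substitute any text-generation model your token can access.
export LLM_MODEL_ID="Qwen/Qwen2.5-Coder-7B-Instruct"
cd GenAIExamples/CodeGen/docker_compose/intel/cpu/xeon   # adjust the path if you are not at the clone root
docker compose down
docker compose up -d
```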
+ + +## Stop the services + +Once you are done with the entire pipeline and wish to stop and remove all the containers, use the command below: +``` +docker compose down +``` From 03ff4fb7c3aa4310e41c82ddfff4ad63fbaab406 Mon Sep 17 00:00:00 2001 From: devpramod Date: Fri, 24 Jan 2025 01:35:20 -0500 Subject: [PATCH 5/7] Add deployment example for CodeTrans (#281) * Add deployment example for CodeTrans Signed-off-by: devpramod * fix doc build Signed-off-by: devpramod * fix typos, grammar - xeon Signed-off-by: devpramod * fix typos, grammar - gaudi Signed-off-by: devpramod --------- Signed-off-by: devpramod Co-authored-by: Ying Hu --- examples/CodeTrans/CodeTrans_Guide.rst | 50 ++++ examples/CodeTrans/deploy/gaudi.md | 396 +++++++++++++++++++++++++ examples/CodeTrans/deploy/xeon.md | 387 ++++++++++++++++++++++++ examples/index.rst | 1 + 4 files changed, 834 insertions(+) create mode 100644 examples/CodeTrans/CodeTrans_Guide.rst create mode 100644 examples/CodeTrans/deploy/gaudi.md create mode 100644 examples/CodeTrans/deploy/xeon.md diff --git a/examples/CodeTrans/CodeTrans_Guide.rst b/examples/CodeTrans/CodeTrans_Guide.rst new file mode 100644 index 00000000..b56a1791 --- /dev/null +++ b/examples/CodeTrans/CodeTrans_Guide.rst @@ -0,0 +1,50 @@ +.. _CodeTrans_Guide: + +Code Translation Sample Guide +############################## + +.. note:: This guide is in its early development and is a work-in-progress with + placeholder content. + +Overview +******** + +This example showcases a code translation system that converts code from one programming language to another while preserving the original logic and functionality. The primary component is the CodeTrans MegaService, which encompasses an LLM microservice that performs the actual translation. +A lightweight Gateway service and a User Interface allow users to submit their source code in a given language and receive the translated output in another language. + +Purpose +******* +* **Enable code conversion and modernization**: Developers can seamlessly migrate legacy code to newer languages or frameworks, leveraging modern best practices without having to rewrite large code bases from scratch. + +* **Facilitate multi-language support**: By providing a system that understands multiple programming languages, organizations can unify their development approaches and reduce the barrier to adopting new languages. + +* **Improve developer productivity**: Automated code translation drastically reduces manual, time-consuming porting efforts, allowing developers to focus on higher-level tasks like feature design and optimization. + +How It Works +************ + +.. figure:: /GenAIExamples/CodeTrans/assets/img/code_trans_architecture.png + :alt: ChatQnA Architecture Diagram + +1. A user specifies the source language, the target language, and the snippet of code to be translated. This request is handled by the front-end UI or via a direct API call. + + +2. The user’s request is sent to the CodeTrans Gateway, which orchestrates the call to the LLM MicroService. The gateway handles details like constructing prompts and managing responses. + + +3. The large language model processes the user’s code snippet, analyzing syntax and semantics before generating an equivalent snippet in the target language. + +4. The gateway formats the model’s output and returns the translated code to the user, either via an API response or rendered within the UI. 
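For illustration, once the example is deployed, a translation request to the gateway looks like the following. The host and port assume the single-node deployments described below, which expose the CodeTrans MegaService on port 7777.

.. code-block:: bash

   # Example request to the CodeTrans gateway; host_ip is the machine running the services.
   curl http://${host_ip}:7777/v1/codetrans \
     -H "Content-Type: application/json" \
     -d '{"language_from": "Golang", "language_to": "Python", "source_code": "package main\n\nimport \"fmt\"\nfunc main() {\n fmt.Println(\"Hello, World!\");\n}"}'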
+ + +Deployment +********** +Here are some deployment options, depending on your hardware and environment: + +Single Node ++++++++++++++++ +.. toctree:: + :maxdepth: 1 + + Xeon Scalable Processor + Gaudi diff --git a/examples/CodeTrans/deploy/gaudi.md b/examples/CodeTrans/deploy/gaudi.md new file mode 100644 index 00000000..ce0a3026 --- /dev/null +++ b/examples/CodeTrans/deploy/gaudi.md @@ -0,0 +1,396 @@ +# # Single node on-prem deployment with TGI on Gaudi AI Accelerator + +This deployment section covers the single-node on-prem deployment of the CodeTrans example with OPEA comps using the Text Generation service based on TGI. The solution demonstrates building a code translation service using `mistralai/Mistral-7B-Instruct-v0.3` model deployed on the Intel® Gaudi® AI Accelerator. To quickly learn about OPEA in just 5 minutes and set up the required hardware and software, please follow the instructions in the [Getting Started](https://opea-project.github.io/latest/getting-started/README.html) section. + +## Overview + +In this tutorial, we will walk through how to enable the following microservices from OPEA GenAIComps to deploy a single node Text Generation megaservice solution for code translation: + +1. LLM with TGI +2. Nginx Service + +The solution demonstrates using the Mistral-7B-Instruct-v0.3 model on the Intel® Gaudi® AI Accelerator for translating code between different programming languages. We will go through how to set up docker containers to start the microservices and megaservice. Users can input code in one programming language and get it translated into another language. The solution is deployed with a basic UI accessible through both direct port and Nginx. + +## Prerequisites + +The first step is to clone the GenAIExamples and GenAIComps. GenAIComps are fundamental components used to build examples you find in GenAIExamples and deploy them as microservices. + +``` +git clone https://github.com/opea-project/GenAIComps.git +git clone https://github.com/opea-project/GenAIExamples.git +``` +The examples utilize model weights from HuggingFace. +Set up your [HuggingFace](https://huggingface.co/) account and +apply for model access to `Mistral-7B-Instruct-v0.3` which is a gated model. To obtain access for using the model, visit the [model site](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3) and click on `Agree and access repository`. + +Next, generate [user access token](https://huggingface.co/docs/transformers.js/en/guides/private#step-1-generating-a-user-access-token). + +Setup the HuggingFace token + +``` +export HUGGINGFACEHUB_API_TOKEN="Your_Huggingface_API_Token" +``` + +The example requires you to set the `host_ip` to deploy the microservices on the endpoint enabled with ports. Set the host_ip env variable. + +``` +export host_ip=$(hostname -I | awk '{print $1}') +``` + +Make sure to set Proxies if you are behind a firewall. + +```bash +export no_proxy=${your_no_proxy},$host_ip +export http_proxy=${your_http_proxy} +export https_proxy=${your_http_proxy} +``` + +## Prepare (Building / Pulling) Docker images + +This step involves either building or pulling four required Docker images. Each image serves a specific purpose in the CodeTrans architecture. + +::::::{tab-set} + +:::::{tab-item} Pull +:sync: Pull + +If you decide to pull the docker containers and not build them locally, you can proceed to [Use Case Setup](#use-case-setup). where all the necessary containers will be pulled in from the docker hub. 
+::::: +:::::{tab-item} Build +:sync: Build + +From within the `GenAIComps` folder, check out the release tag. +``` +cd GenAIComps +git checkout tags/v1.2 +``` + +### Build LLM Image + +First, build the Text Generation LLM service image: + +```bash +docker  build  -t  opea/llm-textgen:latest  --build-arg  https_proxy=$https_proxy  \ +--build-arg http_proxy=$http_proxy -f comps/llms/src/text-generation/Dockerfile . +``` + +>**Note**: `llm-textgen` uses Text Generation Inference (TGI) which is pulled automatically via the docker compose file in the next steps. + +### Build Nginx Image + +Build the Nginx service image that will handle routing: + +```bash +docker  build  -t  opea/nginx:latest  --build-arg  https_proxy=$https_proxy  \ +--build-arg http_proxy=$http_proxy -f comps/third_parties/nginx/src/Dockerfile . + +``` + +### Build MegaService Image + +The Megaservice is a pipeline that channels data through different microservices, each performing varied tasks. We define the different microservices and the flow of data between them in the  `code_translation.py` file, in this example, CodeTrans MegaService formats the input code and language parameters into a prompt template, sends it to the LLM microservice, and returns the translated code.. You can also add newer or remove some microservices and customize the megaservice to suit the needs. + +Build the megaservice image for this use case. + +```bash +git  clone  https://github.com/opea-project/GenAIExamples.git +cd  GenAIExamples/CodeTrans +git checkout tags/v1.2 +``` +``` +docker  build  -t  opea/codetrans:latest  --build-arg  https_proxy=$https_proxy  \ +--build-arg http_proxy=$http_proxy -f Dockerfile . +``` + +### Build UI Image + +Build the UI service image: + +```bash +cd  GenAIExamples/CodeTrans/ui +docker  build  -t  opea/codetrans-ui:latest  --build-arg  https_proxy=$https_proxy  \ +--build-arg http_proxy=$http_proxy -f ./docker/Dockerfile . +``` + +### Sanity Check + +Before proceeding, verify that you have all required Docker images by running `docker images`. You should see the following images: + +* opea/llm-textgen:latest +* opea/codetrans:latest +* opea/codetrans-ui:latest +* opea/nginx:latest + +::::: +:::::: + +## Use Case Setup + +The use case will use the following combination of the GenAIComps with the tools. + +| Use Case Components | Tools         | Model                                | Service Type         | +|---------------------|---------------|--------------------------------------|----------------------| +| LLM                 | TGI           | mistralai/Mistral-7B-Instruct-v0.3   | OPEA Microservice    | +| UI                  |               | NA                                   | Gateway Service      | +| Ingress             | Nginx         | NA                                   | Gateway Service      | + +Tools and models mentioned in the table are configurable either through the environment variable or `compose.yaml` + +Set the necessary environment variables to set the use case. + +```bash +cd GenAIExamples/CodeTrans/docker_compose +git checkout tags/v1.2 +source ./set_env.sh +``` +Set up a desired port for Nginx: +```bash +# Example: NGINX_PORT=80 +export  NGINX_PORT=${your_nginx_port} +``` + +## Deploy the use case + +In this tutorial, we will be deploying via docker compose with the provided YAML file. The docker compose instructions should start all the above-mentioned services as containers. 
+ +```bash +cd intel/hpu/gaudi +docker compose up -d +``` + +### Validate microservice + +#### Check Env Variables + +Check the startup log by `docker compose -f ./compose.yaml logs`. +The warning messages print out the variables if they are **NOT** set. + +ubuntu@xeon-vm:~/GenAIExamples/CodeTrans/docker_compose/intel/cpu/xeon$ docker compose -f ./compose.yaml up -d + +WARN[0000] The "no_proxy" variable is not set. Defaulting to a blank string. +WARN[0000] The "http_proxy" variable is not set. Defaulting to a blank string. + +#### Check the container status + +Check if all the containers launched via docker compose has started +For example, the CodeTrans example starts 5 docker (services), check these docker containers are all running, i.e., all the containers `STATUS` are `Up`. + +To do a quick sanity check, try `docker ps -a` to see if all the containers are running. + +``` +CONTAINER ID   IMAGE                                 COMMAND                  CREATED         STATUS                   PORTS                                       NAMES +a6d83e9fb44f   opea/nginx:latest                     "/docker-entrypoint.…"   8 minutes ago   Up 26 seconds            0.0.0.0:80->80/tcp, :::80->80/tcp           codetrans-gaudi-nginx-server +42af29c8a8b6   opea/codetrans-ui:latest              "docker-entrypoint.s…"   8 minutes ago   Up 27 seconds            0.0.0.0:5173->5173/tcp, :::5173->5173/tcp   codetrans-gaudi-ui-server +d995d76e7b52   opea/codetrans:latest                 "python code_transla…"   8 minutes ago   Up 27 seconds            0.0.0.0:7777->7777/tcp, :::7777->7777/tcp   codetrans-gaudi-backend-server +f40e954b107e   opea/llm-textgen:latest               "bash entrypoint.sh"     8 minutes ago   Up 27 seconds            0.0.0.0:9000->9000/tcp, :::9000->9000/tcp   llm-textgen-gaudi-server +0eade4fe0637   ghcr.io/huggingface/tgi-gaudi:2.0.6   "text-generation-lau…"   8 minutes ago   Up 8 minutes (healthy)   0.0.0.0:8008->80/tcp, :::8008->80/tcp       codetrans-tgi-service + +``` + + +## Interacting with CodeTrans deployment + +In this section, you will walk through the different ways to interact with the deployed microservices. + +### TGI Service + +In the first startup, this service will take more time to download the model files. After it's finished, the service will be ready. + +Try the command below to check whether the LLM serving is ready. +``` +docker logs ${CONTAINER_ID} | grep Connected +``` +If the service is ready, you will get a response like below. + +``` +2024-09-03T02:47:53.402023Z INFO text_generation_router::server: router/src/server.rs:2311: Connected +``` +```bash +curl  http://${host_ip}:8008/generate  \ +-X POST \ +-d  '{"inputs":" ### System: Please translate the following Golang codes into Python codes. ### Original codes: '\'''\'''\''Golang \npackage main\n\nimport \"fmt\"\nfunc main() {\n fmt.Println(\"Hello, World!\");\n '\'''\'''\'' ### Translated codes:","parameters":{"max_new_tokens":17, "do_sample": true}}'  \ +-H 'Content-Type: application/json' +``` + +TGI service generates text for the input prompt. Here is the expected result from TGI: +  +``` +{"generated_text":"'''Python\nprint(\"Hello, World!\")"} +``` +**NOTE**: After launching TGI, it takes a few minutes for the TGI server to load the LLM model and warm up. + +### Text Generation Microservice + +This service handles the core language model operations. 
You can validate it's working by sending a direct request to translate a simple "Hello World" program from Go to Python: + +```bash +curl http://${host_ip}:9000/v1/chat/completions \ + -X POST \ +  -d '{ + "query": "### System: Please translate the following Golang codes into Python codes. ### Original codes: ```Golang\npackage main\n\nimport \"fmt\"\nfunc main() {\n fmt.Println(\"Hello, World!\");\n}\n``` ### Translated codes:", + "max_tokens": 17 + }' \ + -H 'Content-Type: application/json' +``` +The expected output is as shown below: +``` +data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":"\n"}],"created":1737123223,"model":"mistralai/Mistral-7B-Instruct-v0.3","object":"text_completion","system_fingerprint":"2.4.0-sha-0a655a0-intel-cpu","usage":null} +data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":"\n"}],"created":1737123223,"model":"mistralai/Mistral-7B-Instruct-v0.3","object":"text_completion","system_fingerprint":"2.4.0-sha-0a655a0-intel-cpu","usage":null} +data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":"``"}],"created":1737123223,"model":"mistralai/Mistral-7B-Instruct-v0.3","object":"text_completion","system_fingerprint":"2.4.0-sha-0a655a0-intel-cpu","usage":null} +data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":"`"}],"created":1737123224,"model":"mistralai/Mistral-7B-Instruct-v0.3","object":"text_completion","system_fingerprint":"2.4.0-sha-0a655a0-intel-cpu","usage":null} +data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":"Py"}],"created":1737123224,"model":"mistralai/Mistral-7B-Instruct-v0.3","object":"text_completion","system_fingerprint":"2.4.0-sha-0a655a0-intel-cpu","usage":null} +data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":"thon"}],"created":1737123224,"model":"mistralai/Mistral-7B-Instruct-v0.3","object":"text_completion","system_fingerprint":"2.4.0-sha-0a655a0-intel-cpu","usage":null} +data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":"\n"}],"created":1737123224,"model":"mistralai/Mistral-7B-Instruct-v0.3","object":"text_completion","system_fingerprint":"2.4.0-sha-0a655a0-intel-cpu","usage":null} +data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":"print"}],"created":1737123224,"model":"mistralai/Mistral-7B-Instruct-v0.3","object":"text_completion","system_fingerprint":"2.4.0-sha-0a655a0-intel-cpu","usage":null} +data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":"(\""}],"created":1737123224,"model":"mistralai/Mistral-7B-Instruct-v0.3","object":"text_completion","system_fingerprint":"2.4.0-sha-0a655a0-intel-cpu","usage":null} +data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":"Hello"}],"created":1737123224,"model":"mistralai/Mistral-7B-Instruct-v0.3","object":"text_completion","system_fingerprint":"2.4.0-sha-0a655a0-intel-cpu","usage":null} +data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":","}],"created":1737123224,"model":"mistralai/Mistral-7B-Instruct-v0.3","object":"text_completion","system_fingerprint":"2.4.0-sha-0a655a0-intel-cpu","usage":null} +data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":" World"}],"created":1737123225,"model":"mistralai/Mistral-7B-Instruct-v0.3","object":"text_completion","system_fingerprint":"2.4.0-sha-0a655a0-intel-cpu","usage":null} +data: 
{"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":"!"}],"created":1737123225,"model":"mistralai/Mistral-7B-Instruct-v0.3","object":"text_completion","system_fingerprint":"2.4.0-sha-0a655a0-intel-cpu","usage":null} +data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":"\")"}],"created":1737123225,"model":"mistralai/Mistral-7B-Instruct-v0.3","object":"text_completion","system_fingerprint":"2.4.0-sha-0a655a0-intel-cpu","usage":null} +data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":"\n"}],"created":1737123225,"model":"mistralai/Mistral-7B-Instruct-v0.3","object":"text_completion","system_fingerprint":"2.4.0-sha-0a655a0-intel-cpu","usage":null} +data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":"``"}],"created":1737123225,"model":"mistralai/Mistral-7B-Instruct-v0.3","object":"text_completion","system_fingerprint":"2.4.0-sha-0a655a0-intel-cpu","usage":null} +data: {"id":"","choices":[{"finish_reason":"length","index":0,"logprobs":null,"text":"`"}],"created":1737123225,"model":"mistralai/Mistral-7B-Instruct-v0.3","object":"text_completion","system_fingerprint":"2.4.0-sha-0a655a0-intel-cpu","usage":{"completion_tokens":17,"prompt_tokens":58,"total_tokens":75,"completion_tokens_details":null,"prompt_tokens_details":null}} +data: [DONE] +``` + +### MegaService + +The CodeTrans megaservice orchestrates the entire translation process. Test it with a simple code translation request: + +```bash +curl  http://${host_ip}:7777/v1/codetrans  \ +-H "Content-Type: application/json" \ +-d  '{"language_from": "Golang","language_to": "Python","source_code": "package main\n\nimport \"fmt\"\nfunc main() {\n fmt.Println(\"Hello, World!\");\n}"}' +``` +When you send this request, you’ll receive a streaming response from the MegaService. 
It will appear line by line like so: +``` +data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":"\n"}],"created":1737121307,"model":"mistralai/Mistral-7B-Instruct-v0.3","object":"text_completion","system_fingerprint":"2.4.0-sha-0a655a0-intel-cpu","usage":null} +data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":"\n"}],"created":1737121307,"model":"mistralai/Mistral-7B-Instruct-v0.3","object":"text_completion","system_fingerprint":"2.4.0-sha-0a655a0-intel-cpu","usage":null} +data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":"        "}],"created":1737121307,"model":"mistralai/Mistral-7B-Instruct-v0.3","object":"text_completion","system_fingerprint":"2.4.0-sha-0a655a0-intel-cpu","usage":null} +data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":" Python"}],"created":1737121307,"model":"mistralai/Mistral-7B-Instruct-v0.3","object":"text_completion","system_fingerprint":"2.4.0-sha-0a655a0-intel-cpu","usage":null} +data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":"\n"}],"created":1737121307,"model":"mistralai/Mistral-7B-Instruct-v0.3","object":"text_completion","system_fingerprint":"2.4.0-sha-0a655a0-intel-cpu","usage":null} +data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":"\n"}],"created":1737121307,"model":"mistralai/Mistral-7B-Instruct-v0.3","object":"text_completion","system_fingerprint":"2.4.0-sha-0a655a0-intel-cpu","usage":null} +data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":"        "}],"created":1737121308,"model":"mistralai/Mistral-7B-Instruct-v0.3","object":"text_completion","system_fingerprint":"2.4.0-sha-0a655a0-intel-cpu","usage":null} +data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":" print"}],"created":1737121308,"model":"mistralai/Mistral-7B-Instruct-v0.3","object":"text_completion","system_fingerprint":"2.4.0-sha-0a655a0-intel-cpu","usage":null} +data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":"(\""}],"created":1737121308,"model":"mistralai/Mistral-7B-Instruct-v0.3","object":"text_completion","system_fingerprint":"2.4.0-sha-0a655a0-intel-cpu","usage":null} +data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":"Hello"}],"created":1737121308,"model":"mistralai/Mistral-7B-Instruct-v0.3","object":"text_completion","system_fingerprint":"2.4.0-sha-0a655a0-intel-cpu","usage":null} +data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":","}],"created":1737121308,"model":"mistralai/Mistral-7B-Instruct-v0.3","object":"text_completion","system_fingerprint":"2.4.0-sha-0a655a0-intel-cpu","usage":null} +data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":" World"}],"created":1737121308,"model":"mistralai/Mistral-7B-Instruct-v0.3","object":"text_completion","system_fingerprint":"2.4.0-sha-0a655a0-intel-cpu","usage":null} +data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":"!"}],"created":1737121308,"model":"mistralai/Mistral-7B-Instruct-v0.3","object":"text_completion","system_fingerprint":"2.4.0-sha-0a655a0-intel-cpu","usage":null} +data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":"\")"}],"created":1737121308,"model":"mistralai/Mistral-7B-Instruct-v0.3","object":"text_completion","system_fingerprint":"2.4.0-sha-0a655a0-intel-cpu","usage":null} +data: 
{"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":"\n"}],"created":1737121309,"model":"mistralai/Mistral-7B-Instruct-v0.3","object":"text_completion","system_fingerprint":"2.4.0-sha-0a655a0-intel-cpu","usage":null} +data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":"        "}],"created":1737121309,"model":"mistralai/Mistral-7B-Instruct-v0.3","object":"text_completion","system_fingerprint":"2.4.0-sha-0a655a0-intel-cpu","usage":null} +data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":" ```"}],"created":1737121309,"model":"mistralai/Mistral-7B-Instruct-v0.3","object":"text_completion","system_fingerprint":"2.4.0-sha-0a655a0-intel-cpu","usage":null} +data: {"id":"","choices":[{"finish_reason":"eos_token","index":0,"logprobs":null,"text":""}],"created":1737121309,"model":"mistralai/Mistral-7B-Instruct-v0.3","object":"text_completion","system_fingerprint":"2.4.0-sha-0a655a0-intel-cpu","usage":{"completion_tokens":18,"prompt_tokens":74,"total_tokens":92,"completion_tokens_details":null,"prompt_tokens_details":null}} +data: [DONE] +``` +Within this output, each line contains JSON that includes a `text` field. Once you combine the `text` values in order, you’ll reconstruct the translated code. In this example, the final code is simply: +``` +print("Hello, World!") +``` +This demonstrates how the MegaService streams each segment of the response, which you can then piece together to get the complete translation. + +### Nginx Service + +The Nginx service acts as a reverse proxy and load balancer for the application. You can verify it's properly routing requests by sending the same translation request through Nginx: + +```bash +curl  http://${host_ip}:${NGINX_PORT}/v1/codetrans  \ +-H "Content-Type: application/json" \ +-d  '{"language_from": "Golang","language_to": "Python","source_code": "package main\n\nimport \"fmt\"\nfunc main() {\n fmt.Println(\"Hello, World!\");\n}"}' +``` +The expected output is the same as the MegaService output. + +Each of these endpoints should return a successful response with the translated Python code. If any of these tests fail, check the corresponding service logs for more details. + +## Check the docker container logs + +Following is an example of debugging using Docker logs: + +Check the log of the container using: + +`docker logs -t` + +Check the log using `docker logs 0eade4fe0637 -t`. + +``` +2024-06-05T01:30:30.695934928Z error: a value is required for '--model-id ' but none was supplied + +2024-06-05T01:30:30.697123534Z + +2024-06-05T01:30:30.697148330Z For more information, try '--help'. +``` +The log indicates the `MODEL_ID` is not set. 
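A minimal recovery sketch (assuming the variable was simply not exported in the current shell; re-running `set_env.sh` from `GenAIExamples/CodeTrans/docker_compose` also restores it):

```bash
# Re-export the model ID used in this tutorial, then recreate the TGI service so it picks
# up the variable. Adjust the path if you deployed from a different directory.
export LLM_MODEL_ID="mistralai/Mistral-7B-Instruct-v0.3"
cd GenAIExamples/CodeTrans/docker_compose/intel/hpu/gaudi
docker compose up -d
```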
+ +View the docker input parameters in `./CodeTrans/docker_compose/intel/hpu/gaudi/compose.yaml` +``` +tgi-service: + image: ghcr.io/huggingface/tgi-gaudi:2.0.6 + container_name: codetrans-tgi-service + ports: + - "8008:80" + volumes: + - "./data:/data" + environment: + no_proxy: ${no_proxy} + http_proxy: ${http_proxy} + https_proxy: ${https_proxy} + HABANA_VISIBLE_DEVICES: all + OMPI_MCA_btl_vader_single_copy_mechanism: none + HUGGING_FACE_HUB_TOKEN: ${HUGGINGFACEHUB_API_TOKEN} + ENABLE_HPU_GRAPH: true + LIMIT_HPU_GRAPH: true + USE_FLASH_ATTENTION: true + FLASH_ATTENTION_RECOMPUTE: true + healthcheck: + test: ["CMD-SHELL", "sleep 500 && exit 0"] + interval: 1s + timeout: 505s + retries: 1 + runtime: habana + cap_add: + - SYS_NICE + ipc: host + command: --model-id ${LLM_MODEL_ID} --max-input-length 1024 --max-total-tokens 2048 +``` +The input `MODEL_ID` is `${LLM_MODEL_ID}` + +Check environment variable `LLM_MODEL_ID` is set correctly, and spelled correctly. + +Set the `LLM_MODEL_ID` then restart the containers. + +You can also check overall logs with the following command, where the +`compose.yaml` is the MegaService docker-compose configuration file. +``` +docker compose -f ./docker_compose/intel/hpu/gaudi/compose.yaml logs +``` +## Launch UI + +### Basic UI + +To access the frontend user interface (UI), the primary method is through the Nginx reverse proxy service. Open the following URL in your browser: `http://${host_ip}:${NGINX_PORT}`. This provides a stable and secure access point to the UI. The value of `${NGINX_PORT}` has been defined in the earlier steps. + +Alternatively, you can access the UI directly using its internal port. This method bypasses the Nginx proxy and can be used for testing or troubleshooting purposes. To access the UI directly, open the following URL in your browser: http://${host_ip}:5173. By default, the UI runs on port 5173. + +If you need to change the port used to access the UI directly (not through Nginx), modify the ports section of the `compose.yaml` file: + +``` +codetrans-gaudi-ui-server: + image: ${REGISTRY:-opea}/codetrans-ui:${TAG:-latest} + container_name: codetrans-gaudi-ui-server + depends_on: + - codetrans-gaudi-backend-server + ports: + - "YOUR_HOST_PORT:5173" # Change YOUR_HOST_PORT to your desired port +``` +Remember to replace YOUR_HOST_PORT with your preferred host port number. After making this change, you will need to rebuild and restart your containers for the change to take effect. + + +### Stop the services + +Once you are done with the entire pipeline and wish to stop and remove all the containers, use the command below: + +``` +docker compose -f compose.yaml down +``` \ No newline at end of file diff --git a/examples/CodeTrans/deploy/xeon.md b/examples/CodeTrans/deploy/xeon.md new file mode 100644 index 00000000..e74d2df5 --- /dev/null +++ b/examples/CodeTrans/deploy/xeon.md @@ -0,0 +1,387 @@ +# Single node on-prem deployment with TGI on Xeon Scalable processors + +This deployment section covers the single-node on-prem deployment of the CodeTrans example with OPEA comps using the Text Generation service based on TGI. The solution demonstrates building a code translation service using the `mistralai/Mistral-7B-Instruct-v0.3` model deployed on Intel® Xeon® Scalable processors. To quickly learn about OPEA in just 5 minutes and set up the required hardware and software, please follow the instructions in the [Getting Started](https://opea-project.github.io/latest/getting-started/README.html) section. 
+ +## Overview + +In this tutorial, we will walk through how to enable the following microservices from OPEA GenAIComps to deploy a single node Text Generation megaservice solution for code translation: + +1. LLM with TGI +2. Nginx Service + +The solution demonstrates using the Mistral-7B-Instruct-v0.3 model on Intel Xeon Scalable processors to translate code between different programming languages. We will go through setting up Docker containers to start the microservices and megaservice. Users can input code in one programming language and get it translated to another. The solution is deployed with a basic UI accessible through both direct port and Nginx. + +## Prerequisites + +The first step is to clone the GenAIExamples and GenAIComps. GenAIComps are fundamental components used to build examples you find in GenAIExamples and deploy them as microservices. + +``` +git clone https://github.com/opea-project/GenAIComps.git +git clone https://github.com/opea-project/GenAIExamples.git +``` +The examples utilize model weights from HuggingFace. +Set up your [HuggingFace](https://huggingface.co/) account and +apply for model access to `Mistral-7B-Instruct-v0.3` which is a gated model. To obtain access for using the model, visit the [model site](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3) and click on `Agree and access repository`. + +Next, generate [user access token](https://huggingface.co/docs/transformers.js/en/guides/private#step-1-generating-a-user-access-token). + +Setup the HuggingFace token + +``` +export HUGGINGFACEHUB_API_TOKEN="Your_Huggingface_API_Token" +``` + +The example requires you to set the `host_ip` to deploy the microservices on the endpoint enabled with ports. Set the host_ip env variable. + +``` +export host_ip=$(hostname -I | awk '{print $1}') +``` + +Make sure to set Proxies if you are behind a firewall. + +```bash +export no_proxy=${your_no_proxy},$host_ip +export http_proxy=${your_http_proxy} +export https_proxy=${your_http_proxy} +``` + +## Prepare (Building / Pulling) Docker images + +This step involves either building or pulling four required Docker images. Each image serves a specific purpose in the CodeTrans architecture. + +::::::{tab-set} + +:::::{tab-item} Pull +:sync: Pull + +If you decide to pull the docker containers and not build them locally, you can proceed to [Use Case Setup](#use-case-setup). where all the necessary containers will be pulled in from the docker hub. +::::: +:::::{tab-item} Build +:sync: Build + +From within the `GenAIComps` folder, check out the release tag. +``` +cd GenAIComps +git checkout tags/v1.2 +``` + +### Build LLM Image + +First, build the Text Generation LLM service image: + +```bash +docker build -t opea/llm-textgen:latest --build-arg https_proxy=$https_proxy \ +--build-arg http_proxy=$http_proxy -f comps/llms/src/text-generation/Dockerfile . +``` + +>**Note**: `llm-textgen` uses Text Generation Inference (TGI) which is pulled automatically via the docker compose file in the next steps. + +### Build Nginx Image + +Build the Nginx service image that will handle routing: + +```bash +docker build -t opea/nginx:latest --build-arg https_proxy=$https_proxy \ +--build-arg http_proxy=$http_proxy -f comps/third_parties/nginx/src/Dockerfile . + +``` + +### Build MegaService Image + +The Megaservice is a pipeline that channels data through different microservices, each performing varied tasks. 
We define the different microservices and the flow of data between them in the `code_translation.py` file, in this example, CodeTrans MegaService formats the input code and language parameters into a prompt template, sends it to the LLM microservice, and returns the translated code. You can also add newer or remove some microservices and customize the megaservice to suit the needs. + +Build the megaservice image for this use case. + +```bash +git clone https://github.com/opea-project/GenAIExamples.git +cd GenAIExamples/CodeTrans +git checkout tags/v1.2 +``` +``` +docker build -t opea/codetrans:latest --build-arg https_proxy=$https_proxy \ +--build-arg http_proxy=$http_proxy -f Dockerfile . +``` + +### Build UI Image + +Build the UI service image: + +```bash +cd GenAIExamples/CodeTrans/ui +docker build -t opea/codetrans-ui:latest --build-arg https_proxy=$https_proxy \ +--build-arg http_proxy=$http_proxy -f ./docker/Dockerfile . +``` + +### Sanity Check + +Before proceeding, verify that you have all required Docker images by running `docker images`. You should see the following images: + +* opea/llm-textgen:latest +* opea/codetrans:latest +* opea/codetrans-ui:latest +* opea/nginx:latest + +::::: +:::::: + +## Use Case Setup + +The use case will use the following combination of the GenAIComps with the tools. + +| Use Case Components | Tools | Model | Service Type | +|---------------------|---------------|--------------------------------------|----------------------| +| LLM | TGI | mistralai/Mistral-7B-Instruct-v0.3 | OPEA Microservice | +| UI | | NA | Gateway Service | +| Ingress | Nginx | NA | Gateway Service | + +Tools and models mentioned in the table are configurable either through the environment variable or `compose.yaml` + +Set the necessary environment variables to set the use case. + +```bash +cd GenAIExamples/CodeTrans/docker_compose +git checkout tags/v1.2 +source ./set_env.sh +``` +Set up a desired port for Nginx: +```bash +# Example: NGINX_PORT=80 +export NGINX_PORT=${your_nginx_port} +``` + +## Deploy the use case + +In this tutorial, we will be deploying via docker compose with the provided YAML file. The docker compose instructions should start all the above-mentioned services as containers. + +```bash +cd intel/cpu/xeon +docker compose up -d +``` + +### Validate microservice + +#### Check Env Variables + +Check the startup log by `docker compose -f ./compose.yaml logs`. +The warning messages print out the variables if they are **NOT** set. + +ubuntu@xeon-vm:~/GenAIExamples/CodeTrans/docker_compose/intel/cpu/xeon$ docker compose -f ./compose.yaml up -d + +WARN[0000] The "no_proxy" variable is not set. Defaulting to a blank string. +WARN[0000] The "http_proxy" variable is not set. Defaulting to a blank string. + +#### Check the container status + +Check if all the containers launched via docker compose have started. +For example, the CodeTrans example starts 5 docker containers (services), check these docker containers are all running, i.e., all the containers `STATUS` are `Up`. + +To do a quick sanity check, try `docker ps -a` to see if all the containers are running. 
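To limit the listing to just this deployment's services (a sketch; run it from the same directory as the compose file), `docker compose ps` can be used instead:

```bash
# Show only the services defined in this compose file and their current status.
docker compose -f compose.yaml ps
```

The full `docker ps -a` output for a healthy deployment looks similar to: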
+ +``` +| CONTAINER ID | IMAGE | COMMAND | CREATED | STATUS | PORTS | NAMES | +|--------------|-------------------------------------------------------------------|---------------------------|----------------|------------------------------------|---------------------------------------------|---------------------------------| +| 0744c6693a64 | opea/nginx:latest | `/docker-entrypoint.…` | 20 minutes ago | Up 9 minutes | 0.0.0.0:80->80/tcp, :::80->80/tcp | codetrans-xeon-nginx-server | +| 1e9c8c900843 | opea/codetrans-ui:latest | `docker-entrypoint.s…` | 20 minutes ago | Up 9 minutes | 0.0.0.0:5173->5173/tcp, :::5173->5173/tcp | codetrans-xeon-ui-server | +| 3ed57de43648 | opea/codetrans:latest | `python code_transla…` | 20 minutes ago | Up 9 minutes | 0.0.0.0:7777->7777/tcp, :::7777->7777/tcp | codetrans-xeon-backend-server | +| 29d0fe6382dd | opea/llm-textgen:latest | `bash entrypoint.sh` | 20 minutes ago | Up 9 minutes | 0.0.0.0:9000->9000/tcp, :::9000->9000/tcp | llm-textgen-server | +| e1b37ad9e078 | ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu | `text-generation-lau…` | 20 minutes ago | Up 13 minutes (healthy) | 0.0.0.0:8008->80/tcp, [::]:8008->80/tcp | codetrans-tgi-service | + +``` + +## Interacting with CodeTrans deployment + +In this section, you will walk through the different ways to interact with the deployed microservices. + +### TGI Service + +In the first startup, this service will take more time to download the model files. After it's finished, the service will be ready. + +Try the command below to check whether the LLM serving is ready. +``` +docker logs ${CONTAINER_ID} | grep Connected +``` +If the service is ready, you will get a response like below. + +``` +2024-09-03T02:47:53.402023Z INFO text_generation_router::server: router/src/server.rs:2311: Connected +``` +```bash +curl http://${host_ip}:8008/generate \ +-X POST \ +-d '{"inputs":" ### System: Please translate the following Golang codes into Python codes. ### Original codes: '\'''\'''\''Golang \npackage main\n\nimport \"fmt\"\nfunc main() {\n fmt.Println(\"Hello, World!\");\n '\'''\'''\'' ### Translated codes:","parameters":{"max_new_tokens":17, "do_sample": true}}' \ +-H 'Content-Type: application/json' +``` + +TGI service generates text for the input prompt. Here is the expected result from TGI: + +``` +{"generated_text":"'''Python\nprint(\"Hello, World!\")"} +``` +**NOTE**: After launching TGI, it takes a few minutes for the TGI server to load the LLM model and warm up. + +### Text Generation Microservice + +This service handles the core language model operations. You can validate it's working by sending a direct request to translate a simple "Hello World" program from Go to Python: + +```bash +curl http://${host_ip}:9000/v1/chat/completions \ + -X POST \ + -d '{ + "query": "### System: Please translate the following Golang codes into Python codes. 
### Original codes: ```Golang\npackage main\n\nimport \"fmt\"\nfunc main() {\n fmt.Println(\"Hello, World!\");\n}\n``` ### Translated codes:", + "max_tokens": 17 + }' \ + -H 'Content-Type: application/json' +``` +The expected output is as shown below: +``` +data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":"\n"}],"created":1737123223,"model":"mistralai/Mistral-7B-Instruct-v0.3","object":"text_completion","system_fingerprint":"2.4.0-sha-0a655a0-intel-cpu","usage":null} +data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":"\n"}],"created":1737123223,"model":"mistralai/Mistral-7B-Instruct-v0.3","object":"text_completion","system_fingerprint":"2.4.0-sha-0a655a0-intel-cpu","usage":null} +data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":"``"}],"created":1737123223,"model":"mistralai/Mistral-7B-Instruct-v0.3","object":"text_completion","system_fingerprint":"2.4.0-sha-0a655a0-intel-cpu","usage":null} +data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":"`"}],"created":1737123224,"model":"mistralai/Mistral-7B-Instruct-v0.3","object":"text_completion","system_fingerprint":"2.4.0-sha-0a655a0-intel-cpu","usage":null} +data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":"Py"}],"created":1737123224,"model":"mistralai/Mistral-7B-Instruct-v0.3","object":"text_completion","system_fingerprint":"2.4.0-sha-0a655a0-intel-cpu","usage":null} +data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":"thon"}],"created":1737123224,"model":"mistralai/Mistral-7B-Instruct-v0.3","object":"text_completion","system_fingerprint":"2.4.0-sha-0a655a0-intel-cpu","usage":null} +data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":"\n"}],"created":1737123224,"model":"mistralai/Mistral-7B-Instruct-v0.3","object":"text_completion","system_fingerprint":"2.4.0-sha-0a655a0-intel-cpu","usage":null} +data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":"print"}],"created":1737123224,"model":"mistralai/Mistral-7B-Instruct-v0.3","object":"text_completion","system_fingerprint":"2.4.0-sha-0a655a0-intel-cpu","usage":null} +data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":"(\""}],"created":1737123224,"model":"mistralai/Mistral-7B-Instruct-v0.3","object":"text_completion","system_fingerprint":"2.4.0-sha-0a655a0-intel-cpu","usage":null} +data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":"Hello"}],"created":1737123224,"model":"mistralai/Mistral-7B-Instruct-v0.3","object":"text_completion","system_fingerprint":"2.4.0-sha-0a655a0-intel-cpu","usage":null} +data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":","}],"created":1737123224,"model":"mistralai/Mistral-7B-Instruct-v0.3","object":"text_completion","system_fingerprint":"2.4.0-sha-0a655a0-intel-cpu","usage":null} +data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":" World"}],"created":1737123225,"model":"mistralai/Mistral-7B-Instruct-v0.3","object":"text_completion","system_fingerprint":"2.4.0-sha-0a655a0-intel-cpu","usage":null} +data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":"!"}],"created":1737123225,"model":"mistralai/Mistral-7B-Instruct-v0.3","object":"text_completion","system_fingerprint":"2.4.0-sha-0a655a0-intel-cpu","usage":null} +data: 
{"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":"\")"}],"created":1737123225,"model":"mistralai/Mistral-7B-Instruct-v0.3","object":"text_completion","system_fingerprint":"2.4.0-sha-0a655a0-intel-cpu","usage":null} +data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":"\n"}],"created":1737123225,"model":"mistralai/Mistral-7B-Instruct-v0.3","object":"text_completion","system_fingerprint":"2.4.0-sha-0a655a0-intel-cpu","usage":null} +data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":"``"}],"created":1737123225,"model":"mistralai/Mistral-7B-Instruct-v0.3","object":"text_completion","system_fingerprint":"2.4.0-sha-0a655a0-intel-cpu","usage":null} +data: {"id":"","choices":[{"finish_reason":"length","index":0,"logprobs":null,"text":"`"}],"created":1737123225,"model":"mistralai/Mistral-7B-Instruct-v0.3","object":"text_completion","system_fingerprint":"2.4.0-sha-0a655a0-intel-cpu","usage":{"completion_tokens":17,"prompt_tokens":58,"total_tokens":75,"completion_tokens_details":null,"prompt_tokens_details":null}} +data: [DONE] +``` + +### MegaService + +The CodeTrans megaservice orchestrates the entire translation process. Test it with a simple code translation request: + +```bash +curl http://${host_ip}:7777/v1/codetrans \ +-H "Content-Type: application/json" \ +-d '{"language_from": "Golang","language_to": "Python","source_code": "package main\n\nimport \"fmt\"\nfunc main() {\n fmt.Println(\"Hello, World!\");\n}"}' +``` +When you send this request, you’ll receive a streaming response from the MegaService. It will appear line by line like so: +``` +data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":"\n"}],"created":1737121307,"model":"mistralai/Mistral-7B-Instruct-v0.3","object":"text_completion","system_fingerprint":"2.4.0-sha-0a655a0-intel-cpu","usage":null} +data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":"\n"}],"created":1737121307,"model":"mistralai/Mistral-7B-Instruct-v0.3","object":"text_completion","system_fingerprint":"2.4.0-sha-0a655a0-intel-cpu","usage":null} +data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":" "}],"created":1737121307,"model":"mistralai/Mistral-7B-Instruct-v0.3","object":"text_completion","system_fingerprint":"2.4.0-sha-0a655a0-intel-cpu","usage":null} +data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":" Python"}],"created":1737121307,"model":"mistralai/Mistral-7B-Instruct-v0.3","object":"text_completion","system_fingerprint":"2.4.0-sha-0a655a0-intel-cpu","usage":null} +data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":"\n"}],"created":1737121307,"model":"mistralai/Mistral-7B-Instruct-v0.3","object":"text_completion","system_fingerprint":"2.4.0-sha-0a655a0-intel-cpu","usage":null} +data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":"\n"}],"created":1737121307,"model":"mistralai/Mistral-7B-Instruct-v0.3","object":"text_completion","system_fingerprint":"2.4.0-sha-0a655a0-intel-cpu","usage":null} +data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":" "}],"created":1737121308,"model":"mistralai/Mistral-7B-Instruct-v0.3","object":"text_completion","system_fingerprint":"2.4.0-sha-0a655a0-intel-cpu","usage":null} +data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":" 
print"}],"created":1737121308,"model":"mistralai/Mistral-7B-Instruct-v0.3","object":"text_completion","system_fingerprint":"2.4.0-sha-0a655a0-intel-cpu","usage":null} +data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":"(\""}],"created":1737121308,"model":"mistralai/Mistral-7B-Instruct-v0.3","object":"text_completion","system_fingerprint":"2.4.0-sha-0a655a0-intel-cpu","usage":null} +data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":"Hello"}],"created":1737121308,"model":"mistralai/Mistral-7B-Instruct-v0.3","object":"text_completion","system_fingerprint":"2.4.0-sha-0a655a0-intel-cpu","usage":null} +data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":","}],"created":1737121308,"model":"mistralai/Mistral-7B-Instruct-v0.3","object":"text_completion","system_fingerprint":"2.4.0-sha-0a655a0-intel-cpu","usage":null} +data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":" World"}],"created":1737121308,"model":"mistralai/Mistral-7B-Instruct-v0.3","object":"text_completion","system_fingerprint":"2.4.0-sha-0a655a0-intel-cpu","usage":null} +data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":"!"}],"created":1737121308,"model":"mistralai/Mistral-7B-Instruct-v0.3","object":"text_completion","system_fingerprint":"2.4.0-sha-0a655a0-intel-cpu","usage":null} +data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":"\")"}],"created":1737121308,"model":"mistralai/Mistral-7B-Instruct-v0.3","object":"text_completion","system_fingerprint":"2.4.0-sha-0a655a0-intel-cpu","usage":null} +data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":"\n"}],"created":1737121309,"model":"mistralai/Mistral-7B-Instruct-v0.3","object":"text_completion","system_fingerprint":"2.4.0-sha-0a655a0-intel-cpu","usage":null} +data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":" "}],"created":1737121309,"model":"mistralai/Mistral-7B-Instruct-v0.3","object":"text_completion","system_fingerprint":"2.4.0-sha-0a655a0-intel-cpu","usage":null} +data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":" ```"}],"created":1737121309,"model":"mistralai/Mistral-7B-Instruct-v0.3","object":"text_completion","system_fingerprint":"2.4.0-sha-0a655a0-intel-cpu","usage":null} +data: {"id":"","choices":[{"finish_reason":"eos_token","index":0,"logprobs":null,"text":""}],"created":1737121309,"model":"mistralai/Mistral-7B-Instruct-v0.3","object":"text_completion","system_fingerprint":"2.4.0-sha-0a655a0-intel-cpu","usage":{"completion_tokens":18,"prompt_tokens":74,"total_tokens":92,"completion_tokens_details":null,"prompt_tokens_details":null}} +data: [DONE] +``` +Within this output, each line contains JSON that includes a `text` field. Once you combine the `text` values in order, you’ll reconstruct the translated code. In this example, the final code is simply: +``` +print("Hello, World!") +``` +This demonstrates how the MegaService streams each segment of the response, which you can then piece together to get the complete translation. + +### Nginx Service + +The Nginx service acts as a reverse proxy and load balancer for the application. 
You can verify that it is properly routing requests by sending the same translation request through Nginx:
+
+```bash
+curl http://${host_ip}:${NGINX_PORT}/v1/codetrans \
+-H "Content-Type: application/json" \
+-d '{"language_from": "Golang","language_to": "Python","source_code": "package main\n\nimport \"fmt\"\nfunc main() {\n fmt.Println(\"Hello, World!\");\n}"}'
+```
+The expected output is the same as the MegaService output.
+
+Each of these endpoints should return a successful response with the translated Python code. If any of these tests fail, check the corresponding service logs for more details.
+
+## Check the docker container logs
+
+The following is an example of debugging using Docker logs.
+
+Check the log of a container using:
+
+`docker logs <CONTAINER ID> -t`
+
+For example, check the log of the TGI service container using `docker logs e1b37ad9e078 -t`.
+
+```
+2024-06-05T01:30:30.695934928Z error: a value is required for '--model-id <MODEL_ID>' but none was supplied
+
+2024-06-05T01:30:30.697123534Z
+
+2024-06-05T01:30:30.697148330Z For more information, try '--help'.
+```
+The log indicates that `MODEL_ID` is not set.
+
+View the Docker input parameters in `./CodeTrans/docker_compose/intel/cpu/xeon/compose.yaml`:
+```
+tgi-service:
+  image: ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu
+  container_name: codetrans-tgi-service
+  ports:
+    - "8008:80"
+  volumes:
+    - "./data:/data"
+  shm_size: 1g
+  environment:
+    no_proxy: ${no_proxy}
+    http_proxy: ${http_proxy}
+    https_proxy: ${https_proxy}
+    HF_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
+    host_ip: ${host_ip}
+  healthcheck:
+    test: ["CMD-SHELL", "curl -f http://$host_ip:8008/health || exit 1"]
+    interval: 10s
+    timeout: 10s
+    retries: 100
+  command: --model-id ${LLM_MODEL_ID} --cuda-graphs 0
+```
+The input `MODEL_ID` is `${LLM_MODEL_ID}`.
+
+Check that the environment variable `LLM_MODEL_ID` is set and spelled correctly.
+
+Set `LLM_MODEL_ID` and then restart the containers.
+You can also check the overall logs with the following command, where
+`compose.yaml` is the MegaService docker-compose configuration file.
+```
+docker compose -f ./docker_compose/intel/cpu/xeon/compose.yaml logs
+```
+## Launch UI
+
+### Basic UI
+
+To access the frontend user interface (UI), the primary method is through the Nginx reverse proxy service. Open the following URL in your browser: http://${host_ip}:${NGINX_PORT}. This provides a stable and secure access point to the UI. The value of ${NGINX_PORT} has been defined in the earlier steps.
+
+Alternatively, you can access the UI directly using its internal port. This method bypasses the Nginx proxy and can be used for testing or troubleshooting purposes. To access the UI directly, open the following URL in your browser: http://${host_ip}:5173. By default, the UI runs on port 5173.
+
+If you need to change the port used to access the UI directly (not through Nginx), modify the ports section of the `compose.yaml` file:
+
+```
+codetrans-xeon-ui-server:
+  image: ${REGISTRY:-opea}/codetrans-ui:${TAG:-latest}
+  container_name: codetrans-xeon-ui-server
+  depends_on:
+    - codetrans-xeon-backend-server
+  ports:
+    - "YOUR_HOST_PORT:5173" # Change YOUR_HOST_PORT to your desired port
+```
+Remember to replace YOUR_HOST_PORT with your preferred host port number. After making this change, recreate and restart the UI container for the new port mapping to take effect, as shown below.
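+
+The following is a minimal sketch of one way to do that, assuming the default service name from the compose file shown above and the deployment path used earlier in this tutorial:
+
+```bash
+# Recreate only the UI service so the new host port mapping takes effect.
+cd GenAIExamples/CodeTrans/docker_compose/intel/cpu/xeon
+docker compose up -d codetrans-xeon-ui-server
+```
+
+`docker compose up -d <service>` recreates a container whose configuration has changed, so the other services keep running undisturbed.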
+ + +### Stop the services + +Once you are done with the entire pipeline and wish to stop and remove all the containers, use the command below: + +``` +docker compose -f compose.yaml down +``` \ No newline at end of file diff --git a/examples/index.rst b/examples/index.rst index c7d72fb3..f09c1cf1 100644 --- a/examples/index.rst +++ b/examples/index.rst @@ -11,6 +11,7 @@ GenAIExamples are designed to give developers an easy entry into generative AI, AgentQnA/AgentQnA_Guide ChatQnA/ChatQnA_Guide CodeGen/CodeGen_Guide + CodeTrans/CodeTrans_Guide ---- From b357f825168428fdb0c3cbab7c6fa7848dd61475 Mon Sep 17 00:00:00 2001 From: rbrugaro Date: Fri, 24 Jan 2025 13:41:51 -0800 Subject: [PATCH 6/7] Contributing guide update post refactor (#283) * Contributing guide update post refactor: - Update folder structure - Update file name conventions - Update file descriptions Signed-off-by: rbrygaro * Contributing guide updates from reviewers feedback Signed-off-by: rbrygaro * support building history release online doc, provide the help for process to build histoy release (#285) Co-authored-by: ZhangJianyu Signed-off-by: rbrygaro * codegen xeon update (#282) * codegen xeon update Signed-off-by: Srinarayan Srikanthan * typo fix Signed-off-by: Srinarayan Srikanthan --------- Signed-off-by: Srinarayan Srikanthan Co-authored-by: Ying Hu Signed-off-by: rbrygaro * Add deployment example for CodeTrans (#281) * Add deployment example for CodeTrans Signed-off-by: devpramod * fix doc build Signed-off-by: devpramod * fix typos, grammar - xeon Signed-off-by: devpramod * fix typos, grammar - gaudi Signed-off-by: devpramod --------- Signed-off-by: devpramod Co-authored-by: Ying Hu Signed-off-by: rbrygaro * Update community/CONTRIBUTING.md typos, wording, grammar fixes from reviewers Co-authored-by: Dan Signed-off-by: rbrygaro * Apply suggestions from code review typo, wording and grammar fixes Co-authored-by: Dan Signed-off-by: rbrygaro * reviewer suggestion GMC Signed-off-by: rbrugaro * cosmetic reviewer edits Signed-off-by: rbrugaro --------- Signed-off-by: rbrygaro Signed-off-by: Srinarayan Srikanthan Signed-off-by: devpramod Signed-off-by: rbrugaro Co-authored-by: Neo Zhang Jianyu Co-authored-by: ZhangJianyu Co-authored-by: sri-intel <108247623+srinarayan-srikanthan@users.noreply.github.com> Co-authored-by: Ying Hu Co-authored-by: devpramod Co-authored-by: Dan --- community/CONTRIBUTING.md | 225 ++++++++++++++++++++++---------------- 1 file changed, 131 insertions(+), 94 deletions(-) diff --git a/community/CONTRIBUTING.md b/community/CONTRIBUTING.md index 37e230ae..c6b41596 100644 --- a/community/CONTRIBUTING.md +++ b/community/CONTRIBUTING.md @@ -13,32 +13,40 @@ Thanks for considering contributing to OPEA project. 
The contribution process is ``` GenAIComps ├── comps + │   ├── __init__.py │   ├── agent + │   ├── animation │   ├── asr │   ├── chathistory │   ├── cores - │ │ ├── mega #orchestrator, gateway, micro_service class code - │ │ ├── proto #api protocol + | │   ├── common + │ │ ├── mega # orchestrator, gateway, micro_service class code + │ │ ├── proto # api protocol │ │ └── telemetry │   ├── dataprep │   ├── embeddings │   ├── feedback_management │   ├── finetuning │   ├── guardrails - │   ├── intent_detection - │   ├── knowledgegraphs + │   ├── image2image + │   ├── image2video │   ├── llms │   ├── lvms - │   ├── nginx │   ├── prompt_registry - │   ├── ragas - │   ├── reranks + │   ├── rerankings │   ├── retrievers + │   ├── text2image + │   ├── text2sql + │   ├── third_parties # open microservices (i,e tgi, vllm...) │   ├── tts - │   ├── vectorstores + │   ├── version.py │   └── web_retrievers + ├── pyproject.toml + ├── requirements.txt + ├── setup.py └── tests    ├── agent +    ├── animation    ├── asr    ├── chathistory    ├── cores @@ -47,15 +55,18 @@ Thanks for considering contributing to OPEA project. The contribution process is    ├── feedback_management    ├── finetuning    ├── guardrails -    ├── intent_detection +    ├── image2image +    ├── image2video    ├── llms    ├── lvms -    ├── nginx    ├── prompt_registry -    ├── reranks +    ├── rerankings    ├── retrievers +    ├── text2image +    ├── text2sql +    ├── third_parties    ├── tts -    ├── vectorstores +    ├── utils    └── web_retrievers ``` @@ -65,45 +76,69 @@ Thanks for considering contributing to OPEA project. The contribution process is GenAIComps ├── comps │ └── embeddings - │ ├── __init__.py - │ └── tei #vendor name or serving framework name - │ ├── langchain - │ │ ├── Dockerfile - │ │ ├── Dockerfile.amd_gpu - │ │ ├── Dockerfile.nvidia_gpu - │ │ ├── embedding_tei.py # definition and registration of microservice - │ │ ├── README.md - │ │ └── requirements.txt - │ └── llama_index - │ └── . . . + | | ├── deployment + | | │   ├── docker_compose + | | │   │   └── compose.yaml + | | │   └── kubernetes + | | │   ├── README.md + | | │   └── cpu-values.yaml + | | └── src + | | ├── Dockerfile + | | ├── README.md # could be multiple files for different integrations + | | ├── __init__.py + | | ├── integrations + | | │   ├── __init__.py + | | │   ├── multimodal_bridgetower.py + | | │   ├── predictionguard.py + | | │   └── tei.py + | | ├── opea_embedding_microservice.py + | | ├── opea_multimodal_embedding_microservice.py + | | └── requirements.txt + | ├── third_parties + | │   ├── bridgetower + | │   ├── clip + | │   ├── elasticsearch + | │   ├── gpt-sovits + | │   ├── milvus + | │   ├── mongodb + | │   ├── nginx + | │   ├── opensearch + | │   ├── pathway + | │   ├── pgvector + | │   ├── redis + | │   ├── speecht5 + | │   ├── tei + | │   ├── teirerank + | │   ├── tgi + | │   ├── vdms + | │   ├── vllm + | │   ├── wav2lip + | │   └── whisper ├── tests │ └── embeddings - │ ├── test_embeddings_tei_langchain.sh - │ ├── test_embeddings_tei_langchain_on_amd_gpu.sh - │ └── test_embeddings_tei_llama_index.sh + | ├── test_embeddings_multimodal_bridgetower.sh + | ├── test_embeddings_multimodal_bridgetower_on_intel_hpu.sh + | ├── test_embeddings_predictionguard.sh + | └── test_embeddings_tei.sh └── README.md - ``` - **File Descriptions**: - - `embedding_tei.py`: This file defines and registers the microservice. It serves as the entrypoint of the Docker container. 
Refer to [whisper ASR](https://github.com/opea-project/GenAIComps/tree/main/comps/asr/whisper/README.md) for a simple example or [TGI](https://github.com/opea-project/GenAIComps/blob/main/comps/llms/text-generation/tgi/llm.py) for a more complex example that required adapting to the OpenAI API. - - `requirements.txt`: This file is used by Docker to install the necessary dependencies. - - `Dockerfile`: Used to generate the service container image. Please follow naming conventions: - - Dockerfile: `Dockerfile.[vendor]_[hardware]`, vendor and hardware in lower case (i,e Dockerfile.amd_gpu) - - Docker Image: `opea/[microservice type]-[microservice sub type]-[library name]-[vendor]-[hardware]:latest` all lower case (i,e opea/llm-vllm-intel-hpu, opea/llm-faqgen-tgi-intel-hpu-svc) - - - `tests/[microservices type]/` : contains end-to-end test for microservices Please refer to an example [test_asr_whisper.sh](https://github.com/opea-project/GenAIComps/blob/main/tests/asr/test_asr_whisper.sh). Please follow naming convention:`test_[microservice type]_[microservice sub type]_[library name]_on_[vendor]_[hardware].sh` - - `tests/cores/` : contains Unit Tests (UT) for the core python components (orchestrator, gateway...). Please follow the naming convention:`test_[core component].sh` - + - `deployment/docker_compose` and `deployment/kubernetes`: These folders contain the deployments for the different integrations + - `src/opea_embedding_microservice.py`: This file defines and registers the microservice. It serves as the entrypoint of the Docker container. + - `src/integrations/tei.py`: Integrations define how OPEA integrates the third_parties services. + - `src/requirements.txt`: This file is used by Docker to install the necessary dependencies for opea component and all integrations + - `src/Dockerfile`: Used to generate the service container image. + - `tests/[microservices type]/` : contains end-to-end test for microservices. Please follow naming convention:`test_[microservice type]_[integration type].sh` or `test_[microservice type]_[integration type]_on_[hardware].sh` if hardware specific. This will ensure CICD evaluates components correctly. - `README.md`: at minimum it should include: microservice description, build, and start commands and a curl command with expected output. -4. Now you have created all the required files, and validated your service. Last step is to modify the `README.md` at the component level `GenAIComps/comps/[microservice type]` to list your new component. Now you are ready to file your PR! Once your PR is merged, in the next release the project release maintainers will publish the Docker Image for the same to the Docker Hub. +4. Now you have created all the required files, and validated your service. Last step is to add a `src/README_[interation_type].md` at the component level `GenAIComps/comps/[microservice type]` to list your new component. Now you are ready to file your PR! Once your PR is merged, in the next release the project release maintainers will publish the Docker Image for the same to Docker Hub. 5. After your component has been merged, you are likely interested in building an application with it, and perhaps contributing it also to OPEA! Please continue on to the [Contribute a GenAI Example](#contribute-a-genai-example) guide. ### Contribute a GenAI Example -Each of the samples in OPEA GenAIExamples are a common oft used solution. 
They each have scripts to ease deployment, and have been tested for performance and scalability with Docker compose and Kubernetes. When contributing an example, a Docker Compose deployment is the minimum requirement. However, since OPEA is intended for enterprise applications, supporting Kubernetes deployment is highly encouraged. You can find [examples for Kubernetes deployment](https://github.com/opea-project/GenAIExamples/tree/main/README.md#deploy-examples) using manifests, Helms Charts, and the [GenAI Microservices Connector (GMC)](https://github.com/opea-project/GenAIInfra/tree/main/microservices-connector/README.md). GMC offers additional enterprise features, such as the ability to dynamically adjust pipelines on Kubernetes (e.g., switching to a different LLM on the fly, adding guardrails), composing pipeleines that include external services hosted in public cloud or on-premisees via URL, and supporting sequential, parallel and conditional flows in the pipelines. +Each of the samples in OPEA GenAIExamples is a commonly used solution. They each have scripts to ease deployment, and have been tested for performance and scalability with Docker Compose and Kubernetes. When contributing an example, a Docker Compose deployment is the minimum requirement. However, since OPEA is intended for enterprise applications, supporting Kubernetes deployment is highly encouraged. You can find [examples for Kubernetes deployment](https://github.com/opea-project/GenAIExamples/tree/main/ChatQnA/kubernetes/helm#readme) using Helm Charts. - Navigate to [OPEA GenAIExamples](https://github.com/opea-project/GenAIExamples/tree/main/README.md) and check the catalog of examples. If you find one that is very similar to what you are looking for, you can contribute your variation of it to that particular example folder. If you are bringing a completly new application you will need to create a separate example folder. @@ -191,64 +226,68 @@ flowchart LR ``` -- OPEA uses gateways to handle requests and route them to the corresponding megaservices (unless you have an agent that will otherwise handle the gateway function). If you are just making small changes to the application, like swaping one DB for another, you can reuse the existing Gateway but if you are contributing a completely new application, you will need to add a Gateway class. Navigate to [OPEA GenAIComps Gateway](https://github.com/opea-project/GenAIComps/blob/main/comps/cores/mega/gateway.py) and implement how Gateway should handle requests for your application. Note that Gateway implementation is moving to GenAIExamples in future release. - - Follow the folder structure in the ChatQA example below: ``` + ├── Dockerfile + ├── Dockerfile.guardrails + ├── Dockerfile.without_rerank + ├── README.md ├── assets - ├── benchmark # optional + │   └── img + ├── benchmark + │   ├── accuracy + │   │   ├── README.md + │   │   ├── eval_crud.py + │   │   ├── eval_multihop.py + │   │   ├── process_crud_dataset.py + │   │   └── run_acc.sh + │   └── performance + │   └── kubernetes ├── chatqna.py # Main application definition (microservices, megaservice, gateway). 
- ├── chatqna.yaml # starting v1.0 used to generate manifests for k8s w orchestrator_with_yaml + ├── chatqna_wrapper.py ├── docker_compose - │ ├── intel - │ │ ├── cpu - │ │ │ └── xeon - │ │ │ ├── compose.yaml - │ │ │ ├── README.md - │ │ │ └── set_env.sh #export env variables - │ │ └── hpu - │ │ └── gaudi - │ │ ├── compose.yaml - │ │ ├── how_to_validate_service.md #optional - │ │ ├── README.md - │ │ └── set_env.sh - │ └── nvidia - │ └── gpu - │ ├── compose.yaml - │ ├── README.md - │ └── set_env.sh - ├── Dockerfile + │   ├── amd + │   │   └── gpu + │   ├── install_docker.sh + │   ├── intel + │   │   ├── cpu + │   │   └── hpu + │   └── nvidia + │   └── gpu ├── docker_image_build - │ └── build.yaml + │   └── build.yaml ├── kubernetes - │ ├── intel - │ │ ├── cpu - │ │ │ └── xeon - │ │ │ ├── gmc - │ │ │ │ └── chatQnA_xeon.yaml - │ │ │ └── manifest - │ │ │ └── chatqna.yaml - │ │ └── hpu - │ │ └── gaudi - │ │ ├── gmc - │ │ │ └── chatQnA_gaudi.yaml - │ │ └── manifest - │ │ └── chatqna.yaml - │ ├── amd - │ │ ├── cpu - │ │ │ ├── gmc - │ │ │ └── manifest - │ │ └── gpu - │ │ ├── gmc - │ │ └── manifest - │ ├── README_gmc.md # K8s quickstar - │ └── README.md # quickstart - ├── README.md + │   ├── gmc + │   │   ├── README.md + │   │   ├── chatQnA_dataprep_gaudi.yaml + │   │   ├── chatQnA_dataprep_xeon.yaml + │   │   ├── chatQnA_gaudi.yaml + │   │   ├── chatQnA_switch_gaudi.yaml + │   │   ├── chatQnA_switch_xeon.yaml + │   │   └── chatQnA_xeon.yaml + │   └── helm + │   ├── README.md + │   ├── cpu-values.yaml + │   ├── gaudi-values.yaml + │   ├── gaudi-vllm-values.yaml + │   ├── guardrails-gaudi-values.yaml + │   ├── guardrails-values.yaml + │   ├── norerank-values.yaml + │   └── nv-values.yaml ├── tests - │ ├── test_compose_on_gaudi.sh #could be more tests for different flavors of the app - │ ├── test_gmc_on_gaudi.sh - │ ├── test_manifest_on_gaudi.sh + │   ├── test_compose_guardrails_on_gaudi.sh + │   ├── test_compose_on_gaudi.sh + │   ├── test_compose_on_rocm.sh + │   ├── test_compose_on_xeon.sh + │   ├── test_compose_pinecone_on_xeon.sh + │   ├── test_compose_qdrant_on_xeon.sh + │   ├── test_compose_tgi_on_gaudi.sh + │   ├── test_compose_tgi_on_xeon.sh + │   ├── test_compose_without_rerank_on_gaudi.sh + │   ├── test_compose_without_rerank_on_xeon.sh + │   ├── test_gmc_on_gaudi.sh + │   ├── test_gmc_on_xeon.sh └── ui ``` @@ -256,13 +295,11 @@ flowchart LR - **File Descriptions**: - `chatqna.py`: application definition using microservice, megaservice and gateway. There could be multiple .py in the folder based on slight modification of the example application. - `docker_build_image/build.yaml`: builds necessary images pointing to the Dockerfiles in the GenAIComp repository. - - `docker_compose/vendor/device/compose.yaml`: defines pipeline for Docker compose deployment. For selectng docker image name please follow the naming convention: + - `docker_compose/[vendor]/[device]/compose.yaml`: defines pipeline for Docker Compose deployment. For naming the Docker Image file please follow the naming convention: - Docker Image: `opea/[example name]-[feature name]:latest` all lower case (i,e: opea/chatqna, opea/codegen-react-ui) - - `kubernetes/vendor/device/manifests/chatqna.yaml`: used for K8s deployemnt - - `kubernetes/vendor/device/gmc/chatqna.yaml`: (optional) used for deployment with GMC - - `tests/`: at minimum you need to provide an E2E test with Docker compose. If you are contritbutng K8s manifests and GMC yaml, you should also provide test for those. 
Please follow naming convention: + - `kubernetes/helm` and `kubernetes/gmc` : used for K8s deployemnt with helm or [GenAI Microservices Connector (GMC)](https://github.com/opea-project/GenAIInfra/tree/main/microservices-connector#genai-microservices-connectorgmc) + - `tests/`: at minimum you need to provide an E2E test with Docker Compose. If you are contributing a K8s Helm Chart or GMC yaml, you should also provide tests for those. Please follow this naming convention: - Docker compose test: `tests/test_compose_on_[hardware].sh` - - K8s test: `tests/test_manifest_on_[hardware].sh` - K8s with GMC test: `tests/test_gmc_on_[hardware].sh` - `ui`: (optional) - `assets`: nice to have an application flow diagram @@ -289,7 +326,7 @@ The quality of OPEA project's documentation can have a huge impact on its succes ### Reporting Issues -If OPEA user runs into some unexpected behavior, reporting the issue to the `Issues` page under the corresponding github project is the proper way to do. Please ensure there is no similar one already existing on the issue list). Please follow the Bug Report template and supply as much information as you can, and any additional insights you might have. It's helpful if the issue submitter can narrow down the problematic behavior to a minimal reproducible test case. +If you run into unexpected behavior, please report it using the `Issues` page under the corresponding github project but first check if there is already a similar existing issue. If not, please follow the Bug Report template and supply as much information as you can. It's helpful if the issue submitter can narrow down the problematic behavior to a minimal reproducible test case. ### Proposing New Features @@ -370,7 +407,7 @@ The OPEA projects use GitHub Action for CI test. - End to End Test, the PR must pass all end to end tests. #### Pull Request Review -You can add reviewers from [the code owners list](./codeowner.md) to your PR. +You can tag or add reviewers from [the code owners list](https://github.com/opea-project/docs/blob/main/community/codeowner.md) to your PR. ## Support From 3a9e68715925e2c2112ebf3036a71e91ac60094b Mon Sep 17 00:00:00 2001 From: rbrugaro Date: Fri, 24 Jan 2025 15:25:31 -0800 Subject: [PATCH 7/7] address requested changes from PR #283 (#286) Signed-off-by: rbrugaro --- community/CONTRIBUTING.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/community/CONTRIBUTING.md b/community/CONTRIBUTING.md index c6b41596..7db1f7dd 100644 --- a/community/CONTRIBUTING.md +++ b/community/CONTRIBUTING.md @@ -140,7 +140,7 @@ Thanks for considering contributing to OPEA project. The contribution process is Each of the samples in OPEA GenAIExamples is a commonly used solution. They each have scripts to ease deployment, and have been tested for performance and scalability with Docker Compose and Kubernetes. When contributing an example, a Docker Compose deployment is the minimum requirement. However, since OPEA is intended for enterprise applications, supporting Kubernetes deployment is highly encouraged. You can find [examples for Kubernetes deployment](https://github.com/opea-project/GenAIExamples/tree/main/ChatQnA/kubernetes/helm#readme) using Helm Charts. -- Navigate to [OPEA GenAIExamples](https://github.com/opea-project/GenAIExamples/tree/main/README.md) and check the catalog of examples. If you find one that is very similar to what you are looking for, you can contribute your variation of it to that particular example folder. 
If you are bringing a completly new application you will need to create a separate example folder. +- Navigate to [OPEA GenAIExamples](https://github.com/opea-project/GenAIExamples/tree/main/README.md) and check the catalog of examples. If you find one that is very similar to what you are looking for, you can contribute your variation of it to that particular example folder. If you are bringing a completely new application you will need to create a separate example folder. We recommend submitting an RFC first in this case to the doc sub-project to discuss your new application and potentially get suggestions and collect fellow travellers. - Before stitching together all the microservices to build your application, let's make sure all the required building blocks are available!. Take a look at this **ChatQnA Flow Chart**: @@ -326,7 +326,7 @@ The quality of OPEA project's documentation can have a huge impact on its succes ### Reporting Issues -If you run into unexpected behavior, please report it using the `Issues` page under the corresponding github project but first check if there is already a similar existing issue. If not, please follow the Bug Report template and supply as much information as you can. It's helpful if the issue submitter can narrow down the problematic behavior to a minimal reproducible test case. +If you run into unexpected behavior, please report it using the `Issues` page under the corresponding GitHub project but first check if there is already a similar existing issue. If not, please follow the Bug Report template and supply as much information as you can. It's helpful if the issue submitter can narrow down the problematic behavior to a minimal reproducible test case. ### Proposing New Features @@ -352,7 +352,7 @@ It is not necessary for changes like: ```{literalinclude} rfcs/rfc_template.txt ``` -- Submit the proposal to the `Issues` page of the corresponding OPEA github repository. +- Submit the proposal to the `Issues` page of the corresponding OPEA GitHub repository. - Reach out to your RFC's assignee if you need any help with the RFC process. - Amend your proposal in response to reviewer's feedback.