Update documentation to version 0.30.0 (#2523)
xyang16 authored Nov 11, 2024
1 parent 945c2dc commit ac65b3c
Showing 15 changed files with 41 additions and 42 deletions.
12 changes: 6 additions & 6 deletions README.md
@@ -48,20 +48,20 @@ brew services stop djl-serving
For Ubuntu

```
-curl -O https://publish.djl.ai/djl-serving/djl-serving_0.28.0-1_all.deb
-sudo dpkg -i djl-serving_0.28.0-1_all.deb
+curl -O https://publish.djl.ai/djl-serving/djl-serving_0.30.0-1_all.deb
+sudo dpkg -i djl-serving_0.30.0-1_all.deb
```

For Windows

We are considering creating a `chocolatey` package for Windows. For the time being, you can
-download djl-serving zip file from [here](https://publish.djl.ai/djl-serving/serving-0.28.0.zip).
+download djl-serving zip file from [here](https://publish.djl.ai/djl-serving/serving-0.30.0.zip).

```
-curl -O https://publish.djl.ai/djl-serving/serving-0.28.0.zip
-unzip serving-0.28.0.zip
+curl -O https://publish.djl.ai/djl-serving/serving-0.30.0.zip
+unzip serving-0.30.0.zip
# start djl-serving
-serving-0.28.0\bin\serving.bat
+serving-0.30.0\bin\serving.bat
```
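
Once the server starts it listens on port 8080 by default. A quick way to confirm it is up is the ping endpoint; this is a minimal check assuming a local install and the default port:

```
# a healthy server answers with a status message
curl http://localhost:8080/ping
```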

### Docker
2 changes: 1 addition & 1 deletion awscurl/README.md
@@ -12,7 +12,7 @@ You can download `awscurl` like this:

```sh
# download stable release version:
-curl -O https://publish.djl.ai/awscurl/0.28.0/awscurl \
+curl -O https://publish.djl.ai/awscurl/0.30.0/awscurl \
&& chmod +x awscurl

# or download nightly release
16 changes: 8 additions & 8 deletions benchmark/README.md
@@ -40,25 +40,25 @@ sudo snap alias djlbench djl-bench
- Or download the .deb package from S3

```
-curl -O https://publish.djl.ai/djl-bench/0.29.0/djl-bench_0.29.0-1_all.deb
-sudo dpkg -i djl-bench_0.29.0-1_all.deb
+curl -O https://publish.djl.ai/djl-bench/0.30.0/djl-bench_0.30.0-1_all.deb
+sudo dpkg -i djl-bench_0.30.0-1_all.deb
```

For macOS, CentOS, or Amazon Linux 2

-You can download djl-bench zip file from [here](https://publish.djl.ai/djl-bench/0.29.0/benchmark-0.29.0.zip).
+You can download djl-bench zip file from [here](https://publish.djl.ai/djl-bench/0.30.0/benchmark-0.30.0.zip).

```
-curl -O https://publish.djl.ai/djl-bench/0.29.0/benchmark-0.29.0.zip
-unzip benchmark-0.29.0.zip
-rm benchmark-0.29.0.zip
-sudo ln -s $PWD/benchmark-0.29.0/bin/benchmark /usr/bin/djl-bench
+curl -O https://publish.djl.ai/djl-bench/0.30.0/benchmark-0.30.0.zip
+unzip benchmark-0.30.0.zip
+rm benchmark-0.30.0.zip
+sudo ln -s $PWD/benchmark-0.30.0/bin/benchmark /usr/bin/djl-bench
```
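
With the binary linked as `djl-bench`, a typical run points the tool at an engine and a model and measures a fixed number of iterations. The flags below (engine, model URL, input shape, iteration count) are illustrative assumptions; run `djl-bench --help` for the authoritative list of options:

```
# benchmark a PyTorch ResNet model with a 1x3x224x224 input for 100 iterations
djl-bench -e PyTorch -u djl://ai.djl.pytorch/resnet -s 1,3,224,224 -c 100
```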

For Windows

We are considering creating a `chocolatey` package for Windows. For the time being, you can
-download djl-bench zip file from [here](https://publish.djl.ai/djl-bench/0.29.0/benchmark-0.29.0.zip).
+download djl-bench zip file from [here](https://publish.djl.ai/djl-bench/0.30.0/benchmark-0.30.0.zip).

Or you can run the benchmark using Gradle:

2 changes: 1 addition & 1 deletion benchmark/snapcraft/snapcraft.yaml
@@ -1,5 +1,5 @@
name: djlbench
-version: '0.29.0'
+version: '0.30.0'
title: DJL Benchmark
license: Apache-2.0
summary: A machine learning benchmarking toolkit
4 changes: 2 additions & 2 deletions engines/python/README.md
@@ -29,13 +29,13 @@ The javadocs output is generated in the `build/doc/javadoc` folder.
## Installation
You can pull the Python engine from the central Maven repository by including the following dependency:

-- ai.djl.python:python:0.28.0
+- ai.djl.python:python:0.30.0

```xml
<dependency>
<groupId>ai.djl.python</groupId>
<artifactId>python</artifactId>
-<version>0.28.0</version>
+<version>0.30.0</version>
<scope>runtime</scope>
</dependency>
```
20 changes: 10 additions & 10 deletions serving/docker/README.md
@@ -17,7 +17,7 @@ You can find different `compose-target` in `docker-compose.yml`, like `cpu`, `lm

## Run docker image

-You can find DJL latest release docker image on [dockerhub](https://hub.docker.com/r/deepjavalibrary/djl-serving/tags?page=1&name=0.28.0).
+You can find DJL latest release docker image on [dockerhub](https://hub.docker.com/r/deepjavalibrary/djl-serving/tags?page=1&name=0.30.0).
DJLServing also publishes nightly builds to the [dockerhub nightly](https://hub.docker.com/r/deepjavalibrary/djl-serving/tags?page=1&name=nightly) repository.
You can just pull the image you need from there.

@@ -29,55 +29,55 @@ Here are a few examples to run djl-serving docker image:
### CPU

```shell
-docker pull deepjavalibrary/djl-serving:0.28.0
+docker pull deepjavalibrary/djl-serving:0.30.0

mkdir models
cd models
curl -O https://resources.djl.ai/test-models/pytorch/bert_qa_jit.tar.gz

-docker run -it --rm -v $PWD:/opt/ml/model -p 8080:8080 deepjavalibrary/djl-serving:0.28.0
+docker run -it --rm -v $PWD:/opt/ml/model -p 8080:8080 deepjavalibrary/djl-serving:0.30.0
```
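
Once the container is up, you can check which models were loaded from the mounted directory and send a test request. The model name and payload below are assumptions (archives are typically registered under their file name), so verify against the output of the first call:

```shell
# list the models the server registered from /opt/ml/model
curl http://localhost:8080/models

# send a sample question-answering request (payload shape assumed)
curl -X POST http://localhost:8080/predictions/bert_qa_jit \
  -H "Content-Type: application/json" \
  -d '{"question": "Why is model conversion important?", "paragraph": "Model conversion lets the same model run on different frameworks."}'
```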

### GPU

```shell
-docker pull deepjavalibrary/djl-serving:0.28.0-pytorch-gpu
+docker pull deepjavalibrary/djl-serving:0.30.0-pytorch-gpu

mkdir models
cd models
curl -O https://resources.djl.ai/test-models/pytorch/bert_qa_jit.tar.gz

-docker run -it --runtime=nvidia --shm-size 2g -v $PWD:/opt/ml/model -p 8080:8080 deepjavalibrary/djl-serving:0.28.0-pytorch-gpu
+docker run -it --runtime=nvidia --shm-size 2g -v $PWD:/opt/ml/model -p 8080:8080 deepjavalibrary/djl-serving:0.30.0-pytorch-gpu
```
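
Before serving, it can be worth confirming that the container actually sees the GPU; this sketch assumes the NVIDIA Container Toolkit is installed on the host:

```shell
# the available GPUs should be listed; if this fails, check the driver and container runtime setup
docker run --rm --runtime=nvidia deepjavalibrary/djl-serving:0.30.0-pytorch-gpu nvidia-smi
```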

### AWS Inferentia

```shell
-docker pull deepjavalibrary/djl-serving:0.28.0-pytorch-inf2
+docker pull deepjavalibrary/djl-serving:0.30.0-pytorch-inf2

mkdir models
cd models

curl -O https://resources.djl.ai/test-models/pytorch/resnet18_inf2_2_4.tar.gz
-docker run --device /dev/neuron0 -it --rm -v $PWD:/opt/ml/model -p 8080:8080 deepjavalibrary/djl-serving:0.28.0-pytorch-inf2
+docker run --device /dev/neuron0 -it --rm -v $PWD:/opt/ml/model -p 8080:8080 deepjavalibrary/djl-serving:0.30.0-pytorch-inf2
```

### aarch64 machine

```shell
-docker pull deepjavalibrary/djl-serving:0.28.0-aarch64
+docker pull deepjavalibrary/djl-serving:0.30.0-aarch64

mkdir models
cd models

curl -O https://resources.djl.ai/test-models/pytorch/resnet18_inf2_2_4.tar.gz
-docker run --device /dev/neuron0 -it --rm -v $PWD:/opt/ml/model -p 8080:8080 deepjavalibrary/djl-serving:0.28.0-aarch64
+docker run --device /dev/neuron0 -it --rm -v $PWD:/opt/ml/model -p 8080:8080 deepjavalibrary/djl-serving:0.30.0-aarch64
```

## Run docker image with custom command line arguments

You can pass command line arguments to `djl-serving` directly when you use `docker run`:

```
-docker run -it --rm -p 8080:8080 deepjavalibrary/djl-serving:0.28.0 djl-serving -m "djl://ai.djl.huggingface.pytorch/sentence-transformers/all-MiniLM-L6-v2"
+docker run -it --rm -p 8080:8080 deepjavalibrary/djl-serving:0.30.0 djl-serving -m "djl://ai.djl.huggingface.pytorch/sentence-transformers/all-MiniLM-L6-v2"
```
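
Assuming the model above registers under the name `all-MiniLM-L6-v2` (the name is derived from the model URL and may differ), you can list the loaded models and send a test request; the payload shape is an assumption for a text-embedding model:

```
# list the models the server registered
curl http://localhost:8080/models

# request an embedding for a single sentence (model name and payload assumed)
curl -X POST http://localhost:8080/predictions/all-MiniLM-L6-v2 \
  -H "Content-Type: application/json" \
  -d '{"inputs": "This is a test sentence."}'
```
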
2 changes: 1 addition & 1 deletion serving/docs/lmi/deployment_guide/README.md
@@ -80,7 +80,7 @@ A more in-depth explanation about configurations is presented in the deployment
| | HuggingFace Accelerate | LMI_dist (9.0.0) | TensorRTLLM (0.8.0) | TransformersNeuronX (2.18.0) | vLLM (0.3.3) |
|---------------------------------------|------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------|
| DLC | LMI | LMI | LMI TRTLLM | LMI Neuron | LMI |
-| Default handler | [huggingface](https://github.com/deepjavalibrary/djl-serving/blob/0.28.0-dlc/engines/python/setup/djl_python/huggingface.py) | [huggingface](https://github.com/deepjavalibrary/djl-serving/blob/0.28.0-dlc/engines/python/setup/djl_python/huggingface.py) | [tensorrt-llm](https://github.com/deepjavalibrary/djl-serving/blob/0.28.0-dlc/engines/python/setup/djl_python/tensorrt_llm.py) | [transformers-neuronx](https://github.com/deepjavalibrary/djl-serving/blob/0.28.0-dlc/engines/python/setup/djl_python/transformers_neuronx.py) | [huggingface](https://github.com/deepjavalibrary/djl-serving/blob/0.28.0-dlc/engines/python/setup/djl_python/huggingface.py) |
+| Default handler | [huggingface](https://github.com/deepjavalibrary/djl-serving/blob/0.30.0-dlc/engines/python/setup/djl_python/huggingface.py) | [huggingface](https://github.com/deepjavalibrary/djl-serving/blob/0.30.0-dlc/engines/python/setup/djl_python/huggingface.py) | [tensorrt-llm](https://github.com/deepjavalibrary/djl-serving/blob/0.30.0-dlc/engines/python/setup/djl_python/tensorrt_llm.py) | [transformers-neuronx](https://github.com/deepjavalibrary/djl-serving/blob/0.30.0-dlc/engines/python/setup/djl_python/transformers_neuronx.py) | [huggingface](https://github.com/deepjavalibrary/djl-serving/blob/0.30.0-dlc/engines/python/setup/djl_python/huggingface.py) |
| support quantization | BitsandBytes/GPTQ | GPTQ/AWQ | SmoothQuant, AWQ, GPTQ | INT8 | GPTQ/AWQ |
| AWS machine supported | G4/G5/G6/P4D/P5 | G5/G6/P4D/P5 | G5/G6/P4D/P5 | INF2/TRN1 | G4/G5/G6/P4D/P5 |
| execution mode | Python | MPI | MPI | Python | Python |
4 changes: 2 additions & 2 deletions serving/docs/lmi/deployment_guide/deploying-your-endpoint.md
Expand Up @@ -52,7 +52,7 @@ sagemaker_session = sagemaker.session.Session()
region = sagemaker_session._region_name
# get the lmi image uri
# available frameworks: "djl-lmi" (for vllm, lmi-dist), "djl-tensorrtllm" (for tensorrt-llm), "djl-neuronx" (for transformers neuronx)
-container_uri = sagemaker.image_uris.retrieve(framework="djl-lmi", version="0.28.0", region=region)
+container_uri = sagemaker.image_uris.retrieve(framework="djl-lmi", version="0.30.0", region=region)
# create a unique endpoint name
endpoint_name = sagemaker.utils.name_from_base("my-lmi-endpoint")
# s3 uri object prefix under which the serving.properties and optional model artifacts are stored
@@ -107,7 +107,7 @@ sagemaker_session = sagemaker.session.Session()
region = sagemaker_session._region_name
# get the lmi image uri
# available frameworks: "djl-lmi" (for vllm, lmi-dist), "djl-tensorrtllm" (for tensorrt-llm), "djl-neuronx" (for transformers neuronx)
-container_uri = sagemaker.image_uris.retrieve(framework="djl-lmi", version="0.28.0", region=region)
+container_uri = sagemaker.image_uris.retrieve(framework="djl-lmi", version="0.30.0", region=region)
# create a unique endpoint name
endpoint_name = sagemaker.utils.name_from_base("my-lmi-endpoint")
# instance type you will deploy your model to
4 changes: 2 additions & 2 deletions serving/docs/lmi/deployment_guide/testing-custom-script.md
@@ -20,7 +20,7 @@ For example:

```
docker run -it -p 8080:8080 --shm-size=12g --runtime=nvidia -v /home/ubuntu/test.py:/workplace/test.py \
-763104351884.dkr.ecr.us-west-2.amazonaws.com/djl-inference:0.28.0-lmi10.0.0-cu124 /bin/bash
+763104351884.dkr.ecr.us-west-2.amazonaws.com/djl-inference:0.30.0-lmi12.0.0-cu124 /bin/bash
```

### Step 2: Install DJLServing Python module
@@ -36,7 +36,7 @@ pip install git+https://github.com/deepjavalibrary/djl-serving.git#subdirectory=
### From a specific DLC version

```
-pip install git+https://github.com/deepjavalibrary/djl-serving.git@0.28.0-dlc#subdirectory=engines/python/setup
+pip install git+https://github.com/deepjavalibrary/djl-serving.git@0.30.0-dlc#subdirectory=engines/python/setup
```
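
Either install can be verified with a quick import check; this only confirms that the `djl_python` package is importable, not that any particular handler works:

```
python -c "import djl_python; print(djl_python.__file__)"
```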

## Tutorial 1: Running with default handler with rolling batch
3 changes: 1 addition & 2 deletions serving/docs/lmi/tutorials/tnx_aot_tutorial.md
@@ -43,8 +43,7 @@ For example:
aws ecr get-login-password --region us-west-2 | docker login --username AWS --password-stdin 763104351884.dkr.ecr.us-west-2.amazonaws.com

# Download docker image
-docker pull 763104351884.dkr.ecr.us-west-2.amazonaws.com/djl-inference:0.28.0-neuronx-sdk2.18.2
-
+docker pull 763104351884.dkr.ecr.us-west-2.amazonaws.com/djl-inference:0.29.0-neuronx-sdk2.19.1
```

### Step 3: Set the environment variables:
4 changes: 2 additions & 2 deletions serving/docs/lmi/tutorials/trtllm_aot_tutorial.md
@@ -42,7 +42,7 @@ Refer [here](https://github.com/aws/deep-learning-containers/blob/master/availab
For example:

```
-docker pull 763104351884.dkr.ecr.us-east-1.amazonaws.com/djl-inference:0.28.0-tensorrtllm0.9.0-cu122
+docker pull 763104351884.dkr.ecr.us-east-1.amazonaws.com/djl-inference:0.29.0-tensorrtllm0.11.0-cu124
```

### Step 3: Set the environment variables:
@@ -91,7 +91,7 @@ docker run --runtime=nvidia --gpus all --shm-size 12gb \
-e OPTION_TENSOR_PARALLEL_DEGREE=$OPTION_TENSOR_PARALLEL_DEGREE \
-e OPTION_MAX_ROLLING_BATCH_SIZE=$OPTION_MAX_ROLLING_BATCH_SIZE \
-e OPTION_DTYPE=$OPTION_DTYPE \
-763104351884.dkr.ecr.us-east-1.amazonaws.com/djl-inference:0.28.0-tensorrtllm0.9.0-cu122 python /opt/djl/partition/trt_llm_partition.py \
+763104351884.dkr.ecr.us-east-1.amazonaws.com/djl-inference:0.29.0-tensorrtllm0.11.0-cu124 python /opt/djl/partition/trt_llm_partition.py \
--properties_dir $PWD \
--trt_llm_model_repo /tmp/trtllm \
--tensor_parallel_degree $OPTION_TENSOR_PARALLEL_DEGREE
@@ -123,7 +123,7 @@ docker run -it --runtime=nvidia --gpus all --shm-size 12gb \
-p 8080:8080 \
-v /opt/dlami/nvme/large_store:/opt/djl/large_store \
-v /opt/dlami/nvme/tmp/.cache:/tmp/.cache \
-763104351884.dkr.ecr.us-east-1.amazonaws.com/djl-inference:0.28.0-tensorrtllm0.9.0-cu122 /bin/bash
+763104351884.dkr.ecr.us-east-1.amazonaws.com/djl-inference:0.29.0-tensorrtllm0.11.0-cu124 /bin/bash
```

Here we assume you are using a g5, g6, p4d, p4de, or p5 machine that has an NVMe disk available.
4 changes: 2 additions & 2 deletions serving/docs/lmi/user_guides/chat_input_output_schema.md
@@ -2,7 +2,7 @@

This document describes the API schema for the chat completions endpoints (`v1/chat/completions`) when using the built-in inference handlers in LMI containers.
This schema is applicable to our latest release, v0.30.0, and is compatible with [OpenAI Chat Completions API](https://platform.openai.com/docs/api-reference/chat/create).
-Documentation for previous releases is available on our GitHub on the relevant version branch (e.g. 0.28.0-dlc).
+Documentation for previous releases is available on our GitHub on the relevant version branch (e.g. 0.30.0-dlc).

On SageMaker, the Chat Completions API schema is supported with the `/invocations` endpoint without additional configuration.
If the request contains the "messages" field, LMI will treat the request as a chat completions style request, and respond
@@ -301,4 +301,4 @@ Example:
"completion_tokens":100,
"total_tokens":133
}
-```
+```
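
For reference, a minimal chat-style request against a locally running container could look like the following; the host, port, and generation parameters are assumptions, and the complete set of accepted fields is described above:

```
# send one user message and cap the generated tokens
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "messages": [
          {"role": "user", "content": "What is Deep Java Library?"}
        ],
        "max_tokens": 128
      }'
```
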
@@ -31,7 +31,7 @@ public void testInstallDependency() throws IOException {
DependencyManager dm = DependencyManager.getInstance();
dm.installEngine("XGBoost");

-dm.installDependency("ai.djl.pytorch:pytorch-jni:2.1.1-0.27.0");
+dm.installDependency("ai.djl.pytorch:pytorch-jni:2.4.0-0.30.0");

Assert.assertThrows(() -> dm.installDependency("ai.djl.pytorch:pytorch-jni"));
} finally {
2 changes: 1 addition & 1 deletion wlm/README.md
@@ -56,7 +56,7 @@ You can pull the server from the central Maven repository by including the follo
<dependency>
<groupId>ai.djl.serving</groupId>
<artifactId>wlm</artifactId>
-<version>0.28.0</version>
+<version>0.30.0</version>
</dependency>
```
