From 15f27590080016843363012594e7091e322d0d15 Mon Sep 17 00:00:00 2001
From: dujiangsu
Date: Mon, 30 May 2022 10:12:38 +0800
Subject: [PATCH] update readme

---
 README.md | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/README.md b/README.md
index 4b74d3f..8a0164c 100644
--- a/README.md
+++ b/README.md
@@ -8,12 +8,12 @@ A Large-scale Model Inference System.
 
-EnergonAI provides 3 levels of abstraction for enabling the large-scale model inference:
+Energon-AI provides 3 levels of abstraction for enabling the large-scale model inference:
 - **Runtime** - tensor parallel operations, pipeline parallel wrapper, distributed message queue, distributed checkpoint loading, customized CUDA kernels.
 - **Engine** - encapsulate the single instance multiple devices (SIMD) execution with the remote procedure call, which acts as the single instance single device (SISD) execution.
 - **Serving** - batching requests, managing engines.
 
-For models trained by [Colossal-AI](https://github.com/hpcaitech/ColossalAI), they can be seamlessly transferred to EnergonAI.
+For models trained by [Colossal-AI](https://github.com/hpcaitech/ColossalAI), they can be seamlessly transferred to Energon-AI.
 For single-device models, they require manual coding works to introduce tensor parallelism and pipeline parallelism.
 
 At present, we pre-build distributed Bert and GPT models.
@@ -22,7 +22,7 @@ For Bert, Google reports a [super-large Bert with 481B parameters](https://mlcom
 
 ### Installation
 ``` bash
-$ git clone https://github.com/hpcaitech/ColossalAI-Inference.git
+$ git clone git@github.com:hpcaitech/EnergonAI.git
 $ pip install -r requirements.txt
 $ pip install .
 ```
@@ -56,7 +56,7 @@ Method 2:
 Here GPT3-12-layers in FP16 is adopted.
 Here a node with 8 A100 80 GB GPUs is adopted. GPUs are fully connected with NvLink.
-EnergonAI adopts the redundant computation elimination method from [EffectiveTransformer](https://github.com/bytedance/effective_transformer) and the sequence length is set the half of the padding length.
+Energon-AI adopts the redundant computation elimination method from [EffectiveTransformer](https://github.com/bytedance/effective_transformer) and the sequence length is set the half of the padding length.
 [figure: Architecture]
@@ -64,7 +64,7 @@ EnergonAI adopts the redundant computation elimination method from [EffectiveTra
 #### Latency
 Here GPT3 in FP16 is adopted.
 Here a node with 8 A100 80 GB GPUs is adopted. Every two GPUs are connected with NvLink.
-Here the sequence length is set the half of the padding length when using redundant computation elimination method, which is the EnergonAI(RM).
+Here the sequence length is set the half of the padding length when using redundant computation elimination method, which is the Energon-AI(RM).
 Here FasterTransformer is adopted in comparison and it does not support the redundant computation elimination method in the distributed execution.
 [figure: Architecture]