
Merge pull request #77 from hpcaitech/feature/open_source
change project name
MaruyamaAya authored May 25, 2022
2 parents bdb2e35 + eccb434 commit 2d1dee7
Showing 80 changed files with 135 additions and 192 deletions.
8 changes: 4 additions & 4 deletions CONTRIBUTING.md
@@ -1,14 +1,14 @@
# Contributing

- The ColossalAI Inference project is always open for constructive suggestion and contributions from the community. We sincerely invite you to take a part in making this project more friendly and easier to use.
+ The EnergonAI project is always open to constructive suggestions and contributions from the community. We sincerely invite you to take part in making this project friendlier and easier to use.

## Environment Setup
- The first step of becoming a contributor would be setting up the environment for ColossalAI-Inference.
- Run the following codes to build your own ColossalAI-Inference.
+ The first step to becoming a contributor is setting up the environment for EnergonAI.
+ Run the following commands to build your own EnergonAI.

---
``` bash
- $ git clone https://github.com/hpcaitech/ColossalAI-Inference.git
+ $ git clone https://github.com/hpcaitech/EnergonAI.git
$ python setup.py install   # or: python setup.py develop for an editable install
```

12 changes: 6 additions & 6 deletions README.md
@@ -2,19 +2,19 @@
<img src="https://user-images.githubusercontent.com/12018307/170214566-b611b131-fff1-41c0-9447-786a8a6f0bac.png" width = "600" height = "148" alt="Architecture" align=center />
</div>

- # Energon
+ # Energon-AI

![](https://img.shields.io/badge/Made%20with-ColossalAI-blueviolet?style=flat)
[![GitHub license](https://img.shields.io/github/license/hpcaitech/FastFold)](https://github.com/hpcaitech/ColossalAI-Inference/blob/main/LICENSE)


A Large-scale Model Inference System.
- Energon provides 3 levels of abstraction for enabling the large-scale model inference:
+ EnergonAI provides three levels of abstraction for large-scale model inference:
- **Runtime** - tensor parallel operations, pipeline parallel wrapper, distributed message queue, distributed checkpoint loading, customized CUDA kernels.
- **Engine** - encapsulates single instance multiple devices (SIMD) execution behind a remote procedure call, so that it behaves like single instance single device (SISD) execution (see the sketch after this list).
- **Serving** - batching requests, managing engines.
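
For context, here is a minimal, purely illustrative sketch of the "SIMD behind SISD" idea the Engine level describes: several device processes run the same request, while the caller sees a single blocking call. All names here (`ToyEngine`, `_worker`) are hypothetical and are not EnergonAI's actual API.

```python
# A hedged sketch, not EnergonAI's implementation: one facade object hides
# several worker processes that each execute the same request (SIMD),
# so the caller interacts with it like a single device (SISD).
import multiprocessing as mp


def _worker(rank, requests, results):
    """Each rank runs the same loop; a real worker would hold a model shard."""
    for prompt in iter(requests.get, None):   # None is the shutdown sentinel
        partial = f"rank{rank}:{prompt.upper()}"
        if rank == 0:                         # only rank 0 reports the result
            results.put(partial)


class ToyEngine:
    """The caller sees one object and one blocking submit() call."""

    def __init__(self, world_size=2):
        self.requests = [mp.Queue() for _ in range(world_size)]
        self.results = mp.Queue()
        self.procs = [mp.Process(target=_worker, args=(r, self.requests[r], self.results))
                      for r in range(world_size)]
        for p in self.procs:
            p.start()

    def submit(self, prompt):
        for q in self.requests:               # broadcast the request to every rank
            q.put(prompt)
        return self.results.get()             # one reply, as if one device answered

    def shutdown(self):
        for q in self.requests:
            q.put(None)
        for p in self.procs:
            p.join()


if __name__ == '__main__':
    engine = ToyEngine(world_size=2)
    print(engine.submit('hello'))             # -> rank0:HELLO
    engine.shutdown()
```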

- For models trained by [Colossal-AI](https://github.com/hpcaitech/ColossalAI), they can be seamlessly transferred to Energon.
+ Models trained with [Colossal-AI](https://github.com/hpcaitech/ColossalAI) can be seamlessly transferred to EnergonAI.
Single-device models require manual coding work to introduce tensor parallelism and pipeline parallelism.

At present, we pre-build distributed Bert and GPT models.
@@ -40,7 +40,7 @@ $ wget https://huggingface.co/gpt2/blob/main/merges.txt

# Launch the service
export PYTHONPATH=~/ColossalAI-Inference/examples/hf_gpt2
- energon service init --config_file=~/ColossalAI-Inference/hf_gpt2/hf_gpt2_config.py
+ energonai service init --config_file=~/ColossalAI-Inference/hf_gpt2/hf_gpt2_config.py

# Request for the service
Method 1:
@@ -57,15 +57,15 @@ Method 2:

Here a 12-layer GPT-3 in FP16 is adopted.
Here a node with 8 A100 80 GB GPUs is adopted; the GPUs are fully connected with NVLink.
- Energon adopts the redundant computation elimination method from [EffectiveTransformer](https://github.com/bytedance/effective_transformer) and the sequence length is set the half of the padding length.
+ EnergonAI adopts the redundant computation elimination method from [EffectiveTransformer](https://github.com/bytedance/effective_transformer), and the sequence length is set to half of the padding length.
<div align="center">
<img src="https://user-images.githubusercontent.com/12018307/168971637-ffd1d6ba-44bb-4043-a275-3dc2a008c048.png" width = "600" height = "240" alt="Architecture" align=center />
</div>

#### Latency
Here GPT-3 in FP16 is adopted.
Here a node with 8 A100 80 GB GPUs is adopted; every two GPUs are connected with NVLink.
- Here the sequence length is set the half of the padding length when using redundant computation elimination method, which is the Energon(RM).
+ Here the sequence length is set to half of the padding length when the redundant computation elimination method is used, denoted EnergonAI(RM).
Here FasterTransformer is adopted for comparison; it does not support the redundant computation elimination method in distributed execution.
<div align="center">
<img src="https://user-images.githubusercontent.com/12018307/169728315-8ac95e4f-3e81-44e5-b82b-5873ffe85351.png" width = "600" height = "300" alt="Architecture" align=center />
13 changes: 0 additions & 13 deletions energon/kernel/cuda_native/scale_mask_softmax.py

This file was deleted.

File renamed without changes.
2 changes: 1 addition & 1 deletion energon/cli/__init__.py → energonai/cli/__init__.py
@@ -1,6 +1,6 @@
import click
import typer
- from energon.cli.service import service
+ from energonai.cli.service import service

app = typer.Typer()

4 changes: 2 additions & 2 deletions energon/cli/service.py → energonai/cli/service.py
@@ -1,10 +1,10 @@
import click
import torch
import inspect
- import energon.server as server
+ import energonai.server as server
import multiprocessing as mp

- from energon.context import Config
+ from energonai.context import Config


def launches(model_class=None,
File renamed without changes.
@@ -8,7 +8,7 @@

from colossalai.core import global_context as gpc
from colossalai.context import ParallelMode
- from energon.utils import get_current_device
+ from energonai.utils import get_current_device


def all_gather(tensor: Tensor, dim: int, parallel_mode: ParallelMode, async_op: bool = False) -> Tensor:
@@ -7,7 +7,7 @@

from colossalai.core import global_context as gpc
from colossalai.context import ParallelMode
- from energon.utils import get_current_device
+ from energonai.utils import get_current_device
from functools import reduce
import operator
from .utils import split_tensor_into_1d_equal_chunks, gather_split_1d_tensor
@@ -5,7 +5,7 @@

from colossalai.core import global_context as gpc
from colossalai.context import ParallelMode
- from energon.utils import get_current_device, synchronize
+ from energonai.utils import get_current_device, synchronize


def ring_forward(tensor_send_next: torch.Tensor, parallel_mode: ParallelMode):
@@ -3,7 +3,7 @@

from colossalai.core import global_context as gpc
from colossalai.context import ParallelMode
- from energon.utils import get_current_device
+ from energonai.utils import get_current_device


def send_tensor_meta(tensor, need_meta=True, next_rank=None):
File renamed without changes.
File renamed without changes.
2 changes: 1 addition & 1 deletion energon/context/config.py → energonai/context/config.py
@@ -5,7 +5,7 @@
import sys
from importlib.machinery import SourceFileLoader
from pathlib import Path
- from energon.logging import get_dist_logger
+ from energonai.logging import get_dist_logger


class Config(dict):
File renamed without changes.
6 changes: 3 additions & 3 deletions energon/engine/engine.py → energonai/engine/engine.py
@@ -13,12 +13,12 @@

from colossalai.core import global_context as gpc
from colossalai.context import ParallelMode
- from energon.initialize import launch_from_multiprocess
+ from energonai.initialize import launch_from_multiprocess

- from energon.utils import ensure_directory_exists
+ from energonai.utils import ensure_directory_exists
from colossalai.logging import get_dist_logger

- logger = get_dist_logger('energon')
+ logger = get_dist_logger('energonai')


class InferenceEngine(Module):
File renamed without changes.
File renamed without changes.
@@ -6,7 +6,7 @@
import torch.distributed as dist
from typing import List, Tuple, Union

- from energon.communication import send_forward, recv_forward, send_tensor_meta, recv_tensor_meta
+ from energonai.communication import send_forward, recv_forward, send_tensor_meta, recv_tensor_meta
from colossalai.context import ParallelMode
from colossalai.core import global_context as gpc

File renamed without changes.
@@ -11,7 +11,7 @@
from .vit_pipeline_wrapper import ViTPipelineCommWrapper
from colossalai.logging import get_dist_logger

- logger = get_dist_logger('energon')
+ logger = get_dist_logger('energonai')

pipe_wrapper = {
'vit': ViTPipelineCommWrapper,
@@ -6,7 +6,7 @@
import torch.distributed as dist
from typing import List, Tuple, Union

- from energon.communication import send_forward, recv_forward, send_tensor_meta, recv_tensor_meta
+ from energonai.communication import send_forward, recv_forward, send_tensor_meta, recv_tensor_meta
from colossalai.context import ParallelMode
from colossalai.core import global_context as gpc

File renamed without changes.
File renamed without changes.
@@ -53,7 +53,7 @@ def __init__(self, normalized_shape, eps=1e-5, device=None, dtype=None):
global colossal_layer_norm_cuda
if colossal_layer_norm_cuda is None:
try:
- colossal_layer_norm_cuda = importlib.import_module("energon_layer_norm")
+ colossal_layer_norm_cuda = importlib.import_module("energonai_layer_norm")
except ImportError:
raise RuntimeError('MixedFusedLayerNorm requires cuda extensions')

13 changes: 13 additions & 0 deletions energonai/kernel/cuda_native/scale_mask_softmax.py
@@ -0,0 +1,13 @@
+ import torch
+ import importlib
+
+ try:
+     energonai_scale_mask = importlib.import_module("energonai_scale_mask")
+ except ImportError:
+     raise RuntimeError('energonai_scale_mask requires cuda extensions')
+
+
+ def scale_mask_softmax(batch_size, batch_seq_len, head_num, src, seq_len_list):
+     src = src.contiguous()
+     dst = energonai_scale_mask.scale_mask_softmax_wrapper(batch_size, batch_seq_len, head_num, src, seq_len_list)
+     return dst
@@ -2,7 +2,7 @@
import importlib

try:
- energon_transpose_pad = importlib.import_module("energon_transpose_pad")
+ energonai_transpose_pad = importlib.import_module("energonai_transpose_pad")
except ImportError:
raise RuntimeError('transpose_pad requires cuda extensions')

@@ -12,7 +12,7 @@
def transpose_pad(src, batch_size, max_seq_len, seq_len_list, head_num, size_per_head):
src = src.contiguous()

- dst = energon_transpose_pad.transpose_pad_wrapper(src, batch_size, max_seq_len, seq_len_list, head_num,
+ dst = energonai_transpose_pad.transpose_pad_wrapper(src, batch_size, max_seq_len, seq_len_list, head_num,
size_per_head)

return dst
@@ -21,7 +21,7 @@ def transpose_depad(src, batch_size, sum_seq, max_seq_len, seq_len_list, head_num, size_per
def transpose_depad(src, batch_size, sum_seq, max_seq_len, seq_len_list, head_num, size_per_head):
src = src.contiguous()

- dst = energon_transpose_pad.transpose_depad_wrapper(src, batch_size, sum_seq, max_seq_len, seq_len_list, head_num,
+ dst = energonai_transpose_pad.transpose_depad_wrapper(src, batch_size, sum_seq, max_seq_len, seq_len_list, head_num,
size_per_head)

return dst
@@ -44,7 +44,7 @@ def ft_build_padding_offsets(seq_lens, batch_size, max_seq_len, valid_word_num,
seq_lens = seq_lens.contiguous()
# tmp_mask_offset = tmp_mask_offset.contiguous()

- energon_transpose_pad.ft_build_padding_offsets_wrapper(seq_lens, batch_size, max_seq_len, valid_word_num,
+ energonai_transpose_pad.ft_build_padding_offsets_wrapper(seq_lens, batch_size, max_seq_len, valid_word_num,
tmp_mask_offset)


@@ -53,15 +53,15 @@ def ft_remove_padding(src, tmp_mask_offset, mask_offset, valid_word_num, hidden_
# tmp_mask_offset = tmp_mask_offset.contiguous()
# mask_offset = mask_offset.contiguous()

- dst = energon_transpose_pad.ft_remove_padding_wrapper(src, tmp_mask_offset, mask_offset, valid_word_num, hidden_dim)
+ dst = energonai_transpose_pad.ft_remove_padding_wrapper(src, tmp_mask_offset, mask_offset, valid_word_num, hidden_dim)
return dst


def ft_rebuild_padding(src, mask_offset, valid_word_num, hidden_dim, batch_size, max_seq_len):
src = src.contiguous()
# mask_offset = mask_offset.contiguous()

- dst = energon_transpose_pad.ft_rebuild_padding_wrapper(src, mask_offset, valid_word_num, hidden_dim, batch_size,
+ dst = energonai_transpose_pad.ft_rebuild_padding_wrapper(src, mask_offset, valid_word_num, hidden_dim, batch_size,
max_seq_len)
return dst

@@ -75,13 +75,13 @@ def ft_transpose_rebuild_padding(Q, K, V, q_buf, k_buf, v_buf, batch_size, seq_l
k_buf = k_buf.contiguous()
v_buf = v_buf.contiguous()

- energon_transpose_pad.ft_transpose_rebuild_padding_wrapper(Q, K, V, q_buf, k_buf, v_buf, batch_size, seq_len,
+ energonai_transpose_pad.ft_transpose_rebuild_padding_wrapper(Q, K, V, q_buf, k_buf, v_buf, batch_size, seq_len,
head_num, size_per_head, valid_word_num, mask_offset)


def ft_transpose_remove_padding(src, valid_word_num, batch_size, seq_len, head_num, size_per_head, mask_offset):
src = src.contiguous()

- dst = energon_transpose_pad.ft_transpose_remove_padding_wrapper(src, valid_word_num, batch_size, seq_len, head_num,
+ dst = energonai_transpose_pad.ft_transpose_remove_padding_wrapper(src, valid_word_num, batch_size, seq_len, head_num,
size_per_head, mask_offset)
return dst
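
Taken together, these wrappers implement the EffectiveTransformer-style redundant computation elimination referenced in the README: pad tokens are removed before the expensive transformer math and restored afterwards. Below is a hedged sketch of how the three offset/padding helpers might be chained; every buffer shape, dtype, and in/out role here is an assumption rather than a documented contract.

```python
# Hypothetical padding-elimination flow (all shapes, dtypes, and the in/out
# role of valid_word_num are assumptions; the energonai_transpose_pad CUDA
# extension must already be built).
import torch
from energonai.kernel.cuda_native.transpose_pad import (
    ft_build_padding_offsets, ft_remove_padding, ft_rebuild_padding)

batch_size, max_seq_len, hidden_dim = 2, 8, 16
seq_lens = torch.tensor([5, 3], dtype=torch.int32, device='cuda')

# Buffers assumed to be filled by the kernel.
valid_word_num = torch.zeros(1, dtype=torch.int32, device='cuda')
tmp_mask_offset = torch.zeros(batch_size * max_seq_len, dtype=torch.int32, device='cuda')
mask_offset = torch.zeros(batch_size * max_seq_len, dtype=torch.int32, device='cuda')

# 1) Compute offsets of the real (non-pad) tokens.
ft_build_padding_offsets(seq_lens, batch_size, max_seq_len, valid_word_num, tmp_mask_offset)
n_valid = int(valid_word_num.item())          # 5 + 3 = 8 real tokens (assumed)

# 2) Pack the hidden states, dropping pad positions.
hidden = torch.randn(batch_size, max_seq_len, hidden_dim, dtype=torch.half, device='cuda')
packed = ft_remove_padding(hidden, tmp_mask_offset, mask_offset, n_valid, hidden_dim)

# 3) ...run attention/FFN on `packed`... then restore the padded layout.
restored = ft_rebuild_padding(packed, mask_offset, n_valid, hidden_dim, batch_size, max_seq_len)
```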
File renamed without changes.
20 changes: 10 additions & 10 deletions energon/logging/logging.py → energonai/logging/logging.py
@@ -1,14 +1,14 @@
#!/usr/bin/env python
# -*- encoding: utf-8 -*-

- import energon
+ import energonai
import logging
from pathlib import Path
from typing import Union

from colossalai.context import ParallelMode

- _FORMAT = 'energon - %(name)s - %(asctime)s %(levelname)s: %(message)s'
+ _FORMAT = 'energonai - %(name)s - %(asctime)s %(levelname)s: %(message)s'
logging.basicConfig(level=logging.INFO, format=_FORMAT)


@@ -39,7 +39,7 @@ def get_instance(name: str):
def __init__(self, name):
if name in DistributedLogger.__instances:
raise Exception(
- 'Logger with the same name has been created, you should use energon.logging.get_dist_logger')
+ 'Logger with the same name has been created, you should use energonai.logging.get_dist_logger')
else:
self._name = name
self._logger = logging.getLogger(name)
@@ -77,10 +77,10 @@ def log_to_file(self, path: Union[str, Path], mode: str = 'a', level: str = 'INF
path = Path(path)

# set the default file name if path is a directory
- if not energon.core.global_context.is_initialized(ParallelMode.GLOBAL):
+ if not energonai.core.global_context.is_initialized(ParallelMode.GLOBAL):
rank = 0
else:
- rank = energon.core.global_context.get_global_rank()
+ rank = energonai.core.global_context.get_global_rank()

if suffix is not None:
log_file_name = f'rank_{rank}_{suffix}.log'
@@ -99,7 +99,7 @@ def _log(self, level, message: str, parallel_mode: ParallelMode = ParallelMode.G
if ranks is None:
getattr(self._logger, level)(message)
else:
- local_rank = energon.core.global_context.get_local_rank(parallel_mode)
+ local_rank = energonai.core.global_context.get_local_rank(parallel_mode)
if local_rank in ranks:
getattr(self._logger, level)(message)

@@ -109,7 +109,7 @@ def info(self, message: str, parallel_mode: ParallelMode = ParallelMode.GLOBAL,
:param message: The message to be logged
:type message: str
:param parallel_mode: The parallel mode used for logging. Defaults to ParallelMode.GLOBAL
- :type parallel_mode: :class:`energon.context.parallel_mode.ParallelMode`
+ :type parallel_mode: :class:`energonai.context.parallel_mode.ParallelMode`
:param ranks: List of parallel ranks
:type ranks: list
"""
@@ -121,7 +121,7 @@ def warning(self, message: str, parallel_mode: ParallelMode = ParallelMode.GLOBA
:param message: The message to be logged
:type message: str
:param parallel_mode: The parallel mode used for logging. Defaults to ParallelMode.GLOBAL
- :type parallel_mode: :class:`energon.context.parallel_mode.ParallelMode`
+ :type parallel_mode: :class:`energonai.context.parallel_mode.ParallelMode`
:param ranks: List of parallel ranks
:type ranks: list
"""
@@ -133,7 +133,7 @@ def debug(self, message: str, parallel_mode: ParallelMode = ParallelMode.GLOBAL,
:param message: The message to be logged
:type message: str
:param parallel_mode: The parallel mode used for logging. Defaults to ParallelMode.GLOBAL
- :type parallel_mode: :class:`energon.context.parallel_mode.ParallelMode`
+ :type parallel_mode: :class:`energonai.context.parallel_mode.ParallelMode`
:param ranks: List of parallel ranks
:type ranks: list
"""
@@ -145,7 +145,7 @@ def error(self, message: str, parallel_mode: ParallelMode = ParallelMode.GLOBAL,
:param message: The message to be logged
:type message: str
:param parallel_mode: The parallel mode used for logging. Defaults to ParallelMode.GLOBAL
- :type parallel_mode: :class:`energon.context.parallel_mode.ParallelMode`
+ :type parallel_mode: :class:`energonai.context.parallel_mode.ParallelMode`
:param ranks: List of parallel ranks
:type ranks: list
"""
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
