PPLNN, which is short for "PPLNN is a Primitive Library for Neural Network", is a high-performance deep-learning inference engine for efficient AI inference. It can run various ONNX models and has better support for OpenMMLab.
- PMX changed to OPMX on 25/04/2024.
  - ChatGLM1 will not be supported in OPMX.
  - All LLMs must be converted (or just rename `pmx_params.json` to `opmx_params.json`) and exported again.
  - You can find the old code at llm_v1.
- NCCL issue on some devices: L40S and H800 have been reported to encounter illegal memory access in NCCL AllReduce. We suggest turning the NCCL `Simple` protocol off by setting the environment variable `NCCL_PROTO=^Simple` to work around this issue.
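
The two notes above can be sketched in shell. This is a minimal sketch rather than an official migration script: `rename_pmx_params` and `MODEL_DIR` are hypothetical names introduced here for illustration, and the rename touches only the parameter file (the note above says the model must still be exported again).

```shell
# Hypothetical helper: rename the old PMX parameter file inside an
# exported model directory to the OPMX name, if the old file exists.
rename_pmx_params() {
    dir="$1"
    if [ -f "$dir/pmx_params.json" ]; then
        mv "$dir/pmx_params.json" "$dir/opmx_params.json"
    fi
}

# MODEL_DIR is an assumed location; point it at your own exported model.
MODEL_DIR="${MODEL_DIR:-./my_model}"
rename_pmx_params "$MODEL_DIR"

# Turn the NCCL "Simple" protocol off for every process launched from
# this shell. The value is quoted so that no shell treats ^ specially.
export NCCL_PROTO='^Simple'
```

The variable has to be exported in the same shell (or job script) that launches the inference processes, since NCCL reads it from each process's environment.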
- New LLM Engine (Overview)
- Flash Attention
- Split-k Attention (similar to Flash Decoding)
- Group-query Attention
- Dynamic Batching (also called Continuous Batching or In-flight Batching)
- Tensor Parallelism
- Graph Optimization
- INT8 groupwise KV Cache (numerical accuracy is very close to FP16🚀)
- INT8 per-token, per-channel Quantization (W8A8)
- Installing prerequisites:
  - On Debian or Ubuntu:
    apt-get install build-essential cmake git python3 python3-dev
  - On RedHat or CentOS:
    yum install gcc gcc-c++ cmake3 make git python3 python3-devel
- Cloning source code:
  git clone https://github.com/openppl-public/ppl.nn.git
- Building from source:
  cd ppl.nn
  ./build.sh -DPPLNN_USE_X86_64=ON -DPPLNN_ENABLE_PYTHON_API=ON
- Running Python demo:
  PYTHONPATH=./pplnn-build/install/lib python3 ./tools/pplnn.py --use-x86 --onnx-model tests/testdata/conv.onnx
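
Before running the steps above, it can help to confirm that the prerequisite tools are on `PATH`. The following is a minimal sketch, not part of the project's tooling: `missing_tools` is a hypothetical helper, the tool list is inferred from the package names above, and it does not check for the Python development headers.

```shell
# Hypothetical helper: print the names of the given tools that are
# not found on PATH (each name is preceded by a space).
missing_tools() {
    missing=""
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
    done
    echo "$missing"
}

# Compiler toolchain, git and Python, as named in the package lists above.
result=$(missing_tools gcc g++ make git python3)

# The RedHat/CentOS package installs cmake3, so accept either binary name.
if [ -n "$(missing_tools cmake)" ] && [ -n "$(missing_tools cmake3)" ]; then
    result="$result cmake"
fi

if [ -n "$result" ]; then
    echo "missing tools:$result"
else
    echo "all prerequisite tools found"
fi
```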
Refer to Documents for more details.
- Building from Source
- How to Integrate
- APIs
  - C++
  - Python
- Develop Guide
  - Adding New Engines and Ops
    - X86
    - CUDA
    - RISCV
    - ARM
    - LLM-CUDA
- Models
- Implementation Details
Questions, reports, and suggestions are welcome through GitHub Issues!
| WeChat Official Account | QQ Group |
|---|---|
| OpenPPL | 627853444 |
This project uses the Contributor Covenant as its code of conduct. Contributions of any kind are highly appreciated.
This project is distributed under the Apache License, Version 2.0.