FlagCX is a scalable and adaptive cross-chip communication library developed with the backing of the Beijing Academy of Artificial Intelligence (BAAI).
FlagCX is also a part of FlagAI-Open, an open-source initiative by BAAI that aims to foster an open-source ecosystem for AI technologies. It serves as a platform where developers, researchers, and AI enthusiasts can collaborate on various AI projects, contribute to the development of cutting-edge AI solutions, and share their work with the global community.
FlagCX leverages native collective communication libraries to provide full support for single-chip communication on different platforms. In addition to its native x-CCL support, FlagCX provides an original device-buffer RDMA design that offers advanced support for high-performance cross-chip send/recv operations (the CORE module), which can also be integrated with native x-CCL backends to enable optimized cross-chip collective communications. A comprehensive list of currently supported communication backends and their capabilities is given below:
Backend | NCCL | IXCCL | CNCL | GLOO | CORE+x-CCL |
---|---|---|---|---|---|
Mode | Homo | Homo | Homo | Hetero | Hetero |
send | ✓ | ✓ | ✓ | ✓ | ✓ |
recv | ✓ | ✓ | ✓ | ✓ | ✓ |
broadcast | ✓ | ✓ | ✓ | ✘ | ✓ |
gather | ✓ | ✓ | ✓ | ✘ | ✓ |
scatter | ✓ | ✓ | ✓ | ✘ | ✓ |
reduce | ✓ | ✓ | ✓ | ✘ | ✓ |
allreduce | ✓ | ✓ | ✓ | ✓ | ✓ |
allgather | ✓ | ✓ | ✓ | ✓ | ✓ |
reducescatter | ✓ | ✓ | ✓ | ✘ | ✓ |
alltoall | ✓ | ✓ | ✓ | ✓ | ✓ |
group ops | ✓ | ✓ | ✓ | ? | ? |
Note that Homo and Hetero modes refer to communications among homogeneous and heterogeneous clusters, respectively. All supported native collective communication libraries can be referenced through the links below:
- NCCL, NVIDIA Collective Communications Library.
- IXCCL, Iluvatar Corex Collective Communications Library.
- CNCL, Cambricon Communications Library.
- GLOO, Gloo Collective Communications Library.
FlagCX also provides plugins that integrate with upper-layer applications such as PyTorch, built on top of its unified APIs. The table below lists the communication operations currently supported by the plugin for each framework, where the batch_XXX and XXX_coalesced ops refer to the usage of group primitives.
Plugin | PyTorch |
---|---|
send | ✓ |
recv | ✓ |
batch_isend_irecv | ✓ |
broadcast | ✓ |
all_reduce | ✓ |
all_reduce_coalesced | ✘ |
reduce | ✓ |
all_gather | ✓ |
all_gather_into_tensor_coalesced | ✘ |
gather | ✓ |
scatter | ✓ |
reduce_scatter | ✓ |
reduce_scatter_tensor_coalesced | ✘ |
all_to_all | ✓ |
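As a minimal sketch of how these ops are driven from PyTorch, the snippet below runs an all_reduce through torch.distributed. The backend name "flagcx" and the initialization details are assumptions about the plugin rather than something stated here; launch the script with torchrun so that rank and world size are provided through the environment.

```python
import torch
import torch.distributed as dist

# Assumption: the FlagCX PyTorch plugin registers a process-group backend
# named "flagcx"; adjust the name to whatever the installed plugin exposes.
dist.init_process_group(backend="flagcx")

rank = dist.get_rank()
# Shown here on an NVIDIA device; other supported chips expose their own device modules.
torch.cuda.set_device(rank % torch.cuda.device_count())

# all_reduce is one of the plugin-supported ops listed in the table above.
tensor = torch.ones(1 << 20, device="cuda")
dist.all_reduce(tensor, op=dist.ReduceOp.SUM)

dist.destroy_process_group()
```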
- Clone the repository:

  git clone https://github.com/FlagOpen/FlagCX.git
- Build the library with flags targeting different platforms:

  cd FlagCX
  make [USE_NVIDIA/USE_ILUVATAR_COREX/USE_CAMBRICON/USE_GLOO]=1
The default install path is set to build/; you can manually set BUILDDIR to specify the build path. You may also define DEVICE_HOME and CCL_HOME to indicate the install paths of the device runtime and communication libraries.
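For example, a build targeting the NVIDIA platform with custom paths might look like the following, where the variables are passed on the make command line and the CUDA and NCCL locations are placeholders to be adapted to your installation:

  make USE_NVIDIA=1 BUILDDIR=/path/to/build DEVICE_HOME=/usr/local/cuda CCL_HOME=/usr/local/nccl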
Tests for FlagCX are maintained in test/perf.
cd test/perf
make
./test_allreduce -b 128M -e 8G -f 2
Note that the default MPI install path is set to /usr/local/mpi; you may specify a different MPI path with:
make MPI_HOME=<path to mpi install>
All tests support the same set of arguments:
- Sizes to scan
  - -b <min size in bytes>: minimum size to start with. Default: 1M.
  - -e <max size in bytes>: maximum size to end at. Default: 1G.
  - -f <increment factor>: multiplication factor between sizes. Default: 2.
- Performance
  - -w <warmup iteration count>: number of warmup iterations (not timed). Default: 5.
  - -n <iteration count>: number of iterations. Default: 20.
- Utils
  - -p <0/1>: print buffer info. Default: 0.
  - -h: print help message. Default: disabled.
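For example, a sweep from 1M to 1G with a longer warmup and buffer printing enabled could be launched as shown below; running through mpirun with 8 processes is illustrative only and should be adapted to your MPI setup:

  mpirun -np 8 ./test_allreduce -b 1M -e 1G -f 2 -w 10 -n 50 -p 1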
This project is licensed under the Apache License (Version 2.0).