
iDC-NEU/NeutronTask


NeutronTask is a multi-GPU Graph Neural Network (GNN) training framework that introduces GNN task parallelism and task-decoupled GNN training.

The code is implemented based on NeutronStar, extending its single-GPU execution to a multi-GPU platform.

Quick Start

A compiler supporting OpenMP and C++11 features (e.g., lambda expressions, multi-threading) is required.

cmake >=3.16.3

MPI for inter-process communication

CUDA > 11.3 for GPU-based graph operations

libnuma for NUMA-aware memory allocation:

sudo apt install libnuma-dev

cub for GPU-based graph propagation

libtorch version > 1.13 with GPU support for NN computation

Unzip the libtorch package in the root directory of NeutronStar and change CMAKE_PREFIX_PATH in "CMakeLists.txt" to your own path.

Download cub to the ./NeutronStar/cuda/ directory.

Configure PATH and LD_LIBRARY_PATH for CUDA and MPI:

export CUDA_HOME=/usr/local/cuda-10.2
export MPI_HOME=/path/to/your/mpi
export LD_LIBRARY_PATH=$CUDA_HOME/lib64:$LD_LIBRARY_PATH
export PATH=$MPI_HOME/bin:$CUDA_HOME/bin:$PATH

clang-format is optional for auto-formatting:

sudo apt install clang-format

Please enable GPU compilation with the following command at configure time:

cmake -DCUDA_ENABLE=ON ..

To build:

mkdir build

cd build

cmake ..

make -j4

Single-machine, multi-GPU run:

./run_nts.sh 1 task_parallelism.cfg

Dataset

.edge File

  • Purpose: Defines the graph's edges, representing the graph's topology.
  • Format: Each line represents an edge between two nodes:
node_id1 node_id2
1 2
2 3
3 4
4 1
1 3

.feature File

  • Purpose: Stores features for each node in the graph.
  • Format: Each row represents the feature vector of a node, optionally prefixed by the node ID:
node_id feature1 feature2 feature3
1 0.1 0.2 0.3
2 0.4 0.5 0.6
3 0.7 0.8 0.9
4 1.0 1.1 1.2

.label File

  • Purpose: Stores labels for nodes or edges for classification or regression tasks.
  • Format: Each row represents a node/edge and its corresponding label:
node_id label
1 0
2 1
3 0
4 1

.mask File

  • Purpose: Specifies which nodes or edges belong to training, validation, or testing sets.
  • Format: Each row is a boolean or binary indicator (1/0) for a node's inclusion in a dataset:
node_id train_mask val_mask test_mask
1 1 0 0
2 0 1 0
3 0 0 1
4 1 0 0
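All four files above are plain whitespace-separated text. As an illustration, the following sketch writes the toy dataset from the examples; the exact layout expected by NeutronTask's loader (in particular whether the node ID column is present) is an assumption here, so check it against the sample data shipped with the repository.

```python
# Write the toy graph from the examples above as whitespace-separated
# text files. File names and the leading node-ID column are assumptions
# for illustration; adjust to match your loader's expectations.

edges = [(1, 2), (2, 3), (3, 4), (4, 1), (1, 3)]
features = {1: [0.1, 0.2, 0.3], 2: [0.4, 0.5, 0.6],
            3: [0.7, 0.8, 0.9], 4: [1.0, 1.1, 1.2]}
labels = {1: 0, 2: 1, 3: 0, 4: 1}
masks = {1: (1, 0, 0), 2: (0, 1, 0), 3: (0, 0, 1), 4: (1, 0, 0)}

# .edge: one "src dst" pair per line
with open("toy.edge", "w") as f:
    for src, dst in edges:
        f.write(f"{src} {dst}\n")

# .feature: node ID followed by its feature vector
with open("toy.feature", "w") as f:
    for nid, vec in sorted(features.items()):
        f.write(f"{nid} " + " ".join(str(v) for v in vec) + "\n")

# .label: node ID and its class label
with open("toy.label", "w") as f:
    for nid, lab in sorted(labels.items()):
        f.write(f"{nid} {lab}\n")

# .mask: node ID and binary train/val/test indicators
with open("toy.mask", "w") as f:
    for nid, (tr, va, te) in sorted(masks.items()):
        f.write(f"{nid} {tr} {va} {te}\n")
```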

CFG file

| Section | Parameter | Description | Example |
|---|---|---|---|
| General | ALGORITHM | Specifies the algorithm to use (e.g., APPNP, GCN). | APPNP |
| General | Decoupled | Set to 1 to enable decoupled training, 0 otherwise. | 1 |
| General | SMALLGRAPH | Set to 1 for small-graph optimizations, 0 otherwise. | 0 |
| Dataset | VERTICES | Number of nodes in the graph. | 2708 |
| Dataset | LAYERS | Defines the neural network architecture (input-hidden-output). | 1433-64-7 |
| Dataset | EDGE_FILE | Path to the graph's edge file. | data/cora.edge |
| Dataset | FEATURE_FILE | Path to the node feature file. | data/cora.feature |
| Dataset | LABEL_FILE | Path to the node label file. | data/cora.label |
| Dataset | MASK_FILE | Path to the file with train/val/test masks. | data/cora.mask |
| Training | EPOCHS | Number of training epochs. | 200 |
| Training | LEARN_RATE | Learning rate for the optimizer. | 0.01 |
| Training | WEIGHT_DECAY | Regularization to avoid overfitting. | 5e-4 |
| Training | DROP_RATE | Dropout rate for regularization during training. | 0.5 |
| Processing | PROC_CUDA | Set to 1 to use CUDA (GPU acceleration), 0 otherwise. | 1 |
| Processing | GPUNUM | Number of GPUs to use. | 2 |
| Processing | PIPELINENUM | Number of pipeline stages for processing. | 4 |
| Algorithm-specific | ALPHA | PageRank teleport probability for APPNP. | 0.1 |
| Algorithm-specific | K | Number of propagation iterations. | 8 |
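Putting the parameters above together, a configuration for Cora with APPNP might look like the following. The KEY:VALUE syntax is an assumption based on NeutronStar-style config files; consult the sample .cfg files in the repository (e.g., task_parallelism.cfg) for the exact format.

```
ALGORITHM:APPNP
Decoupled:1
SMALLGRAPH:0
VERTICES:2708
LAYERS:1433-64-7
EDGE_FILE:data/cora.edge
FEATURE_FILE:data/cora.feature
LABEL_FILE:data/cora.label
MASK_FILE:data/cora.mask
EPOCHS:200
LEARN_RATE:0.01
WEIGHT_DECAY:5e-4
DROP_RATE:0.5
PROC_CUDA:1
GPUNUM:2
PIPELINENUM:4
ALPHA:0.1
K:8
```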

Toolkits

| Toolkit | Description |
|---|---|
| APPNP_DP | Approximate Personalized Propagation of Neural Predictions (APPNP) with Data Parallelism |
| GCN_DP | Graph Convolutional Networks (GCN) with Data Parallelism |
| GCN_TP_TD_pipeline | GCN with Task Parallelism, Task-Decoupled Training, and pipelining |
| GCN_TP_TD_pipeline_wopipeline | GCN with Task Parallelism and Task-Decoupled Training (no pipelining) |
| GAT_Double_Decouple_pipeline | Graph Attention Networks (GAT) |
These toolkits are configured via the .cfg file, which defines parameters such as algorithms, dataset paths, training hyperparameters, and processing options. Each toolkit can be customized by setting appropriate values in the configuration file to suit specific graph learning tasks.

Baseline

Baseline scripts to run the Sancus and DGL experiments:

Sancus

bash baseline/sancus/light-dist-gnn/run.sh

DGL

bash baseline/dgl/examples/multigpu/run_random.sh
bash baseline/dgl/examples/pytorch/gcn/run.sh
