This repository contains a C++ implementation of the FAR task scheduler, targeting NVIDIA GPUs that support physical partitioning via Multi-Instance GPU (MIG). The scheduler reduces the joint execution time of a set of tasks (the makespan) by co-executing them judiciously on MIG partitions. The FAR algorithm used by the scheduler is presented, together with a comprehensive evaluation, in this paper (a preprint for the moment).
Currently, the software explicitly supports the following NVIDIA GPU models:
- NVIDIA A30
- NVIDIA A100
- NVIDIA H100
It may also work correctly on models with the same MIG characteristics as some of the above. In particular, it will probably work on the NVIDIA B100 and B200, which have the same MIG partitioning as the A100 and H100, although it has not been tested on them.
To use this software, ensure the following prerequisites are met:
- An NVIDIA GPU with MIG technology (see previous section for supported models).
- Operating System: Linux recommended. NVML is available on Windows, but MIG management is limited or unsupported in some cases. Use on Windows is experimental and not guaranteed.
- NVIDIA Driver: Version 470 or later, with MIG support enabled.
- CUDA Toolkit: Version 11.0 or newer.
- NVIDIA Management Library (NVML): comes bundled with the NVIDIA driver; ensure it is correctly installed (headers are usually in `/usr/include/` or `/usr/local/cuda/include`).
- CMake: version 3.10 or newer to configure and build the project.
- C++ Compiler: g++ version at least 5.1 to robustly support C++11.
- Ensure you have administrative privileges to configure MIG instances on the GPU.
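As a quick sanity check of the driver, the NVML headers, and MIG support, a small probe like the following can be compiled and run (a minimal sketch, not part of the project; it assumes the GPU of interest is device 0):

```cpp
#include <cstdio>
#include <nvml.h>

// Minimal sketch: report the driver version and whether MIG mode is enabled
// on device 0 (assumed index). Compile with: g++ check_mig.cpp -lnvidia-ml
int main() {
    if (nvmlInit() != NVML_SUCCESS) {
        std::fprintf(stderr, "Failed to initialize NVML\n");
        return 1;
    }
    char version[NVML_SYSTEM_DRIVER_VERSION_BUFFER_SIZE];
    nvmlSystemGetDriverVersion(version, NVML_SYSTEM_DRIVER_VERSION_BUFFER_SIZE);
    std::printf("Driver version: %s\n", version);

    nvmlDevice_t dev;
    nvmlDeviceGetHandleByIndex(0, &dev);
    unsigned int current, pending;
    if (nvmlDeviceGetMigMode(dev, &current, &pending) == NVML_SUCCESS) {
        std::printf("MIG mode: %s\n",
                    current == NVML_DEVICE_MIG_ENABLE ? "enabled" : "disabled");
    } else {
        std::printf("MIG is not supported on this device\n");
    }
    nvmlShutdown();
    return 0;
}
```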
To install and build the project, follow these steps:
One option is to clone this repository on your local machine using the following commands:

```
git clone https://github.com/artecs-group/FAR_MIG_scheduler.git
cd FAR_MIG_scheduler
```

Another option is to download it from one of the published releases.
Edit the first line of `FAR_MIG_scheduler/CMakeLists.txt`, changing the `CUDA_ROOT` path to point to your CUDA installation directory:

```
set(CUDA_ROOT "/usr/local/cuda" CACHE PATH "CUDA Toolkit root directory")
```
From the root directory of the project (`FAR_MIG_scheduler`), build it in a `FAR_MIG_scheduler/build` directory:

```
mkdir -p build
cd build
cmake ..
make
```

This creates two executable files: `mig_scheduler.exe` and `mig_scheduler_debug.exe`. Both do the same thing, but the debug version includes some stop points that make the execution easier to follow (it pauses and asks the user to press a key to continue).
The scheduler is ready to be tested with 9 kernels from the Rodinia suite. The Rodinia code configured for that test is provided as the file `gpu-rodinia.tar.gz`, attached to the releases due to its huge size (some I/O files are very large). Either of these two options can be used to include it:

- Option 1: Run the `scripts/prepare_test_data.sh` script, which downloads the properly configured Rodinia kernels and data, unzips them, and deletes the compressed file:

  ```
  cd scripts
  sh prepare_test_data.sh
  ```

- Option 2: Download the file `gpu-rodinia.tar.gz` from a release of this repository and extract its contents with a tool like gzip into the path `FAR_MIG_scheduler/data/kernels`.
Once this is done, the test can be run by executing the scheduler as indicated in the next section. The path `../data/input_test/kernels_rodinia.txt` must be passed as the task file (second argument of the program).
To use the software, invoke the `mig_scheduler.exe` executable with the following arguments:

- GPU Index: the index of the GPU where tasks will be executed. You can use the `nvidia-smi` command to list the GPUs available on your system and their respective indices.
- Task File Path: the path to a file containing information about the tasks to execute. Each line of the file represents a task and contains three fields separated by spaces:
  - Task Name: a unique identifier for the task, used for reporting during execution.
  - Task Directory: the directory where the script that defines the task is located.
  - Task Script: the name of the script that executes the GPU kernels, defining the task.
To schedule a set of GPU tasks from the Rodinia suite (included in this repository), you can use the following command (it is important to run it with `sudo` administrative permissions in order to use MIG):

```
sudo ./mig_scheduler.exe 0 ../data/input_test/kernels_rodinia.txt
```
where:

- `0` specifies the GPU index (GPU 0 on the system).
- `../data/input_test/kernels_rodinia.txt` is the path to the task file.

Below is an example of the content of the `kernels_rodinia.txt` file with 2 tasks:

```
gaussian ../kernels/gpu-rodinia/cuda/gaussian run
pathfinder ../kernels/gpu-rodinia/cuda/pathfinder run
```

Each line describes a task. For example, the first task is named `gaussian`. It is located in the directory `../kernels/gpu-rodinia/cuda/gaussian`, and the script `run` in that directory is executed to perform the task.
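For illustration, the three-field format above could be parsed with something like the following (a hypothetical sketch; the project's actual parsing code may differ):

```cpp
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical sketch of reading the three-field task file described above;
// not necessarily the project's actual implementation.
struct TaskSpec {
    std::string name;       // unique task identifier, used for reporting
    std::string directory;  // directory containing the task script
    std::string script;     // script that executes the GPU kernels
};

std::vector<TaskSpec> read_task_file(const std::string& path) {
    std::vector<TaskSpec> tasks;
    std::ifstream in(path);
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        TaskSpec t;
        // Lines with fewer than three space-separated fields are skipped.
        if (fields >> t.name >> t.directory >> t.script)
            tasks.push_back(t);
    }
    return tasks;
}
```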
During execution, the program provides detailed logging through the standard output and error streams. These messages are prefixed with the tags `INFO` for general information and `ERROR` for critical issues. Most errors correspond to unrecoverable exceptions and terminate execution.
- Initial GPU Information: at the start of execution, the program reports the name of the provided GPU and confirms that MIG has been activated:

  ```
  INFO: Device has been binded
  INFO: GPU model: NVIDIA A30 detected
  INFO: MIG has been activated
  ```
- Profiling Times: the program profiles reconfiguration times (creation and destruction of MIG instances) and task execution times. Note: task profiling is currently done by executing the tasks, but it may be replaced with faster profiling methods in future versions (see the paper for references). Example output:

  ```
  INFO: Profiling instance creation and destruction times
  INFO: Instance(start=0, size=1) has been created
  INFO: Size 1. Creation time: 0.16s. Destruction time: 0.10s
  INFO: Instance(start=0, size=2) has been created
  INFO: Size 2. Creation time: 0.13s. Destruction time: 0.11s
  ...
  INFO: Task gaussian profiled with 22.13s in size 1
  ```
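The reconfiguration timing above can be reproduced in isolation with NVML's MIG API. The following is a minimal sketch (not the project's code) that times the creation and destruction of a 1-slice instance, assuming MIG is already enabled on device 0 and the process runs as root:

```cpp
#include <chrono>
#include <cstdio>
#include <nvml.h>

// Minimal sketch: time creation/destruction of a 1-slice GPU instance.
// Assumes MIG mode is enabled on device 0 and root privileges.
// Compile with: g++ time_mig.cpp -lnvidia-ml
int main() {
    nvmlInit();
    nvmlDevice_t dev;
    nvmlDeviceGetHandleByIndex(0, &dev);

    nvmlGpuInstanceProfileInfo_t info;
    nvmlDeviceGetGpuInstanceProfileInfo(dev, NVML_GPU_INSTANCE_PROFILE_1_SLICE, &info);

    auto t0 = std::chrono::steady_clock::now();
    nvmlGpuInstance_t gi;
    nvmlDeviceCreateGpuInstance(dev, info.id, &gi);   // create the instance
    auto t1 = std::chrono::steady_clock::now();
    nvmlGpuInstanceDestroy(gi);                       // destroy it again
    auto t2 = std::chrono::steady_clock::now();

    std::printf("Size 1. Creation time: %.2fs. Destruction time: %.2fs\n",
                std::chrono::duration<double>(t1 - t0).count(),
                std::chrono::duration<double>(t2 - t1).count());
    nvmlShutdown();
    return 0;
}
```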
- Repartitioning Tree and Scheduling Plan: the program calculates and outputs a repartitioning tree that includes the scheduling plan computed by the algorithm and the estimated execution times. These trees show, across their nodes, the hierarchy of possible instances on the GPU being used, storing in each node a list with the names of the tasks to be executed there (`tasks` attribute), along with the list of the expected completion times for them (`ends` attribute). For more details on these trees, see Section 3.2 of the paper, especially the references to Figure 4. Example output:

  ```
  ======================================
  Node(start=0, size=4, tasks=[], ends=[])
  --Node(start=0, size=2, tasks=[lavaMD], ends=[21.72])
  ----Node(start=0, size=1, tasks=[huffman, heartwall], ends=[23.01, 23.05])
  ----Node(start=1, size=1, tasks=[nw, particlefilter], ends=[22.79, 23.23])
  --Node(start=2, size=2, tasks=[], ends=[])
  ----Node(start=2, size=1, tasks=[gaussian], ends=[22.45])
  ----Node(start=3, size=1, tasks=[pathfinder, lu], ends=[21.09, 28.86])
  Tree makespan: 28.86s
  ======================================
  ```
In the example above, the `lavaMD` task has been assigned to the GPU instance spanning the first 2 slices (`start=0` and `size=2`). This task is expected to complete 21.72s after the execution of the entire set of tasks begins. Once it finishes, that GPU instance will be split into two instances of size 1: the first with `start=0` and `size=1`, and the second with `start=1` and `size=1`. The `huffman` and `heartwall` tasks are assigned to the first new instance and are expected to finish at 23.01s and 23.05s, respectively, from the start of the overall execution. Similarly, the `nw` and `particlefilter` tasks are assigned to the second GPU instance and are expected to finish at 22.79s and 23.23s, respectively, from the start.
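For illustration, a node of such a repartitioning tree could be represented along the following lines (a hypothetical sketch mirroring the printed output; the project's actual data structures may differ):

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical sketch of a repartitioning-tree node, mirroring the printed
// output above; not necessarily the project's actual types.
struct Node {
    int start;                                    // first GPU slice of the instance
    int size;                                     // number of slices it spans
    std::vector<std::string> tasks;               // tasks assigned to this instance
    std::vector<double> ends;                     // expected completion time (s) of each task
    std::vector<std::unique_ptr<Node>> children;  // instances obtained by splitting this one
};
```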
- Task Execution: tasks are executed on the GPU according to the scheduling plan, with real-time execution reports for each task, including real start and end times:

  ```
  INFO: Start task pathfinder at 0.12s
  ...
  INFO: End task pathfinder at 20.64s
  ...
  INFO: Start task huffman at 20.85s
  ...
  INFO: End task huffman at 23.54s
  ```
- MIG Deactivation: after task execution is completed, the program deactivates MIG and logs the corresponding information:

  ```
  INFO: MIG has been deactivated
  ```
To facilitate tracking the execution through the above-mentioned reports, during the execution of each task the standard and error outputs are redirected to a file with the path `./logs-<year>-<month>-<day>/<task name>.log`. These files, indexed by date, accumulate the output of the different executions of each task in a day. Thus, the standard output of the scheduler is much more summarized and shows the primary information of the proposal (as explained above).
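Such a date-indexed log path could be composed as in the following sketch (a hypothetical helper, not the project's exact code):

```cpp
#include <ctime>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: builds "./logs-<year>-<month>-<day>/<task name>.log"
// for the current date; not necessarily the project's exact implementation.
std::string log_path_for(const std::string& task_name) {
    std::time_t now = std::time(nullptr);
    std::tm local = *std::localtime(&now);
    std::ostringstream path;
    path << "./logs-" << std::put_time(&local, "%Y-%m-%d")
         << "/" << task_name << ".log";
    return path.str();  // e.g. "./logs-2024-05-31/gaussian.log"
}
```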
- Bugs on NVIDIA A100 and H100: on these models, the MIG API of the NVML library currently has some bugs in the handling of certain instances, so the scheduler may not work correctly. Most of the problems occur with the size-3 instances supported exclusively by these GPUs, which, as explained in the paper, are of little importance to the scheduler. However, some other errors may appear occasionally. With a future version of the library that fixes these problems, the scheduler should work perfectly on these GPUs.
- Error "device in use by another user" when destroying an instance: sometimes an instance that has already been released remains locked for a short period of time, and its destruction fails with the message "device in use by another user". The code works around this by retrying the removal in a loop until it succeeds, usually after a few tenths of a second. This has a very slight influence on the actual execution of the schedule, which would improve slightly if the library fixed this bug in the future.
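The workaround can be sketched as follows (a simplified illustration rather than the project's exact code; `gi` is an already-obtained `nvmlGpuInstance_t` handle):

```cpp
#include <chrono>
#include <thread>
#include <nvml.h>

// Simplified sketch of the destruction workaround: keep retrying while NVML
// reports the instance as busy ("device in use by another user").
nvmlReturn_t destroy_with_retry(nvmlGpuInstance_t gi) {
    nvmlReturn_t ret;
    while ((ret = nvmlGpuInstanceDestroy(gi)) == NVML_ERROR_IN_USE) {
        // The instance stays briefly locked after release; wait and retry.
        std::this_thread::sleep_for(std::chrono::milliseconds(100));
    }
    return ret;  // NVML_SUCCESS, or an unrecoverable error code
}
```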
The paper presenting this scheduler is currently under review. For the moment, you can access the preprint.
This work is funded by Grant PID2021-126576NB-I00 funded by MCIN/AEI/10.13039/501100011033 and by "ERDF A way of making Europe".