
The Dynamic Cluster Compiler (DCC) is a compiler that allows you to compile any project that uses a makefile across a cluster of servers, or nodes.


DarkAssassin23/Dynamic_Cluster_Compiler


Dynamic Cluster Compiler (DCC)



About

The Dynamic Cluster Compiler lets you compile your program across a cluster of servers, or nodes, instead of on your local machine or a single remote host. This potentially gives you access to more compute resources than a single machine provides, and therefore faster compile and build times. That assumes your code base is large enough to benefit from a distributed compiler, so that the initial overhead is made up by the extra parallelism of multiple servers. The compiler works with any programming language that can be compiled with a makefile.

More information about how it works can be found in my blog post.


How it works

Preparing the source code

The Dynamic Cluster Compiler starts by parsing the distributed.cfg file. This file must be in the root of the given directory. You can either run the compiler from a directory that contains a distributed.cfg file, or pass a directory that contains one as a command-line argument.
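The lookup described above can be sketched in Python as follows. This is only an illustration, not the compiler's actual code: the function name find_config is hypothetical, and treating the first command-line argument as the project directory is an assumption.

```python
from pathlib import Path

CONFIG_NAME = "distributed.cfg"


def find_config(argv):
    """Return the path to distributed.cfg, or raise if it is missing.

    argv is the argument list without the program name. Treating the
    first argument as the project directory is an assumption of this
    sketch; with no arguments, the current directory is used.
    """
    project_dir = Path(argv[0]) if argv else Path.cwd()
    config_path = project_dir / CONFIG_NAME
    if not config_path.is_file():
        raise FileNotFoundError(f"{CONFIG_NAME} not found in {project_dir}")
    return config_path
```

Calling find_config([]) checks the current directory, while find_config(["path/to/project"]) checks the given one.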

The compiler then creates a folder for each node in the cluster. It pulls this information from the inventory.ini file. By default, the compiler looks for that file in the root of the directory, just like the distributed.cfg file. However, the location of the inventory.ini file can be configured in the distributed.cfg file. The compiler parses the inventory.ini file and pulls out the worker nodes as well as the master node. It then checks that each host is online via a ping check. If a host fails the ping check, it is assumed to be down, and no folder is created for it. If the master node fails the ping check, the compiler exits.
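A minimal Python sketch of this step is shown below. It rests on several assumptions that the text above does not confirm: the inventory uses INI-style group headers named [master] and [workers], the master node also gets a folder, and the ping check uses Unix-style ping flags. All function names are hypothetical.

```python
import subprocess
from pathlib import Path


def parse_inventory(text):
    """Map each [group] header to the host names listed under it."""
    groups, current = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]
            groups[current] = []
        elif current is not None:
            # Keep only the host name; ignore trailing key=value variables.
            groups[current].append(line.split()[0])
    return groups


def is_online(host):
    """Send one ICMP echo request (Unix-style flags, an assumption)."""
    try:
        result = subprocess.run(
            ["ping", "-c", "1", host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=5,
        )
    except (subprocess.TimeoutExpired, OSError):
        return False
    return result.returncode == 0


def create_node_dirs(groups, root, ping=is_online):
    """Create one folder per reachable node; exit if the master is down."""
    masters = groups.get("master", [])   # group name is an assumption
    workers = groups.get("workers", [])  # group name is an assumption
    created = []
    for host in masters + workers:
        if ping(host):
            node_dir = Path(root) / host
            node_dir.mkdir(parents=True, exist_ok=True)
            created.append(node_dir)
        elif host in masters:
            raise SystemExit(f"master node {host} failed the ping check")
        # An unreachable worker is skipped: no folder is created for it.
    return created
```

Passing the ping function as a parameter keeps the folder logic testable without touching the network.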

Once the folders are created, it then sifts through all of your source code and evenly distributes it amongst the various folders it created for each node. Which files are source code is denoted by the fileTypes variable in the distributed.cfg file.
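One simple way to split files evenly is round-robin assignment, sketched below in Python. The compiler's actual balancing strategy is not described here, so treat round-robin by file count as an assumption. The function name distribute is hypothetical, and the sketch only computes the assignment rather than copying files.

```python
from pathlib import Path


def distribute(source_root, file_types, node_names):
    """Assign source files to nodes round-robin.

    file_types plays the role of the fileTypes setting: a list of
    extensions such as ["c", ".h"]. Returns a dict that maps each
    node name to the list of files assigned to it.
    """
    if not node_names:
        raise ValueError("at least one node is required")
    suffixes = {t if t.startswith(".") else "." + t for t in file_types}
    sources = sorted(
        p for p in Path(source_root).rglob("*")
        if p.is_file() and p.suffix in suffixes
    )
    assignment = {name: [] for name in node_names}
    for index, path in enumerate(sources):
        assignment[node_names[index % len(node_names)]].append(path)
    return assignment
```

With this scheme, the file counts of any two nodes differ by at most one.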

Compiling across a cluster

After the source code is distributed, the compiler kicks off an Ansible playbook to orchestrate the compiling process across all the nodes. Using information found in the mnt.cfg file, it mounts a network share on each of the nodes. As with the other files mentioned, the compiler looks for mnt.cfg in the root of the directory by default, but its location can be configured in the distributed.cfg file. Once the share is mounted, each node compiles its assigned source code, and then the master node links everything together to create the executable.
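As a rough illustration, launching a playbook from Python could look like the sketch below. It uses the standard ansible-playbook command with its -i (inventory) and -e (extra variables) options. The playbook file name and the variable names in the usage line are hypothetical and not taken from this repo.

```python
import subprocess


def playbook_command(inventory, playbook, extra_vars=None):
    """Build the ansible-playbook command line as a list of arguments."""
    cmd = ["ansible-playbook", "-i", str(inventory), str(playbook)]
    for key, value in (extra_vars or {}).items():
        # Each extra variable is passed as a separate -e key=value pair.
        cmd += ["-e", f"{key}={value}"]
    return cmd


def run_playbook(inventory, playbook, extra_vars=None):
    """Run the playbook and raise CalledProcessError if it fails."""
    subprocess.run(playbook_command(inventory, playbook, extra_vars), check=True)
```

For example, run_playbook("inventory.ini", "compile.yml", {"share": "/mnt/src"}) would run a hypothetical compile.yml against the hosts in inventory.ini.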

Custom makefiles

Example makefiles can be found in the examples section of this repo. Your standard makefile will not work with this compiler; you will have to modify it slightly for your code to compile correctly. The examples section shows how a standard makefile compares to a distributed one.

The most notable changes are having two variables to track the output files required to build your application, and having separate rules for compiling only the source code in the current directory and for building the executable. Of the two variables, the first must include all of the files required to build your application, while the second should contain only the files that will be generated from the current directory. This is because the source code is distributed, and each node should compile only the code assigned to it, i.e., the code in its own folder. However, when the master node gathers the output files to build the application, all of the required files must be accounted for.
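The relationship between the two variables can be illustrated with a short Python sketch. It assumes C-style builds where each source file produces one .o object file, and it assumes an assignment that maps node names to their source files. The function name object_lists and both of those assumptions are illustrative only. See the examples section for the real makefiles.

```python
from pathlib import Path


def object_lists(assignment, node):
    """Return (all_objects, local_objects) for one node.

    all_objects corresponds to the first makefile variable: every
    output file the master needs to link the application.
    local_objects corresponds to the second: only the outputs
    generated from this node's own folder.
    """
    def to_object(path):
        # Assumption: one .o file per source file, named after it.
        return Path(path).with_suffix(".o").name

    all_objects = sorted(
        to_object(p) for files in assignment.values() for p in files
    )
    local_objects = sorted(to_object(p) for p in assignment[node])
    return all_objects, local_objects
```

Each node builds only its local list, while the link step on the master depends on the full list.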

By default, the compiler will look for your distributed_makefile in the root of the directory, but where the compiler looks can be configured in the distributed.cfg file. This makefile will be copied to each folder for each node that will do work.


Requirements

Below are the system requirements for the nodes doing the compiling, the source code directory, and the machine that starts the compiler.

  • sshfs

    • sshfs must be installed on each node in the cluster
    • Each node in the cluster must have swapped ssh keys with the remote server where the code is
      • As such, the server where the source code is located needs to be able to accept ssh connections
  • The following files are required:

    • distributed.cfg
      • This file is required to be in the root of the given directory
    • inventory.ini
    • mnt.cfg
    • distributed_makefile

NOTE: Examples for each of these configs can be found in the examples directory of this repo

  • Ansible
    • The device running the compiler program must have Ansible installed
