Gattaca is published in Molecular Biology & Evolution; an open-access preprint is also available on bioRxiv.
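Gattaca is a method for tracking base-pair resolution data within agent-based simulations. It consists of three parts: generating the model inputs with Gattaca.py, integrating them into a HAL simulation, and processing the simulation output with GattacaFinish.py.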
Gattaca requires a reference genome that matches a snpEff database. The dependencies are:
- snpEff
- Gzip-compressed primary genome assembly (GRCh37 or GRCh38 from Ensembl recommended, available from the Ensembl FTP download)
- biopython
- HAL
- Colt jar, used within HAL as cern.jet.random. Drag the colt/lib/colt.jar file into the HAL/lib directory.
Set the locations of the prerequisites in the usr_path.ini file. These are the paths to the snpEff jar and the reference genome, as shown below:
[snpeff]
snpeff = /Users/rschenck/Desktop/BioinformaticsTools/snpEff/snpEff.jar
[reference]
ref = /Users/rschenck/Desktop/BioinformaticsTools/References/GRCh37.75/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa.gz
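A quick check of usr_path.ini before running Gattaca can catch a mistyped path early. The following is a minimal sketch, not part of Gattaca itself: the helper names load_paths and missing_paths are hypothetical, and the only things assumed about the file are the [snpeff] and [reference] sections and keys shown above.

```python
import configparser
from pathlib import Path


def load_paths(ini_path):
    """Read usr_path.ini and return the snpEff jar and reference genome paths."""
    # interpolation=None so a literal '%' in a path is not treated specially
    parser = configparser.ConfigParser(interpolation=None)
    if not parser.read(ini_path):
        raise FileNotFoundError(f"could not read {ini_path}")
    return {
        "snpeff": Path(parser["snpeff"]["snpeff"]),
        "ref": Path(parser["reference"]["ref"]),
    }


def missing_paths(paths):
    """Return the names of entries whose files do not exist on disk."""
    return [name for name, path in paths.items() if not path.is_file()]
```

Calling missing_paths(load_paths("usr_path.ini")) from within the Gattaca directory returns an empty list when both files are found.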
An example execution from within the Gattaca directory:
python Gattaca.py --geneList ./tests/TestGenes.txt --genome=GRCh37.75 --contextFile=./Tests/MutContext.txt --mutRate=3.2E-9 --output=./GattacaExample/
# You can also get help by running:
python Gattaca.py --help
Gattaca will yield two files that must be placed in the executable directory (generally /src) for simulations within HAL.
Gattaca.java
triNucsPos.csv
Once these are placed within your simulation project, move to Part 2.
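Copying the two generated files into the project can be scripted. This is a hedged sketch only: stage_outputs is a hypothetical helper, the default file names are assumptions based on the Gattaca class used in the HAL example and on the triNucsPos.csv name listed above, and the destination is whatever directory holds your main executable class (generally src).

```python
import shutil
from pathlib import Path


def stage_outputs(output_dir, src_dir, names=("Gattaca.java", "triNucsPos.csv")):
    """Copy the generated files from output_dir into src_dir.

    The default names are assumptions; pass your own if they differ.
    """
    output_dir, src_dir = Path(output_dir), Path(src_dir)
    # Fail before copying anything if an expected file is absent
    missing = [n for n in names if not (output_dir / n).is_file()]
    if missing:
        raise FileNotFoundError(f"missing in {output_dir}: {', '.join(missing)}")
    src_dir.mkdir(parents=True, exist_ok=True)
    return [Path(shutil.copy2(output_dir / n, src_dir / n)) for n in names]
```

For the example run above, the call would be stage_outputs("./GattacaExample/", "path/to/your/project/src"), where the second argument is a placeholder for your own project.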
While other Java frameworks can be used, or the Java code could be ported to other simulation frameworks, Gattaca is designed to work within HAL, so HAL is an additional prerequisite. Gattaca also uses some random number generators and distributions from the Colt library (Colt jar), which can be placed in the HAL lib directory. The library can be linked from within IntelliJ IDEA (instructions on linking libraries can be found in the HAL or IntelliJ IDEA documentation).
Integration into HAL requires a few steps:
1: Place the two output files from Part 1 within the scope of your main executable class.
2: From within your main function, initialize the first clone. This provides a root clone for downstream tracking (and also speeds up your simulations):
// The Gattaca constructor requires: parent, String, Hue, Saturation, Value, Rand RNG
Gattaca initClone = new Gattaca(null, "", 1, 0, 0.3, RN);
3: Pass initClone into your Grid class, where it is stored as clone0 in this example. Use it as the parent of clone1 so that your first cell is constructed from within your grid:
Gattaca clone1=new Gattaca(this.clone0, "",1,0,0.3, RN);
Cell c=NewAgentSQ(xpt, ypt).Init(clone1, BIRTHPROBABILITY); // Birth cell with genome clone1
c.genome.IncPop(); // Increases genome population by 1
4: Make sure that your cell class has a Gattaca variable.
5: Increase and decrease the population of a genome using the IncPop() and DecPop() methods of a cell's genome.
6: From within your main model step, you can record clones at any timepoint:
initClone.RecordClones(G.GetTick());
7: Output information to file:
String[] AttributesList = new String[]{"Genome", "H", "S", "V"};
initClone.OutputClonesToCSV("/Users/rschenck/Dropbox/GATTACA/Gattaca/tests/GattacaEx/gattaca_output_fullyseeded." + Integer.toString(CON.SEED) + ".csv", AttributesList, (Gattaca g) -> {
    return GetAttributes(g);
}, 0);

// Function to retrieve the attributes of your choice.
public static String GetAttributes(Gattaca root) {
    return root.PrivateGenome + "," + Double.toString(root.h) + "," + Double.toString(root.s) + "," + Double.toString(root.v);
}
Ideally, simulations will be run in replicate for downstream statistical analysis, but the Gattaca analysis can also handle a single simulation.
In this example, the output directory for our simulations from within HAL is ./tests/GattacaEx/, so this directory will house all of the Gattaca outputs.
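Before post-processing, it can help to confirm which replicates actually landed in the output directory. The sketch below is illustrative only: replicate_files is a hypothetical helper, and it assumes each simulation wrote one CSV whose name ends in .<seed>.csv, as in the gattaca_output_fullyseeded example from step 7.

```python
import re
from pathlib import Path

# Assumed naming: anything ending in ".<integer seed>.csv"
SEED_PATTERN = re.compile(r"\.(\d+)\.csv$")


def replicate_files(output_dir):
    """Map each replicate seed to its CSV file in output_dir."""
    replicates = {}
    for path in sorted(Path(output_dir).glob("*.csv")):
        match = SEED_PATTERN.search(path.name)
        if match:  # skip CSV files that carry no seed in their name
            replicates[int(match.group(1))] = path
    return replicates
```

The keys of replicate_files("./tests/GattacaEx/") are then the seeds of the replicates available for analysis.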
To process these samples using GattacaFinish.py, we run the following from within the Gattaca directory (make sure you use the same arguments as in Part 1 for the -f and --genome options):
python GattacaFinish.py -i ./tests/GattacaEx/ -f ./tests/TestGenes.txt -o ~/Desktop/tmp --genome=GRCh37.75
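One way to keep the gene list and genome identical between Part 1 and GattacaFinish.py is to build both commands from the same values. This is a rough sketch under that idea: build_commands is a hypothetical helper, and it uses only the flags that appear in the example commands in this document.

```python
def build_commands(gene_list, genome, context_file, mut_rate,
                   model_dir, sim_dir, finish_dir):
    """Return the Part 1 and GattacaFinish.py commands as argument lists.

    gene_list and genome are used in both commands so they cannot drift apart.
    """
    part1 = [
        "python", "Gattaca.py",
        "--geneList", gene_list,
        f"--genome={genome}",
        f"--contextFile={context_file}",
        f"--mutRate={mut_rate}",
        f"--output={model_dir}",
    ]
    finish = [
        "python", "GattacaFinish.py",
        "-i", sim_dir,
        "-f", gene_list,
        "-o", finish_dir,
        f"--genome={genome}",
    ]
    return part1, finish
```

Each list can be passed to subprocess.run(cmd, check=True) from within the Gattaca directory. Pass mut_rate as a string such as "3.2E-9" to keep its formatting, and expand any ~ in paths first (for example with os.path.expanduser), since no shell is involved.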