- Read trimming (by Trim Galore)
- Read mapping (by BWA or Bowtie 2)
- Genetic variant calling (by GATK3, GATK4 or BCFtools)
- Data filtering and format converting (by PLINK v 1.9)
- Population genetic analyses
- Principal component analysis (by GCTA or PLINK v 2.0)
- Phylogenetic tree analysis (by SNPhylo)
- Population tree analysis (by TreeMix)
- Population structure analysis (by ADMIXTURE)
- Linkage disequilibrium decay analysis (by PopLDdecay)
- Selective sweep analysis (by SweepFinder2)
- Population admixture analysis (by AdmixTools)
- Pairwise sequentially Markovian coalescent analysis (by psmc)
- Multiple sequentially Markovian coalescent analysis (by msmc2)
- Fixation index analysis (by VCFtools)
Skip if your machine already has the engine (Installation document).
curl -fsSL https://get.docker.com/ | sudo sh
Skip if your account is already added in the docker group
sudo usermod -aG docker $USER
wget http://bioinfo.konkuk.ac.kr/PAPipe/PAPipe.tar.gz # Download the Docker image file
docker load -i ./PAPipe.tar.gz # Load the Docker image file
docker image ls # Check if the image loaded well ("REPOSITORY:pap_docker, TAG:latest" must be shown)
Setting local input directories (Caution: do not change the names and the directory structure)
mkdir RUN_DOCKER/
cd RUN_DOCKER/
mkdir data/
cd data/
mkdir ref/
mkdir input/
-
Place the following two files of a reference species in
RUN_DOCKER/data/ref/
- Genome assembly file (gzip-compressed FASTA file with an extension .fa.gz)
- dbSNP VCF file (gzip-compressed VCF file with an extension .vcf.gz)
-
Place all other input data (read sequence files, read mapping files, or variant calling files) in
RUN_DOCKER/data/input/
- First, create separate directory for each population (one per population) in the "input" directory
- Then, place files of each population in its directory (example below)
- Files for Angus in
RUN_DOCKER/data/input/Angus/
- Files for Jersey in
RUN_DOCKER/data/input/Jersey/
- Files for Angus in
Preparing parameter files
PAPipe requires the following three parameter files
- main_sample.txt: setting for populations and samples
- main_input.txt: setting for input data files
- main_param.txt: controlling parameters for PAPipe including various tools in PAPipe
The above three files must be placed in the above "RUN_DOCKER" directory.
You can easily generate the parameter files using our parameter file genetator.
Check out more details about the parameter file generator here.
Creating a Docker container that mounts the above "RUN_DOCKER" directory
docker run -v [absolute path of the "RUN_DOCKER" directory]:/RUN_DOCKER/ -it pap_docker:latest
Running PAPipe inside the Docker container
# Run in the docker container
cd /RUN_DOCKER/
python3 /PAPipe/bin/main.py -P ./main_param.txt -I ./main_input.txt -A ./main_sample.txt &> ./log
Analysis results will be generated in the output directory specified in the "main_param.txt" file.
Check out more details about the analysis results here.
Generating HTML pages for browsing analysis results
You can check all analysis results in the output directory specified in the "main_param.txt" file.
However, PAPipe also supports the generation of HTML pages for easily browsing the analysis results.
# Run in the docker container
perl /PAPipe/bin/html/webEnvSet.pl ./out &> webenvset.log # Suppose "out" is the output directory set in the "main_param.txt" file
cd ./out/web/
perl /PAPipe/bin/html/prep_html.pl ./ &> ./webgen.log
After successfully running the above commands, follow the two steps below to open the HTML pages.
- Download the entire directory of "web" into your personal computer.
- Open the "index.html" file in the "web" directory using any web browser
If your machine supports a graphic user interface, you can directly go into the "web" directory and open the "index.html" file without downloading the "web" directory into your personal computer.
Check out more details about the generated HTML pages here.
Check out the HTML pages generated by the following test data here.
You can test PAPipe using a small test data. Check out more details here.