- The Epitope pipeline is consisted of three parts: peptide generation, hla typing, and neoepitope prediction
- samtools 1.3.1
samtools mpileup -f {fasta} {bam} > {pileup}
CAUTION: Make sure that the naming convention for chromosome is consistent between the bam files and the fasta index file. For instant, if the convention of the bam file is "1", "2", "3", and the convention of the fasta index file is "chr1", "chr2", "chr3", this step will not work. To check what the naming convention of the bam file is, you can do:
samtools view -H {bam}
If your bam file labels chr1 as 1, an easy fix is to modify the fasta index file using sed
:
sed -i 's/chr//g' ucsc.hg19.fasta.fai
- samtools 1.3.1
samtools index {bam} {bai}
- VarScan 2.3.9
- Threshold:
- minimum covererage: 10
- minimum varian allele frequency: 0.08
- somatic p value: 0.05
java -jar VarScan.v2.3.9.jar somatic {pielup} {out} –min-coverage 10 –min-var-freq 0.08 –somatic-p-value 0.05
- VarScan 2.3.9
java -jar VarScan.v2.3.9.jar processSomatic {VarScan.snp}
- VarScan 2.3.9
java -jar VarScan.v2.3.9.jar somaticFilter {VarScan.snp.Somatic.hc} -indelfile {VarScan.indel} -output-file {VarScan.snp.Somatic.hc.filter}
- binary program
bam-readcount
and perl scriptfpfilter-2.pl
- Variant effect predictor version 86
- pvacseq version 3.0.5
- See https://github.com/tanyaphung/run_hla_polysolver for how to run the software
- Configure the directory where inputs needed for MHC and outputs from MHC will be stored.
- Use the Python script
configuration.py
:
python configuration.py -h
Usage: python configuration.py <options>
Options:
-h, --help show this help message and exit
--directory=DIRECTORY
--patientID=PATIENTID
--peptides_directory=PEPTIDES_DIRECTORY
python configuration.py --directory {directory_to_create} --peptides_directory {peptides_directory} --patientID {patientID}
-
The Python script
configuration.py
does the following:- create a directory where the inputs for MHC and outputs from MHC will be stored. The name of this directory is specified by the argument
directory
. - under the new directory
directory
, create 3 directories: peptides, hla, and IEDB_out. - copy the peptide files to the peptides directory
- create a directory where the inputs for MHC and outputs from MHC will be stored. The name of this directory is specified by the argument
-
Under the directory
hla
, each text file is an HLA type. For example:
cat hla_a_03_02_01.txt
HLA-A hla_a_03_02_01
- A note about naming convention, since we are interested in HLA-A, HLA-B, and HLA-C, and there are 2 alleles for each, I named them: hla_a_1, hla_a_2, hla_b_1, hla_b_2, hla_c_1, and hla_c_2.
- Download the IEDB tool from http://tools.iedb.org/mhci/download/
- untar the folder:
tar -xzvf IEDB_MHC_I-2.19.1.tar.gz
- configure the tool and exit out of the directory
cd mhc_i/
./configure
cd ..
- You will get a message like this after configuring the tool:
All prerequisites found!
Copying the standalone-specific netMHCcons template into place
IEDB MHC class I binding prediction tools successfully installed!
Use the command 'python src/predict_binding.py' to get started
- Download the file
predict_binding.py
from this Github repo. You will need to replace thepredict_binding.py
script that came with IEDB_MHC with this file.
- Important: the tool will not work if you use the script
predict_binding.py
that comes with the installation.
- Use the script
run_mhc.py
. This Python script has to be run in Python 2. We can use a conda directive to invoke Python2 in snakemake.
python run_mhc.py -h
usage: run_mhc.py [-h] --hla HLA --patientID PATIENTID --hla_type HLA_TYPE
Run mhc
optional arguments:
-h, --help show this help message and exit
--hla HLA REQUIRED. Input the path to the hla file.
--patientID PATIENTID
REQUIRED. Input the patient ID. PatientID is also the
name of the output directory
--hla_type HLA_TYPE REQUIRED.
- Example:
python run_mhc.py --hla TCGA-BC-A10Y_WES_XY_withoutYpar/hla/hla_a_03_02_01.txt --patientID TCGA-BC-A10Y_WES_XY_withoutYpar --hla_type hla_a_1
- Use the script
find_potential_neoepitope.py
. This Python script is run in Python 3.
python find_potential_neoepitope.py -h
usage: find_potential_neoepitope.py [-h] --hla HLA --patientID PATIENTID
--hla_type HLA_TYPE
Find potential neoepitope
optional arguments:
-h, --help show this help message and exit
--hla HLA REQUIRED. Input the path to the hla file.
--patientID PATIENTID
REQUIRED. Input the patient ID. PatientID is also the
name of the output directory
--hla_type HLA_TYPE REQUIRED.
- Example:
python find_potential_neoepitope.py --hla TCGA-BC-A10Y_WES_XY_withoutYpar/hla/hla_a_03_02_01.txt --patientID TCGA-BC-A10Y_WES_XY_withoutYpar --hla_type hla_a_1
- Use the script
filter_neoepitope.py
. This Python script is run in Python 3.
python filter_neoepitope.py -h
Usage: python filter_neoepitope.py <options>
Options:
-h, --help show this help message and exit
--patientID=PATIENTID
--hla_type=HLA_TYPE
- Example:
python filter_neoepitope.py --patientID TCGA-BC-A10Y_WES_XY_withoutYpar --hla_type hla_a_1