# TC_hunter
## TC-hunter identifies transgenic insertion sites within a host genome
TC-hunter searches for transgenic insertion sites in a host genome and returns figures and a report to support these findings.
There are two programs: **TC_hunter** and **TC_hunter_BWA**.
- :green_square: **TC_hunter_BWA.nf**
Accepts raw paired-end fastq files (from one or several samples) as input and performs BWA MEM alignment before searching for transgenic insertion sites.
- :yellow_square: **TC_hunter.nf**
Accepts one or several aligned BAM files (mapped to both host and transgenic sequence) as input.
TC-hunter then identifies anchors and chimeric reads that map to both host and transgenic sequence.
![](Plots/TC_hunter_pipeline.png)
## Install TC-hunter
Clone the repository from GitHub and add it to your PATH (or give the full path to TC_hunter in the config file).
```
$ git clone https://github.com/vborjesson/TC_hunter.git
$ export PATH="/home/yourPath/TC_hunter":$PATH
```
## Software Dependencies
To run TC_hunter you need some programs installed. There are three ways to install them:
1. Install the required programs and tools using the Anaconda yml file (preferred)
```
$ conda env create --file TC_hunter/Scripts/TC_hunter.yml
$ source activate TC_hunter_v1.0
```
2. Create your own conda environment
```
$ conda create -n TC_hunter R=3.5
$ source activate TC_hunter
$ conda install -c bioconda samtools=1.10
$ conda install -c bioconda nextflow=19.01.0
# Only needed if running TC_hunter_BWA:
$ conda install -c bioconda bwa
$ conda install -c anaconda pandas
$ conda install -c conda-forge r-circlize
$ conda install -c r r-dplyr
$ conda install -c r r-data.table
```
3. Install manually
Software
```
R 3.5 or higher
python 2.7
samtools 1.10 (works on other versions as well)
nextflow 19.01.0
bwa 0.7
```
R packages
```
circlize
dplyr
data.table
```
## Run TC_hunter with test data (takes approximately 1 minute to run)
Download data
```
mkdir test_run
cd test_run
pip install gdown # If you don't already have it installed
gdown https://drive.google.com/uc?id=1FXKJWD2yq1iUuL0lEATQ3Bqfr2vOyioK
cp ../TC_hunter/Test_data/* .
```
Then run TC_hunter:
```
nextflow ../TC_hunter/TC_hunter.nf -c testrun.config --workingDir <realpath_to_test_run_dir> --tc_hunter_path <realpath_to_tchunter>
```
You should see TC_hunter run each process, one after another:
1. samtools_index
2. create_links_sup
3. create_links_soft
4. create_karyotype
5. create_histogram
6. create_plots
7. create_html
When it's done, check that you have an `output_summary.html` file.
## Create construct.txt file (required)
To generate figures with construct information, you need to provide that information in a file.
Create a txt file with one gene per line and the fields separated by spaces. The fields are: 1) name, 2) start position and 3) end position.
E.g.
```
Amp 1 500
lyz 1000 1200
Gene3 2000 5000
Gene4 7000 7700
```
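Before running the pipeline it can help to check that construct.txt is well formed. The following is a minimal sketch, not part of TC_hunter: the function name `parse_construct_lines` is hypothetical, and the checks (exactly three fields, non-negative integer positions, start not after end) are assumptions about what a valid file looks like.

```python
def parse_construct_lines(lines):
    """Parse 'name start end' lines into (name, start, end) tuples."""
    features = []
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        fields = line.split()
        if len(fields) != 3:
            raise ValueError(
                "line {}: expected 3 fields, got {}".format(number, len(fields)))
        name = fields[0]
        start, end = int(fields[1]), int(fields[2])  # ValueError if not integers
        # Assumption: positions are non-negative and start comes before end.
        if start < 0 or end < start:
            raise ValueError(
                "line {}: invalid range {}-{}".format(number, start, end))
        features.append((name, start, end))
    return features


# The example file from above, inlined so the sketch runs on its own.
example = """Amp 1 500
lyz 1000 1200
Gene3 2000 5000
Gene4 7000 7700"""

features = parse_construct_lines(example.splitlines())
for name, start, end in features:
    print(name, start, end)
```

To check a real file, pass an open file handle instead of `example.splitlines()`.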
## Make Configuration file
Create a configuration file from the template.
```
$ cp TC_hunter/template/TC_hunter.config /path/to/WorkingDir
```
Add the required information to the config file.
### TC_hunter.nf
| Argument | Usage | Description |
| ------------- | ------------- | ------------- |
| WorkingDir | `<Path/WorkingDir>` | Path to your working directory (this is where the output html and figures will be written) |
| TC_hunter_path | `<Path/TC_hunter>` | Path to TC_hunter; just `TC_hunter` if it is in your $PATH |
| Construct_file | `<Path/construct.txt>` | Path to your construct.txt file (see `Create construct.txt file` above) |
| Construct_length | `<Length>` | Length of your construct, given as a number |
| Construct_name | `<Name>` | Name of the construct; must match the name in the reference file, no spaces |
| bam | `<Bam_directory>` | Path to the directory containing your BAM file or (if several samples) BAM files |
| Reference | `<Jointref.fa>` | Path to the merged reference file including both host and construct genome. `cat host_ref construct_ref > Jointref.fa` |

E.g. [example.config](https://github.com/vborjesson/TC_hunter/blob/master/template/tchunter_example.config)
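The merged reference and the `Construct_length` and `Construct_name` values can be derived from the two fasta files. The sketch below is an illustration, not part of TC_hunter: `merge_fasta` and `fasta_lengths` are hypothetical helpers, the demo sequences are made up, and it assumes the sequence name is the first word of each `>` header line.

```python
import os
import tempfile


def merge_fasta(paths, out_path):
    """Concatenate fasta files, like `cat`, but keep records apart."""
    with open(out_path, "w") as out:
        for path in paths:
            with open(path) as handle:
                data = handle.read()
            out.write(data)
            # Plain `cat` would glue the next header onto the last sequence
            # line if a file lacks a trailing newline, so add one here.
            if data and not data.endswith("\n"):
                out.write("\n")


def fasta_lengths(path):
    """Return {sequence name: length} for every record in a fasta file."""
    lengths = {}
    name = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                name = line[1:].split()[0]  # assumed: name is the first word
                lengths[name] = 0
            elif name is not None:
                lengths[name] += len(line)
    return lengths


# Demo with tiny made-up sequences in a temporary directory.
workdir = tempfile.mkdtemp()
host = os.path.join(workdir, "host_ref.fa")
construct = os.path.join(workdir, "construct_ref.fa")
joint = os.path.join(workdir, "Jointref.fa")
with open(host, "w") as f:
    f.write(">chr1\nACGT\nACGT")  # no trailing newline, on purpose
with open(construct, "w") as f:
    f.write(">construct\nGGCCAA\n")

merge_fasta([host, construct], joint)
lengths = fasta_lengths(joint)
print(lengths)
```

The name and length printed for the construct record are the values that would go into `Construct_name` and `Construct_length`.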
### TC_hunter_BWA.nf
| Argument | Usage | Description |
| ------------- | ------------- | ------------- |
| WorkingDir | `<Path/WorkingDir>` | Path to your working directory (this is where the output html and figures will be written) |
| TC_hunter_path | `<Path/TC_hunter>` | Path to TC_hunter; just `TC_hunter` if it is in your $PATH |
| Construct_file | `<Path/construct.txt>` | Path to your construct.txt file (see `Create construct.txt file` above) |
| Construct_length | `<Length>` | Length of your construct that will be plotted, given as a number |
| Construct_name | `<Name>` | Name of the construct; must match the name in your reference file |
| sample | `<sample_directory>` | Path to the directory containing the fastq files (R1 and R2) |
| folder | `<sample_directory>` | Path to a directory containing one directory per sample. The sample names will be the same as the directory names |
| host_ref | `<host_ref.fa>` | Path to the host reference file |
| construct_ref | `<construct_ref.fa>` | Path to the construct reference file |

E.g. [example.config](https://github.com/vborjesson/TC_hunter/blob/master/template/tchunter_BWA_example.config)
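The `folder` layout (one directory per sample, each holding an R1 and an R2 fastq file) can be checked before starting a run. This is a minimal sketch and not part of TC_hunter: `find_samples` is a hypothetical helper, and the `_R1`/`_R2` file name markers and the accepted file extensions are assumptions, so adjust them to your own naming scheme.

```python
import os
import tempfile

# Assumption: fastq files use one of these extensions.
FASTQ_SUFFIXES = (".fastq", ".fq", ".fastq.gz", ".fq.gz")


def find_samples(folder):
    """Map each sample directory name to its (R1, R2) fastq file names."""
    samples = {}
    for name in sorted(os.listdir(folder)):
        sample_dir = os.path.join(folder, name)
        if not os.path.isdir(sample_dir):
            continue  # ignore stray files next to the sample directories
        fastqs = sorted(
            f for f in os.listdir(sample_dir) if f.endswith(FASTQ_SUFFIXES))
        # Assumption: mates are marked "_R1" / "_R2" in the file name.
        r1 = [f for f in fastqs if "_R1" in f]
        r2 = [f for f in fastqs if "_R2" in f]
        if len(r1) != 1 or len(r2) != 1:
            raise ValueError(
                "{}: expected one R1 and one R2 fastq file".format(name))
        samples[name] = (r1[0], r2[0])
    return samples


# Demo: build a throwaway folder with two empty sample directories.
folder = tempfile.mkdtemp()
for sample in ("sampleA", "sampleB"):
    os.mkdir(os.path.join(folder, sample))
    for mate in ("R1", "R2"):
        file_name = "{}_{}.fastq.gz".format(sample, mate)
        open(os.path.join(folder, sample, file_name), "w").close()

samples = find_samples(folder)
for name, (r1, r2) in samples.items():
    print(name, r1, r2)
```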
## Run TC_hunter.nf
Before running, make sure you have a config file with all required information (see "Make Configuration file").
```
$ nextflow TC_hunter.nf -c <file.config> [-with-report <report name>]
```
## Run TC_hunter_BWA.nf
Before running, make sure you have a config file with all required information (see "Make Configuration file").
```
$ nextflow TC_hunter_BWA.nf -c <file.config> [-with-report <report name>]
```
## Run IGV separately
To get the IGV figures you need a GUI available. If you don't have one, you can run IGV separately once TC-hunter has finished. Run one .bat file for each sample.
```
$ igv.sh -b <sample_name.bat>
```
## Understand your output
TC-hunter finds insertion sites based on chimeric reads and discordant read pairs.
![](Plots/softclipped.png)
TC_hunter reports each possible insertion site in an html file called `output_summary.html`. The file contains 5 columns:
1. Ranking - the best hit based on score is ranked first, the second best second, and so on
2. Score - based on the number of chimeric reads and discordant read pairs supporting this insertion site
3. Breakpoint host - where in the host this insertion site is located
4. Breakpoint construct - where in the construct this insertion site is located
5. Figures - three figures: I) circular plot (see below), II) IGV, III) IGV, more zoomed in
* output_summary.html
![](Plots/tc_hunter_out.png)
For every predicted insertion site a circular figure is created. Red links (lines) represent the discordant read pairs supporting this event. Black links represent the chimeric reads supporting this event.
![](Plots/circlize.png)
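To make the two kinds of evidence concrete, the sketch below classifies single SAM records. It is a simplified illustration and not TC_hunter's actual implementation: `classify_sam_record` is a hypothetical function, "chimeric" is taken to mean a read whose primary alignment and `SA` (supplementary alignment) tag together touch both host and construct, and "discordant" is taken to mean a read whose mate maps to the other reference. The example records and the construct name `construct` are made up.

```python
def classify_sam_record(sam_line, construct_name):
    """Return 'chimeric', 'discordant' or 'other' for one SAM record."""
    fields = sam_line.rstrip("\n").split("\t")
    rname, rnext = fields[2], fields[6]  # SAM columns 3 (RNAME) and 7 (RNEXT)
    if rname == "*":
        return "other"  # the read itself is unmapped
    mate_ref = rname if rnext == "=" else rnext  # "=" means same reference

    # References touched by this read: primary alignment plus SA tag entries.
    read_refs = {rname}
    for tag in fields[11:]:
        if tag.startswith("SA:Z:"):
            # SA entries look like "rname,pos,strand,CIGAR,mapQ,NM;"
            for entry in tag[5:].split(";"):
                if entry:
                    read_refs.add(entry.split(",")[0])

    on_construct = construct_name in read_refs
    on_host = any(ref != construct_name for ref in read_refs)
    if on_construct and on_host:
        return "chimeric"
    # One mate on the construct, the other on the host.
    if mate_ref != "*" and (rname == construct_name) != (mate_ref == construct_name):
        return "discordant"
    return "other"


# Made-up records: split read, pair spanning host/construct, ordinary pair.
records = [
    "r1\t99\tchr1\t100\t60\t60M40S\t=\t300\t0\tACGT\tIIII\tSA:Z:construct,10,+,60S40M,60,0;",
    "r2\t65\tchr1\t100\t60\t100M\tconstruct\t50\t0\tACGT\tIIII",
    "r3\t99\tchr1\t100\t60\t100M\t=\t300\t250\tACGT\tIIII",
]
labels = [classify_sam_record(r, "construct") for r in records]
print(labels)
```

In the circular plot, reads like `r1` would correspond to black links and pairs like `r2` to red links.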
## Supporting material for TC_hunter paper
[Supporting data](https://github.com/vborjesson/TC_hunter_supplementary)