-
Notifications
You must be signed in to change notification settings - Fork 15
/
Copy pathREADME.how.to.call.peaks
127 lines (81 loc) · 6 KB
/
README.how.to.call.peaks
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
This file describes how to call peaks using tools put together by the modENCODE DCC group. Please send your
questions and comments to modENCODE DCC at [email protected].
Step 1. See instructions in docs/README.how.to.launch.Galaxy.on.Amazon on how to launch Galaxy on Amazon.
Ensure your Galaxy is up and running before continue on to the next step ( see Step 6 in
docs/README.how.to.launch.Galaxy.on.Amazon on how to ensure your Galaxy is up and running ).
Step 2. Create your Galaxy account by going to your Galaxy URL, click on 'User' and click on 'Register'. Enter
your email address and password. Click on 'Submit'.
Step 3. Add yourself to the Galaxy admin group so you can install tools from Galaxy toolshed. Go to your
CloudMan Console and click on 'Admin'. Enter your email address in the 'Add Galaxy admin users' textbox
and click on 'Add admin users'.
Galaxy should restarted automatically. Wait until Galaxy is fully up and running.
Step 4. Before any analysis, you will need one or more additional compute nodes for your analysis.
The number of compute nodes to add depends on the number of data sets, how large your data set is, and
how many analysis you want to do with your data. For the purpose of showing how to use modENCODE DCC tools,
start one additional compute node. Login to your CloudMan Console and click on "Add nodes" to
add one more additional compute node for your analysis. Leave "Type of node(s)" as "Same as Master" and
click on "Start Additional Nodes".
Wait until all your compute nodes are up and running - in your CloudMan Console, your compute nodes are up
and running when their icons are in green and black colors.
Step 5. Use the ssh command output as described in Step 5 of docs/README.how.to.launch.Galaxy.on.Amazon to login
to your Galaxy instance. Your ssh command should look similar to:
> ssh -i KEY_FILE ubuntu@HOST_NAME
Step 6. On your Galaxy instance, get a copy of the modENCODE DCC tools from githhub and install their
depedencies:
> sudo su -
> cd
> git clone https://github.com/modENCODE-DCC/Galaxy.git
> cd Galaxy
> . bin/auto_install.sh
'bin/auto_install.sh' does the following:
- get the latest updates from Galaxy repository
- install and configure modENCODE DCC tools and their dependencies on Galaxy instance and all compute nodes
- restart Galaxy so all changes are updated
This process will take about 5 to 10 mins!
** Note:
In case you were to add more cluster node(s) after Step 6, you can simply follow Step 4, and run the ./auto_install.sh
again (on the master node). It will install all the required dependencies on the new node(s).
Step 7. Launch the Galaxy GUI by clicking "Access Galaxy" on Cloudman's main page. Upload your bam files by clicking on
'Get Data' and 'Upload File'. For the demo purpose of how to call peaks using Galaxy, we have provided sample bam files
below. Cut and paste the following URLs to the URL/Text box and click on 'Execute':
http://data.modencode.org/modENCODE_Galaxy/Test_Data/bam/3066_Snyder_GEI-11_GFP_L3_rep1.fastq.gz%23Cele_WS220%23gei-11%23rep_all%23ChIP.bam
http://data.modencode.org/modENCODE_Galaxy/Test_Data/bam/3066_Snyder_GEI-11_Input_L3_rep1.fastq.gz%23Cele_WS220%23gei-11%23rep_all%23input.bam
http://data.modencode.org/modENCODE_Galaxy/Test_Data/bam/3066_Snyder_GEI-11_GFP_L3_rep2.fastq.gz%23Cele_WS220%23gei-11%23rep_all%23ChIP.bam
http://data.modencode.org/modENCODE_Galaxy/Test_Data/bam/3066_Snyder_GEI-11_Input_L3_rep2.fastq.gz%23Cele_WS220%23gei-11%23rep_all%23input.bam
The raw FASTQ files for these four bam files are provided by the Snyder Lab - http://snyderlab.stanford.edu.
We used BWA and WS220 reference to produce these four bam files. The first two bam files are ChIP and Input
( Control ) of replicate 1 and the other two bam files are ChIP and Input of replicate 2.
Once the upload is complete, you should see the 4 bam entries in your History panel.
Step 8. To call peaks using MACS2 ( https://github.com/taoliu/MACS/blob/master/README ), in your Tools panel,
expand modENCODE tools and click on MACS2. Set or select the following MACS2 options:
- Select action to be performed: Peak Calling
- ChIP-Seq Tag File: ( select replicate 1 ChIP bam file )
- ChIP-Seq Control File: ( select replicate 1 Input bam file )
- Effective genome size: 130000000
Leave the rest of the options or agruments as defaults. You can optimize your agruments later on if needed.
Click on 'Execute' to run MACS2. MACS2 should produce 4 output files:
8: MACS2: callpeak on data 2 and data 1 (peaks: encodePeak)
7: MACS2: callpeak on data 2 and data 1 (peaks: xls)
6: MACS2: callpeak on data 2 and data 1 (html report)
5: MACS2: callpeak on data 2 and data 1 (peaks: bed)
Step 9. Repeat Step 8 for replicate 2 ChIP and Input. MACS2 should produce the following 4 files:
12: MACS2: callpeak on data 4 and data 3 (peaks: encodePeak)
11: MACS2: callpeak on data 4 and data 3 (peaks: xls)
10: MACS2: callpeak on data 4 and data 3 (html report
9: MACS2: callpeak on data 4 and data 3 (peaks: bed)
Step 10. Run IDR on MACS2 outputs from Steps 8 and 9. Expand modENCODE tools and click on IDR. Select
your input data for IDR:
First NarrowPeak File: ( select your data entry 8, which is the encodePeak produced by MACS2 from replicate 1 )
Second NarrowPeak File: ( select your data entry 12, which is the encodePeak produced by MACS2 from replicate 2 )
Leave the rest of the options as defaults. Click on 'Execute' to run IDR. IDR should produce 5 output files:
17: IDR.uri.sav
16: IDR.em.sav
15: IDR.overlapped-peaks.txt
14: IDR.npeaks-aboveIDR.txt
13: IDR.Rout.txt
Step 11. Run IDR-Plot on IDR outputs from Step 10. Expand modENCODE tools and click on IDR-Plot.
Select your input data for IDR-Plot:
uri.sav file (output from IDR consistency analysis): ( select data entry 17 )
em.sav file (output from IDR consistency analysis): ( select data entry 16 )
IDR-Plot should produce 1 pdf file.
And that's it. Please send your comments/questions to modENCODE DCC at [email protected].