forked from jokergoo/circlize_book
-
Notifications
You must be signed in to change notification settings - Fork 0
/
16-a-complex-example-with-chord-diagram.Rmd
265 lines (220 loc) · 10.7 KB
/
16-a-complex-example-with-chord-diagram.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
```{r, echo = FALSE}
library(circlize)
```
# A complex example of Chord diagram
In this chapter, we demonstrate how to make a complex Chord diagram and how to
customize additional tracks by visualizing chromatin state transitions as well
as methylation changes. A chromatin state transition matrix shows how much a
chromatin state in the genome has been changed from e.g. one group of samples
to the other. The genomic regions for which the chromatin states change also
have variable methylation patterns which may show interesting correspondance
to chromatin states change.
The data used in this post is processed from [Roadmap
dataset](http://www.roadmapepigenomics.org/). The chromatin states [are learned
from five core chromatin marks](http://egg2.wustl.edu/roadmap/web_portal/chr_s
tate_learning.html#core_15state). Roadmap samples are classified into two
groups based on expression profile. In each group, a chromatin state is
assigned to the corresponding genomic bin if it is recurrent in at least half
of the samples in each group.
The processed data can be set from https://jokergoo.github.io/circlize_book/data/chromatin_transition.RData.
```{r}
load("data/chromatin_transition.RData")
```
In the RData file, there are three matrix: `mat`, `meth_mat_1` and
`meth_mat_2` which are:
- `mat`: chromatin state transition matrix. Rows correspond to states in group
1 and columns correspond to group 2. The value in the matrix are total base
pairs that transite from one state to the other. E.g. `mat["TssA",
"TxFlnk"]` is the total base pairs that have "TssA" state in samples in
group 1 and transites to "TxFlnk" state in samples in group 2. On the
diagonal are the regions where the states have not been changed in the two
groups.
- `meth_mat_1`: mean methylation for each set of regions in group 1.
E.g. `meth_mat_1["TssA", "TxFlnk"]` is the mean methylation for the regions in
**group 1** that have "TssA" state in group 1 and "TxFlnk" state in group 2.
- `meth_mat_2`: mean methylation for each set of regions in group 2.
E.g. `meth_mat_2["TssA", "TxFlnk"]` is the mean methylation for the regions in
**group 2** that have "TssA" state in group 1 and "TxFlnk" state in group 2.
```{r}
mat[1:4, 1:4]
meth_mat_1[1:4, 1:4]
```
Normally, in majority in the genome, chromatin states of regions are not
changed in the two groups, thus, we should only look at the regions in which
the states are changed.
```{r}
# proportion of the unchanges states in the genome
sum(diag(mat))/sum(mat)
# remove the unchanged states
diag(mat) = 0
```
When making the plot, actually rows and columns are different (because one is
from group 1 and the other is from group 2), thus we give them different names
and the original names are stored in `all_states`.
```{r}
all_states = rownames(mat)
n_states = nrow(mat)
rownames(mat) = paste0("R_", seq_len(n_states))
colnames(mat) = paste0("C_", seq_len(n_states))
dimnames(meth_mat_1) = dimnames(mat)
dimnames(meth_mat_2) = dimnames(mat)
```
Next we set the colors. `colmat` is the color of the links and the colors
are represent as hexadecimal code. Links have more transparent (`A0`) if they
contain few transitions (< 70th percentile) because we don't want it
to disturb the visualization of the major transitions.
```{r}
state_col = c("TssA" = "#E41A1C", "TssAFlnk" = "#E41A1C",
"TxFlnk" = "#E41A1C", "Tx" = "#E41A1C",
"TxWk" = "#E41A1C", "EnhG" = "#E41A1C",
"Enh" = "#E41A1C", "ZNF/Rpts" = "#E41A1C",
"Het" = "#377EB8", "TssBiv" = "#377EB8",
"BivFlnk" = "#377EB8", "EnhBiv" = "#377EB8",
"ReprPC" = "#377EB8", "ReprPCWk" = "#377EB8",
"Quies" = "black")
# one for rows and one for columns
state_col2 = c(state_col, state_col)
names(state_col2) = c(rownames(mat), colnames(mat))
colmat = rep(state_col2[rownames(mat)], n_states)
colmat = rgb(t(col2rgb(colmat)), maxColorValue = 255)
qati = quantile(mat, 0.7)
colmat[mat > qati] = paste0(colmat[mat > qati], "A0")
colmat[mat <= qati] = paste0(colmat[mat <= qati], "20")
dim(colmat) = dim(mat)
```
Now we can use `chordDiagram()` function to make the plot. Here we set one
pre-allocated track in which the methylation information will be added later.
Also we only set `annotationTrack` to `grid` and the axes and sector labels
will be customized in later code.
`chordDiagram()` returns a data frame which contains coordinates for all links.
```{r first, fig.width = 6, fig.height = 6}
circos.par(cell.padding = c(0, 0, 0, 0), points.overflow.warning = FALSE)
cdm_res = chordDiagram(mat, col = colmat, grid.col = state_col2,
directional = TRUE, annotationTrack = "grid",
big.gap = 10, small.gap = 1,
preAllocateTracks = list(track.height = 0.1),
link.target.prop = FALSE)
```
```{r}
head(cdm_res)
```
```{r, echo = FALSE}
circos.clear()
chunks <- knitr:::knit_code$get()
```
Now the axes are added in the second track, also, the index of states are
added at the center of the grids in the second track, if the degree for a
sector is larger than 3 degrees. Note since there is already one pre-allocated
track, the circular rectangles are in the second track (`track.index = 2`).
```{r second, eval = FALSE}
circos.track(track.index = 2, panel.fun = function(x, y) {
if(abs(CELL_META$cell.start.degree - CELL_META$cell.end.degree) > 3) {
sn = CELL_META$sector.index
i_state = as.numeric(gsub("(C|R)_", "", sn))
circos.text(CELL_META$xcenter, CELL_META$ycenter, i_state, col = "white",
font = 2, cex = 0.7, adj = c(0.5, 0.5), niceFacing = TRUE)
xlim = CELL_META$xlim
breaks = seq(0, xlim[2], by = 4e5)
circos.axis(major.at = breaks, labels = paste0(breaks/1000, "KB"), labels.cex = 0.5)
}
}, bg.border = NA)
```
```{r, results = "hide", fig.width = 6, fig.height = 6, echo = FALSE}
eval(parse(text = chunks[["first"]]))
eval(parse(text = chunks[["second"]]))
circos.clear()
```
On the top half, it is easy to see the proportion of different transitions in
group 1 that come to every state in group 2. However, it is not
straightforward for the states in the bottom half to see the proportion of
different states in group 2 they transite to. This can be solved by adding
small circular rectangles to represent the proportions. In following example, the newly added circular
rectangles in the bottom half show e.g. how much the state 15 in group 1 has
been transited to different states in group 2.
Note from version 0.4.11, there is a new argument `link.target.prop` in `chordDiagram()`.
It basically automatically applies following code. However, we still suggest users
to read the following code because more tracks with similar code will be drawn later.
```{r third, eval = FALSE}
for(i in seq_len(nrow(cdm_res))) {
if(cdm_res$value1[i] > 0) {
circos.rect(cdm_res[i, "x1"], -mm_y(1),
cdm_res[i, "x1"] - abs(cdm_res[i, "value1"]), -mm_y(2),
col = state_col2[cdm_res$cn[i]], border = state_col2[cdm_res$cn[i]],
sector.index = cdm_res$rn[i], track.index = 2)
}
}
```
```{r, results = "hide", fig.width = 6, fig.height = 6, echo = FALSE}
eval(parse(text = chunks[["first"]]))
eval(parse(text = chunks[["second"]]))
eval(parse(text = chunks[["third"]]))
circos.clear()
```
Methylation in each category is put on the most outside of the circle. On this track, we will
put two paralle rectangles which are mean methylation and methylation difference between group 1
and group 2. Basically, on the bottom, we show `meth_mat_2 - meth_mat_1` and on the top we show
`meth_mat_1 - meth_mat_2`.
The logic of following code is simple that it just simply adds rectangles repeatedly.
```{r last, eval = FALSE}
abs_max = quantile(abs(c(meth_mat_1, meth_mat_2) - 0.5), 0.95, na.rm = TRUE)
col_fun = colorRamp2(c(0.5 - abs_max, 0.5, 0.5 + abs_max), c("blue", "white", "red"))
col_fun2 = colorRamp2(c(-abs_max, 0, abs_max), c("green", "white", "orange"))
ylim = get.cell.meta.data("ylim", sector.index = rownames(mat)[1], track.index = 1)
y1 = ylim[1] + (ylim[2] - ylim[1])*0.4
y2 = ylim[2]
for(i in seq_len(nrow(cdm_res))) {
if(cdm_res$value1[i] > 0) {
circos.rect(cdm_res[i, "x1"], y1, cdm_res[i, "x1"] - abs(cdm_res[i, "value1"]), y1 + (y2-y1)*0.45,
col = col_fun(meth_mat_1[cdm_res$rn[i], cdm_res$cn[i]]),
border = col_fun(meth_mat_1[cdm_res$rn[i], cdm_res$cn[i]]),
sector.index = cdm_res$rn[i], track.index = 1)
circos.rect(cdm_res[i, "x1"], y1 + (y2-y1)*0.55, cdm_res[i, "x1"] - abs(cdm_res[i, "value1"]), y2,
col = col_fun2(meth_mat_2[cdm_res$rn[i], cdm_res$cn[i]] - meth_mat_1[cdm_res$rn[i], cdm_res$cn[i]]),
border = col_fun2(meth_mat_2[cdm_res$rn[i], cdm_res$cn[i]] - meth_mat_1[cdm_res$rn[i], cdm_res$cn[i]]),
sector.index = cdm_res$rn[i], track.index = 1)
circos.rect(cdm_res[i, "x2"], y1, cdm_res[i, "x2"] - abs(cdm_res[i, "value1"]), y1 + (y2-y1)*0.45,
col = col_fun(meth_mat_2[cdm_res$rn[i], cdm_res$cn[i]]),
border = col_fun(meth_mat_2[cdm_res$rn[i], cdm_res$cn[i]]),
sector.index = cdm_res$cn[i], track.index = 1)
circos.rect(cdm_res[i, "x2"], y1 + (y2-y1)*0.55, cdm_res[i, "x2"] - abs(cdm_res[i, "value1"]), y2,
col = col_fun2(meth_mat_1[cdm_res$rn[i], cdm_res$cn[i]] - meth_mat_2[cdm_res$rn[i], cdm_res$cn[i]]),
border = col_fun2(meth_mat_1[cdm_res$rn[i], cdm_res$cn[i]] - meth_mat_2[cdm_res$rn[i], cdm_res$cn[i]]),
sector.index = cdm_res$cn[i], track.index = 1)
}
}
circos.clear()
```
```{r, results = "hide", fig.width = 6, fig.height = 6, echo = FALSE}
eval(parse(text = chunks[["first"]]))
eval(parse(text = chunks[["second"]]))
eval(parse(text = chunks[["third"]]))
eval(parse(text = chunks[["last"]]))
circos.clear()
```
Legends can be added according to instructions discussed in Section \@ref(legends).
```{r, echo = FALSE, fig.height = 6, fig.width = 6 + 1.165229 + 0.08, dev = "png"}
library(grid)
library(gridBase)
plot.new()
circle_size = unit(1, "snpc") # snpc unit gives you a square region
pushViewport(viewport(x = 0, y = 0.5, width = circle_size, height = circle_size,
just = c("left", "center")))
par(omi = gridOMI(), new = TRUE)
eval(parse(text = chunks[["first"]]))
eval(parse(text = chunks[["second"]]))
eval(parse(text = chunks[["third"]]))
eval(parse(text = chunks[["last"]]))
circos.clear()
upViewport()
library(ComplexHeatmap)
lgd_list_vertical = packLegend(
Legend(at = 1:n_states, labels = names(state_col), pch = as.character(1:n_states),
legend_gp = gpar(col = "white", fontsize = 8), background = state_col, type = "points",
title = "Chromatin States", grid_width = unit(6, "mm")),
Legend(at = c(0.1, 0.3, 0.5, 0.7, 0.9), col_fun = col_fun, title = "Mean Methylation"),
Legend(at = c(-0.4, -0.2, 0, 0.2, 0.4), col_fun = col_fun2, title = "Mean Difference"))
pushViewport(viewport(x = circle_size, y = 0.5, width = ComplexHeatmap:::width(lgd_list_vertical),
height = ComplexHeatmap:::height(lgd_list_vertical), just = c("left", "center")))
grid.draw(lgd_list_vertical)
upViewport()
```