-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathmore.Rmd
636 lines (513 loc) · 21.6 KB
/
more.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
Possible material to add
TODO
- Visualisation section: use MSnExps
- Spectral counting: use MSGFplus
## Visualising raw data
The importance of flexible access to specialised data becomes visible
in the figure
below^[Figure and code taken from the `r Biocannopkg("RforProteomics")` [visualisation vignette](http://bioconductor.org/packages/release/data/experiment/vignettes/RforProteomics/inst/doc/RProtVis.html)]. Not
only can we access specific data and understand/visualise them, but we
can transverse all the data and extracted/visualise/understand
structured slices of data.
![Visual summary of a full round of MS1 and MS2 acquisition for specific MS1 spectrum.](./Figures/visfig-1.png)
In this code chunks we start by selecting relevant spectra of
interest. We will focus on the first MS1 spectrum acquired after 30
minutes of retention time.
```{r pxd1}
## (1) Open raw data file
ms <- openMSfile(mzf)
## (2) Extract the header information
hd <- header(ms)
## (3) MS1 spectra indices
ms1 <- which(hd$msLevel == 1)
## (4) Select MS1 spectra with retention time between 30 and 35 minutes
rtsel <- hd$retentionTime[ms1] / 60 > 30 & hd$retentionTime[ms1] / 60 < 35
## (5) Indices of the 1st and 2nd MS1 spectra after 30 minutes
i <- ms1[which(rtsel)][1]
j <- ms1[which(rtsel)][2]
## (6) Interleaved MS2 spectra
ms2 <- (i+1):(j-1)
```
Now now extract and plot all relevant information:
1. The upper panel represents the chromatogram of the `r fileName(ms)`
raw data file, produced by plotting the data returned by the
`chromatogram` function.
```{r visfig01, fig.cap = "A chromatogram."}
plot(chromatogram(ms)[[1]], type = "l")
```
2. We concentrate at a specific retention time,
`r formatRt(hd[i, "retentionTime"])` minutes (`r hd[i, "retentionTime"]` seconds)
```{r visfig02, fig.cap = "A chromatogram with highlighted retention time of interest."}
plot(chromatogram(ms)[[1]], type = "l")
abline(v = hd[i, "retentionTime"]/60, col = "red")
```
3. This corresponds to the `r i`th MS1 spectrum, shown on the second
row of figures.
```{r visfig03, fig.cap = "The MS1 spectrum at the retention time of interest."}
plot(peaks(ms, i), type = "l", xlim = c(400, 1000))
legend("topright", bty = "n",
legend = paste0(
"Acquisition ", hd[i, "acquisitionNum"], "\n",
"Retention time ", formatRt(hd[i, "retentionTime"])))
```
4. The ions that were selected for MS2 are highlighted by vertical
lines. These are represented in the bottom part of the figure.
```{r visfig04, fig.cap = "The MS1 spectrum at the retention time of interest, highlighting the peptides selected for MS2."}
plot(peaks(ms, i), type = "l", xlim = c(400, 1000))
legend("topright", bty = "n",
legend = paste0(
"Acquisition ", hd[i, "acquisitionNum"], "\n",
"Retention time ", formatRt(hd[i, "retentionTime"])))
abline(v = hd[ms2, "precursorMZ"],
col = c("#FF000080",
rep("#12121280", 9)))
```
5. On the right, we zoom on the isotopic envelope of one peptide in
particular (the one highlighted with a red line).
```{r visfig05, fig.cap = "Isotopic envelope of a peptide."}
plot(peaks(ms, i), type = "l", xlim = c(521, 522.5))
abline(v = hd[ms2, "precursorMZ"], col = "#FF000080")
```
6. A final loop through the relevant MS2 spectra plots the
`length(ms2)` MS2 spectra highlighted above.
```{r visfig06, fig.width = 8, fig.height = 10, fig.cap = "MS2 spectra of all selected precursors from the previous MS1 spectrum."}
par(mfrow = c(5, 2), mar = c(2, 2, 0, 1))
for (ii in ms2) {
p <- peaks(ms, ii)
plot(p, xlab = "", ylab = "", type = "h", cex.axis = .6)
legend("topright", legend = paste0("Prec M/Z\n",
round(hd[ii, "precursorMZ"], 2)),
bty = "n", cex = .8)
}
```
```{r mslayout, echo=FALSE}
## Preparing the layout (not shown)
lout <- matrix(NA, ncol = 10, nrow = 8)
lout[1:2, ] <- 1
for (ii in 3:4)
lout[ii, ] <- c(2, 2, 2, 2, 2, 2, 3, 3, 3, 3)
lout[5, ] <- rep(4:8, each = 2)
lout[6, ] <- rep(4:8, each = 2)
lout[7, ] <- rep(9:13, each = 2)
lout[8, ] <- rep(9:13, each = 2)
```
Putting it all together:
```{r visfig, fig.cap = "Visual summary of a full round of MS1 and MS2 acquisition for specific MS1 spectrum."}
layout(lout)
par(mar=c(4,2,1,1))
plot(chromatogram(ms)[[1]], type = "l")
abline(v = hd[i, "retentionTime"]/60, col = "red")
par(mar = c(3, 2, 1, 0))
plot(peaks(ms, i), type = "l", xlim = c(400, 1000))
legend("topright", bty = "n",
legend = paste0(
"Acquisition ", hd[i, "acquisitionNum"], "\n",
"Retention time ", formatRt(hd[i, "retentionTime"])))
abline(h = 0)
abline(v = hd[ms2, "precursorMZ"],
col = c("#FF000080",
rep("#12121280", 9)))
par(mar = c(3, 0.5, 1, 1))
plot(peaks(ms, i), type = "l", xlim = c(521, 522.5),
yaxt = "n")
abline(h = 0)
abline(v = hd[ms2, "precursorMZ"], col = "#FF000080")
par(mar = c(2, 2, 0, 1))
for (ii in ms2) {
p <- peaks(ms, ii)
plot(p, xlab = "", ylab = "", type = "h", cex.axis = .6)
legend("topright", legend = paste0("Prec M/Z\n",
round(hd[ii, "precursorMZ"], 2)),
bty = "n", cex = .8)
}
```
## A spectral counting example
Authors: Laurent Gatto and Vlad Petyuk
A longer, more comprehensive example illustrating a spectral counting
analysis pipeline, from data access to statistical analyses.
```{r env, echo=FALSE, message=FALSE}
library("rpx")
library("jsonlite")
library("R.utils")
library("Biostrings")
library("rTANDEM")
library("parallel")
library("dplyr")
library("ggplot2")
library("MSnbase")
library("MSnID")
library("msmsEDA")
library("msmsTests")
library("Heatplus")
```
### Introduction
This vignette demonstrates a real-world example of using R for
proteomics data processing, starting from download of raw data and
FASTA files with the `r Biocpkg("rpx")` package, peptide and protein
identification with X!Tandem and `r Biocpkg("rTANDEM")`, filtering
MS/MS data with `r Biocpkg("MSnID")` and statistical analysis of
spectral counts with `r Biocpkg("msmsTests")`.
### Getting the data
```{r idex1, message=FALSE}
library("rpx")
id <- "PXD002161"
px <- PXDataset(id)
try(setInternet2(FALSE), silent=TRUE)
library("jsonlite")
addr <- "http://www.ebi.ac.uk:80/pride/ws/archive/%s/list/project/%s"
files <- fromJSON(sprintf(addr, "file", id))$list
assays <- fromJSON(sprintf(addr, "assay", id))$list
files <- subset(files, fileType == 'PEAK',
select = c("assayAccession","fileName"))
assays <- assays[,c("assayAccession",
"experimentalFactor",
"proteinCount",
"peptideCount",
"uniquePeptideCount",
"identifiedSpectrumCount",
"totalSpectrumCount")]
pttrn <- "Age: young, Diet; fully fed"
assays <- subset(assays, grepl(pttrn, experimentalFactor, fixed=T))
# encode phenotype and sample names
group <- sub(".*Name: Y-(.+?)-FF\\.(\\d)", "\\1", assays$experimentalFactor)
splnm <- sub(".*Name: Y-(.+?)-FF\\.(\\d)", "\\1_\\2", assays$experimentalFactor)
assays <- with(assays, {data.frame(assayAccession,
phenotype=sub(".*Name: Y-(.+?)-FF\\.(\\d)",
"\\1", experimentalFactor),
sampleName=sub(".*Name: Y-(.+?)-FF\\.(\\d)",
"\\1_\\2", experimentalFactor),
stringsAsFactors=F)})
files <- subset(files, assayAccession %in% assays$assayAccession)
## c_elegans_A_3_1_21Apr10_Draco_10-03-04.mzML
## c_elegans_A_3_3_21Apr10_Draco_10-03-04.mzML
## c_elegans_B_2_1_21Apr10_Draco_10-03-05.mzML
## c_elegans_B_2_3_21Apr10_Draco_10-03-05.mzML
## c_elegans_C_1_1_21Apr10_Draco_10-03-06.mzML
## c_elegans_C_1_3_21Apr10_Draco_10-03-06.mzML
## c_elegans_D_1_1_21Apr10_Draco_10-03-07.mzML
## c_elegans_D_1_3_21Apr10_Draco_10-03-07.mzML
## c_elegans_E_3_1_21Apr10_Draco_10-03-04.mzML
## c_elegans_E_3_3_21Apr10_Draco_10-03-04.mzML
## do we already have all files?
allfiles <- length(dir(pattern = "c_elegans_.*mzML")) == 10
if (!allfiles)
pxget(px, files$fileName) # fetch them from PX
files$datasetName <- sub('.mzML.gz','', files$fileName, fixed=TRUE)
meta <- merge(files[,c("assayAccession","datasetName")], assays)
rownames(meta) <- meta$datasetName
meta <- meta[order(meta$sampleName),]
rownames(meta) <- NULL
if (!allfiles) {
library("R.utils")
sapply(list.files(pattern = "mzML.gz"), gunzip)
}
```
Now we have `*.mzML` files for MS/MS search. Let's download
FASTA file from the current release of the Wormbase. We will leverage
Bioconductor's `r Biocpkg("Biostrings")` package to reverse the protein
sequence and append them to the original FASTA file. The reverse
sequences will be use for estimate of the false discovery rate of
peptide-to-spectrum matches, peptide and protein identifications.
```{r idex2, eval=FALSE}
library("Biostrings")
fasta_location <- "ftp://ftp.wormbase.org/pub/wormbase/releases/WS250/species/c_elegans/PRJNA13758/c_elegans.PRJNA13758.WS250.protein.fa.gz"
fwd.seqs <- readAAStringSet(fasta_location, format="fasta",
nrec=-1L, skip=0L, use.names=TRUE)
rev.seqs <- reverse(fwd.seqs)
names(rev.seqs) <- paste("XXX", names(rev.seqs), sep='_')
fwd.rev.seqs <- append( fwd.seqs, rev.seqs)
writeXStringSet(x=fwd.rev.seqs, filepath="c_elegans_fwd_rev.fasta", format="fasta")
```
### Peptide identification
Setting up X!Tandem parameter files.
```{r idex3}
library("rTANDEM")
param <- setParamOrbitrap()
taxonomy <- rTTaxo(taxon="celegans",
format="peptide",
URL= "c_elegans_fwd_rev.fasta")
param <- setParamValue(param, 'list path', 'taxonomy information', taxonomy)
param <- setParamValue(param, 'protein', 'taxon', value='celegans')
def.input.path <- system.file("extdata/default_input.xml", package="rTANDEM")
param <- setParamValue(param, 'list path', 'default parameters',
value=def.input.path)
param <- setParamValue(param, "output", "message", "r-for-proteomics ")
param <- setParamValue(param, "refine", value="no")
```
Note, we will be executing X!Tandem on all available cores on the
computer. To determine the number of cores we leverage
`detectCores()` function from the \CRANpkg{parallel}
package.
```{r expar}
library("parallel")
param <- setParamValue(param, "spectrum", "threads", detectCores())
```
Actuall excetion of the X!Tandem searches. As the output we will
capture the names of corresponging XML files with the MS/MS search
results.
```{r rtout}
output.files <-
lapply(sub("\\.gz","",files$fileName),
function(x) {
param <- setParamValue(param, 'spectrum', 'path', value=x)
output.file <- tandem(param)
})
```
Since X!Tandem reports the results in its own XML format, parsing then
is a little bit involved process. As we mentioned above, fortunately,
the most recent version of (X!Tandem)[http://www.thegpm.org/tandem/]
PILEDRIVER (2015.04.01) is capable of producing results in `mzIdentML`
format. Then extracting the results will be trivial with the help of
`r Biocpkg("mzID")` package. The change will be probably caught up in
the upcoming versions of the `r Biocpkg("rTANDEM")` package. However,
at this point we will use a custom function that produces a
`data.frame` with all the necessary columns. The columns that are
required for the `r Biocpkg("MSnIDP")` package to create an MS/MS
identification object are: accession, calculatedMassToCharge,
chargeState, experimentalMassToCharge, isDecoy, peptide, spectrumFile
and spectrumID.
```{r xttodf, warning=FALSE, message=FALSE}
library("dplyr")
.P <- MSnID:::.PROTON_MASS # 1.007276
## parse to comply with MSnID
xtandem_to_df <- function(output.file){
res <- GetResultsFromXML(output.file)
proteins <- GetProteins(res)
peptides <- GetPeptides(protein.uid = proteins$uid, results = res)
peptides <- select(as.data.frame(res@peptides),
prot.uid, spectrum.mh, mh, expect.value,
tandem.score, start.position, end.position,
spectrumID = spectrum.id,
chargeState = spectrum.z,
pepSeq = sequence)
peptides <- mutate(peptides,
calculatedMassToCharge = (mh + .P*(chargeState - 1))/chargeState,
experimentalMassToCharge = (spectrum.mh + .P*(chargeState - 1))/chargeState,
spectrum.mh = NULL,
mh = NULL)
proteins <- select(as.data.frame(res@proteins),
prot.uid=uid, sequence, description)
proteins <- mutate(proteins,
accession = sub("(\\S+)\t.*", "\\1", description),
isDecoy = grepl("XXX_", accession))
x <- inner_join(peptides, proteins)
pre <- with(x, substr(sequence, start.position-1, start.position-1))
pre <- ifelse(pre == "", "-", pre)
post <- with(x, substr(sequence, end.position+1, end.position+1))
post <- ifelse(post == "", "-", post)
x <- mutate(x,
peptide = paste(pre,x$pepSeq, post, sep='.'),
sequence = NULL)
return(x)
}
out <- lapply(output.files, xtandem_to_df)
for(i in seq_along(out))
out[[i]]$spectrumFile <- sub("\\.mzML\\.gz","",files$fileName)[i]
out <- Reduce(rbind, out)
```
### Filtering identification data
The *MSnID* class object from the library of the same name
allows flexible manipulation and filterting of MS/MS identification
results data.
```{r msnidm}
library(MSnID)
m <- MSnID()
psms(m) <- out
show(m)
```
Let's take a look and mass measurement error and a large scale. Some
peptide-to-spectrum matches have the error as large as +1 and +2
Dalton. This is due to the fact that the instrument sometime picks
non-monoisotopic peaks for fragmentation. This is more common for the
peptides with large mass.
```{r msnisfig, fig.cap = "Isotope selection error before and after correction."}
library("ggplot2")
dM <- with(psms(m),
round((experimentalMassToCharge-calculatedMassToCharge)*chargeState))
x <- as.data.frame(table(data.frame(dM, isDecoy=m$isDecoy)))
x <- as.data.frame(table(dM, isDecoy=m$isDecoy))
ggplot(x, aes(x=dM, fill=isDecoy, y=Freq)) +
geom_bar(stat="identity", width=0.25)
```
For correction of the parent ion mass at the isotope-scale level there
is a method with the corresponding name `correct_peak_selection()`.
```{r correctpeaksel, fig.with=6, fig.height=6}
m <- correct_peak_selection(m)
dM <- with(psms(m),
round((experimentalMassToCharge-calculatedMassToCharge)*chargeState))
x <- as.data.frame(table(data.frame(dM, isDecoy=m$isDecoy)))
x <- as.data.frame(table(dM, isDecoy=m$isDecoy))
ggplot(x, aes(x=dM, fill=isDecoy, y=Freq)) +
geom_bar(stat="identity", width=0.25)
```
The mass measurement error at a finer grain, typicaly related to the
mis-calibrated instument, can be accessed with
`mass_measurement_error` method.
```{r parentionmass, fig.cap = "Parent ion mass measurement error histograms for non-decoy and decoy peptide-to-spectrum matches"}
mme <- data.frame(ppm=mass_measurement_error(m), isDecoy=m$isDecoy)
ggplot(mme, aes(x=ppm, fill=isDecoy)) +
geom_histogram(binwidth=1) +
xlim(-20,+20) +
facet_wrap( ~ isDecoy) +
scale_y_sqrt()
```
Finding the right set of filtering criteria for MS/MS data is not a
trivial task. For the purpose of flexible filter optimization the
`r Biocpkg("MSnID")` package has the *MSnIDFilter* class. We will
start with defining the criteria we want to filter on.
```{r msnidfilt}
m$score <- -log10(m$expect.value) # higher the better
m$mme <- abs(mass_measurement_error(m)) # lower the better
fltr <- MSnIDFilter(m)
```
Let define some starting values. What is important is to define what
should be retained higher or lower the given threshold. Threshold
itself can be set to arbitrary value if "Grid" optimization method
will be used as a first pass. "Grid" optimization loops through a
preset number (`n.iter` argument) of combinations of thresholds
ignoring the initial values. In the filtering criteria optimization
routines, the objective function is the number of peptide-to-spectrum
matches, peptide or protein identifications defined by the
`level` argument. However, if the FDR (estimated using reverse
sequences) exceeds the pre-defined maxium value, the objective
function sets to zero.
```{r }
fltr$score <- list(comparison=">", threshold=0)
fltr$mme <- list(comparison="<", threshold=0)
fltr.grid <- optimize_filter(fltr, m, fdr.max=0.01,
method="Grid", level="peptide",
n.iter=1000)
show(fltr.grid)
```
```{r }
as.numeric(fltr.grid)
evaluate_filter(m, fltr.grid)
```
Further fine-tuning of the filtering criteria can be approached with
Nelder-Mead or simulated annealing optimization techniques specified
by the `method` arugment.
```{r filtsann, cache=TRUE}
fltr.sann <- optimize_filter(fltr.grid, m, fdr.max=0.01,
method="SANN", level="peptide",
n.iter=1000)
show(fltr.sann)
as.numeric(fltr.sann)
evaluate_filter(m, fltr.sann)
```
Applying the filter.
```{r applsann}
m <- apply_filter(m, fltr.sann)
```
Visualizing the number of peptides covering the proteins.
```{r nbidpeps, fig.cap = "Number of identified peptides per protein. Almost half of the proteins are 'single-hit wonders', i.e. they have been identified with one peptide only."}
pp <- unique(psms(m)[,c("peptide","accession")])
pep_per_prot <- as.data.frame(table(table(pp$accession)))
pep_per_prot$peptides <- with(pep_per_prot,
ifelse(as.numeric(Var1) > 5, "6+", as.numeric(Var1)))
ggplot(pep_per_prot,
aes(x=factor(1), fill=peptides, y=Freq)) +
geom_bar(width=1, stat = "identity", position = "stack") +
coord_polar(theta = "y") +
ylab('') + xlab('') +
scale_fill_discrete(name="number of\npeptides per\nprotein")
```
Filtering out single-hit-wonders (that is proteins covered with only
one peptide). Note the massive drop in FDR (down to 0\%). Therefore, if
this particular path of filtering MS/MS identifications seems
stringent the steps can be easily flipped around in the script or
re-arranged in any other way that produce the results suitable for a
particular application.
```{r applflitbigex}
nms <- names(which(table(pp$accession) > 1))
m <- apply_filter(m, "accession %in% nms")
show(m)
```
### Exploratory data analysis
We continue using the `MSnID` object results from X!Tandem
searches of 10 *C. elegans* files. First, we will convert it to
an `MSnSet` object that is more taylored to quantitative
analysis and explore the data using the `r Biocpkg("msmsEDA")` package.
```{r }
mset <- as(m, "MSnSet")
show(mset)
```
Summing up peptide counts to protein (or assession in general terms)
level. The redundancy of peptide to protein mapping is controlled by
the `redundancy.handler` argument. In this case we will use
only uniquely mapping peptides for protein quantification.
```{r cmbfeats}
library("MSnbase")
mset <- combineFeatures(mset,
fData(mset)$accession,
redundancy.handler="unique",
fun="sum",
cv=FALSE)
show(mset)
```
Subsetting to the ones that were detected in at least half of the samples
```{r filt}
mset <- mset[rowSums(exprs(mset) > 0) >= 6, ]
```
Updating `phenoData` using the meta-information on experimental
designed fetched
from
[ProteomeXchange project page](http://www.ebi.ac.uk/pride/archive/projects/PXD002161).
```{r prep}
pData(mset) <- meta[match(sampleNames(mset), meta$datasetName),]
mset$phenotype <- as.factor(mset$phenotype)
sampleNames(mset) <- mset$sampleName
pData(mset)
```
The PCA plot below shows clear separation of control and *daf-2*
mutant *C. elegans* strains.
```{r scpca, fig.cap = "Clear separation of control and daf-2 strains on PCA plot indicates on substantial difference in proteome composition."}
library("msmsEDA")
counts.pca(mset,
facs = pData(mset)[,'phenotype',drop=FALSE],
snms = sampleNames(mset))
```
### Statistical analysis
Null hypothesis significance testing using quasi-likelihood Poisson
approach from the `r Biocpkg("msmsTests")` package. The p-value histogram
shows a substantial increase of p-value frequency in low-value bins
(tailing-up), indicating a significant proteome difference between the
strains.
```{r phist, fig.cap = "Histogram of p-values."}
library("msmsTests")
alt.f <- "y ~ phenotype"
null.f <- "y ~ 1"
div <- colSums(exprs(mset)) # normalization factor
res <- msms.glm.qlll(mset, "y ~ phenotype", "y ~ 1", div=div)
hist(res$p.value, 100)
```
A volcano plot is an alternative way to visualize the estimate fold
and significance of change of change.
```{r scvolc, fig.cap = "Volcano plot"}
lst <- test.results(res, mset, pData(mset)$phenotype,
"ctrl", "daf2", div, alpha = 0.05,
minSpC = 0, minLFC = 1, method = "BH")
res.volcanoplot(lst$tres, min.LFC = 1,
max.pval = 0.05, ylbls = NULL, maxy = 4)
```
Finally, we use a heatmap to visualize relative protein abundances
across the study samples. In this example we will subset the proteins
only to the ones that have a fold change (up or down) more then 2-fold
and pass the 0.05 threshold for the adjusted p-value.
```{r hmap, fig.cap = "Heatmap of top proteins."}
library("Heatplus")
regulated <- subset(lst$tres, adjp < 0.05 & abs(LogFC) > 1)
## order MSnSet object the daf-16 status
mset <- mset[,order(pData(mset)$phenotype)]
## matrix with regulated proteins
selected.data <- exprs(mset[rownames(regulated),])
## scaling counts from 0 to 1
selected.data <- sweep(selected.data, 1, apply(selected.data, 1, min), '-')
selected.data <- sweep(selected.data, 1, apply(selected.data, 1, max), '/')
heatmap_plus(selected.data,
scale='none',
col=colorRampPalette(c("snow","steelblue"))(10))
```
### Session information {-}
```{r si}
sessionInfo()
```