-
Notifications
You must be signed in to change notification settings - Fork 10
/
Copy pathBetadiversity.Rmd
executable file
·189 lines (140 loc) · 5.77 KB
/
Betadiversity.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
---
title: "Beta diversity and microbiome divergence"
author: "Leo Lahti, Sudarshan Shetty et al."
bibliography:
- bibliography.bib
output:
BiocStyle::html_document:
number_sections: no
toc: yes
toc_depth: 4
toc_float: true
self_contained: true
thumbnails: true
lightbox: true
gallery: true
use_bookdown: false
highlight: haddock
---
<!--
%\VignetteEngine{knitr::rmarkdown}
%\VignetteIndexEntry{microbiome tutorial - variability}
%\usepackage[utf8]{inputenc}
%\VignetteEncoding{UTF-8}
-->
## Beta diversity
Beta diversity quantifies dissimilarity in community composition between samples. Dissimilarity can be also quantified by _distance_ or _divergence_. These measures have a broad use in statistical data analysis.
The [vegan R package](https://cran.r-project.org/web/packages/vegan/index.html) and the [phyloseq R package](https://bioconductor.org/packages/release/bioc/html/phyloseq.html) implement a number of standard ecological dissimilarity measures implemented in the 'vegdist' function.
Here, we show brief examples on how to compare sample heterogeneity between groups and over time.
Load example data
```{r divergence-example2, message=FALSE}
library(microbiome)
library(dplyr)
data(peerj32)
pseq <- peerj32$phyloseq
```
## Intra-individual divergence
Divergence within subjects may increase following intervention.
```{r divergence2c, message=FALSE, warning=FALSE}
library(tidyverse)
betas <- list()
groups <- as.character(unique(meta(pseq)$group))
for (g in groups) {
df <- subset(meta(pseq), group == g)
beta <- c()
for (subj in df$subject) {
# Pick the samples for this subject
dfs <- subset(df, subject == subj)
# Check that the subject has two time points
if (nrow(dfs) == 2) {
s <- as.character(dfs$sample)
# Here with just two samples we can calculate the
beta[[subj]] <- divergence(abundances(pseq)[, s[[1]]],abundances(pseq)[, s[[2]]],method = "bray")
}
}
betas[[g]] <- beta
}
# boxplot
df <- as.data.frame(unlist(betas))
s<- rownames(df)
si<- as.data.frame(s)
si<- separate(si, s, into = c('names','s'))
df1<- bind_cols(df, si)
rownames(df1)<- df1$s ; df1$s<- NULL
p<- ggplot(df1, aes(x = names, y = `unlist(betas)`))+ geom_boxplot() + ylab('') + xlab('')
plot(p)
```
## Divergence within individual over time
Community divergence within individual often tends to increase over time with respect to the baseline sample.
```{r homogeneity-example2d, message=FALSE, warning=FALSE}
library(MicrobeDS)
library(microbiome)
library(dplyr)
library(vegan)
data(MovingPictures)
# Pick the metadata for this subject and sort the
# samples by time
# Pick the data and modify variable names
pseq <- MovingPictures
s <- "F4" # Selected subject
b <- "UBERON:feces" # Selected body site
# Let us pick a subset
pseq <- subset_samples(MovingPictures, host_subject_id == s & body_site == b)
# Rename variables
sample_data(pseq)$subject <- sample_data(pseq)$host_subject_id
sample_data(pseq)$sample <- sample_data(pseq)$X.SampleID
# Tidy up the time point information (convert from dates to days)
sample_data(pseq)$time <- as.numeric(as.Date(gsub(" 0:00", "", as.character(sample_data(pseq)$collection_timestamp)), "%m/%d/%Y") - as.Date("10/21/08", "%m/%d/%Y"))
# Order the entries by time
df <- meta(pseq) %>% arrange(time)
# Calculate the beta diversity between each time point and
# the baseline (first) time point
beta <- c() # Baseline similarity
s0 <- subset(df, time == 0)$sample
# Let us transform to relative abundance for Bray-Curtis calculations
a <-microbiome::abundances(microbiome::transform(pseq, "compositional"))
for (tp in df$time[-1]) {
# Pick the samples for this subject
# If the same time point has more than one sample,
# pick one at random
st <- sample(subset(df, time == tp)$sample, 1)
# Beta diversity between the current time point and baseline
b <- vegdist(rbind(a[, s0], a[, st]), method = "bray")
# Add to the list
beta <- rbind(beta, c(tp, b))
}
colnames(beta) <- c("time", "beta")
beta <- as.data.frame(beta)
theme_set(theme_bw(20))
library(ggplot2)
p <- ggplot(beta, aes(x = time, y = beta)) +
geom_point() +
geom_line() +
geom_smooth() +
labs(x = "Time (Days)", y = "Beta diversity (Bray-Curtis)")
print(p)
```
## Inter-individual divergence / spread
Divergence within a sample set quantifies the overall heterogeneity in community composition across samples or individuals. This is sometimes quantified as the average dissimilarity of each sample from the group mean; the dissimilarity can be quantified by beta diversity as in [Salonen et al. ISME J 2014](http://www.nature.com/ismej/journal/v8/n11/full/ismej201463a.html) (they focused on homogeneity using inverse divergence but the measure is essentially the same).
Calculate divergences within the LGG (probiotic) and Placebo groups with respect to the median profile within each group.
```{r divergence-example2bb, message=FALSE}
pseq <- peerj32$phyloseq
b.pla <- divergence(subset_samples(pseq, group == "Placebo"),
apply(abundances(subset_samples(pseq, group == "Placebo")), 1, median))
b.lgg <- divergence(subset_samples(pseq, group == "LGG"),
apply(abundances(subset_samples(pseq, group == "LGG")), 1, median))
```
The group with larger values has a more heterogeneous community composition.
```{r divergence-example2bbb, message=FALSE}
library(reshape)
l<- list(b.pla, b.lgg)
df<- melt(l)
df$L1[df$L1 == '1']<- 'placebo'
df$L1[df$L1 == '2']<- 'LGG'
df$L1<- factor(df$L1, levels = c('placebo','LGG'))
p<- ggplot(df, aes(x = L1, y = value)) + geom_boxplot()+ xlab('')
plot(p)
```
See [Community comparisons](Comparisons.html) for examples on group-level comparisons based on beta diversity.
#####
For visualizing temporal trajectory in beta diversity [click here](TemporalMicrobiotaTrajectory.html)