forked from philippberens/lifespan
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathanalysis.Rmd
482 lines (320 loc) · 19.5 KB
/
analysis.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
---
title: "Unclear Evidence for a Limit to Human Lifespan"
author: "Philipp Berens and Tom Wallis"
date: "October 14, 2016"
output: github_document
# output:
# tufte::tufte_handout: default
# # tufte::tufte_html: default
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(ggplot2)
library(dplyr)
library(rstan)
library(rstanarm)
library(BayesFactor)
library(XML)
library(stringr)
```
Dong et al. claim to present statistical evidence in favor of an absolute limit to the human lifespan. Here we present a reanalysis of central figures in their paper showing that (1) the main data of their paper does not provide evidence one way or another and (2) data analyzed in the Extended Data may be incomplete but completing it showed some evidence for their claims. Nevertheless, modeling maxima of distributions by assuming Gaussian noise makes inappropriate assumptions. A new analysis using extreme value theory is required to determine whether the data provide evidence for the authors' claims.
# The model by the authors (Figure 2a)
The authors graph the maximum age reported at death (MRAD) for each year between 1968 and 2006. We acquired the data using WebPlotDigitizer and rounded the numbers to full years (which is what likely was the case for the original data). Originally the data came from the [IDL Database](http://www.supercentenarians.org/).
Here is the raw data, as presented by the authors, fitting separate regression for years up to 1994 and after 1995.
```{r, echo=FALSE}
tbl <- read.csv('lifeexpectancy.csv')
tbl <- round(tbl)
tbl$Group <- factor(tbl$Year>=1995, levels = c("FALSE", "TRUE"), labels = c("<1995", ">=1995"))
```
The plot shows the raw data points in black and separate linear fits with 95%-CIs for years before and after 1995. It is not clear from the paper why the authors chose 1995 as a point to separate models. In a [response](https://publons.com/review/480517/) to a critical review on publons, they provide an analysis with a segmented regression package, choosing 1998 as break point.
We can also obtain the statistics for this model by fitting a linear model with the additional group-variable as predictor including interactions, allowing for a changes slope and offset for the data after 1995.
```{r author_model, fig.width=4.5, fig.height=3.2}
mdl1 <- lm(Age~Year*Group,tbl)
summary.lm(mdl1)
tbl2 <- tbl
tbl2$yhat <- predict(mdl1)
plt <- ggplot(tbl2, aes(x = Year, y = Age, colour = Group)) +
geom_point() +
geom_line(aes(y = yhat)) +
ylab('Yearly maximum reported age at death (years)')
plt
```
Consistent with the paper, the fitted model has a slope of `r signif(mdl1$coefficients['Year'],digits = 3)` years for years before 1995 and one of `r signif(mdl1$coefficients['Year']+mdl1$coefficients['Year:Group>=1995'],digits = 3)` for years afterwards (compare their Figure 2a).
We also plot the regression line from this combined model to show that it is the same as the authors' separate regression fits.
# A linear model
A simple alternative hypothesis to the claim of the authors would be that MRAD actually keeps increasing and therefore, that there is no limit to human lifespan. We are agnostic as to whether or not this model makes plausible predictions (apparently people have argued both ways), but it does represent a simple comparison model:
```{r, linear_model, echo=FALSE, fig.width=3.5, fig.height=3.2}
ggplot(tbl,aes(x=Year,y=Age)) + geom_point() +
geom_smooth(method=lm) +
ylab('Yearly maximum reported age at death (years)')
```
The plots shows the raw data points again, with a linear regression with 95% CIs fitted to all the data.
```{r}
mdl2 <- lm(Age~Year,tbl)
summary.lm(mdl2)
```
In this case, MRAD increases slightly by `r signif(mdl2$coefficients['Year'],digits = 2)` years per year.
# Showing the models side-by-side
```{r combined_plot, fig.width=6, fig.height=3.2}
# prepare data:
tmp1 <- tbl
tmp2 <- tbl
tmp1$model <- "Trend break"
tmp1$yhat <- predict(mdl1)
tmp2$model <- "Linear"
tmp2$yhat <- predict(mdl2)
combined_data <- rbind(tmp1, tmp2)
plt <- ggplot(combined_data, aes(x = Year, y = Age, colour = Group)) +
facet_grid(~ model) +
geom_point() +
geom_line(aes(y = yhat))
# appearance:
plt <- plt +
scale_y_continuous(name = "Yearly maximum reported age at death (years)",
breaks = seq(110, 126, by = 2)) +
ylab("Yearly maximum reported age at death (years)") +
theme_minimal(base_size = 8) +
theme(panel.grid.major = element_line(colour = "grey90", size = 0.5)) +
theme(panel.margin = unit(2, "lines")) +
scale_color_manual(values = c("#4A87CB", "#E76826"), name = "") +
theme(legend.position = "bottom")
plt
#ggsave("combined_data_plot.pdf", width = 3.5, height = 3.2)
#ggsave("combined_data_plot.eps", width = 3.5, height = 3.2)
```
# Model comparison
Which model is better? In the paper, the authors fail to provide evidence for their model, they seem to argue that the data looks like there is a saturation effect or a decline in MRAD after 1995.
One can do better and objectively compare the two fitted models. If we look at the output of the models above, the model by the authors explains a little more variance in the data than the linear model (0.42 vs. 0.29). On the other hand, the model also uses four parameters to do so, compared to only two in the linear model.
We can therefore ask if the increase in explained variance is "worth" the additional parameters.
A number of model comparison metrics exist; in general these weigh the tradeoff between model fit and complexity differently.
We present the results of several classical model comparison metrics below.
## Classical ANOVA
First, we consider the two models as a nested set and compare them using classical ANOVA.
```{r}
anova(mdl2, mdl1)
```
The two extra degrees of freedom in the Dong et al model does not lead to a statistically-significant improvement in residual error over the simple linear model (following the traditional $p < .05$ cutoff for significance).
## AIC
Another comparison metric with appealing theoretical links to information theory (see Burnham & Anderson, 2002) is the so-called Akaike Information Criterion (AIC).
In the AIC, smaller values denote better models.
```{r}
AIC(mdl1)
AIC(mdl2)
```
Here, the model of Dong et al has the lower AIC and is thus the preferred model.
However, following the heuristics suggested by Burnham & Anderson (2002, p.70), an AIC difference of `r signif(AIC(mdl1) - AIC(mdl2), digits=2)` indicates that the data provide "substantial" support for the simpler linear model.
## BIC
A related model comparison metric is the Bayesian Information Criterion (BIC), which is more conservative than AIC because it additionally penalises models with more parameters.
```{r}
BIC(mdl1)
BIC(mdl2)
```
Following Raftery (1995), a BIC difference of `r signif(BIC(mdl1) - BIC(mdl2), digits=2)` is not worth mentioning, providing no evidence of one versus the other model.
## Bayes Factors
Finally, Bayes Factors (the ratio of the posterior model evidence), can be easily computed to compare simple linear models using the BayesFactor package (Morey & Rouder, 2015).
```{r}
bf1 <- lmBF(Age~Year*Group, tbl)
bf2 <- lmBF(Age~Year, tbl)
bf1 / bf2
```
Under these default priors (assuming for example that both models are equally likely *a priori*), the models receive approximately equal support from the data (the Dong et al model is favoured with an odds ratio of 1.24-to-1).
```{r test_bf_effects, echo=FALSE, results="hide"}
bf1_wide <- lmBF(Age~Year*Group, tbl, rscaleFixed = "ultrawide", rscaleCont = "ultrawide")
bf2_wide <- lmBF(Age~Year, tbl, rscaleFixed = "ultrawide", rscaleCont = "ultrawide")
bf1_wide / bf2_wide
```
Changing the priors of BayesFactor from "medium" to "ultrawide" on the standardised effect size scale did not appreciably affect these conclusions; with ultrawide effect-size priors the simpler model is now preferred with odds of 1.09-to-1.
Note that in the above mentioned [response](https://publons.com/review/480517/) the authors argue that the segmented regression model provides a better AIC than the linear model. We will add the segmented model here later.
## Bayesian estimation of model parameters
Here we take a Bayesian approach to model estimation, and fit the full linear model including interaction terms.
We employ a Student-t prior with a mean of zero, standard deviation of 2.5 and five degrees of freedom, which yields modest shrinkage of the coefficients towards zero, i.e. enforcing some conservatism in inference.
We fit the models using the package `rstanarm`, which allows relatively straightforward use of Bayesian methods.
```{r fit_stan_model}
bmdl <- stan_glm(Age~Year*Group, tbl,
prior = student_t(5, 0, 2.5),
prior_intercept = student_t(5, 0, 50),
family = gaussian(),
adapt_delta = 0.99)
```
We can summarize the fitted model and plot the posterior density over the parameters:
```{r, fig.width=5, fig.height=5}
plot(bmdl,'dens')
bmdl
```
Comparing the fitted model to the frequentist models above shows that the posterior median of the linear effect of Year (`r signif(bmdl$coefficients['Year'],digits=3)`) is similar to the estimated value above (`r signif(mdl1$coefficients['Year'],digits=3)`), but shrunken towards zero by the prior.
The posterior density on the interaction term is centered around zero during inference, arguing that there is little evidence of a different slope after 1995.
There is a small effect of the interaction term on the y-intercept, increasing the estimated y-intercept by `r signif(mdl1$coefficients['Group>=1995'],digits=1)`. This is likely an artefact of the model parametrization.
```{r plot_bayes_preds, fig.width=3.5, fig.height=3.2}
draws <- as.data.frame(as.matrix(bmdl))
X <- draws[1:200,1:4]
foomdl <- mdl1
tbl2 <- tbl["Year"]
base <-ggplot(tbl, aes(x = Year, y = Age))
for (i in 1:200){
foomdl$coefficients <- c(X[i,1], X[i,2], X[i,3], X[i,4])
tbl2["Pred"] <- predict.lm(foomdl,tbl)
base <- base + geom_line(data=tbl2,mapping = aes(x=Year, y=Pred), color="skyblue", alpha=0.5,size=1.1)
}
X = coef(bmdl)
foomdl$coefficients <- c(X[1], X[2], X[3], X[4])
tbl2["Pred"] <- predict.lm(foomdl,tbl)
base +
geom_point() +
geom_line(data=tbl2,mapping = aes(x=Year, y=Pred),size=1.1)
```
# Extended data figure
The authors present a similar dataset from an independent source, the Gerontological Research Group, in Extended Data Figure 6.
They find similar results to those reported for the main analysis, and they argue that this provides independent evidence for their central analysis.
We again aquired these data using WebPlotDigitizer, and present them below.
```{r extended_data_import, echo=FALSE}
ext_tbl <- read.csv('extended_data_6.csv')
ext_tbl <- round(ext_tbl)
ext_tbl$Group <- factor(ext_tbl$Year>=1995, levels = c("FALSE", "TRUE"), labels = c("<1995", ">=1995"))
```
```{r author_model_extended, echo=FALSE, fig.width=3.5, fig.height=3.2}
ext_mdl1 <- lm(Age~Year*Group,ext_tbl)
summary.lm(ext_mdl1)
# additionally show that you get the same model fit from this combined regression:
blah <- ext_tbl
blah$yhat <- predict(ext_mdl1)
plt <- ggplot(blah, aes(x = Year, y = Age, colour = Group)) +
geom_point() +
geom_line(aes(y = yhat)) +
ylab('Yearly maximum reported age at death (years)')
plt
```
The fitted model has a slope of `r signif(ext_mdl1$coefficients['Year'],digits = 3)` years for years before 1995 (their slope = 0.1194) and one of `r signif(ext_mdl1$coefficients['Year']+ext_mdl1$coefficients['Year:Group>=1995'],digits = 3)` (their slope = -0.1367) for years afterwards (compare their Extended Data Figure 6).
Differences to their model fit likely reflect data uncertainty from the digitisation process (owing to the poorer resolution of this figure in the paper).
```{r extended_data_lm, echo=FALSE}
ext_mdl2 <- lm(Age~Year,ext_tbl)
anova(ext_mdl2, ext_mdl1)
ext_bf1 <- lmBF(Age~Year*Group, ext_tbl)
ext_bf2 <- lmBF(Age~Year, ext_tbl)
ext_bf1 / ext_bf2
```
Here, the ANOVA analysis does provide support for the Dong et al model.
What about the other model comparison metrics?
The AIC difference between the models is `r signif(AIC(ext_mdl1) - AIC(ext_mdl2), digits=2)`, which corresponds to "considerably less" support for the simple linear model relative to the authors' model.
The BIC difference is `r signif(BIC(ext_mdl1) - BIC(ext_mdl2), digits=2)`, which provides "positive evidence" (Raftery, 1995) in support of the authors' model.
Finally, the Bayes factor computed as above is about 8.8, which also provides positive support for the authors' model.
An important caveat for these data is that they are missing observations between 1989-1996. Since this data spans the key years of the "break", they could have an important influence on these conclusions. The Dong et al paper does not clarify why this data is missing from the authors' plot (compare to data table online [here](http://www.grg.org/Adams/A.HTM)).
# Recovering missing data
In collaboration with [Adam Lenart](http://www.sdu.dk/staff/alenart), we recovered the full dataset from the Gerontology Research Group [website](http://www.grg.org/Adams/A.HTM), assuming that this was the source of the data the authors used.
This is a dataset of "verified supercentenarians" as of January 1, 2014.
```{r pull_grg_data, echo=FALSE, warning=FALSE}
# This code thanks to Adam Lenart
# read data in
theurl <- "http://www.grg.org/Adams/A_files/sheet001.htm"
tables <- readHTMLTable(theurl,header=TRUE,skip.rows=1:12)
dat <- tables[[1]][1:1627,]
# format data set
dat$Birthpl <- as.character(dat$Birthplace)
dat$Deathpl <- as.character(dat$"Residence/Place of death")
removeParsQM <- function(x) unlist(strsplit(x,split=" \\(|\\?"))[1]
dat$BirthC <- sapply(FUN=removeParsQM,X=dat$Birthpl)
dat$DeathC <- sapply(FUN=removeParsQM,X=dat$Deathpl)
dat$Dead <- ifelse(dat$Died=="",yes=1,no=0)
dat$Age <- as.numeric(as.character(dat$Years))+(as.numeric(as.character(dat$Days))+0.5)/365.25
dat2 <- dat
# ---------------- Date of Death ------------------- #
Sys.setlocale("LC_TIME", 'English')
dD <- as.character(dat$Died)
dD <- gsub("Sept.", "September", dD)
deathDate <- as.Date(as.character(dD),"%b. %d, %Y")
deathDate[is.na(deathDate)==TRUE] <-
as.Date(as.character(dD)[is.na(deathDate)==TRUE],"%B %d, %Y")
dat$deathDate <- deathDate
dat2$deathDatePX <- as.POSIXlt(deathDate)
dat2$deathYear <- dat2$deathDatePX$year+1900
grg <- dat2
# create group code:
grg$Group <- factor(grg$deathYear>=1995, levels = c("FALSE", "TRUE"), labels = c("<1995", ">=1995"))
# save to a binary:
save(grg, file = "GRG_data.RData")
write.csv2(grg, file = "GRG_data.csv")
```
Here's all the data.
The smoothing curve shows a local polynomial regression, and indicates an *upward* trend in age-at-death in recent years:
```{r plot_all_grg, echo=FALSE}
plt <- ggplot(grg, aes(x = deathYear, y = Age)) +
geom_point(alpha = 0.5) +
stat_smooth(method = "loess")
plt
```
Now we can do the appropriate transformations and get the MRAD data.
Note that unlike in their analysis, here we're not rounding to the nearest year.
```{r compute_mrad_grg}
grg_mrad <- grg %>%
select(-deathDatePX) %>%
rename(Year = deathYear) %>%
filter(Year >= 1950) %>%
group_by(Year) %>%
summarise(Age = max(Age)) %>%
mutate(Group = factor(Year>=1995,
levels = c("FALSE", "TRUE"),
labels = c("<1995", ">=1995")))
```
Here's a plot of that MRAD data:
```{r plot_grg_MRAD, echo=FALSE}
plt <- ggplot(grg_mrad, aes(x = Year, y = Age)) +
geom_point(alpha = 0.5) +
stat_smooth(method = "loess") +
ylab("Yearly maximum reported age at death (years)")
plt
```
There seem to be no shortage of datapoints between the years missing in the authors' plot.
Note that the smooth indicates a downward trend in the maximum ages.
The authors' Figure 2a shows MRAD computed from the [IDL Database](http://www.supercentenarians.org/), only for countries France, Japan, UK and US.
The GRG dataset is worldwide.
How similar are the two datasets?
```{r compare_datasets, echo=FALSE}
merge1 <- grg_mrad
merge1$source <- "GRG"
merge2 <- tbl
merge2$source <- "IDL"
merged <- rbind(merge1, merge2)
# merged$Age <- round(merged$Age)
ggplot(merged, aes(x = Year, y = Age, colour = source)) +
geom_point(alpha = 0.7) +
stat_smooth(method = "loess") +
ylab("Yearly maximum reported age at death (years)")
```
Note the lack of rounding and wider range of years for the GRG set.
## Fitting models to the GRG dataset
We can now fit and compare the two candidate models for this dataset.
```{r author_model_grg}
grg_m1 <- lm(Age~Year*Group, grg_mrad)
summary.lm(grg_m1)
# additionally show that you get the same model fit from this combined regression:
blah <- grg_mrad
blah$yhat <- predict(grg_m1)
plt <- ggplot(blah, aes(x = Year, y = Age, colour = Group)) +
geom_point() +
geom_line(aes(y = yhat)) +
ylab('Yearly maximum reported age at death (years)') +
ggtitle("Author model fit to GRG data")
plt
```
```{r lin_model_grg, echo=FALSE}
grg_m2 <- lm(Age~Year, grg_mrad)
summary(grg_m2)
anova(grg_m2, grg_m1)
grg_bf1 <- lmBF(Age~Year*Group, as.data.frame(grg_mrad))
grg_bf2 <- lmBF(Age~Year, as.data.frame(grg_mrad))
#
grg_bf1 / grg_bf2
```
Including the missing data from the table substantially increases support for the authors' model over a simple linear model.
The ANOVA is highly significant; the AIC difference is `r signif(AIC(grg_m1) - AIC(grg_m2), digits=2)`, which corresponds to "essentially no" support for the simple linear model relative to the authors' model.
The BIC difference is `r signif(BIC(grg_m1) - BIC(grg_m2), digits=2)`, which provides "very strong" (Raftery, 1995) support for the authors' model over the simple linear one.
Finally, the Bayes factor computed as above is about 150, which also provides very strong positive support for the authors' model.
# Conclusion
The model comparison metrics presented here, using both Frequentist, information theoretic and Bayesian approaches, suggest that the main data reported in the paper provide no support for or against the idea that there is a limit to human lifespan. A simple linear model showing a positive relationship between year and lifespan is just as plausible given the data from Figure 2a. However, the data we recovered from GRG website do provide strong support for a segmented "trend-break" model over a linear one.
Nevertheless, there are deeper statistical issues regarding these models and the analysis presented by Dong et al. For example, it is questionable to model the extreme values of a distribution such as age-at-death by assuming linearity and Gaussian noise (Coles, 2001). Even if one accepts this as a valid modeling framework however, a model comparison like we performed here should have clearly been part of the Dong et al. paper.
# References
* Burnham, K. P., & Anderson, D. R. (2002): Model selection and multimodel inference a practical information-theoretic approach. New York: Springer.
* Coles, S. (2001): An introduction to statistical modeling of extreme values. London: Springer.
* Kass, R. E., & Raftery, A. E. (1995). Bayes factors. Journal of the American Statistical Association, 90(430), 773–795.
* Morey and Rouder (2015). BayesFactor: Computation of Bayes Factors for Common Designs. R package version 0.9.12-2. https://CRAN.R-project.org/package=BayesFactor
* Raftery, A. E. (1995). Bayesian model selection in social research. Sociological Methodology, 111–163.
* Vehtari, Gelman and Gabry (2016): Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC, arxiv [link](https://arxiv.org/pdf/1507.04544v5.pdf)