-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathREADME.Rmd
273 lines (203 loc) · 11.6 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
---
output: github_document
editor_options:
chunk_output_type: console
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, setup, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "figures/README/",
fig.width = 9,
fig.height = 6,
dpi = 300,
out.width = "100%"
)
```
# coronaR
The motivation for such an R package and the plots it produces is that most people don't know if X deaths caused by something is a lot or not. Indeed, unless you are a demographer, you probably have no idea how many people die in the world or even in your city, over a year, or over a day. (If you want to know, a good rule of thumb is that over a given year, in a "rich" country, you can expect roughly 1% of its whole population to die.)
## General tips for interpreting the plots
### How to read the x-axis?
Here I express the number of _reported_ COVID19 deaths compared to 100 "normal" deaths.
For example, a value of 50 would imply that if 100 normal deaths used to occur in a city, then we would observe 150 deaths in total (100 normal + 50 COVID19 ones) assuming that normal deaths have not changed.
This is more informative than expressing the result as a percentage: if the COVID deaths reach very large numbers, we will still see on this graph such an increase progressing linearly. Instead, expressed as a percentage, the death toll would just slowly converge towards 100%.
If you still want to express such number as a percentage, just do:
If you still want to express such number as a percentage, just do `100 * X/(100 + X)`. For example, with 50 COVID death per 100 normal ones,
the percentage of COVID death is thus 100 * 50/150 = 33.333%.
If instead you want to instead express values along the x-axis as the odd of dying from COVID, just divide the value X you read on the axis by 100: `(X / (100 + X))/(100 / (100 + X)) = X/100`. For example, with 50 COVID death per 100 normal ones,
the odd of COVID death is thus 50/100 = 0.5.
### Daily vs cumulative deaths
The package allows to explore or plot either daily deaths or cumulated deaths since a given country had reached a total of 10 deaths. Daily deaths are susceptible to vary due to lags in reporting alone. For this reason, I try to discount the lump reports from nursing homes when they correspond to several days, but I cannot keep track of all of them.
Cumulative deaths are thus probably better since when the deaths are being reported matters less.
### Baseline
When deaths are counted on a daily basis, the baseline is also computed on a daily basis.
When deaths are cumulated, the baseline is computed using normal mortality that would have occurred during the period between the date when the country reached 10 cumulative total deaths and the date of the report being analysed.
To count normal deaths and produce a baseline, the package allows you to either use the most recent mortality data for the same country that is analysed (2018), or to use the average mortality data from the entire world (which is much higher than the country level one in the healthier countries). The choice of the baseline impacts on the ranking.
### Why on Earth can cumulative mortality go down?
It is because the cumulative deaths are expressed relatively to the baseline mortality that occurs during the same period. Therefore, if the deaths caused by COVID19 pile up more slowly that the normal deaths, then the relative measure shown in the plot can go down.
## Package installation
You can install this package using __{remotes}__ (or __{devtools}__):
```{r, installation, eval = FALSE}
remotes::install_github("courtiol/coronaR")
```
## Basic usage of the package
### Load the package {coronaR}
```{r, loading pkg}
library(coronaR)
```
### Create the data with the COVID mortality information
```{r, prep ECDC}
today <- Sys.Date() ## note: you can change the date, to rebuild plots retrospectively
data_COVID <- prepare_data_ECDC(path_save_data = "~/Downloads/COVID19",
date_of_report = today)
data_COVID
## NOTE: the following Warning is expected:
#Warning message:
# In countrycode::countrycode(.data$iso2c, origin = "iso2c", destination = "continent") :
# Some values were not matched unambiguously: XK
```
### Small manual fix for lumped report of deaths
For daily plots, you may want to remove manually lump report of deaths from nursing homes:
```{r, prep COVID}
data_COVID[data_COVID$country == "France" & data_COVID$date_report == "2020-04-04", "deaths_daily"] <- NA
data_COVID[data_COVID$country == "Belgium" & data_COVID$date_report == "2020-04-08", "deaths_daily"] <- NA
data_COVID[data_COVID$date_report == c("2020-04-26") & data_COVID$country == "Ireland", "deaths_daily"] <- NA
data_COVID[data_COVID$date_report == c("2020-09-07") & data_COVID$country == "Ecuador", "deaths_daily"] <- NA
data_COVID[data_COVID$date_report == c("2020-07-24") & data_COVID$country == "Peru", "deaths_daily"] <- NA
data_COVID[data_COVID$date_report == c("2020-08-14") & data_COVID$country == "Peru", "deaths_daily"] <- NA
data_COVID[data_COVID$date_report == c("2020-07-18") & data_COVID$country == "Kyrgyzstan", "deaths_daily"] <- NA
data_COVID[data_COVID$date_report == c("2020-09-07") & data_COVID$country == "Bolivia", "deaths_daily"] <- NA
data_COVID[data_COVID$date_report == c("2020-10-02") & data_COVID$country == "Argentina", "deaths_daily"] <- NA
data_COVID[data_COVID$date_report == c("2020-07-18") & data_COVID$country == "Chile", "deaths_daily"] <- NA
data_COVID[data_COVID$date_report == c("2020-06-08") & data_COVID$country == "Chile", "deaths_daily"] <- NA
```
We also solve some other issues that show up from time to time (I use www.worldometers.com to figure this out and/or official national reports):
```{r, prep COVID2}
## Date from Spain have one day lag:
if (data_COVID[data_COVID$country == "Spain" & data_COVID$date_report == Sys.Date(), "deaths_daily"] == 0) {
data_COVID[data_COVID$country == "Spain" & data_COVID$date_report == Sys.Date(), "deaths_daily"] <- data_COVID[data_COVID$country == "Spain" & data_COVID$date_report == Sys.Date() - 1, "deaths_daily"]
}
```
### Create the data with the baseline mortality information
```{r, prep WB}
data_baseline_mortality <- prepare_data_WB() ## the data do sometimes change from day to day!
data_baseline_mortality
```
### Create the plots:
To look at daily deaths, using the baseline mortality from each country:
```{r, plot1}
plot_deaths(data_ECDC = data_COVID,
data_WB = data_baseline_mortality,
type_major = "daily",
baseline_major = "country",
select_major = "worst_day",
type_minor = "daily",
baseline_minor = "country",
select_minor = "last_day",
title = "Deaths by COVID19 on the worst and last day (light & dark colour)\nrelative to baseline country mortality")
```
```{r, save plot1, echo = FALSE}
ggplot2::ggsave(filename = paste0("./figures/extra_mortality_daily_country",
today, ".png"), width = 9, height = 6, units = "in")
```
To look at daily deaths, using the baseline mortality from the world:
```{r, plot2}
plot_deaths(data_ECDC = data_COVID,
data_WB = data_baseline_mortality,
type_major = "daily",
baseline_major = "world",
select_major = "worst_day",
type_minor = "daily",
baseline_minor = "world",
select_minor = "last_day",
title = "Deaths by COVID19 on the worst and last day (light & dark colour)\nrelative to baseline worldwide mortality")
```
To look at cumulative deaths, using the baseline mortality from each country:
```{r, plot3}
plot_deaths(data_ECDC = data_COVID,
data_WB = data_baseline_mortality,
type_major = "cumul",
baseline_major = "country",
select_major = "worst_day",
type_minor = "cumul",
baseline_minor = "country",
select_minor = "last_day",
title = "Cumulative deaths by COVID19 on the worst and last day (light & dark colour)\nrelative to baseline country mortality")
```
```{r, save plot3, echo = FALSE}
ggplot2::ggsave(filename = paste0("./figures/extra_mortality_cumul_country",
today, ".png"), width = 9, height = 6, units = "in")
```
To look at cumulative deaths, using the baseline mortality from the world:
```{r, plot4}
plot_deaths(data_ECDC = data_COVID,
data_WB = data_baseline_mortality,
type_major = "cumul",
baseline_major = "world",
select_major = "worst_day",
type_minor = "cumul",
baseline_minor = "world",
select_minor = "last_day",
title = "Cumulative deaths by COVID19 on the worst and last day (light & dark colour)\nrelative to baseline worldwide mortality")
```
## More advanced usage of the package
### Do your own plot
The workhorse function that lead to tidy longitudinal series is `merge_datasets()`.
You can for example use it like that:
```{r, merge}
full_data <- merge_datasets(data_ECDC = data_COVID,
data_WB = data_baseline_mortality,
type = "daily",
baseline = "country",
select = "worst_day")
```
```{r, merge output}
str(full_data)
```
### Recover the data behind the plot
```{r, recover plot_data, fig.keep="none"}
plot_data <- plot_deaths(data_ECDC = data_COVID,
data_WB = data_baseline_mortality,
type_major = "daily",
baseline_major = "country",
select_major = "worst_day",
type_minor = "daily",
baseline_minor = "country",
select_minor = "last_day",
title = "Deaths by COVID19 on the worst and last day (light & dark colour)\nrelative to baseline mortality")
```
```{r, recover plot_data2}
class(plot_data)
```
### Modify the plot
Just add `return_plot = TRUE`, when calling `plot_deaths()` and store the output in an object. The object created will be a plot.
```{r, recover plot_plot, fig.keep="none"}
plot_plot <- plot_deaths(data_ECDC = data_COVID,
data_WB = data_baseline_mortality,
type_major = "daily",
baseline_major = "country",
select_major = "worst_day",
type_minor = "daily",
baseline_minor = "country",
select_minor = "last_day",
title = "Deaths by COVID19 on the worst and last day (light & dark colour)\nrelative to baseline mortality",
return_plot = TRUE)
```
```{r, recover plot_plot2}
class(plot_plot)
```
## Known caveats
There are many limitation that directly stem from the data. For example:
- some countries (seem to under-report death by COVID19. This is because for many deaths occurying outside hospitals the exact cause of death is not known. (We will be able to look at that when overall death rates will be known.)
- some countries are not included because either we have no data for COVID19, or the population and mortality data are not in the database I am using. The latter is for example the case of Taiwan.
- the baseline mortality is based on **average** daily mortality from 2018.
- comorbidities are not accounted for.
## Developers corner
Here is my current R/computer configuration:
```{r, info session}
devtools::session_info()
```
## Help & feedbacks wanted!
If you find that this project interesting an idea worth pursuing, please let me know by liking, RT or messaging on Twitter (@alexcourtiol).
Developing is always more fun when it becomes a collaborative work, so please also email me (or leave an issue) if you want to get involved!