README.Rmd

---
output: github_document
editor_options: 
  chunk_output_type: console
---

<!-- README.md is generated from README.Rmd. Please edit that file -->

```{r, setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "figures/README/",
  fig.width = 9,
  fig.height = 6,
  dpi = 300,
  out.width = "100%"
)
```

# coronaR

The motivation for such an R package and the plots it produces is that most people don't know if X deaths caused by something is a lot or not. Indeed, unless you are a demographer, you probably have no idea how many people die in the world or even in your city, over a year, or over a day. (If you want to know, a good rule of thumb is that over a given year, in a "rich" country, you can expect roughly 1% of its whole population to die.)


## General tips for interpreting the plots

### How to read the x-axis?

Here I express the number of _reported_ COVID19 deaths compared to 100 "normal" deaths.

For example, a value of 50 would imply that if 100 normal deaths used to occur in a city, then we would observe 150 deaths in total (100 normal + 50 COVID19 ones) assuming that normal deaths have not changed.

This is more informative than expressing the result as a percentage: if the COVID deaths reach very large numbers, we will still see on this graph such an increase progressing linearly. Instead, expressed as a percentage, the death toll would just slowly converge towards 100%.

If you still want to express such number as a percentage, just do:

If you still want to express such number as a percentage, just do `100 * X/(100 + X)`. For example, with 50 COVID death per 100 normal ones,
the percentage of COVID death is thus 100 * 50/150 = 33.333%.

If instead you want to instead express values along the x-axis as the odd of dying from COVID, just divide the value X you read on the axis by 100: `(X / (100 + X))/(100 / (100 + X)) = X/100`. For example, with 50 COVID death per 100 normal ones,
the odd of COVID death is thus 50/100 = 0.5.

### Daily vs cumulative deaths

The package allows to explore or plot either daily deaths or cumulated deaths since a given country had reached a total of 10 deaths. Daily deaths are susceptible to vary due to lags in reporting alone. For this reason, I try to discount the lump reports from nursing homes when they correspond to several days, but I cannot keep track of all of them.

Cumulative deaths are thus probably better since when the deaths are being reported matters less.

### Baseline

When deaths are counted on a daily basis, the baseline is also computed on a daily basis.

When deaths are cumulated, the baseline is computed using normal mortality that would have occurred during the period between the date when the country reached 10 cumulative total deaths and the date of the report being analysed.

To count normal deaths and produce a baseline, the package allows you to either use the most recent mortality data for the same country that is analysed (2018), or to use the average mortality data from the entire world (which is much higher than the country level one in the healthier countries). The choice of the baseline impacts on the ranking.

### Why on Earth can cumulative mortality go down?

It is because the cumulative deaths are expressed relatively to the baseline mortality that occurs during the same period. Therefore, if the deaths caused by COVID19 pile up more slowly that the normal deaths, then the relative measure shown in the plot can go down.

## Package installation

You can install this package using __{remotes}__ (or __{devtools}__):

```{r, installation, eval = FALSE}
remotes::install_github("courtiol/coronaR")
```

## Basic usage of the package

### Load the package {coronaR}

```{r, loading pkg}
library(coronaR)
```

### Create the data with the COVID mortality information
```{r, prep ECDC}
today <- Sys.Date() ## note: you can change the date, to rebuild plots retrospectively
data_COVID <- prepare_data_ECDC(path_save_data = "~/Downloads/COVID19",
                                date_of_report = today)
data_COVID
  
## NOTE: the following Warning is expected:
#Warning message:                                                                         
#  In countrycode::countrycode(.data$iso2c, origin = "iso2c", destination = "continent") :
#  Some values were not matched unambiguously: XK
```

### Small manual fix for lumped report of deaths

For daily plots, you may want to remove manually lump report of deaths from nursing homes:

```{r, prep COVID}
data_COVID[data_COVID$country == "France" & data_COVID$date_report == "2020-04-04", "deaths_daily"] <- NA
data_COVID[data_COVID$country == "Belgium" & data_COVID$date_report == "2020-04-08", "deaths_daily"] <- NA
data_COVID[data_COVID$date_report == c("2020-04-26") & data_COVID$country == "Ireland", "deaths_daily"] <- NA
data_COVID[data_COVID$date_report == c("2020-09-07") & data_COVID$country == "Ecuador", "deaths_daily"] <- NA
data_COVID[data_COVID$date_report == c("2020-07-24") & data_COVID$country == "Peru", "deaths_daily"] <- NA
data_COVID[data_COVID$date_report == c("2020-08-14") & data_COVID$country == "Peru", "deaths_daily"] <- NA
data_COVID[data_COVID$date_report == c("2020-07-18") & data_COVID$country == "Kyrgyzstan", "deaths_daily"] <- NA
data_COVID[data_COVID$date_report == c("2020-09-07") & data_COVID$country == "Bolivia", "deaths_daily"] <- NA
data_COVID[data_COVID$date_report == c("2020-10-02") & data_COVID$country == "Argentina", "deaths_daily"] <- NA
data_COVID[data_COVID$date_report == c("2020-07-18") & data_COVID$country == "Chile", "deaths_daily"] <- NA
data_COVID[data_COVID$date_report == c("2020-06-08") & data_COVID$country == "Chile", "deaths_daily"] <- NA
```

We also solve some other issues that show up from time to time (I use www.worldometers.com to figure this out and/or official national reports):

```{r, prep COVID2}
## Date from Spain have one day lag:
if (data_COVID[data_COVID$country == "Spain" & data_COVID$date_report == Sys.Date(), "deaths_daily"] == 0) {
  data_COVID[data_COVID$country == "Spain" & data_COVID$date_report == Sys.Date(), "deaths_daily"] <- data_COVID[data_COVID$country == "Spain" & data_COVID$date_report == Sys.Date() - 1,  "deaths_daily"]
}
```
  
### Create the data with the baseline mortality information

```{r, prep WB}
data_baseline_mortality <- prepare_data_WB() ## the data do sometimes change from day to day!
data_baseline_mortality
```

### Create the plots:

To look at daily deaths, using the baseline mortality from each country:
```{r, plot1}
plot_deaths(data_ECDC = data_COVID,
            data_WB = data_baseline_mortality,
            type_major = "daily",
            baseline_major = "country",
            select_major = "worst_day",
            type_minor = "daily",
            baseline_minor = "country",
            select_minor = "last_day",
            title = "Deaths by COVID19 on the worst and last day (light & dark colour)\nrelative to baseline country mortality")
```

```{r, save plot1, echo = FALSE}
ggplot2::ggsave(filename = paste0("./figures/extra_mortality_daily_country",
                                  today, ".png"), width = 9, height = 6, units = "in")
```

To look at daily deaths, using the baseline mortality from the world:
```{r, plot2}
plot_deaths(data_ECDC = data_COVID,
            data_WB = data_baseline_mortality,
            type_major = "daily",
            baseline_major = "world",
            select_major = "worst_day",
            type_minor = "daily",
            baseline_minor = "world",
            select_minor = "last_day",
            title = "Deaths by COVID19 on the worst and last day (light & dark colour)\nrelative to baseline worldwide mortality")
```

To look at cumulative deaths, using the baseline mortality from each country:
```{r, plot3}
plot_deaths(data_ECDC = data_COVID,
            data_WB = data_baseline_mortality,
            type_major = "cumul",
            baseline_major = "country",
            select_major = "worst_day",
            type_minor = "cumul",
            baseline_minor = "country",
            select_minor = "last_day",
            title = "Cumulative deaths by COVID19 on the worst and last day (light & dark colour)\nrelative to baseline country mortality")
```

```{r, save plot3, echo = FALSE}
ggplot2::ggsave(filename = paste0("./figures/extra_mortality_cumul_country",
                                  today, ".png"), width = 9, height = 6, units = "in")
```

To look at cumulative deaths, using the baseline mortality from the world:
```{r, plot4}
plot_deaths(data_ECDC = data_COVID,
            data_WB = data_baseline_mortality,
            type_major = "cumul",
            baseline_major = "world",
            select_major = "worst_day",
            type_minor = "cumul",
            baseline_minor = "world",
            select_minor = "last_day",
            title = "Cumulative deaths by COVID19 on the worst and last day (light & dark colour)\nrelative to baseline worldwide mortality")
```

## More advanced usage of the package

### Do your own plot

The workhorse function that lead to tidy longitudinal series is `merge_datasets()`.
You can for example use it like that:

```{r, merge}
full_data <- merge_datasets(data_ECDC = data_COVID,
                            data_WB = data_baseline_mortality,
                            type = "daily",
                            baseline = "country",
                            select = "worst_day")
```

```{r, merge output}
str(full_data)
```


### Recover the data behind the plot
```{r, recover plot_data, fig.keep="none"}
plot_data <- plot_deaths(data_ECDC = data_COVID,
            data_WB = data_baseline_mortality,
            type_major = "daily",
            baseline_major = "country",
            select_major = "worst_day",
            type_minor = "daily",
            baseline_minor = "country",
            select_minor = "last_day",
            title = "Deaths by COVID19 on the worst and last day (light & dark colour)\nrelative to baseline mortality")
```


```{r, recover plot_data2}
class(plot_data)
```


### Modify the plot

Just add `return_plot = TRUE`, when calling `plot_deaths()` and store the output in an object. The object created will be a plot.

```{r, recover plot_plot, fig.keep="none"}
plot_plot <- plot_deaths(data_ECDC = data_COVID,
            data_WB = data_baseline_mortality,
            type_major = "daily",
            baseline_major = "country",
            select_major = "worst_day",
            type_minor = "daily",
            baseline_minor = "country",
            select_minor = "last_day",
            title = "Deaths by COVID19 on the worst and last day (light & dark colour)\nrelative to baseline mortality",
            return_plot = TRUE)
```

```{r, recover plot_plot2}
class(plot_plot)
```


## Known caveats

There are many limitation that directly stem from the data. For example:

- some countries (seem to under-report death by COVID19. This is because for many deaths occurying outside hospitals the exact cause of death is not known. (We will be able to look at that when overall death rates will be known.)

- some countries are not included because either we have no data for COVID19, or the population and mortality data are not in the database I am using. The latter is for example the case of Taiwan.

- the baseline mortality is based on **average** daily mortality from 2018.

- comorbidities are not accounted for.

## Developers corner

Here is my current R/computer configuration:
```{r, info session}
devtools::session_info()
```

## Help & feedbacks wanted!

If you find that this project interesting an idea worth pursuing, please let me know by liking, RT or messaging on Twitter (@alexcourtiol).

Developing is always more fun when it becomes a collaborative work, so please also email me (or leave an issue) if you want to get involved!