student_t_test.qmd

# Two-sample t-test {#sec-two_samples}

```{r}
#| include: false
library(fontawesome)
library(htmlwidgets)
```

Two sample t-test (Student's t-test) can be used if we have two independent (unrelated) groups (e.g., males-females, treatment-non treatment) and one quantitative variable of interest.

When we have finished this Chapter, we should be able to:

::: {.callout-caution icon="false"}
## `r fa("circle-dot", prefer_type = "regular", fill = "red")` Learning objectives

-   Applying hypothesis testing
-   Compare two independent samples applying Student's t-test
-   Interpret the results
:::

## Research question and Hypothesis Testing

We consider the data in *depression* dataset. In an experiment designed to test the effectiveness of paroxetine for treating bipolar depression, the participants were randomly assigned into two groups (paroxetine Vs placebo). The researchers used the Hamilton Depression Rating Scale (HDRS) to measure the depression state of the participants and wanted to find out if the HDRS score is different in paroxetine group as compared to placebo group at the end of the experiment. The significance level $\alpha$ was set to 0.05.

**Note:** A score of 0--7 in HDRS is generally accepted to be within the normal range, while a score of 20 or higher indicates at least moderate severity.

::: {.callout-note icon="false"}
## `r fa("angles-right", fill = "#1DC5CE")` Null hypothesis and alternative hypothesis for the main research question

-   $H_0$: the means of HDRS in the two groups are equal ($\mu_{1} = \mu_{2}$)
-   $H_1$: the means of HDRS in the two groups are not equal ($\mu_{1} \neq \mu_{2}$)
:::

## Packages we need

We need to load the following packages:

```{r}
#| message: false
#| warning: false

# packages for graphs
library(ggrain)
library(ggsci)
library(ggpubr)
library(ggprism)

# packages for data description, transformation and analysis
library(dlookr)
library(descriptr)
library(rstatix)
library(here)
library(tidyverse)

# packages for reporting the results
library(gtsummary)
library(report)
```

## Preparing the data

We import the data *depression* in R:

```{r}
#| warning: false

library(readxl)
depression <- read_excel(here("data", "depression.xlsx"))
```

```{r}
#| echo: false
#| label: fig-depression
#| fig-cap: Table with data from "depression" file.

DT::datatable(
  depression, extensions = 'Buttons', options = list(
    dom = 'tip',
    columnDefs = list(list(className = 'dt-center', targets = "_all"))
  )
)

```

We inspect the data and the type of variables:

```{r}
glimpse(depression)
```

The data set *depression* has 76 patients (rows) and includes two variables (columns). The numeric (`<dbl>`) `HDRS` variable and the character (`<chr>`) `intervention` variable which should be converted to a factor (`<fct>`) variable using the `factor()` function as follows:

```{r}
depression <- depression |> 
  mutate(intervention = factor(intervention))

glimpse(depression)
```

## Assumptions

::: {.callout-note icon="false"}
## `r fa("angles-right", fill = "#1DC5CE")` Check if the following assumptions are satisfied

A.  The data are **normally** distributed in **both** groups
B.  The data in both groups have similar **variance** (also named as homogeneity of variance or homoscedasticity)
:::

### A. Explore the characteristics of distribution for each group and check for normality

The distributions can be explored visually with appropriate plots. Additionally, summary statistics and significance tests to check for normality (e.g., Shapiro-Wilk test) can be used.

**Graphs**

We can visualize the distribution of HDRS for the two groups:

```{r}
#| warning: false
#| label: fig-rainplot1
#| fig-cap: Raincloud plot of HDRS variable stratified by intervention.

ggplot(depression, aes(x= intervention, y = HDRS, fill = intervention)) +
  geom_rain(likert= TRUE, seed = 123, point.args = list(alpha = 0.3)) +
  #theme_prism(base_size = 20, base_line_size = 0.4, palette = "office") +
  labs(title = "Grouped Raincloud Plot: HDRS by intervention") +
  scale_fill_jco() +
  theme(legend.position = "none")
```

The above figure shows that the data are close to symmetry and the assumption of a normal distribution is reasonable.

```{r}
#| warning: false
#| label: fig-qq_HDRS
#| fig-cap: Normality Q-Q plot for HDRS for paroxetine and placebo.

ggqqplot(depression, "HDRS", color = "intervention", conf.int = F) +
  #theme_prism(base_size = 20, base_line_size = 0.4, palette = "office") +
  scale_color_jco() +
  facet_wrap(~ intervention) + 
  theme(legend.position = "none")
```

**Summary statistics**

The HDRS summary statistics for each group are:

::: {#exercise-joins .callout-important icon="false"}
## `r fa("r-project", fill = "#0008C1")` Summary statistics by group

::: panel-tabset
## dplyr

```{r}
depression |> 
  group_by(intervention) |> 
  dplyr::summarise(
    n = n(),
    na = sum(is.na(HDRS)),
    min = min(HDRS, na.rm = TRUE),
    q1 = quantile(HDRS, 0.25, na.rm = TRUE),
    median = quantile(HDRS, 0.5, na.rm = TRUE),
    q3 = quantile(HDRS, 0.75, na.rm = TRUE),
    max = max(HDRS, na.rm = TRUE),
    mean = mean(HDRS, na.rm = TRUE),
    sd = sd(HDRS, na.rm = TRUE),
    skewness = EnvStats::skewness(HDRS, na.rm = TRUE),
    kurtosis= EnvStats::kurtosis(HDRS, na.rm = TRUE)
  ) |> 
  ungroup() |> 
  print(width = 100)
```

## dlookr

```{r}
depression |>  
  group_by(intervention) |>  
  dlookr::describe(HDRS) |>  
  dplyr::select(intervention, n, na, mean, sd, p25, p50, p75, skewness, kurtosis) |>  
  ungroup() |> 
  print(width = 100)

```

## descriptr

```{r}
depression |> 
ds_group_summary(intervention, HDRS)
```
:::
:::

The means are close to medians (20.3 vs 21 and 21.5 vs 21). The skewness is approximately zero (symmetric distribution) and the (excess) kurtosis falls into the acceptable range of \[-1, 1\] indicating approximately normal distributions for both groups.

**Normality test**

::: {.callout-note icon="false"}
## `r fa("angles-right", fill = "#1DC5CE")` Hypothesis testing for Shapiro-Wilk test for normality

-   $H_{0}$: the data came from a normally distributed population
-   $H_{1}$: the data tested are not normally distributed
:::

The *Shapiro-Wilk* test for normality for each group is:

```{r}
depression |> 
  group_by(intervention) |> 
  shapiro_test(HDRS) |> 
  ungroup()
```

The tests of normality suggest that the data for the `HDRS` in both groups are normally distributed (p=0.67 \>0.05 and p=0.61 \>0.05, respectively).

::: callout-warning
## Normality tests frequently fail to be valuable indicators

-   For **small** sample sizes, the Shapiro-Wilk test (and other normality tests) has little power to reject the null hypothesis (under-powered test).

-   If the sample size is **large** normality tests may detect even trivial deviations from the normal distribution (over-powered test).
:::

::: {.callout-tip icon="false"}
## `r fa("comment", fill = "#1DC5CE")` Comment

The decision about normality of data should be based on a careful consideration of all available information such as graphs (histograms, Q-Q plots), summary and shape measures and statistical tests.
:::

### B. Check Levene's test for equality of variances

::: {.callout-note icon="false"}
## `r fa("angles-right", fill = "#1DC5CE")` Hypothesis testing for Levene's test for equality of variances

-   $H_{0}$: the variances of data in two groups are equal
-   $H_{1}$: the variances of data in two groups are not equal
:::

The *Levene's test* for equality of variances is:

```{r}
depression |> 
  levene_test(HDRS ~ intervention)
```

Since the p-value = 0.676 \>0.05, the $H_o$ is not rejected. The variances are supposed to be equal.

::: {.callout-tip icon="false"}
## `r fa("comment", fill = "#1DC5CE")` Comment

If the assumption of equal variances is not satisfied ($H_o$ of Levene's test is rejected), the **Welch's t-test** can be applied. This statistical test assumes normality but does not assume equal variances.
:::

## Run the t-test

### Formulas

We will perform a pooled variance t-test (Student's t-test) to test the null hypothesis that the mean HDRS score is the same for both groups (paroxetine and placebo).

-   Under this $H_o$, the test statistic is:

$$t = \frac{\bar{x}_{1} - \bar{x}_{2}}{SE_{\bar{x}_{1} - \bar{x}_{2}}}= \frac{\bar{x}_{1} - \bar{x}_{2}}{s_{p} \cdot \sqrt{\frac{1}{n_{1}} + \frac{1}{n_{2}}}}$$ {#eq-t1}

where $n_1$ and $n_2$ are the sample sizes for paroxetine and placebo groups respectively, and $s_{p}$ is an estimate of the pooled standard deviation of the two groups which is calculated by the following equation:

$$s_{p} = \sqrt{\frac{(n_{1}-1)s_{1}^2 + (n_{2}-1)s_{2}^2}{n_{1}+ n_{2}-2}}$$ {#eq-sp}

Under the null hypothesis, the t statistic follows the t-distribution with $n - 2$ degrees of freedom (d.f.).

-   The **95% confidence interval** of the mean difference is:

$$ 
95\% \ CI = \bar x_1 - \bar x_2 \pm t^{*}_{df;a/2} * SE_{\bar x_1 - \bar x_2} 
$$ {#eq-ci}

::: {.callout-tip icon="false"}
## `r fa("circle-info", fill = "#1DC5CE")` INFO

-   **Sample size of the groups:** The Student t-test does not have any restrictions on $n_1$ and $n_2$ ---they can be equal or unequal. However, equal samples are preferred because this maximizes the power to detect a specified difference.

-   **Degrees of freedom:** The paroxetine group has $df_1 = n_1 - 1$ and the placebo group has $df_2 = n_2 - 1$ , so we have $df = n_1 + n_2 -2 = n -2$ in total. Another way of thinking of this is that the complete sample size is $n$, and we have estimated two parameters from the data (the two means), so we have $df = n-2$ (see also @sec-prob_distributions).
:::

### In R:

First, we calculate the **mean difference**:

```{r}
mean_1 <- mean(depression$HDRS[depression$intervention == "paroxetine"])
mean_2 <- mean(depression$HDRS[depression$intervention == "placebo"])

mean_dif <- mean_1 - mean_2
mean_dif
  
```

Second, we find the **pooled standard deviation**:

```{r}
n_1 <- length(depression$HDRS[depression$intervention == "paroxetine"])
n_2 <- length(depression$HDRS[depression$intervention == "placebo"])
st_div_1 <- sd(depression$HDRS[depression$intervention == "paroxetine"])
st_div_2 <- sd(depression$HDRS[depression$intervention == "placebo"])
numerator <- (n_1-1)*st_div_1^2 + (n_2-1)*st_div_2^2
denominator <- n_1 + n_2 - 2

pooled_st_div <- sqrt(numerator/denominator)
pooled_st_div
```

Third, we find the **standard error** of the mean difference:

```{r}
st_error <- pooled_st_div*sqrt(1/n_1 + 1/n_2)
st_error
```

Therefore, the **t statistic** is:

```{r}
t <- mean_dif / st_error
t
```

The corresponding probability for this value can be calculated as the cumulative probability P(T ≤ t) for n-2 degrees of freedom. Then, the **p-value** is 2\*P(T ≤ t) because we are doing a two-tailed test.

```{r}
P <- pt(t, df = n_1 + n_2 - 2)

p_value <- 2*P
p_value
```

The **95% confidence interval** of the mean difference is:

```{r}
lower_CI <- mean_dif - qt(0.025, df = 74, lower.tail = FALSE)*st_error
lower_CI

upper_CI <- mean_dif + qt(0.025, df = 74, lower.tail = FALSE)*st_error
upper_CI
```

Note that the 95% confidence interval of the mean difference (-2.78, 0.47) **includes** the hypothesized null value of 0.

Next, we present **R functions** to carry out all the tasks on our behalf.

::: {#exercise-joins .callout-important icon="false"}
## `r fa("r-project", fill = "#0008C1")` Student's t-test

::: panel-tabset
## stats

```{r}
t.test(HDRS ~ intervention, var.equal = T, data = depression)                     
```

## rstatix

```{r}
depression |> 
  t_test(HDRS ~ intervention, var.equal = T, detailed = T)              
```
:::

**NOTE:** If we reject the null hypothesis of Levene's test, we have to type **`var.equal = F`** (or type nothing as this is the default), so the Welch's t-test is applied.
:::

The difference between means (20.33 - 21.49) equals -1.16 units and it is not significant (failed to reject $H_0$; p = 0.16 \> 0.05).

## Present the results

**Summary table**

It is common practice to report the mean (sd) for each group in summary tables.

```{r}
#| code-fold: true
#| code-summary: "Show the code"

depression |>  
  tbl_summary(
    by = intervention, 
    statistic = HDRS ~ "{mean} ({sd})", 
    digits = list(everything() ~ 1),
    label = list(HDRS ~ "HDRS score"), 
    missing = c("no")) |>  
    add_difference(test.args = all_tests("t.test") ~ list(var.equal = TRUE),
                   estimate_fun = HDRS ~ function(x) style_sigfig(x, digits = 2),
                   pvalue_fun = function(x) style_pvalue(x, digits = 2)) %>% 
  add_n()
```

**Report the results**

There is also a specific package with the name {report} that may be useful in reporting the results of the t-test:

```{r}
report_results <- t.test(depression$HDRS ~ depression$intervention, var.equal = T) 
report(report_results)
```

We can use the above information to write up a final report:

::: {.callout-note icon="false"}
## `r fa("pen", fill = "#1DC5CE")` Final report

There is not evidence that HDRS score is significantly different in paroxetine group, mean = 20.3 (sd = 3.7), as compared to placebo group, 21.5 (3.4), (mean difference= -1.2 units, 95% CI = -2.8 to 0.47, p = 0.16).
:::