-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathstudent_t_test.qmd
410 lines (289 loc) · 12.9 KB
/
student_t_test.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
# Two-sample t-test {#sec-two_samples}
```{r}
#| include: false
library(fontawesome)
library(htmlwidgets)
```
Two sample t-test (Student's t-test) can be used if we have two independent (unrelated) groups (e.g., males-females, treatment-non treatment) and one quantitative variable of interest.
When we have finished this Chapter, we should be able to:
::: {.callout-caution icon="false"}
## `r fa("circle-dot", prefer_type = "regular", fill = "red")` Learning objectives
- Applying hypothesis testing
- Compare two independent samples applying Student's t-test
- Interpret the results
:::
## Research question and Hypothesis Testing
We consider the data in *depression* dataset. In an experiment designed to test the effectiveness of paroxetine for treating bipolar depression, the participants were randomly assigned into two groups (paroxetine Vs placebo). The researchers used the Hamilton Depression Rating Scale (HDRS) to measure the depression state of the participants and wanted to find out if the HDRS score is different in paroxetine group as compared to placebo group at the end of the experiment. The significance level $\alpha$ was set to 0.05.
**Note:** A score of 0--7 in HDRS is generally accepted to be within the normal range, while a score of 20 or higher indicates at least moderate severity.
::: {.callout-note icon="false"}
## `r fa("angles-right", fill = "#1DC5CE")` Null hypothesis and alternative hypothesis for the main research question
- $H_0$: the means of HDRS in the two groups are equal ($\mu_{1} = \mu_{2}$)
- $H_1$: the means of HDRS in the two groups are not equal ($\mu_{1} \neq \mu_{2}$)
:::
## Packages we need
We need to load the following packages:
```{r}
#| message: false
#| warning: false
# packages for graphs
library(ggrain)
library(ggsci)
library(ggpubr)
library(ggprism)
# packages for data description, transformation and analysis
library(dlookr)
library(descriptr)
library(rstatix)
library(here)
library(tidyverse)
# packages for reporting the results
library(gtsummary)
library(report)
```
## Preparing the data
We import the data *depression* in R:
```{r}
#| warning: false
library(readxl)
depression <- read_excel(here("data", "depression.xlsx"))
```
```{r}
#| echo: false
#| label: fig-depression
#| fig-cap: Table with data from "depression" file.
DT::datatable(
depression, extensions = 'Buttons', options = list(
dom = 'tip',
columnDefs = list(list(className = 'dt-center', targets = "_all"))
)
)
```
We inspect the data and the type of variables:
```{r}
glimpse(depression)
```
The data set *depression* has 76 patients (rows) and includes two variables (columns). The numeric (`<dbl>`) `HDRS` variable and the character (`<chr>`) `intervention` variable which should be converted to a factor (`<fct>`) variable using the `factor()` function as follows:
```{r}
depression <- depression |>
mutate(intervention = factor(intervention))
glimpse(depression)
```
## Assumptions
::: {.callout-note icon="false"}
## `r fa("angles-right", fill = "#1DC5CE")` Check if the following assumptions are satisfied
A. The data are **normally** distributed in **both** groups
B. The data in both groups have similar **variance** (also named as homogeneity of variance or homoscedasticity)
:::
### A. Explore the characteristics of distribution for each group and check for normality
The distributions can be explored visually with appropriate plots. Additionally, summary statistics and significance tests to check for normality (e.g., Shapiro-Wilk test) can be used.
**Graphs**
We can visualize the distribution of HDRS for the two groups:
```{r}
#| warning: false
#| label: fig-rainplot1
#| fig-cap: Raincloud plot of HDRS variable stratified by intervention.
ggplot(depression, aes(x= intervention, y = HDRS, fill = intervention)) +
geom_rain(likert= TRUE, seed = 123, point.args = list(alpha = 0.3)) +
#theme_prism(base_size = 20, base_line_size = 0.4, palette = "office") +
labs(title = "Grouped Raincloud Plot: HDRS by intervention") +
scale_fill_jco() +
theme(legend.position = "none")
```
The above figure shows that the data are close to symmetry and the assumption of a normal distribution is reasonable.
```{r}
#| warning: false
#| label: fig-qq_HDRS
#| fig-cap: Normality Q-Q plot for HDRS for paroxetine and placebo.
ggqqplot(depression, "HDRS", color = "intervention", conf.int = F) +
#theme_prism(base_size = 20, base_line_size = 0.4, palette = "office") +
scale_color_jco() +
facet_wrap(~ intervention) +
theme(legend.position = "none")
```
**Summary statistics**
The HDRS summary statistics for each group are:
::: {#exercise-joins .callout-important icon="false"}
## `r fa("r-project", fill = "#0008C1")` Summary statistics by group
::: panel-tabset
## dplyr
```{r}
depression |>
group_by(intervention) |>
dplyr::summarise(
n = n(),
na = sum(is.na(HDRS)),
min = min(HDRS, na.rm = TRUE),
q1 = quantile(HDRS, 0.25, na.rm = TRUE),
median = quantile(HDRS, 0.5, na.rm = TRUE),
q3 = quantile(HDRS, 0.75, na.rm = TRUE),
max = max(HDRS, na.rm = TRUE),
mean = mean(HDRS, na.rm = TRUE),
sd = sd(HDRS, na.rm = TRUE),
skewness = EnvStats::skewness(HDRS, na.rm = TRUE),
kurtosis= EnvStats::kurtosis(HDRS, na.rm = TRUE)
) |>
ungroup() |>
print(width = 100)
```
## dlookr
```{r}
depression |>
group_by(intervention) |>
dlookr::describe(HDRS) |>
dplyr::select(intervention, n, na, mean, sd, p25, p50, p75, skewness, kurtosis) |>
ungroup() |>
print(width = 100)
```
## descriptr
```{r}
depression |>
ds_group_summary(intervention, HDRS)
```
:::
:::
The means are close to medians (20.3 vs 21 and 21.5 vs 21). The skewness is approximately zero (symmetric distribution) and the (excess) kurtosis falls into the acceptable range of \[-1, 1\] indicating approximately normal distributions for both groups.
**Normality test**
::: {.callout-note icon="false"}
## `r fa("angles-right", fill = "#1DC5CE")` Hypothesis testing for Shapiro-Wilk test for normality
- $H_{0}$: the data came from a normally distributed population
- $H_{1}$: the data tested are not normally distributed
:::
The *Shapiro-Wilk* test for normality for each group is:
```{r}
depression |>
group_by(intervention) |>
shapiro_test(HDRS) |>
ungroup()
```
The tests of normality suggest that the data for the `HDRS` in both groups are normally distributed (p=0.67 \>0.05 and p=0.61 \>0.05, respectively).
::: callout-warning
## Normality tests frequently fail to be valuable indicators
- For **small** sample sizes, the Shapiro-Wilk test (and other normality tests) has little power to reject the null hypothesis (under-powered test).
- If the sample size is **large** normality tests may detect even trivial deviations from the normal distribution (over-powered test).
:::
::: {.callout-tip icon="false"}
## `r fa("comment", fill = "#1DC5CE")` Comment
The decision about normality of data should be based on a careful consideration of all available information such as graphs (histograms, Q-Q plots), summary and shape measures and statistical tests.
:::
### B. Check Levene's test for equality of variances
::: {.callout-note icon="false"}
## `r fa("angles-right", fill = "#1DC5CE")` Hypothesis testing for Levene's test for equality of variances
- $H_{0}$: the variances of data in two groups are equal
- $H_{1}$: the variances of data in two groups are not equal
:::
The *Levene's test* for equality of variances is:
```{r}
depression |>
levene_test(HDRS ~ intervention)
```
Since the p-value = 0.676 \>0.05, the $H_o$ is not rejected. The variances are supposed to be equal.
::: {.callout-tip icon="false"}
## `r fa("comment", fill = "#1DC5CE")` Comment
If the assumption of equal variances is not satisfied ($H_o$ of Levene's test is rejected), the **Welch's t-test** can be applied. This statistical test assumes normality but does not assume equal variances.
:::
## Run the t-test
### Formulas
We will perform a pooled variance t-test (Student's t-test) to test the null hypothesis that the mean HDRS score is the same for both groups (paroxetine and placebo).
- Under this $H_o$, the test statistic is:
$$t = \frac{\bar{x}_{1} - \bar{x}_{2}}{SE_{\bar{x}_{1} - \bar{x}_{2}}}= \frac{\bar{x}_{1} - \bar{x}_{2}}{s_{p} \cdot \sqrt{\frac{1}{n_{1}} + \frac{1}{n_{2}}}}$$ {#eq-t1}
where $n_1$ and $n_2$ are the sample sizes for paroxetine and placebo groups respectively, and $s_{p}$ is an estimate of the pooled standard deviation of the two groups which is calculated by the following equation:
$$s_{p} = \sqrt{\frac{(n_{1}-1)s_{1}^2 + (n_{2}-1)s_{2}^2}{n_{1}+ n_{2}-2}}$$ {#eq-sp}
Under the null hypothesis, the t statistic follows the t-distribution with $n - 2$ degrees of freedom (d.f.).
- The **95% confidence interval** of the mean difference is:
$$
95\% \ CI = \bar x_1 - \bar x_2 \pm t^{*}_{df;a/2} * SE_{\bar x_1 - \bar x_2}
$$ {#eq-ci}
::: {.callout-tip icon="false"}
## `r fa("circle-info", fill = "#1DC5CE")` INFO
- **Sample size of the groups:** The Student t-test does not have any restrictions on $n_1$ and $n_2$ ---they can be equal or unequal. However, equal samples are preferred because this maximizes the power to detect a specified difference.
- **Degrees of freedom:** The paroxetine group has $df_1 = n_1 - 1$ and the placebo group has $df_2 = n_2 - 1$ , so we have $df = n_1 + n_2 -2 = n -2$ in total. Another way of thinking of this is that the complete sample size is $n$, and we have estimated two parameters from the data (the two means), so we have $df = n-2$ (see also @sec-prob_distributions).
:::
### In R:
First, we calculate the **mean difference**:
```{r}
mean_1 <- mean(depression$HDRS[depression$intervention == "paroxetine"])
mean_2 <- mean(depression$HDRS[depression$intervention == "placebo"])
mean_dif <- mean_1 - mean_2
mean_dif
```
Second, we find the **pooled standard deviation**:
```{r}
n_1 <- length(depression$HDRS[depression$intervention == "paroxetine"])
n_2 <- length(depression$HDRS[depression$intervention == "placebo"])
st_div_1 <- sd(depression$HDRS[depression$intervention == "paroxetine"])
st_div_2 <- sd(depression$HDRS[depression$intervention == "placebo"])
numerator <- (n_1-1)*st_div_1^2 + (n_2-1)*st_div_2^2
denominator <- n_1 + n_2 - 2
pooled_st_div <- sqrt(numerator/denominator)
pooled_st_div
```
Third, we find the **standard error** of the mean difference:
```{r}
st_error <- pooled_st_div*sqrt(1/n_1 + 1/n_2)
st_error
```
Therefore, the **t statistic** is:
```{r}
t <- mean_dif / st_error
t
```
The corresponding probability for this value can be calculated as the cumulative probability P(T ≤ t) for n-2 degrees of freedom. Then, the **p-value** is 2\*P(T ≤ t) because we are doing a two-tailed test.
```{r}
P <- pt(t, df = n_1 + n_2 - 2)
p_value <- 2*P
p_value
```
The **95% confidence interval** of the mean difference is:
```{r}
lower_CI <- mean_dif - qt(0.025, df = 74, lower.tail = FALSE)*st_error
lower_CI
upper_CI <- mean_dif + qt(0.025, df = 74, lower.tail = FALSE)*st_error
upper_CI
```
Note that the 95% confidence interval of the mean difference (-2.78, 0.47) **includes** the hypothesized null value of 0.
Next, we present **R functions** to carry out all the tasks on our behalf.
::: {#exercise-joins .callout-important icon="false"}
## `r fa("r-project", fill = "#0008C1")` Student's t-test
::: panel-tabset
## stats
```{r}
t.test(HDRS ~ intervention, var.equal = T, data = depression)
```
## rstatix
```{r}
depression |>
t_test(HDRS ~ intervention, var.equal = T, detailed = T)
```
:::
**NOTE:** If we reject the null hypothesis of Levene's test, we have to type **`var.equal = F`** (or type nothing as this is the default), so the Welch's t-test is applied.
:::
The difference between means (20.33 - 21.49) equals -1.16 units and it is not significant (failed to reject $H_0$; p = 0.16 \> 0.05).
## Present the results
**Summary table**
It is common practice to report the mean (sd) for each group in summary tables.
```{r}
#| code-fold: true
#| code-summary: "Show the code"
depression |>
tbl_summary(
by = intervention,
statistic = HDRS ~ "{mean} ({sd})",
digits = list(everything() ~ 1),
label = list(HDRS ~ "HDRS score"),
missing = c("no")) |>
add_difference(test.args = all_tests("t.test") ~ list(var.equal = TRUE),
estimate_fun = HDRS ~ function(x) style_sigfig(x, digits = 2),
pvalue_fun = function(x) style_pvalue(x, digits = 2)) %>%
add_n()
```
**Report the results**
There is also a specific package with the name {report} that may be useful in reporting the results of the t-test:
```{r}
report_results <- t.test(depression$HDRS ~ depression$intervention, var.equal = T)
report(report_results)
```
We can use the above information to write up a final report:
::: {.callout-note icon="false"}
## `r fa("pen", fill = "#1DC5CE")` Final report
There is not evidence that HDRS score is significantly different in paroxetine group, mean = 20.3 (sd = 3.7), as compared to placebo group, 21.5 (3.4), (mean difference= -1.2 units, 95% CI = -2.8 to 0.47, p = 0.16).
:::