# Subgroup Analyses {#subgroup}
![](subgroup.jpg)
In [Chapter 6](#heterogeneity), we discussed in depth why **between-study heterogeneity** is such an important issue in interpreting the results of our meta-analysis, and how we can explore the role of **outliers** as potential sources of heterogeneity.
Another source of between-study heterogeneity that makes our effect size estimate less precise could be **known differences** between studies. For example, in a meta-analysis on the effects of **cognitive behavioral therapy** (CBT) for **depression** in **university students**, it could be the case that some studies delivered the intervention in a **group setting**, while others delivered the therapy to each student **individually**. In the same example, it is also possible that studies used different **criteria** to determine if a student suffers from **depression** (e.g., either the *ICD-10* or the *DSM-5* diagnostic manual).
Many other differences of this sort are possible, and such study differences could cause differences in the overall effect.
We can account for the influence of such factors using various forms of **meta-regression**: analyses that control for the influence of between-study differences.
So-called **subgroup analyses** are similar to an ANOVA, in that they are regression analyses with only one categorical predictor. We can use them to look at different **subgroups within the studies of our meta-analysis** and try to determine the extent of the **difference between these subgroups**.
```{block,type='rmdinfo'}
**The idea behind subgroup analyses**
Basically, every subgroup analysis consists of **two parts**: (1) **pooling the effect of each subgroup**, and (2) **comparing the effects of the subgroups** [@borenstein2013meta].
**1. Pooling the effect of each subgroup**
This step is rather straightforward, as the same criteria as for a **simple meta-analysis without subgroups** (see [Chapter 4](#pool) and [Chapter 4.2](#random)) apply here.
* If you assume that **all studies in a subgroup** stem from the same population, and all have **one shared true effect**, you may use the **fixed-effect-model**. As we mention in [Chapter 4](#pool), many **doubt** that this assumption is ever **true in psychological** and **medical research**, even when we partition our studies into subgroups.
* The alternative, therefore, is to use a **random-effect-model**, which assumes that the studies within a subgroup are drawn from a **universe** of populations following its own distribution, for which we want to estimate the **mean**.
**2. Comparing the effects of the subgroups**
After we have calculated the pooled effect for each subgroup, **we can compare the sizes of the subgroup effects**. However, to know if this difference is in fact significant and/or meaningful, we have to estimate the **standard error of the difference between subgroup effect sizes**, $SE_{diff}$, from which we can calculate **confidence intervals** and conduct **significance tests**.
There are **two ways to calculate** $SE_{diff}$, each based on a different set of assumptions.
* **Fixed-effects (plural) model**: The fixed-effects-model for subgroup comparisons is appropriate when **we are only interested in the subgroups at hand** [@borenstein2013meta]. This is the case when **the subgroups we chose to examine** were not randomly "chosen", but represent fixed levels of a characteristic we want to examine. Gender is such a characteristic, as its two subgroups **female** and **male** were not randomly chosen, but are the two subgroups that gender (in its classical conception) has. The same applies, for example, if we were to examine whether studies in patients with **clinical depression** versus **subclinical depression** yield different effects. Borenstein and Higgins [@borenstein2013meta] argue that the **fixed-effects (plural) model** may be the **only plausible model** for most analyses in **medical research, prevention, and other fields**.
As this model assumes that **no further sampling error is introduced at the subgroup level** (because subgroups were not randomly sampled, but are fixed), the variance of the difference $V_{Diff}$ only depends on the *variances within the subgroups* $A$ and $B$, $V_A$ and $V_B$:
$$V_{Diff}=V_A + V_B$$
The standard error of the difference is then $SE_{diff}=\sqrt{V_{Diff}}$.
The fixed-effects (plural) model can be used to test differences in the pooled effects between subgroups, while the pooling **within the subgroups is still conducted using a random-effects-model**. Such a combination is sometimes called a **mixed-effects-model**. We'll show you how to use this model in R in the [next chapter](#mixed).
* **Random-effects-model**: The random-effects-model for between-subgroup-effects is appropriate when the **subgroups we use were randomly sampled from a population of subgroups**. An example would be if we were interested in whether the effect of an intervention **varies by region**, and examined this by looking at studies from 5 different countries (e.g., Netherlands, USA, Australia, China, Argentina). The variable "region" has many potential subgroups (countries); randomly selecting five of them introduces a **new sampling error**, which we have to control for by using the **random-effects-model** for between-subgroup-comparisons.
The (simplified) formula for the estimation of $V_{Diff}$ using this model therefore looks like this:
$$V_{Diff}=V_A + V_B + \frac{\hat T^2_G}{m} $$
Where $\hat T^2_G$ is the **estimated variance between the subgroups**, and $m$ is the **number of subgroups**.
```
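To make these formulas concrete, here is a minimal sketch in R that computes $V_{Diff}$, $SE_{diff}$, and a Wald-type z-test for the subgroup difference under both models. All numbers are made up purely for illustration:
```{r}
# Hypothetical pooled results for two subgroups, A and B
g_A <- 0.45; V_A <- 0.012  # pooled effect and its variance, subgroup A
g_B <- 0.20; V_B <- 0.015  # pooled effect and its variance, subgroup B

# Fixed-effects (plural) model: V_Diff = V_A + V_B
V_diff_fixed <- V_A + V_B

# Random-effects model: add the between-subgroup variance term T^2_G / m
T2_G <- 0.008              # hypothetical estimate of the between-subgroup variance
m <- 2                     # number of subgroups
V_diff_random <- V_A + V_B + T2_G / m

# SE_diff is the square root of V_Diff; z-test for the difference
SE_diff <- sqrt(V_diff_fixed)
z <- (g_A - g_B) / SE_diff
p <- 2 * pnorm(abs(z), lower.tail = FALSE)
round(c(SE_diff = SE_diff, z = z, p = p), 3)
```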
```{block,type='rmdachtung'}
Be aware that subgroup analyses should **always be based on an informed, *a priori* decision** about which subgroup differences between the studies might be **practically relevant**, and would lead to information gain on relevant **research questions** in your field of research. It is also **good practice** to specify your subgroup analyses **before you do the analysis**, and list them in **the registration of your analysis**.
It is also important to keep in mind that **the capability of subgroup analyses to detect meaningful differences between studies is often limited**. Subgroup analyses also need **sufficient power**, so it makes no sense to compare two or more subgroups when the total number of studies in your meta-analysis is smaller than $k=10$ [@higgins2004controlling].
```
<br><br>
---
## Mixed-Effects-Model {#mixed}
```{r,echo=FALSE, message=FALSE}
# metaforest provides the curry example data and the coef_test() convenience
# function; metafor provides the rma() function
library(metafor)
library(metaforest)
df <- curry
```
To conduct subgroup analyses using the **Mixed-Effects-Model** (random-effects-model within subgroups, fixed-effects-model between subgroups), you can simply include your grouping variable as a categorical predictor in the `rma` function. Like a classic t-test, ANOVA, or regression model, this approach assumes homoscedasticity: the residual heterogeneity is assumed to be the same across groups.
We can use subgroup analysis to examine whether there are significant differences in the pooled effect between studies conducted *within* the USA, and studies conducted elsewhere. To do so, we first make a new variable, `Country`, that codes for studies conducted (partly) within the USA. Then, we specify the mixed-effects model with `~Country` as a moderator. We drop the intercept by specifying `-1`, so the model estimates the mean effect size for both groups (as in ANOVA):
```{r}
# Create a factor (categorical) variable for location
df$Country <- factor(df$location %in% c("USA", "USA/Korea"),
                     labels = c("Elsewhere", "USA"))
# Drop the intercept (- 1) to estimate one pooled effect per group
m_subgroup <- rma(yi = d, vi = vi, mods = ~ Country - 1, data = df)
m_subgroup
```
We see that the **pooled effects of the subgroups differ quite substantially**: $g = .3132$ for studies conducted in the USA, and $g = .1582$ for studies conducted elsewhere. But is the difference statistically significant? There are two ways we can find out.
### Regression specification
We can test for the significance of the difference between groups by re-specifying the model using the regression specification: with an intercept, and an effect for the dummy variable `Country`, which represents the difference between the two groups:
```{r}
# Re-specify the model with an intercept and dummy
m_dummy <- rma(yi = d, vi = vi, mods = ~ Country, data = df)
m_dummy
```
The results indicate that the difference between the pooled effect sizes of studies conducted in the USA and elsewhere is $\Delta g = 0.16, z = 1.95, p = .052$: not significant.
### T-test on the coefficients
Another way to test the significance of the difference is by manually conducting a post-hoc t-test on the two means from the model with the ANOVA specification. The `metaforest` package contains a convenience function, `coef_test`, that conducts t-tests or z-tests on the parameters of `rma` meta-analyses. We can use it to conduct the t-test:
```{r}
coef_test(m_subgroup, "CountryUSA", "CountryElsewhere")
```
The p-value is a bit higher, but otherwise the result is the same as the effect of the dummy variable in `m_dummy`. The difference in p-values is because we calculated a t-test here, whereas `metafor` uses z-tests by default.
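As an aside, `metafor` can also use t-tests for the model coefficients directly via the `test` argument of `rma`. This is optional, and shown here only as a sketch for the dummy-coded model from above:
```{r, eval=FALSE}
# Optional: request t-tests (with k - p degrees of freedom) instead of z-tests
m_dummy_t <- rma(yi = d, vi = vi, mods = ~ Country, data = df, test = "t")
m_dummy_t
```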
### Free variance per group
A logical question might be: is it a reasonable assumption that the variance for studies conducted within the USA is equal to that of studies conducted elsewhere? After all, studies conducted elsewhere could come from all over the world, so they might be more heterogeneous. It is possible to estimate a separate variance for each subgroup, but this is quite advanced. There is an excellent tutorial on it online, if you need to do this in your future work:
http://www.metafor-project.org/doku.php/tips:comp_two_independent_estimates
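For orientation, here is a minimal sketch of the basic idea from that tutorial, applied to our example: fit a separate random-effects model per subgroup (so each gets its own $\tau^2$ estimate), then compare the two pooled effects with a Wald-type z-test. See the linked tutorial for the details and caveats:
```{r, eval=FALSE}
# Separate random-effects models per subgroup: each has its own tau^2
m_usa  <- rma(yi = d, vi = vi, data = df, subset = Country == "USA")
m_else <- rma(yi = d, vi = vi, data = df, subset = Country == "Elsewhere")

# Wald-type z-test on the difference between the two pooled effects
diff_g  <- as.numeric(m_usa$b) - as.numeric(m_else$b)
se_diff <- sqrt(m_usa$se^2 + m_else$se^2)
z <- diff_g / se_diff
round(c(diff = diff_g, z = z, p = 2 * pnorm(abs(z), lower.tail = FALSE)), 4)
```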
<br><br>
---