02-getting_data_in_R.Rmd

# Getting your data into R

![](chappool.jpg)

```{block,type='rmdinfo'}
This chapter will tell you about how you can **import** your effect size data in RStudio. We will also show you a few commands to **manipulate data** directly in R.

```


## What should be in the file

To conduct Meta-Analyses in R, you need to have your study data prepared. In primary analysis, you might have a file with rows for people, and columns for variables. In meta-analysis, you will have a file with a row for each **effect size**. The columns often include the following information:

* The **names** of the individual studies, so that they can be easily identified later on. You can use the full reference, or just the first author and publication year of a study (e.g. "Ebert et al., 2018")
* The calculated **effect size** of the study (either Cohen's d or Hedges' g, or some other form of effect size
* The **Standard Error** (SE) or **sampling variance** of the calculated effect
* Often, you do not have the effect size and the variance of the effect size yet. In that case, you need the statistics required to compute it. This could be:
    + The **correlation** coefficient and **number of participants (N)**
    + The **Mean** of both the Intervention and the Control group at the same assessment point
    + The **Standard Deviation** of both the Intervention and the Control group at the same assessment point
    + The **number of participants (N)** in each group of the trial
* Any coded moderators. E.g., if you want to have a look at differences between various study subgroups later on, you also need a **subgroup code** for each study which signifies to which subgroup it belongs. For example, if a study was conducted in children, you might give it the subgroup code "children". Continuous moderators can be included in the same way; e.g., the proportion of men in the sample.

Working with R, the easiest way is often to store data in **EXCEL spreadsheets**, but R can read nearly any input format. Google is your friend.

One advantage of using an **R project** is that the project directory is automatically set as the working directory. Just copy your data file to the folder that contains the *".Rproj"* file, and you will be able to load files by name.


## Importing Excel Files

One way to get Excel files directly into R is by using the XLConnect package. For this example, you will need the *"happy.xlsx"* file. Check if the file is in your project directory. If not, you can download it [here](https://github.com/cjvanlissa/Doing-Meta-Analysis-in-R/blob/master/problem2.sav?raw=true).

Install the package, and try using the readWorksheetFromFile() function to load the data, and assign it to an object called `df`:

```{r, eval = FALSE}
# Run this only once, to download and install the package:
install.packages("XLConnect")
# Load the package:
library(XLConnect)
# Read the 'Happy to help' Excel file into 'df':
df <- readWorksheetFromFile("happy.xlsx",
                            sheet = 1)
```

```{r, echo=FALSE, results = "hide", message=FALSE}
library(XLConnect)
df <- readWorksheetFromFile("happy.xlsx",
                            sheet = 1)
```

### Inspect the data

R does not work with a single spreadsheet (SPSS or Excel). Instead, it can keep many objects in memory. The object `df` is a `data.frame`; an object that behaves similar to a spreadsheet. To see a description of the object, look at the *Environment* tab in the top right of Rstudio, and click the arrow next to `df`.

```{r, echo=FALSE}
library(png)
library(grid)
img <- readPNG("environment.PNG")
grid.raster(img)
```

As you can see, the on the top-right pane **Environment**, your file is now listed as a data set in your RStudio environment.

You can make a quick copy of this data set by assigning the `df` object to a new object. This way, you can edit one, and leave the other unchanged. Assign the object `df` to a new object called `happy`:


```{r}
happy <- df
```

You can also have a look at the contents of `df` by **clicking** the object in the Environment panel, or running the command `head(df)`:

**Here's a (shortened) table for the data**

```{r, echo=FALSE}
madata.s<-df
madata.s$donor=NULL
madata.s$`interventioniv`=NULL
madata.s$`control`=NULL
madata.s$outcomedv=NULL
madata.s$`effect_id`=NULL

kable(head(madata.s))
```
  
<br><br>

---


## Importing SPSS Files (optional)

SPSS files can be loaded using the `foreign` package. For this example, you can download an SPSS file [here](https://github.com/cjvanlissa/Doing-Meta-Analysis-in-R/blob/master/Problem2.sav?raw=true).


```{r, eval = FALSE}
# Install the package, run this only once
install.packages("foreign")

# Load the `foreign` library
library(foreign)

# Read the SPSS data
df_spss <- read.spss("problem2.sav",
                     to.data.frame = TRUE,
                     use.value.labels = FALSE)
```


## Data manipulation (optional)

Now that we have the Meta-Analysis data in RStudio, let's do a **few manipulations with the data**. These functions might come in handy when were conducting analyses later on.


Going back to the output of the `str()` function, we see that this also gives us details on the type of column data we have stored in our data. There a different abbreviations signifying different types of data.

```{r,echo=FALSE}
library(kableExtra)
Package<-c("num","chr","log","factor")
type<-c("Numerical","Character","Logical","Factor")
Description<-c("This is all data stored as numbers (e.g. 1.02)","This is all data stored as words","These are variables which are binary, meaning that they signify that a condition is either TRUE or FALSE","Factors are stored as numbers, with each number signifying a different level of a variable. A possible factor of a variable might be 1 = low, 2 = medium, 3 = high")
m<-data.frame(Package,type,Description)
names<-c("Abbreviation", "Type","Description")
colnames(m)<-names
kable(m)
```

### Converting to factors {#convertfactors}

Let's look at the different kinds of interventions, `df$interventioncode`. We can have a look at this variable by typing the name of our dataset, then adding the selector `$` and then adding the variable we want to have a look at.
This variable is currently a character vector (text). We want it to be a factor: That's a categorical variable.

To convert this to a **factor** variable now, we use the `factor()` function.

```{r, results = "hide"}
df$interventioncode <- factor(df$interventioncode)
df$interventioncode
```
We now see that the variable has been **converted to a factor with the levels "Acts of Kindness, "Other", and "Prosocial Spending"**.


### Selecting specific studies {#select}

It may often come in handy to **select certain studies for further analyses**, or to **exclude some studies in further analyses** (e.g., if they are outliers).

To do this, we can use the `filter` function in the `dplyr` package, which is part of the `tidyverse` package we installed before.

So, let's load the package first.

```{r, eval=FALSE,warning=FALSE}
library(dplyr)
```

Let's say we want to do a Meta-Analysis with studies conducted in the USA, or partly conducted in the USA, only. To do this, we need to create a new dataset containing only these studies using the `dplyr::filter()` function. The `dplyr::` part is necessary as there is more than one ``filter` function in R, and we want to use to use the one of the `dplyr`package.

The R code to store these studies in a new dataset called `df_usa` looks like this:

```{r}
df_usa <- dplyr::filter(df, location %in% c("USA", "USA/Korea"))
```

Note that the `%in%`-Command tells the `filter` function to search for cases whose `location` is included in the vector `c("USA", "USA/Korea")`.
Now, let's have a look at the new data `df_usa` we just created.

```{r,echo=FALSE}
kable(head(df_usa))
```


Note that the function can also be used for any other type of data and variable. We can also use it to e.g., only select studies where the donors were "typical":

```{r}
df_typical <- dplyr::filter(df, donorcode == "Typical")
```


### Changing cell values

Sometimes, even when preparing your data in EXCEL, you might want to **change values in RStudio once you have imported your data**. 

To do this, we have to select a cell in our data frame in RStudio. This can be done by adding `[x,y]` to our dataset name, where **x** signifies the number of the **row** we want to select, and **y** signifies the number of the **column**.

To see how this works, let's select a variable using this command first:

```{r}
df[8,1]
```

We now see the **6th study** in our dataframe, and the value of this study for **Column 1 (the author name)** is displayed. Let's say we had a typo in this name and want to have it changed. In this case, we have to give this exact cell a new value.

```{r}
df[8,1] <- "Aknin, et al. (2012)"
```

Let's check if the name has changed.

```{r}
df[8,1]
```

You can also use this function to change any other type of data, including numericals and logicals. Only for characters, you have to put the values you want to insert in `""`.

<br><br>

---