Step 1: Acquire R and RStudio

Go to: r-project.org

Under Download at the top left of the page, click CRAN

From here, scroll down to the country in which you reside and choose one of the listed websites. Any of them will bring you to a page through which you can download R. For the purposes of this tutorial, the first website was selected, but all of these institutions should redirect you to “The Comprehensive R Archive Network” page.

Once you are on “The Comprehensive R Archive Network,” choose “Download R” for your appropriate computer platform. Read the first lines on the webpage you pull up to start the download. You should see language that directs you to a clickable link to begin the download.

For Windows, for example, click on “Download R for Windows,” then “Install R for the first time,” then “Download R 3.6.3 for Windows.” (R is updated often, so the version number you see may differ. At the time of writing, the current version was 3.6.3. Whichever version is listed, simply click on the line at the top that says “Download R.”)

For Mac computers, please note that you will need to know your operating system to choose the appropriate program from the “Download R for (Mac) OS X” screen. Choose the latest release for your Mac.

Once you click on the appropriate version of R, your computer will download the R installer (an .exe file on Windows, a .pkg file on Mac). You will need to open this file and follow the prompts to install the program on your computer.

After you have installed R, you will need to download RStudio, a user-friendly program for working with R. Note, however, that you cannot use RStudio without first installing R.

Go to: https://rstudio.com/products/rstudio/download/

Choose the first option on the left, RStudio Desktop (Open Source License), which is free, by clicking the blue Download button.

After you click Download, you will need to click the appropriate installer. For Windows users, this should be the top selection (Windows 10/8/7); for Mac users, this should be the second selection (macOS 10.13+). (Keep in mind that these version numbers may change as RStudio, Windows, and macOS are updated. Nonetheless, the Windows and Mac options are typically the first and second ones.) Once you click this link, you will download another installer file. (For Mac computers, you will also need to install the “git” command after the program loads; click “Install” when that message pops up. See the picture below.)

For Macs:

 

Once the installer downloads, open it (an .exe file on Windows, a .dmg file on Mac) and follow the prompts to install the program on your computer.

Once you have R and RStudio installed on your computer, open RStudio (from the Start menu on Windows). From this point on, all instructions will refer to RStudio.

Step 2: Install tidyverse

Once you have opened RStudio, you will need to install the tidyverse package, which will allow you to perform statistical manipulations in a straightforward way.

To install tidyverse, in the Console simply type the following and press enter:

install.packages("tidyverse")

Be sure to keep all the letters lowercase (R is case sensitive) and to use plain straight quotation marks around tidyverse. Also, all the commands you type from this point forward will be in the Console (which will move to the bottom left box).

 

 

(For Mac computers, you may see the message “Do you want to install from sources the package which needs compilation library?” Simply press enter when you see this message and tidyverse will install. See the picture below)

For Macs:

 

Next, type the following and press enter:

library(tidyverse)

Keep in mind that you will need to type library(tidyverse) every time you reopen RStudio.

You are now ready to use the tidyverse package in RStudio for data analysis.

Step 3: Download and open DatapracRStudio

Now that you have the statistical program open, you will need to download the DatapracRStudio dataset, which is included on the OUP compendium website for this book as an Excel file that RStudio can read. First, go to the OUP website and download and save the Excel file (called DatapracRStudio) to your computer. You should also download the Dataprac Codebook, which tells you what all the values in the dataset represent. The codebook is also an Excel file on the website.

Then, to open the DatapracRStudio in RStudio, find Import Dataset at the top of the top right box and select “From Excel…”.

Once you select From Excel…, you will see the Import Excel Data window. From there select Browse (on the top right) to find “DatapracRStudio.” It will be where you saved it on your computer.

Once you select the file, click Import at the bottom right of the window.

This will open the DatapracRStudio dataset in RStudio; you should now see the dataset in the top right window. You should also see that the dataset has 1,031 observations and 72 variables.
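The same import can also be typed in the Console instead of using the menus. The lines below are a minimal sketch, assuming the file was saved as DatapracRStudio.xlsx in the current working directory. The file name, extension, and location are assumptions, so change the path to match where you saved the file. The read_excel function comes from the readxl package, which is installed along with tidyverse but has to be loaded on its own.

```r
library(readxl)  # installed with tidyverse, but loaded separately

# Assumed file name and location; change this to wherever you saved the file
datafile <- "DatapracRStudio.xlsx"

if (file.exists(datafile)) {
  DatapracRStudio <- read_excel(datafile)
  print(dim(DatapracRStudio))  # number of observations, then number of variables
} else {
  message("File not found; check the path stored in 'datafile'.")
}
```

If the import worked, the two numbers printed should match the observation and variable counts shown in the top right box.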

 

Step 4: Transform your nominal-level variable with more than two categories into a binary variable

If you are using a nominal-level independent variable with more than two categories, you will need to transform it into a binary variable so it can be used in your analysis. We will use DP9 Social Status as the example. There are five categories associated with the variable: 1 working class, 2 lower class, 3 lower middle class, 4 upper middle class, 5 upper class. Let’s transform this into a binary variable by combining the first, second, and third categories (working class, lower class, and lower middle class) into one new category that we’ll call ‘lower classes’ and the fourth and fifth categories (upper middle class and upper class) into a second category that we’ll call ‘upper classes.’ Thus, we need to tell RStudio to combine the values of 1, 2, and 3 for a new category (0 = lower classes) and the values of 4 and 5 into a new category (1 = upper classes).

To do this, we need to create a new variable and then add it to the DatapracRStudio as a new column. Choose a name for the new variable (I suggest you use something like ‘Socialstatusbinary’ to make it clear that this is your new binary variable), and then type the following and press enter:

Socialstatusbinary<-recode(DatapracRStudio$DP9,'1'=0,'2'=0,'3'=0,'4'=1,'5'=1)

Let’s break this command apart so you can see what RStudio did to transform the social status variable into a binary variable. First, you gave the variable a new name (Socialstatusbinary). Then you used the <- symbol (< followed by -), which is R’s assignment operator: it stores the result of the command on its right under the name on its left. The command in this case is recode. You will always use parentheses after a command like recode. Inside the parentheses, DatapracRStudio$DP9 tells RStudio to use the DP9 (social status) variable from the DatapracRStudio dataset; the $ must come between the dataset name and the variable name, and the variable name must be written exactly as it appears in the DatapracRStudio. After the original variable name (DP9), place a comma and then enter the new codes that tell RStudio how Socialstatusbinary will be recoded. '1' is the original code in the DatapracRStudio (you need the Excel file codebook to know the codes), then =, then 0 (without apostrophes), which is the new code for any observation with a value of 1 in the original DatapracRStudio dataset. Then put in another comma, then '2', the second original code, then =, then 0 again, and so on (because we want to combine the original values of 1, 2, and 3 as the new 0 category and 4 and 5 as the new 1 category).

Once this command has been executed, you will see that a new variable has been created in the top right box. Below you can see that the values for the new Socialstatusbinary only include 0s and 1s (as expected).
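To see what recode does before relying on it for the full dataset, you can try it on a few made-up values. This is a small sketch: toy_status and toy_binary are hypothetical names used only for illustration and are not part of the DatapracRStudio.

```r
library(dplyr)  # provides recode; already loaded if you ran library(tidyverse)

# Made-up social status codes (1 to 5), for illustration only
toy_status <- c(1, 2, 3, 4, 5, 3, 4)

# Same pattern as the command above: 1, 2, 3 become 0 and 4, 5 become 1
toy_binary <- recode(toy_status, '1' = 0, '2' = 0, '3' = 0, '4' = 1, '5' = 1)

# Cross-tabulate old and new codes to confirm each value landed where intended
table(original = toy_status, recoded = toy_binary)
```

The same kind of check can be run on the real data by typing table(DatapracRStudio$DP9, Socialstatusbinary).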

Now you need to add this new binary variable to the dataset. To do this, type the following and then press enter:

DatapracRStudio<-cbind(DatapracRStudio,Socialstatusbinary)

This command will add the new binary variable to the DatapracRStudio dataset (cbind stands for column bind). Once this is finished, you should now see that your dataset has one more variable than before. Notice below that the DatapracRStudio now has 73 variables (instead of 72). The new variable is the binary variable for social status we created.
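As an alternative, the recode and cbind steps can be combined into one step with mutate, which is also part of tidyverse. The sketch below uses a small made-up data frame (toy_data, a hypothetical stand-in for the DatapracRStudio) so that it can be run on its own.

```r
library(dplyr)

# Hypothetical stand-in for DatapracRStudio containing only the DP9 column
toy_data <- data.frame(DP9 = c(1, 2, 3, 4, 5))

# mutate creates the new column and attaches it to the dataset in one step
toy_data <- mutate(
  toy_data,
  Socialstatusbinary = recode(DP9, '1' = 0, '2' = 0, '3' = 0, '4' = 1, '5' = 1)
)

ncol(toy_data)  # one more column than before
```

Either approach gives the same result; the two-step version used in this tutorial simply makes each action visible on its own line.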

You can perform this manipulation for any variable in the DatapracRStudio. You can transform any variable, whether nominal level or ordinal level, into a binary variable. You just need the original codes so you can tell RStudio how you want the new variable to be coded. (Again, all the codes in the DatapracRStudio are in the Excel file Codebook on OUP’s website.) It is common for researchers to transform scaled variables into binary variables. For example, you could transform any of the confidence variables into binary variables by combining 1 (a great deal) and 2 (quite a lot) into a ‘confident’ category and by combining 3 (not very much) and 4 (none at all) into a ‘not confident’ category. The key is to ensure that the new variable is a valid reflection of the concept you wish to convey in the variable. By keeping the values of 1 and 2 separate from 3 and 4, you are likely creating a new variable that is still valid because the categories ‘a great deal’ and ‘quite a lot’ still convey greater levels of confidence, while the categories ‘not very much’ and ‘none at all’ convey lower levels of confidence.

For example, to transform DP40 (Confidence in the Press) into a binary variable, type the following and then press enter:

Confidencepressbinary<-recode(DatapracRStudio$DP40,'1'=1,'2'=1,'3'=0,'4'=0)

Like before, now we need to add this new binary variable to the dataset. To do this, type the following and then press enter:

DatapracRStudio<-cbind(DatapracRStudio,Confidencepressbinary)

This will add the new binary variable to the DatapracRStudio dataset. Like before, once you enter the cbind command, you should now see that your dataset has one more variable than before.

Step 5: Limit the dataset to include only observations included in your research population

Before you can conduct your data analysis, you should limit the DatapracRStudio dataset to include only observations in your research population. Remember that the unit of analysis for this dataset is ‘individuals’ or ‘people,’ but the research population is defined by whatever characteristic unites all the observations you want to study. In the example below, we will limit the DatapracRStudio to include only women.

To do this, you will need to create a new dataset, with a new name, that includes only observations in the research population. For example, if you want to include only women, you could call the new dataset “DatapracRStudioWomen,” which would include only female respondents. To do this in RStudio, you will need the filter command. Type the following and then press enter:

DatapracRStudioWomen<-filter(DatapracRStudio,DP1==2)

This will create a new dataset from the DatapracRStudio that contains only observations for which DP1 (biological sex) equals 2, since filter is a function that keeps only the observations that meet a condition. (From the codebook we know every respondent with a value of 2 for DP1 identifies as a woman.) Furthermore, the == operator tests whether two values are equal; note that it is two equals signs, not one.

Once this is complete, you will see yet another dataset on the top right. Notice that this new dataset has 525 observations (because all the male respondents have been removed from DatapracRStudioWomen).

You can use the filter command to create a new dataset based on the values of any variable. For example, if you wanted to limit the dataset based on age (including only younger Americans, that is, respondents who are age 35 or younger), you would type:

DatapracRStudioYoungAmericans<-filter(DatapracRStudio,DP2<=35)

This tells RStudio to create a new dataset (called DatapracRStudioYoungAmericans) containing every observation from the DatapracRStudio whose value for DP2 is less than or equal to 35 (from the Codebook, you can see that DP2 is age). You should see that this new dataset is now listed in the top right box on the page.
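The two filter commands can be tried on a small made-up dataset to see how == and <= select rows. In this sketch, toy_data and its values are invented for illustration; only the variable names DP1 and DP2 and the code 2 for women come from the steps above.

```r
library(dplyr)  # provides filter; already loaded if you ran library(tidyverse)

# Invented values, for illustration only
toy_data <- data.frame(
  DP1 = c(1, 2, 2, 1, 2),      # 2 = woman, following the codebook
  DP2 = c(22, 35, 61, 40, 36)  # age in years
)

toy_women <- filter(toy_data, DP1 == 2)   # keeps rows where DP1 equals 2
toy_young <- filter(toy_data, DP2 <= 35)  # keeps rows where DP2 is 35 or less

nrow(toy_women)  # number of rows kept by the first condition
nrow(toy_young)  # number of rows kept by the second condition
```

Conditions can also be combined by separating them with a comma: filter(toy_data, DP1 == 2, DP2 <= 35) keeps only rows that satisfy both. If filter produces an unexpected error, check that library(tidyverse) has been run in the current session.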

 

Step 6: Create a frequency distribution for nominal level variables (including binary variables) or ordinal level variables with fewer than five categories

Now that you have your binary variable and have limited the dataset to include only cases that represent your research population, you can create a frequency distribution for your nominal-level variables or ordinal-level variables with fewer than five categories. Remember that a frequency distribution lists the percentages for each code contained in a variable, relative to the total number of observations in the dataset. For the example below, we generate a frequency distribution for the binary variable we created earlier, social status. Remember that the original codes of 1 (working class), 2 (lower class), and 3 (lower middle class) were combined into a single category (0 = lower classes) and that 4 (upper middle class) and 5 (upper class) were combined into a second category (1 = upper classes). We now want to know what percentage of respondents are in the 0 category and what percentage of respondents are in the 1 category.

In the examples below the DatapracRStudioYoungAmericans dataset was used. After you limit the dataset to include only people aged 35 or younger, type the following and then press enter.

count(DatapracRStudioYoungAmericans,Socialstatusbinary)%>%mutate(percentages=prop.table(n))

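In this command, count tallies the number of observations (n) in each category of the selected variable (Socialstatusbinary), and prop.table(n) converts those counts into proportions of the total (multiply by 100 to read them as percentages). The command will produce the following table: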

Here we see that 58.6% of respondents identify with the lower classes (remember that 0 represents the respondents who selected 1, 2, or 3 for the original social status variable in the Dataprac) while 41.4% of respondents identify with the upper classes (1 represents the respondents who selected 4 or 5 in the original social status variable).
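Because prop.table returns proportions between 0 and 1, you may prefer to have R display percentages directly. One option, sketched below, is to multiply by 100 and round. The data frame toy_data is hypothetical and its values are not taken from the Dataprac.

```r
library(dplyr)  # provides count, mutate, and %>%

# Invented values: three respondents coded 0 and two coded 1
toy_data <- data.frame(Socialstatusbinary = c(0, 0, 0, 1, 1))

toy_freq <- count(toy_data, Socialstatusbinary) %>%
  mutate(percentages = round(100 * prop.table(n), 1))

toy_freq  # one row per category, with its count (n) and percentage
```

Rounding to one decimal place matches how the percentages are reported in this step; change the second argument of round if you need more or fewer decimals.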


You would create a frequency distribution for any variable that is measured at the nominal level. Likewise, if your ordinal level independent variable has fewer than five categories, you would generate a frequency distribution for it for your descriptive statistics. For example, let’s consider DP20, Close to: The world. This variable is measured on an ordinal scale and has four categories associated with it (1 = very close, 2 = close, 3 = not very close, and 4 = not close at all). Since this variable is associated with only four possible responses, you should create a frequency distribution for it for your Descriptive Statistics.

To do this in RStudio, type the following and then press enter:

count(DatapracRStudioYoungAmericans,DP20)%>%mutate(percentages=prop.table(n))

This will produce the following table:

From this table, we see that 23% of respondents in the DatapracRStudioYoungAmericans answered this survey question with a 1 (very close), 31.6% with a 2 (close), 29.1% with a 3 (not very close), and 16.3% with a 4 (not close at all).

Step 7: Generate summary statistics for variables with ordinal measurement when the number of categories is high (greater than or equal to five) and for variables with interval or ratio measurement

For variables with ordinal measurement with a high number of categories (like your dependent variable, which is measured on an ordinal scale from 1 to 10), you will need to generate summary statistics that include the variable’s minimum value, maximum value, mean, and standard deviation. As an example, we will examine DP71 Democratic Satisfaction in the DatapracRStudioYoungAmericans dataset. To generate summary statistics in RStudio, first type the following and press enter (remember that the commands themselves, summary and sd, need to be in lowercase):

summary(DatapracRStudioYoungAmericans$DP71)

Here we see that the minimum value (min) for Democratic Satisfaction among younger Americans in the DatapracRStudioYoungAmericans is 1, the maximum value (max) is 10, and the average (mean) is 5.09.

Now we need the standard deviation for this variable. Type the following and then press enter:

sd(DatapracRStudioYoungAmericans$DP71)

This tells us that the standard deviation for Democratic Satisfaction among young Americans in the DatapracRStudioYoungAmericans is 2.73.
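If a variable contains missing values (NA), sd returns NA unless you tell it to skip them. Whether DP71 has missing values is not stated here, so treat the following as a precaution rather than a required step. The vector toy_scores is made up for illustration.

```r
# Invented scores with one missing value
toy_scores <- c(1, 5, 10, NA, 4)

summary(toy_scores)           # min, max, and mean (among others), plus a count of NAs
sd(toy_scores)                # NA, because one value is missing
sd(toy_scores, na.rm = TRUE)  # standard deviation of the non-missing values
```

On the real data, the equivalent would be sd(DatapracRStudioYoungAmericans$DP71, na.rm = TRUE).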

You can generate summary statistics for any variable with ordinal measurement with five or more categories in a dataset. (You can also do this for the only variable in the DatapracRStudio with ratio measurement: age, DP2.)

As a final example, let’s generate summary statistics for DP63, Justifiable: Death Penalty. The variable is measured on an ordinal scale from 1 (never justifiable) to 10 (always justifiable). To generate summary statistics for this variable in R, use the summary and sd commands:

summary(DatapracRStudioYoungAmericans$DP63)

sd(DatapracRStudioYoungAmericans$DP63)

 

From these results, we see that in the DatapracRStudioYoungAmericans dataset, DP63 (Justifiable: Death Penalty) has a minimum value of 1, a maximum value of 10, an average of 5.43, and a standard deviation of 2.84.
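If you would rather get all four statistics in a single table, the summarise command (part of tidyverse) can compute them together. This is a sketch on made-up data: toy_data and the column names minimum, maximum, average, and std_dev are hypothetical, and na.rm = TRUE is included only as a precaution against missing values.

```r
library(dplyr)  # provides summarise; already loaded if you ran library(tidyverse)

# Invented values on a 1 to 10 scale, for illustration only
toy_data <- data.frame(DP63 = c(1, 4, 7, 10, 3))

toy_stats <- summarise(
  toy_data,
  minimum = min(DP63, na.rm = TRUE),
  maximum = max(DP63, na.rm = TRUE),
  average = mean(DP63, na.rm = TRUE),
  std_dev = sd(DP63, na.rm = TRUE)
)

toy_stats  # one row holding the four summary statistics
```

To use this on the real data, replace toy_data with DatapracRStudioYoungAmericans.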

 
