Step 1: Acquire Stata

You will first need to acquire Stata. Your institution may have a way for you to download Stata directly onto your computer, have a virtual desktop through which you can access Stata from a server on your campus, or include Stata on public computers at your institution. You can also purchase temporary access to Stata (Stata/IC) through Stata’s online student discount store at https://www.stata.com/order/new/edu/gradplans/student-pricing/.

Step 2: Download and open Dataprac

The Dataprac data is currently included on the OUP compendium website for this book as a .dta file that Stata can read (it is called DatapracStata). Once you have Stata on your computer, you will be able to open the dataset. First, go to the OUP website, then download and save the DatapracStata dataset to your computer. Then, once Stata is open on your computer, choose File from the top menu, then Open, and then select the DatapracStata dataset (it will be where you saved it on your computer).

This will open the DatapracStata dataset in Stata. To see the data, from the top menu choose Data, then Data Editor (edit).
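The same steps can also be done from the Command box instead of the menus. The lines below are a minimal sketch, not the only way to do it. They assume that the file was saved under the name DatapracStata in Stata's current working directory; if you saved it elsewhere, put the full path inside the quotation marks.

```stata
* Assumption: DatapracStata.dta is in the current working directory.
* Show which folder Stata is currently working in.
pwd
* Load the dataset, replacing any data already in memory.
use "DatapracStata", clear
* Report the number of observations and variables.
describe, short
* Open the data in a read-only viewer.
browse
```

The browse command opens the data in read-only mode, which avoids changing a value by accident while you are only looking at the data.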

 

This will display the DatapracStata dataset, as seen below.

 

Step 3: Transform your nominal-level variable with more than two categories into a binary variable

If you are using a nominal-level independent variable with more than two categories, you will need to transform it into a binary variable so it can be used in your analysis. We will use DP9 Social Status as the example. There are five categories associated with the variable: 1 working class, 2 lower class, 3 lower middle class, 4 upper middle class, 5 upper class. Let’s transform this into a binary variable by combining the first, second, and third categories (working class, lower class, and lower middle class) into one new category that we’ll call ‘lower classes’ and the fourth and fifth categories (upper middle class and upper class) into a second category that we’ll call ‘upper classes.’ Thus, we need to tell Stata to combine the values of 1, 2, and 3 for a new category (0 = lower classes) and the values of 4 and 5 into a new category (1 = upper classes).

To do this in Stata, you will type commands in the Command box, which is located at the bottom left of the Stata screen. (For now, you can minimize the Data Editor screen.) Type the following and then press return:

generate socialstatusbinary = DP9

Then type the following and press return:

recode socialstatusbinary (1=0)(2=0)(3=0)(4=1)(5=1)

Now, if you return to the dataset in the Data Editor (it will be one of your open Stata tabs, or you can reopen it from the top menu by clicking Data and then Data Editor), you will be able to find your new variable at the bottom of the variable list. In the picture below, see that socialstatusbinary is now listed as the last variable after DP72, both in the data itself (scroll all the way to the right) and in the Variables list.
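Besides looking in the Data Editor, you can check a recode by cross-tabulating the original variable against the new one. The sketch below uses a small invented dataset (the DP9 values are made up for illustration) so that it can be run on its own from a do-file. With the real data you would skip the clear and input lines.

```stata
* Toy data: these DP9 values are invented for illustration only.
clear
input DP9
1
2
3
4
5
3
4
end

* The same two commands as in the text.
generate socialstatusbinary = DP9
recode socialstatusbinary (1=0)(2=0)(3=0)(4=1)(5=1)

* Old codes against new codes: 1-3 should all fall under 0, and 4-5 under 1.
tabulate DP9 socialstatusbinary

* assert stops with an error if any observation breaks the stated rule.
assert socialstatusbinary == 0 if inlist(DP9, 1, 2, 3)
assert socialstatusbinary == 1 if inlist(DP9, 4, 5)
```

If both assert lines run without an error message, every observation was recoded as intended.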

You can perform this manipulation for any variable in the Dataprac dataset. You can transform any variable, whether nominal level or ordinal level, into a binary variable. You just need the original codes so you can tell Stata how to build the new variable.

It is common, for example, for researchers to transform scaled variables into binary variables. You could transform any of the confidence variables into binary variables by combining 1 (a great deal) and 2 (quite a lot) into a ‘confident’ category and by combining 3 (not very much) and 4 (none at all) into a ‘not confident’ category. The key is to ensure that the new variable is a valid reflection of the concept you wish to convey. By keeping the values of 1 and 2 separate from 3 and 4, you are likely creating a new variable that is still valid, because the categories ‘a great deal’ and ‘quite a lot’ still convey greater levels of confidence, while the categories ‘not very much’ and ‘none at all’ convey lower levels of confidence.

If you wanted to recode Confidence in the press (DP40), for example, you would type the following two commands, pressing return after each. This will create a new binary variable for confidence in the press. Notice, too, that in the commands below the codes for ‘a great deal’ and ‘quite a lot’ are combined into the 1 category (representing confidence), while ‘not very much’ and ‘none at all’ are combined into the 0 category (representing no confidence).

generate confidenceinthepressbinary = DP40

recode confidenceinthepressbinary (1=1)(2=1)(3=0)(4=0)
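As an alternative to the two-step approach, recode can create and label the new variable in a single command through its generate() option. The sketch below is only an illustration: the DP40 values are invented, and pressconf2 is a hypothetical variable name chosen so that it does not clash with the variable created above.

```stata
* Toy data: these DP40 values are invented for illustration only.
clear
input DP40
1
2
3
4
2
end

* One step: DP40 is left untouched and the result goes into a new variable.
* The quoted text attaches a value label to each new code.
* pressconf2 is a hypothetical name used only in this example.
recode DP40 (1 2 = 1 "confident") (3 4 = 0 "not confident"), generate(pressconf2)

* Check the mapping of old codes to new codes.
tabulate DP40 pressconf2
```

Because the new categories carry value labels, later tables show ‘confident’ and ‘not confident’ rather than bare 1s and 0s.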

Step 4: Limit the dataset to include only observations included in your research population

Before you conduct your data analysis, you should limit the DatapracStata dataset to the observations that belong to your research population. Remember that the unit of analysis in the DatapracStata is ‘individuals’ or ‘people’, but the research population is defined by whatever characteristic unites all of the observations you want to study. In the example below, we will first limit the DatapracStata dataset to respondents who identify as women on DP1 (biological sex).

First, type the following and press enter:

preserve

This command saves a temporary copy of the full dataset, so you can later return to it with the restore command and work with a different research population.

Now type:

keep if DP1==2

This will limit the DatapracStata to include only respondents who are women.

To return the dataset to include all observations, simply type:

restore
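To see what preserve, keep, and restore do to the number of observations, you can run count at each stage. The sketch below uses a small invented dataset (five made-up respondents) so that it can be run on its own from a do-file.

```stata
* Toy data: five invented respondents (DP1 = sex, DP2 = age).
clear
input DP1 DP2
1 22
2 30
2 41
1 57
2 19
end

* Full dataset: count reports 5 observations.
count

* Set a temporary copy of the full dataset aside.
preserve

* Keep only respondents coded 2 on DP1; all other rows are dropped.
keep if DP1 == 2

* Restricted dataset: count reports 3 observations.
count

* Bring the full dataset back.
restore

* Full dataset again: count reports 5 observations.
count
```

Note that Stata holds only one preserved copy at a time, so you need to type restore before you can type preserve again.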

Now, if you wanted to limit the DatapracStata to include only respondents who are between the ages of 18 and 35 (the millennial generation), first type preserve (so you can return to the full dataset through the restore command) and then keep if DP2<=35:

preserve

keep if DP2<=35

This command tells Stata to keep only the observations (respondents) for whom DP2 (age) is less than or equal to 35, and to delete every observation for whom DP2 is greater than 35.
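It can help to check how many observations a condition will retain before you apply it. In Stata, keep if retains the observations that satisfy the condition and drops the rest. The sketch below uses invented ages, including one missing value, to show this. Stata treats a missing number as larger than any real number, so a respondent with no recorded age does not satisfy DP2<=35 and is dropped as well.

```stata
* Toy data: invented ages, with one missing value (.).
clear
input DP2
18
25
35
36
60
.
end

* How many respondents satisfy the condition? Reports 3 (ages 18, 25, 35).
count if DP2 <= 35

* The same range written with an explicit lower bound; also reports 3.
count if inrange(DP2, 18, 35)

* Apply the restriction and confirm the remaining ages run from 18 to 35.
preserve
keep if DP2 <= 35
summarize DP2
restore
```

The inrange() version makes the lower bound of 18 explicit. It gives the same result as DP2<=35 here only because no invented respondent is younger than 18.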

Step 5: Create a frequency distribution for nominal-level variables (including binary variables) or ordinal-level variables with fewer than five categories

Now that you have your binary variable and have limited the dataset to the cases that represent your research population, you can create a frequency distribution for your nominal-level variables or for ordinal-level variables with fewer than five categories. Remember that a frequency distribution lists the percentage for each code contained in a variable, relative to the total number of observations in the dataset. What we want to know is what percentage of people fall under each code. For the example below, we will return to the binary variable we created earlier, social status. Remember that the original codes of 1 (working class), 2 (lower class), and 3 (lower middle class) were combined into a single category (0 = lower classes) and that 4 (upper middle class) and 5 (upper class) were combined into a second category (1 = upper classes). We now want to know what percentage of respondents are in the 0 category and what percentage are in the 1 category. Keep in mind that the research population in the examples below is younger Americans (aged 35 or less).

To do this in Stata, type the following and press enter:

tab socialstatusbinary

This will generate the following table.

 

The table is a frequency distribution for the social status binary variable. Here we see that 58.6% of the respondents placed themselves in the lower classes category (original codes 1, 2, or 3) and that 41.4% of the respondents placed themselves in the upper classes category (original codes 4 and 5). Also, notice that the number of valid observations is 374. This number is lower than the original total of 1,031 in the DatapracStata because we limited the dataset to include only people aged 18 to 35.
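The percentages in a frequency distribution are simply each category's count divided by the number of valid (non-missing) observations. The sketch below reproduces that arithmetic on a small invented dataset. The values are made up and n_valid is a hypothetical scalar name, so the numbers will not match the table above.

```stata
* Toy data: five invented values of the binary variable.
clear
input socialstatusbinary
0
0
0
1
1
end

* Frequency distribution: 3 of 5 (60%) are coded 0 and 2 of 5 (40%) are coded 1.
tabulate socialstatusbinary

* tabulate stores the number of observations used and the number of categories.
display r(N)
display r(r)

* The same percentage by hand: count of 0s divided by valid observations.
* n_valid is a hypothetical scalar name used only in this example.
quietly count if !missing(socialstatusbinary)
scalar n_valid = r(N)
quietly count if socialstatusbinary == 0
display 100 * r(N) / n_valid
```

By default tabulate leaves observations with missing values out of the table. Adding the missing option (tabulate socialstatusbinary, missing) lists them as a category of their own.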

You would create a frequency distribution for any variable that is measured at the nominal level. Likewise, if your ordinal-level independent variable has fewer than five categories, you would still generate a frequency distribution for it as part of your descriptive statistics. For example, let’s consider DP18, Close to: The town or city where you live. This variable is measured on an ordinal scale and has four categories (1 = very close, 2 = close, 3 = not very close, and 4 = not close at all). Since this variable has only four possible responses, you should create a frequency distribution for it. Follow the same steps as before: type the following and press enter:

tab DP18

From this table we see that 34.2% of respondents answered with a 1 (very close), 35.3% with a 2 (close), 19.5% with a 3 (not very close), and 11.0% with a 4 (not close at all). Also, note that the cumulative percentage is very helpful here; clearly most people (69.5%) feel either very close or close to the town or city where they live.

Step 6: Generate summary statistics for variables with ordinal measurement when the number of categories is high (greater than or equal to five) and for variables with interval or ratio measurement

For variables with ordinal measurement and a high number of categories (like your dependent variable, which is measured on an ordinal scale from 1 to 10), you will need to generate summary statistics that include the variable’s minimum value, maximum value, mean, and standard deviation. As an example, we will examine DP71 Democratic Satisfaction. Again, we concentrate on younger Americans as the research population. To generate summary statistics in Stata, type the following and press enter:

summarize DP71

This will produce a very simple table with the information you will need for your summary statistics. Notice that the minimum value is 1, the maximum value is 10, the mean (average) is 5.09 and the standard deviation is 2.727.

You should generate summary statistics for any ordinal-level variable for which the number of categories is greater than or equal to five. As a final example, let’s generate summary statistics for DP63, Justifiable: Death Penalty. The variable is measured on an ordinal scale from 1 (never justifiable) to 10 (always justifiable). To generate summary statistics for this variable in Stata, type the following and press enter:

summarize DP63

Here we see that the minimum value is 1, the maximum value is 10, the mean is 5.43 and the standard deviation is 2.84.
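If you need the same statistics for several variables, it can be convenient to put them in one table. The sketch below uses four invented respondents, so the numbers will not match the results reported above. It first shows where summarize stores its results, and then uses tabstat to lay out both variables in a single table.

```stata
* Toy data: four invented respondents.
clear
input DP71 DP63
1 10
5 5
10 1
4 6
end

* summarize stores its results, which you can display one at a time.
summarize DP71
display r(min)
display r(max)
display r(mean)
display r(sd)

* One table, one row per variable: minimum, maximum, mean,
* standard deviation, and number of observations.
tabstat DP71 DP63, statistics(min max mean sd n) columns(statistics)
```

The columns(statistics) option places the statistics across the top and the variables down the side, which is close to the layout of a typical descriptive statistics table.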

 
