Step 1: Conducting and interpreting the results from the difference in means test

To do a difference in means test, you need a binary independent variable and a dependent variable that is measured at the ordinal, interval, or ratio level. Be sure that you have transformed any nominal-level variable with more than two categories into a binary variable. For the examples below, we will use those presented in Chapter 8, which do not limit the dataset to any particular research population. The first example assesses the relationship between biological sex (DP1) and attitudes about whether political violence is ever justifiable (DP62).

For the difference in means test in SPSS, from the top menu choose Analyze, then Compare Means, then Independent Samples T Test.

This will open the Independent Samples T Test window. From here, use the top blue arrow to place the dependent variable (DP62) into the “Test Variable” box. Then place the independent variable (DP1) in the “Grouping Variable” box as seen below:


Now you have to tell SPSS which two groups you wish to compare. To do this, click on Define Groups, which will open a new window:

Here you need to tell SPSS which two groups to compare by placing the values that represent the two categories in the white boxes labeled Group 1 and Group 2. In this case, 1 represents men and 2 represents women. (Note that if you transformed a nominal-level variable with more than two categories into a binary variable, you will need to remember how you assigned the two new values: whether they were 0 and 1 or 1 and 2. In the DatapracSPSS, all binary variables have 1 and 2 as their values, but if you recoded a variable into a new binary variable, you may have chosen 0 and 1, which is also common in data-based research.) Be sure to place the appropriate values for the binary variable you are working with into the Group 1 and Group 2 boxes.

Once you enter the two values, click Continue, and then OK. This will produce your results. See below:

The first table lists the “Group Statistics,” which include the number of observations in each category (1 – men and 2 – women) in addition to the mean (average) and standard deviation for each group. Here we see that the average for Justifiable: Political Violence is 3.41 for men and 2.90 for women. Since the variable runs from 1 (never justifiable) to 10 (always justifiable), the average for both men and women falls toward the lower values (i.e., toward ‘never justifiable’), but there is still a difference between the groups: the average for men is higher than it is for women. We can find the actual value of this difference under “Mean Difference” in the “Equal Variances not Assumed” row (the second row), which is .506. There is thus roughly a half-point difference between men and women on the justifiability of political violence, with women leaning more toward ‘never justifiable.’

But is this result significant? To find the test’s p value, we look to “Sig. 2 Tailed,” again in the “Equal Variances not Assumed” row. “Sig.” stands for significance; the p value tells us whether the result is statistically significant. Here we see that the p value for the test is .005. This means that if the true difference between men and women were actually zero, only 5 out of 1,000 tests like this, on samples similar to the one in the DatapracSPSS, would produce a result this large (a mean difference of .506) by chance. Since this p value is below the standard threshold of .05, we can reject the null hypothesis that the mean difference between the two groups (men and women) is actually zero. Rather, the test provides evidence that there is a significant difference in how the two groups think about the justifiability of political violence.
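For readers who want to check the logic of this test outside SPSS, the same Welch (“equal variances not assumed”) t test can be sketched in a few lines of Python using scipy. The scores below are invented for illustration; they are not the DatapracSPSS data, so the numbers will not match the output discussed above:

```python
# A minimal sketch of the Welch two-sample t test in Python.
# The two lists are hypothetical 1-10 justifiability scores, NOT DatapracSPSS data.
from scipy import stats

men = [3, 5, 2, 4, 6, 1, 4, 3, 5, 2]
women = [2, 3, 1, 4, 2, 3, 1, 2, 4, 3]

# The mean difference SPSS reports is simply group mean minus group mean.
mean_difference = sum(men) / len(men) - sum(women) / len(women)

# equal_var=False requests the unequal-variances (Welch) version of the test,
# i.e. the "Equal Variances not Assumed" row that the text reads from.
t_stat, p_value = stats.ttest_ind(men, women, equal_var=False)

print(f"Mean difference: {mean_difference:.3f}")
print(f"p value (Sig. 2-tailed): {p_value:.3f}")
```

As in SPSS, the decision rule is the same: if the printed p value falls below .05, reject the null hypothesis that the true mean difference is zero.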

Let’s work with the other example in the text: is there a difference between people in the lower and upper social classes in whether they believe that people receiving state aid for unemployment is an essential characteristic of democracy (DP65)? Before we can conduct this test, we need to transform the social status variable (DP9) in the DatapracSPSS into a binary variable with only two categories. (See Chapter 7 for this: 1 – working class, 2 – lower class, and 3 – lower middle class were combined into one category, while 4 – upper middle class and 5 – upper class were combined into a second category.) Now that it is a binary variable (lower classes vs. upper classes), we can use it in the difference in means test.
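As an aside, the logic of this recode is easy to sketch outside SPSS. In Python, with hypothetical DP9 responses (not the actual DatapracSPSS cases), collapsing categories 1–3 into 0 (lower classes) and 4–5 into 1 (upper classes) looks like this:

```python
# Hypothetical DP9 social-status responses, coded 1 (working class)
# through 5 (upper class) -- illustrative values, not DatapracSPSS data.
dp9 = [1, 4, 3, 5, 2, 4, 1, 3]

# Recode: categories 1-3 become 0 (lower classes), 4-5 become 1 (upper classes).
dp9_binary = [0 if value <= 3 else 1 for value in dp9]

print(dp9_binary)  # [0, 1, 0, 1, 0, 1, 0, 0]
```

The same idea underlies the Chapter 7 recode in SPSS: every original category is mapped onto one of the two new values, and those two values (here 0 and 1) are what you later enter as Group 1 and Group 2.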

From Analyze, select Compare Means, then Independent Samples T Test. Now put the new dependent variable (Essential characteristic of democracy: People receive state aid for unemployment – DP65) into the “Test Variable” box. (You can use the little blue arrows to move the variables around; if you have already performed a previous test, you may need to remove the variables you are no longer using.)  Next, place the new binary variable in the “Grouping Variable” box using the little blue arrow. (Remember that any new variable you may have created will be found at the bottom of the variables list.) Also, be sure that your groups are defined appropriately. Biological sex is measured in the DatapracSPSS as 1 (men) and 2 (women), but the social status binary that was used as an example in Chapter 7 was recoded as 0 (lower classes) and 1 (upper classes). Be sure you define Group 1 and Group 2 appropriately. For this test, we need to enter 0 and 1 as Group 1 and Group 2.

The test produces the following results:


Looking at these results, we see that the lower classes (number of observations = 614) have an average of 6.31 on opinions concerning whether people receiving state aid for unemployment is an essential characteristic of democracy, while the upper classes (number of observations = 417) have an average of 6.41. As seen in the table, the mean difference between the two groups is -.102. While this number is not zero, it is admittedly small. To determine whether this result could be due to random chance (when the true difference is really zero), we consult the p value (Sig. (2-tailed) in the Equal variances not assumed row), which in this case is .549. This means that if we could conduct this same test on samples like the one used in the DatapracSPSS, almost 55 out of 100 times we would get a result this large even if the true difference between the lower and upper classes were really zero. Since this p value is well above the threshold of .05, we cannot reject the null hypothesis, and the result is not statistically significant.

Step 2: Producing and interpreting a correlation coefficient

The correlation coefficient is used to assess the relationship between two variables with ordinal, interval, or ratio-level measurement. Here we will produce the correlation coefficient between age (DP2) and perceptions concerning corruption in the United States (DP56). To do this, from the top menu in SPSS click on Analyze, then Correlate, then Bivariate as seen below:

This will open a new “Bivariate Correlations” window. Place the two variables you wish to correlate into the “Variables” box using the little blue arrow. It does not matter which variable you include first (though you might consider getting in the habit of including the independent variable first).

Once the two variables are in the Variables box, simply click OK. This will produce the following table:

This table displays both the correlation coefficient and the test’s p value. The correlation coefficient is in the second box in the top row, +.218. The positive number means that as age increases, views on corruption also increase (moving toward the value of 10, which represents ‘there is abundant corruption in the United States’). On the surface, this appears to confirm the researcher’s hypothesis. But is this correlation coefficient significant? In other words, can we reject the null hypothesis that the true correlation between these variables in the population from which the DatapracSPSS was generated is actually zero? To answer this question, we must look at the “Sig. (2-tailed)” value (which is how the p value is displayed in SPSS). Here we see that the p value is .000, which means the probability of seeing a result this large by random chance, if the true population correlation were zero, is less than 1 in 1,000. (SPSS rounds very small p values to .000; the probability is never exactly zero.) This means we can reject the null hypothesis; the test suggests that there is likely a positive correlation between age and perceptions of corruption in the population from which the DatapracSPSS sample was generated. Finally, you need to know how your variables are measured in order to interpret these statistics properly; specifically, knowing that the perceptions of corruption variable runs from 1 to 10 is crucial to interpreting the positive correlation coefficient appropriately (especially since variables concerning age are usually straightforward, running from younger to older).
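The same bivariate (Pearson) correlation can be sketched outside SPSS as well. In the Python sketch below, the two lists are invented for illustration and are not the DatapracSPSS data, so the r and p values will differ from those in the table above:

```python
# A minimal sketch of a Pearson bivariate correlation in Python.
# Hypothetical ages and hypothetical 1-10 corruption perceptions -- NOT DatapracSPSS data.
from scipy import stats

age = [21, 35, 44, 52, 63, 29, 71, 48]
corruption = [3, 5, 6, 7, 8, 4, 9, 6]

# pearsonr returns the correlation coefficient r and the two-tailed p value,
# the same two numbers SPSS reports in its "Correlations" table.
r, p_value = stats.pearsonr(age, corruption)

print(f"r = {r:.3f}, p = {p_value:.3f}")
```

A positive r means higher ages go with higher corruption scores; the p value is read against the .05 threshold exactly as in the SPSS output.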

Let’s consider the next two examples concerning the correlation coefficient in the text. First, consider the relationship between age (DP2) and views on whether it is ever justifiable for someone to accept a bribe in the course of their duties (DP60). The researcher believes that older respondents will be more likely to believe that it is not justifiable to accept a bribe compared to younger respondents. Since the justifiability variables run from 1 (never justifiable) to 10 (always justifiable), the researcher is expecting a negative number for the correlation coefficient, which would imply an inverse correlation.

Following the same steps described above (changing the dependent variable by removing the previous one with the little blue arrow and inserting the new one with the same arrow), the researcher produces the following table:

Here we see the correlation coefficient is indeed negative, suggesting that as age increases, the values on whether it is justifiable for someone to accept a bribe in the course of their duties decrease, toward never justifiable. This is in line with the research hypothesis. But is this result significant? The p value (Sig. (2-tailed)) for the test is .000. Well below the threshold of .05, this p value suggests that the true correlation between these two variables in the population from which the DatapracSPSS was derived is not zero. Rather, the very low p value provides evidence that there is an inverse correlation between age and views on whether it is justifiable for someone to accept a bribe in the course of their duties, and specifically that as age increases, respondents find it less justifiable to accept a bribe. Again, knowing how the variables are measured is very important to interpreting these statistics properly.

Let’s go through the last example from the text, the correlation between age (DP2) and how important God is in people’s lives (DP72). The researcher believes that as age increases, respondents will be more likely to believe that God is very important. Since the importance of God in life variable runs from 1 (not at all important) to 10 (very important), he is expecting a positive correlation. The results from the test are below:

The correlation between age and how important God is in life is .047. While the number is positive (suggesting, on the surface, a positive correlation between the two variables), we see that this number is very close to 0. And indeed the p value for the test is .135, which is well above the threshold of .05. As a result, the null hypothesis that the true correlation between these variables is zero cannot be rejected; the results are consistent with there being no correlation between age and the importance of God in the population from which the DatapracSPSS was taken.
