Step 1: Conducting and interpreting the results from the difference in means test

To do a difference in means test, you need a binary independent variable and a dependent variable that is measured at the ordinal, interval, or ratio level.

After you open RStudio, be sure to type library(tidyverse) to tell RStudio that you wish to use the tidyverse package.

Also, be sure that you have transformed any nominal-level variable with more than two categories into a binary variable (see Chapter 7’s tutorial). The examples below are those presented in Chapter 8. The first assesses the relationship between biological sex and attitudes about whether political violence is ever justifiable.

In the following example, no specific research population was selected (thus the DatapracRStudio was used as the dataset). DP1 is biological sex (where 1 represents men and 2 represents women) and DP62 is the justifiability of political violence (where 1 represents never justifiable and 10 represents always justifiable).
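Before running the test, it can help to confirm how the two variables are coded. The following is a minimal sketch, assuming a small made-up data frame called toy that stands in for DatapracRStudio (the name toy and all of its values are invented for illustration); with the real dataset you would substitute DatapracRStudio for toy.

```r
# Toy stand-in for DatapracRStudio; the values below are invented for illustration only.
toy <- data.frame(
  DP1  = c(1, 2, 1, 2, 2, 1, 2, 1),   # 1 = men, 2 = women
  DP62 = c(3, 1, 5, 2, 1, 4, 2, 6)    # 1 = never justifiable ... 10 = always justifiable
)

# Frequency table: a binary variable should show exactly two categories.
sex_counts <- table(toy$DP1)
print(sex_counts)

# Range of the dependent variable: values should fall between 1 and 10.
dp62_range <- range(toy$DP62, na.rm = TRUE)
print(dp62_range)
```

If the table shows more than two categories, the variable needs to be recoded before it can be used as the independent variable in a difference in means test.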

To run the difference in means test in RStudio, type the following and press enter:

t.test(filter(DatapracRStudio,DP1==1)$DP62,filter(DatapracRStudio,DP1==2)$DP62)

t.test tells RStudio to perform a difference in means test. Each filter call tells RStudio to keep only the respondents who meet a condition: first those for whom DP1 is equal to 1 (men), and then those for whom DP1 is equal to 2 (women). The $DP62 after each filter call selects the justifiability of political violence variable for that group, and t.test then calculates and compares the two group averages. This command produces the following information:

This information contains the average for each of the two groups (men and women) and the p value for the effect. First, look at the very bottom row, where ‘mean of x’ and ‘mean of y’ are located. The x is the first group listed in the t.test (1 – men) and the y is the second group listed in the t.test (2 – women). The average for DP62 (justifiability of political violence) on a scale from 1 to 10 is 3.41 for men and 2.90 for women, so women have a lower justifiability average than men. If you subtract the two averages (3.41 - 2.90), you can determine that the mean difference between men and women is .51, which is about half a point on the scale from 1 to 10.

But is the mean difference of .51 between men and women in the DatapracRStudio significant? To answer this question, we need the p value for the difference of means test, which is provided in the second line of the results in the RStudio output. The third item on the second line is the p value, which in this case is .00491. This means that if the true mean difference between men and women were zero, we would see a result at least this large (a mean difference of .51) by chance in only about 5 out of 1,000 samples similar to the one in the DatapracRStudio dataset. As a result, since this p value is below the standard of .05, we can reject the null hypothesis that the mean difference between the two groups (men and women) is actually zero. Rather, the test provides evidence that there is a significant difference between the two groups in how they think about the justifiability of political violence.
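The group averages, the mean difference, and the p value can also be pulled out of the test result directly instead of being read off the printed output. The following is a minimal sketch, assuming a made-up data frame called toy in place of DatapracRStudio (its values are invented, so the numbers it prints will not match those discussed above). It uses base R subsetting rather than filter so that it runs without loading any package.

```r
# Toy stand-in for DatapracRStudio; values are invented for illustration only.
toy <- data.frame(
  DP1  = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
  DP62 = c(4, 3, 5, 2, 6, 2, 3, 1, 4, 2)
)

# Base R subsetting selects the same rows that
# filter(toy, DP1 == 1)$DP62 and filter(toy, DP1 == 2)$DP62 would.
men   <- toy$DP62[toy$DP1 == 1]
women <- toy$DP62[toy$DP1 == 2]

result <- t.test(men, women)   # Welch two-sample t-test (R's default)

group_means <- unname(result$estimate)          # mean of x, mean of y
mean_diff   <- group_means[1] - group_means[2]  # x minus y
p_value     <- result$p.value

print(group_means)
print(mean_diff)
print(p_value < 0.05)   # TRUE means the null hypothesis can be rejected at the .05 level
```

Storing the result in an object (here named result) is a convenience; typing result on its own line prints the same output that t.test displays when it is run directly.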

Let’s work with the other example in the text: is there a difference between people in lower and upper social classes in whether they believe that people receiving state aid for unemployment is an essential characteristic of democracy (DP65)? Before we can conduct this test, we need to transform the social status variable in the DatapracRStudio into a binary variable with only two categories. (See Chapter 7 for this: 1 – working class, 2 – lower class, and 3 – lower middle class were combined into one category (0), while 4 – upper middle class and 5 – upper class were combined into a second category (1).) Note that this binary variable now utilizes the values of 0 and 1, unlike biological sex in the DatapracRStudio, which utilized the values of 1 and 2; this is a good lesson in always being sure how variables are measured when performing these tests. Now that social status is a binary variable (lower classes vs. upper classes), we can use Socialstatusbinary in the difference of means test. Because R is case-sensitive, the variable name must be typed exactly as it appears in the dataset. To conduct the difference of means test, type the following and press enter:

t.test(filter(DatapracRStudio,Socialstatusbinary==0)$DP65,filter(DatapracRStudio,Socialstatusbinary==1)$DP65)

Again, note that the filter function tells RStudio to select DP65 (whether it is an essential characteristic of democracy to provide state aid for unemployment) first for respondents in the 0 category of the social status binary (the lower classes) and then for respondents in the 1 category (the upper classes). The t.test function then calculates the average for each group and provides the p value for the test in the results.

The test produces the following results:

Looking at these results, we see that the lower classes have an average of 6.31 on opinions concerning whether people receiving state aid for unemployment is an essential characteristic of democracy, while the upper classes have an average of 6.41. The mean difference between the two groups is -.10. While this number is not zero, it is admittedly small. To determine if this result could be due to random chance (when the true difference is really zero), we consult the p value, which in this case is .549. This means that if we could conduct this same test on samples like the one used in the DatapracRStudio, in almost 55 of 100 times we would get a result at least this large if the true difference between the lower and upper classes were really zero. Since this p value is above the threshold of .05, we cannot reject the null hypothesis, and the result is not statistically significant.
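The recoding into two categories and the test itself can be sketched together. This is one possible approach under stated assumptions, not necessarily the method shown in Chapter 7: the data frame toy and the column name socialstatus are hypothetical stand-ins, and all values are invented. The sketch also uses the formula form of t.test (dependent variable ~ binary variable), which is an alternative to the two filter calls.

```r
# Toy stand-in for DatapracRStudio; values are invented for illustration only.
# "socialstatus" is a hypothetical name for the original five-category variable.
toy <- data.frame(
  socialstatus = c(1, 2, 3, 4, 5, 3, 4, 2, 5, 1),
  DP65         = c(7, 6, 8, 5, 6, 7, 6, 5, 7, 8)
)

# Categories 1-3 become 0 (lower classes); categories 4-5 become 1 (upper classes).
toy$Socialstatusbinary <- ifelse(toy$socialstatus <= 3, 0, 1)

# Cross-tabulate the original and recoded variables to check the recode.
print(table(toy$socialstatus, toy$Socialstatusbinary))

# Formula interface: dependent variable ~ binary grouping variable.
result <- t.test(DP65 ~ Socialstatusbinary, data = toy)
print(result$estimate)   # mean in group 0, mean in group 1
print(result$p.value)
```

The cross-tabulation is a quick way to confirm that every original category landed in the intended binary category before the test is interpreted.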

Step 2: Producing and interpreting a correlation coefficient

The correlation coefficient is used to assess the relationship between two variables with ordinal, interval, or ratio-level measurement. Here we will produce the correlation coefficient between age (DP2) and perceptions concerning corruption in the United States (DP56). To do this in RStudio type the following and then press enter:

cor.test(DatapracRStudio$DP2,DatapracRStudio$DP56)

This will produce the following results:

These results display both the correlation coefficient and the test’s p value. The correlation coefficient (cor) is the last value displayed, +.218. The positive number means that as age increases, views on corruption also increase (moving toward the value of 10, which represents ‘there is abundant corruption in the United States’). On the surface, this appears to confirm the researcher’s hypothesis. But is this correlation coefficient significant? In other words, can we reject the null hypothesis that the true correlation between these variables in the population from which the DatapracRStudio was generated is actually zero? To answer this question, we must look at the p value, which is displayed in these results as the third statistic in the second line. Here we see that the p value is 1.35e-12, which rounds to .000. This means that if the true population correlation for these two variables were zero, a correlation this strong would arise by random chance in fewer than 1 in 1,000 samples. Because that probability is so low, we can reject the null hypothesis; the test suggests that there is likely a positive correlation between age and perceptions of corruption in the population from which the DatapracRStudio sample was generated. Also, you need to know how your variables are measured in order to interpret these statistics properly. Knowing that the perceptions of corruption variable runs from 1 to 10, and what each end of that scale represents, is crucial to interpreting the positive correlation coefficient appropriately (variables concerning age are usually straightforward, running from younger to older people).
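As with the difference in means test, the correlation coefficient and p value can be extracted from the result rather than read off the printed output. The following is a minimal sketch, assuming a made-up data frame called toy in place of DatapracRStudio (its values are invented, so the results will differ from those above). The last line shows how a p value written in scientific notation, such as 1.35e-12, looks when written out in full.

```r
# Toy stand-in for DatapracRStudio; values are invented for illustration only.
toy <- data.frame(
  DP2  = c(22, 35, 41, 58, 63, 29, 47, 70, 19, 52),   # age
  DP56 = c(5, 6, 8, 7, 9, 4, 7, 9, 6, 8)              # 1-10 perception of corruption
)

result <- cor.test(toy$DP2, toy$DP56)   # Pearson correlation by default

r       <- unname(result$estimate)   # the correlation coefficient (cor)
p_value <- result$p.value

print(round(r, 3))
print(p_value)
print(p_value < 0.05)   # TRUE means the null hypothesis can be rejected at the .05 level

# R prints very small p values in scientific notation; 1.35e-12 means 1.35 x 10^-12.
print(format(1.35e-12, scientific = FALSE))
```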

Let’s consider the next two examples concerning the correlation coefficient in the text. First, consider the relationship between age (DP2) and views on whether it is ever justifiable for someone to accept a bribe in the course of their duties (DP60). The researcher believes that older respondents will be more likely than younger respondents to believe that it is not justifiable to accept a bribe. Since the justifiability variables run from 1 (never justifiable) to 10 (always justifiable), the researcher is expecting a negative number for the correlation coefficient, which would imply an inverse correlation.

To produce the correlation coefficient for these variables, in RStudio type the following and press enter:

cor.test(DatapracRStudio$DP2,DatapracRStudio$DP60)

Here we see the correlation coefficient (cor) is indeed negative (-.350), suggesting that as age increases, the values on whether it is justifiable for someone to accept a bribe in the course of their duties decrease, towards never justifiable. This is in line with the research hypothesis. But is this result significant? The p value is .000 (2.2 e-16). Well below the threshold of .05, this p value suggests that the true correlation between these two variables in the population from which the DatapracRStudio was derived is not zero. Rather, the very low p value provides evidence that there is an inverse correlation between age and views on whether it is ever justifiable for someone to accept a bribe in the course of their duties, and specifically that as age increases, respondents find it less justifiable to accept a bribe. Again, knowing how the variables are measured is very important to interpreting these statistics properly.

Let’s go through the last example from the text, the correlation between age (DP2) and how important God is in people’s lives (DP72). The researcher believes that as age increases, respondents will be more likely to believe that God is very important. Since the importance of God in life variable runs from 1 (not at all important) to 10 (very important), the researcher is expecting a positive correlation. In RStudio, type the following and press enter:

cor.test(DatapracRStudio$DP2,DatapracRStudio$DP72)

 

The correlation between age and how important God is in life is .047. While the number is positive (suggesting a positive correlation between the two variables), we also see that this number is very close to 0. And indeed the p value for the test is .135, which is above the threshold of .05. As a result, the null hypothesis that the true correlation between these variables is zero cannot be rejected. This does not prove that the null hypothesis is true; rather, the results do not provide sufficient evidence of a correlation between these variables in the population from which the DatapracRStudio was taken.
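When several correlations share the same first variable, as the three examples with age do, the tests can be run in one pass and the results collected in a small table. The following is a minimal sketch, assuming a made-up data frame called toy in place of DatapracRStudio; the values are invented, so the coefficients and p values will not match those reported above.

```r
# Toy stand-in for DatapracRStudio; values are invented for illustration only.
toy <- data.frame(
  DP2  = c(22, 35, 41, 58, 63, 29, 47, 70, 19, 52),   # age
  DP56 = c(5, 6, 8, 7, 9, 4, 7, 9, 6, 8),             # perception of corruption
  DP60 = c(6, 4, 3, 2, 1, 5, 3, 1, 7, 2),             # justifiability of accepting a bribe
  DP72 = c(3, 9, 2, 7, 5, 8, 1, 6, 4, 10)             # importance of God in life
)

outcomes <- c("DP56", "DP60", "DP72")

# Run cor.test of age (DP2) against each outcome and keep r and the p value.
summary_table <- do.call(rbind, lapply(outcomes, function(v) {
  res <- cor.test(toy$DP2, toy[[v]])
  data.frame(
    variable    = v,
    r           = unname(res$estimate),
    p_value     = res$p.value,
    significant = res$p.value < 0.05
  )
}))

print(summary_table)
```

The sign of r in each row still has to be read against how that variable is measured, exactly as in the individual examples.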

 

 
