Short Answer Questions
1. Explain the difference between a measure of central tendency and a measure of dispersion and give some examples of each.
Measures of central tendency are a group of statistics, each of which presents a single value that best represents the distribution of a variable. The researcher selects the appropriate measure according to the variable's level of measurement. The three measures of central tendency are the mode (used for nominal data), the median (used for ordinal data, and sometimes for interval-ratio data), and the mean (used for interval-ratio data). Measures of dispersion are a group of statistics that indicate how well the measure of central tendency represents the distribution; again, the researcher selects the appropriate measure according to the level of measurement of the variable. The three measures of dispersion are the variation ratio (used for nominal variables), the range or interquartile range (used for ordinal variables), and the standard deviation (used for interval-ratio variables).
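As a rough illustration of how these measures pair up, the following Python sketch computes one measure of central tendency and one measure of dispersion at each end of the scale. The data sets and the helper name `variation_ratio` are invented for this example, and the standard deviation shown is the sample version (dividing by n - 1), which is an assumption about which formula is intended.

```python
from collections import Counter
import statistics


def variation_ratio(values):
    """Proportion of cases that fall outside the modal category."""
    counts = Counter(values)
    modal_count = max(counts.values())
    return 1 - modal_count / len(values)


# Hypothetical data, invented purely for illustration.
party = ["A", "B", "A", "C", "A", "B"]   # nominal variable
ages = [20, 22, 22, 25, 31]              # interval-ratio variable

# Nominal: mode and variation ratio.
mode_party = statistics.mode(party)      # most frequent category
vr_party = variation_ratio(party)        # 1 - 3/6

# Ordinal or interval-ratio: median with range and interquartile range.
median_age = statistics.median(ages)
range_age = max(ages) - min(ages)
q1, _, q3 = statistics.quantiles(ages, n=4)  # quartile cut points
iqr_age = q3 - q1

# Interval-ratio: mean and (sample) standard deviation.
mean_age = statistics.mean(ages)
sd_age = statistics.stdev(ages)

print(mode_party, vr_party)
print(median_age, range_age, iqr_age)
print(mean_age, round(sd_age, 2))
```

Note that quartile conventions differ between software packages, so the interquartile range from `statistics.quantiles` may not match a hand calculation exactly.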
2. Explain what a measure of association is and explain the difference between a proportional reduction in error (PRE) measure and a non-PRE measure.
Measures of association summarize the strength of the relationship between two variables in a single numerical value. They provide a standardized and compact way to convey relationship information to others, and they are much easier to report and compare across studies than contingency tables or scatter plots. A special class of measures of association is the proportional reduction in error (PRE) measures. A PRE measure is basically a before-and-after comparison: we compare the amount of error we make in predicting the dependent variable before knowing the value of the independent variable with the amount of error that remains after knowledge of the independent variable is taken into account. In other words, to what degree does knowledge of the independent variable reduce our error in predicting values of the dependent variable? Non-PRE measures of association can tell us the strength and, occasionally, the direction of the association, but not how much it reduces prediction error. PRE measures include lambda, gamma, and tau-b; non-PRE measures include Cramér's V and tau-c.
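The before-and-after logic of a PRE measure can be sketched with lambda, using the usual formula (E1 - E2) / E1, where E1 is the number of prediction errors made without the independent variable and E2 is the number made with it. The function name `pre_lambda` and the contingency table below are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def pre_lambda(table):
    """Lambda for a contingency table.

    table[i][j] is the count of cases in category i of the dependent
    variable (rows) and category j of the independent variable (columns).
    """
    n = sum(sum(row) for row in table)

    # Errors before: guess the modal category of the dependent variable
    # for every case, ignoring the independent variable.
    e1 = n - max(sum(row) for row in table)
    if e1 == 0:
        raise ValueError("lambda is undefined when there is no error to reduce")

    # Errors after: within each category of the independent variable,
    # guess that column's modal category of the dependent variable.
    columns = zip(*table)
    e2 = sum(sum(col) - max(col) for col in columns)

    return (e1 - e2) / e1


# Hypothetical 2x2 table, invented for illustration.
table = [
    [30, 10],  # dependent variable = category 1
    [20, 40],  # dependent variable = category 2
]

lam = pre_lambda(table)
print(lam)  # E1 = 40, E2 = 30, so lambda = 10 / 40 = 0.25
```

Here a lambda of 0.25 would be read as a 25 percent reduction in prediction errors once the independent variable is known.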
3. Why is basic linear regression an improvement over Pearson’s r?
While Pearson’s r is useful as a concise summary of the relationship between two interval-ratio variables related in a linear fashion, researchers often want more information. In particular, they are interested in how information about the independent variable can be used to predict scores on the dependent variable. We use a scatter plot to assess the linearity of a relationship between two interval-ratio variables: we set up a graph with the dependent variable along the y-axis (vertical) and the independent variable along the x-axis (horizontal), and locate each case according to its position on both axes. We then look at the pattern in the data: does it suggest a linear relationship, a curvilinear relationship, or no relationship? A pattern that could have a straight line drawn through it indicates a linear relationship. Basic linear regression is a statistical technique that estimates the location of this line, giving a predicted value of the dependent variable for every value of the independent variable.
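To make the contrast concrete, here is a minimal Python sketch that computes both Pearson's r and a regression line for the same data. It assumes the line is fitted by ordinary least squares, the usual approach for basic linear regression; the function names and the data points are invented for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson's r: a single number summarizing linear association."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def fit_line(x, y):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: x is the independent variable, y the dependent.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
slope, intercept = fit_line(x, y)

# Unlike r, the fitted line yields a predicted y for any chosen x.
predicted_at_6 = intercept + slope * 6

print(round(r, 3))
print(round(slope, 2), round(intercept, 2), round(predicted_at_6, 2))
```

Pearson's r tells us only how tightly the points cluster around a line, while the slope and intercept let us say how much the dependent variable is expected to change for each one-unit change in the independent variable.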
4. What are the four questions that researchers need to answer about an association between variables?
When assessing whether there is a relationship between two variables, we need to answer four questions: What is the form/direction of the relationship? How strong is the relationship? Is the relationship statistically significant? What happens to the relationship when we control for other variables?
5. How does the level of measurement affect the statistical tests you can run?
The level of measurement determines which measures of central tendency, dispersion, and association you can use. Sometimes it is better to recode data to a different level of measurement, because doing so allows different tests of association. In particular, when you are conducting a test of association between two variables with different levels of measurement, you must either use the test appropriate for the lower level of measurement or find a way to recode the lower-level variable into a higher level of measurement, such as by using dichotomous ("dummy") variables.
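Dummy-variable recoding can be sketched in a few lines of Python. In this hypothetical example, a nominal variable is turned into a set of 0/1 variables, with one category left out as the reference group; the function name `dummy_code`, the variable `region`, and its values are all invented for illustration.

```python
def dummy_code(values, reference):
    """Recode a nominal variable into 0/1 dummy variables.

    Returns one list of 0s and 1s per category, omitting the
    reference category (cases in it score 0 on every dummy).
    """
    categories = sorted(set(values) - {reference})
    return {
        category: [1 if value == category else 0 for value in values]
        for category in categories
    }


# Hypothetical nominal variable with three categories.
region = ["East", "West", "North", "West"]

dummies = dummy_code(region, reference="East")
print(dummies)
# {'North': [0, 0, 1, 0], 'West': [0, 1, 0, 1]}
```

Each resulting dichotomous variable can then be used with techniques designed for higher levels of measurement, with coefficients interpreted relative to the omitted reference category.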