Working with the World Values Survey in Stata
The purpose of replication analysis is to determine whether a particular result gleaned from one sample of data can be reproduced with a second sample of similar data. If the results using the same variables from two different samples are similar, we can be more confident that a conclusion we have reached based on the data analysis is correct. However, if the results differ, it is necessary to critically assess where the problem lies. Most likely, the problem lies with one or both of the samples used in the analysis. For example, even though the researchers collecting the data may have tried to achieve a representative sample based on randomization, it is possible that one or both samples fall short of this important goal. When the ability to generate a true random sample is compromised, the sample is likely not representative of the population, and the results from the sample may not be replicable.
Furthermore, it is important to consider the time period in which the data was collected for each sample; this is especially true for survey data. The seventh wave of the World Values Survey data for the United States was collected during 2017, while the Dataprac data was collected in the United States in 2019. The difference in time between the two datasets may influence the results to some degree. In particular, how survey respondents thought about the issues raised in the survey may have differed between the two time periods.
Sample size also influences the results. The seventh wave of the World Values Survey for the United States includes over 2,600 observations, while the Dataprac has a little over 1,000. If both samples are representative, this difference alone gives the World Values Survey an advantage, because the margin of error for any particular sample statistic is lower when the number of observations is higher. This makes the estimates from the larger sample more precise (and hence more likely to be statistically significant if there is indeed a true relationship between the variables in the population from which the sample was generated).
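To make the difference in precision concrete, the following is a rough sketch of the 95% margin of error for a sample proportion, computed with Stata's display command. It rests on several assumptions that are not part of either survey's documentation: simple random sampling, a proportion of 0.5 (the value that gives the widest margin), and the rounded sample sizes 2,600 and 1,000 as stand-ins for the two datasets.

```stata
* Approximate 95% margin of error for a proportion: z * sqrt(p*(1-p)/n)
* Assumptions: simple random sampling, p = 0.5, rounded sample sizes
display "n = 2600: " %6.4f invnormal(0.975)*sqrt(0.5*(1-0.5)/2600)
display "n = 1000: " %6.4f invnormal(0.975)*sqrt(0.5*(1-0.5)/1000)
```

Under these assumptions the margin is roughly 1.9 percentage points for the larger sample and roughly 3.1 percentage points for the smaller one. The actual margins depend on each survey's sampling design and on the statistic being estimated.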
It is also possible that some of the questions and potential responses differ between the two surveys. While the overwhelming majority of the questions in the Dataprac and the World Values Survey are exactly alike, there are a few subtle differences. For example, the question “Which party would you vote for if there were a national election tomorrow?” is worded the same way in both surveys, but the possible responses differ slightly. Specifically, respondents to the World Values Survey were given additional options from which to choose (such as the Libertarian Party and the Green Party), while respondents to the Dataprac were not. This slight discrepancy might compromise the comparability of the responses.
Finally, it is important to recognize that obtaining a ‘significant’ result from one sample while the result from the second sample is ‘insignificant’ does not necessarily mean that the significant result is more correct. Rather, it is important to weigh the issues described above (sampling, timing, sample size, and question wording) to judge which dataset might be more representative and hence a more adequate reflection of the population from which the sample was derived.
Downloading the World Values Survey for replication analysis
Go to worldvaluessurvey.org.
Click on Data & Documentation on the left-hand side of the page.
From here click on Documentation/Downloads.
Click on Wave 7 (2017-2020).
From here, use the slide bar under “Select a country” to find USA 2017.
Click on USA 2017. This will take you to the page where the USA 2017 WVS data is located. Before downloading the Stata data file, however, you should peruse the ‘questionnaire’ and the ‘codebook and results’ PDF files that are available. The questionnaire lists how each question was asked of survey participants, along with some of the possible responses. Working with the two documents together, you should be able to understand much of the coding scheme in the World Values Survey. It will also be possible, later, to display the codes for each variable in Stata. Note that responses that are unique to the United States (like Political Party) are not included in the generic questionnaire; you will be able to use the ‘codebook’ command in Stata to determine the specific codes once you have opened the dataset.
Once you click on USA 2017, you will also see the datasets that are available for public use. For Stata, you should choose the last file in the list: WVS Wave 7 United States Stata v1.6. Download this file onto your computer and store it where you can easily find it. Once you unzip the file (by clicking on it), open the extracted dataset file, and your Stata program should open the dataset.
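If the dataset does not open on its own, it can also be loaded from the Stata command line. The sketch below is only illustrative: the folder and the file name are hypothetical placeholders, so replace both with the location and name of the .dta file you actually extracted.

```stata
* Hypothetical folder and file name: replace both with the location and
* name of the .dta file you extracted from the zip archive
cd "C:/data/wvs"
use "WVS_Wave_7_United_States_Stata_v1.6.dta", clear

* Quick checks that the expected dataset loaded
describe, short
lookfor politics
```

Here describe, short reports the number of observations and variables, and lookfor lists variables whose name or label contains the search word, which is a quick way to locate a question by topic.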
Before you begin your data analysis, you will need to ‘handle’ missing values in the dataset. Unlike the Dataprac, which has no missing values, the World Values Survey contains missing values because some respondents, for example, did not know the answer to a particular question or chose not to answer. In the Stata version of the World Values Survey, these answers are represented by negative numbers. Specifically, ‘don’t know’ is represented by -1 and ‘no answer’ is represented by -2. Since these values would otherwise be included in any data manipulations (like the computation of an average), these answers need to be removed so that your analysis concentrates only on respondents who did answer the particular questions you are working with.
To do this, you should first know if your variable has any such negative values associated with it (i.e. if some respondents did not know the answer to a question or chose not to answer it).
The easiest way to do this in Stata is to use the ‘codebook’ command. See below for an example of the codebook command with Q4 (Important in life: Politics).
codebook Q4
From this we can see that while the overwhelming majority of respondents answered this question with a specific response (very important, rather important, not very important, and not important at all), 22 respondents instead have a value of -2, which indicates that they did not answer the question. We need to remove these observations from the dataset before continuing with our data analysis.
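As a complement to codebook, the following minimal sketch shows two other ways to check Q4 for negative codes. It assumes only that the World Values Survey dataset is open and that non-substantive answers are coded as negative numbers, as described above.

```stata
* Frequency table showing the numeric codes instead of the value labels
tabulate Q4, nolabel

* Count the respondents with any negative (non-substantive) code on Q4
count if Q4 < 0
```

The count reported by the second command is the number of observations that a subsequent keep if Q4>0 would remove.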
To do this, start by typing the command ‘preserve’ so you can always revert to the original dataset (with ‘restore’) if necessary. Then, use the ‘keep if’ command to keep only positive values of the variable. Continuing with Q4 (Important in life: Politics) as an example, type the commands shown below:
preserve
keep if Q4>0
From here, you can see that 22 observations have been deleted; these are the 22 people who did not answer Q4.
Use the keep if command with the condition >0 for each variable in your analysis. This will ensure that all negative numbers (representing people who did not know an answer or did not answer a particular question) are removed from the dataset.
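When an analysis involves several variables, the same filter can be applied in a loop. The sketch below is one possible way to do this; Q5 and Q6 are placeholders standing in for whichever other variables your analysis uses, so substitute your own variable names.

```stata
* Q5 and Q6 are placeholders for the other variables in your analysis
* Number of observations before filtering
count

* Take a snapshot of the full dataset so it can be brought back later
preserve

* Drop respondents with a negative code on any of the listed variables
foreach v of varlist Q4 Q5 Q6 {
    keep if `v' > 0
}

* Number of observations remaining for the analysis
count

* ... run your analysis commands here ...

* Bring back the full dataset when you are finished
restore
```

Two caveats apply. First, if these lines are run from a do-file, Stata restores the preserved data automatically when the do-file ends, so the analysis commands need to run before that point. Second, Stata treats its own missing value (.) as larger than any number, so a condition such as >0 would not remove observations that are stored as missing rather than as a negative code.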
Once this step is complete, you can use the same commands learned in Chapters 7, 8, and 9 to perform the same data analysis using the World Values Survey as you did with the Dataprac. The basic commands are listed below, but refer to the online tutorials from these chapters if you need to refresh your knowledge of these commands.
To create a copy of a variable so it can be recoded (if you need to create a new binary variable in the World Values Survey): generate
To recode the new variable: recode
To select a particular research population: keep if
To generate a frequency distribution: tab
To generate summary statistics: summarize
To conduct a difference in means test: ttest DV, by(IV) welch
To compute a correlation coefficient: pwcorr DV IV, sig
For single variable regression: regress DV IV
For multivariate regression: regress DV IV1 IV2
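The commands listed above can be combined into a short session such as the sketch below. It is illustrative only: Q5 and Q6 are placeholders for the variables in your own analysis, the new variable name politics_important is hypothetical, and the recoding assumes that Q4 is coded 1 (very important) through 4 (not important at all), which you should confirm with codebook Q4 before relying on it.

```stata
* Work on a filtered copy of the data; Q5 and Q6 are placeholder variables
preserve
keep if Q4 > 0 & Q5 > 0 & Q6 > 0

* Copy Q4 and recode the copy into a binary variable
* (assumes 1-2 = important, 3-4 = not important; check with codebook Q4)
generate politics_important = Q4
recode politics_important (1 2 = 1) (3 4 = 0)

* Frequency distribution and summary statistics
tab politics_important
summarize Q6

* Difference in means test, correlation, and regressions
ttest Q6, by(politics_important) welch
pwcorr Q6 Q4, sig
regress Q6 Q4
regress Q6 Q4 Q5

* Return to the full dataset
restore
```

Whether these particular tests are appropriate depends on the measurement level of the variables you choose; the sketch is meant only to show how the commands fit together after the negative codes have been removed.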