Description
Some of the questions in this assignment require you to use STATA to analyze one or the other of the two datasets provided on Canvas. These two datasets are:
Dataset 1. WDI [World Development Indicators]
I downloaded the data from: https://databank.worldbank.org/reports.aspx?source=world-development-indicators
The WDI contains many more variables, over a wide range of years, than I included in this dataset. If you want, you can look at this website to get descriptions of any of the variables. You can also create other datasets with different variables.
Dataset 2. Baylor Religion Survey dataset
I downloaded this from: http://www.thearda.com/Archive/Files/Downloads/BAYLORW2_DL2.asp
I have also included the codebook for the dataset in the Data folder in Files on Canvas. The codebook describes each of the variables.
Please type answers to the following and, when asked for, embed the graphics (or you can put graphics in a separate file).
PART I (requires using STATA)
- (2 points) Using the Baylor dataset in STATA, for each of the following variables, identify (1) whether they are measured at the nominal, ordinal, interval or ratio level and (2) for those variables that are ratio or interval level, whether they are discrete or continuous (you can look at the codebook or read the variable label to assess how each variable is measured):
region
age
bigfoot
marijuan
- (2 points) Using the WDI dataset in STATA, list all nations and the percentage of the population that lives in urban areas.
Stata command: list countryname v59
Which country is the most urbanized by this measure? Which the least?
Hint: an easy way to find this out is to use the sort command: sort v59.
Do this before you use the list command.
- (2 points) Using the WDI dataset in STATA, present the mean, median, standard deviation, range, and interquartile range for each of the following variables: v6, v7, v8, v20, and v25
To get this information, use the following command, substituting each variable name for x: sum x, d [d stands for detail, which gives more information than the default output of the sum command]. For example, type sum v6, d into the command line.
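For reference, the five measures requested here can be sketched in Python using only the standard library. This is a minimal illustration on made-up values (not WDI data), and it assumes the sample standard deviation and Python's default "exclusive" quartile rule; different software can use slightly different quartile rules, so small differences in the interquartile range are possible.

```python
import statistics

# Hypothetical values for illustration only; NOT taken from the WDI dataset.
values = [2, 4, 4, 5, 7, 9, 11]


def describe(data):
    """Return mean, median, sample SD, range, and interquartile range."""
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, _q2, q3 = statistics.quantiles(data, n=4)
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "sd": statistics.stdev(data),      # sample SD (n - 1 denominator)
        "range": max(data) - min(data),    # largest minus smallest value
        "iqr": q3 - q1,                    # 75th minus 25th percentile
    }


summary = describe(values)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

The range and the interquartile range are differences (largest minus smallest value, and 75th minus 25th percentile), so they are computed from other summary values rather than read off directly.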
- (8 points) Using the WDI dataset in STATA, create a boxplot of v6, v7, v8, v20, and v25. To do this, use the dropdown Graphics menu and select boxplot. Interpret the boxplot.
- (8 points) Using the WDI dataset in STATA, create a histogram for each of the following variables: v6, v7, v8, v20, and v25. To do this, select histogram from the dropdown Graphics menu. Describe each histogram.
- (8 points) Using the WDI dataset in STATA, for each of the following variables create a new variable that presents the original variable as a standardized score (z-score): v11, v18, v21, v22, v24, and v40.
[To do this, use the following command substituting the variable name of each of the above variables for x and any variable name you want to use for the new variable for newvar: egen newvar=std(x)]
Once you have done this, use the list command to list all the countries and their values on each of the standardized new variables you generated. What are the standardized values for the United States on each of these variables? How does the US compare to other nations?
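To make clear what a standardized score represents, here is a minimal Python sketch of the calculation z = (x - mean) / SD. The country labels and values are hypothetical (not WDI data), and the sketch assumes the sample standard deviation (n - 1 denominator) is the one used.

```python
import statistics

# Hypothetical country values for illustration only; NOT WDI data.
raw = {"A": 10.0, "B": 20.0, "C": 30.0, "D": 40.0, "E": 50.0}


def standardize(values_by_name):
    """Convert each value to a z-score: (value - mean) / sample SD."""
    values = list(values_by_name.values())
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # assumes the n - 1 denominator
    return {name: (x - mean) / sd for name, x in values_by_name.items()}


z = standardize(raw)
for name, score in z.items():
    print(f"{name}: {score:+.3f}")
```

A positive z-score means the country is above the mean of all countries on that variable, a negative one means it is below, and the size of the score says how many standard deviations away it is.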
PART II (no STATA needed)
- (6 points) You want to know the percentage of people 18+ years of age in the state of Oregon who went camping last summer. You select a random sample of 160 from the list of registered voters in Oregon and survey them by mail. You get 100 responses. Of these, 54 people indicate they went camping last summer.
- What is the target population in this study?
- What is the sampling frame?
- Do you think there is a problem with coverage error in this study? Explain briefly.
- What is the response rate?
- Of the respondents, 54% indicated they went camping. Is this a statistic or a parameter?
- What does the margin of error equal (as a percentage)?
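The response rate and margin of error calculations can be sketched in Python as follows. The survey counts below are made up and differ from the ones in this question. The margin of error assumes the common 95% formula for a proportion from a simple random sample, 1.96 * sqrt(p(1 - p) / n); if the course materials use a different approximation, follow those instead.

```python
import math


def response_rate(completed, sampled):
    """Share of sampled people who returned a completed survey."""
    return completed / sampled


def margin_of_error(p_hat, n, z=1.96):
    """Approximate margin of error for a sample proportion.

    Assumes a simple random sample; z=1.96 corresponds to 95% confidence.
    """
    return z * math.sqrt(p_hat * (1 - p_hat) / n)


# Hypothetical survey: 400 mailed, 250 returned, 140 answered "yes".
rate = response_rate(250, 400)
p_hat = 140 / 250
moe = margin_of_error(p_hat, 250)

print(f"response rate: {rate:.1%}")
print(f"sample proportion: {p_hat:.1%}")
print(f"margin of error: +/- {moe:.1%}")
```

Note that n in the margin of error is the number of responses actually received, not the number of surveys sent out.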
- (4 points) You may need to use the Z table in OpenIntro Stats (Appendix C) for some of the remaining questions.
- If you flip a coin twice, what is the probability that it will come up tails both times (assume that the coin is fair, i.e., it is equally likely to come up heads as tails on any flip)?
- In a normal distribution, approximately what percentage of the values falls within 2 standard deviations of the mean?
- Under any normal distribution of scores, what percentage of the total area falls between the mean (μ) and +1.41σ?
- If scores on a test are normally distributed, with a mean (μ) of 200 and a standard deviation (σ) of 30, what percentage of scores falls between 185 and 230?
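Areas under a normal curve can be checked against the Z table with a short Python sketch. It uses `statistics.NormalDist` from the standard library (Python 3.8 or later); the means, standard deviations, and cut points below are hypothetical and differ from those in the questions.

```python
from statistics import NormalDist


def percent_between(low, high, mu, sigma):
    """Percentage of a normal(mu, sigma) distribution between low and high."""
    dist = NormalDist(mu, sigma)
    # Area between two points = CDF at the upper point minus CDF at the lower.
    return 100 * (dist.cdf(high) - dist.cdf(low))


# Hypothetical test scores: mean 100, SD 15, area between 85 and 115.
print(f"{percent_between(85, 115, 100, 15):.2f}%")

# Area between the mean and +0.5 SD on the standard normal curve.
print(f"{percent_between(0, 0.5, 0, 1):.2f}%")
```

The same subtraction is what you do by hand with the Z table: look up the cumulative area for each z score and take the difference.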
- (4 points) Determine the z score for each of the following values from a normal distribution with μ = 200 and σ = 10.
- 220
- 195
- 210
- 185
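The z score formula, z = (x - μ) / σ, can be written as a one-line Python function. The mean, standard deviation, and values in this sketch are hypothetical and differ from those in the question.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) mu."""
    return (x - mu) / sigma


# Hypothetical distribution: mean 50, standard deviation 5.
for x in (60, 45, 50):
    print(f"x = {x}: z = {z_score(x, 50, 5):+.2f}")
```

A value equal to the mean always has z = 0, and the sign of z shows which side of the mean the value falls on.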
- (6 points) Among the students at a particular college in spring term 2019, the mean (μ) number of cups of coffee students drank during finals week was 8 with a standard deviation (σ) of 2. Assuming the number of cups of coffee students drank is normally distributed, determine
- the percentage of students who drank between 6 and 8 cups.
- the percentage of students who drank 7 or more cups.
- the percentage of students who drank between 9 and 11 cups.
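For "or more" questions, the quantity needed is the area in the upper tail, which is 1 minus the cumulative area up to the cut point. A minimal Python sketch with hypothetical numbers (not the ones in this question), again using `statistics.NormalDist`:

```python
from statistics import NormalDist


def percent_at_or_above(x, mu, sigma):
    """Percentage of a normal(mu, sigma) distribution at or above x."""
    # Upper-tail area = 1 - cumulative area below x.
    return 100 * (1 - NormalDist(mu, sigma).cdf(x))


# Hypothetical distribution: mean 50, SD 10, share at or above 40 (z = -1).
print(f"{percent_at_or_above(40, 50, 10):.2f}%")
```

Because the cut point here is below the mean, the upper-tail area is larger than 50%; sketching the curve and shading the region first is a quick way to check that an answer is on the correct side.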