Description
For this assignment were going to look at birthweights of babies in California from 2000 to 2013. The data has year, groups of birthweights, and counts of number of babies in that group of birth weights. Theres also information on county, zip code, lat/long, etc. These may or may not be useful, but are interesting grouping factors.
1. Warmup: Load up the data! Use skimr
to show that youve done so properly, and everything is as it should be.
2. What is the number of children in each birthweight category in Sacramento, CA across the entire dataset?
3. Are there trends in the birthweights of children in Truckee over time? Visualize this by looking at the number of individuals in each birthweight category in each year. Make the plot as easy to understand as possible. Extra credit for making it fancy and not just a default plot. Note, Truckee has multiple zip codes, so, youre going to want to make sure to sum over all zip codes in the city!
4. Are there any trends by Latitude (e.g., North-South)? To answer this
4a. Create a new column, lat_group
where you use cut_interval
to cut the data into 10 groups.
4b. Calculate the birthweight count for each birthweight group and latitude group – but also calculate the mean latitude in that group.
4c. Plot! Remember, make your axes, labels, and titles informative! That mean latitude will help you make a good plot. Trust me!
4d. Well that was unsatisfying, given the different number of births at different latitudes. Can you redo this, correcting for number of individuals in each latitude bin to make it more useful? So, percent in each bin? Make those axes on the plot tell us what is going on! Hint: to do this, youll need to group not just once, but twice. The first time, but latitude group and birthweight group, the second, youll just use latitude group and instead of summarize
, use a mutate
to caculate percent in each birthweight category. Dont forget to ungroup
at the end!
4e. What did visualization by raw numbers versus percent tell you? How are they different? Why might they be different? What does this tell you about visualization and analysis of population trends in general?
5. Extra credit (variable, depending on awesomeness of data viz)
Note that there is latitude and longitude information in this data. Can you use that in some way to plot out anything interesting in the data in terms of geographic distribution. Note, log(x+1) transformations may be your friend for some things. Or correcting by population size. Have fun with this! Feel free to look into geospatial visualization with ggplot2 or other packages – although well do this more formally in a few weeks. Might not be necessary, but, you never know.