Description
Questions
- Using str_detect and filter, create a data frame with the Massachusetts counties that contain the letter h in their county name.
- Using str_detect and filter, create a data frame with the Massachusetts counties that contain the number 2 in their three-digit county code.
- Using str_replace, remove the word County or city and the trailing white space after the county name.
- Open reading frames (ORFs) are sections of DNA that have the potential to code for a peptide/protein. They occur between a start codon (ATG) and a stop codon (TAA, TAG, or TGA) on the DNA. Write the regular expression that selects the ORF of each DNA sequence provided and returns it and ONLY it (nothing before or after) with str_replace. Note that each row has only one ORF.
- Take a look at the cheat sheets on R Cheat Sheets. Print some out, even! What are the design elements that you like in these? Reference specific cheat sheets and what works for you in them. Reference specific examples in cheat sheets that DO NOT work for you.
- Choose a package you want to look at for your midterm. You don't need to do anything with it yet, but sign up here and tell us which package you have chosen and why here. See the midterm handout for more information.
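As a warm-up for the string questions above, here is a minimal sketch of the three patterns they rely on: filtering rows with str_detect, trimming a suffix with str_replace, and using a capture group so that str_replace returns only the matched piece. The data frame, column names, and DNA strings are made up for illustration and are not the assignment's files. The in-frame codon pattern is one possible reading of "ORF", not necessarily the intended answer.

```r
library(dplyr)
library(stringr)

# Hypothetical toy data (assumed column names: name, code).
counties <- data.frame(
  name = c("Northfield County", "Sample city", "Lakeside County"),
  code = c("001", "025", "120"),
  stringsAsFactors = FALSE
)

# Keep rows whose name contains a lowercase "h".
with_h <- counties %>% filter(str_detect(name, "h"))

# Keep rows whose code contains the digit 2.
with_2 <- counties %>% filter(str_detect(code, "2"))

# Drop a trailing " County" or " city", including the whitespace before it.
short_names <- str_replace(counties$name, "\\s+(County|city)\\s*$", "")

# Hypothetical DNA strings, each with one start codon and one in-frame stop.
dna <- c("CCATGAAATTTTAGGG", "ATGCCCTGAAA")

# Match the whole string, capture only the ORF, and replace the match with
# the captured group. (?:[ACGT]{3})*? steps through codons lazily so the
# first in-frame stop codon ends the capture.
orf_pattern <- "^.*?(ATG(?:[ACGT]{3})*?(?:TAA|TAG|TGA)).*$"
orfs <- str_replace(dna, orf_pattern, "\\1")
```

The key idea in the last step is that str_replace swaps out everything the pattern matches, so anchoring the pattern to the full string and substituting "\\1" leaves only the captured group behind.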
Readings: W&G on Strings, Dates and Times, and Factors
Lecture: Strings and Regular Expressions 1, Strings and Regular Expressions 2
Cheat Sheets: Work with Strings Cheat Sheet
Objective(s): Understand how strings differ from numbers. Learn the basics of string manipulation. Describe the different strategies to clean data full of errors with minimal effort. Process and understand the concept of regular expression matching. Manipulate different date formats and work them into a data map reduce workflow.
Files: Portal Mammal Data with String Problems