STAT 19000: Project 4 — Fall 2020
Motivation: Control flow is (roughly) the order in which instructions are executed. Using if/else statements, we can run certain code only when certain conditions are met. Using for loops, we can perform an operation many times. While these are important concepts to grasp, R differs from many other programming languages in that operations are usually vectorized, so there is often little or no need to write loops.
Context: We are gaining familiarity working in RStudio and writing R code. In this project we introduce and practice using control flow in R.
Scope: r, data.frames, recycling, factors, if/else, for
Questions
Question 1
Use `read.csv` to read the `/class/datamine/data/disney/splash_mountain.csv` data into a data.frame called `splash_mountain`. In the previous project we calculated the mean and standard deviation of `SPOSTMIN` (the posted minimum wait time). These are vectorized operations (we will learn more about this next project). This time, instead of using the `mean` function, use a loop to calculate the same mean (average) as in the previous project. Do not use `sum` either.
Remember, if a value is `NA`, we don’t want to include it.
Remember, if a value is -999, the ride is closed, so we don’t want to include it.
This exercise should make you appreciate the variety of useful functions R has to offer!
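The loop can be sketched as below. This is a minimal example on a small made-up vector, `waits`, not the real data; with the actual file you would loop over `splash_mountain$SPOSTMIN` instead. It keeps a running total and a count of valid values, skipping `NA` and -999, and divides at the end.

```r
# Made-up wait times (illustration only), including an NA and a -999 "closed" code
waits <- c(10, NA, 25, -999, 40, 5)

total <- 0   # running sum of the valid wait times
count <- 0   # number of valid wait times seen so far
for (w in waits) {
  # skip missing values and the -999 "ride closed" code;
  # || stops after is.na(w), so w == -999 is never evaluated on an NA
  if (is.na(w) || w == -999) {
    next
  }
  total <- total + w
  count <- count + 1
}
loop_mean <- total / count
loop_mean  # 20 for this made-up vector
```

Checking the result against a vectorized version (for example, `mean(waits[!is.na(waits) & waits != -999])`) is a quick way to confirm the loop is correct.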
Items to submit:
- R code used to solve the problem, with comments explaining what the code does.
- The mean posted wait time.
Question 2
Choose one of the .csv files containing data for a ride. Use `read.csv` to load the file into a data.frame named `ride_name`, where "ride_name" is the name of the ride you chose. Use a for loop to go through the rows of the data.frame and add a new column called `status`. `status` should contain a string whose value is either "open" or "closed": if `SPOSTMIN` or `SACTMIN` is -999, classify the row as "closed"; otherwise, classify the row as "open". After `status` is added to your data.frame, convert the column to a `factor`.
If you want to access two columns at once from a data.frame, index it with a vector of the column names.
For loops are often [much slower (here is a video to demonstrate)](#r-for-loops-versus-vectorized-functions) than vectorized functions, as we will see in Question 3 below.
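A minimal sketch of the row-by-row approach, using a small made-up data.frame named `ride` with the same column names as the ride files (a stand-in, not real data). It selects both columns of row `i` with `ride[i, c("SPOSTMIN", "SACTMIN")]`. The treatment of `NA` is an assumption: the question only defines -999 as closed, so `%in%` is used because it returns `FALSE` (not `NA`) for missing values, and rows with missing values end up "open".

```r
# Made-up example data with the same column names as the ride files
ride <- data.frame(SPOSTMIN = c(15, -999, NA, 30),
                   SACTMIN  = c(NA, NA, 12, -999))

# Pre-allocate the new column, then fill it one row at a time
ride$status <- character(nrow(ride))
for (i in seq_len(nrow(ride))) {
  # select both columns for row i and turn them into a plain vector
  vals <- unlist(ride[i, c("SPOSTMIN", "SACTMIN")])
  # %in% gives FALSE for NA, so only an actual -999 marks the row closed
  if (any(vals %in% -999)) {
    ride$status[i] <- "closed"
  } else {
    ride$status[i] <- "open"
  }
}
# convert the finished character column to a factor
ride$status <- factor(ride$status)
str(ride)
```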
Items to submit:
- R code used to solve the problem, with comments explaining what the code does.
- The output from running `str` on `ride_name`.
In this video, we go through essentially all of Question 2:
Question 3
Typically you want to avoid for loops (and even apply functions, which we will cover later on, so don’t worry) when they aren’t needed. Instead, you can use vectorized operations and indexing. Repeat Question 2 without using any for loops or apply functions; instead, use indexing and the `which` function. Which method was faster?
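One way to answer "which method was faster?" is to time both versions with `system.time`. The sketch below uses a synthetic data.frame, `fake`, filled with made-up `SPOSTMIN`/`SACTMIN` values (an assumption, so it runs without the course data), and checks that both approaches produce the same labels. Timings depend on your machine, so compare the `elapsed` values you get rather than relying on any particular number.

```r
set.seed(1)
n <- 1e5
# Synthetic wait times: positive values mixed with -999 codes and NAs
fake <- data.frame(SPOSTMIN = sample(c(5, 10, 20, -999, NA), n, replace = TRUE),
                   SACTMIN  = sample(c(5, 10, 20, -999, NA), n, replace = TRUE))

# Version 1: a for loop that labels one row at a time
loop_time <- system.time({
  status_loop <- character(n)
  for (i in seq_len(n)) {
    if (fake$SPOSTMIN[i] %in% -999 || fake$SACTMIN[i] %in% -999) {
      status_loop[i] <- "closed"
    } else {
      status_loop[i] <- "open"
    }
  }
})

# Version 2: label everything "open", then overwrite the closed rows at once
vec_time <- system.time({
  status_vec <- rep("open", n)
  status_vec[which(fake$SPOSTMIN == -999 | fake$SACTMIN == -999)] <- "closed"
})

identical(status_loop, status_vec)  # both approaches should agree
loop_time["elapsed"]
vec_time["elapsed"]
```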
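To combine multiple conditions (for example, `SPOSTMIN` is -999 or `SACTMIN` is -999), use R's `|` (or) operator; `&` is the corresponding and operator.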
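You can start by assigning every value in the new column a default label, such as "open", and then overwrite only the rows that meet the "closed" condition.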
Here is a [complete example (very much like Question 3) with another video](#r-example-safe-versus-contaminated) that shows how we can classify objects.
Here is a [complete example with a video](#r-example-for-loops-compared-to-vectorized-functions) that compares the concept of a for loop with the concept of a vectorized function.
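A minimal sketch of the indexing approach, using a small made-up data.frame named `ride` (a stand-in, not one of the real ride files). Comparing a missing value with `==` gives `NA`, and `NA | TRUE` is `TRUE` while `NA | FALSE` is `NA`; `which()` keeps only the positions that are `TRUE`, so rows with missing values and no -999 stay "open".

```r
# Made-up example data with the same column names as the ride files
ride <- data.frame(SPOSTMIN = c(15, -999, NA, 30),
                   SACTMIN  = c(NA, NA, 12, -999))

# Start with every row labelled "open"
status <- rep("open", times = nrow(ride))
# Row numbers where either column is -999 (which() drops NA results)
closed_rows <- which(ride$SPOSTMIN == -999 | ride$SACTMIN == -999)
# Overwrite just those rows, then store the column as a factor
status[closed_rows] <- "closed"
ride$status <- factor(status)
str(ride)
```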
Items to submit:
- R code used to solve the problem, with comments explaining what the code does.
- The output from running `str` on `ride_name`.
Question 4
Create a pie chart of open vs. closed for `splash_mountain.csv`. First, use the `table` function to get a count of each `status`. Use the resulting table as input to the `pie` function. Make sure to give your pie chart a title that tells the audience which ride it shows.
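A minimal sketch of the `table`-then-`pie` pattern, using a short made-up factor in place of `splash_mountain$status`. The `main` argument of `pie` sets the chart title.

```r
# Made-up status values standing in for splash_mountain$status
status <- factor(c("open", "open", "closed", "open"))
# table() counts how many times each level appears
counts <- table(status)
counts
# pie() draws one slice per count; main= sets the title
pie(counts, main = "Splash Mountain: open vs. closed")
```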
Items to submit:
- R code used to solve the problem, with comments explaining what the code does.
- The resulting plot, displayed as output in your R Markdown document.
Question 5
Loop through the vector of files we’ve provided below and create a pie chart of open vs. closed for each ride. Place all 6 resulting pie charts in the same image. Make sure to give each pie chart a title that indicates the ride.
```r
ride_names <- c("splash_mountain", "soarin", "pirates_of_caribbean", "expedition_everest", "flight_of_passage", "rock_n_rollercoaster")
ride_files <- paste0("/class/datamine/data/disney/", ride_names, ".csv")
```
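To place all of the resulting pie charts in the same image, set up a grid of plotting panels with `par` before running the for loop; the election example below does this with `par(mfrow=c(2,3))`, which gives 2 rows and 3 columns.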
Here is a similar (though not identical) example that uses the campaign election data:
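```r
# Read one year of election contribution data and draw a pie chart of
# rows from California, Texas, New York, and all other states
mypiechart <- function(x) {
  # x is a year given as a string, e.g. "1980"; the files are pipe-delimited
  myDF <- read.csv(paste0("/class/datamine/data/election/itcont", x, ".txt"), sep="|")
  # start with every row labelled "other"...
  mystate <- rep("other", times=nrow(myDF))
  # ...then overwrite the rows from the three states of interest
  mystate[myDF$STATE == "CA"] <- "California"
  mystate[myDF$STATE == "TX"] <- "Texas"
  mystate[myDF$STATE == "NY"] <- "New York"
  myDF$stateclassification <- factor(mystate)
  # count the rows in each category and plot the counts
  pie(table(myDF$stateclassification))
}

myyears <- c("1980","1984","1988","1992","1996","2000")
# arrange the six plots in a grid of 2 rows and 3 columns
par(mfrow=c(2,3))
# draw one pie chart per year
for (i in myyears) {
  mypiechart(i)
}
```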
Here is another video, which guides students even more closely through Question 5.
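A minimal sketch of looping over ride files, in the same spirit as the election example above. So that it runs anywhere, it writes two small made-up files to a temporary folder; `demo_names` and `demo_files` are stand-ins for `ride_names` and `ride_files`, and `ride_pie` is a hypothetical helper name. It uses `par(mfrow = c(1, 2))` for two panels; with six rides you would use a grid such as `c(2, 3)`.

```r
# Hypothetical helper: label one ride's rows and draw its pie chart
ride_pie <- function(df, title) {
  status <- rep("open", times = nrow(df))
  status[which(df$SPOSTMIN == -999 | df$SACTMIN == -999)] <- "closed"
  counts <- table(factor(status))
  pie(counts, main = title)
  invisible(counts)  # return the counts so they can be checked
}

# Stand-in files with made-up data, written to a temporary folder
demo_names <- c("ride_a", "ride_b")
demo_files <- file.path(tempdir(), paste0(demo_names, ".csv"))
write.csv(data.frame(SPOSTMIN = c(10, -999, 20), SACTMIN = c(NA, NA, -999)),
          demo_files[1], row.names = FALSE)
write.csv(data.frame(SPOSTMIN = c(-999, 35, 5), SACTMIN = c(NA, 12, NA)),
          demo_files[2], row.names = FALSE)

par(mfrow = c(1, 2))   # one row, two panels
results <- list()
for (i in seq_along(demo_files)) {
  df <- read.csv(demo_files[i])
  # use the ride's name as the chart title
  results[[demo_names[i]]] <- ride_pie(df, title = demo_names[i])
}
```

Passing the name into the helper keeps each title tied to the file it came from, which is what the question asks for.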
Items to submit:
- R code used to solve the problem, with comments explaining what the code does.
- The resulting plot, displayed as output in your R Markdown document.