Control Flow
If/else statements
if, else, and else if are keywords in R that allow you to direct the progression of your code. These commands test logical statements, executing certain code depending on the statement’s truth.
The structure and requirements of if/else statements are fairly rigid.
- There cannot be an else without an if.
- if statements do not need an else; they can stand alone.
- You may have any number of else if sections, but they must come after an if and before any final else.
- if and else if require a logical condition contained within parentheses.
- Each block should use curly braces, { to open and } to close, with the desired executable code contained within. (R lets you omit the braces around a single expression, but using them every time is safer.)
The rules and structure will become clearer in the examples.
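As a minimal sketch of these rules in one place, the function below combines a stand-alone if with an if / else if / else chain. The name describe_number is hypothetical and used only for illustration.

```r
# describe_number() is a hypothetical helper used only for illustration.
describe_number <- function(x) {
  # A stand-alone if: no else is required.
  if (x %% 2 == 0) {
    print("Even number")
  }

  # An if / else if / else chain: the first condition that is TRUE wins.
  if (x > 0) {
    label <- "positive"
  } else if (x < 0) {
    label <- "negative"
  } else {
    label <- "zero"
  }
  label
}

describe_number(4)   # prints "Even number", then returns "positive"
describe_number(-3)  # returns "negative"
describe_number(0)   # prints "Even number", then returns "zero"
```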
Examples
How do I print "Success!" if my expression is TRUE, and "Failure!" otherwise?
# Randomly assign either TRUE or FALSE to our variable.
t_or_f <- sample(c(TRUE, FALSE), 1)
if (t_or_f == TRUE) {
  # If t_or_f is TRUE, print success
  print("Success!")
} else {
  # Otherwise, print failure
  print("Failure!")
}
[1] "Failure!"
For variables that contain either TRUE or FALSE, we actually don’t need the == TRUE part of the if-statement. This is because the condition of an if statement only needs to evaluate to TRUE or FALSE, and t_or_f already does. (Since t_or_f is assigned at random, the printed result varies from run to run.)
if (t_or_f) {
  print("Success!")
} else {
  print("Failure!")
}
[1] "Success!"
How do I print "Success!" if my expression is TRUE, "Failure!" if my expression is FALSE, and "Huh?" if it is neither?
# Randomly assign TRUE, FALSE, or "Something else" to schrodinger_boolean.
# Mixing logical values with a string makes every element a character string.
schrodinger_boolean <- sample(c(TRUE, FALSE, "Something else"), 1)
if (schrodinger_boolean == TRUE) {
  print("Success!")
} else if (schrodinger_boolean == FALSE) {
  print("Failure!")
} else {
  print("Huh?")
}
[1] "Failure!"
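Unlike in the first example, the sample space of schrodinger_boolean includes a non-logical element ("Something else"), so it isn’t possible to use the shorthand: a condition such as if ("Something else") cannot be interpreted as TRUE or FALSE and raises an error.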
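To see why the shorthand is unsafe here, it can help to inspect the vector being sampled. The sketch below is an illustration rather than part of the original solution, and the name mixed is made up for this example.

```r
# Mixing logical values with a string coerces every element to character.
mixed <- c(TRUE, FALSE, "Something else")
class(mixed)
mixed

# Comparing with == still works, because TRUE is converted to "TRUE" first.
mixed == TRUE

# as.logical() cannot interpret an arbitrary string, so it returns NA for it.
as.logical(mixed)
```

The NA in the last result is the problem: an if condition needs a value that can be read as TRUE or FALSE, which is why the explicit == comparisons are used in the solution.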
For loops
for loops allow us to execute similar code repeatedly until we’ve looped through all of an object’s elements. For example, if we wanted to format the dates in a list, we could use a for loop to run through each element of the list and format it.
Loops are crucial in other programming languages, but R also has apply functions and vectorized functions that are often more concise than loops and, in many cases, faster. As with many things, apply functions do have situations where they lack efficiency, so learning loops is still vital for mastering R.
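The date-formatting idea mentioned above can be sketched both ways. This is a minimal illustration with made-up input (raw_dates is a hypothetical vector, not data from this page), comparing a for loop with sapply.

```r
# Hypothetical input: dates stored as "YYYY-MM-DD" strings.
raw_dates <- c("2016-01-03", "2016-07-05", "2016-07-06")

# Loop version: format one element at a time.
formatted_loop <- character(length(raw_dates))
for (i in seq_along(raw_dates)) {
  formatted_loop[i] <- format(as.Date(raw_dates[i]), "%d/%m/%Y")
}

# sapply version: apply the same formatting function to every element.
formatted_apply <- sapply(
  raw_dates,
  function(d) format(as.Date(d), "%d/%m/%Y"),
  USE.NAMES = FALSE
)

formatted_loop
identical(formatted_loop, formatted_apply)  # both approaches agree
```

Because as.Date and format are themselves vectorized, format(as.Date(raw_dates), "%d/%m/%Y") would give the same result without a loop or an apply function.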
Examples
How do I print every value in a vector?
for (i in 1:10) {
  # In the first iteration of the loop, i will be 1. The next, i will be 2, and so on until the vector's values are exhausted
  print(i)
}
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
[1] 6
[1] 7
[1] 8
[1] 9
[1] 10
How do I break out of a loop before it finishes?
for (i in 1:10) {
  if (i == 7) {
    # When i==7, we will exit the loop without continuing.
    break
  }
  print(i)
}
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
[1] 6
How do I loop through a vector of names?
friends <- c("Phoebe", "Ross", "Rachel", "Chandler", "Joey", "Monica")
my_string <- "So no one told you life was gonna be this way, "
for (friend in friends) {
  print(paste0(my_string, friend, "!"))
}
[1] "So no one told you life was gonna be this way, Phoebe!"
[1] "So no one told you life was gonna be this way, Ross!"
[1] "So no one told you life was gonna be this way, Rachel!"
[1] "So no one told you life was gonna be this way, Chandler!"
[1] "So no one told you life was gonna be this way, Joey!"
[1] "So no one told you life was gonna be this way, Monica!"
Check out the paste & paste0 page if you’re confused about its utility here.
How do I skip an iteration of a loop if some expression evaluates to TRUE?
friends <- c("Phoebe", "Ross", "Mike", "Rachel", "Chandler", "Joey", "Monica")
my_string <- "So no one told you life was gonna be this way, "
for (friend in friends) {
  if (friend == "Mike") {
    # `next` skips over the rest of the code for this iteration
    # and continues to the next element
    next
  }
  print(paste0(my_string, friend, "!"))
}
[1] "So no one told you life was gonna be this way, Phoebe!"
[1] "So no one told you life was gonna be this way, Ross!"
[1] "So no one told you life was gonna be this way, Rachel!"
[1] "So no one told you life was gonna be this way, Chandler!"
[1] "So no one told you life was gonna be this way, Joey!"
[1] "So no one told you life was gonna be this way, Monica!"
Video Example 1 — Vectorized Functions and Loops
This is usually how we write loops in other languages if we want to add the first 10 billion integers.
mytotal <- 0
for (i in 1:10000000000) {
  mytotal <- mytotal + i
}
mytotal
[1] 5e+19
This works, but takes a long time to run. The sum function is vectorized, meaning it operates on an entire vector in a single call rather than one element at a time. It simply takes every integer in the parentheses and adds them all together.
sum(1:10000000000)
[1] 5e+19
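The same comparison can be tried at a scale that finishes instantly. The sketch below is an illustration with a smaller, arbitrarily chosen n, and it also checks both results against the closed-form formula n(n + 1)/2.

```r
n <- 1000  # small enough to run immediately; chosen only for illustration

# Loop version: add one integer at a time.
loop_total <- 0
for (i in 1:n) {
  loop_total <- loop_total + i
}

# Vectorized version: sum() handles the whole vector in one call.
vector_total <- sum(1:n)

# Closed-form check: 1 + 2 + ... + n equals n * (n + 1) / 2.
formula_total <- n * (n + 1) / 2

loop_total == vector_total    # TRUE
vector_total == formula_total # TRUE
```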
Video Example 2 — Grocery Store Averages, Loop vs. Non-loop
Let’s use some grocery store data to demonstrate the difference between loop strategies and non-loop strategies.
myDF <- read.csv("/class/datamine/data/8451/The_Complete_Journey_2_Master/5000_transactions.csv")
head(myDF)
  BASKET_NUM HSHD_NUM PURCHASE_ PRODUCT_NUM SPEND UNITS STORE_R WEEK_NUM YEAR
1         24     1809 03-JAN-16     5817389 -1.50    -1   SOUTH        1 2016
2         24     1809 03-JAN-16     5829886 -1.50    -1   SOUTH        1 2016
3         34     1253 03-JAN-16      539501  2.19     1    EAST        1 2016
4         60     1595 03-JAN-16     5260099  0.99     1    WEST        1 2016
5         60     1595 03-JAN-16     4535660  2.50     2    WEST        1 2016
6        168     3393 03-JAN-16     5602916  4.50     1   SOUTH        1 2016
This is how we would find the average cost per line in other languages, for instance, C/C++, Python, Java, etc.
The for loop used here visits each element of myDF$SPEND in turn, so it runs exactly length(myDF$SPEND) times.
amountspent <- 0 # we initialize a variable to keep track of the entire price of the purchases
numberofitems <- 0 # and we initialize a variable to keep track of the number of purchases
for (myprice in myDF$SPEND) {
  amountspent <- amountspent + myprice # we add the price of the current purchase
  numberofitems <- numberofitems + 1 # and we increment (by 1) the number of purchases processed so far
}
amountspent # this is the total amount spent on all purchases
[1] 3584366
numberofitems # this is the total number of purchases
[1] 1e+06
amountspent/numberofitems # so this is the average
[1] 3.584366
Now, that technically works, but it’s not efficient!
Let’s try using the mean function instead to get an average:
mean(myDF$SPEND)
[1] 3.584366
As we can see, the vectorized mean function accomplishes the same purpose far more efficiently, and in a single line.
The vector is the column myDF$SPEND (where myDF is a data frame and the $ allows us to specify the SPEND column in this data frame).
We can just focus our attention on that column from the data frame, and take a mean.
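For readers without access to the data file, the comparison can be reproduced on a tiny vector. The sketch below uses only the six SPEND values visible in the head(myDF) output; toy_spend and the other variable names are made up for this illustration.

```r
# The six SPEND values shown in the head(myDF) output.
toy_spend <- c(-1.50, -1.50, 2.19, 0.99, 2.50, 4.50)

# Loop version: accumulate the total and the count by hand.
total <- 0
count <- 0
for (price in toy_spend) {
  total <- total + price
  count <- count + 1
}
loop_mean <- total / count

# Vectorized version: one call does the same work.
vector_mean <- mean(toy_spend)

# all.equal() compares with a small tolerance, which suits decimal values.
all.equal(loop_mean, vector_mean)
```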
Video Example 3 — New Columns from Existing Data using Conditional Statements
We’re looking at grocery store information again for this example. This time, we have two days from which purchases are considered contaminated: July 5th-6th, 2016. Let’s refresh on how the data is formatted.
myDF <- read.csv("/class/datamine/data/8451/The_Complete_Journey_2_Master/5000_transactions.csv")
head(myDF)
  BASKET_NUM HSHD_NUM PURCHASE_ PRODUCT_NUM SPEND UNITS STORE_R WEEK_NUM YEAR
1         24     1809 03-JAN-16     5817389 -1.50    -1   SOUTH        1 2016
2         24     1809 03-JAN-16     5829886 -1.50    -1   SOUTH        1 2016
3         34     1253 03-JAN-16      539501  2.19     1    EAST        1 2016
4         60     1595 03-JAN-16     5260099  0.99     1    WEST        1 2016
5         60     1595 03-JAN-16     4535660  2.50     2    WEST        1 2016
6        168     3393 03-JAN-16     5602916  4.50     1   SOUTH        1 2016
We’ll make a vector called "mystatus" that matches the length of our existing data.frame, setting the default value to "safe" for every entry.
mystatus <- rep("safe", times=nrow(myDF))
rep is perfect for our needs: it repeats a value a given number of times.
Looking at PURCHASE_, we see that the format is DD-MMM-YY, corresponding to a 2-digit day, a 3-character month abbreviation, and the last two digits of the year. We can now change the elements of mystatus that correspond to purchases on 05-JUL-16 or on 06-JUL-16 to "contaminated".
mystatus[(myDF$PURCHASE_ == "05-JUL-16")|(myDF$PURCHASE_ == "06-JUL-16")] <- "contaminated"
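Remember that a comparison such as myDF$PURCHASE_ == "05-JUL-16" gives us a logical vector with one TRUE or FALSE per row, and since length(mystatus) == nrow(myDF), its positions line up with the entries of mystatus. We can use factor on mystatus to create a categorical vector, then add it to myDF as a new column named safetystatus.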
myDF$safetystatus <- factor(mystatus)
Now the head of the data.frame looks like this…
head(myDF)
  BASKET_NUM HSHD_NUM PURCHASE_ PRODUCT_NUM SPEND UNITS STORE_R WEEK_NUM YEAR
1         24     1809 03-JAN-16     5817389 -1.50    -1   SOUTH        1 2016
2         24     1809 03-JAN-16     5829886 -1.50    -1   SOUTH        1 2016
3         34     1253 03-JAN-16      539501  2.19     1    EAST        1 2016
4         60     1595 03-JAN-16     5260099  0.99     1    WEST        1 2016
5         60     1595 03-JAN-16     4535660  2.50     2    WEST        1 2016
6        168     3393 03-JAN-16     5602916  4.50     1   SOUTH        1 2016
  safetystatus
1         safe
2         safe
3         safe
4         safe
5         safe
6         safe
…and the distribution of contaminated and safe entries is as follows:
table(myDF$safetystatus)
contaminated         safe
        2459       997541
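The same labelling can also be written in one step with ifelse and %in%. This is an alternative sketch, not the approach used in the video, and it runs on a small made-up data frame (toyDF and bad_days are hypothetical names) so that it works without the data file.

```r
# Hypothetical four-row data frame with the same PURCHASE_ date format.
toyDF <- data.frame(
  PURCHASE_ = c("03-JAN-16", "05-JUL-16", "06-JUL-16", "07-JUL-16"),
  stringsAsFactors = FALSE
)

# The two dates treated as contaminated.
bad_days <- c("05-JUL-16", "06-JUL-16")

# %in% returns one TRUE/FALSE per row; ifelse picks a label for each.
toyDF$safetystatus <- factor(
  ifelse(toyDF$PURCHASE_ %in% bad_days, "contaminated", "safe")
)

toyDF
table(toyDF$safetystatus)
```

Using %in% avoids chaining several == comparisons with |, which becomes convenient once more than two dates are involved.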