Why write functions
If you ever find yourself doing a task a second time you should write a function. Functions are a nice way to quickly and consistently calculate something. They also provide you a nice way of organizing your code. This class will walk through the basics of programming functions and introduce purrr which enhances R’s functional programming (FP) toolkit by providing a complete and consistent set of tools for working with functions and vectors.
Functions in R
Creating a function
Here is how you can create a basic function in R
my_function = function(x){
y = x + 2
return(y)
}
I have include commands that you do not need, but they do help you keep track of what your function is doing. Here is a breakdown of the information:
1.my_function: This is the name 2. function(x): What I send to the function will be called x inside of the function 3. {code}: This is the code the function executes. 4. return(y): This is what your function returns.
Calling the function
To call a function, you just use the name of the function:
my_function(2)
## [1] 4
You will notice it returned 4.
You could also take a few shortcuts when writing functions
my_function = function(x){
x + 2
}
my_function(2)
## [1] 4
What about sending a lot of values:
Enter purrr. In the past you may have used lapply() or sapply() but get of the bench, get in the game, and start using purrr.
library(purrr)
Enter map
The map() functions transform their input by applying a function to each element and returning a vector the same length as the input.
The map() function is map(.x,.f,...) for each element of .x. do function .f
my_list = list(1,2,3)
my_list%>%
map(my_function)
## [[1]]
## [1] 3
##
## [[2]]
## [1] 4
##
## [[3]]
## [1] 5
With map I sent it a list and it returned a list.
my_list%>%
map(my_function)%>%
class()
## [1] "list"
If is send it a vector…it returns a list.
vector = c(1,2,3)
vector%>%
map(my_function)%>%
class()
## [1] "list"
What about returning numeric values.
map_dbl is like map, but it returns a numeric vector.
my_list%>%
map_dbl(my_function)
## [1] 3 4 5
You get a numeric output.
my_list%>%
map_dbl(my_function)%>%
class()
## [1] "numeric"
It works with lots of data types
df <- tibble(
a = rnorm(10),
b = rnorm(10),
c = rnorm(10),
d = rnorm(10)
)
df%>%
map_dbl(mean)
## a b c d
## -0.2510839 0.3848666 -0.2806608 0.6360516
Being Safe
We always need to be safe, so lets see how we can be safe with safely. This example seems trivial but as you expand your understanding of lists() and using map() it will help you figure out where and why your errors are occurring.
my_list = list("string",0.1,0.2)
my_function = function(x){
log(x)
}
my_list%>%
map(my_function)
## Error in log(x): non-numeric argument to mathematical function
No one likes when errors occur. Now lets be SAFE!!!
my_list%>%
map(safely(my_function))
## [[1]]
## [[1]]$result
## NULL
##
## [[1]]$error
## <simpleError in log(x): non-numeric argument to mathematical function>
##
##
## [[2]]
## [[2]]$result
## [1] -2.302585
##
## [[2]]$error
## NULL
##
##
## [[3]]
## [[3]]$result
## [1] -1.609438
##
## [[3]]$error
## NULL
my_list%>%
map(safely(my_function))%>%
transpose()
## $result
## $result[[1]]
## NULL
##
## $result[[2]]
## [1] -2.302585
##
## $result[[3]]
## [1] -1.609438
##
##
## $error
## $error[[1]]
## <simpleError in log(x): non-numeric argument to mathematical function>
##
## $error[[2]]
## NULL
##
## $error[[3]]
## NULL
Taking a side step to get some NFL data
There is a lot going on here, so I will not go into the webscrapping. The bottom line is, noone has created the following command: give_me_what_I_want(). You will see me use map2(), which is explained afterwards.
library(rvest)
library(stringr)
url="https://www.espn.com/nfl/stats/team/_/season/"
list_pages = function(x){
page = str_c(url, x, '/seasontype/2')
}
get_season_data = function(year,url){
stats = url%>%
read_html()%>%
html_nodes(".Table__Scroller div , .Table__Scroller .Table__sub-header .Table__TH")%>%
html_text()%>%
matrix(ncol = 9, byrow = TRUE)%>%
as_tibble()
team_names = url%>%
read_html()%>%
html_nodes(".Table--fixed-left .Table__TD , .Table__TH div")%>%
html_text()%>%
as_tibble()
season_stats = bind_cols(team_names,stats)
colnames(season_stats) = unlist(season_stats[1,])
season_stats%>%
slice(-1)%>%
write_csv(paste0("Off_Stats_",year,".csv"))
}
years = c(2017:2018)
years%>%
map(list_pages)%>%
map2(years,.,get_season_data)
## [[1]]
## # A tibble: 32 x 10
## Team GP YDS `YDS/G` YDS `YDS/G` YDS `YDS/G` PTS `PTS/G`
## <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 New Engla~ 16 6,307 394.2 4,418 276.1 1,889 118.1 458 28.6
## 2 New Orlea~ 16 6,259 391.2 4,189 261.8 2,070 129.4 448 28.0
## 3 Pittsburg~ 16 6,047 377.9 4,380 273.8 1,667 104.2 406 25.4
## 4 Los Angel~ 16 6,026 376.6 4,431 276.9 1,595 99.7 355 22.2
## 5 Kansas Ci~ 16 6,007 375.4 4,104 256.5 1,903 118.9 415 25.9
## 6 Jacksonvi~ 16 5,855 365.9 3,593 224.6 2,262 141.4 417 26.1
## 7 Philadelp~ 16 5,852 365.8 3,737 233.6 2,115 132.2 457 28.6
## 8 Atlanta F~ 16 5,837 364.8 3,990 249.4 1,847 115.4 353 22.1
## 9 Tampa Bay~ 16 5,816 363.5 4,366 272.9 1,450 90.6 335 20.9
## 10 Los Angel~ 16 5,784 361.5 3,831 239.4 1,953 122.1 478 29.9
## # ... with 22 more rows
##
## [[2]]
## # A tibble: 32 x 10
## Team GP YDS `YDS/G` YDS `YDS/G` YDS `YDS/G` PTS `PTS/G`
## <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 Kansas Ci~ 16 6,810 425.6 4,955 309.7 1,855 115.9 565 35.3
## 2 Los Angel~ 16 6,738 421.1 4,507 281.7 2,231 139.4 527 32.9
## 3 Tampa Bay~ 16 6,648 415.5 5,125 320.3 1,523 95.2 396 24.8
## 4 Pittsburg~ 16 6,453 403.3 5,008 313.0 1,445 90.3 428 26.8
## 5 New Engla~ 16 6,295 393.4 4,258 266.1 2,037 127.3 436 27.3
## 6 Atlanta F~ 16 6,226 389.1 4,653 290.8 1,573 98.3 414 25.9
## 7 Indianapo~ 16 6,179 386.2 4,461 278.8 1,718 107.4 433 27.1
## 8 New Orlea~ 16 6,067 379.2 4,042 252.6 2,025 126.6 504 31.5
## 9 Baltimore~ 16 5,999 374.9 3,558 222.4 2,441 152.6 389 24.3
## 10 Carolina ~ 16 5,972 373.3 3,836 239.8 2,136 133.5 376 23.5
## # ... with 22 more rows
# Without piping...
pages = years%>%
map(list_pages)
map2(years,pages,get_season_data)
## [[1]]
## # A tibble: 32 x 10
## Team GP YDS `YDS/G` YDS `YDS/G` YDS `YDS/G` PTS `PTS/G`
## <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 New Engla~ 16 6,307 394.2 4,418 276.1 1,889 118.1 458 28.6
## 2 New Orlea~ 16 6,259 391.2 4,189 261.8 2,070 129.4 448 28.0
## 3 Pittsburg~ 16 6,047 377.9 4,380 273.8 1,667 104.2 406 25.4
## 4 Los Angel~ 16 6,026 376.6 4,431 276.9 1,595 99.7 355 22.2
## 5 Kansas Ci~ 16 6,007 375.4 4,104 256.5 1,903 118.9 415 25.9
## 6 Jacksonvi~ 16 5,855 365.9 3,593 224.6 2,262 141.4 417 26.1
## 7 Philadelp~ 16 5,852 365.8 3,737 233.6 2,115 132.2 457 28.6
## 8 Atlanta F~ 16 5,837 364.8 3,990 249.4 1,847 115.4 353 22.1
## 9 Tampa Bay~ 16 5,816 363.5 4,366 272.9 1,450 90.6 335 20.9
## 10 Los Angel~ 16 5,784 361.5 3,831 239.4 1,953 122.1 478 29.9
## # ... with 22 more rows
##
## [[2]]
## # A tibble: 32 x 10
## Team GP YDS `YDS/G` YDS `YDS/G` YDS `YDS/G` PTS `PTS/G`
## <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 Kansas Ci~ 16 6,810 425.6 4,955 309.7 1,855 115.9 565 35.3
## 2 Los Angel~ 16 6,738 421.1 4,507 281.7 2,231 139.4 527 32.9
## 3 Tampa Bay~ 16 6,648 415.5 5,125 320.3 1,523 95.2 396 24.8
## 4 Pittsburg~ 16 6,453 403.3 5,008 313.0 1,445 90.3 428 26.8
## 5 New Engla~ 16 6,295 393.4 4,258 266.1 2,037 127.3 436 27.3
## 6 Atlanta F~ 16 6,226 389.1 4,653 290.8 1,573 98.3 414 25.9
## 7 Indianapo~ 16 6,179 386.2 4,461 278.8 1,718 107.4 433 27.1
## 8 New Orlea~ 16 6,067 379.2 4,042 252.6 2,025 126.6 504 31.5
## 9 Baltimore~ 16 5,999 374.9 3,558 222.4 2,441 152.6 389 24.3
## 10 Carolina ~ 16 5,972 373.3 3,836 239.8 2,136 133.5 376 23.5
## # ... with 22 more rows
map2() and pmap()
You also might want to send multiple lists at one time. In the past you would have had to use multiple for loops.
map2()- you send 2 listspmap()- you send multiple lists
Above I sent years and pages to my function.
What about mappers
A mapper is an anonymous function. That is just cool sounding for you never create the function. Here is an example of me putting a function inside of map.
list.files(pattern = "*.csv")%>%
map(function(file_name){
assign(x = str_extract(file_name,"[^.]+"),
value = read_csv(file_name),
envir = .GlobalEnv)
})
## [[1]]
## # A tibble: 32 x 10
## Team GP YDS `YDS/G` YDS_1 `YDS/G_1` YDS_2 `YDS/G_2` PTS `PTS/G`
## <chr> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <dbl>
## 1 New E~ 16 6307 394. 4418 276. 1889 118. 458 28.6
## 2 New O~ 16 6259 391. 4189 262. 2070 129. 448 28
## 3 Pitts~ 16 6047 378. 4380 274. 1667 104. 406 25.4
## 4 Los A~ 16 6026 377. 4431 277. 1595 99.7 355 22.2
## 5 Kansa~ 16 6007 375. 4104 256. 1903 119. 415 25.9
## 6 Jacks~ 16 5855 366. 3593 225. 2262 141. 417 26.1
## 7 Phila~ 16 5852 366. 3737 234. 2115 132. 457 28.6
## 8 Atlan~ 16 5837 365. 3990 249. 1847 115. 353 22.1
## 9 Tampa~ 16 5816 364. 4366 273. 1450 90.6 335 20.9
## 10 Los A~ 16 5784 362. 3831 239. 1953 122. 478 29.9
## # ... with 22 more rows
##
## [[2]]
## # A tibble: 32 x 10
## Team GP YDS `YDS/G` YDS_1 `YDS/G_1` YDS_2 `YDS/G_2` PTS `PTS/G`
## <chr> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <dbl>
## 1 Kansa~ 16 6810 426. 4955 310. 1855 116. 565 35.3
## 2 Los A~ 16 6738 421. 4507 282. 2231 139. 527 32.9
## 3 Tampa~ 16 6648 416. 5125 320. 1523 95.2 396 24.8
## 4 Pitts~ 16 6453 403. 5008 313 1445 90.3 428 26.8
## 5 New E~ 16 6295 393. 4258 266. 2037 127. 436 27.3
## 6 Atlan~ 16 6226 389. 4653 291. 1573 98.3 414 25.9
## 7 India~ 16 6179 386. 4461 279. 1718 107. 433 27.1
## 8 New O~ 16 6067 379. 4042 253. 2025 127. 504 31.5
## 9 Balti~ 16 5999 375. 3558 222. 2441 153. 389 24.3
## 10 Carol~ 16 5972 373. 3836 240. 2136 134. 376 23.5
## # ... with 22 more rows
Now here is the same thing, but without using a function inside of map but using a mapper.
list.files(pattern = "*.csv")%>%
map(~assign(x = str_extract(.,"[^.]+"),
value = read_csv(.),
envir = .GlobalEnv)
)
## [[1]]
## # A tibble: 32 x 10
## Team GP YDS `YDS/G` YDS_1 `YDS/G_1` YDS_2 `YDS/G_2` PTS `PTS/G`
## <chr> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <dbl>
## 1 New E~ 16 6307 394. 4418 276. 1889 118. 458 28.6
## 2 New O~ 16 6259 391. 4189 262. 2070 129. 448 28
## 3 Pitts~ 16 6047 378. 4380 274. 1667 104. 406 25.4
## 4 Los A~ 16 6026 377. 4431 277. 1595 99.7 355 22.2
## 5 Kansa~ 16 6007 375. 4104 256. 1903 119. 415 25.9
## 6 Jacks~ 16 5855 366. 3593 225. 2262 141. 417 26.1
## 7 Phila~ 16 5852 366. 3737 234. 2115 132. 457 28.6
## 8 Atlan~ 16 5837 365. 3990 249. 1847 115. 353 22.1
## 9 Tampa~ 16 5816 364. 4366 273. 1450 90.6 335 20.9
## 10 Los A~ 16 5784 362. 3831 239. 1953 122. 478 29.9
## # ... with 22 more rows
##
## [[2]]
## # A tibble: 32 x 10
## Team GP YDS `YDS/G` YDS_1 `YDS/G_1` YDS_2 `YDS/G_2` PTS `PTS/G`
## <chr> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <dbl>
## 1 Kansa~ 16 6810 426. 4955 310. 1855 116. 565 35.3
## 2 Los A~ 16 6738 421. 4507 282. 2231 139. 527 32.9
## 3 Tampa~ 16 6648 416. 5125 320. 1523 95.2 396 24.8
## 4 Pitts~ 16 6453 403. 5008 313 1445 90.3 428 26.8
## 5 New E~ 16 6295 393. 4258 266. 2037 127. 436 27.3
## 6 Atlan~ 16 6226 389. 4653 291. 1573 98.3 414 25.9
## 7 India~ 16 6179 386. 4461 279. 1718 107. 433 27.1
## 8 New O~ 16 6067 379. 4042 253. 2025 127. 504 31.5
## 9 Balti~ 16 5999 375. 3558 222. 2441 153. 389 24.3
## 10 Carol~ 16 5972 373. 3836 240. 2136 134. 376 23.5
## # ... with 22 more rows
Just make a mapper!!!
You can also make mapper objects.
read_my_csv = as_mapper(~assign(x = str_extract(.x,"[^.]+"),
value = read_csv(.x),
envir = .GlobalEnv))
You can use mappers instead of functions.
list.files(pattern = "*.csv")%>%
map(read_my_csv)
## [[1]]
## # A tibble: 32 x 10
## Team GP YDS `YDS/G` YDS_1 `YDS/G_1` YDS_2 `YDS/G_2` PTS `PTS/G`
## <chr> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <dbl>
## 1 New E~ 16 6307 394. 4418 276. 1889 118. 458 28.6
## 2 New O~ 16 6259 391. 4189 262. 2070 129. 448 28
## 3 Pitts~ 16 6047 378. 4380 274. 1667 104. 406 25.4
## 4 Los A~ 16 6026 377. 4431 277. 1595 99.7 355 22.2
## 5 Kansa~ 16 6007 375. 4104 256. 1903 119. 415 25.9
## 6 Jacks~ 16 5855 366. 3593 225. 2262 141. 417 26.1
## 7 Phila~ 16 5852 366. 3737 234. 2115 132. 457 28.6
## 8 Atlan~ 16 5837 365. 3990 249. 1847 115. 353 22.1
## 9 Tampa~ 16 5816 364. 4366 273. 1450 90.6 335 20.9
## 10 Los A~ 16 5784 362. 3831 239. 1953 122. 478 29.9
## # ... with 22 more rows
##
## [[2]]
## # A tibble: 32 x 10
## Team GP YDS `YDS/G` YDS_1 `YDS/G_1` YDS_2 `YDS/G_2` PTS `PTS/G`
## <chr> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <dbl>
## 1 Kansa~ 16 6810 426. 4955 310. 1855 116. 565 35.3
## 2 Los A~ 16 6738 421. 4507 282. 2231 139. 527 32.9
## 3 Tampa~ 16 6648 416. 5125 320. 1523 95.2 396 24.8
## 4 Pitts~ 16 6453 403. 5008 313 1445 90.3 428 26.8
## 5 New E~ 16 6295 393. 4258 266. 2037 127. 436 27.3
## 6 Atlan~ 16 6226 389. 4653 291. 1573 98.3 414 25.9
## 7 India~ 16 6179 386. 4461 279. 1718 107. 433 27.1
## 8 New O~ 16 6067 379. 4042 253. 2025 127. 504 31.5
## 9 Balti~ 16 5999 375. 3558 222. 2441 153. 389 24.3
## 10 Carol~ 16 5972 373. 3836 240. 2136 134. 376 23.5
## # ... with 22 more rows
Why mappers instead of functions
as_mapper creats mappers using {rlang} as_function This turns your formula into a function.
If you use a defualt function in map you are using a mapper!

Twitter
Facebook
Reddit
LinkedIn
StumbleUpon
Pinterest
Email