Bryan Adams' Blog

Beat Navy!

Why write functions

If you ever find yourself doing a task a second time, you should write a function. Functions are a quick and consistent way to calculate something, and they also give you a way to organize your code. This class walks through the basics of writing functions and introduces purrr, which enhances R’s functional programming (FP) toolkit by providing a complete and consistent set of tools for working with functions and vectors.

Functions in R

Creating a function

Here is how you can create a basic function in R:

my_function = function(x){
  
  y = x + 2
  return(y)
}

I have included commands that you do not strictly need, but they help you keep track of what your function is doing. Here is a breakdown of the pieces:

  1. my_function - this is the name of the function
  2. function(x) - whatever I send to the function will be called x inside the function
  3. {code} - this is the code the function executes
  4. return(y) - this is what your function returns

Calling the function

To call a function, you just use the name of the function:

my_function(2)
## [1] 4

You will notice it returned 4.

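You can also take a few shortcuts when writing functions. R returns the value of the last expression it evaluates, so the intermediate variable and the return() call are optional: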

my_function = function(x){
  
  x + 2
  
}

my_function(2)
## [1] 4
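
As a quick check, the two styles can be compared side by side. This is only a minimal sketch; the names add_two_explicit and add_two_implicit are made up for illustration and do not appear elsewhere in this post.

```r
# Hypothetical names, used only for this comparison.
add_two_explicit = function(x){
  y = x + 2
  return(y)
}

# The last evaluated expression is returned automatically,
# so return() and the intermediate variable can be dropped.
add_two_implicit = function(x){
  x + 2
}

identical(add_two_explicit(2), add_two_implicit(2))
## [1] TRUE
```

Both versions behave the same; the explicit form is just easier to read while you are learning.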

What about sending a lot of values?

Enter purrr. In the past you may have used lapply() or sapply(), but get off the bench, get in the game, and start using purrr.

library(purrr)

Enter map

The map() functions transform their input by applying a function to each element and returning a vector the same length as the input.

The signature is map(.x, .f, ...): for each element of .x, apply the function .f.

my_list = list(1,2,3)

my_list%>%
  map(my_function)
## [[1]]
## [1] 3
## 
## [[2]]
## [1] 4
## 
## [[3]]
## [1] 5

With map I sent it a list and it returned a list.

my_list%>%
  map(my_function)%>%
  class()
## [1] "list"

If I send it a vector, it still returns a list.

vector = c(1,2,3)

vector%>%
  map(my_function)%>%
  class()
## [1] "list"

What about returning numeric values?

map_dbl is like map, but it returns a numeric vector.

my_list%>%
  map_dbl(my_function)
## [1] 3 4 5

You get a numeric output.

my_list%>%
  map_dbl(my_function)%>%
  class()
## [1] "numeric"
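
purrr ships sibling functions for other output types as well. The sketch below assumes you want character, logical, and integer vectors; the variable names (values, tags, is_big, sizes) are invented for this example.

```r
library(purrr)

values = list(1, 2, 3)

# map_chr() returns a character vector
tags = map_chr(values, function(x) paste0("item_", x))

# map_lgl() returns a logical vector
is_big = map_lgl(values, function(x) x > 1)

# map_int() returns an integer vector
sizes = map_int(list(1:3, 1:5), length)

class(tags)
## [1] "character"
class(is_big)
## [1] "logical"
class(sizes)
## [1] "integer"
```

Each variant throws an error if the function returns something that does not fit the requested type, which is a handy early warning.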

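It works with lots of data types. A data frame is a list of columns, so map_dbl() applies the function to each column:

library(tibble)

df <- tibble(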
  a = rnorm(10),
  b = rnorm(10),
  c = rnorm(10),
  d = rnorm(10)
)

df%>%
  map_dbl(mean)
##          a          b          c          d 
## -0.2510839  0.3848666 -0.2806608  0.6360516
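
Because rnorm() gives different numbers on every run, here is a small deterministic sketch of the same idea. The data frame scores is made up for illustration, and the means are easy to verify by hand.

```r
library(purrr)

# Fixed values so the column means can be checked by hand.
scores = data.frame(a = c(1, 2, 3), b = c(10, 20, 30))

# A data frame is a list of columns, so mean() runs once per column.
col_means = map_dbl(scores, mean)
col_means
##  a  b 
##  2 20 
```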

Being Safe

We always need to be safe, so let's see how we can be safe with safely(). This example seems trivial, but as you expand your understanding of lists and map(), it will help you figure out where and why your errors are occurring.

my_list = list("string",0.1,0.2)

my_function = function(x){
  log(x)
}

my_list%>%
  map(my_function)
## Error in log(x): non-numeric argument to mathematical function

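No one likes it when errors occur. Now let's be SAFE!!! Wrapping the function in safely() makes every call return a list with a result and an error, instead of stopping at the first failure.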

my_list%>%
  map(safely(my_function))
## [[1]]
## [[1]]$result
## NULL
## 
## [[1]]$error
## <simpleError in log(x): non-numeric argument to mathematical function>
## 
## 
## [[2]]
## [[2]]$result
## [1] -2.302585
## 
## [[2]]$error
## NULL
## 
## 
## [[3]]
## [[3]]$result
## [1] -1.609438
## 
## [[3]]$error
## NULL
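
transpose() turns that list inside out, so all of the results sit together and all of the errors sit together:

my_list%>%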
  map(safely(my_function))%>%
  transpose()
## $result
## $result[[1]]
## NULL
## 
## $result[[2]]
## [1] -2.302585
## 
## $result[[3]]
## [1] -1.609438
## 
## 
## $error
## $error[[1]]
## <simpleError in log(x): non-numeric argument to mathematical function>
## 
## $error[[2]]
## NULL
## 
## $error[[3]]
## NULL
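
Once the errors are captured, a natural next step is to find which inputs failed and keep the values that worked. This is a rough sketch of one way to do that, not the only one; the names inputs, out, ok, failed, and good are invented for the example.

```r
library(purrr)

inputs = list("string", 0.1, 0.2)

# safely() wraps log() so each call returns list(result, error)
out = map(inputs, safely(log))

# TRUE where the call succeeded (no error was captured)
ok = map_lgl(out, function(o) is.null(o$error))

# Positions of the inputs that failed
failed = which(!ok)
failed
## [1] 1

# Keep only the successful results, pulled out by name
good = map_dbl(out[ok], "result")
length(good)
## [1] 2
```

Here the first element ("string") is flagged as the culprit, and the two numeric inputs come back as an ordinary numeric vector.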

Taking a side step to get some NFL data

There is a lot going on here, so I will not go into the web scraping. The bottom line is, no one has created the following command: give_me_what_I_want(). You will see me use map2(), which is explained afterwards.

library(rvest)
library(stringr)
library(dplyr)   # bind_cols(), slice(), as_tibble()
library(readr)   # write_csv(), read_csv()

url="https://www.espn.com/nfl/stats/team/_/season/"

list_pages = function(x){
  
  page = str_c(url, x, '/seasontype/2')
  
}

get_season_data = function(year,url){
  
  stats = url%>%
    read_html()%>%
    html_nodes(".Table__Scroller div , .Table__Scroller .Table__sub-header .Table__TH")%>%
    html_text()%>%
    matrix(ncol = 9, byrow = TRUE)%>%
    as_tibble()

  team_names = url%>%
    read_html()%>%
    html_nodes(".Table--fixed-left .Table__TD , .Table__TH div")%>%
    html_text()%>%
    as_tibble()

  season_stats = bind_cols(team_names,stats)

  colnames(season_stats) = unlist(season_stats[1,])

  season_stats%>%
    slice(-1)%>%
    write_csv(paste0("Off_Stats_",year,".csv"))
  
}

years = c(2017:2018)

years%>%
  map(list_pages)%>%
  map2(years,.,get_season_data)
## [[1]]
## # A tibble: 32 x 10
##    Team       GP    YDS   `YDS/G` YDS   `YDS/G` YDS   `YDS/G` PTS   `PTS/G`
##    <chr>      <chr> <chr> <chr>   <chr> <chr>   <chr> <chr>   <chr> <chr>  
##  1 New Engla~ 16    6,307 394.2   4,418 276.1   1,889 118.1   458   28.6   
##  2 New Orlea~ 16    6,259 391.2   4,189 261.8   2,070 129.4   448   28.0   
##  3 Pittsburg~ 16    6,047 377.9   4,380 273.8   1,667 104.2   406   25.4   
##  4 Los Angel~ 16    6,026 376.6   4,431 276.9   1,595 99.7    355   22.2   
##  5 Kansas Ci~ 16    6,007 375.4   4,104 256.5   1,903 118.9   415   25.9   
##  6 Jacksonvi~ 16    5,855 365.9   3,593 224.6   2,262 141.4   417   26.1   
##  7 Philadelp~ 16    5,852 365.8   3,737 233.6   2,115 132.2   457   28.6   
##  8 Atlanta F~ 16    5,837 364.8   3,990 249.4   1,847 115.4   353   22.1   
##  9 Tampa Bay~ 16    5,816 363.5   4,366 272.9   1,450 90.6    335   20.9   
## 10 Los Angel~ 16    5,784 361.5   3,831 239.4   1,953 122.1   478   29.9   
## # ... with 22 more rows
## 
## [[2]]
## # A tibble: 32 x 10
##    Team       GP    YDS   `YDS/G` YDS   `YDS/G` YDS   `YDS/G` PTS   `PTS/G`
##    <chr>      <chr> <chr> <chr>   <chr> <chr>   <chr> <chr>   <chr> <chr>  
##  1 Kansas Ci~ 16    6,810 425.6   4,955 309.7   1,855 115.9   565   35.3   
##  2 Los Angel~ 16    6,738 421.1   4,507 281.7   2,231 139.4   527   32.9   
##  3 Tampa Bay~ 16    6,648 415.5   5,125 320.3   1,523 95.2    396   24.8   
##  4 Pittsburg~ 16    6,453 403.3   5,008 313.0   1,445 90.3    428   26.8   
##  5 New Engla~ 16    6,295 393.4   4,258 266.1   2,037 127.3   436   27.3   
##  6 Atlanta F~ 16    6,226 389.1   4,653 290.8   1,573 98.3    414   25.9   
##  7 Indianapo~ 16    6,179 386.2   4,461 278.8   1,718 107.4   433   27.1   
##  8 New Orlea~ 16    6,067 379.2   4,042 252.6   2,025 126.6   504   31.5   
##  9 Baltimore~ 16    5,999 374.9   3,558 222.4   2,441 152.6   389   24.3   
## 10 Carolina ~ 16    5,972 373.3   3,836 239.8   2,136 133.5   376   23.5   
## # ... with 22 more rows

# Without piping...

pages = years%>%
  map(list_pages)

map2(years,pages,get_season_data)
## [[1]]
## # A tibble: 32 x 10
##    Team       GP    YDS   `YDS/G` YDS   `YDS/G` YDS   `YDS/G` PTS   `PTS/G`
##    <chr>      <chr> <chr> <chr>   <chr> <chr>   <chr> <chr>   <chr> <chr>  
##  1 New Engla~ 16    6,307 394.2   4,418 276.1   1,889 118.1   458   28.6   
##  2 New Orlea~ 16    6,259 391.2   4,189 261.8   2,070 129.4   448   28.0   
##  3 Pittsburg~ 16    6,047 377.9   4,380 273.8   1,667 104.2   406   25.4   
##  4 Los Angel~ 16    6,026 376.6   4,431 276.9   1,595 99.7    355   22.2   
##  5 Kansas Ci~ 16    6,007 375.4   4,104 256.5   1,903 118.9   415   25.9   
##  6 Jacksonvi~ 16    5,855 365.9   3,593 224.6   2,262 141.4   417   26.1   
##  7 Philadelp~ 16    5,852 365.8   3,737 233.6   2,115 132.2   457   28.6   
##  8 Atlanta F~ 16    5,837 364.8   3,990 249.4   1,847 115.4   353   22.1   
##  9 Tampa Bay~ 16    5,816 363.5   4,366 272.9   1,450 90.6    335   20.9   
## 10 Los Angel~ 16    5,784 361.5   3,831 239.4   1,953 122.1   478   29.9   
## # ... with 22 more rows
## 
## [[2]]
## # A tibble: 32 x 10
##    Team       GP    YDS   `YDS/G` YDS   `YDS/G` YDS   `YDS/G` PTS   `PTS/G`
##    <chr>      <chr> <chr> <chr>   <chr> <chr>   <chr> <chr>   <chr> <chr>  
##  1 Kansas Ci~ 16    6,810 425.6   4,955 309.7   1,855 115.9   565   35.3   
##  2 Los Angel~ 16    6,738 421.1   4,507 281.7   2,231 139.4   527   32.9   
##  3 Tampa Bay~ 16    6,648 415.5   5,125 320.3   1,523 95.2    396   24.8   
##  4 Pittsburg~ 16    6,453 403.3   5,008 313.0   1,445 90.3    428   26.8   
##  5 New Engla~ 16    6,295 393.4   4,258 266.1   2,037 127.3   436   27.3   
##  6 Atlanta F~ 16    6,226 389.1   4,653 290.8   1,573 98.3    414   25.9   
##  7 Indianapo~ 16    6,179 386.2   4,461 278.8   1,718 107.4   433   27.1   
##  8 New Orlea~ 16    6,067 379.2   4,042 252.6   2,025 126.6   504   31.5   
##  9 Baltimore~ 16    5,999 374.9   3,558 222.4   2,441 152.6   389   24.3   
## 10 Carolina ~ 16    5,972 373.3   3,836 239.8   2,136 133.5   376   23.5   
## # ... with 22 more rows
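
If the scraping step feels like a black box, the URL-building part can be checked on its own without touching the network. The sketch below mirrors list_pages(), but passes the base URL in explicitly; build_page and base_url are names invented for this example.

```r
library(purrr)
library(stringr)

base_url = "https://www.espn.com/nfl/stats/team/_/season/"

# Same idea as list_pages(), with the base URL as an argument
# instead of a global variable.
build_page = function(year, base){
  str_c(base, year, "/seasontype/2")
}

pages = map_chr(2017:2018, function(y) build_page(y, base_url))
pages
## [1] "https://www.espn.com/nfl/stats/team/_/season/2017/seasontype/2"
## [2] "https://www.espn.com/nfl/stats/team/_/season/2018/seasontype/2"
```

Using map_chr() here gives a plain character vector of URLs rather than a list, which is a little easier to inspect.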

map2() and pmap()

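You also might want to send multiple lists at one time, so the function sees the first element of each list, then the second element of each, and so on. In the past you would probably have written a for loop over the indices or used mapply().

  1. map2() - you send 2 lists
  2. pmap() - you send any number of lists, bundled together in one list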

Above I sent years and pages to my function.
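
A smaller example makes the pairing easier to see. This is a minimal sketch with made-up lists x, y, and z; the _dbl variants are used so the output prints as a numeric vector.

```r
library(purrr)

x = list(1, 2, 3)
y = list(10, 20, 30)
z = list(100, 200, 300)

# map2(): the function receives one element from each of two lists
sum_two = map2_dbl(x, y, function(a, b) a + b)
sum_two
## [1] 11 22 33

# pmap(): the lists are bundled into one list, and the function
# receives one element from each of them
sum_three = pmap_dbl(list(x, y, z), function(a, b, c) a + b + c)
sum_three
## [1] 111 222 333
```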

What about mappers

A mapper is an anonymous function. That is just a cool-sounding way of saying you never give the function a name. Here is an example of me putting a function inside of map().

list.files(pattern = "\\.csv$")%>%
  map(function(file_name){
    assign(x = str_extract(file_name,"[^.]+"),
           value = read_csv(file_name),
           envir = .GlobalEnv)
  })
## [[1]]
## # A tibble: 32 x 10
##    Team      GP   YDS `YDS/G` YDS_1 `YDS/G_1` YDS_2 `YDS/G_2`   PTS `PTS/G`
##    <chr>  <int> <dbl>   <dbl> <dbl>     <dbl> <dbl>     <dbl> <int>   <dbl>
##  1 New E~    16  6307    394.  4418      276.  1889     118.    458    28.6
##  2 New O~    16  6259    391.  4189      262.  2070     129.    448    28  
##  3 Pitts~    16  6047    378.  4380      274.  1667     104.    406    25.4
##  4 Los A~    16  6026    377.  4431      277.  1595      99.7   355    22.2
##  5 Kansa~    16  6007    375.  4104      256.  1903     119.    415    25.9
##  6 Jacks~    16  5855    366.  3593      225.  2262     141.    417    26.1
##  7 Phila~    16  5852    366.  3737      234.  2115     132.    457    28.6
##  8 Atlan~    16  5837    365.  3990      249.  1847     115.    353    22.1
##  9 Tampa~    16  5816    364.  4366      273.  1450      90.6   335    20.9
## 10 Los A~    16  5784    362.  3831      239.  1953     122.    478    29.9
## # ... with 22 more rows
## 
## [[2]]
## # A tibble: 32 x 10
##    Team      GP   YDS `YDS/G` YDS_1 `YDS/G_1` YDS_2 `YDS/G_2`   PTS `PTS/G`
##    <chr>  <int> <dbl>   <dbl> <dbl>     <dbl> <dbl>     <dbl> <int>   <dbl>
##  1 Kansa~    16  6810    426.  4955      310.  1855     116.    565    35.3
##  2 Los A~    16  6738    421.  4507      282.  2231     139.    527    32.9
##  3 Tampa~    16  6648    416.  5125      320.  1523      95.2   396    24.8
##  4 Pitts~    16  6453    403.  5008      313   1445      90.3   428    26.8
##  5 New E~    16  6295    393.  4258      266.  2037     127.    436    27.3
##  6 Atlan~    16  6226    389.  4653      291.  1573      98.3   414    25.9
##  7 India~    16  6179    386.  4461      279.  1718     107.    433    27.1
##  8 New O~    16  6067    379.  4042      253.  2025     127.    504    31.5
##  9 Balti~    16  5999    375.  3558      222.  2441     153.    389    24.3
## 10 Carol~    16  5972    373.  3836      240.  2136     134.    376    23.5
## # ... with 22 more rows

Now here is the same thing, but with a mapper (the ~ formula shorthand) instead of a full function definition inside of map(). The dot stands for the current element.

list.files(pattern = "\\.csv$")%>%
  map(~assign(x = str_extract(.,"[^.]+"),
           value = read_csv(.),
           envir = .GlobalEnv)
  )
## [[1]]
## # A tibble: 32 x 10
##    Team      GP   YDS `YDS/G` YDS_1 `YDS/G_1` YDS_2 `YDS/G_2`   PTS `PTS/G`
##    <chr>  <int> <dbl>   <dbl> <dbl>     <dbl> <dbl>     <dbl> <int>   <dbl>
##  1 New E~    16  6307    394.  4418      276.  1889     118.    458    28.6
##  2 New O~    16  6259    391.  4189      262.  2070     129.    448    28  
##  3 Pitts~    16  6047    378.  4380      274.  1667     104.    406    25.4
##  4 Los A~    16  6026    377.  4431      277.  1595      99.7   355    22.2
##  5 Kansa~    16  6007    375.  4104      256.  1903     119.    415    25.9
##  6 Jacks~    16  5855    366.  3593      225.  2262     141.    417    26.1
##  7 Phila~    16  5852    366.  3737      234.  2115     132.    457    28.6
##  8 Atlan~    16  5837    365.  3990      249.  1847     115.    353    22.1
##  9 Tampa~    16  5816    364.  4366      273.  1450      90.6   335    20.9
## 10 Los A~    16  5784    362.  3831      239.  1953     122.    478    29.9
## # ... with 22 more rows
## 
## [[2]]
## # A tibble: 32 x 10
##    Team      GP   YDS `YDS/G` YDS_1 `YDS/G_1` YDS_2 `YDS/G_2`   PTS `PTS/G`
##    <chr>  <int> <dbl>   <dbl> <dbl>     <dbl> <dbl>     <dbl> <int>   <dbl>
##  1 Kansa~    16  6810    426.  4955      310.  1855     116.    565    35.3
##  2 Los A~    16  6738    421.  4507      282.  2231     139.    527    32.9
##  3 Tampa~    16  6648    416.  5125      320.  1523      95.2   396    24.8
##  4 Pitts~    16  6453    403.  5008      313   1445      90.3   428    26.8
##  5 New E~    16  6295    393.  4258      266.  2037     127.    436    27.3
##  6 Atlan~    16  6226    389.  4653      291.  1573      98.3   414    25.9
##  7 India~    16  6179    386.  4461      279.  1718     107.    433    27.1
##  8 New O~    16  6067    379.  4042      253.  2025     127.    504    31.5
##  9 Balti~    16  5999    375.  3558      222.  2441     153.    389    24.3
## 10 Carol~    16  5972    373.  3836      240.  2136     134.    376    23.5
## # ... with 22 more rows

Just make a mapper!!!

You can also make mapper objects.

read_my_csv = as_mapper(~assign(x = str_extract(.x,"[^.]+"),
           value = read_csv(.x),
           envir = .GlobalEnv))

You can use mappers instead of functions.

list.files(pattern = "\\.csv$")%>%
  map(read_my_csv)
## [[1]]
## # A tibble: 32 x 10
##    Team      GP   YDS `YDS/G` YDS_1 `YDS/G_1` YDS_2 `YDS/G_2`   PTS `PTS/G`
##    <chr>  <int> <dbl>   <dbl> <dbl>     <dbl> <dbl>     <dbl> <int>   <dbl>
##  1 New E~    16  6307    394.  4418      276.  1889     118.    458    28.6
##  2 New O~    16  6259    391.  4189      262.  2070     129.    448    28  
##  3 Pitts~    16  6047    378.  4380      274.  1667     104.    406    25.4
##  4 Los A~    16  6026    377.  4431      277.  1595      99.7   355    22.2
##  5 Kansa~    16  6007    375.  4104      256.  1903     119.    415    25.9
##  6 Jacks~    16  5855    366.  3593      225.  2262     141.    417    26.1
##  7 Phila~    16  5852    366.  3737      234.  2115     132.    457    28.6
##  8 Atlan~    16  5837    365.  3990      249.  1847     115.    353    22.1
##  9 Tampa~    16  5816    364.  4366      273.  1450      90.6   335    20.9
## 10 Los A~    16  5784    362.  3831      239.  1953     122.    478    29.9
## # ... with 22 more rows
## 
## [[2]]
## # A tibble: 32 x 10
##    Team      GP   YDS `YDS/G` YDS_1 `YDS/G_1` YDS_2 `YDS/G_2`   PTS `PTS/G`
##    <chr>  <int> <dbl>   <dbl> <dbl>     <dbl> <dbl>     <dbl> <int>   <dbl>
##  1 Kansa~    16  6810    426.  4955      310.  1855     116.    565    35.3
##  2 Los A~    16  6738    421.  4507      282.  2231     139.    527    32.9
##  3 Tampa~    16  6648    416.  5125      320.  1523      95.2   396    24.8
##  4 Pitts~    16  6453    403.  5008      313   1445      90.3   428    26.8
##  5 New E~    16  6295    393.  4258      266.  2037     127.    436    27.3
##  6 Atlan~    16  6226    389.  4653      291.  1573      98.3   414    25.9
##  7 India~    16  6179    386.  4461      279.  1718     107.    433    27.1
##  8 New O~    16  6067    379.  4042      253.  2025     127.    504    31.5
##  9 Balti~    16  5999    375.  3558      222.  2441     153.    389    24.3
## 10 Carol~    16  5972    373.  3836      240.  2136     134.    376    23.5
## # ... with 22 more rows

Why mappers instead of functions

as_mapper() creates mappers using as_function() from {rlang}, which turns your formula into a function.

map() runs whatever you give it as .f through as_mapper(), so even if you use a default function in map() you are using a mapper!
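
To see the conversion on something simpler than reading files, here is a minimal sketch. The name add_two is made up for this example.

```r
library(purrr)

# as_mapper() converts the formula into a regular function
add_two = as_mapper(~ .x + 2)

is.function(add_two)
## [1] TRUE

add_two(1)
## [1] 3

# The stored mapper and the inline formula give the same answer
map_dbl(c(1, 2, 3), add_two)
## [1] 3 4 5
map_dbl(c(1, 2, 3), ~ .x + 2)
## [1] 3 4 5
```

Storing the mapper in an object is handy when the same formula would otherwise be repeated in several map() calls.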

About

I am an assistant professor at the United States Military Academy. I currently teach MA206Y: Introduction to Data Science and Statistics. I have served in the Army for 11 years and have a beautiful wife and two wonderful children.