class: center, middle, inverse, title-slide

# Introduction to R
## A mini workshop
### Harm H. Schuett
### Tilburg University
### 2022-02-22

---
class: left, top

# Why R?

<style type="text/css">
.regression table {
  font-size: 12px;
}
.tiny .remark-code { /*Change made here*/
  font-size: 50% !important;
}
</style>

.pull-left[

Because it is easy:

```r
# a good life motto
if (sad == TRUE) {
  stop(sad)
  be_awesome()
}
```

If you can read this, you can R

]

--

.pull-right[

Also because it is powerful:

1. One of the most widely used languages for data and statistical analysis
1. Free and open source
1. Extremely comprehensive
1. Many useful extensions (authoring, publishing) for academics
1. Vast community

]

---

# I like R because of its ecosystem

R shines in every part of the typical analysis workflow<sup>1</sup>

![](figs/data-science.png)

.footnote[
[1] Source: Hadley Wickham and Garrett Grolemund. 2017. [R for Data Science](https://r4ds.had.co.nz/introduction.html). O’Reilly Media, Inc.
]

---

# R shines at tidying and transforming data

.pull-left[

```r
show_table(wide_data)
```

|country     |   1999|   2000|
|:-----------|------:|------:|
|Afghanistan |    745|   2666|
|Brazil      |  37737|  80488|
|China       | 212258| 213766|

]

.pull-right[

```r
long_data <-
  wide_data |>
  pivot_longer(c(`1999`, `2000`), names_to = "year", values_to = "cases")
show_table(long_data)
```

|country     |year |  cases|
|:-----------|:----|------:|
|Afghanistan |1999 |    745|
|Afghanistan |2000 |   2666|
|Brazil      |1999 |  37737|
|Brazil      |2000 |  80488|
|China       |1999 | 212258|
|China       |2000 | 213766|

]

---

# R shines at exploratory and pub. visuals

![](figs/data-science-explore.png)

.footnote[
[1] Source: Hadley Wickham and Garrett Grolemund. 2017. [R for Data Science](https://r4ds.had.co.nz/introduction.html). O’Reilly Media, Inc.
]

---

# R shines at exploratory and pub. visuals

```r
head(penguins, 3)
```

```
## # A tibble: 3 × 8
##   species island bill_length_mm bill_depth_mm flipper_length_… body_mass_g sex
##   <fct>   <fct>           <dbl>         <dbl>            <int>       <int> <fct>
## 1 Adelie  Torge…           39.1          18.7              181        3750 male
## 2 Adelie  Torge…           39.5          17.4              186        3800 fema…
## 3 Adelie  Torge…           40.3          18                195        3250 fema…
## # … with 1 more variable: year <int>
```

<img src="https://allisonhorst.github.io/palmerpenguins/man/figures/lter_penguins.png" width="35%"/>

---

# R shines at exploratory and pub. visuals

.footnote[from the [ggdensity doc](https://jamesotto852.github.io/ggdensity/)]

```r
penguins |>
  ggplot(aes(x = flipper_length_mm, y = bill_length_mm, fill = species)) +
  geom_hdr() +
  geom_point(shape = 21)
```

![](intro-slides_files/figure-html/unnamed-chunk-7-1.png)

---

# R shines at exploratory and pub. visuals

.pull-left[
.tiny[
```r
pub_theme <- function(){
  theme_minimal() +
  theme(panel.grid = element_blank())
}

fig1 <-
  penguins |>
  ggplot(aes(x = flipper_length_mm, y = bill_length_mm, fill = species)) +
  geom_hdr(xlim = c(160, 240), ylim = c(30, 70)) +
  geom_point(shape = 21) +
  scale_fill_manual(values = c("darkorange","darkorchid","cyan4")) +
  pub_theme() +
  labs(
    title = "Flipper and Bill Length",
    subtitle = "Comparison of three penguin species of the Palmer Archipelago",
    caption = "source: the palmer penguins package",
    x = "Flipper length (mm)",
    y = "Bill length (mm)",
    fill = "Species"
  )
```
]
]

.pull-right[

```r
fig1
```

![](intro-slides_files/figure-html/unnamed-chunk-9-1.png)

]

---

# Easy to add animations too

.pull-left[
.footnote[from the [gganimate doc](https://gganimate.com/)]
.tiny[
```r
library(gapminder)
library(gganimate)

ap1 <-
  ggplot(gapminder, aes(x = gdpPercap, y = lifeExp, size = pop, colour = country)) +
  geom_point(alpha = 0.7, show.legend = FALSE) +
  scale_colour_manual(values = country_colors) +
  scale_size(range = c(2, 12)) +
  scale_x_log10() +
  facet_wrap(~continent) +
  # Here comes the gganimate specific bits
  labs(title = 'Year: {frame_time}', x = 'GDP per capita', y = 'life expectancy') +
  transition_time(year) +
  ease_aes('linear')
```
]
]

.pull-right[

```r
ap1
```

![](intro-slides_files/figure-html/unnamed-chunk-10-1.gif)

]

---

# Big publishing houses use R

[The BBC R Package](https://github.com/bbc/bbplot)

<img src="https://github.com/bbc/bbplot/raw/master/chart_examples/bbplot_example_plots.png" width="80%"/>

---

# R shines at models

![](figs/data-science-model.png)

.footnote[
[1] Source: Hadley Wickham and Garrett Grolemund. 2017. [R for Data Science](https://r4ds.had.co.nz/introduction.html). O’Reilly Media, Inc.
]

---
class: regression

# R is the language of Stats departments

```r
fit1 <- lm(body_mass_g ~ species, data = penguins)
jtools::summ(fit1)
```

<table class="table table-striped table-hover table-condensed table-responsive" style="width: auto !important; margin-left: auto; margin-right: auto;">
<tbody>
  <tr>
   <td style="text-align:left;font-weight: bold;"> Observations </td>
   <td style="text-align:right;"> 342 (2 missing obs. deleted) </td>
  </tr>
  <tr>
   <td style="text-align:left;font-weight: bold;"> Dependent variable </td>
   <td style="text-align:right;"> body_mass_g </td>
  </tr>
  <tr>
   <td style="text-align:left;font-weight: bold;"> Type </td>
   <td style="text-align:right;"> OLS linear regression </td>
  </tr>
</tbody>
</table>
<table class="table table-striped table-hover table-condensed table-responsive" style="width: auto !important; margin-left: auto; margin-right: auto;">
<tbody>
  <tr>
   <td style="text-align:left;font-weight: bold;"> F(2,339) </td>
   <td style="text-align:right;"> 343.63 </td>
  </tr>
  <tr>
   <td style="text-align:left;font-weight: bold;"> R² </td>
   <td style="text-align:right;"> 0.67 </td>
  </tr>
  <tr>
   <td style="text-align:left;font-weight: bold;"> Adj. R² </td>
   <td style="text-align:right;"> 0.67 </td>
  </tr>
</tbody>
</table>
<table class="table table-striped table-hover table-condensed table-responsive" style="width: auto !important; margin-left: auto; margin-right: auto;border-bottom: 0;">
<thead>
  <tr>
   <th style="text-align:left;">  </th>
   <th style="text-align:right;"> Est. </th>
   <th style="text-align:right;"> S.E. </th>
   <th style="text-align:right;"> t val. </th>
   <th style="text-align:right;"> p </th>
  </tr>
</thead>
<tbody>
  <tr>
   <td style="text-align:left;font-weight: bold;"> (Intercept) </td>
   <td style="text-align:right;"> 3700.66 </td>
   <td style="text-align:right;"> 37.62 </td>
   <td style="text-align:right;"> 98.37 </td>
   <td style="text-align:right;"> 0.00 </td>
  </tr>
  <tr>
   <td style="text-align:left;font-weight: bold;"> speciesChinstrap </td>
   <td style="text-align:right;"> 32.43 </td>
   <td style="text-align:right;"> 67.51 </td>
   <td style="text-align:right;"> 0.48 </td>
   <td style="text-align:right;"> 0.63 </td>
  </tr>
  <tr>
   <td style="text-align:left;font-weight: bold;"> speciesGentoo </td>
   <td style="text-align:right;"> 1375.35 </td>
   <td style="text-align:right;"> 56.15 </td>
   <td style="text-align:right;"> 24.50 </td>
   <td style="text-align:right;"> 0.00 </td>
  </tr>
</tbody>
<tfoot><tr><td style="padding: 0; " colspan="100%">
<sup></sup> Standard errors: OLS</td></tr></tfoot>
</table>

---
class: regression

# R is made for flexible modelling

```r
covars <- c("bill_length_mm", "bill_depth_mm", "flipper_length_mm")
gen_regeq <- \(x) as.formula(paste("body_mass_g ~ ", paste(covars[1:x], collapse = " + ")))
model_list <- map(1:3, ~lm(gen_regeq(.), data = penguins))
jtools::export_summs(model_list)
```

|                  |   Model 1|      Model 2|      Model 3|
|:-----------------|---------:|------------:|------------:|
|(Intercept)       |    362.31|  3343.14 ***| -6424.76 ***|
|                  |  (283.35)|     (429.91)|     (561.47)|
|bill_length_mm    | 87.42 ***|    75.28 ***|         4.16|
|                  |    (6.40)|       (5.97)|       (5.33)|
|bill_depth_mm     |          |  -142.72 ***|        20.05|
|                  |          |      (16.51)|      (13.69)|
|flipper_length_mm |          |             |    50.27 ***|
|                  |          |             |       (2.48)|
|N                 |       342|          342|          342|
|R2                |      0.35|         0.47|         0.76|

*** p < 0.001; ** p < 0.01; * p < 0.05.
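
The formula-building step above can also be sketched in base R. This is a minimal sketch, not the workshop's own code: it assumes only the covariate names shown on this slide, uses `reformulate()` from the `stats` package instead of pasting strings, and uses `lapply()` in place of `purrr::map()`. The name `formulas` and the `requireNamespace()` guard are illustrative additions.

```r
# Covariates from the slide, added one at a time to build nested models
covars <- c("bill_length_mm", "bill_depth_mm", "flipper_length_mm")

# reformulate() (base R, stats) assembles "response ~ term1 + term2 + ..."
# from a character vector, so no manual paste() is needed
gen_regeq <- function(k) {
  reformulate(covars[seq_len(k)], response = "body_mass_g")
}

# One formula per model: 1, 2, and 3 covariates
formulas <- lapply(seq_along(covars), gen_regeq)
print(formulas)

# Fit the models only if the palmerpenguins data package is installed
# (assumption: `penguins` on the slides comes from that package)
if (requireNamespace("palmerpenguins", quietly = TRUE)) {
  model_list <- lapply(
    formulas,
    function(f) lm(f, data = palmerpenguins::penguins)
  )
  # Number of estimated coefficients per model (intercept included)
  print(sapply(model_list, function(m) length(coef(m))))
}
```

Building formulas as objects rather than strings keeps the specification step separate from the fitting step, so the same list can be reused with other estimators.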

---

# R shines at communication

![](figs/data-science-communicate.png)

.footnote[
[1] Source: Hadley Wickham and Garrett Grolemund. 2017. [R for Data Science](https://r4ds.had.co.nz/introduction.html). O’Reilly Media, Inc.
]

---

# Easy to pipe directly into Word, PowerPoint, LaTeX documents

.pull-left[
.tiny[
```r
library(officer)

example_table <- head(mtcars, 10)

doc_table <-
  read_docx() |>
  body_add_par("Level 1 title", style = "heading 1") |>
  body_add_par(" ") |>
  body_add_table(example_table, style = "Table Professional") |>
  body_add_par(" ") |>
  body_add_table(example_table, style = "Light List Accent 2", first_column = TRUE)

print(doc_table, target = "../out/test.docx")
```
]
]

.pull-right[
<img src="word-example.png" width="75%"/>
]

---

# Really. R excels at communicating results

- This presentation was made using the `xaringan` R package (and is hosted on [my site](https://hschuett.github.io/))

--

- Which was made using the `blogdown` R package

--

- [RMarkdown](https://rmarkdown.rstudio.com/) powers a rich set of communication tools: [books](https://mixtape.scunning.com/teaching-resources.html), apps, dashboards, websites, reports

--

- For example, the `distill` package makes it easy to produce [paper websites](https://hschuett.github.io/BayesForAccountingResearch/)

---

# Plan for the rest of today

- A simple example of downloading WRDS data to generate a plot for class
- A simple example of regressions with fixed effects and clustered standard errors
- Some closing thoughts on R for reproducible science and robust workflows

#### Slides and example code can be found at [github.com/hschuett/RIntro](https://github.com/hschuett/RIntro)

---
class: center, middle

# On to coding!

Slides created via the R package [**xaringan**](https://github.com/yihui/xaringan).