25 Sep 2024
I owe a debt of gratitude to many people, as the thoughts and code in these slides are the product of years-long development cycles and discussions with my team, friends, colleagues and peers. When someone has contributed to the content of the slides, I have credited their authorship.
These materials are generated by Gerko Vink, who holds the copyright. The intellectual property belongs to Utrecht University. Images are either directly linked, or generated with StableDiffusion or DALL-E. That said, there is no information in this presentation that exceeds legal use of copyright materials in academic settings, or that should not be part of the public domain.
Warning
You may use any and all content in this presentation - including my name - and submit it as input to generative AI tools, with the following exception:
RStudio
- Integrated Development Environment (IDE) for R
1. Code Editing: RStudio provides a code editor with syntax highlighting, autocompletion, and error checking, making your coding process more efficient.
2. Console: An interactive R console allows you to execute R code line by line and view results in real time.
3. Environment Pane: Keep track of your variables, data frames, and functions with the environment pane.
4. Plots and Visualizations: Create and view plots, charts, and visualizations within RStudio.
5. Integrated Help: Access R documentation, packages, and online resources directly from the IDE.
6. Version Control: Easily integrate R projects with version control systems like Git.
7. Markdown Support: RStudio seamlessly integrates with Markdown, making it an ideal choice for creating reproducible reports and documents.
It plays a crucial role in promoting reproducibility and collaboration in data science and statistical analysis.
Markdown is a lightweight markup language for creating formatted text using plain text. It’s easy to learn and widely used in various applications.
GitHub-Flavored Markdown (GFM) is a variant of Markdown used on GitHub (next week), enhancing its capabilities for documentation and collaboration.
RMarkdown is an extension of Markdown that allows you to embed R code and its output directly within a document.
Quarto is a comprehensive tool for creating reproducible and collaborative data science documents.
Jupyter Notebooks: A widely used, interactive, kernel-based computing environment for data science and machine learning, supporting many (almost all) programming and scripting languages.
RMarkdown: An R-based notebook environment that combines code, output, and narrative text in a single document.
YAML (YAML Ain’t Markup Language) is a human-readable data serialization format commonly used for configuration files and metadata in various programming and markup contexts.
YAML is very simple and readable
In Quarto and many other applications, YAML is used to specify:
Document Metadata: Information about the document itself, such as the title, author, date, and document type.
Document Configuration: Settings related to the document’s behavior, appearance, and rendering, such as the output format (e.g., HTML, PDF), document template, and style options.
Custom Variables: Definitions of custom variables or parameters that can be used throughout the document to control behavior or content.
Here’s an example of a simple YAML header in a Quarto document:
---
title: "All flavors markdown"
author:
  - name: Gerko Vink
    orcid: 0000-0001-9767-1924
    email: g.vink@uu.nl
    affiliations:
      - name: Methodology & Statistics @ UU
  - name: Hanne Oberman
    orcid: 0000-0003-3276-2141
    email: h.i.oberman@uu.nl
    affiliations:
      - name: Methodology & Statistics @ UU
date: 25 Sep 2024
date-format: "D MMM YYYY"
bibliography: data/lec-2/publications.bib
execute:
  echo: true
editor: source
format:
  revealjs:
    embed-resources: true
    theme: [solarized, gerko.scss]
    progress: true
    multiplex: true
    transition: fade
    slide-number: true
    margin: 0.075
    logo: "images/logo.png"
    toc: false
    toc-depth: 1
    toc-title: Outline
    scrollable: true
    reference-location: margin
    footer: Gerko Vink and Hanne Oberman - Markup Languages @ UU
---
In this example:

- `title`, `author`, and `date` provide metadata about the document.
- `format` specifies settings related to the document’s output format and theme.

The YAML header is a powerful tool for customizing and configuring Quarto documents, allowing you to control how the document is rendered and presented. It ensures that important document information and settings are stored in a human-readable and structured format at the beginning of the document.
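Because the header is ordinary YAML, indentation decides which option belongs to which key. A minimal sketch, assuming the `yaml` package (which also appears in the renv lockfile later in these slides) is installed, parses a shortened version of the header above and shows the resulting nesting:

```r
library(yaml)

# A shortened version of the Quarto header above (without the --- delimiters)
header <- '
title: "All flavors markdown"
author:
  - name: Gerko Vink
    email: g.vink@uu.nl
  - name: Hanne Oberman
    email: h.i.oberman@uu.nl
execute:
  echo: true
format:
  revealjs:
    theme: [solarized, gerko.scss]
    slide-number: true
    toc-depth: 1
'

meta <- yaml.load(header)

meta$title                    # top-level metadata
meta$author[[2]]$name         # second entry of the author list
meta$execute$echo             # nested under execute
meta$format$revealjs$theme    # nested two levels deep under format
```

Shifting a single line (for example, putting `theme` at the same indentation as `revealjs`) would change which option it belongs to, which is why the indentation in the header matters.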
Quarto
Text is text. Nothing more, nothing less
# This is a heading indicating a section
## This is a heading indicating a subsection
### This is a heading indicating a subsubsection
But in the above I used
Why?
It is that simple. No more framing in \(\LaTeX\) or other stuff. Just use `#` and `##` to denote a section and a slide.
The above figure is left-aligned
The caption is also left-aligned
The above figure is centered
The caption is still left-aligned
The code for the figures is
\(e^{i\pi} + 1 = 0\)
The above figure is left-aligned
The caption is also left-aligned
The equation is left-aligned too
\[e^{i\pi} + 1 = 0\]
The above figure is centered
The caption is also centered
The equation is centered
The code for the slide is
``` markdown
## Columns, Sizing, Equations and proper centering
::: columns
::: {.column width="50%"}
{fig-align="left" width="80%"}
$e^{i\pi} + 1 = 0$
The above figure is left-aligned <br>
The caption is also left-aligned <br>
The equation is left-aligned too
:::
::: {.column width="50%"}
<center>
{fig-align="center" width="80%"}
</center>
$$e^{i\pi} + 1 = 0$$
The above figure is centered <br>
The caption is also centered <br>
The equation is centered
:::
:::
```
\(e^{i\pi} + 1 = 0\)
\[e^{i\pi} + 1 = 0\]
``` markdown
## Math-ing it all up
::: columns
::: {.column width="70%"}
{fig-align="left" width="30%"}
$e^{i\pi} + 1 = 0$
:::
::: {.column width="30%"}
<center>
{fig-align="center" width="70%"}
</center>
$$e^{i\pi} + 1 = 0$$
:::
:::
```
END OF EXAMPLE
R figures

`$\mu$` is used for in-line equations; `$$\mu$$` is used for display equations.

Let’s assume that \(Y\) follows a normal distribution: \[Y \sim \mathcal{N}(\mu, \sigma^2),\] where we set in our simulations \(\mu = 10\) and \(\sigma^2 = 5\). We do something for every \(Y_i\).

Let’s assume that \(y\) is a vector with \(N\) elements such that \[y \sim \mathcal{N}(\mu, \sigma^2),\] where we set in our simulations \(\mu = 10\) and \(\sigma^2 = 5\). We do something for every \(y_i\) with \(i = 1, \dots, N\).
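To connect the notation to code, here is a minimal sketch of drawing such a \(y\) in R. Note that `rnorm()` takes the standard deviation, not the variance, so \(\sigma^2 = 5\) becomes `sd = sqrt(5)`. The value of `N`, the seed, and the per-element operation (standardizing each \(y_i\)) are arbitrary choices for illustration.

```r
set.seed(123)                      # arbitrary seed for reproducibility
N <- 1000                          # hypothetical number of elements in y
mu <- 10                           # mean, as in the text
sigma2 <- 5                        # variance, as in the text

# rnorm() is parameterized by the standard deviation, hence sqrt()
y <- rnorm(N, mean = mu, sd = sqrt(sigma2))

# "Do something for every y_i": here, standardize each element
z <- (y - mu) / sqrt(sigma2)

c(mean = mean(y), var = var(y))    # close to 10 and 5 for large N
```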
We already saw some markdown code chunks. We can run inline code to calculate that 12 * 234 is 2808 by using `r `, and code chunks to evaluate larger blocks of code, such as:
We already saw some `markdown` code chunks. We can run inline code to calculate
that `12 * 234` is ` r answer ` by using `r ` and code chunks to evaluate
larger blocks of code, such as:
```{{r}}
#| output: asis
#| label: fig-mpg
#| fig-cap: "City and highway mileage for 38 popular models of cars."
#| fig-subcap:
#|   - "Color by number of cylinders"
#|   - "Color by engine displacement, in liters"
#| layout-ncol: 2
#| column: page
#| code-fold: true
library(ggplot2)

# Left panel: color by number of cylinders
ggplot(mpg, aes(x = hwy, y = cty, color = cyl)) +
  geom_point(alpha = 0.5, size = 2) +
  scale_color_viridis_c() +
  theme_minimal()

# Right panel: color by engine displacement
ggplot(mpg, aes(x = hwy, y = cty, color = displ)) +
  geom_point(alpha = 0.5, size = 2) +
  scale_color_viridis_c(option = "E") +
  theme_minimal()
```
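The source above refers to an object `answer` that is not defined in the snippet itself. A minimal sketch of what an earlier code chunk would need to contain so that the inline expression renders as 2808; where exactly `answer` is defined is an assumption:

```r
# Earlier code chunk (it can be hidden with #| echo: false)
answer <- 12 * 234

# The inline expression later in the text is replaced by this value
answer
```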
Efficiency: Caching saves time and resources by avoiding the repeated execution of time-consuming code chunks.
Reproducibility: Explicit caching ensures that the same stored results are reused each time the document is rendered, as long as the chunk’s code and options are unchanged; modifying them invalidates the cache, and the chunk is executed again.
Version Control: Caching reduces the size of version-controlled documents by storing intermediate results separately.
In RMarkdown and Quarto, you can use cache directives to specify which code chunks should be cached and which should not.
The following code block
```{r}
#| cache: true
#| code-line-numbers: true
#| label: imputation
library(mice)
library(magrittr)
library(purrr)
imp <- nhanes %>% mice(print = FALSE)
imp |>
complete("all") %>%
map(~.x %$% lm(bmi ~ chl + hyp)) |>
pool()
```
would result in this output
library(mice)
library(magrittr)
library(purrr)
imp <- nhanes %>% mice(print = FALSE)
imp |>
complete("all") %>%
map(~.x %$% lm(bmi ~ chl + hyp)) |>
pool()
Class: mipo m = 5
term m estimate ubar b t dfcom
1 (Intercept) 5 21.86836943 1.478139e+01 1.409736e+00 1.647307e+01 22
2 chl 5 0.03186424 4.179526e-04 5.540773e-05 4.844419e-04 22
3 hyp 5 -1.14339628 4.419660e+00 7.027965e-01 5.263016e+00 22
df riv lambda fmi
1 17.33159 0.1144468 0.1026938 0.1909610
2 16.13520 0.1590833 0.1372492 0.2274234
3 15.32466 0.1908192 0.1602419 0.2518953
with corresponding cache structure relative to the root and format of the output file:
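The idea behind `cache: true` can be illustrated with a small base-R sketch: store a result on disk under a hash of the code, and evaluate again only when the code text changes. The helper `cached_eval()` is hypothetical and much simpler than knitr’s cache, which, among other things, also takes chunk options into account.

```r
# Hypothetical helper illustrating hash-based caching (not knitr's implementation)
cached_eval <- function(code, cache_dir) {
  # Hash the code text: write it to a temporary file and take its MD5 sum
  code_file <- tempfile(fileext = ".R")
  on.exit(unlink(code_file))
  writeLines(code, code_file)
  key <- unname(tools::md5sum(code_file))

  cache_file <- file.path(cache_dir, paste0(key, ".rds"))
  if (file.exists(cache_file)) {
    message("cache hit")
    return(readRDS(cache_file))          # same code: reuse the stored result
  }
  message("cache miss: evaluating")
  result <- eval(parse(text = code), envir = new.env())
  saveRDS(result, cache_file)            # store the result for the next run
  result
}

cache_dir <- file.path(tempdir(), "demo_cache")
dir.create(cache_dir, showWarnings = FALSE)

# sum() stands in for an expensive computation
cached_eval("sum(1:10)", cache_dir)  # evaluated and stored
cached_eval("sum(1:10)", cache_dir)  # identical code: read from the cache
cached_eval("sum(1:20)", cache_dir)  # code changed: evaluated again
```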
Use https://www.tablesgenerator.com/markdown_tables if you need to make a table from scratch.
Otherwise, use tibbles:
# A tibble: 748 × 9
age hgt wgt bmi hc gen phb tv reg
<dbl> <dbl> <dbl> <dbl> <dbl> <ord> <ord> <int> <fct>
1 0.035 50.1 3.65 14.5 33.7 <NA> <NA> NA south
2 0.038 53.5 3.37 11.8 35 <NA> <NA> NA south
3 0.057 50 3.14 12.6 35.2 <NA> <NA> NA south
4 0.06 54.5 4.27 14.4 36.7 <NA> <NA> NA south
5 0.062 57.5 5.03 15.2 37.3 <NA> <NA> NA south
6 0.068 55.5 4.66 15.1 37 <NA> <NA> NA south
7 0.068 52.5 3.81 13.8 34.9 <NA> <NA> NA south
8 0.071 53 3.89 13.8 35.8 <NA> <NA> NA west
9 0.071 55.1 3.88 12.8 36.8 <NA> <NA> NA west
10 0.073 54.5 4.2 14.1 38 <NA> <NA> NA east
# ℹ 738 more rows
or use, e.g., `library(DT)` for customization:
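The printed tibble above has the dimensions and columns of the `boys` data set from the mice package, so a minimal sketch of how such output can be produced is shown below; that the slide used exactly this code is an assumption. `DT::datatable()` renders an interactive, searchable HTML table, and `pageLength` is a DataTables option passed through `options`.

```r
library(mice)    # provides the boys data (748 rows, 9 columns)
library(tibble)

boys_tbl <- as_tibble(boys)
boys_tbl         # prints the first 10 rows plus a summary line

# Interactive HTML table, only if DT is installed
if (requireNamespace("DT", quietly = TRUE)) {
  DT::datatable(boys, options = list(pageLength = 5))
}
```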
reprex and renv
library(renv)
#>
#> Attaching package: 'renv'
#> The following objects are masked from 'package:stats':
#>
#> embed, update
#> The following objects are masked from 'package:utils':
#>
#> history, upgrade
#> The following objects are masked from 'package:base':
#>
#> autoload, load, remove
renv::init()
#> - Linking packages into the project library ... Done!
#> The following package(s) will be updated in the lockfile:
#>
#> # CRAN -----------------------------------------------------------------------
#> - backports [* -> 1.4.1]
#> - base64enc [* -> 0.1-3]
#> - broom [* -> 1.0.4]
#> - bslib [* -> 0.4.2]
#> - cachem [* -> 1.0.8]
#> - cli [* -> 3.6.1]
#> - cpp11 [* -> 0.4.3]
#> - digest [* -> 0.6.31]
#> - dplyr [* -> 1.1.2]
#> - ellipsis [* -> 0.3.2]
#> - evaluate [* -> 0.21]
#> - fansi [* -> 1.0.4]
#> - fastmap [* -> 1.1.1]
#> - fontawesome [* -> 0.5.1]
#> - fs [* -> 1.6.2]
#> - generics [* -> 0.1.3]
#> - glue [* -> 1.6.2]
#> - highr [* -> 0.10]
#> - htmltools [* -> 0.5.5]
#> - jquerylib [* -> 0.1.4]
#> - jsonlite [* -> 1.8.4]
#> - knitr [* -> 1.43]
#> - lattice [* -> 0.21-8]
#> - lifecycle [* -> 1.0.3]
#> - magrittr [* -> 2.0.3]
#> - memoise [* -> 2.0.1]
#> - mime [* -> 0.12]
#> - pillar [* -> 1.9.0]
#> - pkgconfig [* -> 2.0.3]
#> - purrr [* -> 1.0.1]
#> - R6 [* -> 2.5.1]
#> - rappdirs [* -> 0.3.3]
#> - Rcpp [* -> 1.0.11]
#> - RcppArmadillo [* -> 0.12.4.0.0]
#> - renv [* -> 1.0.2]
#> - rlang [* -> 1.1.1]
#> - rmarkdown [* -> 2.22]
#> - sass [* -> 0.4.6]
#> - stringi [* -> 1.7.12]
#> - stringr [* -> 1.5.0]
#> - tibble [* -> 3.2.1]
#> - tidyr [* -> 1.3.0]
#> - tidyselect [* -> 1.2.0]
#> - tinytex [* -> 0.45]
#> - utf8 [* -> 1.2.3]
#> - vctrs [* -> 0.6.2]
#> - withr [* -> 2.5.0]
#> - xfun [* -> 0.39]
#> - yaml [* -> 2.3.7]
#>
#> # GitHub ---------------------------------------------------------------------
#> - mice [* -> gerkovink/mice@match_conditional]
#>
#> The version of R recorded in the lockfile will be updated:
#> - R [* -> 4.3.0]
#>
#> - Lockfile written to "/private/var/folders/yx/6rn4qpl13wsgk4c7s3jc9d1r0000gp/T/RtmpMqetyA/reprex-1233f6d959b19-drear-kitty/renv.lock".
library(mice)
#>
#> Attaching package: 'mice'
#> The following object is masked from 'package:stats':
#>
#> filter
#> The following objects are masked from 'package:base':
#>
#> cbind, rbind
library(magrittr)
library(purrr)
#>
#> Attaching package: 'purrr'
#> The following object is masked from 'package:magrittr':
#>
#> set_names
#> The following object is masked from 'package:renv':
#>
#> modify
renv::snapshot()
#> - The lockfile is already up to date.
renv::remove("mice")
#> - Removing package(s) from project library ...
#> Removing package 'mice' ... Done!
renv::restore()
#> The following package(s) will be updated:
#>
#> # GitHub ---------------------------------------------------------------------
#> - mice [* -> gerkovink/mice@match_conditional]
#>
#> # Installing packages --------------------------------------------------------
#> - Installing mice ... OK [linked from cache]
imp <- nhanes %>% mice(print = FALSE)
imp |>
complete("all") %>%
map(~.x %$% lm(bmi ~ chl + hyp)) |>
pool()
#> Class: mipo m = 5
#> term m estimate ubar b t dfcom
#> 1 (Intercept) 5 20.6687887 2.020809e+01 8.8963873222 3.088375e+01 22
#> 2 chl 5 0.0354435 5.409979e-04 0.0001081414 6.707676e-04 22
#> 3 hyp 5 -0.8447582 4.353337e+00 0.9275040571 5.466342e+00 22
#> df riv lambda fmi
#> 1 9.489411 0.5282868 0.3456725 0.4504537
#> 2 14.161181 0.2398710 0.1934645 0.2874598
#> 3 13.811544 0.2556671 0.2036105 0.2983537
sessionInfo()
#> R version 4.3.0 (2023-04-21)
#> Platform: aarch64-apple-darwin20 (64-bit)
#> Running under: macOS Ventura 13.6
#>
#> Matrix products: default
#> BLAS: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0
#>
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#>
#> time zone: Europe/Amsterdam
#> tzcode source: internal
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] purrr_1.0.1 magrittr_2.0.3 mice_3.15.3 renv_1.0.2
#>
#> loaded via a namespace (and not attached):
#> [1] vctrs_0.6.2 cli_3.6.1 knitr_1.43 rlang_1.1.1
#> [5] xfun_0.39 generics_0.1.3 glue_1.6.2 backports_1.4.1
#> [9] htmltools_0.5.5 fansi_1.0.4 rmarkdown_2.22 grid_4.3.0
#> [13] evaluate_0.21 tibble_3.2.1 fastmap_1.1.1 yaml_2.3.7
#> [17] lifecycle_1.0.3 compiler_4.3.0 dplyr_1.1.2 fs_1.6.2
#> [21] pkgconfig_2.0.3 Rcpp_1.0.11 tidyr_1.3.0 rstudioapi_0.14
#> [25] lattice_0.21-8 digest_0.6.31 R6_2.5.1 tidyselect_1.2.0
#> [29] reprex_2.0.2 utf8_1.2.3 pillar_1.9.0 tools_4.3.0
#> [33] withr_2.5.0 broom_1.0.4
Created on 2023-09-27 with reprex v2.0.2
renv
1. Dependency Isolation: `renv` creates a dedicated package library for each project, preventing conflicts between package versions.
2. Reproducibility: With `renv`, you can capture and record the specific package versions used in your project, ensuring reproducibility over time.
3. Collaboration: Share your project with others, and they can easily recreate the same environment using `renv.lock`.
4. Easier Package Installation: `renv` simplifies package installation. Just run `renv::restore()` to set up the project environment.
5. Automatic Snapshot: `renv::init()` automatically generates a `renv.lock` file, listing all package versions used, making it easy to recreate the environment.
6. Compatibility: `renv` is compatible with popular version control systems like Git, facilitating collaboration and sharing.

Use `renv::snapshot()` to update the `renv.lock` file when you add or update packages. You can use `renv::activate()` and `renv::deactivate()` to activate or deactivate `renv` for your project.
```{{mermaid}}
%%| echo: false
flowchart LR
A[Hard edge] --> B(Round edge)
B --> C{Decision}
C <--> D[Result one]
C --> E[Result two]
```
See Quarto Diagrams for a more comprehensive overview of all graphing engines. This Live online Mermaid editor is awesome!
rticles and Quarto
To use rticles from RStudio, you can access the templates through File -> New File -> R Markdown. This opens a dialog box in which you can select one of the available templates:
The `quarto use template` command can be used to create an article from one of the formats below.
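Besides the RStudio menu, an rticles template can also be drafted from the R console with `rmarkdown::draft()`. A minimal sketch, assuming the rticles package is installed and that its Journal of Statistical Software template is called `"jss"`; the file name is hypothetical:

```r
# Create a new article from an rticles template (assumes rticles is installed)
out <- rmarkdown::draft(
  file = file.path(tempdir(), "MyJSSArticle.Rmd"),  # hypothetical file name
  template = "jss",                                 # assumed template name
  package = "rticles",
  edit = FALSE                                      # do not open an editor
)
out  # path to the newly created .Rmd file
```

Rendering the draft to PDF with `rmarkdown::render(out)` additionally requires a LaTeX installation.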
Oberman and Vink (2023) and Cai, Van Buuren and Vink (2023) are some of our team’s most recent publications. Some references, such as some work by Reinder Banning and Gerko Vink, are much older (2010) or contain simple and contemporary solutions by Volker and Vink (2022, 2–4) or cool but potentially confusing images (Schouten and Vink 2021, 1255).
But you can also refer to Oberman and Vink (2023) here
## Citations
@eval and Cai, Van Buuren and Vink [-@cai2023graphical] are some of our team's most recent publications. Some references, such as some work by Reinder Banning and Gerko Vink, are much older [-@banningvink] or contain simple and contemporary solutions by @vigntg [p. 2-4] or cool but potentially confusing images [@schouten2021dance, 1255].
::: footer
But you can also refer to @eval here
:::
::: {#refs}
:::
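The `@key` citations above are resolved against the `bibliography` file named in the YAML header. To cite R packages as well, `knitr::write_bib()` can write BibTeX entries for installed packages, with keys such as `R-knitr`. A minimal sketch; the output file name `packages.bib` is hypothetical, and the file would still need to be added to the `bibliography` field:

```r
# Write BibTeX entries for some packages (package manuals get keys prefixed with "R-")
bib_file <- file.path(tempdir(), "packages.bib")   # hypothetical file name
knitr::write_bib(c("knitr", "rmarkdown"), file = bib_file)

# List the entry headers, which contain the keys to cite, e.g. @R-knitr
keys <- grep("^@", readLines(bib_file), value = TRUE)
keys
```

Several files can then be listed in the header, for example `bibliography: [data/lec-2/publications.bib, packages.bib]`.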
Gerko Vink and Hanne Oberman - Markup Languages @ UU