Clean Code, Reproducible Science: Advanced R-Programming and Workflows

Part 2

Johannes Feldhege

13.05.2026

Literate programming using Quarto

Literate programming

Literate programming is a methodology that combines a programming language with a documentation language, thereby making programs more robust, more portable, more easily maintained, and arguably more fun to write than programs that are written only in a high-level language. The main idea is to treat a program as a piece of literature, addressed to human beings rather than to a computer.

Donald E. Knuth, https://www-cs-faculty.stanford.edu/~knuth/lp.html

R Markdown is a combination of the programming language R and the documentation language Markdown. It allows users to combine R code with markdown-formatted text.

R Markdown documents are fully reproducible because the R code they contain is executed when the document is rendered.

Literate programming using Quarto

Quarto is the successor to R Markdown. It adds new features such as support for code in multiple programming languages and a wider set of output formats.

Quarto is included in recent versions of RStudio. To check whether it is installed and working on your machine, you can run in the terminal:

quarto check
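The same check can also be started from the R console. The following is a minimal base-R sketch, assuming the `quarto` executable is on your `PATH` (the copy bundled with RStudio may not be, in which case the message branch is taken):

```r
# Look up the Quarto command-line tool on the PATH ("" if it is not found)
quarto_bin <- unname(Sys.which("quarto"))

if (nzchar(quarto_bin)) {
  # Run `quarto check`; its report is printed to the console
  status <- system2(quarto_bin, "check")
} else {
  status <- NA_integer_
  message("Quarto was not found on the PATH.")
}
```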

Creating a Quarto document

In the RStudio menu:

File > New File > Quarto Document…

Anatomy of a Quarto document

A Quarto document consists of:

a YAML header
Markdown text
Code chunks

---
title: "Quarto Example Document"
format: html
---

## Markdown formatting

This is markdown formatted text. 

It can contain markdown specific format instructions such as ** for bold text. 

## Code chunks

````{r}
#| label: plot
#| fig-cap: CESD total across measurement occasions 
#| echo: true

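# Load the data inside the chunk so that it is available when rendering
popsy <- ds4psy::posPsy_long

boxplot(popsy$cesdTotal ~ popsy$occasion, horizontal = TRUE)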
````

The YAML header

The text at the top between the two `---` lines is called the YAML header. It supplies instructions, such as the output format, for rendering the document.

The instructions are specified as key: value pairs.

A simple example:

---
title: "Quarto Example Document"
format: html
---
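Because the header is plain YAML, its key: value structure can be inspected from R. This is a minimal sketch, assuming the `yaml` package is installed (it is not used elsewhere in these slides):

```r
library(yaml)

# The header from the simple example, without the surrounding --- lines
header_text <- '
title: "Quarto Example Document"
format: html
'

# yaml.load() turns the key: value pairs into a named list
header <- yaml.load(header_text)

names(header)   # "title" "format"
header$format   # "html"
```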

The YAML header

A more elaborate version used to create this presentation:

Typical features are:

a title
an author
format specific options

---
title: "Clean Code, Reproducible Science: Advanced R-Programming and Workflows"
subtitle: "Part 2"
author: "Johannes Feldhege"
format: 
  revealjs:
    theme: [simple]
    footer: <PPF Methods Peer Group>
    slide-number: true
    chalkboard: true
    code-link: true
    code-line-numbers: false
    incremental: true  
    from: markdown+emoji
engine: knitr
webr: 
  packages: ['ds4psy', 'dplyr']
filters:
  - webr
---

Markdown formatting

The `#` character is used to create headers. The number of `#` characters determines the header level.

Bullet list item 1
Bullet list item 2

bold text, italic text

text formatted as code

# Header 1

## Header 2

- Bullet list item 1

- Bullet list item 2

**bold text**, *italic text*

 `text formatted as code`

Source vs Visual Editor

You can switch between source and a visual editor when writing your Quarto document.

In visual editor mode, you get a preview of your document. It is the more beginner-friendly mode because you get more immediate feedback on your inputs.

Switching between both modes can sometimes introduce unintended changes in the document!

Visual Editor mode

In Visual Editor mode, you can select and insert formatting or special inputs using the dropdown menus.

Code chunks

Code chunks contain programming code as well as instructions on how to execute the code and place it and its results in the Quarto document.

The first line must specify the programming language.

Instructions are placed in comments with #|.

The actual code can be written as you would in a regular script.

````{r}
#| echo: true
#| label: plot1
#| fig-cap: CESD total across measurement occasions 
#| fig-width: 5

library(ds4psy)

popsy <- ds4psy::posPsy_long

boxplot(popsy$cesdTotal ~ popsy$occasion, horizontal = TRUE)
````

A note on code execution

R Markdown documents are fully reproducible because the R code they contain is executed when the document is rendered.

Quarto documents are rendered in a process that is separate from your RStudio session, so they cannot access the packages you have loaded or the datasets you have carefully created there.

You need to load packages and data inside a code chunk in order to use them!

Rendering a document

To turn a Quarto document into the desired output format, it needs to be rendered.

This can be done using RStudio buttons:

or in the terminal:

quarto render my_quarto_file.qmd

or in R with the quarto package:

quarto::quarto_render("my_quarto_file.qmd")

Output formats

Quarto documents can be rendered to dozens of different output formats:

HTML pages
Word documents
PDFs
Websites
Books
and more…
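As a rough illustration, the same source file can be rendered to more than one of these formats. The sketch below assumes that the Quarto command-line tool and the `quarto` R package are installed; the file name and contents are made up for the example.

```r
library(quarto)

# Write a small, hypothetical Quarto document to a temporary folder
qmd_file <- file.path(tempdir(), "format_demo.qmd")
writeLines(c(
  "---",
  "title: \"Format demo\"",
  "---",
  "",
  "Some **markdown** text."
), qmd_file)

# Render the same source to an HTML page and to a Word document
for (fmt in c("html", "docx")) {
  quarto_render(qmd_file, output_format = fmt)
}

# Both outputs are written next to the source file
list.files(tempdir(), pattern = "^format_demo\\.")
```

Formats can also be listed under `format:` in the YAML header, which keeps the choice inside the document rather than in the render call.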

Quarto manuscripts

Quarto 1.4 introduced an output format for scientific writing: Quarto manuscripts.

A manuscript is published as a website that can link to other formats such as Word or PDF.

In the background, multiple Quarto files can be used to write the manuscript and conduct the analysis.

A live example can be seen here

Exercise

I have created an example Quarto document that you can download here

play around with the markdown formatting
change the code chunk for the plot and table
create a new plot or table
render the document

Creating R packages

R packages in a reproducibility context

Packages are the fundamental units of reproducible R code. They include reusable R functions, the documentation that describes how to use them, and sample data.

- Wickham and Bryan, 2023

Two essential packages

devtools: Essential functions for documenting code and building the package.

usethis: Convenience functions that automate the workflow of creating a package.

Creating an R Package with the `usethis` package

One easy step to get started:

usethis::create_package("/path/to/the/package/name")

A package name may only contain letters, numbers, and periods (`.`). It must start with a letter and must not end with a period.
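The naming rule can be checked before creating the package. This is a small base-R sketch; `is_valid_pkg_name()` is a hypothetical helper written for this example, and the rule it encodes (ASCII letters, digits and periods, at least two characters, starting with a letter, not ending in a period) follows the "Writing R Extensions" manual.

```r
# Hypothetical helper: TRUE if `name` follows the package naming rule
is_valid_pkg_name <- function(name) {
  # first character: a letter; middle: letters, digits or periods;
  # last character: a letter or digit (so at least two characters overall)
  grepl("^[A-Za-z][A-Za-z0-9.]*[A-Za-z0-9]$", name)
}

# Made-up candidate names
candidates <- c("popsyTools", "popsy.tools", "popsy_tools", "2popsy", "popsy.")
result <- setNames(is_valid_pkg_name(candidates), candidates)
result
```

Only the first two candidates pass: the others contain an underscore, start with a digit, or end with a period.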

The package directory structure

.
├── (.gitignore)        # Files to ignore when using git
│
├── .Rbuildignore       # Files to ignore when building package
│
├── R/                  # Folder to store (only) R functions
│   └── myfun-1.R       # A first R function file
│
├── man/                # Folder to store R functions documentation
│   └── my_fun_1.Rd     # Documentation for the first function
│
├── DESCRIPTION         # Package metadata
│
└── NAMESPACE           # Automatically edited

The DESCRIPTION

The DESCRIPTION file can be edited by hand, but usethis also provides functions to edit it programmatically:

| Task | Function |
|---|---|
| Declare dependencies | `usethis::use_package()` |
| Add authors | `usethis::use_author()` |
| Edit DESCRIPTION | `usethis::use_description()` |
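The DESCRIPTION file itself is plain text with one `Field: value` entry per field, so it can be read back into R. The sketch below uses only base R; the package name and field values are made up for illustration.

```r
# Write a small, made-up DESCRIPTION file to a temporary location
desc_file <- tempfile("DESCRIPTION")
writeLines(c(
  "Package: popsyTools",
  "Title: Helper Functions for a Hypothetical Study",
  "Version: 0.0.0.9000",
  "Imports: dplyr"
), desc_file)

# read.dcf() returns a character matrix with one column per field
desc <- read.dcf(desc_file)

colnames(desc)
desc[1, "Imports"]
```

Reading the file like this is a quick way to confirm that a call such as `usethis::use_package()` has added the dependency you expected.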

Exercise

I have prepared a small script with these functions that you can download: script_package_01.R

I want you to execute these functions to create a package and write metadata to its DESCRIPTION.

Check out the directory structure and the different files that have been created.

The process of working on a package

Useful functions for package development

| Task | Function |
|---|---|
| Create R script | `usethis::use_r()` |
| Add data | `usethis::use_data()` |
| Document functions | `devtools::document()` |
| Load all functions | `devtools::load_all()` |
| R CMD check | `devtools::check()` |
| Build package locally | `devtools::build()` |

Documenting functions with `roxygen2`

The idea behind roxygen2 is to document functions with special comments next to their definition. roxygen2 will process these comments and turn them into manual pages in the package.

You can insert a comment skeleton with Ctrl + Alt + Shift + R while your cursor is inside the function.

`roxygen2` comments

#' The length of a string
#'
#' Technically this returns the number of "code points", in a string. One
#' code point usually corresponds to one character, but not always. For example,
#' a u with an umlaut might be represented as a single character or as the
#' combination of a u and an umlaut.
#'
#' @param string A text string
#' @return A numeric vector giving the number of characters (code points) in each
#'    element of the character vector. Missing strings have missing length.
#' @seealso [stringi::stri_length()] which this function wraps.
#' @export
#' @examples
#' str_length(letters)
#' str_length(NA)
#' str_length(factor("abc"))
str_length <- function(string) {
}

R CMD Check

Three levels of feedback

R CMD check runs checks intended for publication on CRAN. Its output gives strict feedback:

errors need to be fixed
warnings affect functionality
notes can often be ignored if you do not intend to publish the package on CRAN

Exercise

I have prepared a small script with functions that you can download: script_package_02.R

I want you to

create a function that changes the values of the CESD to be in the range [0, 3]
add some documentation to the function using roxygen2
load the package and test the function
run R CMD check
build the package and install it

Optional:

add the dataset ds4psy::posPsy_long to the package using usethis::use_data()

Pros of “study as an R package”

Combines functions for analysis and study data
documentation of custom functions, dependencies, contributors
CMD check routine
condense analysis into functions

Cons of “study as an R package”

If the package is not intended for CRAN submission, superfluous information and files are created:

metadata exists in different places (github, DESCRIPTION)
CRAN submission artifacts
compliance with CMD check requirements

More info here

Outlook

Version control

Version control using Git

Git manages a repository: a collection of files and their changes over time. In the R world, this corresponds to an RStudio project or a folder with scripts and datasets.

When you make a change, you commit it with a short message detailing your changes. All these commits make up the history of the repository, through which you can trace back the evolution of your project.
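The commit cycle can be tried out without leaving R. The following sketch assumes the `gert` package is installed (it is not part of these slides; the same steps can be done with `git` in the terminal). The repository, file, and author are made up for the example.

```r
library(gert)

# Start a fresh repository in a temporary folder
repo <- tempfile("demo_repo")
dir.create(repo)
git_init(repo)

# A made-up author, so that no global git configuration is required
me <- git_signature("Demo Author", "demo@example.org")

# First change: add an analysis script and commit it
script <- file.path(repo, "analysis.R")
writeLines("x <- 1", script)
git_add("analysis.R", repo = repo)
git_commit("Add analysis script", author = me, repo = repo)

# Second change: edit the script and commit again
writeLines("x <- 2", script)
git_add("analysis.R", repo = repo)
git_commit("Change starting value", author = me, repo = repo)

# The history contains one row per commit
history <- git_log(repo = repo)
history[, c("time", "message")]
```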

Git hosting services such as GitHub, GitLab, etc.

When your version-controlled repository is not confined to your machine, hosting services such as GitHub, GitLab or others come into play. You can:

publish your project there
push your changes to the repository
accept changes from others made in pull requests

Hosting Quarto documents

GitHub and GitLab let you host webpages and websites created in a repository on their service for free.

This can be used to publish Quarto documents, books, and websites.

I have put this to use for my own website, for documentation for my R package, and for this workshop!

Containerization

The purpose of containerization

With containerization, you can control all of the aspects of the computational environment mentioned earlier:

Specific packages and their versions
R version
operating system (e.g. Windows, macOS, Linux)
system dependencies
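Before fixing these aspects in a container, it can help to see what they currently are on your own machine. The following base-R sketch prints them; the two packages queried are only examples that ship with every R installation.

```r
# R version
R.version.string

# Operating system
Sys.info()[["sysname"]]

# Versions of specific packages (here: two that ship with R)
pkgs <- c("stats", "utils")
pkg_versions <- vapply(
  pkgs,
  function(p) as.character(packageVersion(p)),
  character(1)
)
pkg_versions

# A combined report, including attached packages and the platform
sessionInfo()
```

System dependencies (for example external libraries that a package links against) are not covered by this report and have to be recorded separately.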

What is a container?

A container is a little bit like a virtual machine: it behaves like a separate computer running on another computer. However, it usually does not have a graphical interface and is operated through a terminal.

The container is the running instance of the computer. How this computer is set up is defined by the image.

What is an image?

An image is a package that contains all necessary instructions to create a container.

The specification for an image is defined in a text file, e.g. a Dockerfile, which states:

what operating system is needed
what software needs to be installed

Collaboration

The image can be shared with collaborators so that they can reproduce the same conditions in a container on their computer.

There are a number of standard images for R created by the Rocker Project. These can be used as a base image which can then be further customised for a specific project.

Research Compendia

What are research compendia?

A research compendium collects all digital parts of a research project, including data, code, and texts (protocols, reports, questionnaires, metadata).

https://epiverse-trace.github.io/research-compendium/instructor/compendium.html

When published, a research compendium allows others to inspect, reconstruct and ideally execute your analysis.

R packages as research compendia

The idea of a research compendium combines a number of the previously discussed features:

the R package structure
literate programming with Quarto or R Markdown
version control
(optionally) the use of Docker and/or Binder
