Clean Code, Reproducible Science: Advanced R-Programming and Workflows

Part 1

Johannes Feldhege

13.05.2026

About this workshop

Housekeeping

All materials for the workshop (slides, scripts, references) can be found on this website:

http://www.johannesfeldhege.de/peergroup_workshop/

The materials are hosted on GitHub under a permissive license, so you can use them however you want.

Requirements

To get the most out of this workshop, it is recommended to:

  1. Install recent versions of R and RStudio

  2. Install the following packages: ds4psy, usethis, devtools, roxygen2, lintr. They can be installed using the following code:

install.packages(c("ds4psy", "lintr", "usethis", "devtools","roxygen2"))

  3. Create a new RStudio project for use in the workshop.
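
To confirm that the required packages are available before the workshop starts, a check along these lines may help. It is a minimal sketch using only base R; the variable names are illustrative.

```r
# Packages required for the workshop (taken from the list above)
required_packages <- c("ds4psy", "lintr", "usethis", "devtools", "roxygen2")

# requireNamespace() returns TRUE if a package can be loaded, FALSE otherwise
is_installed <- vapply(
  required_packages,
  requireNamespace,
  FUN.VALUE = logical(1),
  quietly = TRUE
)

# Names of the packages that still need to be installed
missing_packages <- required_packages[!is_installed]

if (length(missing_packages) > 0) {
  message("Still missing: ", paste(missing_packages, collapse = ", "))
} else {
  message("All workshop packages are installed.")
}
```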

Timetable

Time            Content
13:00 - 14:30   Part 1: Reproducible environments, code styling
14:30 - 14:45   Break
14:45 - 17:00   Part 2: Literate programming, package development

About me

  • M. Sc. in Psychology
  • I used to work for Forschungsstelle für Psychotherapie
  • I have been working as a Data Scientist at Asklepios Science & Research since 2022
  • I use R at work daily
  • I have published an R package on CRAN, openholidaysR, and I am currently working on a second one

Results from the pre-workshop survey

Experience with R

Experience with reproducibility tools

Comments

Topics

Today’s workshop will cover these topics:

  • Making code easier to read
  • First steps toward reproducibility
  • Reproducible environments with renv
  • Literate programming using Quarto
  • Publishing study analysis and data as an R package
  • Outlook for further reproducibility

A dataset for today’s workshop

Making code easier to read

Code is written once but read many times.

Therefore, we should strive to improve the readability of our code.

Three measures you can take towards this goal:

  • meaningful names
  • consistent code style
  • helpful comments

Naming things

Naming things is hard, so hard that books are written about it.

How to pick a good name?

A good name should

  • be descriptive and concise
  • prioritize the reader’s understanding over shortness or machine compatibility
  • follow a standard naming convention

Naming conventions

Some commonly used conventions:

  • base R
  • 🐍 snake_case
  • 🐫 camelCase

# base R
add.value()
reg.results

# snake_case
add_value()
reg_results

# camelCase
addValue()
regResults

Examples of bad names

  • x
  • new_data
  • newnewdata
  • lrmf
  • logistic_regression_model_fit_result
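
To illustrate the difference, here is a small sketch using the built-in mtcars dataset. The "better" names are only one possible choice under the criteria above, not a rule.

```r
# Hard to follow: the names say nothing about the content
new_data <- mtcars[mtcars$cyl == 4, ]
x <- lm(mpg ~ wt, data = new_data)

# Easier to follow: descriptive, concise, snake_case
four_cyl_cars <- mtcars[mtcars$cyl == 4, ]
mpg_by_weight_fit <- lm(mpg ~ wt, data = four_cyl_cars)

# Both versions compute exactly the same model
all.equal(coef(x), coef(mpg_by_weight_fit))
```

Only the names differ between the two versions, yet the second can be understood without tracing where each object came from.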

Names for functions

The convention for functions in the tidyverse, a collection of related packages, is to start a function name with a verb, so do() or do_thing().

Examples:

# Good
add_value()

# Bad
value_add()
value()

More style guidelines can be found in the tidyverse style guide.
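
As a minimal sketch of the verb-first convention, the function below gives add_value() a body. Its arguments (x and value) are hypothetical and chosen only for illustration.

```r
# Verb first: the name states what the function does
add_value <- function(x, value = 1) {
  x + value
}

# The call reads like an instruction
add_value(c(1, 2, 3), value = 10)
```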

Styling code

Styling code with lintr

lintr is a package for static code analysis on your R files.

Basic usage:

With code:

# Lint one file
lintr::lint(path = "path/to/file")

# A whole directory or R project
lintr::lint_dir(path = "path/to/project")

Or with an RStudio addin:

lintr comes with a set of default linters.

These can be modified, deactivated, or you can define your own set.

You can also add your own linters.
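
A sketch of how the defaults might be adjusted, assuming lintr version 3 or later is installed. The example script and the chosen settings (120-character lines, no object name check) are illustrative only.

```r
library(lintr)

# A small script with style problems, written to a temporary file
script_path <- tempfile(fileext = ".R")
writeLines(c("myVariable=1", "print(myVariable)"), script_path)

# Lint with the default linters
default_lints <- lint(script_path)

# Modify the defaults: allow lines of up to 120 characters and
# deactivate the object name check
custom_linters <- linters_with_defaults(
  line_length_linter(120),
  object_name_linter = NULL
)
custom_lints <- lint(script_path, linters = custom_linters)

# The custom set no longer reports the camelCase name,
# so it returns fewer lints than the defaults
length(default_lints)
length(custom_lints)
```

The same settings can also be stored in a .lintr file in the project root, so that every collaborator lints with the same rules.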

Beyond lintr

lintr gives recommendations but leaves it up to you to make changes in the code.

These tools can style your code by changing the code when executed:

Warning

Caution: these tools change your code without asking!

Writing comments

Writing helpful comments

A helpful comment explains the why, not the what or how.

If you find yourself commenting what the code is doing, the code might need to be re-written so that it speaks for itself.

Comments should be used sparingly, acting as an alert for the reader.
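
The contrast can be sketched as follows. The data and the reason given in the second comment are invented for this example.

```r
scores <- c(12, 15, NA, 9, 30)

# Unhelpful: restates what the code already says
# Calculate the mean of scores without NAs
mean_score <- mean(scores, na.rm = TRUE)

# Helpful: explains why the code is written this way
# Missing scores come from skipped sessions (assumption for this example),
# so they are dropped rather than imputed
mean_score <- mean(scores, na.rm = TRUE)
```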

Too many comments

Too many comments will be ignored by the reader.

If you find yourself commenting a lot, there might be better alternatives:

  • convert your script to a document such as Rmarkdown or Quarto
  • incorporate your code into functions in a package and document them with roxygen2

There, detailed comments on what your code is doing for what reason are encouraged.

Both aspects will be covered in the second part of the workshop.

Reproducibility

Why make it reproducible?

  • I have to re-run an analysis a few years down the line…

  • My colleague wants to build on my analysis in a new study…

  • I want to publish the code used in my study together with the manuscript so others can review or test it…

What is reproducibility?

A study is reproducible if it can be

  • conducted again
  • years later
  • with the same results.

The replication crisis

Reproducibility ≠ Replication

However, open methods and data are essential aspects for both.

Openly shared methods and data become valuable when we are given the same tools to work with as the authors.

Reproducible research

[Diagram: Data, Code, and Documentation feed into Public sharing, which leads to Reproducible Research]

Measures for reproducibility

We will look at these reproducibility measures today:

  • Reproducible environments with renv
  • Literate programming using Quarto
  • Study analysis and data as an R package

Outlook for advanced functionalities:

  • Version control using git, GitHub, GitLab, etc.
  • containerization with Docker, rix, Binder, etc.
  • Research compendia

First steps toward reproducibility

A fresh start

Can I run my analysis tomorrow with the same results as today? Have I included all steps in my code?

For a fresh start, restart the R session:

Session > Restart R

or

Control + Shift + F10

Make restarting R periodically a habit!

Never let RStudio save anything!

Do not rely on .RData to bail you out. Either save interim datasets to files or write your script in a way that lets you start from scratch every time.
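
Saving an interim dataset to a file could look like the sketch below. The file name is hypothetical, and tempdir() is used only so that the example runs anywhere; in a real project a path inside the project folder would be the natural choice.

```r
# Location of the interim file (tempdir() only for this sketch)
interim_path <- file.path(tempdir(), "cars_clean.rds")

# An expensive preparation step, here replaced by a simple filter
cars_clean <- mtcars[mtcars$mpg > 20, ]

# Save the result explicitly instead of relying on .RData
saveRDS(cars_clean, interim_path)

# In a later, fresh session: read the interim dataset back in
cars_restored <- readRDS(interim_path)
identical(cars_clean, cars_restored)
```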

Two alternatives:

Change the RStudio options:

Use this function from the usethis package:

usethis::use_blank_slate()

Reproducible environments

But it works on my machine!

Sharing your code and data is one step in the right direction.

To guarantee that others can apply them on their machine, you need to be able to share your computational environment.

What is the computational environment?

  • Specific packages and their versions
  • R version
  • Operating system (e.g. Windows, macOS, Linux)
  • System dependencies
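
Most of these components can be inspected from within R. The following sketch uses base R only; stats serves as the example package because it ships with every R installation.

```r
# R version
R.version.string

# Operating system
Sys.info()[["sysname"]]

# Version of a single package
packageVersion("stats")

# Everything at once: R version, platform, attached packages and their versions
sessionInfo()
```

Including the output of sessionInfo() with shared code is a simple first step, although it only documents the environment and does not recreate it.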

The renv package

Of the components of the computational environment, renv controls the first:

  • Specific packages and their versions

It does not manage the R version (which it only records in the lockfile), the operating system, or system dependencies.

It records the packages that your project actively uses and pins their versions in a project-specific library.

This way, your project becomes portable, as you can use it on another computer, and reproducible, as you can share it with a colleague.

Bonus: the project is not affected by changing functionality or breaking changes in packages across versions.

Libraries in a regular setup

A library is the place where packages are installed. To check where your library is located, run .libPaths().

In a regular setup, all projects write to the same library. Therefore, packages can become out of sync with the projects that they have been used in.
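
A short sketch for exploring your own setup with base R; the output depends on your installation.

```r
# Library locations, in the order in which R searches them
library_paths <- .libPaths()
library_paths

# Number of packages installed in each library
vapply(
  library_paths,
  function(path) length(list.dirs(path, recursive = FALSE)),
  integer(1)
)

# The library from which a given package is loaded (stats ships with R)
find.package("stats")
```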

[Diagram: Project 1, Project 2, and Project 3 all point to a single package library]

Libraries with renv

Using renv, each project has its own library. When a new package is installed, it is also written to a global package cache. The next time it is needed in a project, it is taken from there instead of downloading it again.

[Diagram: Project 1, Project 2, and Project 3 each point to their own project library; all three project libraries point to the global package cache]

Important renv functions

Task                      Function
Initialise renv           renv::init()
Get renv status           renv::status()
Install new package       renv::install()
Update the lockfile       renv::snapshot()
Restore project library   renv::restore()

Initialise renv with renv::init()

Console output after renv::init()

Project files created by renv

The following project files are created by renv::init():

.
├── .Rprofile          # Project-specific profile, activates renv
│
├── renv/
│   ├── .gitignore     # Specify which files should be ignored by git
│   ├── activate.R     # R script to launch renv
│   ├── staging/       # Temporary library when building packages
│   └── settings.json  # renv settings
│
└── renv.lock          # the lockfile, containing package metadata

Workflow with renv

Reproducibility can be achieved with the functions renv::snapshot() and renv::restore().

  • Initialize renv in the project with renv::init()
  • Install new packages with renv::install()
  • Update the lockfile with renv::snapshot()
  • Share project with others or across computers

  • Restore project library from a lockfile with renv::restore()
  • Install new packages with renv::install()
  • Update the lockfile with renv::snapshot()
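
The two halves of this workflow can be sketched as follows. The calls are meant to be run one at a time in the console of an RStudio project, since renv::init() changes the project and restarts the session; lintr is only an example package.

```r
# --- On your machine ---

# Set up renv in the current project (creates renv/ and renv.lock)
renv::init()

# Install a package into the project library
renv::install("lintr")

# Record the packages used by the project in the lockfile
renv::snapshot()

# --- On another machine, after receiving the project ---

# Rebuild the project library from the lockfile
renv::restore()

# Check that library and lockfile are in sync
renv::status()
```

Note that renv::snapshot() records only packages that the project actually uses, so a freshly installed package appears in the lockfile once it is referenced in the project code.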

Exercise

  • Initialise renv in an RStudio project with renv::init()
  • Check renv::status()
  • Inspect the directory structure
  • Install a new package with renv::install()
  • Update the lockfile with renv::snapshot()