Clean Code, Reproducible Science: Advanced R-Programming and Workflows

Part 1

Johannes Feldhege

13.05.2026

About this workshop

Housekeeping

All materials for the workshop (slides, scripts, references) can be found on this website:

http://www.johannesfeldhege.de/peergroup_workshop/

The materials are hosted on GitHub under a permissive license, so you can use them however you want.

Requirements

To get the most out of this workshop, it is recommended to:

Install recent versions of R and RStudio
Install the following packages: ds4psy, usethis, devtools, roxygen2, lintr. They can be installed using the following code:

install.packages(c("ds4psy", "lintr", "usethis", "devtools", "roxygen2"))

Create a new RStudio project for use in the workshop.

Timetable

Time	Content
13:00 - 14:30	Part 1: Reproducible environments, code styling
14:30 - 14:45	Break
14:45 - 17:00	Part 2: Literate programming, package development

About me

M. Sc. in Psychology
I used to work for the Forschungsstelle für Psychotherapie
I have been working as a Data Scientist at Asklepios Science & Research since 2022
I use R at work daily
I have published an R package on CRAN, openholidaysR, and I am currently working on a second one

Results from the pre-workshop survey

Experience with R

Results from the pre-workshop survey

Experience with reproducibility tools

Results from the pre-workshop survey

Comments

Topics

Today’s workshop will cover these topics:

Making code easier to read
First steps toward reproducibility
Reproducible environments with renv
Literate programming using Quarto
Publishing study analysis and data as an R package
Outlook on further reproducibility tools

A dataset for today’s workshop

Making code easier to read

Code is written once but read many times.

Therefore, we should strive to improve the readability of our code.

Three measures you can take towards this goal:

meaningful names
consistent code style
helpful comments

Naming things

Naming things is hard - so hard that books are written about it:

How to pick a good name?

A good name should

be descriptive and concise
prioritize the reader's understanding over shortness or machine compatibility
follow a standard naming convention

Naming conventions

Some commonly used conventions:

base R

🐍 snake_case

🐫 camelCase

# base R
add.value()
reg.results

# snake_case
add_value()
reg_results

# camelCase
addValue()
regResults

Examples of bad names

x
new_data
newnewdata
lrmf
logistic_regression_model_fit_result
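As a rough illustration of the difference, the sketch below writes the same two steps once with vague names and once with descriptive snake_case names. It uses the built-in mtcars dataset; all object names are made up for this example.

```r
# Hard to read: the names say nothing about the content
x <- mtcars[mtcars$cyl == 4, ]
m <- lm(mpg ~ wt, data = x)

# Easier to read: descriptive, concise, snake_case
four_cyl_cars <- mtcars[mtcars$cyl == 4, ]
mpg_by_weight <- lm(mpg ~ wt, data = four_cyl_cars)

# Both versions fit the same model; only the readability differs
all.equal(coef(m), coef(mpg_by_weight))
```

The second version can be understood without scrolling back to find out what x and m contain.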

Names for functions

The convention for functions in the tidyverse, a collection of related packages, is to start a function name with a verb, for example do() or do_thing().

Examples:

# Good
add_value()

# Bad
value_add()
value()

More style guidelines can be found in the tidyverse style guide

Styling code

Styling code with `lintr`

lintr is a package for static code analysis on your R files.

Basic usage:

With code:

# Lint one file
lintr::lint(filename = "path/to/file")

# A whole directory or R project
lintr::lint_dir(path = "path/to/project")

Or with an RStudio addin:

Styling code with `lintr`

lintr comes with a set of default linters.

These can be modified, deactivated, or you can define your own set.

You can also add your own linters.
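A minimal sketch of modifying the defaults, assuming lintr version 3.0 or later is installed. The line length of 120 and the example snippet are arbitrary choices for illustration.

```r
library(lintr)

# Start from the defaults, change one linter and switch another one off
my_linters <- linters_with_defaults(
  line_length_linter = line_length_linter(120), # allow longer lines
  object_name_linter = NULL                     # do not check naming style
)

# Lint a snippet passed as text instead of a file
lints <- lint(text = "myValue = 1", linters = my_linters)
print(lints)
```

With these settings the camelCase name is no longer reported, while the use of = for assignment still is. The same linters_with_defaults() call can usually be stored in a .lintr file in the project root so that it applies to the whole project.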

Beyond `lintr`

lintr gives recommendations but leaves it up to you to make changes in the code.

These tools restyle your code by rewriting it when executed:

Air

styler

Warning

Caution: these tools change your code without asking!
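One way to see what a styler would change before it touches any file is to style a piece of text. The sketch below assumes the styler package is installed (it is not part of the install list for this workshop); the messy snippet is made up.

```r
# style_text() returns the restyled code and leaves all files untouched
messy_code <- "x=c(1,2)"
styled_code <- styler::style_text(messy_code)
styled_code
```

Keeping the project under version control is another safeguard, because every automatic change can then be reviewed and reverted.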

Writing comments

Writing helpful comments

A helpful comment explains the why, not the what or how.

If you find yourself commenting what the code is doing, the code might need to be re-written so that it speaks for itself.

Comments should be used sparingly, acting as an alert for the reader.

More info here and here
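The difference between a "what" comment and a "why" comment can be sketched as follows. The survey data and the reason given in the comment are hypothetical and only serve as an illustration.

```r
# Hypothetical survey data, made up for this example
survey <- data.frame(id = 1:4, age = c(17, 25, 34, 16))

# Unhelpful comment: it only repeats what the code already says
# Keep rows where age is at least 18
adults <- survey[survey$age >= 18, ]

# Helpful comment: it explains why the step is there
# The questionnaire is only validated for adults, so minors are excluded
adults <- survey[survey$age >= 18, ]
```

The code is identical in both cases; only the second comment tells the reader something the code cannot.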

Too many comments

Too many comments will be ignored by the reader.

If you find yourself commenting a lot, there might be better alternatives:

Convert your script to a document such as R Markdown or Quarto
Incorporate your code into functions in a package and document them with roxygen2

There, detailed explanations of what your code does and why are encouraged.

Both aspects will be covered in the second part of the workshop.

Reproducibility

Why make it reproducible?

I have to re-run an analysis a few years down the line…
My colleague wants to build on my analysis in a new study…
I want to publish the code used in my study together with the manuscript so others can review or test it…

What is reproducibility?

A study is reproducible if it can be

conducted again
years later
with the same results.

The replication crisis

Reproducibility ≠ Replication

However, open methods and data are essential aspects for both.

Openly shared methods and data become valuable when we are given the same tools to work with as the authors.

Reproducible research

Measures for reproducibility

We will look at these reproducibility measures today:

Reproducible environments with renv
Literate programming using Quarto
Study analysis and data as an R package

Outlook on more advanced tools:

Version control using Git, GitHub, GitLab, etc.
Containerization with Docker, rix, Binder, etc.
Research compendia

First steps toward reproducibility

A fresh start

Can I run my analysis tomorrow with the same results as today? Have I included all steps in my code?

For a fresh start, restart your R session:

Session > Restart R

Ctrl + Shift + F10

Make restarting R periodically a habit!

Never let RStudio save anything!

Do not rely on .RData to bail you out. Either save interim datasets to files or write your script in a way that lets you start from scratch every time.

Two alternatives:

Change the RStudio options:

Use this function from the usethis package:

usethis::use_blank_slate()
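Saving interim datasets to files, as recommended above, could look like the following sketch. The preparation step is only a stand-in using the built-in mtcars data, and the temporary file is used so that the example runs anywhere; in a project you would choose a path inside the project folder.

```r
# Stand-in for an expensive preparation step
prepared_data <- subset(mtcars, cyl == 4)

# Save the interim dataset to a file
# (tempfile() is used only for this demo)
interim_file <- tempfile(fileext = ".rds")
saveRDS(prepared_data, interim_file)

# After restarting R, continue from the file instead of from .RData
prepared_data <- readRDS(interim_file)
```

Unlike .RData, the file contains exactly one object whose origin is documented in the script that created it.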

Reproducible environments

But it works on my machine!

Sharing your code and data is one step in the right direction.

To make sure that others can run them on their machines, you also need to share your computational environment.

What makes up the computational environment?

Specific packages and their versions
R version
Operating system (e.g. Windows, macOS, Linux)
System dependencies
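Most of these components can be inspected from within R. The following sketch uses only base R functions; the stats package is queried simply because it ships with every R installation.

```r
# R version
R.version.string

# Operating system
Sys.info()[["sysname"]]

# Version of one specific package
packageVersion("stats")

# Overview of R version, platform, and attached packages
sessionInfo()
```

Including the output of sessionInfo() with an analysis is a simple first step towards documenting the environment, although it does not make the environment restorable by itself.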

The `renv` package

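Of these components, renv controls only the first:

Specific packages and their versions (managed by renv)
R version (recorded in the lockfile, but not installed by renv)
Operating system (not managed by renv)
System dependencies (not managed by renv)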

The `renv` package

It records the actively used packages and freezes their version in your project in a project-specific library.

This way, your project becomes portable as you can use it on another computer and reproducible as you can share it with a colleague.

Bonus: the project is not affected by changed functionality or breaking changes in newer package versions.

Libraries in a regular setup

A library is the place where packages are installed. To check where your library is located, run .libPaths().

In a regular setup, all projects write to the same library. Therefore, packages can become out of sync with the projects that they have been used in.
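To see which library each installed package lives in and in which version, something like the following base R sketch can be used. The selection of columns is just one possible choice.

```r
# All library locations that R searches, in order
.libPaths()

# Package name, library location, and installed version
package_overview <- installed.packages()[, c("Package", "LibPath", "Version")]
head(package_overview)
```

In a regular setup, every project sees the same rows in this overview, which is why updating a package for one project changes it for all of them.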

Libraries with `renv`

Using renv, each project has its own library. When a new package is installed, it is also written to a global package cache. The next time it is needed in a project, it is taken from there instead of being downloaded again.

Important `renv` functions

Task	Function
Initialise `renv`	`renv::init()`
Get `renv` status	`renv::status()`
Install new package	`renv::install()`
Record project library in the lockfile	`renv::snapshot()`
Restore project library from the lockfile	`renv::restore()`

Initialise `renv` with `renv::init()`

Console output after renv::init()

Project files created by `renv`

The following project files are created by renv::init():

.
├── .Rprofile          # Project-specific profile, activates renv 
│
├── renv/
│   ├── .gitignore     # Specify which files should be ignored by git
│   ├── activate.R     # R script to launch renv
│   ├── staging/       # Temporary library when building packages
│   └── settings.json  # renv settings
│
└── renv.lock          # the lockfile, containing package metadata
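A quick way to check whether these files are present is sketched below using base R. It assumes the working directory is the project root; in a project without renv, all entries are simply FALSE.

```r
# Files that renv::init() is expected to create (see the tree above)
renv_files <- c(".Rprofile", "renv.lock", "renv/activate.R", "renv/settings.json")

# TRUE for every file that exists in the current working directory
file_status <- file.exists(renv_files)
names(file_status) <- renv_files
file_status
```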

Workflow with `renv`

Reproducibility can be achieved with the functions renv::snapshot() and renv::restore().

Initialise renv in the project with renv::init()
Install new packages with renv::install()
Update the lockfile with renv::snapshot()
Share project with others or across computers

Restore project library from a lockfile with renv::restore()
Install new packages with renv::install()
Update the lockfile with renv::snapshot()
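The two halves of this workflow can be sketched as two small helper functions. The function names add_package() and restore_project() are hypothetical and not part of renv; they only wrap the renv calls listed above so that nothing is executed until you call them. The sketch assumes that renv is installed and that renv::init() has already been run once in the project.

```r
# Author: add a package to the project and record it in renv.lock
add_package <- function(package_name) {
  renv::install(package_name) # install into the project library
  renv::snapshot()            # write the installed versions to renv.lock
}

# Colleague: rebuild the project library from the shared renv.lock
restore_project <- function() {
  renv::restore() # install the versions listed in renv.lock
  renv::status()  # check that library and lockfile agree
}
```

Note that renv::snapshot() by default only records packages that are actually used in the project's code, so a package that was installed but never loaded in a script may not appear in the lockfile.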

Exercise

Initialise renv in an RStudio project with renv::init()
Check the status with renv::status()
Inspect the directory structure
Install a new package with renv::install()
Update the lockfile with renv::snapshot()
