Chapter 11 Reproducible Environments
by Anna Nguyen
11.1 Package Version Control with renv
11.1.1 Introduction
Replicable code should produce the same results, regardless of when or where it’s run. However, our analyses often leverage open-source R packages that are developed by other teams. These packages continue to be developed after research projects are completed, which may include changes to analysis functions that could impact how code runs for both other team members and external replicators.
For example, suppose our code called a function that took one argument: example_function(arg_a = "a"). A few months after we publish our code, the package developers update the function to require a second argument, arg_b. Anyone who runs our code with the newest version of the package will get an error message saying that arg_b is missing, and will not be able to fully reproduce our results.
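You can reproduce this scenario in plain R. In the sketch below, example_function is the hypothetical function from the text. Its two "versions" are simulated by redefining it in the same session, not by installing two releases of a real package:

```r
# Hypothetical illustration: example_function stands in for a real package function.

# "Version 1.0" of the function takes a single argument.
example_function <- function(arg_a) {
  paste("result for", arg_a)
}
old_result <- example_function(arg_a = "a")
print(old_result)  # "result for a"

# "Version 2.0" adds a mandatory argument, arg_b, with no default value.
example_function <- function(arg_a, arg_b) {
  paste("result for", arg_a, "and", arg_b)
}

# The unchanged call from our published code now fails; capture the error message.
new_result <- tryCatch(
  example_function(arg_a = "a"),
  error = function(e) conditionMessage(e)
)
print(new_result)  # the message reports that arg_b is missing
```

The call site is identical in both cases. Only the installed function changed, which is why recording package versions matters.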
To ensure that the right functions are used in replication efforts, it is important for us to keep track of package versions used in each project.
The renv package can be used to promote reproducible environments within R projects. Without it, all projects share a single package library, even though they may need different versions of the same package. renv instead gives each project its own package library. However, for projects that use many packages, this can take up considerable disk space and increase the time new users need before they can start running code.
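One way to see which library a session is actually using is to print its library search paths with base R's .libPaths(). The sketch below assumes renv's default layout, in which the project library lives in a renv/library folder inside the project. The substring check is a rough heuristic, not an renv function:

```r
# Base R: the library directories this session searches, in priority order.
# install.packages() installs into the first entry by default, and library()
# looks in each entry in turn.
lib_paths <- .libPaths()
print(lib_paths)

# Rough heuristic (assumption): with renv's default settings the project
# library sits under a "renv/library" folder inside the project directory.
in_project_library <- grepl("renv/library", lib_paths[1], fixed = TRUE)
print(in_project_library)
```

Inside an renv-activated project, the first path should point into the project's renv folder, so the check returns TRUE. Outside any such project, it returns FALSE.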
In this lab manual chapter, we provide a quick tutorial for integrating renv into research workflows. For more detailed instructions, please refer to the renv package vignette.
11.1.2 Implementing renv in projects
Ideally, renv should be initialized at the start of a project and updated continuously as new packages are introduced into the codebase. However, it can be set up at any point in a project.
To add renv to your workflow, follow these steps:
- Install the renv package by running install.packages("renv").
- Create an R Project (.Rproj) file and make sure that your working directory is set to the correct folder.
- In the R console, run renv::init() to initialize renv in your R Project. This will create the following files: renv.lock, .Rprofile, renv/settings.json and renv/activate.R.
- Commit and push these files to GitHub so that they’re accessible to other users.
- As you write code and add packages, run renv::snapshot() in the R console. This records the packages and versions the project uses in renv.lock. Then commit the updated lockfile.
- Add renv::restore() to the top of your config file so that everyone who runs your code uses the same package versions.
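The console commands for these steps might look like the minimal sketch below. dplyr stands in for any package the project might add. It is an example, not a requirement:

```r
# Run once in the R console, with the project's .Rproj file open.
install.packages("renv")  # skip if renv is already installed on this machine
renv::init()              # creates renv.lock, .Rprofile, renv/settings.json and renv/activate.R

# Later: after adding code that uses a new package (dplyr is only an example),
# install it into the project library and record its version in renv.lock.
install.packages("dplyr")
renv::snapshot()          # in interactive sessions, lists the changes and asks before writing
```

By default, renv::snapshot() records only the packages it finds in use in the project's R files. A newly installed package therefore needs to be referenced in the code (for example with library(dplyr)) before it appears in renv.lock.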
11.1.3 Using projects with renv
If you’re starting to work on an ongoing project that already has renv set up, follow these steps to make sure that you’re using the same package versions as the rest of the project.
- Install the renv package by running install.packages("renv").
- Pull the latest version of the project from GitHub.
- Open the project’s R Project (.Rproj) file.
- Run renv::restore(). In our lab’s projects, this call is often already at the top of the config file, so you can simply run the scripts as they are.
- renv will list the project’s packages that need to be installed or updated to match the project. The console will ask whether you want to proceed; type “Y” to continue.
- Wait for the correct version of each package to install or update. This may take some time, depending on how many packages the project uses.
- Your R environment should now use the same package versions as those recorded in the renv lockfile (renv.lock), so you should be able to replicate the results.
- If you edit the code and introduce new or updated packages, follow the instructions in Section 11.1.2 to update the lockfile.
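The top of such a config file might look like the sketch below. The file name and the dplyr package are placeholders, not a lab convention:

```r
# Top of the project's config file (the file name varies by project).
# restore() installs or updates packages until the library matches renv.lock;
# in an interactive session it lists the changes and asks before proceeding.
renv::restore()

# Attach packages only after restore(), so the locked versions are the ones loaded.
library(dplyr)  # example package; replace with the project's own dependencies
```

If the library already matches the lockfile, restore() has nothing to change and does not prompt. To check the state of the project afterwards, you can run renv::status() in the console. It reports any inconsistencies between the lockfile, the project library and the packages used in the code.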