Chapter 8 Code Publication

by Nolan Pokpongkiat

8.2 Fill out file headers

Every file in a project should have a header that allows it to be interpreted on its own. It should include the name of the project and a short description for what this file (among the many in your project) does specifically. See template here.

8.3 Clean up comments

Make sure comments in the code are for code documentation purposes only. Do not leave comments to self in the final script files.

8.4 Document functions

Every function you write must include a header to document its purpose, inputs, and outputs. See template for the function documentation here.

8.5 Remove deprecated filepaths

All file paths should be defined in 0-config.R, and should be set relative to the project working directory. All absolute file paths from your local computer should be removed, and replaced with a relative path. If a third party were to re-run this analysis, if they need to download data from a separate source and change a filepath in the 0-config.R to match, make sure to specify in the README which line of 0-config.R needs to be substituted.

8.6 Ensure project runs via bash

The project should be configured to be entirely reproducible by running a master bash script, run-project.sh, which should live at the top directory. This bash script can call other bash scripts in subfolders, if necessary. Bash scripts should use the runFileSaveLogs utility script, which is a wrapper around the Rscript command, allowing you to specify where .Rout log files are moved after the R scripts are run.

See usage and documentation here.

8.7 Complete the README

A README.md should live at the top directory of the project. This usually includes a Project Overview and a Directory Structure, along with the names of the contributors and the Creative Commons License. See below for a template:

Overview

To date, coronavirus testing in the US has been extremely limited. Confirmed COVID-19 case counts underestimate the total number of infections in the population. We estimated the total COVID-19 infections – both symptomatic and asymptomatic – in the US in March 2020. We used a semi-Bayesian approach to correct for bias due to incomplete testing and imperfect test performance.

Directory structure

  • 0-config.R: configuration file that sets data directories, sources base functions, and loads required libraries

  • 0-base-functions: folder containing scripts with functions used in the analysis

    • 0-base-functions.R: R script containing general functions used across the analysis

    • 0-bias-corr-functions.R: R script containing functions used in bias correction

    • 0-bias-corr-functions-undertesting.R: R script containing functions used in bias correction to estimate the percentage of underestimation due to incomplete testing vs. imperfect test accuracy

    • 0-prior-functions.R: R script containing functions to generate priors

  • 1-data: folder containing data processing scripts NOTE: some scripts are deprecated

  • 2-analysis: folder containing analysis scripts. To rerun all scripts in this subdirectory, run the bash script 0-run-analysis.sh.

    • 1-obtain-priors-state.R: obtain priors for each state

    • 2-est-expected-cases-state.R: estimate expected cases in each state

    • 3-est-expected-cases-state-perf-testing.R: estimate expected cases in each state, estimate the percentage of underestimation due to incomplete testing vs. imperfect test accuracy

    • 4-obtain-testing-protocols.R: find testing protocols for each state.

    • 5-summarize-results.R: summarize results; obtain results for in text numerical results.

  • 3-figure-table-scripts: folder containing figure scripts. To rerun all scripts in this subdirectory, run the bash script 0-run-figs.sh.

    • 1-fig-testing.R: creates plot of testing patterns by state over time

    • 2-fig-cases-usa-state-bar.R: creates bar plot of confirmed vs. estimated infections by state

    • 3a-fig-map-usa-state.R: creates map of confirmed vs. estimated infections by state

    • 3b-fig-map-usa-state-shiny.R: creates map of confirmed vs. estimated infections by state with search functionality by state

    • 4-fig-priors.R: creates figure with priors for US as a whole

    • 5-fig-density-usa.R: creates figure of distribution of estimated cases in the US

    • 6-table-data-quality.R: creates table of data quality grading from COVID Tracking Project

    • 7-fig-testpos.R: creates figure of the probability of testing positive among those tested by state

    • 8-fig-percent-undertesting-state.R: creates figure of the percentage of under estimation due to incomplete testing

  • 4-figures: folder containing figure files.

  • 5-results: folder containing analysis results objects.

  • 6-sensitivity: folder containing scripts to run the sensitivity analyses

Contributors: Jade Benjamin-Chung, Sean L. Wu, Anna Nguyen, Stephanie Djajadi, Nolan N. Pokpongkiat, Anmol Seth, Andrew Mertens

Wu SL, Mertens A, Crider YS, Nguyen A, Pokpongkiat NN, Djajadi S, et al. Substantial underestimation of SARS-CoV-2 infection in the United States due to incomplete testing and imperfect test accuracy. medRxiv. 2020; 2020.05.12.20091744. doi:10.1101/2020.05.12.20091744

When possible, also include a description of the RDS results that are generated, detailing what data sources were used, where the script lives that creates it, and what information the RDS results hold.

8.8 Clean up feature branches

In the remote repository on Github, all feature branches aside from master should be merged in and deleted. All outstanding PRs should be closed.

8.9 Create Github release

Once all of these items are verified, create a tag to make a Github release, which will tag the repository, creating a marker at this specific point in time.

Detailed instructions here.