
Benjamin-Chung Lab Manual

Chapter 19 Resources

by Jade Benjamin-Chung and Kunal Mishra

19.1 Resources for R

  • dplyr and tidyr cheat sheet
  • ggplot2 cheat sheet
  • data.table cheat sheet
  • R Markdown cheat sheet
  • Hadley Wickham’s R Style Guide
  • Jade’s R-for-epi course
  • Tidy Eval in 5 Minutes (video)
  • Tidy Evaluation (e-book)
  • Data Frame Columns as Arguments to Dplyr Functions (blog)
  • Standard Evaluation for *_join (Stack Overflow)
  • Programming with dplyr (package vignette)

19.2 Resources for Git & GitHub

  • DataCamp introduction to Git
  • Introduction to GitHub

19.3 Scientific figures

  • Ten Simple Rules for Better Figures

19.4 Writing

  • Tips on how to write a great science paper
  • ICMJE Definition of authorship
  • Nature article on elements of style for scientific writing
  • The Pathway to Publishing: A Guide to Quantitative Writing in the Health Sciences
  • Secret, actionable writing tips

19.5 Presentations

  • How to tell a compelling story in scientific presentations
  • How to give a killer narratively-driven scientific talk
  • How to make a better poster
  • How to make an even better poster

19.6 Professional advice

  • Professional advice, especially for your first job

19.7 Funding

  • Building Your Funding Train

19.8 Ethics and global health research

  • Global Code of Conduct For Research in Resource-Poor Settings
  • Who is a global health expert? Advice for aspiring global health experts
  • Transforming Global Health Partnerships