Benjamin-Chung Lab Manual

Chapter 17 Resources

by Jade Benjamin-Chung and Kunal Mishra

17.1 Resources for R

  • dplyr and tidyr cheat sheet
  • ggplot cheat sheet
  • data.table cheat sheet
  • RMarkdown cheat sheet
  • Hadley Wickham’s R Style Guide
  • Jade’s R-for-epi course
  • Tidy Eval in 5 Minutes (video)
  • Tidy Evaluation (e-book)
  • Data Frame Columns as Arguments to Dplyr Functions (blog)
  • Standard Evaluation for *_join (Stack Overflow)
  • Programming with dplyr (package vignette)

17.2 Resources for Git & GitHub

  • DataCamp introduction to Git
  • Introduction to GitHub

17.3 Scientific figures

  • Ten Simple Rules for Better Figures

17.4 Writing

  • Tips on how to write a great science paper
  • ICMJE Definition of authorship
  • Nature article on elements of style for scientific writing
  • The Pathway to Publishing: A Guide to Quantitative Writing in the Health Sciences
  • Secret, actionable writing tips

17.5 Presentations

  • How to tell a compelling story in scientific presentations
  • How to give a killer narratively-driven scientific talk
  • How to make a better poster
  • How to make an even better poster

17.6 Professional advice

  • Professional advice, especially for your first job

17.7 Funding

  • Building Your Funding Train

17.8 Ethics and global health research

  • Global Code of Conduct For Research in Resource-Poor Settings
  • Who is a global health expert? Advice for aspiring global health experts
  • Transforming Global Health Partnerships