Chapter 12 Slurm and cluster computing

by Anna Nguyen, and Jade Benjamin-Chung

When you need to run a script that requires a large amount of RAM, large files, or that uses parallelization, you can use Sherlock, Stanford’s computing cluster. Sherlock uses Slurm, an open source, scalable cluster management and job scheduling system for computing clusters. Jade can email Sherlock managers to get you an account. Please refer to the Sherlock user guide to learn about the system and how to use it. Below, we include a few tips specific to how we use Sherlock in our lab.

12.1 Getting started

To access Sherlock, in terminal, log in using the following syntax and replace “USERNAME” with your Stanford alias. You will be prompted to enter your Stanford password (the same one you use for your email and other accounts) and to complete two-factor authentication.

ssh USERNAME@login.sherlock.stanford.edu

Once you log in, you can view the contents of your home directory in command line by entering cd $HOME. You can create subfolders within this directory using the mkdir command. For example, you could make a “code” subdirectory and clone a Github repository there using the following code:

cd $HOME
mkdir code
git clone https://github.com/jadebc/covid19-infections.git

12.1.1 One-Time System Set-Up

To keep the install packages consistent across different nodes, you will need to explicitly set the pathway to your R library directory.

Open your ~/.Renviron file (vi ~/.Renviron) and append the following line:

Note: Once you open the file using vi [file_name], you must press i (on Mac OS) or Insert (on Windows) to make edits. After you finish, hit Esc to exit editing mode and type :wq to save and close the file.

R_LIBS=~/R/x86_64-pc-linux-gnu-library/4.0

Alternatively, run an R script with the following code on Sherlock:

r_environ_file_path = file.path(Sys.getenv("HOME"), ".Renviron")
if (!file.exists(r_environ_file_path)) file.create(r_environ_file_path)

cat("\nR_LIBS=~/R/x86_64-pc-linux-gnu-library/4.0",
    file = r_environ_file_path, sep = "\n", append = TRUE)

To load packages that run off of C++, you’ll need to set the correct compiler options in your R environment.

Open the Makevars file in Sherlock (vi ~/.R/Makevars) and append the following lines

CXX14FLAGS=-O3 -march=native -mtune=native -fPIC
CXX14=g++

Alternatively, create an R script with the following code, and run it on Sherlock:

dotR = file.path(Sys.getenv("HOME"), ".R")
if (!file.exists(dotR)) dir.create(dotR)

M = file.path(dotR, "Makevars")
if (!file.exists(M)) file.create(M)

cat("\nCXX14FLAGS=-O3 -march=native -mtune=native -fPIC",
    "CXX14=g++",
    file = M, sep = "\n", append = TRUE)

12.2 Moving files to Sherlock

The $HOME directory is a good place to store code and small test files (quota: 15 GB per user). Save large files to the $SCRATCH directory (quota: 100 TB per user). On the $SCRATCH directory, files that are not modified after 90 days are automatically deleted. For this reason, it’s best to create a bash script that records the file transfer process for a given project. See example code below:

# securely transfer folders from Box to sherlock home directory
# note: the -r option is for folders and is not needed for files
scp -r "Box/malaria-project/folder-1/" USERNAME@login.sherlock.stanford.edu:
scp -r "Box/malaria-project/folder-2/" USERNAME@login.sherlock.stanford.edu:

# log into sherlock
ssh USERNAME@login.sherlock.stanford.edu

# make a malaria-project subfolder in scratch directory
cd $SCRATCH
mkdir malaria-project

# move files from sherlock home directory to scratch directory
cd $HOME
mv folder-1 $SCRATCH/malaria-project/
mv folder-2 $SCRATCH/malaria-project/

12.3 Testing your code

Typically, you will want to initially test your scripts by initiating a development node using the command sdev. This will allocate a small amount of computing resources for 1 hour. You can access R via command line using the following code, or you can test your code in the Rstudio server via Sherlock.

# start development node
sdev

# load R - default version (see Sherlock documentation for which version)
module load R

# initiate R in command line
R

In most cases, you will want to test that the file paths work correctly on Sherlock. You will likely need to add code to the configuration file in the project repository that specifies Sherlock-specific file paths. Here is an example:

# set sherlock-specific file paths
if(Sys.getenv("LMOD_SYSHOST")=="sherlock"){
  
  sherlock_path = paste0(Sys.getenv("HOME"), "/malaria-project/")
  
  data_path = paste0(sherlock_path, "data/")
  results_path = paste0(sherlock_path, "results/")
}

12.4 Running big jobs

Once your test scripts run successfully, you can submit an sbatch script for larger jobs. These are text files with a .sh suffix. Use a text editor like Sublime to create such a script. Documentation on sbatch options is available here. Here is an example of an sbatch script with the following options:

  • job-name=run_inc: Job name that will show up in the Sherlock system
  • begin=now: Requests to start the job as soon as the requested resources are available
  • dependency=singleton: Jobs can begin after all previously launched jobs with the same name and user have ended.
  • mail-type=ALL: Receive all types of email notification (e.g., when job starts, fails, ends)
  • cpus-per-task=16: Request 16 processors per task. The default is one processor per task.
  • mem=64G: Request 64 GB memory per node.
  • output=00-run_inc_log.out: Create a log file called 00-run_inc_log.out that contains information about the Slurm session

The file analysis.out will contain the log file for the R script analysis.R.

#!/bin/bash

#SBATCH --job-name=run_inc
#SBATCH --begin=now
#SBATCH --dependency=singleton
#SBATCH --mail-type=ALL
#SBATCH --cpus-per-task=16
#SBATCH --mem=64G
#SBATCH --mem=64G
#SBATCH --output=00-run_inc_log.out

cd $HOME/malaria-code-repo/2-analysis/

module purge 

# load R version 4.0.2 (required for certain packages)
module load R/4.0.2

# load gcc, a C++ compiler (required for certain packages)
module load gcc/10

# load software required for spatial analyses in R
module load physics gdal
module load physics proj

R CMD BATCH --no-save analysis.R analysis.out

To submit this job, save the code in the chunk above in a script called myjob.sh and then enter the following command into terminal:

sbatch myjob.sh 

To check on the status of your job, enter the following code into terminal:

squeue -u $USERNAME