Chapter 15 Slurm and cluster computing

by Anna Nguyen, Jade Benjamin-Chung, and Gabby Barratt Heitmann

When you need to run a script that requires a large amount of RAM, large files, or that uses parallelization, you can use Sherlock, Stanford’s computing cluster. Sherlock uses Slurm, an open source, scalable cluster management and job scheduling system for computing clusters. Jade can email Sherlock managers to get you an account. Please refer to the Sherlock user guide to learn about the system and how to use it. Below, we include a few tips specific to how we use Sherlock in our lab.

15.1 Getting started

To access Sherlock, in terminal, log in using the following syntax and replace “USERNAME” with your Stanford alias. You will be prompted to enter your Stanford password (the same one you use for your email and other accounts) and to complete two-factor authentication.

ssh USERNAME@login.sherlock.stanford.edu

Once you log in, you can view the contents of your home directory in command line by entering cd $HOME. You can create subfolders within this directory using the mkdir command. For example, you could make a “code” subdirectory and clone a Github repository there using the following code:

cd $HOME
mkdir code
git clone https://github.com/jadebc/covid19-infections.git

15.1.1 One-Time System Set-Up

To keep the install packages consistent across different nodes, you will need to explicitly set the pathway to your R library directory.

Open your ~/.Renviron file (vi ~/.Renviron) and append the following line:

Note: Once you open the file using vi [file_name], you must press i (on Mac OS) or Insert (on Windows) to make edits. After you finish, hit Esc to exit editing mode and type :wq to save and close the file.

R_LIBS=~/R/x86_64-pc-linux-gnu-library/4.0.2

Alternatively, run an R script with the following code on Sherlock:

r_environ_file_path = file.path(Sys.getenv("HOME"), ".Renviron")
if (!file.exists(r_environ_file_path)) file.create(r_environ_file_path)

cat("\nR_LIBS=~/R/x86_64-pc-linux-gnu-library/4.0.2",
    file = r_environ_file_path, sep = "\n", append = TRUE)

To load packages that run off of C++, you’ll need to set the correct compiler options in your R environment.

Open the Makevars file in Sherlock (vi ~/.R/Makevars) and append the following lines

CXX14FLAGS=-O3 -march=native -mtune=native -fPIC
CXX14=g++

Alternatively, create an R script with the following code, and run it on Sherlock:

dotR = file.path(Sys.getenv("HOME"), ".R")
if (!file.exists(dotR)) dir.create(dotR)

M = file.path(dotR, "Makevars")
if (!file.exists(M)) file.create(M)

cat("\nCXX14FLAGS=-O3 -march=native -mtune=native -fPIC",
    "CXX14=g++",
    file = M, sep = "\n", append = TRUE)

15.2 Moving files to Sherlock

The $HOME directory is a good place to store code and small test files (quota: 15 GB per user). Save large files to the $SCRATCH directory (quota: 100 TB per user). You can read more about storage options on Sherlock here. On the $SCRATCH directory, files that are not modified after 90 days are automatically deleted. For this reason, it’s best to create a bash script that records the file transfer process for a given project. See example code below:

# note: the following steps should be done from your local 
# (not after ssh-ing into sherlock)

# securely transfer folders from Box to sherlock home directory
# note: the -r option is for folders and is not needed for files
scp -r "Box/malaria-project/folder-1/" USERNAME@login.sherlock.stanford.edu:/home/users/USERNAME/

# securely transfer folders from Box to your sherlock scratch directory
scp -r "Box/malaria-project/folder-2/" USERNAME@login.sherlock.stanford.edu:/scratch/users/USERNAME/

# securely transfer folders from Box to our shared scratch directory
scp -r "Box/malaria-project/folder-3/" USERNAME@login.sherlock.stanford.edu:/scratch/group/jadebc/

15.3 Installing packages on Sherlock

When you begin working on Sherlock, you will most likely encounter problems with installing packages. To install packages, login to Sherlock on the command line and open a development node using the command sdev. Do not attempt to do this in the RStudio Server (see next section), as you will have to re-do it for every new session you open.

ssh USERNAME@login.sherlock.stanford.edu

sdev

There is a package installation file explicitly written for Sherlock that you should run before testing any code and sourcing the configuration file. You should only have to install packages once. Sherlock requires that you specify the repository where the package is downloaded from. You may also need to add an additional argument to install.packages to prevent the packages from locking after installation:

install.packages(<PACKAGE NAME>, repos=“http://cran.us.ur-project.org”, 
                  INSTALL_opts = "--no-lock"")

In order for some R packages to work on Sherlock, it is necessary to load specific software modules before running R. These must be loaded in Sherlock each time you want to use the package in R. For example, for spatial and random effects analyses, you may need the modules/packages below. These modules must also be loaded on the command line prior to opening R in order for package installation to work.

module --force purge # remove any previously loaded modules, including math and devel
module load math
module load math gmp/6.1.2
module load devel
module load gcc/10
module load system
module load json-glib/1.4.4
module load curl/7.81.0
module load physics
module load physics udunits geos
module load physics gdal/2.2.1 # for R/4.0.2
module load physics proj/4.9.3 # for R/4.0.2
module load pandoc/2.7.3

module load R/4.0.2

R # Open R in the Shell window to install individual packages or test code
Rscript install-packages-sherlock.R # Alternatively, run the entire package installation script in the Shell window

Figuring out the issues with some packages will require some trial and error. If you are still encountering problems installing a package, you may have to install other dependencies manually by reading through the error messages. If you try to install a dependency from CRAN and it isn’t working, it may be a module. You can search for it using the module spider command:

module spider DEPENDENCY NAME

However, you can also reach out to the Sherlock team for help. You can email them at srcc-support@stanford.edu. They also hold office hours.

15.4 Testing your code

Both of the following ways to test code on Sherlock are recommended for making small changes, such as editing file paths and making sure the packages and source files load. You should write and test the functionality of your script locally, only testing on Sherlock once major bugs are out.

15.4.1 The command line

There are two main ways to explore and test code on Sherlock. The first way is best for users who are comfortable working on the command line and editing code in base R. Even if you are not comfortable yet, this is probably the better way because these commands will transfer between Sherlock and other cluster computers using Slurm.

Typically, you will want to initially test your scripts by initiating a development node using the command sdev. This will allocate a small amount of computing resources for 1 hour. You can access R via command line using the following code.

# open development node
sdev

# Load all the modules required by the packages you are using
module load MODULE NAME  

# Load R (default version)*
module load R 

# initiate R in command line
R

*Note: for collaboration purposes, it’s best for everyone to work with one version of R. Check what version is being used for the project you are working on. Some packages only work with some versions of R, so it’s best to keep it consistent.

15.4.2 The Sherlock OnDemand Dashboard

The second way to test and edit code is to use the Sherlock OnDemand Dashboard, accessed by typing login.sherlock.stanford.edu into a web browser. You will be prompted to authenticate the way you would for any Stanford website. This is the best way to edit code for people who are not comfortable accessing & editing in base R in a Shell application.

You can test your code via the Rstudio server on Sherlock. To access this, login to the Dashboard, then click on Interactive Apps in the menu bar and choose R Studio Server. Similar to the sdev node, you have to set various parameters for your session. Choose a version of R and set the time – max. 2 hours. You can play with the other configurations, but this is likely unnecessary, as you should not need huge computing power to test small amounts of code. Keep in mind the more computing power you request, the lower priority your request becomes. You will then wait for the resources to become available, and you will be able to click “Launch” when they are (if you don’t mess with the CPU or GPU, this is usually less than 2 minutes). The screen that opens will look very similar to the RStudio on your local.

Do NOT use the RStudio Server’s Terminal to install packages, set your R environment, and do everything else needed to configure Sherlock because you will likely need to re-do it for every session/project. It’s best to use the Dashboard/RStudio Server if you are more comfortable testing & editing in RStudio rather than through base R in a Shell application.

15.4.3 Filepaths & configuration on Sherlock

In most cases, you will want to test that the file paths work correctly on Sherlock. You will likely need to add code to the configuration file in the project repository that specifies Sherlock-specific file paths. Here is an example:

# set sherlock-specific file paths
if(Sys.getenv("LMOD_SYSHOST")=="sherlock"){
  
  sherlock_path = paste0(Sys.getenv("HOME"), "/malaria-project/")
  
  data_path = paste0(sherlock_path, "data/")
  results_path = paste0(sherlock_path, "results/")
}

15.5 Storage & group storage access

15.5.1 Individual storage

There are multiple places to store your files on Sherlock. Each user has their own $HOME directory as well as a $SCRATCH directory. These are directories that can be accessed via the command line once you’ve logged in to Sherlock:

cd $HOME 
cd /home/users/USERNAME # Alternatively, use the full path

cd $SCRATCH
cd /scratch/users/USERNAME # Full path

You can also navigate to these using the File Explorer on Sherlock OnDemand.

$HOME has a volume quota of 15 GB. $SCRATCH has a volume quota of 100 TB, but files here get deleted 90 days after their last modification. Thus, use $SCRACTH for test files, exploratory analyses, and temporary storage. Use $HOME for long-term storage of important files and more finalized analyses.

You can read more about storage options on Sherlock here.

15.5.2 Group storage

The lab also has a $GROUP_HOME and a $GROUP_SCRATCH to store files for collaborative use. $GROUP_HOME has a volume quota of 1 TB and infinite retention time, whereas $GROUP_SCRATCH has a volume quota of 100 TB and the same 90-day retention limit. You can access these via the command line or navigate to them using the File Explorer:

cd $GROUP_HOME
cd /home/groups/jadebc

cd $GROUP_SCRATCH
cd /scratch/groups/jadebc

However, saving files to group storage can be tricky. You can try using the scp command in the section “Moving files to Sherlock” to see if you have permission to add files to group directories. Read the next section to ensure any directories you create have the right permissions.

15.5.3 Folder permissions

Generally, when we put folders in $GROUP_HOME or $GROUP_SCRATCH, it is so that we can collaborate on an analysis within the research group, so multiple people need to be able to access the folders. If you create a new folder in $GROUP_HOME or $GROUP_SCRATCH, please check the folder’s permissions to ensure that other group members are able to access its contents. To check the permissions of a folder, navigate to the level above it, and enter ls -l. You will see output like this:

drwxrwxrwx 2 jadebc jadebc  2204 Jun 17 13:12 myfolder

Please review this website to learn how to interpret the code on the left side of this output. The website also tells you how to change folder permissions. In order to ensure that all users and group members are able to access a folder’s contents, you can use the following command:

chmod ugo+rwx FOLDER_NAME

15.6 Running big jobs

Once your test scripts run successfully, you can submit an sbatch script for larger jobs. These are text files with a .sh suffix. Use a text editor like Sublime to create such a script. Documentation on sbatch options is available here. Here is an example of an sbatch script with the following options:

job-name=run_inc: Job name that will show up in the Sherlock system
begin=now: Requests to start the job as soon as the requested resources are available
dependency=singleton: Jobs can begin after all previously launched jobs with the same name and user have ended.
mail-type=ALL: Receive all types of email notification (e.g., when job starts, fails, ends)
cpus-per-task=16: Request 16 processors per task. The default is one processor per task.
mem=64G: Request 64 GB memory per node.
output=00-run_inc_log.out: Create a log file called 00-run_inc_log.out that contains information about the Slurm session
time=47:59:00: Set maximum run time to 47 hours and 59 minutes. If you don’t include this option, Sherlock will automatically exit scripts after 2 hours of run time.

The file analysis.out will contain the log file for the R script analysis.R.

#!/bin/bash

#SBATCH --job-name=run_inc
#SBATCH --begin=now
#SBATCH --dependency=singleton
#SBATCH --mail-type=ALL
#SBATCH --cpus-per-task=16
#SBATCH --mem=64G
#SBATCH --mem=64G
#SBATCH --output=00-run_inc_log.out
#SBATCH --time=47:59:00

cd $HOME/malaria-code-repo/2-analysis/

module purge 

# load R version 4.0.2 (required for certain packages)
module load R/4.0.2

# load gcc, a C++ compiler (required for certain packages)
module load gcc/10

# load software required for spatial analyses in R
module load physics gdal
module load physics proj

R CMD BATCH --no-save analysis.R analysis.out

To submit this job, save the code in the chunk above in a script called myjob.sh and then enter the following command into terminal:

sbatch myjob.sh

To check on the status of your job, enter the following code into terminal:

squeue -u $USERNAME