Chapter 7 Coding style
by Kunal Mishra, Jade Benjamin-Chung, and Stephanie Djajadi
7.2 Line breaks
For ggplot calls and dplyr pipelines, do not crowd single lines. Here are some nontrivial examples of “beautiful” pipelines, where beauty is defined by coherence:

# Example 1
school_names = list(
  OUSD_school_names = absentee_all %>%
    filter(dist.n == 1) %>%
    pull(school) %>%
    unique %>%
    sort,

  WCCSD_school_names = absentee_all %>%
    filter(dist.n == 0) %>%
    pull(school) %>%
    unique %>%
    sort
)
# Example 2
absentee_all = fread(file = raw_data_path) %>%
  mutate(program = case_when(schoolyr %in% pre_program_schoolyrs ~ 0,
                             schoolyr %in% program_schoolyrs ~ 1)) %>%
  mutate(period = case_when(schoolyr %in% pre_program_schoolyrs ~ 0,
                            schoolyr %in% LAIV_schoolyrs ~ 1,
                            schoolyr %in% IIV_schoolyrs ~ 2)) %>%
  filter(schoolyr != "2017-18")
And of a complex ggplot call:

# Example 3
ggplot(data=data, mapping=aes_string(x="year", y="rd", group=group)) +
  geom_point(mapping=aes_string(col=group, shape=group),
             position=position_dodge(width=0.2),
             size=2.5) +
  geom_errorbar(mapping=aes_string(ymin="lb", ymax="ub", col=group),
                position=position_dodge(width=0.2),
                width=0.2) +
  geom_point(position=position_dodge(width=0.2),
             size=2.5) +
  geom_errorbar(mapping=aes(ymin=lb, ymax=ub),
                position=position_dodge(width=0.2),
                width=0.1) +
  scale_y_continuous(limits=limits,
                     breaks=breaks,
                     labels=breaks) +
  scale_color_manual(std_legend_title,values=cols,labels=legend_label) +
  scale_shape_manual(std_legend_title,values=shapes, labels=legend_label) +
  geom_hline(yintercept=0, linetype="dashed") +
  xlab("Program year") +
  ylab(yaxis_lab) +
  theme_complete_bw() +
  theme(strip.text.x = element_text(size = 14),
        axis.text.x = element_text(size = 12)) +
  ggtitle(title)
Imagine (or perhaps mournfully recall) the mess that can occur when you don’t strictly style a complicated ggplot call. Trying to fix bugs and ensure your code is working can be a nightmare. Now imagine trying to do it with the same code six months after you’ve written it. Invest the time now and reap the rewards as the code practically explains itself, line by line.
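As a small self-contained illustration of the same principle, the sketch below compares a crowded one-liner with a one-step-per-line pipeline. It assumes a hypothetical toy data frame, absentee_toy, standing in for absentee_all, and it uses base R's native pipe (|>, available in R 4.1 or later) instead of %>% purely so that it runs without loading any packages.

```r
# Hypothetical toy data standing in for absentee_all (not the real dataset)
absentee_toy = data.frame(
  dist.n = c(1, 1, 0, 0, 1),
  school = c("B", "A", "D", "C", "A"),
  stringsAsFactors = FALSE
)

# Crowded: every step nested on a single line
crowded = sort(unique(absentee_toy$school[absentee_toy$dist.n == 1]))

# One step per line, read top to bottom (native pipe, requires R >= 4.1)
spaced = absentee_toy |>
  subset(dist.n == 1) |>
  getElement("school") |>
  unique() |>
  sort()

# Both versions compute the same thing; only the layout differs
print(identical(crowded, spaced))
```

Both forms return the same sorted school names for district 1; the second simply makes each transformation visible on its own line, which is what makes a later bug hunt tractable.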
7.3 Automated Tools for Style and Project Workflow
7.3.1 Styling
- Code Autoformatting - RStudio includes a fantastic built-in utility (keyboard shortcut: CMD-Shift-A) for autoformatting highlighted chunks of code to fit many of the best practices listed here. It generally makes code more readable and fixes a lot of the small things you may not feel like fixing yourself. Try it out as a “first pass” on some code of yours that doesn’t follow many of these best practices!
- Assignment Aligner - A cool R package that makes large chunks of assignment code much cleaner and more readable by aligning the assignment operators. Follow the linked instructions and create a keyboard shortcut of your choosing (recommendation: CMD-Shift-Z). Here is an example of how assignment aligning can dramatically improve code readability:
# Before
OUSD_not_found_aliases = list(
  "Brookfield Village Elementary" = str_subset(string = OUSD_school_shapes$schnam, pattern = "Brookfield"),
  "Carl Munck Elementary" = str_subset(string = OUSD_school_shapes$schnam, pattern = "Munck"),
  "Community United Elementary School" = str_subset(string = OUSD_school_shapes$schnam, pattern = "Community United"),
  "East Oakland PRIDE Elementary" = str_subset(string = OUSD_school_shapes$schnam, pattern = "East Oakland Pride"),
  "EnCompass Academy" = str_subset(string = OUSD_school_shapes$schnam, pattern = "EnCompass"),
  "Global Family School" = str_subset(string = OUSD_school_shapes$schnam, pattern = "Global"),
  "International Community School" = str_subset(string = OUSD_school_shapes$schnam, pattern = "International Community"),
  "Madison Park Lower Campus" = "Madison Park Academy TK-5",
  "Manzanita Community School" = str_subset(string = OUSD_school_shapes$schnam, pattern = "Manzanita Community"),
  "Martin Luther King Jr Elementary" = str_subset(string = OUSD_school_shapes$schnam, pattern = "King"),
  "PLACE @ Prescott" = "Preparatory Literary Academy of Cultural Excellence",
  "RISE Community School" = str_subset(string = OUSD_school_shapes$schnam, pattern = "Rise Community")
)
# After
OUSD_not_found_aliases = list(
  "Brookfield Village Elementary"      = str_subset(string = OUSD_school_shapes$schnam, pattern = "Brookfield"),
  "Carl Munck Elementary"              = str_subset(string = OUSD_school_shapes$schnam, pattern = "Munck"),
  "Community United Elementary School" = str_subset(string = OUSD_school_shapes$schnam, pattern = "Community United"),
  "East Oakland PRIDE Elementary"      = str_subset(string = OUSD_school_shapes$schnam, pattern = "East Oakland Pride"),
  "EnCompass Academy"                  = str_subset(string = OUSD_school_shapes$schnam, pattern = "EnCompass"),
  "Global Family School"               = str_subset(string = OUSD_school_shapes$schnam, pattern = "Global"),
  "International Community School"     = str_subset(string = OUSD_school_shapes$schnam, pattern = "International Community"),
  "Madison Park Lower Campus"          = "Madison Park Academy TK-5",
  "Manzanita Community School"         = str_subset(string = OUSD_school_shapes$schnam, pattern = "Manzanita Community"),
  "Martin Luther King Jr Elementary"   = str_subset(string = OUSD_school_shapes$schnam, pattern = "King"),
  "PLACE @ Prescott"                   = "Preparatory Literary Academy of Cultural Excellence",
  "RISE Community School"              = str_subset(string = OUSD_school_shapes$schnam, pattern = "Rise Community")
)
- StyleR - Another cool R package from the Tidyverse that can be powerful and used as a first pass on entire projects that need refactoring. The most useful function of the package is the style_dir function, which styles all files within a given directory. See the function’s documentation and the vignette linked above for more details.
  - Note: The default Tidyverse styler is subtly different from some of the things we’ve advocated for in this document. Most notably, we differ with regard to the assignment operator (<- vs =) and the number of spaces before/after “tokens” (i.e., Assignment Aligner adds spaces before = signs to align them properly). For this reason, we’d recommend the following: style_dir(path = ..., scope = "line_breaks", strict = FALSE). You can also customize StyleR even more if you’re really hardcore.
  - Note: As mentioned in the package vignette linked above, StyleR modifies things in place, meaning it overwrites your existing code and replaces it with the updated, properly styled code. This makes it a good fit for projects with version control, but if you don’t have backups or a good way to revert to the initial code, we wouldn’t recommend going this route.
- Linter - Linters are programming tools that check adherence to a given style, syntax errors, and possible semantic issues. The R linter, called lintr, can be found in this package. It helps keep files consistent across different authors and even different organizations. For example, it notifies you if you have unused variables, global variables with no visible binding, missing or superfluous whitespace, and improper use of parentheses or brackets. A list of its other purposes can be found in this link, and most guidelines are based on Hadley Wickham’s R Style Guide.
  - Note: You can customize your settings to set defaults or to exclude files. More details can be found here.
  - Note: The lintr package goes hand in hand with the styler package. The styler can be used to automatically fix many of the style problems that lintr catches.
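The styling and linting tools above can be tried out without risking any existing code. The following is a minimal sketch, assuming the styler and lintr (version 3.0 or later) packages are installed; the messy_lines snippet and the temporary demo directory are hypothetical stand-ins for a real project. It previews styling with style_text, runs style_dir in dry mode so that nothing is overwritten, and lints with the assignment-operator check switched off, since this guide uses = rather than <-.

```r
# Assumes styler and lintr (>= 3.0) are installed
library(styler)
library(lintr)

# Hypothetical messy snippet used only for this demonstration
messy_lines = c(
  "school_names=list(a=1,",
  "b   = 2)"
)

# Preview the styled result in the console; no file is touched
preview = style_text(messy_lines, scope = "line_breaks", strict = FALSE)
print(preview)

# Write the snippet to a throwaway directory to try style_dir safely
demo_dir = file.path(tempdir(), "styler_demo")
dir.create(demo_dir, showWarnings = FALSE)
demo_file = file.path(demo_dir, "messy.R")
writeLines(messy_lines, demo_file)

# dry = "on" reports which files would change without writing them back
dry_run = style_dir(
  path = demo_dir,
  scope = "line_breaks",
  strict = FALSE,
  dry = "on"
)
unchanged_on_disk = identical(readLines(demo_file), messy_lines)

# Lint the same text with the default linters, minus the check that
# would flag = used for assignment
lints = lint(
  text = messy_lines,
  linters = linters_with_defaults(assignment_linter = NULL)
)
print(lints)
```

Running style_dir with dry = "on" first is one way to see how much of a project would be rewritten before committing to an in-place run under version control.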
7.1 Comments
# This is a comment -- first letter is capitalized and spaced away from the pound sign