R Development Resources

Author: Jack Leary

Affiliation: University of Florida

Published: April 16, 2024

1 Introduction

This document houses a list of resources I’ve put together that have helped me in my journey from novice R user to experienced (kinda) R package developer. Most of the resources will be centered around developing packages specifically, but some pertain to other types of R projects.

2 Package Development

  • R Packages
    • The de facto standard R package development book, written by Hadley Wickham & Jenny Bryan. Contains a startup guide and sections on metadata, dependencies, unit testing, and documentation. Good comprehensive resource, but not the quickest way to get up & running. This package does have non-R dependencies, but they’re free & easy to install.
  • The usethis package
  • Writing R Extensions
    • This book, written by the R Core Team, is a lower-level guide to creating R packages, writing R documentation, debugging, linking R with C / C++, and other advanced topics. It’s useful if you’re getting deeper into package development & already have a very solid handle on the basics.

3 Computational Efficiency

  • Code profiling with proffer
    • Code profiling gives you a breakdown, in graph or table form, of which functions in a piece of code take the longest to execute. The breakdown is hierarchical, so you can see, e.g., that a function you wrote called run_analysis() is taking up 50% of your computation time, and that within that function the main culprit is the subfunction load_large_dataset(). As such, it becomes very simple & quick to identify the best targets for runtime optimization.
  • Benchmarking runtime using microbenchmark
    • This package is a very lightweight benchmarking utility. It allows you to compare the runtimes of several expressions by executing each one a given number of times, e.g., 100 times, and returning the distribution of runtimes for each. This makes it very easy to quickly see which implementation of a given task is better with respect to runtime. The source repository can be found here.
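
As a minimal sketch of the benchmarking workflow described above: the two functions below (sum_squares_loop() & sum_squares_vectorized()) are hypothetical stand-ins I made up for illustration, not anything from the packages themselves. The timing step only runs if microbenchmark is installed.

```r
# Two hypothetical implementations of the same task: summing squares.
sum_squares_loop <- function(x) {
  total <- 0
  for (value in x) {
    total <- total + value^2
  }
  total
}

sum_squares_vectorized <- function(x) {
  sum(x^2)
}

x <- as.numeric(1:10000)

# Confirm that both implementations agree before timing them.
results_match <- isTRUE(all.equal(sum_squares_loop(x), sum_squares_vectorized(x)))

# Time each expression 100 times & print the distribution of runtimes.
if (requireNamespace("microbenchmark", quietly = TRUE)) {
  timings <- microbenchmark::microbenchmark(
    loop = sum_squares_loop(x),
    vectorized = sum_squares_vectorized(x),
    times = 100
  )
  print(timings)
}
```

Checking that the implementations return the same answer first is a design choice worth keeping: a faster function that computes something different isn't really an improvement.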

4 Reproducible Research

  • Reproducible pipelines in R using targets
    • The targets package is one of my favorite R tools, & the well-written docs above show how to create version-controlled pipelines entirely using R. This framework is an absolute godsend for large projects, simulation studies, etc., and I’ve used it on every long-term computational project I’ve worked on in the past 2 years. It makes tracking, reproducing, & parallelizing the execution of large codebases very easy, and makes reproducible research accessible to anyone with a good handle on R.
  • Scientific writing via quarto
    • quarto is a more fully-featured successor to RMarkdown, and enables the user to combine code, Markdown-formatted text, images, equations, etc. in a single document. Citations are supported as well, which makes technical writing relatively easy. There is support for LaTeX, which is great for homework assignments, proofs, and Methods sections. In addition, you can use quarto to create websites, books, presentations, and more. This site itself is actually run using quarto. I’d highly recommend it over RMarkdown at this point, both for its breadth of features & its expanded support of languages (Python, Julia, R, etc.) & IDEs (RStudio, VS Code, Jupyter notebooks). Full documentation can be found here.
  • How to control stochasticity when processing in parallel
    • This presentation describes how to produce reproducible random number streams when using one of R’s several parallel processing frameworks. It’s pretty applied (not theoretical), and several useful code examples are shown.
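
To give a flavor of the parallel random number idea, here is a minimal sketch using only the base parallel package. This is one possible approach, not necessarily the one taken in the presentation: the helper names (make_streams(), simulate_task()) & the seed are made up for illustration. Each task gets its own L'Ecuyer-CMRG stream, so the draws don't depend on which worker runs which task.

```r
library(parallel)

# Create one independent L'Ecuyer-CMRG stream per task from a single seed.
# Note: this changes the session's RNG kind as a side effect.
make_streams <- function(seed, n_tasks) {
  RNGkind("L'Ecuyer-CMRG")
  set.seed(seed)
  stream <- get(".Random.seed", envir = globalenv())
  streams <- vector("list", n_tasks)
  for (i in seq_len(n_tasks)) {
    streams[[i]] <- stream
    stream <- nextRNGStream(stream)
  }
  streams
}

# Hypothetical task: install the assigned stream, then draw from it.
simulate_task <- function(stream, n = 5) {
  assign(".Random.seed", stream, envir = globalenv())
  rnorm(n)
}

streams <- make_streams(seed = 312, n_tasks = 4)

# Run the tasks serially ...
serial_draws <- lapply(streams, simulate_task)

# ... & again on a 2-worker cluster.
cl <- makeCluster(2)
parallel_draws <- parLapply(cl, streams, simulate_task)
stopCluster(cl)

# Compare the serial & parallel results.
print(identical(serial_draws, parallel_draws))
```

Tying the stream to the task rather than to the worker is the key design choice here: the results should then stay the same even if you change the number of workers.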

5 Development Best Practices

  • What They Forgot to Teach You About R
    • This online book is less about package development & more about R’s idiosyncrasies, but it contains a bunch of useful tips & tricks on how to make your code more reproducible & less brittle. The debugging section is clear & concise, and contains links to other, more detailed resources as well.

6 Interfacing with Other Languages

  • reticulate
    • Seamlessly interact with Python from your R session. Super easy to use & fairly full-featured, this package makes things like performing scRNA-seq analysis with both R & Python libraries a breeze. This package is developed by the team at RStudio (now Posit) & is actively maintained.
  • JuliaCall
    • Allows you to use the Julia language from within an R session. A great vignette can be found here.
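
As a quick illustration of what calling Python from R looks like with reticulate, here is a minimal sketch. It assumes reticulate is installed & that it can find a Python installation; if either is missing, the code just leaves the result as NA. The variable names are my own.

```r
# Default result if reticulate or Python is unavailable on this machine.
py_sqrt <- NA_real_

# Check (defensively) whether reticulate can initialize a Python session.
python_ready <- requireNamespace("reticulate", quietly = TRUE) &&
  tryCatch(
    reticulate::py_available(initialize = TRUE),
    error = function(e) FALSE
  )

if (python_ready) {
  # Import Python's standard-library math module as an R object.
  py_math <- reticulate::import("math")
  # Call a Python function with an R value; the result is converted back to R.
  py_sqrt <- py_math$sqrt(16)
}

print(py_sqrt)
```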

7 Miscellaneous

  • The R Inferno
    • This amusingly-titled & engagingly-written book covers a variety of R’s oddities & frustrating idiosyncrasies, many of which are holdovers from when R was first being developed. If you’re having an annoyingly difficult low-level problem, this book might have the answer.
  • radian: an improved R console
    • If you often use R in the terminal, it can be a frustrating experience compared to RStudio, which offers features like syntax highlighting, autocomplete, bracket-matching, etc. radian is a command line tool that adds these features to your R console. Simply install the Python-based library, then call radian to open a console session from the terminal.