In r, what includes reusable functions and documentation about how to use the functions?

I recently gave a lightning talk on the topic of creating your own R packages. Whether you’re working individually or on a team, everyone can benefit from creating a package to organize, document, and reuse existing code.

“Packages are the fundamental units of reproducible R code. They include reusable R functions, the documentation that describes how to use them, and sample data.”

Hadley Wickham
  • Packages are an efficient way to reuse existing assets, especially for ongoing projects. Our team has streamlined our ongoing, data science projects by creating packages for our data pipeline process, core metric calculations, and key visualizations.
  • If you find yourself repeatedly copying and pasting old code across projects and files, or rewriting functions because you can’t find them, then it may be worthwhile to put it into a package. Doing so will save you a ton of time and frustration in the future.
  • Packages help to organize collections of functions and data sets. You could arrange your functions into packages by category, e.g. a data viz package, a package for your biology class functions/datasets, etc.
  • One of the easiest methods of distributing code and data for others to use is through a package. Teammates can simply install your package!
  • Packages allow you to maintain consistency in your team’s code. E.g. create a standard plot theme with your company’s color scheme, fonts, and logo(s), with set sizes for titles and axis labels. Put it into a package, so that graphs created across teammates will have a consistent look and feel.

I created the following tutorial with my own functions (they’re super simple!) to show the steps of creating a package.

For reference, all my code is on GitHub. For more details on the package development process, please refer to Hadley Wickham’s book on R Packages.

Install the following packages: devtools and roxygen2.

install.packages("devtools")
devtools::install_github("klutometis/roxygen")

  • roxygen2 generates the documentation files for a package.
  • devtools is needed for the documentation, testing, and sharing stages of package development.

In RStudio, go to: File → New Project → New Directory → R Package
The following dialog will appear.

  1. Enter your package name.
  2. (Optional) Add any existing R scripts that you’d like to use as the basis for the package.
  3. Specify your package location.
  4. I prefer to check the box to create a git repository, for version control. Also, you can check the box to open this package in a new R session (see lower left corner).
  5. Click “Create Project”. R will automatically build the package, and will create a sample “hello.R” file (which you can delete).

Create a new R script, and start to paste your functions into it. A best practice is to avoid placing all functions in one file, but also to avoid placing each function in a separate file. Ideally, you’ll have a few R scripts, each containing a set of related functions.

See my code for steps 3 and 4 here!

If your code is not in the form of functions, you’ll need to put it into functions in this step.
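As a rough illustration of what "putting code into functions" can look like, the sketch below wraps a few loose script lines in a function. The name `percent_change()` and its arguments are made up for this example and are not part of the tutorial's package.

```r
# Loose script code that only works for these two specific numbers:
#   old <- 80
#   new <- 100
#   (new - old) / old * 100

# The same logic as a reusable function (hypothetical name and arguments).
percent_change <- function(old, new) {
  (new - old) / old * 100
}

percent_change(80, 100)
```

Once the logic lives in a function, the same calculation can be reused with any inputs instead of being copied and edited by hand.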


Save each script to the default location, which will be in a folder called “R”. Packages store all their R scripts in this folder.

Note: If your code uses functions from other packages, these lines of code must be prefixed with the package name, e.g. “lubridate::ymd(…)”. The automated testing that we’ll do in a later step will help you identify which lines are missing a package name.
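Here is a minimal sketch of the prefixing convention. It uses `stats::sd()` because the stats package ships with R, so the example runs anywhere; the helper name `standardize()` is hypothetical.

```r
# Hypothetical helper: every call to a function from another package is
# written as package::function(), so it works without library() calls.
standardize <- function(x) {
  (x - mean(x)) / stats::sd(x)
}

standardize(c(2, 4, 6))
```

The same pattern applies to third-party packages, as in the `lubridate::ymd(…)` example mentioned above.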

Now it’s time to add documentation to your functions! This is where the roxygen2 package comes in. Above every function, you will type comments in a special format, which roxygen will later transform into formal documentation.

Sample roxygen format. See here for more details.

See this example of how to add roxygen comments to a function (get the code here):
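A comparable minimal sketch is shown below. The function `square()` is made up for illustration and is not necessarily one of the functions in my package; the tags used (`@param`, `@return`, `@examples`, `@export`) are standard roxygen2 tags.

```r
#' Square a number
#'
#' Multiplies a number by itself.
#'
#' @param x A numeric vector.
#' @return A numeric vector the same length as `x`.
#' @examples
#' square(3)
#' @export
square <- function(x) {
  x^2
}
```

The lines starting with `#'` are ordinary comments as far as R is concerned, so the function behaves the same with or without them.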


You might be wondering, what happens if my package uses functions from other R packages, but someone doesn’t have those other packages installed? This is where the DESCRIPTION file comes in.

R automatically creates the DESCRIPTION file. See my example file.

On your computer, navigate to your package’s directory, and open the file titled “DESCRIPTION”. This file contains metadata about your package. There are 2 important things to do here:

  1. If your code calls functions from external packages, list these packages on an “Imports” line (see image below).
    • This will cause each package to be automatically installed if someone does not have it installed on their machine.
  2. Hit Enter at the very end of the DESCRIPTION file. This file must end with a blank new line, otherwise you’ll receive an “incomplete final line found” warning in step 6.

Above is an example of how to modify the DESCRIPTION file.
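To make the two points concrete, here is a small sketch that writes a made-up DESCRIPTION file to a temporary location and reads it back with base R's `read.dcf()`. The field values (title, version, and the packages under "Imports") are illustrative assumptions, not the contents of my actual file.

```r
# Write a small, made-up DESCRIPTION file to a temporary location.
desc_path <- file.path(tempdir(), "DESCRIPTION")
writeLines(c(
  "Package: mathPackage",
  "Title: Simple Math Helpers",
  "Version: 0.1.0",
  "Description: Illustrative metadata only.",
  "Imports:",
  "    lubridate,",
  "    ggplot2"
), desc_path)  # writeLines() ends the file with a newline

# DESCRIPTION uses the Debian control file format, which read.dcf() parses.
desc <- read.dcf(desc_path)
imports <- trimws(strsplit(desc[1, "Imports"], ",")[[1]])
imports
```

Reading the file back like this is also a quick way to confirm that the "Imports" line is spelled and formatted the way you intended.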

Generate the formal documentation files (.Rd files) by running:

devtools::document()

This enables any user of your package to type ?<function> and see its documentation. Any time you modify the roxygen comments in your R scripts, you’ll need to rerun “devtools::document()”.


Test your package. There are a variety of testing tools, but at a minimum, I like the “check” function from devtools:

devtools::check()

This function thoroughly checks for issues in your code, errors in your package structure, and problems with your documentation.

Use your package! You must first change to the parent directory, as shown in the code below. Install and load your package by running:

setwd("..")
devtools::install("mathPackage")
library("mathPackage")


You can upload your package to GitHub by uploading the entire package folder to a new or existing repo. Then, for anyone to install from GitHub, they simply need to run the “install_github” function:

# Non-enterprise GitHub
# Modify the "repo" argument if package is in a subdirectory
devtools::install_github(repo = "corinneleopold/packageTutorial/mathPackage")

# Enterprise GitHub - may need to create a personal access token
# (the token is passed via the "auth_token" argument)
devtools::install_github(repo = "path-to-package/package-name",
                         host = "github.hostname.com/api/v3",
                         auth_token = "your-token")

Whenever you make a change to your package, be sure to…

  • Update the roxygen comments (e.g. if you change a function’s parameters or add/delete a function).
  • Rerun “devtools::document()” to ensure the man files are up to date, especially if you modified the roxygen comments.
  • Commit your changes to GitHub, then ensure that anyone using your package reinstalls it using the “install_github()” function.

You are reading the work-in-progress second edition of R Packages. This chapter should be readable but is currently undergoing final polishing.

In this chapter, you’ll learn about function documentation, which users access with ?somefunction or help("somefunction"). Base R provides a standard way of documenting a package where each function is documented in a topic, an .Rd file (“R documentation”) in the man/ directory. .Rd files use a custom syntax, loosely based on LaTeX, and can be rendered to HTML, plain text, or pdf, as needed, for viewing in different contexts.

In the devtools ecosystem, we don’t edit .Rd files directly with our bare hands. Instead, we include specially formatted “roxygen comments” above the source code for each function1. Then we use the roxygen2 package to generate the .Rd files from these special comments2. There are a few advantages to using roxygen2:

  • Code and documentation are co-located. When you modify your code, it’s easy to remember to also update your documentation.

  • You can use markdown, rather than having to learn a one-off markup language that only applies to .Rd files. In addition to formatting, the automatic hyperlinking functionality makes it much, much easier to create richly linked documentation.

  • There’s a lot of .Rd boilerplate that’s automated away.

  • roxygen2 provides a number of tools for sharing content across documentation topics and even between topics and vignettes.

In this chapter we’ll focus on documenting functions, but the same ideas apply to documenting datasets (Section 8.2.2), classes and generics, and packages. You can learn more about those important topics in vignette("rd-other", package = "roxygen2").

To get started, we’ll work through the basic roxygen2 workflow and discuss the overall structure of roxygen2 comments, which are organised into blocks and tags. We also highlight the biggest wins of using markdown with roxygen2.

Unlike with testthat, there’s no obvious opening move to declare that you’re going to use roxygen2 for documentation. That’s because the use of roxygen2 is purely a matter of your development workflow. It has no effect on, e.g., how a package gets checked or built. We think the roxygen approach is the best way to generate your .Rd files, but officially R only cares about the files themselves, not how they came to be.

You should document all exported functions and datasets. Otherwise, you’ll get this warning from R CMD check:

W  checking for missing documentation entries (614ms)
   Undocumented code objects:
     ‘somefunction’
   Undocumented data sets:
     ‘somedata’
   All user-level objects in a package should have documentation entries.

Conversely, you probably don’t want to document unexported functions. If you want to use roxygen comments for internal documentation, include the @noRd tag to suppress the creation of the .Rd file.
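As a sketch of what this can look like, here is an internal helper documented with roxygen comments but marked with @noRd, so no .Rd file is generated for it. The function `clamp()` is a made-up example, not taken from any package discussed here.

```r
#' Clamp values to a range (internal helper)
#'
#' Not exported; the roxygen block is kept purely as developer notes.
#'
#' @param x A numeric vector.
#' @param lower,upper Bounds of the allowed range.
#' @returns `x` with values outside the range replaced by the nearest bound.
#' @noRd
clamp <- function(x, lower, upper) {
  pmin(pmax(x, lower), upper)
}

clamp(c(-1, 5, 11), lower = 0, upper = 10)
```

Note the absence of @export: the helper stays internal, and the comments serve developers reading the source rather than users of the help system.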

This is also a good time to explain something you may have noticed in your DESCRIPTION file:

Roxygen: list(markdown = TRUE)

devtools/usethis includes this by default when initiating a DESCRIPTION file and it gives roxygen2 a heads-up that your package uses markdown syntax in its roxygen comments.3

Your documentation workflow truly begins when you start to add roxygen comments above your functions. Roxygen comment lines always start with #' , the usual # for a comment, followed immediately by a single quote ':

#' Add together two numbers
#'
#' @param x A number.
#' @param y A number.
#' @returns A numeric vector.
#' @examples
#' add(1, 1)
#' add(10, 1)
add <- function(x, y) {
  x + y
}

Usually you write your function first, then its documentation. Once the function definition exists, put your cursor somewhere in it and do Code > Insert Roxygen Skeleton to get a great head start on the roxygen comment.

Once you have at least one roxygen comment, run devtools::document() or, in RStudio, press Ctrl/Cmd + Shift + D, to generate (or update) your package’s .Rd files4. Under the hood, this ultimately calls roxygen2::roxygenise(). The example above generates a man/add.Rd file that looks like this:

If you’ve used LaTeX before, this should look vaguely familiar since the .Rd format is loosely based on LaTeX. If you are interested in the .Rd format, you can read more in Writing R Extensions. But generally you’ll never need to look at .Rd files, except to commit them to your package’s Git repository.
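To get a feel for the format without setting up a package, the sketch below stores a hand-written approximation of an .Rd file for `add()` in a string, then parses and renders it with base R's tools package. The .Rd text is an assumption about roughly what such a file contains, not actual roxygen2 output, and the raw string syntax requires R 4.0 or later.

```r
# Hand-written approximation of man/add.Rd (not real roxygen2 output).
rd_text <- r"---(\name{add}
\alias{add}
\title{Add together two numbers}
\usage{
add(x, y)
}
\arguments{
\item{x}{A number.}

\item{y}{A number.}
}
\value{
A numeric vector.
}
\description{
Add together two numbers
}
\examples{
add(1, 1)
add(10, 1)
}
)---"

rd_path <- tempfile(fileext = ".Rd")
writeLines(rd_text, rd_path)

# Base R's tools package can parse the file and render it as plain text.
rd <- tools::parse_Rd(rd_path)
tools::Rd2txt(rd)
```

The backslash-and-braces markup is the LaTeX-like syntax described above; each roxygen tag ends up in a corresponding section such as `\arguments` or `\value`.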

How does this .Rd file correspond to the documentation you see in R? When you run ?add, help("add"), or example("add"), R looks for an .Rd file containing \alias{add}. It then parses the file, converts it into HTML, and displays it. Here’s what the result looks like in RStudio:


The default help-seeking process looks inside installed packages, so to see your package’s documentation during development, devtools overrides the usual help functions with modified versions that know to consult the current source package. To activate these overrides, you’ll need to run devtools::load_all() at least once. If it feels like your edits to the roxygen comments aren’t having an effect, double check that you have actually regenerated the .Rd files with devtools::document() and that you’ve loaded your package. When you call ?function, you should see “Rendering development documentation …”.

To summarize, there are four steps in the basic roxygen2 workflow:

  1. Add roxygen2 comments to your .R files.

  2. Run devtools::document() or press Ctrl/Cmd + Shift + D to convert roxygen2 comments to .Rd files.

  3. Preview documentation with ?function.

  4. Rinse and repeat until the documentation looks the way you want.
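If you find yourself repeating steps 2 and 3 often, one option is to bundle them in a small helper. The wrapper below is a hypothetical convenience function, not part of devtools; it assumes devtools is installed and that `pkg` points at a source package.

```r
# Hypothetical convenience wrapper around the documentation loop.
# Assumes devtools is installed and `pkg` is the path to a source package.
refresh_docs <- function(pkg = ".") {
  devtools::document(pkg)  # roxygen comments -> man/*.Rd
  devtools::load_all(pkg)  # make the dev version (and its help) available
  invisible(pkg)
}

# Typical use from the package's root directory:
# refresh_docs()
# ?add
```

In practice the keyboard shortcut does the same job for the first call; the wrapper is only a sketch of how the two calls fit together.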

Now that you understand the basic workflow, we’ll go into more detail about the syntax. roxygen2 comments start with #' and all the roxygen2 comments preceding a function are collectively called a block. Blocks are broken up by tags, which look like @tagName tagValue, and the content of a tag extends from the end of the tag name to the start of the next tag5. A block can contain text before the first tag which is called the introduction. By default, each block generates a single documentation topic, i.e. a single .Rd file6 in the man/ directory.

Throughout this chapter we’ll show you roxygen2 comments from real tidyverse packages, focusing on stringr, since the functions there tend to be fairly straightforward, leading to documentation that’s understandable with relatively little context. We attach stringr here so that its functions are hyperlinked in the rendered book (more on that in Section 17.2.3).

Here’s a simple first example: the documentation for str_unique().

#' Remove duplicated strings
#'
#' `str_unique()` removes duplicated values, with optional control over
#' how duplication is measured.
#'
#' @param string Input vector. Either a character vector, or something
#'   coercible to one.
#' @param ... Other options used to control matching behavior between duplicate
#'   strings. Passed on to [stringi::stri_opts_collator()].
#' @returns A character vector, usually shorter than `string`.
#' @seealso [unique()], [stringi::stri_unique()] which this function wraps.
#' @examples
#' str_unique(c("a", "b", "c", "b", "a"))
#'
#' # Use ... to pass additional arguments to stri_unique()
#' str_unique(c("motley", "mötley", "pinguino", "pingüino"))
#' str_unique(c("motley", "mötley", "pinguino", "pingüino"), strength = 1)
#' @export
str_unique <- function(string, ...) {
  ...
}

Here the introduction includes the title (“Remove duplicated strings”) and a basic description of what the function does. The introduction is followed by six tags: two @params, one @returns, one @seealso, one @examples, and one @export.

Note that the block has an intentional line length (typically the same as that used for the surrounding R code) and the second and subsequent lines of the long @param tag are indented, which makes the entire block easier to scan. You can get more roxygen2 style advice in the tidyverse style guide.

It can be aggravating to manually manage the line length of roxygen comments, so be sure to try out Code > Reflow Comment (Ctrl/Cmd+Shift+/).

Note also that the order in which tags appear in your roxygen comments (or even in handwritten .Rd files) does not dictate the order in rendered documentation. The order of presentation is determined by tooling within base R.

The following sections go into more depth for the most important tags. We start with the introduction, which provides the title, description, and details. Then we cover the inputs (the function arguments), outputs (the return value), and examples. Next we discuss links and cross-references, then finish off with techniques for sharing documentation between topics.

For the most part, general markdown and RMarkdown knowledge suffice for taking advantage of markdown in roxygen2. But there are a few pieces of syntax that are so important we want to highlight them here. You’ll see these in many of the examples in this chapter.

Backticks for inline code: Use backticks to format a piece of text as code, i.e. in a fixed width font. Example:

#' I like `thisfunction()`, because it's great.

Square brackets for an auto-linked function: Enclose text like somefunction() and somepackage::somefunction() in square brackets to get an automatic link to that function’s documentation. Be sure to include the trailing parentheses, because it’s good style and it causes the function to be formatted as code, i.e. you don’t need to add backticks. Example:

#' It's obvious that `thisfunction()` is better than [otherpkg::otherfunction()]
#' or even our own [olderfunction()].

Vignettes: If you refer to a vignette with an inline call to vignette("some-topic"), it serves a dual purpose. First, this is literally the R code you would execute to view a vignette locally. But wait there’s more! In many rendered contexts, this automatically becomes a hyperlink to that same vignette in roxygen2’s pkgdown website. Here we use that to link to some very relevant vignettes7:

Lists: Bullet lists break up the dreaded “wall of text” and can make your documentation easier to scan. You can use them in the description of the function or of an argument and also for the return value. It is not necessary to include a blank line before the list, but that is also allowed.

#' Best features of `thisfunction()`:
#' * Smells nice
#' * Has good vibes

The introduction provides a title, description, and, optionally, details, for the function. While it’s possible to use explicit tags in the introduction, we usually rely on implicit tags when possible:

  • The title is taken from the first sentence. It should be written in sentence case, not end in a full stop, and be followed by a blank line. The title is shown in various function indexes (e.g. help(package = "somepackage")) and is what the user will usually see when browsing multiple functions.

  • The description is taken from the next paragraph. It’s shown at the top of documentation and should briefly describe the most important features of the function.

  • Additional details are anything after the description. Details are optional, but can be any length so are useful if you want to dig deep into some important aspect of the function. Note that, even though the details come right after the description in the introduction, they appear much later in rendered documentation.
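The sketch below shows how the three implicit pieces line up in a roxygen block: the first paragraph becomes the title, the second the description, and the third the details. The function `rescale01()` is a made-up example used only to illustrate the layout.

```r
#' Rescale a numeric vector to the unit interval
#'
#' `rescale01()` linearly maps the values of `x` so that the smallest
#' becomes 0 and the largest becomes 1.
#'
#' Missing values are ignored when computing the range and are returned
#' as `NA`. This third paragraph becomes the details.
#'
#' @param x A numeric vector.
#' @returns A numeric vector the same length as `x`.
#' @examples
#' rescale01(c(1, 5, 9))
rescale01 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}
```

No explicit @title, @description, or @details tags are needed here, because the blank `#'` lines between paragraphs are enough for roxygen2 to tell the pieces apart.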

The following sections describe each component in more detail, and then discuss a few useful related tags.

When writing the title, it’s useful to think about how it will appear in the reference index. When a user skims the index, how will they know which functions will solve their current problem? This requires thinking about what your functions have in common (which doesn’t need to be repeated in every title) and what is unique to that function (which should be highlighted in the title).

When we wrote this chapter, we found the function titles for stringr to be somewhat disappointing. But they provide a useful negative case study:

There’s a lot of repetition (“pattern”, “from a string”) and the verb used for the function name is repeated in the title, so if you don’t understand the function already, the title seems unlikely to help much. Hopefully we’ll have improved those titles by the time you read this!

In contrast, these titles from dplyr are much better8:

  • mutate(): Create, modify, and delete columns
  • summarise(): Summarize each group to fewer rows
  • filter(): Subset rows using column values
  • select(): Subset columns using their names and types
  • arrange(): Arrange rows by column values

Here we try to succinctly describe what the function does, making sure to describe whether it affects rows, columns, or groups. We do our best to use synonyms, instead of repeating the function name, to hopefully give folks another chance to understand the intent of the function.

The purpose of the description is to summarize the goal of the function, usually in a single paragraph. This can be challenging for simple functions, because it can feel like you’re just repeating the title of the function. Try to find a slightly different wording, if you can. It’s okay if this feels a little repetitive; it’s often useful for users to see the same thing expressed in two different ways. It’s a little extra work, but the extra effort is often worth it. Here’s the description for str_detect():

#' Detect the presence/absence of a match
#'
#' `str_detect()` returns a logical vector with `TRUE` for each element of
#' `string` that matches `pattern` and `FALSE` otherwise. It's equivalent to
#' `grepl(pattern, string)`.

If you want more than one paragraph, you must use an explicit @description tag to prevent the second (and subsequent) paragraphs from being turned into the @details. Here’s a two-paragraph @description from str_view():

#' View strings and matches
#'
#' @description
#' `str_view()` is used to print the underlying representation of a string and
#' to see how a `pattern` matches.
#'
#' Matches are surrounded by `<>` and unusual whitespace (i.e. all whitespace
#' apart from `" "` and `"\n"`) are surrounded by `{}` and escaped. Where
#' possible, matches and unusual whitespace are coloured blue and `NA`s red.

Here’s another example from str_like(), which has a bullet list in @description:

#' Detect a pattern in the same way as `SQL`'s `LIKE` operator
#'
#' @description
#' `str_like()` follows the conventions of the SQL `LIKE` operator:
#'
#' * Must match the entire string.
#' * `_` matches a single character (like `.`).
#' * `%` matches any number of characters (like `.*`).
#' * `\%` and `\_` match literal `%` and `_`.
#' * The match is case insensitive by default.

Basically, if you’re going to include an empty line in your description, you’ll need to use an explicit @description tag.

Finally, it’s often particularly hard to write a good description if you’ve just written the function, because the purpose often seems very obvious. Do your best, and then come back later, when you’ve forgotten exactly what the function does. Once you’ve re-derived what the function does, you’ll be able to write a better description.

The @details are just any additional details or explanation that you think your function needs. Most functions don’t need details, but some functions need a lot. If you have a lot of information to convey, it’s a good idea to use informative markdown headings to break the details up into manageable sections9. Here’s an example from dplyr::mutate(). We’ve elided some of the details to keep this example short, but you should still get a sense of how we used headings to break up the content into skimmable chunks:

#' Create, modify, and delete columns
#'
#' `mutate()` adds new variables and preserves existing ones;
#' `transmute()` adds new variables and drops existing ones.
#' New variables overwrite existing variables of the same name.
#' Variables can be removed by setting their value to `NULL`.
#'
#' # Useful mutate functions
#'
#' * [`+`], [`-`], [log()], etc., for their usual mathematical meanings
#'
#' ...
#'
#' # Grouped tibbles
#'
#' Because mutating expressions are computed within groups, they may
#' yield different results on grouped tibbles. This will be the case
#' as soon as an aggregating, lagging, or ranking function is
#' involved. Compare this ungrouped mutate:
#'
#' ...

This is a good time to remind ourselves that, even though a heading like # Useful mutate functions in the example above comes immediately after the description in the roxygen block, the content appears much later in the rendered documentation. The details (whether they use section headings or not) appear after the function usage, arguments, and return value.

For most functions, the bulk of your work will go towards documenting how each argument affects the output of the function. For this purpose, you’ll use @param (short for parameter, a synonym of argument) followed by the argument name and a description of its action.

The highest priority is to provide a succinct summary of the allowed inputs and what the parameter does. For example, here’s how str_detect() documents string:

#' @param string Input vector. Either a character vector, or something
#'   coercible to one.

And here are three of the arguments to str_flatten():

#' @param collapse String to insert between each piece. Defaults to `""`.
#' @param last Optional string to use in place of the final separator.
#' @param na.rm Remove missing values? If `FALSE` (the default), the result
#'   will be `NA` if any element of `string` is `NA`.

Note that @param collapse and @param na.rm describe their default arguments. This is often a good practice because the function usage (which shows the default values) and the argument description are often quite far apart in the rendered documentation. But there are downsides. The main one is that this duplication means you’ll need to make updates in two places if you change the default value; we believe this small amount of extra work is worth it to make the life of the user easier.

If an argument has a fixed set of possible values, you should list them. If they’re simple, you can just list them in a sentence, like in str_trim():

#' @param side Side on which to remove whitespace: `"left"`, `"right"`, or
#'   `"both"` (the default).

If they need more explanation, you might use a bulleted list, as in str_wrap():

#' @param whitespace_only A boolean.
#'   * `TRUE` (the default): wrapping will only occur at whitespace.
#'   * `FALSE`: can break on any non-word character (e.g. `/`, `-`).

The documentation for most arguments will be relatively short, often one or two sentences. But you should take as much space as you need, and you’ll see some examples of multi-paragraph argument docs shortly.

If the behavior of multiple arguments is tightly coupled, you can document them together by separating the names with commas (with no spaces). For example, x and y are interchangeable in str_equal(), so they’re documented together:

#' @param x,y A pair of character vectors.

In str_sub(), start and end define the range of characters to replace. But instead of supplying both, you can use just start if you pass in a two-column matrix. So it makes sense to document them together:

#' @param start,end A pair of integer vectors defining the range of characters
#'   to extract (inclusive).
#'
#'   Alternatively, instead of a pair of vectors, you can pass a matrix to
#'   `start`. The matrix should have two columns, either labelled `start`
#'   and `end`, or `start` and `length`.

In str_wrap(), indent and exdent define the indentation for the first line and all subsequent lines, respectively:

#' @param indent,exdent A non-negative integer giving the indent for the
#'   first line (`indent`) and all subsequent lines (`exdent`).

If your package contains many closely related functions, it’s common for them to have arguments that share the same name and meaning. It would be both annoying and error prone to copy and paste the same @param documentation to every function, so roxygen2 provides @inheritParams which allows you to inherit argument documentation from another function, possibly even in another package.

stringr uses @inheritParams extensively because most functions have string and pattern arguments. The detailed and definitive documentation belongs to str_detect():

#' @param string Input vector. Either a character vector, or something
#'   coercible to one.
#' @param pattern Pattern to look for.
#'
#'   The default interpretation is a regular expression, as described in
#'   `vignette("regular-expressions")`. Use [regex()] for finer control of the
#'   matching behaviour.
#'
#'   Match a fixed string (i.e. by comparing only bytes), using
#'   [fixed()]. This is fast, but approximate. Generally,
#'   for matching human text, you'll want [coll()] which
#'   respects character matching rules for the specified locale.
#'
#'   Match character, word, line and sentence boundaries with
#'   [boundary()]. An empty pattern, "", is equivalent to
#'   `boundary("character")`.

Then the other stringr functions use @inheritParams str_detect to get this detailed documentation for string and pattern without having to duplicate that text.
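The same pattern can be sketched with two small functions of your own. Both functions below are hypothetical (they are not part of stringr) and assume non-empty input strings; the point is only that `longest_word()` documents nothing itself and picks up `string` and `sep` from `count_words()`.

```r
#' Count words in each string
#'
#' @param string Input vector. Either a character vector, or something
#'   coercible to one.
#' @param sep A regular expression that separates words.
#' @returns An integer vector the same length as `string`.
count_words <- function(string, sep = "\\s+") {
  lengths(strsplit(trimws(as.character(string)), sep))
}

#' Find the longest word in each string
#'
#' @inheritParams count_words
#' @returns A character vector the same length as `string`.
longest_word <- function(string, sep = "\\s+") {
  words <- strsplit(trimws(as.character(string)), sep)
  vapply(words, function(w) w[which.max(nchar(w))], character(1))
}
```

If the wording for `string` or `sep` ever needs to change, it only has to be edited in one place, and the next run of devtools::document() propagates it.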

@inheritParams only inherits docs for arguments that the function actually uses and that aren’t already documented, so you can document some arguments locally and inherit others. str_match() uses this to inherit str_detect()’s standard documentation for the string argument, while providing its own specialized documentation for pattern:

#' @inheritParams str_detect
#' @param pattern Unlike other stringr functions, `str_match()` only supports
#'   regular expressions, as described in `vignette("regular-expressions")`.
#'   The pattern should contain at least one capturing group.

Now that we’ve discussed default values and inheritance we can bring up one more dilemma. Sometimes there’s tension between giving detailed information on an argument (acceptable values, default value, how the argument is used, etc.) and making the documentation amenable to reuse in other functions (which might differ in some specifics). This can motivate you to assess whether it’s truly worth it for related functions to handle the same input in different ways or if standardization would be beneficial.

You can inherit documentation from a function in another package by using the standard :: notation, i.e. @inheritParams anotherpackage::function. This does introduce one small annoyance: now the documentation for your package is no longer self-contained and the version of anotherpackage can affect the generated docs. Beware of spurious diffs introduced by contributors who run devtools::document() with a different installed version of the inherited-from package.

A function’s output is as important as its inputs. Documenting the output is the job of the @returns10 tag. Here the priority is to describe the overall “shape” of the output, i.e. what sort of object it is, and its dimensions (if that makes sense). For example, if your function returns a vector you might describe its type and length, or if your function returns a data frame you might describe the names and types of the columns and the expected number of rows.
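For the data frame case, a sketch of what such a description might look like is shown below. The function `summarise_vector()` and its column names are made up for illustration.

```r
#' Summarise a numeric vector
#'
#' @param x A numeric vector.
#' @returns A data frame with one row and three columns:
#'   * `n`: integer, the number of non-missing values.
#'   * `mean`: double, the mean of the non-missing values.
#'   * `sd`: double, the standard deviation of the non-missing values.
summarise_vector <- function(x) {
  x <- x[!is.na(x)]
  data.frame(n = length(x), mean = mean(x), sd = stats::sd(x))
}

summarise_vector(c(1, 2, 3, NA))
```

Spelling out the column names, their types, and the number of rows tells the user what they can rely on without having to run the function first.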

The @returns documentation for functions in stringr is straightforward because almost all functions return some type of vector with the same length as one of the inputs. For example, here’s how str_like() describes its output:

#' @returns A logical vector the same length as `string`.

A more complicated case is the joint documentation for str_locate() and str_locate_all()11. str_locate() returns an integer matrix, and str_locate_all() returns a list of matrices, so the text needs to describe what determines the rows and columns.

#' @returns
#' * `str_locate()` returns an integer matrix with two columns and
#'   one row for each element of `string`. The first column, `start`,
#'   gives the position at the start of the match, and the second column, `end`,
#'   gives the position of the end.
#'
#' * `str_locate_all()` returns a list of integer matrices with the same
#'   length as `string`/`pattern`. The matrices have columns `start` and `end`
#'   as above, and one row for each match.
#' @seealso
#'   [str_extract()] for a convenient way of extracting matches,
#'   [stringi::stri_locate()] for the underlying implementation.

In other cases it can be easier to figure out what to highlight by thinking about the set of functions and how they differ. For example, most dplyr functions return a data frame, so just saying @returns A data frame is not very useful. Instead, we tried to identify exactly what makes each function different. We decided it makes sense to describe each function in terms of how it affects the rows, the columns, the groups, and the attributes. For example, this describes the return value of dplyr::filter():

#' @returns
#' An object of the same type as `.data`. The output has the following properties:
#'
#' * Rows are a subset of the input, but appear in the same order.
#' * Columns are not modified.
#' * The number of groups may be reduced (if `.preserve` is not `TRUE`).
#' * Data frame attributes are preserved.

@returns is also a good place to describe any important warnings or errors that the user might see. For example readr::read_csv() mentions what happens if there are any parsing problems:

#' @returns A [tibble()]. If there are parsing problems, a warning will alert you.
#' You can retrieve the full details by calling [problems()] on your dataset.

For your initial CRAN submission, all functions must document their return value. While this may not be scrutinized in subsequent submissions, it’s still a good practice. There’s currently no way to check that you’ve documented the return value of every function (we’re working on it) which is why you’ll notice some tidyverse functions lack output documentation. But we certainly aspire to provide this information across the board.

Describing what a function does is great, but showing how it works is even better. That’s the role of the @examples tag, which uses executable R code to demonstrate what a function can do. Unlike other parts of the documentation where we’ve focused mainly on what you should write, here we’ll briefly give some content advice and then focus mainly on the mechanics.

The main dilemma with examples is that you must jointly satisfy two requirements:

  • Your example code should be readable and realistic. Examples are documentation that you provide for the benefit of the user, i.e. a real human, working interactively, trying to get their actual work done with your package.

  • Your example code must run without error and with no side effects in many non-interactive contexts over which you have limited or no control, such as when CRAN runs R CMD check or when your package website is built via GitHub Actions.

It turns out that there is often tension between these goals and you’ll need to find a way to make your examples as useful as you can for users, while also satisfying the requirements of CRAN (if that’s your goal) or other automated infrastructure.

The mechanics of examples are complex because they must never error and they’re executed in four different situations:

  • Interactively using the example() function.
  • During R CMD check on your computer, or another computer you control (e.g. in GitHub Actions).
  • During R CMD check run by CRAN.
  • When your pkgdown website is being built, often via GitHub Actions or similar.

After discussing what to put in your examples, we’ll talk about keeping your examples self-contained, how to display errors if needed, handling dependencies, running examples conditionally, and alternatives to the @examples tag for including example code.

When preparing .R scripts or .Rmd / .qmd reports, it’s handy to use Ctrl/Cmd + Enter or the Run button to send a line of R code to the console for execution. Happily, you can use the same workflow for executing and developing the @examples in your roxygen comments. Remember to do devtools::load_all() often, to stay synced with the package source.

Use examples to first show the basic operation of the function, then to highlight any particularly important properties. For example, str_detect() starts by showing a few simple variations and then highlights a feature that’s easy to miss: as well as passing a vector of strings and one pattern, you can also pass one string and a vector of patterns.

#' @examples
#' fruit <- c("apple", "banana", "pear", "pineapple")
#' str_detect(fruit, "a")
#' str_detect(fruit, "^a")
#' str_detect(fruit, "a$")
#'
#' # Also vectorised over pattern
#' str_detect("aecfg", letters)

Try to stay focused on the most important features without getting into the weeds of every last edge case: if you make the examples too long, it becomes hard for the user to find the key application that they’re looking for. If you find yourself writing very long examples, it may be a sign that you should write a vignette instead.

There aren’t any formal ways to break up your examples into sections but you can use sectioning comments that use many --- to create a visual breakdown. Here’s an example from tidyr::chop():

#' @examples
#' # Chop ----------------------------------------------------------------------
#' df <- tibble(x = c(1, 1, 1, 2, 2, 3), y = 1:6, z = 6:1)
#' # Note that we get one row of output for each unique combination of
#' # non-chopped variables
#' df %>% chop(c(y, z))
#' # cf nest
#' df %>% nest(data = c(y, z))
#'
#' # Unchop --------------------------------------------------------------------
#' df <- tibble(x = 1:4, y = list(integer(), 1L, 1:2, 1:3))
#' df %>% unchop(y)
#' df %>% unchop(y, keep_empty = TRUE)

Strive to keep the examples focused on the specific function that you’re documenting. If you can make the point with a familiar built-in dataset, like mtcars, do so. If you find yourself needing to do a bunch of setup to create a dataset or object to use in the example, it may be a sign that you need to create a package dataset or even a helper function. See Chapter 8, Section 8.4.2, and Section 16.1.1 for ideas. Making it easy to write (and read) examples will greatly improve the quality of your documentation.

Also, remember that examples are not tests. Examples should be focused on the authentic and typical usage you’ve designed for and that you want to encourage. The test suite is the more appropriate place to exhaustively exercise all of the arguments and to explore weird, pathological edge cases.

Your examples should be self-contained. For example, this means:

  • If you modify options(), reset them at the end of the example.
  • If you create a file, create it somewhere in tempdir(), and make sure to delete it at the end of the example.
  • Don’t change the working directory.
  • Don’t write to the clipboard (unless a user is present to provide some form of consent).

This has a lot of overlap with our recommendations for tests (see Section 15.2.2) and even for the R functions in your package (see Section 7.6). However, due to the way that examples are run during R CMD check, the tools available for making examples self-contained are much more limited. Unfortunately, you can’t use the withr package or even on.exit() to schedule clean up, like restoring options or deleting a file. Instead, you’ll need to do it by hand. If you can avoid doing something that must then be undone, that is the best way to go and this is especially true for examples.
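As a minimal sketch of doing this cleanup by hand, the code below changes an option and writes a file, then undoes both explicitly. It uses only base R; in a real package each line would sit under `@examples` with a `#'` prefix, and the specific option and dataset are chosen just for illustration.

```r
# Change an option, keeping the previous value so it can be restored
old <- options(digits = 3)
print(pi)
# Restore the option by hand (no on.exit() or withr available here)
options(old)

# Create a file inside the session's temporary directory
path <- tempfile(fileext = ".csv")
write.csv(head(mtcars), path)
nrow(read.csv(path))
# Delete the file by hand once the example no longer needs it
unlink(path)
```

Because `options()` returns the previous values of whatever it sets, passing that result back to `options()` restores them.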

These constraints are often in tension with good documentation, if you’re trying to document a function that somehow changes the state of the world. For example, you have to “show your work”, i.e. all of your code, which means that your users will see all of the setup and teardown, even if it is not typical for authentic usage. If you’re finding it hard to follow the rules, this might be another sign to switch to a vignette (see Chapter 18).

Many of these constraints are also mentioned in the CRAN repository policy, which you must adhere to when submitting to CRAN. Use find in page to locate “malicious or anti-social” to see the details.

Additionally, you want your examples to send the user on a short walk, not a long hike. Examples need to execute relatively quickly so that users can see the results promptly, your website doesn’t take ages to build, automated checks finish quickly, and you don’t use up computing resources when submitting to CRAN.

All examples must run in under 10 minutes.

Your examples cannot throw any errors, so don’t include flaky code that can fail for reasons beyond your control. In particular, it’s best to avoid accessing websites, because R CMD check will fail whenever the website is down.

What can you do if you want to include code that causes an error for the purposes of teaching? There are two basic options:

  • You can wrap the code in try() so that the error is shown, but doesn’t stop execution of the examples. For example, dplyr::bind_cols() uses try() to show you what happens if you attempt to column-bind two data frames with different numbers of rows:

    #' @examples
    #' ...
    #' # Row sizes must be compatible when column-binding
    #' try(bind_cols(tibble(x = 1:3), tibble(y = 1:2)))

  • You can wrap the code in \dontrun{}, so it is never run by example(). The example above would look like this if you used \dontrun{} instead of try().

    #' # Row sizes must be compatible when column-binding
    #' \dontrun{
    #' bind_cols(tibble(x = 1:3), tibble(y = 1:2))
    #' }

We generally recommend using try() so that the reader can see an example of the error in action.
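To see why try() suits this purpose, here is a small base R sketch that does not depend on dplyr. It uses data.frame(), which also errors when its columns have different lengths, as a stand-in for the bind_cols() call above.

```r
# The error message is printed, but execution continues past this line
result <- try(data.frame(x = 1:3, y = 1:2))

# try() returns an object of class "try-error" when the call fails
inherits(result, "try-error")

# Code after the failing call still runs
x <- 1 + 1
x
```

Without the try() wrapper, the first line would stop the whole example, and R CMD check would report a failure.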

For the initial CRAN submission of your package, all functions must have at least one example and the example code can’t all be wrapped inside \dontrun{}. If the code can only be run under specific conditions, use the techniques below to express those pre-conditions.

An additional source of errors in examples is the use of external dependencies: you can only use packages in your examples that your package formally depends on (i.e. that appear in Imports or Suggests). Furthermore, example code is run in the user’s environment, not the package environment, so you’ll have to either explicitly attach the dependency with library() or refer to each function with ::. For example, dbplyr is a dplyr extension package, so all of its examples start with library(dplyr):

#' @examples
#' library(dplyr)
#' df <- data.frame(x = 1, y = 2)
#'
#' df_sqlite <- tbl_lazy(df, con = simulate_sqlite())
#' df_sqlite %>% summarise(x = sd(x, na.rm = TRUE)) %>% show_query()

In the past, we recommended only using code from suggested packages inside a block like this:

#' @examples
#' if (requireNamespace("suggestedpackage", quietly = TRUE)) {
#'   # some example code
#' }

We no longer believe that approach is a good idea, because:

  • Our policy is to expect that suggested packages are installed when running R CMD check and this informs what we do in examples, tests, and vignettes.
  • The cost of putting example code inside { ... } is high: you can no longer see intermediate results, such as when the examples are rendered in the package’s website. The cost of a package not being installed is low: users can usually recognize the associated error and resolve it themselves, i.e. by installing the missing package.

In other cases, your example code may depend on something other than a package. For example, if your examples talk to a web API, you probably only want to run them for an authenticated user, and never want such code to run on CRAN. In this case, you really do need conditional execution. The entry-level solution is to express this explicitly:

#' @examples
#' if (some_condition()) {
#'   # some example code
#' }

The condition could be quite general, such as interactive(), or very specific, such as a custom predicate function provided by your package. But this use of if() still suffers from the downside highlighted above, where the rendered examples don’t clearly show what’s going on inside the { … } block.

The @examplesIf tag is a great alternative to @examples in this case:

#' @examplesIf some_condition()
#' some_other_function()
#' some_more_functions()

This looks almost like the snippet just above, but has several advantages:

  • Users won’t actually see the if() { … } machinery when they are reading your documentation from within R or on a pkgdown website. Users only see realistic code.

  • The example code renders fully in pkgdown.

  • The example code runs when it should and does not run when it should not.

  • This doesn’t run afoul of CRAN’s prohibition of putting all your example code inside \dontrun{}.
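A custom predicate for @examplesIf can be very small. The sketch below assumes a hypothetical package whose functions need an API token stored in an environment variable; the names `mypkg_has_token()`, `mypkg_list_files()`, and `MYPKG_TOKEN` are invented for illustration.

```r
#' Is an API token available?
#'
#' @returns `TRUE` if the `MYPKG_TOKEN` environment variable is set to a
#'   non-empty string, otherwise `FALSE`.
#' @export
mypkg_has_token <- function() {
  nzchar(Sys.getenv("MYPKG_TOKEN"))
}

#' List remote files
#'
#' @returns A character vector of file names.
#' @examplesIf mypkg_has_token()
#' mypkg_list_files()
mypkg_list_files <- function() {
  stopifnot(mypkg_has_token())
  # A real implementation would call the (hypothetical) web API here
  character()
}
```

A predicate like this should be cheap to evaluate and should return a single `TRUE` or `FALSE` rather than throw an error, since it runs every time the examples are executed.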

For example, googledrive uses @examplesIf in almost every function, guarded by googledrive::drive_has_token(). Here’s how the examples for googledrive::drive_publish() begin:

#' @examplesIf drive_has_token()
#' # Create a file to publish
#' file <- drive_example_remote("chicken_sheet") %>%
#'   drive_cp()
#'
#' # Publish file
#' file <- drive_publish(file)
#' ...

The example code doesn’t run on CRAN, because there’s no token. It does run when the pkgdown site is built, because we can set up a token securely. And, if a normal user executes this code, they’ll be prompted to sign in to Google, if they haven’t already.

An alternative to examples is to use RMarkdown code blocks elsewhere in your roxygen comments, either ```R if you just want to show some code, or ```{r} if you want the code to be run. These can be effective techniques but there are downsides to each:

  • The code in ```R blocks is never run; this means it’s easy to accidentally introduce syntax errors or to forget to update it when your package changes.
  • The code in ```{r} blocks is run every time you document the package. This has the nice advantage of including the output in the documentation (unlike examples), but the code can’t take very long to run or your iterative documentation workflow will become quite painful.
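As a sketch of what the two kinds of block might look like side by side, consider the roxygen comment below. It assumes markdown support is enabled for the package, and the function `add()` is a hypothetical example.

```r
#' Add two numbers
#'
#' @details
#' This block is only displayed; it is never evaluated:
#'
#' ```R
#' add(1, 2)
#' ```
#'
#' This block is evaluated each time the documentation is generated, and
#' its output is included in the help page:
#'
#' ```{r}
#' 1 + 2
#' ```
#'
#' @param x,y Numeric vectors.
#' @returns A numeric vector.
add <- function(x, y) {
  x + y
}
```

The evaluated block deliberately uses only base R so that it does not depend on the package itself being available while the documentation is generated.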

roxygen2 provides a number of features that allow you to reuse documentation across topics. They are documented in vignette("reuse", package = "roxygen2"), so here we’ll focus on the three most important:

  • Documenting multiple functions in one topic.
  • Inheriting documentation from another topic.
  • Using child documents to share prose between topics, or to share between documentation topics and vignettes.

By default, each function gets its own documentation topic, but if two functions are very closely connected, you can combine the documentation for multiple functions into a single topic. For example, take str_length() and str_width(), which provide two different ways of computing the size of a string. As you can see from the description, both functions are documented together, because this makes it easier to see how they differ:

#' The length/width of a string
#'
#' @description
#' `str_length()` returns the number of codepoints in a string. These are
#' the individual elements (which are often, but not always letters) that
#' can be extracted with [str_sub()].
#'
#' `str_width()` returns how much space the string will occupy when printed
#' in a fixed width font (i.e. when printed in the console).
#'
#' ...
str_length <- function(string) {
  ...
}

To merge the two topics, str_width() uses @rdname str_length to add its documentation to an existing topic:

#' @rdname str_length
str_width <- function(string) {
  ...
}

This technique works best for functions that have a lot in common, i.e. similar return values and examples, in addition to similar arguments.
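For a complete picture of the pattern, here is a small self-contained sketch with two hypothetical functions, `to_celsius()` and `to_fahrenheit()`, that share arguments, return value, and examples, and so fit comfortably in a single topic.

```r
#' Convert between temperature scales
#'
#' @description
#' `to_celsius()` converts degrees Fahrenheit to degrees Celsius.
#'
#' `to_fahrenheit()` converts degrees Celsius to degrees Fahrenheit.
#'
#' @param x A numeric vector of temperatures.
#' @returns A numeric vector the same length as `x`.
#' @examples
#' to_celsius(c(32, 212))
#' to_fahrenheit(c(0, 100))
to_celsius <- function(x) {
  (x - 32) * 5 / 9
}

# The second function adds itself to the topic created for the first
#' @rdname to_celsius
to_fahrenheit <- function(x) {
  x * 9 / 5 + 32
}
```

The title, description, parameters, and examples are written once, on the first function, while the second carries only the @rdname tag.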

In other cases, functions in a package might share many related behaviors, but aren’t closely enough connected that you want to document them together. We’ve discussed @inheritParams above, but there are three variations that allow you to inherit other things:

  • @inherit source_function will inherit all supported components from source_function().

  • @inheritSection source_function Section title will inherit the single section with title “Section title” from source_function().

  • @inheritDotParams automatically generates parameter documentation for ... for the common case where you pass ... on to another function.

See https://roxygen2.r-lib.org/articles/reuse.html#inheriting-documentation for more details.
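The sketch below shows how two of these tags might be combined. Both functions, `trimmed_mean()` and `robust_summary()`, are hypothetical; the wrapper documents only its own `x` argument and inherits the rest.

```r
#' Compute a trimmed mean
#'
#' @param x A numeric vector.
#' @param trim Fraction (0 to 0.5) of observations to trim from each end.
#' @returns A single number.
trimmed_mean <- function(x, trim = 0.1) {
  mean(x, trim = trim)
}

#' Compute a trimmed mean after dropping missing values
#'
#' @param x A numeric vector, possibly containing `NA`s.
#' @inheritDotParams trimmed_mean -x
#' @inherit trimmed_mean return
robust_summary <- function(x, ...) {
  # Arguments in ... are passed straight through to trimmed_mean()
  trimmed_mean(x[!is.na(x)], ...)
}

robust_summary(c(1, 2, 3, NA))
```

Here `-x` asks @inheritDotParams to skip `x`, because the wrapper documents that argument itself, and `@inherit trimmed_mean return` copies only the return value description.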

Finally, you can reuse the same .Rmd or .md document in the function documentation, README.Rmd, and vignettes by using RMarkdown child documents. The syntax looks like this:

#' ```{r child = "man/rmd/filename.Rmd"}
#' ```

This is a feature we use very sparingly in the tidyverse, but one place we do use it is in dplyr, because a number of functions use the same syntax as select() and we want to provide all the info in one place:

#' # Overview of selection features
#'
#' ```{r, child = "man/rmd/overview.Rmd"}
#' ```

Then man/rmd/overview.Rmd contains the repeated markdown:

Tidyverse selections implement a dialect of R where operators make
it easy to select variables:

- `:` for selecting a range of consecutive variables.
- `!` for taking the complement of a set of variables.
- `&` and `|` for selecting the intersection or the union of two
  sets of variables.
- `c()` for combining selections.
...

If the Rmd file contains roxygen (Markdown-style) links to other help topics, then some care is needed. See https://roxygen2.r-lib.org/dev/articles/reuse.html#child-documents for details.