-
Notifications
You must be signed in to change notification settings - Fork 755
Documenting functions
Documentation is one of the most important aspects of good code. Without it, users won't know how to use your package, and are unlikely to do so. Documentation is also useful for you in the future (so you remember what the heck you were thinking!), and for other developers working on your package.
The standard way to write R documentation is to create .Rd
files in the man/
directory. These files describe each object (function, data set, class, generic or method) in your package, documenting how it works and giving the user examples of how to use it. Rather than writing Rd files by hand, we're going to use the roxygen2 package. With roxygen, you write the documentation in comments next to each function, and then run an R script to create the main files.
This workflow has a number of advantages over writing .Rd
files by hand:
-
Code and documentation are adjacent so when you modify your code, it's easy to remember that you need to update the documentation.
-
Roxygen2 dynamically inspects the objects that are documenting, so it can figure out a lot of information
.Rd
files need by itself. For example, it can automatically add links to super and subclasses. -
Rt takes care of other files that are fiddly or downright painful to maintain by hand: the namespace, collate order in description, and the demos index.
-
Abstracts over the differences in documenting S3 and S4 methods, generics and classes so that they behave basically the same.
Note: this chapter currently describes the use of the in-development roxygen3 package. This will be merged back into roxygen2 in the near future.
This chapter will proceed as follows. First, we'll discuss the basic documentation process, what each step does, and how to see the output at every stage. Next we'll show you, at a high-level, how to document all the different types of R objects (functions, datasets, s3 methods, s4 methods, s4 classes, r5 classes, and packages). After that you'll learn how to format text within the documentation
You've probably used help a lot, but you might not be aware of the more advanced features:
package?lubridate
class?myclass
methods?
method?
method?combo("numeric", "numeric")
?combo(1:10, letters)
Roxygen automatically takes care of generating the special aliases needed to make these lookups work.
How does help work? It finds the matching Rd file, and compiles it to either text or html output. It's complicated by the fact that binary packages don't include individual Rd files, they actually included a pre-parsed database of Rd files.
There's a three step process to go from the R comments in the source files to Rd files to human readable documentation. The process starts with special comments starting with #'
that indicate a comment is a roxygen comment:
#' Order a data frame by its columns.
#'
#' This function completes the subsetting, transforming and ordering triad
#' with a function that works in a similar way to \code{\link{subset}} and
#' \code{\link{transform}} but for reordering a data frame by its columns.
#' This saves a lot of typing!
#'
#' @param df data frame to reorder
#' @param ... expressions evaluated in the context of \code{df} and
#' then fed to \code{\link{order}}
#' @keywords manip
#' @export
#' @examples
#' mtcars[with(mtcars, order(cyl, disp)), ]
#' arrange(mtcars, cyl, disp)
#' arrange(mtcars, cyl, desc(disp))
arrange <- function(df, ...) {
ord <- eval(substitute(order(...)), df, parent.frame())
unrowname(df[ord, ])
}
To convert roxygen comments to the official .Rd
files, we'll use devtools::document()
. Like the other devtools functions you've seen so far, it takes a package directory as it's first argument, and if you omit it, it will use the current working directory.
This produces an .Rd
file in the man/
directory:
\name{arrange}
\alias{arrange}
\title{Order a data frame by its columns.}
\usage{arrange(df, ...)}
\description{
This function completes the subsetting, transforming and
ordering triad with a function that works in a similar
way to \code{\link{subset}} and \code{\link{transform}}
but for reordering a data frame by its columns. This
saves a lot of typing!
}
\keyword{manip}
\arguments{
\item{df}{data frame to reorder}
\item{...}{expressions evaluated in the context of
\code{df} and then fed to \code{\link{order}}}
}
\examples{mtcars[with(mtcars, order(cyl, disp)), ]
arrange(mtcars, cyl, disp)
arrange(mtcars, cyl, desc(disp))}
Rd files are a special file format loosely based on latex. You can read more about the Rd format in the R extensions manual. We'll avoid discussing Rd files as much as possible, focussing instead on what you need to know about roxygen.
When you request documentation, R then converts this file into the display you're probably used to seeing:
When you use help
or example
it looks for the Rd files in the installed package. This isn't very useful for package development, because we want to retrieve the .Rd
files from the source package. devtools
provides two functions to do this dev_help
and dev_example
- these behave similarly to help
and example
but look in source packages you have loaded with load_all
, rather than installed packages you've loaded with library
.
As you've seen roxygen comments have a special format. They start with #'
and they include tags of the form @tagname
that break the documentation up into pieces. The content of a tag extends from the tag name to the next tag, and they can span multiple lines.
If you want to put a literal @
in the documentation, use @@
.
Each documentation block starts with a text description. The first sentence/line becomes the title of the documentation. That's what you see when you look at help(package = mypackage)
and is shown at the top of the documentation for each package. It should be succinct. The second paragraph is the description: this is the first thing shown in the documentation and should briefly describe what the function does. The third and subsequence paragraphs go in the details: this is a (often long) section that comes after the description of the parameters. If you want to override the default behaviour you can use the @title
, @description
and @details
tags.
Here's an example showing what the documentation for sum
might look like if it was written using roxygen:
#' Sum of Vector Elements
#'
#' \code{sum} returns the sum of all the values present in its arguments.
#'
#' This is a generic function: methods can be defined for it directly
#' or via the \code{Summary} group generic. For this to work properly,
#' the arguments \code{...} should be unnamed, and dispatch is on the
#' first argument.'
sum <- function(..., na.rm = TRUE) {}
The following documentation is equivalent, but uses explicit tags. This is not necessary unless you want to have a multiple paragraph description or title, or want to omit the description (in which case roxygen will replace with the title).
#' @title Sum of Vector Elements
#'
#' @description
#' \code{sum} returns the sum of all the values present in its arguments.
#'
#' @details
#' This is a generic function: methods can be defined for it directly
#' or via the \code{Summary} group generic. For this to work properly,
#' the arguments \code{...} should be unnamed, and dispatch is on the
#' first argument.'
sum <- function(..., na.rm = TRUE) {}
If use explicit tags you can put them in any order and still get the same output. The following documentation block would produce exactly the same Rd file and final documentation as the previous two.
#' @details
#' This is a generic function: methods can be defined for it directly
#' or via the \code{Summary} group generic. For this to work properly,
#' the arguments \code{...} should be unnamed, and dispatch is on the
#' first argument.'
#'
#' @title Sum of Vector Elements
#'
#' @description
#' \code{sum} returns the sum of all the values present in its arguments.
#'
sum <- function(..., na.rm = TRUE) {}
You can also opt to document multiple functions in one file. If you want to do so, use the @rdname
tag to manully specify the name of the rd file (if you omit this it's done automatically in a way that every object gets its own documentation). You should be careful about documenting too many functions in one file because it gets confusing (which parameter belongs to which function?), but it can be useful if the functions are tightly connected.
mean <- function()
You can get documentation on a given tag with ?TagName
. For example, if you wanted to get help on the param tag, you'd do ?TagParam
. (Unfortunately you can't do ?@param
because of a technical limitation in Rd files.)
When documenting a function, the first decision you need to make is whether you want to export it or not. Exporting is described more in namespaces, but basically when you export a function you are saying it is ready for use and you are making a commitment to keep it around in a similar form for the near future. All exported functions need to be documented. You don't have to document internal functions, but it can be a good idea if they're particularly complicated. Few users will read the documentation for internal functions, but it will be useful for you so when come back to them in the future you don't need to struggle to recall what the inputs and outputs are.
The following code shows the arrange function and its documentation from the plyr package.
#' Order a data frame by its columns.
#'
#' This function completes the subsetting, transforming and ordering triad
#' with a function that works in a similar way to \code{\link{subset}} and
#' \code{\link{transform}} but for reordering a data frame by its columns.
#' This saves a lot of typing!
#'
#' @param df data frame to reorder
#' @param ... expressions evaluated in the context of \code{df} and
#' then fed to \code{\link{order}}
#' @keywords manip
#' @export
#' @examples
#' mtcars[with(mtcars, order(cyl, disp)), ]
#' arrange(mtcars, cyl, disp)
#' arrange(mtcars, cyl, desc(disp))
arrange <- function(df, ...) {
ord <- eval(substitute(order(...)), df, parent.frame())
unrowname(df[ord, ])
}
The tags used here are:
-
@param arg description
- a description for each function argument. This can span multiple lines (or even paragraphs) if necessary. -
@export
- a flag indicating that this function should be exported for use by others. Described in more detail in namespaces. -
@examples
- examples of the function in use. In my opinion, this is the most important part of the documentation - these are what most people will look at first to figure out how to use the function. I always put this last.
Other tags that you might find useful are:
-
@return
- the type of object that the function returns
The following tags apply to all types of documentation
-
@author
, in the formFirstname Lastname <email@@address.com>
. Use this if some of the functions are written by people other than the author of the package. -
@seealso
- to provide pointers to other related topics -
@references
- references to scientific literature on this topic -
@aliases
- a list of additional topic names that will be mapped to this documentation when the user looks them up from the command line. -
@family
S3 generics are documented in the same way as functions - there is typically no reason for a user to know (or care) that a function is a generic. (Although you may want to mention it in the docs so that other developers know that they can extend it with their own methods).
If you @export
a S3 method, all methods defined in your package will also get exported appropriately (using the S3method()
namespace directive): they will be available when you call the generic, but users can not call the methods directly. For example, the wday
generic in lubridate
is exported:
library(lubridate)
wday(Sys.Date(), label = TRUE)
wday(1, label = TRUE)
But you can't call any of the methods directly:
wday.default(Sys.Date(), label = TRUE)
wday.numeric(1, label = TRUE)
lubridate:::wday.default(Sys.Date(), label = TRUE)
lubridate:::wday.numeric(1, label = TRUE)
This is good practice because it hides the internal implementation details of your functions. Users should not rely on specific behaviour (for example, in this case we need to call wday.default
for Date objects). This makes it easier for you to change how the functions work in the future - for example, you could change from S3 to S4 dispatch.
You only need to explicitly @export
a method if it's for a generic defined in another package. This is one of the easiest things to forget, and it creates subtle bugs that R CMD check
doesn't report and are hard to track down. Future versions of roxygen should hopefully make this easier.
Most of the time you don't need document S3 methods - particularly if they are for simple generics like print
. However, if your method is more complicated you should document in. In base R, you can find documentation for more complex methods like predict.lm
, predict.glm
, anova.glm
and so on.
The same rules apply when documenting S4 generics and methods:
-
Document the generic like you would document any function. Document methods that are more complex or have significant differences in behaviours.
-
You don't need to export methods if they're for a generic you defined (just export the generic). Export all methods for generics defined in other packages.
-
Use
@genericMethods
in the generic documentation if you want an automated listing of all methods implemented for the generic.
Currently there is no way to use roxygen to document R5 methods. Use the standard docstring format.
Since S3 classes have no formal structure, you should document the constructor function.
Typically you should the setClass
call, and if present, the constructor function. Unless you want to separate out internal documentation for the class and the public interface (for the constructor), I recommend using @rdname
to document both in the same file. This will mean that ?classname
and class?classname
go to the same place.
Export the class if you want other developers to be able to write subclasses of your class. Export the constructor if you want users to be able to use it.
The following documentation excerpt comes from the diamonds dataset in ggplot2. We can't document the object directly (because it lives in the data directory), so we document NULL
and provide the @name
of the object we really want to document.
#' Prices of 50,000 round cut diamonds.
#'
#' A dataset containing the prices and other attributes of almost 54,000
#' diamonds. The variables are as follows:
#'
#' \itemize{
#' \item price. price in US dollars (\$326--\$18,823)
#' \item carat. weight of the diamond (0.2--5.01)
#' ...
#' }
#'
#' @docType data
#' @keywords datasets
#' @format A data frame with 53940 rows and 10 variables
#' @name diamonds
NULL
There are a few new tags:
-
@docType data
which indicates that this is documentation for a dataset. -
@format
, which gives an overview of the structure of the dataset. If you omit this, roxygen will automatically add something based on the first line ofstr
output -
@source
where you got the data form, often a\url{}
.
As well package level documentation resources, every package should also have it's own documentation page.
This documentation topic should contain an overview documentation topic that describes the overall purpose of the package, and points to the most important functions. This topic should have @docType package
and be aliased to package-pkgname
and pkgname
(unless there is already a function by that name) so that you can get an overview of the package by doing ?pkgname
or package?pkgname
.
There are still relatively few packages that provide package documentation, but it's an extremely useful tool for users, because instead of just listing functions like help(package = pkgname)
it organises them and shows the user where to get started.
The example below shows the basic structure, as taken from the documentation for the lubridate
package:
#' Dates and times made easy with lubridate.
#'
#' Lubridate provides tools that make it easier to parse and
#' manipulate dates. These tools are grouped below by common
#' purpose. More information about each function can be found in
#' its help documentation.
#'
#' Parsing dates
#'
#' Lubridate's parsing functions read strings into R as POSIXct
#' date-time objects. Users should choose the function whose name
#' models the order in which the year ('y'), month ('m') and day
#' ('d') elements appear the string to be parsed:
#' \code{\link{dmy}}, \code{\link{myd}}, \code{\link{ymd}},
#' \code{\link{ydm}}, \code{\link{dym}}, \code{\link{mdy}},
#' \code{\link{ymd_hms}}).
#'
#' ...
#'
#' @references Garrett Grolemund, Hadley Wickham (2011). Dates and Times
#' Made Easy with lubridate. Journal of Statistical Software, 40(3),
#' 1-25. \url{http://www.jstatsoft.org/v40/i03/}.
#' @import plyr stringr
#' @docType package
#' @name lubridate
NULL
Important components are:
-
The general overview of the package: what it does, and what are the important pieces.
-
Like for data sets, there isn't a object that we can document directly so document
NULL
and use@name
to say what we're actually documenting -
@docType package
to indicate that it's documenting a package. This will automatically add the corect aliases so thatpackage?yourpackage
works. -
@references
point to any published material about the package that users might find help.
The package documentation should not contain a verbatim list of functions or copy of DESCRIPTION
. This file is for human reading, so pick the most important elements of your package.
Within roxygen text, you use the usual R documentation formatting rules, as summarised below. A fuller description is available in the R extensions manual.
Sections are indicated by the @section tag, followed by the title (which should be in sentence case) and a colon. Subsections are indicated by latex-style command; the first argument is the subsection title, and the second argument contains the subsection content. The following example illustrates how to add an arbitrary section:
@section Warning:
You must not call this function unless ...
\subsection{Exceptions}{
Apart from the following special cases...
}
The section ends when another tag is declared, or when the roxygen2 block ends.
-
Ordered (numbered) lists:
\enumerate{ \item First item \item Second item }
-
Unordered (bulleted) lists
\itemize{ \item First item \item Second item }
-
Definition (named) lists
\describe{ \item{One}{First item} \item{Two}{Second item} }
Tables are created with the tabular command which has two arguments:
-
Column alignment: a letter for each column.
l
= left alignment,c
= centre,r
= right. -
Table contents. Columns separated by
\tab
, rows separated by\cr
.
The following function will turn an R data frame into the correct format. It ignores column and row names, but should get you started.
tabular <- function(df) {
stopifnot(is.data.frame(df))
align <- function(x) if (is.numeric(x)) "r" else "l"
col_align <- vapply(df, align, character(1))
contents <- do.call("paste", c(lapply(df, format),
list(sep = " \\tab ", collapse = "\\cr\n")))
mat <- matrix(unlist(lapply(df, format)), ncol = ncol(df))
paste("\\tabular{", paste(col_align, collapse = ""), "}{\n",
contents, "}\n", sep = "")
}
\eqn
for inline, \deqn
for display. Standard latex (no extensions).
-
\emph{text}
: emphasised text, usually displayed as italics -
\strong{text}
: strong text, usually displayed in bold -
\code{text}
,\pkg{package_name}
,\file{file_name}
-
External links:
\email{email_address}
,\url{url}
,\href{url}{text}
-
\link[package]{function}
- the first argument can be omitted if the link is in the current package, or the base package. Will usually be wrapped inside\code
:\code{\link{fun}}
Since R.XX, help has been...