Abstract
Introduction
Macro-evolutionary studies frequently use phylogenetic trees to examine diversification rate variation within and among groups. Variation in diversification rate in a phylogenetic tree can inform evolutionary hypotheses regarding the role of past geological or climatic events, evolutionary novelty, and adaptive and non-adaptive radiations.1–3 Various methods have been developed to examine diversification rate variation, including, but not limited to, those that examine the accumulation of lineages through time,4–7 tree balance,8–10 the distribution of tree branch lengths,11 and the shape of ordered cladogenic events.12
Here, we present the R library iteRates, which implements the parametric rate comparison (PRC) test,13 a new method for identifying rate variation in phylogenetic trees. An in-depth description of the PRC, as well as an examination of its statistical power and false-positive rates, can be found in the study by Shah et al.13 Briefly, the PRC examines the fit of a distribution of branch lengths extracted from a phylogenetic tree to standard statistical distributions.14 The lengths of terminal branches are treated as censored at the time of sampling. Internal and terminal branch lengths are jointly modeled using the censored form of a given distribution. This approach differs from other approaches aimed at identifying rate heterogeneity in a phylogenetic tree in that it does not attempt to estimate the parameters of a particular model of diversification; rather, it simply examines the statistical properties of the distribution of branch lengths. By iterating through subtrees in a tree, the PRC can be used to identify subclades where the diversification rate differs from that of the remainder of the tree (see Fig. 1 in Shah et al.13). The PRC can also be used to compare a priori defined groups, and does not require that these groups be monophyletic. The PRC can be used both as a hypothesis-testing tool and for exploratory data analysis using functions in the iteRates library.
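The joint treatment of internal and censored terminal branches can be illustrated with a minimal base R sketch for the exponential case. It assumes the standard right-censored likelihood, in which internal branches contribute the density and terminal branches contribute the survival function. The function names and branch lengths below are hypothetical and are not part of iteRates, whose internal implementation may differ.

```r
# Log-likelihood of branch lengths under a right-censored exponential model.
# `rate`     : exponential rate parameter
# `internal` : lengths of internal branches (fully observed waiting times)
# `terminal` : lengths of terminal branches (censored at the time of sampling)
# Function and argument names are hypothetical, not part of iteRates.
cens_exp_loglik <- function(rate, internal, terminal) {
  # Observed branches contribute the density, censored ones the survival function
  sum(dexp(internal, rate = rate, log = TRUE)) +
    sum(pexp(terminal, rate = rate, lower.tail = FALSE, log.p = TRUE))
}

# Closed-form maximum likelihood estimate:
# number of observed (internal) branches divided by the total branch length
cens_exp_mle <- function(internal, terminal) {
  length(internal) / (sum(internal) + sum(terminal))
}

# Made-up branch lengths, for illustration only
internal <- c(0.8, 1.2, 0.5, 2.0)
terminal <- c(0.3, 0.7, 0.3, 1.1, 0.4)

rate_hat <- cens_exp_mle(internal, terminal)

# Numerical check that the closed form maximizes the log-likelihood
rate_num <- optimize(cens_exp_loglik, interval = c(1e-6, 10),
                     internal = internal, terminal = terminal,
                     maximum = TRUE)$maximum
```

Because terminal branches add to the total branch length but not to the count of observed events, ignoring the censoring would bias the estimated rate upward.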

Figure 1. A phylogenetic tree showing regions of rate variation identified by the function comp.subs and plotted using the function color.tree.plot.
Description
PRC - the Parametric Rate Comparison Test
The PRC test is implemented using the function comp.subs. The function iterates through subtrees of a phylogenetic tree and compares the distribution of branch lengths in that subtree to the remainder of the tree. The function has arguments that allow flexibility for the user to govern how comp.subs implements the PRC. An example usage of comp.subs showing some important arguments at their default values is as follows:
Here, the argument tree is an ultrametric phylogenetic tree of object class phylo. The argument thr indicates the threshold for the minimum number of branches required for a subtree to be considered in the analysis. The argument srt determines how the branch that links a subtree to the remainder of the tree is treated in the analysis. The default is to drop the linking branch from the analysis because there is no way of knowing exactly at what point along the branch an inferred rate change might have occurred; however, the user has the option to include this linking branch as part of the subtree or as part of the remainder of the tree. There are four different statistical distributions that iteRates uses to model the distribution of branch lengths: exponential, Weibull, log-normal, and variable rates.13,14 The argument mod.id is used to indicate which of these distributions are used in the PRC. The default is to consider only an exponential model. When multiple distributions are included in the analysis, comp.subs uses Akaike information criterion (AIC)15 scores to pick the best-fit model for each comparison of a subtree with the remainder of the tree.
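AIC-based selection among candidate distributions can be sketched in base R as follows. This is an illustration under stated assumptions rather than the code used by comp.subs: it covers only the exponential and Weibull distributions, fits each by maximizing a right-censored likelihood, and uses made-up branch lengths. All function names are hypothetical.

```r
# Negative log-likelihood for right-censored Weibull branch lengths.
# Parameters are optimized on the log scale so shape and scale stay positive.
# Hypothetical helper, not part of iteRates.
cens_weibull_nll <- function(log_par, internal, terminal) {
  shape <- exp(log_par[1])
  scale <- exp(log_par[2])
  -(sum(dweibull(internal, shape = shape, scale = scale, log = TRUE)) +
      sum(pweibull(terminal, shape = shape, scale = scale,
                   lower.tail = FALSE, log.p = TRUE)))
}

# AIC = 2 * (number of parameters) - 2 * (maximized log-likelihood)
aic <- function(loglik, n_par) 2 * n_par - 2 * loglik

# Made-up branch lengths, for illustration only
internal <- c(0.8, 1.2, 0.5, 2.0, 0.9, 1.6)
terminal <- c(0.3, 0.7, 0.3, 1.1, 0.4, 0.6, 0.2)

# Exponential: closed-form estimate under right censoring
total_length <- sum(internal, terminal)
rate_hat <- length(internal) / total_length
ll_exp <- length(internal) * log(rate_hat) - rate_hat * total_length

# Weibull: numerical optimization (Nelder-Mead), started at shape = scale = 1
fit_w <- optim(c(0, 0), cens_weibull_nll,
               internal = internal, terminal = terminal)
ll_weib <- -fit_w$value

# Lower AIC indicates the preferred model
aic_scores <- c(exponential = aic(ll_exp, 1), weibull = aic(ll_weib, 2))
best_model <- names(which.min(aic_scores))
```

The Weibull model has one more parameter than the exponential, so it is preferred only when its gain in log-likelihood outweighs the AIC penalty of 2 per additional parameter.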
There is an expectation that false-positive rates will increase as the tree under scrutiny deviates from a pure-birth model. A whole-tree randomization test can be used to avoid spurious inferences based on observed statistical significance for each node. Here, the topology of the tree is randomized while retaining the observed branching times. For each randomization, the PRC test is employed and the number of statistically significant clades is recorded. Following many randomizations, the observed number of significant clades is compared against the distribution of the number of significant clades from the randomized trees. The whole-tree randomization test is implemented in the function tree.rand.test. This function requires an ultrametric tree of object class phylo and has arguments that allow the user to determine the number of randomizations and the statistical distributions used.
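The final comparison step of the randomization test can be sketched in base R. The sketch assumes that the PRC has already been run on the observed tree and on each randomized tree, so that only the counts of significant clades remain. The counts are made up, the function name is hypothetical, and the "+1" correction is an assumption of this sketch rather than a documented feature of tree.rand.test.

```r
# Empirical p-value for a whole-tree randomization test.
# `observed`    : number of significant clades in the observed tree
# `null_counts` : number of significant clades in each randomized tree
# Hypothetical helper, not part of iteRates.
randomization_p <- function(observed, null_counts) {
  # Proportion of trees with at least as many significant clades as observed;
  # the +1 terms count the observed tree itself (an assumption of this sketch)
  (1 + sum(null_counts >= observed)) / (1 + length(null_counts))
}

# Made-up counts from 19 randomized trees, for illustration only
null_counts <- c(0, 1, 0, 2, 1, 0, 3, 1, 0, 1, 2, 0, 1, 0, 0, 1, 4, 0, 1)
observed <- 4

p_value <- randomization_p(observed, null_counts)
```

A small p-value indicates that the observed tree contains more significant clades than expected when topology is randomized and branching times are held fixed.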
The results of the PRC can be visualized on a phylogenetic tree using the function color.tree.plot. Various options are available to the user to illustrate the relative direction, magnitude, and statistical support for a diversification rate change. The function requires a tree of object class phylo and the result object from comp.subs. Figure 1 shows an example of a phylogenetic tree with regions of rate variation plotted using default settings.
The PRC test assumes that taxon sampling is complete, although it is robust to incomplete taxon sampling if taxon sampling is random.13 For situations where there is ambiguity as to what complete taxon sampling might be, for example, if there is a recent radiation or when a tree is based on a higher taxonomic rank (e.g., a genus-level phylogeny), the user might choose to trim a particular amount of time from the tree. For example, a researcher might decide that taxon sampling in the phylogeny is complete up until five million years before present. The function trimTree can be used to trim a specified amount of time (or branch length) from the tips of a tree. This function returns a list that contains the trimmed tree and a key indicating the taxa from the original tree that have been collapsed to each terminal node of the trimmed tree.
K-Clades PRC
The
Here, tree.Kclades$subtree is the list of all subtrees provided by id.subtrees. The argument focal indicates the identifiers of the subtrees of interest for comparison. Figure 2 shows a hypothetical phylogenetic tree with the subtrees of interest indicated. The argument k indicates the maximum number of different rates to explore, in this case three. The function will compare all possible models of rate variation, ranging from all subtrees having the same rate to k subtrees having different rates. As in the function comp.subs, the user can choose any combination of the four available statistical distributions using the argument mod.id. The default is to fit only an exponential distribution (mod.id = c(1,0,0,0)). Parameter values, log-likelihoods, AIC scores, and ΔAIC values are returned. The results can be summarized using the function tab.summary, which restricts the returned output to a ΔAIC limit determined by the user and to the best-fit model for each k. An example of tab.summary and output is as follows:

Figure 2. A phylogenetic tree showing the subtrees identified by the function id.subtrees. For the example in the text, the subtrees defined by nodes 55, 27, and 3 make up the three groups examined using the
In this example, the best model is one in which the first and second subtrees (subtrees 55 and 27, respectively) share the same rate and the third subtree (subtree 3) has a separate rate. Note that, based on ΔAIC, modeling each subtree as having a separate rate also falls within the group of “best” models. The tab.summary function can be useful when exploring rate variation across numerous subtrees because the number of possible groupings can get quite large; for example, when k = 5, there are 52 different groupings explored.
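The figure of 52 groupings matches the number of ways to partition five subtrees into non-empty rate classes (the fifth Bell number). The base R sketch below counts such groupings, assuming that the models explored correspond to set partitions of the focal subtrees into at most k classes. The function names are hypothetical and are not part of iteRates.

```r
# Stirling number of the second kind: the number of ways to split n subtrees
# into exactly j non-empty rate classes, using the recurrence
# S(n, j) = j * S(n - 1, j) + S(n - 1, j - 1).
# Hypothetical helper, not part of iteRates.
stirling2 <- function(n, j) {
  if (n == j) return(1)                 # includes S(0, 0) = 1
  if (j == 0 || j > n) return(0)
  j * stirling2(n - 1, j) + stirling2(n - 1, j - 1)
}

# Number of groupings of n subtrees into at most k distinct rates
n_groupings <- function(n, k = n) {
  sum(vapply(seq_len(min(n, k)), function(j) stirling2(n, j), numeric(1)))
}

n_groupings(5)         # 52: five subtrees, up to five rates
n_groupings(3)         # 5:  three subtrees, up to three rates
n_groupings(5, k = 2)  # 16: five subtrees, at most two rates
```

Under this assumption, the three-subtree example in the text involves five candidate groupings: one single-rate model, three two-rate models, and one model with a separate rate for each subtree.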
Conclusion
The package iteRates provides the library functions required to employ the PRC test and the
Author Contributions
JAF, PS, and BMF conceived and designed the experiments, and analyzed the data. JAF wrote the first draft of the manuscript. JAF, PS, and BMF contributed to the writing of the manuscript, agreed with the manuscript results and conclusions, jointly developed the structure and arguments for the paper, and made critical revisions and approved the final version. All authors reviewed and approved the final manuscript.
