Abstract
Biological data objects often have both of the following features: (i) they are functions rather than single numbers or vectors, and (ii) they are correlated owing to phylogenetic relationships. In this paper, we give a flexible statistical model for such data, by combining assumptions from phylogenetics with Gaussian processes. We describe its use as a nonparametric Bayesian prior distribution, both for prediction (placing posterior distributions on ancestral functions) and model selection (comparing rates of evolution across a phylogeny, or identifying the most likely phylogenies consistent with the observed data). Our work is integrative, extending the popular phylogenetic Brownian motion and Ornstein–Uhlenbeck models to functional data and Bayesian inference, and extending Gaussian process regression to phylogenies. We provide a brief illustration of the application of our method.
1. Introduction
In this paper, we consider statistical inference for functionvalued data that are correlated owing to phylogenetic relationships. A schematic is given in figure 1a: in this case, given functional data observed at the tips of a phylogeny, the task is to perform inference on the (unobserved) functional data at the root of the phylogeny. Alternatively, if the phylogeny is uncertain we may wish to perform phylogenetic inference, or our interest may be inferring the dynamics of the evolutionary process that produced the data. The term ‘functionvalued’ is meant in the sense of Ramsay [2], where a datum is a continuous function f(x) of a variable x, such as time or temperature: examples are therefore curves for ambient temperature versus growth rate for caterpillars, a heart rhythm time series [3], or a spectrogram of audio data. Our approach is to combine the theory of Gaussian processes with assumptions from phylogenetics, to obtain a flexible nonparametric model for such data. Because this model effectively specifies the evolutionary dynamics of the data through the phylogeny, we note that (i) our approach generalizes the Brownian motion and Ornstein–Uhlenbeck models of continuoustime character evolution from quantitative genetics [4], and (ii) the phylogenetic tree will play the role of evolutionary time: to avoid confusion, we therefore refer to the indexing variable x above as ‘space’. The model may be used as a prior for Bayesian inference, which opens up phylogenetically aware approaches to the prediction of ancestral functions and to model selection.
Because of their generality, flexibility and mathematical simplicity, there has been substantial recent interest in the use of Gaussian process priors for Bayesian nonparametric regression [5–7]. This report may be viewed as an extension of Gaussian process regression to take account of functional data and a tree topology. Our work relates to the field of spatial statistics, in that it involves multidimensional index sets with both distance and topology; however, it is the tree topology and the conditional independence of siblings given their common ancestors that makes our approach particularly suited to the study of phylogenetically correlated data.
Functional representations of data have been in use for at least 30 years [2, 8]. Their use in (nonphylogenetic) evolutionary studies was proposed for quantitative genetics in Kirkpatrick & Heckman [9]. While the wider debate concerning classical versus functional data types is outside the scope of this report; in Ramsay [2], it is argued that statistical challenges for classical data have corresponding dual statistical challenges for functional data, so we may ask: in the application of our models to statistical inference, which classical approaches are dual to our functional approach? Classical data types investigated in the phylogenetic context include sequences of discrete symbols (e.g. encoding the presence or absence of certain characters [10], or alternatively in models for the evolution of genetic sequences), the spatial locations of a number of fixed landmark points in geometric morphometrics [11], and multivariate vectors of continuous characters or summary statistics [12]. Within the evolutionary study of multivariate vectors, the relative effects of phylogenetic and (species)specific variation have been studied [13], which relates closely to our equation (2.14) below, and the challenges of ancestor prediction and model selection have been addressed and explored statistically [14–16]. We address model selection in the supplementary material to this report and more fully in a companion paper [1], which also contains a discussion of statistical issues and performance. Beyond this duality, however, our model is suitable for use as a prior in Bayesian analysis, and in the remainder of this report, we derive our models and provide details for their use in Bayesian regression.
2. Phylogenetic Gaussian processes
2.1. Gaussian processes and regression
A Gaussian process is a collection of random variables, any finite number of which have a joint Gaussian distribution. Examples are the Wiener (Brownian motion) and Ornstein–Uhlenbeck processes that have received considerable recent attention in the study of evolution [12,14,15,17]. Gaussian processes are characterized by their first two (cross)moments (although, unless otherwise stated, the mean will be assumed to be zero—a common assumption for Gaussian process priors, see Rasmussen & Williams [5]). The covariance function σ, which specifies how the sample values (co)vary, typically depends on a vector θ of parameters.
Suppose that a Gaussian process f is observed at a vector of coordinates L. Then, the resulting vector of sample values f(L) has a multivariate Gaussian distribution of dimension equal to L, the number of points of measurement: . Here, 𝒩 represents the Gaussian distribution, its two arguments being the mean vector and the covariance matrix σ(L, L, θ), which is the matrix of the covariances between all pairs of observation coordinates in L (where ) so that 2.1In §2.3, we derive the structure of some particular covariance functions, including the phylogenetic Ornstein–Uhlenbeck process.
The loglikelihood of the sample f(L) is then [5] 2.2
We might be interested in making inferences about the unobserved values of our random function f at a vector M of coordinates, given samples at the coordinates L. The posterior distribution of the vector f(M) given f(L) is also Gaussian and of the form [5]: 2.3where 2.4and 2.5and σ(M, L, θ) denotes the M × L matrix of the covariance function σ(m_{i}, l_{j}) evaluated at all pairs m_{i}∈M, l_{j}∈L. From equation (2.4), the posterior mean vector A consists of linear combinations of the observations, while the posterior covariance matrix B, given by (2.5), is independent of the observations. Gaussian process regression is nonparametric in the sense that no assumption is made about the structure of the model: the more data gathered, the longer the vector f(L), and the more intricate the posterior model for f(M).
We are able to combine evolutionary dynamics with functional data, because the index variable introduced above can be a point in a space of arbitrary dimension. Therefore, if we wish to model the time evolution of a functionvalued trait f(x) with indexing variable x, we could consider each point of observation as corresponding to a point (x, t) in both space and evolutionary time. Then, f(L) would represent the values of a random space–time surface at various space–time coordinates, L (analogously, the values f(L) are like a set of altimeter recordings recorded at different locations (L) on a map). A cross section through this space–time surface at a fixed value of time t yields a random curve that can be viewed as a single functionvalued trait at the fixed time t in its evolution. In §2.2, we will extend this view to processes indexed by phylogenies.
2.2. Phylogenetic covariance function
Our aim in this section is to build a Gaussian process model for the evolution of a functionvalued trait along a phylogenetic tree T by allowing T to play the role of evolutionary time, rather than the linear time variable t used above. Suppose, therefore, that each observation corresponds to a point (x, t) in the space–phylogeny S × T: that is, x ∈ S is the value under consideration of the spatial (indexing) variable, and t ∈ T is the point under consideration on the phylogeny (t is not just a time coordinate but also indicates a branch of the phylogeny). We will do this by constructing a covariance function Σ_{T}(l_{i}, l_{j}) when the l_{i}, l_{j} are points in S × T, calling it the phylogenetic covariance function. In order to obtain a unique phylogenetic covariance function Σ_{T}, we will make two assumptions that are natural in the context of evolution [12]:
Assumption 2.1. Conditional on their common ancestors in the phylogenetic tree T, any two functionvalued traits are statistically independent.
Assumption 2.2. The statistical relationship between a functionvalued trait and any of its descendants in T is independent of the topology of T.
Assumption 2.2 means that our statistical model of the evolutionary process is identical along paths through T from the root to any tip, and we call this the marginal process (this assumption could be generalized, for example to model unequal rates of evolution along different branches of T). As in Felsenstein [12] and related work, we need the assumption that T is a chronogram, having both a tree topology and a distance metric. We will call the distance between a point t ∈ T and the root of T the ‘depth’ of t, and denote it by the plain typeface symbol t.
The marginal process is of course Gaussian, and so it is sufficient to specify its covariance function Σ on S × T, where T is the set of all dates in T. In fact, each different marginal covariance function specifies a different phylogenetic covariance function, and so a wide class of phylogenetic covariance functions may be constructed. In turn, these models offer an equally wide choice of priors for Bayesian evolutionary inference with functionvalued traits.
In proposition 2.3, we present our mathematical results in the simplest case when the marginal covariance function is space–time separable: that is, when there exist a spaceonly covariance function K(x_{1},x_{2}) and a timeonly covariance function k(t_{1}, t_{2}) such that 2.6Proposition 2.3.

1. If the marginal covariance function Σ is space–time separable, then the phylogenetic covariance function Σ_{T} is also space–time separable, i.e. 2.7

where k_{T}(t _{1},t _{2}) is the phylogenetic covariance function constructed from the timedependent component k in (2.6) and K(x_{1}, x_{2}) is the spacedependent component.

2. When the timedependent component k of (2.6) specifies a process that is Markovian in time, we have the simple expression 2.8

where t_{12} is the most recent common ancestor of t_{1} and t_{2} (and t_{12} is its depth in T). In particular, we have the following corollaries:


(a) if k(t_{1},t_{12}) = min(t_{1},t_{12}) so that we have a Wiener process in evolutionary time as in Felsenstein [12], then k_{T}(t_{1},t_{2}) = t_{12}, which is the variance of the evolutionary time component of variation evaluated at the most recent common ancestor t_{12}.

(b) if k is isotropic so that k(t_{1},t_{2}) is a function of t_{1}−t_{2} only, it does not necessarily follow that k_{T}(t_{1},t_{2}) is isotropic (meaning a function of the patristic distance between t_{1} and t_{2} only). In fact, k_{T} is only isotropic when k is the Ornstein–Uhlenbeck covariance.


3. Let Y be a phylogenetic Gaussian process with a space–time separable covariance function Σ_{T} which factorizes as in (2.7). If K is a continuous degenerate Mercer kernel, then there exist an integer n and deterministic functions and univariate Gaussian processes X_{i}, for i =1 … n such that the Gaussian process given by

2.9

has the same distribution as Y.
Further mathematical detail can be found in the electronic supplementary material. Part 2(b) of proposition 2.3 helps clarify the relationship between phylogenetic Gaussian process models and studies such as [18] based on autocorrelation functions of patristic distance, namely that the two are compatible only when the phylogenetic Ornstein–Uhlenbeck process is assumed. Part 3 of the proposition establishes that a convenient expansion using basis functions can be used for a wide range of phylogenetic Gaussian processes with space–time separable covariance functions. This expansion is useful for statistical inference, as it justifies the use of dimensionreduction techniques (see the paper [1]).
2.3. Examples
We now illustrate two ways in which proposition 2.3 may be used to make priors for Bayesian inference on phylogenetically related functionvalued traits (henceforth referred to simply as ‘traits’). They differ in their approach to modelling spatial variation, and may be called the spatially inhomogeneous (figure 1a) and spatially homogeneous (figure 1b) models, respectively. In both cases, we assume the simplest possible structure for the marginal covariance function Σ: that it is space–time separable as in (2.6). We also assume that, conditional on any given trait, its ancestor and progenitor traits are statistically independent: this corresponds to choosing a temporal component which is Markovian. The only Markovian Gaussian processes are the class of Ornstein–Uhlenbeck processes, and the stationary examples have the covariance function where the hyperparameter θ_{2} specifies the characteristic length scale for the evolutionary dynamics. From proposition 2.3, we obtain 2.10where d_{T} (t_{1}, t_{2}) denotes the patristic distance between t_{1} and t_{2} (note that the phylogenetic Ornstein–Uhlenbeck model is isotropic, see end of proposition 2.3).
2.3.1. Spatially homogeneous model
If our prior belief is that variation in the trait is homogeneous over all values of x ∈ S, then we should choose the spatial covariance K to be stationary, and if the traits are typically smooth we should choose K to generate smooth random functions. An example is the squared exponential covariance function: 2.11Here, the hyperparameter θ_{1} fixes the characteristic length scale of the random functions. This choice of K gives 2.12 2.13
This simple example may be combined by summation to construct other spatially homogeneous phylogenetic covariance functions, although the separability property is typically then lost. The following phylogenetic covariance function, Σ′_{T} contains an uncorrelated noise term whose influence is controlled by the choice of the parameter : 2.14where δ is the Kronecker delta. When functional data are sampled at a finite set of space–time points L, the phylogenetic covariance function (2.14) may be used to specify a prior for Bayesian regression as described in §2.1. Figure 1b gives a schematic of functional data and the posterior distribution for the root function, when all functions are discretely sampled on a regular lattice. An illustrative example developing figure 1b is provided in the electronic supplementary material.
2.3.2. Spatially inhomogeneous model
We may alternatively construct a prior using the representation (2.9). This involves choosing deterministic spatial basis functions and univariate phylogenetic Gaussian processes X_{1, … ,}X_{n}. The spatial basis may be specified a priori, or alternatively obtained by functional decomposition of observed data: in the paper [1], we present a practical methodology for this empirical Bayesian approach. If we assume that the X_{i} are independent phylogenetic Ornstein–Uhlenbeck processes on T with noise, then their respective covariance functions are: 2.15The full covariance function of our prior distribution is then: 2.16Figure 1a gives a schematic of functional data and the posterior distribution for the root function, when all functions are constructed from the spatial basis ϕ_{1,} … ,ϕ_{n}.
3. Discussion
In this paper, we have exploited the powerful Bayesian inference architecture provided by Gaussian processes to address evolutionary questions for functionvalued data. The closedform posterior distribution (2.3) offers a straightforward approach to the prediction of unobserved functionvalued traits, and the associated explicit likelihood function is an attractive feature for Bayesian inference in the presence of uncertainty over the phylogenetic tree or the evolutionary process. The approach is suitable both for complete (or dense) observations of functionvalued traits and for sparsely and even irregularly sampled traits, with missing observations [19].
 Received August 2, 2012.
 Accepted October 8, 2012.
 © 2012 The Author(s) Published by the Royal Society. All rights reserved.