## Abstract

Models that deal with the individual level of populations have shown the importance of stochasticity in ecology, epidemiology and evolution. An increasingly common approach to studying these models is through stochastic (event-driven) simulation. One striking disadvantage of this approach is the need for a large number of replicates to determine the range of expected behaviour. Here, for a class of stochastic models called Markov processes, we present results that overcome this difficulty and provide valuable insights, but which have been largely ignored by applied researchers. For these models, the so-called Kolmogorov forward equation (also called the ensemble or master equation) allows one to simultaneously consider the probability of each possible state occurring. Irrespective of the complexities and nonlinearities of population dynamics, this equation is linear and has a natural matrix formulation that provides many analytical insights into the behaviour of stochastic populations and allows rapid evaluation of process dynamics. Here, using epidemiological models as a template, these ensemble equations are explored and results are compared with traditional stochastic simulations. In addition, we describe further advantages of the matrix formulation of dynamics, providing simple exact methods for evaluating expected eradication (extinction) times of diseases, for comparing expected total costs of possible control programmes and for estimation of disease parameters.

## 1. Introduction

The foundations of ecological and epidemiological modelling are largely based upon deterministic equations for dynamics of populations. However, a growing body of research suggests that demographic stochastic effects, due to the random nature of population events, can cause dramatic deviations from this deterministic ideal (Bartlett 1956; Rand & Wilson 1991; Fox 1993; Grenfell *et al*. 1998; Keeling *et al*. 2000; Spagnolo *et al*. 2003; Coulson *et al*. 2004). Three notable elements separate stochastic from deterministic dynamics. The first and most obvious is the random nature of the dynamics leading to differences between replicates and temporal variation, where the deterministic model will often predict an unvarying equilibrium solution (Taylor 1961; Hanski & Woiwod 1993; Keeling & Grenfell 1999). The second is the possibility of stochastic extinction, where, by chance, individuals may fail to ‘reproduce’ and the species dies out (Bartlett 1956; Lande 1993; Grenfell *et al*. 1995). This is important in most avenues of population modelling as recovery from extinction can usually only occur due to interactions with an external population. Finally, stochasticity induces fluctuations away from the deterministic equilibrium (attractor), so that realizations of the stochastic process often ‘track’ the deterministic transient path. This can often lead to stochastic resonance where stochasticity causes the population to oscillate at or near its natural frequency (Renshaw 1991; Blarer & Doebeli 1999; McKane & Newman 2005; Alonso *et al*. 2006; Ross 2006*b*).

Commonly, modelling of stochastic populations is performed using integer-based event-driven models (Gillespie 1976; Renshaw 1991), mimicking the supposed behaviour of the real system. Such methods of simulation generally allow a range of complex biologically realistic behaviour to be incorporated in the underlying model and offer an intuitive modelling framework to many applied researchers. However, given a single simulation, it is not clear whether the simulated dynamics are representative of average behaviour or merely a chance outlier due to a rare combination of events. Therefore, large numbers of replicate simulations are required to establish confidence in results. The same is true in the situation where our interest is in a rare event, such as occasional extinctions or unusually large epidemics (although methods have been developed to improve efficiency, e.g. *importance sampling* and the *cross-entropy method*; Rubinstein & Kroese 2004). In essence, event-driven models are *in silico* experiments and therefore the results must be subject to the same statistical treatments as would be employed for any experimental observation.

A number of approximation methods exist, which overcome the requirement of a large number of simulations by providing analytical approximations for expected behaviour of the population and often approximations for variability in this behaviour. Two widely used methods are moment closure techniques (Keeling *et al*. 2000; Nåsell 2001, 2002, 2003*a*,*b*) and diffusion approximations (Kurtz 1970, 1971; Pollett 1990; van Kampen 1992; Jacquez & Simon 1993; Andersson & Britton 2000; Ross 2006*a*,*b*). However, such methods are generally most accurate when the population size is sufficiently large. Since we are often interested in stochastic effects when population sizes are small, and thus stochasticity has a relatively large impact on dynamics, it would be desirable to have similar methods that operate effectively when population sizes are small. We present such a method for Markov processes. Markov processes are a class of stochastic process in which the future state of the population is determined solely by the current state—the system has no memory (Norris 1997; Andersson & Britton 2000); the vast majority of event-driven stochastic models used in ecology and epidemiology are Markov processes.

The basic methodology presented here pre-dates both simulation and analytical approximations; it allows the complete ensemble of stochastic behaviour to be predicted precisely by a (very) large set of deterministic equations (e.g. for *susceptible–infectious–susceptible* (SIS) dynamics, *N*+1 differential equations are required and for *susceptible–infectious–recovered* (SIR) dynamics, (1/2)(*N*+1)(*N*+2) differential equations are required, where *N* is the number of individuals). In essence, the deterministic ‘Kolmogorov forward equation’ (also called the ensemble or master equation) contains a single equation for the probability of being in each possible state, with the dynamics governed by the rates of transition between states (Norris 1997). Therefore, by solving one set of differential equations, we can obtain a complete description of all possible behaviours of the stochastic system. While these methods have existed for many years, they appear not to be widely known or used by many applied practitioners. However, with continuing advances in computing power, these techniques are becoming increasingly applicable to real-world problems. In addition, these ensemble equation methods can be used to test the validity of analytical approximation methods (such as diffusion approximations and moment closure techniques) and provide a richer understanding of the dynamics of stochastic event-driven simulations.

The solution of large numbers of differential equations generated by this method is relatively straightforward and fast on modern computers, and has been exploited by a number of researchers (Dieckmann & Law 1996; Keeling 2000; Keeling *et al*. 2000; Stollenwerk & Briggs 2000; Alonso & McKane 2002; Stollenwerk & Jansen 2003; Viet & Medley 2006). However, a careful examination of the Kolmogorov forward equation shows that it is linear in terms of the probabilities of being in each state—the complex nonlinearities often associated with population dynamics are simply absorbed into the matrix terms. This observation allows the equations to be recast in terms of simpler matrix and vector operations, which greatly speed computation, and provides more insight into dynamics. This paper is not intended to provide detailed computational methodology for handling the necessary matrix operations (such as finding eigenvalues, solving systems of linear equations or calculating matrix exponentials), instead readers are recommended to read one of the various textbooks on the subject (e.g. Golub & van Loan 1996) or to use one of the many computational packages that are available (e.g. Matlab, Lapack, R or Mathematica). Instead, using simple epidemiological examples, we show the power and advantages of using Kolmogorov's forward equations and the associated matrix methods, in particular when very precise results are required for relatively small populations. We hope this will motivate researchers to explore the potential that these matrix methods have to offer.

Throughout this paper, attention is focused on dynamics of infectious diseases; this is for two main reasons. First, the study of infectious diseases has been at the forefront of research into the behaviour of stochastic population processes—generally because the prevalence of infection is often low and therefore stochastic effects are felt strongly (Bartlett 1956; Grenfell 1992). Second, for disease dynamics, the number of events is strictly limited (birth, death, infection and recovery) and their nature well defined, whereas for ecological models there is far more ambiguity (e.g. consider the number of ways in which density dependence can be modelled). Three distinct epidemiological models are considered to exemplify the use of ensemble equations: the endemic SIS model with infectious imports (i.e. infection resulting from a source external to the population being modelled); the endemic SIR model with infectious imports; and the simple SIR model without births, deaths or imports. Throughout the paper, deterministic ensemble equation results are compared with results from the corresponding event-based stochastic simulations and the additional insights that come from the matrix formulation are discussed. Understanding the stochastic behaviour of diseases in small populations is a vital step towards the control of infection in subdivided communities with metapopulation-like structure (Grenfell & Harwood 1997). In particular, the methods outlined in this paper are ideal for studying infection within families (Hope Simpson 1952; Melegaro *et al*. 2004; Verver *et al*. 2004), farms (Woolhouse *et al*. 1996; Nodelijk *et al*. 2000; Stark *et al*. 2000; Viet & Medley 2006) and hospitals (Dooley *et al*. 1992; Austin *et al*. 1999; Cooper *et al*. 1999; Cooper & Lipsitch 2004). The continual improvement in computational power means these techniques will be of increasing importance and applicability in the near future as it becomes practical to deal with ever larger population sizes.

## 2. The SIS model

The SIS model is an accurate representation of the population dynamics for many sexually transmitted infections (Anderson & May 1992), where following treatment an infectious individual recovers from the disease but is once again susceptible to infection. We take the standard differential equation for the number of infectious individuals in a deterministic population (noting that in such formulations, population numbers are assumed continuous)(2.1)where *S*=*N*−1 is the number of susceptibles; *N* is the population size (assumed constant); *β* is the contact rate; *ϵ* captures the import of infection from an external source (assumed to equal 0 if the population is isolated); and *g* is the rate at which individuals are treated and return to the susceptible class. In this basic formulation, the natural processes of births and deaths have been ignored, as these demographic events often occur at a much slower rate than infection. A more realistic model of sexually transmitted infections would include the age, gender, sexual preference and number of partners of an individual; however, the simple one-dimensional model (equation (2.1)), in which the only events are infection and recovery (back to susceptibility), is ideal for illustrating the power of Kolmogorov's forward equations. In fact, the SIS model has many similarities with the Levins metapopulation model (Levins 1969; Alonso & McKane 2002; Ross 2006*a*,*b*), classifying hosts as either empty or occupied with infection.

For this SIS disease model, there are *N*+1 different states that the population can be in, corresponding to *I*=0, 1, …, *N*. We let *p*_{n}(*t*) be the probability that there are *n* infectious individuals at time *t* and construct a set of differential equations for these state probabilities, the so-called Kolmogorov forward equations(2.2)where *I*=*n*=−1 and *I*=*n*=*N*+1 are not feasible states and therefore *p*_{−1} and *p*_{N+1} are set to zero. The first two terms on the right-hand side of equation (2.2) deal with ways in which an *I*=*n* population state can arise: the first term corresponds to ‘creating’ a new infectious individual from the *I*=*n*−1 state, and the second term corresponds to recovery from when *I*=*n*+1. The final term deals with ways in which an *I*=*n* population can be lost, either through gaining an extra case or through recovery of an infectious individual. While it would be relatively straightforward to numerically evolve this set of equations, more insight can be gained by changing to a vector notation.

We set ** p** to be the row vector of the

*N*+1 probabilities. In this vector notation, the Kolmogorov forward equation becomes(2.3)where the matrix

**is tridiagonal and consists of the transition ratesIt is interesting to realize that all of the complex nonlinear dynamics associated with disease transmission (or other processes) are absorbed into the terms of the matrix, and therefore the dynamics of**

*Q***are purely linear. In general, the eigenvalues (**

*p**λ*

_{1},

*λ*

_{2}, …,

*λ*

_{N+1}) and left eigenvectors of

**specify the entire dynamics. (For later use, we also specify that the right eigenvectors are , which are associated with the same set of eigenvalues.) Throughout this paper, we assume that the eigenvalues are ordered such that 0=**

*Q**λ*

_{1}≥

*Re*(

*λ*

_{2})≥⋯≥

*Re*(

*λ*

_{N+1}), so that the first eigenvector(s) dominates the long-term dynamics. In general, the sparse nature of

**(here only 3**

*Q**N*+1 terms out of (

*N*+1)

^{2}are non-zero) means that we can exploit a range of powerful numerical techniques to find dominant eigenvalues (Golub & van Loan 1996; Trefethen & Bau 1997).

Considering the Kolmogorov forward equation in general for any Markov process model (given that the system is finite—as is frequently the case), there exist two ways in which we can write the full dynamics of the ensemble for all time, hence providing a single equation that specifies the exact probabilistic dynamics of the disease. The simplest formulation extending from the differential equation (2.3) is(2.4)where the exponential of a matrix can be computed with relative ease (Moler & van Loan 1979; Sidje 1998). As an alternative representation, we can define the solution as(2.5)where the coefficients *q*_{n} are determined by the inner product of the right eigenvectors of ** Q** with the initial distribution , , and the right eigenvectors are normalized so that x. (The precise normalization of left eigenvalues is irrelevant for this formulation, although throughout we have assumed that the terms of

*l*

_{1}sum to 1 such that it is a valid probability distribution. We also note that for equation (2.5) to hold we have assumed that repeated eigenvalues have equal algebraic and geometric multiplicities, which is generally the case.)

For the SIS model, in particular, *λ*_{1}=0 and all other eigenvalues are real and negative. Hence, from equation (2.5), we find that (normalized to sum to 1) is the final state of the ensemble equation and hence the long-term distribution of the stochastic SIS model. Figure 1 shows the exact probability distribution given by , and the error at each point when the distribution is estimated from a stochastic time series (simulation of SIS model) of length 100 and 1000 time units. The inset graph shows how the total (root mean square) error decreases as the length of the stochastic time series increases. As expected from standard statistics, the error scales inversely with the square root of the length of the time series (error∝(length)^{−1/2}). So, either very long or very many stochastic simulations have to be performed to obtain an accurate prediction of the distribution of disease prevalence. In contrast, a single eigenvalue calculation produces the answer to very high accuracy and extremely quickly. We return to this question of computational efficiency later.

Other eigenvalues (and the associated eigenvectors) provide additional information; in particular, the real part of *λ*_{2}, *λ*_{3}, … (noting again that for the SIS model, the eigenvalues are all real) informs about the rate of convergence to the equilibrium distribution. For the example given in figure 1, *λ*_{2}≈−0.0098 while *λ*_{3}≈−0.8725, and the entry of the eigenvector corresponding to the disease-free state *I*=0 is significantly larger than the other entries. This suggests that, in general, convergence from any infected state to the equilibrium distribution is swift, as contributions from the eigenvectors to decay rapidly; however, escape from disease extinction (governed by parameter *ϵ* and captured by *λ*_{2}) is much slower.

In addition, we note that if *ϵ*=0, the equilibrium distribution is degenerate as *I*=0 is an absorbing state. In such cases, we may be interested in the long-term distribution of the process conditioned on non-absorption; in epidemiological terms, this is the prevalence distribution, given that the disease has not died out. This is precisely the information conveyed by the quasi-stationary distribution, which may once again be evaluated efficiently using a matrix formulation (Pollett 2006; Ross 2006*a*). We perform the same calculations as before, but for the reduced matrix *Q*_{0} derived by removing the first row and first column (rows and columns corresponding to absorbing states) from the matrix ** Q**. The first (normalized left) eigenvector of this matrix, , is the quasi-stationary distribution of the SIS model without imports of infection. Once again, the eigenvalues provide us with additional information; the real part of the first eigenvalue tells us about the long-term rate of extinction (decay of the quasi-stationary distribution or convergence to the absorbing state

*I*=0), and the difference between the real parts of the first and second eigenvalues (the

*spectral gap*) tells us about the rate of convergence to the quasi-stationary distribution (Dambrine & Moreau 1981). Furthermore, for the quasi-stationary distribution to be of practical interest, the difference between the first and second eigenvalues must be substantially larger than the magnitude of the first eigenvalue.

Finally, the matrix formulation also allows exact evaluation of the likelihood for a sequence of data, and thus provides a framework for parameter estimation (Pelupessy *et al*. 2002; Cooper & Lipsitch 2004; Ross *et al*. 2006). Suppose there is a parameter (or set of parameters) *θ*, we wish to estimate from some given data. We allow the dependence on *θ* to be made explicit in our model notation by writing ** Q**(

*θ*) and ; we also define to be the indicator vector that corresponds to being precisely in state

*i*, and set [.]

_{i}to be the element of a vector corresponding to state

*i*, while [.]

_{i,j}is defined to be the

*i*,

*j*th element of a matrix. Suppose we now have

*n*observations at times

*t*

_{1}<⋯<

*t*

_{n}, when it is recorded that the process is in states

*i*

_{1}, …,

*i*

_{n}. The likelihood of observing these states is(2.6)This likelihood is the product of the probability of being in state

*i*

_{k}at time

*t*

_{k}, given that the system was in state

*i*

_{k−1}at time

*t*

_{k−1}, all multiplied by the probability of observing the system in state

*i*

_{1}at time

*t*

_{1}. Using the exponential matrix formulation, the probability of moving from state

*i*

_{k−1}to state

*i*

_{k}in time

*t*

_{k}−

*t*

_{k−1}can be explicitly calculated. Any one of a range of numerical optimization techniques can then be used to find the value of

*θ*, which maximizes the likelihood (2.6) over the range of parameter space (Ross

*et al*. 2006). It should be emphasized that this method of parameter estimation uses the exact likelihood of observing the given data—assuming the model is an accurate description of disease dynamics—and also incorporates dependency between observations. Although the form of the likelihood (2.6) is greatly simplified by the assumption of an accurate record of prevalence, a similar approach is possible when only partially observed reporting data are available. We note that if the data are recorded periodically (such that

*t*

_{k}−

*t*

_{k−1}is constant), then the matrix exponential only needs to be calculated once, greatly improving computational efficiency.

It is informative to contrast this estimation method with MCMC techniques (Gamerman 1997). The estimation method based on the matrix approach calculates the exact probability of moving between two states (say *i*_{k−1} and *i*_{k}); in contrast, MCMC techniques generally calculate the likelihood by ‘averaging’ over plausible transitions between states (Gamerman 1997). Therefore, for small population sizes and limited number of states, this matrix approach may be faster and more accurate; however, MCMC techniques can generally deal with higher-dimensional systems with much larger state spaces.

## 3. Endemic SIR diseases

While the SIS model provides a simple test case for the ensemble equation, the one-dimensional nature of the problem means the equilibrium solution to the ensemble equation (2.3) can be found explicitly by balancing transitions between states (*p*_{n}*β*(*N*−*n*)*n*=*p*_{n+1}*g*(*n*+1); Keeling 2000; Alonso & McKane 2002). In fact, explicit expressions exist for many quantities of interest for such one-dimensional *birth-and-death processes* (Norris 1997), where the only transitions possible are either increase or decrease the state by 1.

When considering a disease with SIR dynamics, such that there are (at least) two independent variables, the problem is more complex and the matrix approach becomes even more appealing. In the SIR disease model, when individuals recover, they are assumed to be immune from further infection; this gives rise to the standard deterministic model for this type of disease (Anderson & May 1992)(3.1)where the birth and death rates are typically assumed to be equal (*B*=*d*) and again imports from an external source (governed by the parameter *ϵ*) have been included. Here six distinct events can occur (birth, death of a susceptible individual, death of an infected individual, death of a recovered individual, infection and recovery), which modify the state of the process. This means, in general, there are six ways in which the probability density for a particular state can increase and six ways in which it can decrease:(3.2)where *p*_{S,I,N} is the probability of having *S* susceptible individuals, *I* infectious individuals and a total population of size *N*.

Unfortunately, due to the way births are modelled, a strict upper limit cannot be placed on the population size for the stochastic model, which is necessary to formulate a finite matrix expression. Hence, we require a mechanism of bounding the population. One approach to this, which should provide the most accurate approximation to the dynamics of equation (3.2), is to truncate the process at some large population size *N*. The influence of truncating the process at various population sizes may then be examined numerically, and for some models analytically (Cairns & Pollett 2005). In practice, one would usually choose the largest population size *N* for which computations remain feasible. An alternative way to bound the population size is to fix the population size *N* as a constant, such that the death of a susceptible, infectious or recovered individual is immediately compensated by the birth of a susceptible. This assumption has three additional advantages. First, it reduces the number of independent variables to two (just *S* and *I*). Second, it circumvents the necessity to choose between density- and frequency-dependent transmission (Begon *et al*. 2002). Finally, a constant population size is reasonable for many applied problems, such as modelling the transmission of infection upon farms or within hospital wards, where there is generally little variation in the number of individuals in the population (Pelupessy *et al*. 2002; Cooper & Lipsitch 2004; Viet & Medley 2006). It should be noted that while assuming a constant population size does not change the ODEs (3.1) (if, as is typical, we assume *B*=*d*), however, it does change the Kolmogorov forward equations(3.3)where *p*_{S,I} is the probability of having *S* susceptible individuals and *I* infective individuals. This highlights an additional use of the Kolmogorov forward equations: the Kolmogorov forward equations uniquely specify the full stochastic dynamics, whereas there are multiple interpretations of the ordinary differential equations (Keeling 2000). For this model, there are possible disease states (0≤*S*≤*N*, 0≤1≤*N*−*S*), and so *C* different *p*_{S,I} variables. To formulate a matrix expression, we need to find a method of mapping the two-dimensional *p*_{S,I} into a one-dimensional vector, . An efficient means of achieving this is to set the probability *p*_{S,I} as the th element of the vector.

Now, as before, we can compute the distribution of states as , where the matrix ** Q** is constructed from the transition rates between states; again

**is sparse (a very small proportion of non-zero terms). If we are only interested in long-term behaviour, considering the eigenvalues again provides an efficient method of calculating the equilibrium distribution. For this SIR model, the first eigenvalue,**

*Q**λ*

_{1}, is zero while all the others have negative real parts, and thus there is again a unique equilibrium distribution. Hence, from equation (2.5),so the entire long-term behaviour of the full stochastic system can be found by simply computing the dominant (left) eigenvector, which can be done relatively efficiently (Pollett & Stewart 1994; Golub & van Loan 1996). Figure 2

*a*compares the distribution from the first eigenvector (contours) with the results of a long stochastic simulation (dots). As expected, there is good agreement between the two methods, although the Kolmogorov forward equation approach provides far more detail, allowing evaluation of the exact (to numerical precision) distribution of states—which can be a highly efficient way of calculating extremes of behaviour such as the 99 or 99.5% contours.

Looking at the other eigenvalues and eigenvectors again provides a deeper understanding of the dynamics. For the parameters used in figure 2*a*, once again, the second eigenvalue and eigenvector describe the slow escape from the disease-free state (*S*=*N*, *I*=0), with *λ*_{2}≈−0.3468. Thus, if the initial distribution begins at the disease-free state, it will escape relatively slowly at rate exp(−0.3468*t*). This is in direct contrast to the strong instability of the disease-free state in the standard ODE model (equation (3.1)) for which *I*(*t*) grows at approximately the rate exp(2.333*t*) for *I*(0) small; this difference is because in the stochastic model, an import is required for the infection to escape zero and there is the possibility of stochastic extinctions, both of these are captured by the Kolmogorov forward equation. The third and fourth eigenvalues are a complex conjugate pair (approx. −0.7607±0.9576*i*) and correspond to the dominant oscillations around the equilibrium distribution. These oscillations have a slightly longer period and a slower decay than predicted by the standard ODE model (Anderson & May 1992, where the eigenvalues are approx. −0.8401±1.0287i) due to the effects of stochasticity and nonlinearities around the fixed point.

Finally, the equilibrium solution can also tell us about the long-term rates of temporary extinction and subsequent reinfection of the population. As we are at equilibrium, these two rates must be equal and can be calculated as either the rate at which infection is lost from or the rate that reinfection occurs from Figure 2*b* shows the extinction rates from the equilibrium distribution, and average numbers of susceptible and infectious individuals, for a range of population sizes and with *β*=5/*N*. As expected, the extinction rate decreases exponentially with increasing population size (extinction rate≈11.13 exp(−0.0789*N*)), and for population sizes above 100 the mean number of susceptibles and infecteds increases linearly.

## 4. The simple SIR epidemic

As a further example of the power of ensemble equations, we consider the simple SIR model, without births, deaths or imports of infection(4.1)The Kolmogorov forward equations are defined as in equation (3.3), but with *d*=ϵ=0. For this model, we observe a single epidemic that eventually dies out as the level of susceptibles becomes too low to support the infection and cannot be replenished. While generally considering the simplest form of the deterministic SIR model, in this stochastic setting the dynamics are more complex as there are multiple equilibrium (absorbing) distributions. For a population of size *N*, there are *N*+1 equilibria corresponding to *I*=0, *S*=0, …, *N*. Faced with this inevitable extinction, there are two applied questions that can be addressed using the ensemble model: what is the expected time to extinction, and how many individuals escape infection, and the influence of the initial conditions on both of these quantities is of applied interest. These questions are once again readily calculated using a matrix formulation.

The expected time to extinction, , when starting in state *i*, is evaluated as the solution towhere is a vector of 1s, and *Q*_{0} is again the matrix ** Q** with the rows and columns corresponding to the (

*N*+1) absorbing states removed (Mangel & Tier 1994; Norris 1997). Additionally, if costs are involved (such as treatment and isolation costs), we can replace by a vector , where the entry is the cost per unit time associated with a population in state

*j*. The solution now corresponds to the expected total cost over the lifetime of the disease starting in state

*i*(Norris 1997; Pollett & Stefanov 2002; Pollett 2003). This formulation highlights that the effect of initial conditions may be rapidly evaluated once

*τ*is calculated.

In the deterministic model, the final size of the epidemic, *R*_{∞}, is defined as the proportion of a totally susceptible population that become infected (and eventually recover) following the introduction of a small amount of infection (Kermack & McKendrick 1927). In this context, the final size is determined by the relationshipWe now wish to consider how this theoretical quantity for a large deterministic population compares with results for small stochastic populations. Using the matrix formulation, the average final size of the epidemic () can be calculated from the number of susceptibles ‘surviving’ in the final probability distributionWe now use equation (2.5), but note that there are *N*+1 zero eigenvalues and that eigenvectors associated with zero eigenvalues span the null space of the matrix. The null-space calculation provides a computationally efficient means of finding the eigenvectors and hence calculating the final state of the system. (For simplicity, we can set the first *N*+1 left eigenvectors to be (where is a vector of length equal to the size of the state space and having a 1 in the entry corresponding to the state (*S*, 0) and zeros elsewhere), the right eigenvectors can then be found such that they span the null space of ** Q** and satisfy .) From equation (2.5), the long-term distribution is given byallowing us to compute . Hence, the influence of initial distributions on the size of the epidemic may again be computed very quickly. (We note that alternative methods exist for evaluating this quantity, as described for instance by Bailey (1953), Daley & Gani (1999) and Diekmann & Heesterbeek (2000).)

Figure 3*a* shows the long-term distribution *p*_{s,0}(∞) in a population of 50 individuals, starting from *S*(0)=40 and *I*(0)=5. As expected, the model predicts a range of epidemic sizes, from failure of the initial cases to generate any secondary infections to epidemics infecting the entire population. Once the left and right eigenvectors have been found corresponding to the null space of the matrix, extending the calculation to include the complete range of possible initial conditions is computationally efficient (figure 3*b*). Figure 3*c* shows the corresponding average times to extinction. Several facets are clear from these results: the average time to extinction increases with both the initial number of susceptibles and the number of infecteds; in contrast, the number of individuals escaping infection decreases with the initial number infected but increases with the number that are initially susceptible.

Finally, in figure 3*d*, we consider how numerical calculation of final size (proportion of a totally susceptible population that becomes infected) in a stochastic population (starting with *S*(0)=*N*−1 and *I*(0)=1) compares with theoretical predictions of Kermack & McKendrick (1927). The theoretical value of *R*_{∞} has to be modified by a factor 1−1/*R*_{0}, which accounts for failure of the initial infection to cause a major epidemic (Bartlett 1956). Although the theoretical value may be appropriate in the limit of large population sizes, understanding the deviation from this ideal for small populations is highly informative. Two conflicting elements contribute to the pattern. First, for small population sizes, the initial case is a significant proportion of the population size, hence the true final size is greater than the theoretical prediction. However, for larger population sizes, the theoretical prediction is an overestimate primarily due to underestimating the extinction risk. While these conclusions are intuitive, and could be determined by repeated simulation, the results from the ensemble equations provide an extremely accurate description of the behaviour and are computationally efficient to generate.

## 5. Computational efficiency

Throughout this paper, we have alluded to the computational efficiency of using the Kolmogorov forward equation when dealing with Markov models. We now wish to make these assertions more definite by providing some examples of the types of computational demands required by the various problems discussed in this paper (table 1). It is clear that some calculations, such as computing the transition matrix ** Q** or calculating the first few dominate eigenvalues and eigenvectors, are relatively fast—and therefore applications which use this information may have substantial advantages over event-driven stochastic simulations. However, other calculations, such as finding exponentials and null spaces, are associated with significant computational overheads and therefore repeated event-driven simulations may be preferable.

To provide a relative comparison with the times in table 1, we shall consider what can be achieved with a similarly parametrized event-driven stochastic model in one second. Both SIS- and SIR-type models can perform approximately 2.7 million events per second using Gillespie's direct algorithm (Gillespie 1976)—the comparable speed of both SIS and SIR models is because the rate-determining step is the calculation of two random numbers per event. For the SIS model, illustrated in figure 1, we find that in the time it takes to calculate the matrix, dominant eigenvalue and associated eigenvector, the equilibrium distribution can be calculated with an error of approximately 1% for both *N*=100 and 1000. Hence, for this problem, the Kolmogorov forward equations have substantial benefits. However, for the simple SIR model without births or deaths (figure 3), Gillespie's direct algorithm allows the simulation of approximately 170 complete epidemics per second for a population size of *N*=10 000, which is just sufficient to calculate the mean final epidemic size to within 1%. In contrast, the null-space calculation for *N*=10 000 is computationally infeasible, so multiple stochastic simulations are the only suitable methodology.

Given the time scales and computational overheads involved with the matrix formulation of SIS and SIR models relative to event-driven simulations, we can come to the following general conclusions. Problems that involve large populations (and many possible states) are best tackled using event-driven stochastic simulations; this is because although simulation time generally increases proportional to the population size, the computational time associated with the matrix operations increases proportional to the number of states (or even faster). The advantages of the matrix-based approach are most evident when answers (or distributions) are required to high precision and multiple sets of initial conditions need to be considered. For example, the calculation of the null space for the simple SIR model is a computationally intensive process, but once found the effect of different initial conditions can be calculated using a single matrix–vector multiplication; in contrast, with event-driven stochastic simulations, each set of initial conditions requires a new set of replicate simulations. As an illustrative example, consider a set of stochastic (event-driven) epidemics for the entire range of possible initial conditions, which take the same amount of time as the associated null-space-based calculation in table 1; for *N*=100, the stochastic simulations could calculate the mean final epidemic size to an accuracy of just 10% while for *N*=1000 the accuracy increases to approximately 2.5%. Therefore, even though calculating null spaces is computationally demanding, it can be an efficient approach if multiple initial conditions need to be considered.

There are still many situations where it is currently infeasible to consider the Kolmogorov equations. For a population of *N* individuals, where each individual can be in one of *n* states, the number of possible states of the process grows like . As such, a disease with SIR dynamics has states , whereas if the dynamics are assumed to be SEIR (*susceptible*–*exposed–infectious–recovered*), the number of states increases to *C*=1/6((*N*+1)(*N*+2)(*N*+3)) (*N*=100⇒*N*=176851). Hence, although increasing computational power will allow us to deal with Kolmogorov's forward equations for ever larger population sizes, increasing the biological realism, and thus the number of states, can readily overtake any gains in processing speed.

## 6. Discussion

Understanding the dynamics of stochastic populations, and how they deviate from the deterministic ideal, is being viewed with increasing importance by ecologists and epidemiologists. In particular, the stochastic behaviour of diseases in small (well-mixed) populations is vital for a better understanding of control. For such situations, where the number of possible states is not prohibitively large, ensemble equations provide a very powerful and efficient alternative to replicate stochastic simulations.

In this paper, it has been shown that the natural matrix formulation (due to the linear nature of the ensemble equation) allows us to specify the dynamics in a concise form, which in turn allows a wealth of sophisticated computational approaches to be used. Four considerable benefits are gained from this approach. First, only a single calculation needs to be performed to describe the dynamics of an infinite ensemble of stochastic realizations. Second, and as a corollary to the first point, the results of the ensemble equations are exact—so the probability of rare events (which could have a large impact) can be calculated precisely. Third, once the initial calculation is performed, considering a range of initial conditions is generally highly efficient. Finally, the fact that the ensemble equations are deterministic means that a wide variety of tools from dynamical systems can be used; for example, it is possible to assess the rate of convergence to an equilibrium distribution.

The use of the Kolmogorov forward equation clearly has many benefits over more commonly used event-driven stochastic simulations; although we will never be able to completely replace the need for simulation in applied modelling. At the moment, Kolmogorov's forward equations are best used on small populations (*N*<1000) with SIR- or SIS-type dynamics, which complements the need for precise study of small populations where the effects of stochasticity are the greatest. We therefore expect this approach to have maximum benefit when examining the spread of infection within families, farms and hospitals and when needing to obtain precise estimates of rare events.

## Acknowledgments

This research was supported by the Royal Society and Leverhulme Trust. We thank Mike Tildesley, Jon Read and David Sirl for their helpful comments.

## Footnotes

- Received May 18, 2007.
- Accepted June 18, 2007.

- © 2007 The Royal Society