## Abstract

Heterogeneity in host contact patterns profoundly shapes population-level disease dynamics. Many epidemiological models make simplifying assumptions about the patterns of disease-causing interactions among hosts. In particular, homogeneous-mixing models assume that all hosts have identical rates of disease-causing contacts. In recent years, several network-based approaches have been developed to explicitly model heterogeneity in host contact patterns. Here, we use a network perspective to quantify the extent to which real populations depart from the homogeneous-mixing assumption, in terms of both the underlying network structure and the resulting epidemiological dynamics. We find that human contact patterns are indeed more heterogeneous than assumed by homogeneous-mixing models, but are not as variable as some have speculated. We then evaluate a variety of methodologies for incorporating contact heterogeneity, including network-based models and several modifications to the simple SIR compartmental model. We conclude that the homogeneous-mixing compartmental model is appropriate when host populations are nearly homogeneous, and can be modified effectively for a few classes of non-homogeneous networks. In general, however, network models are more intuitive and accurate for predicting disease spread through heterogeneous host populations.

## 1. Introduction

Epidemics caused by the transmission of infectious agents are marked by variation. Heterogeneities in pathogens, host populations and the interactions between them profoundly affect the dynamics of infection. For example, in 1984, in a study of the spread of gonorrhoea in the United States, Hethcote and Yorke showed that 60% of all infections were caused by a small group of individuals making up only 2% of the entire population (Hethcote & Yorke 1984). In 1999, a cluster of five cases of measles in a small elementary school in The Netherlands led to an outbreak of approximately 3000 cases, despite the fact that 96% of the population of The Netherlands was vaccinated against measles at the time (Centers for Disease Control and Prevention 2000). In the spring of 2003, two individuals travelling from Vancouver and Toronto, respectively, were infected almost simultaneously with SARS by a source in Hong Kong. There were no secondary cases reported from the individual from Vancouver; in contrast, the infected individual from Toronto went on to infect five other people, which led to an outbreak of 200 cases in Toronto (Poutanen *et al*. 2003).

There are several epidemiologically important sources of variability, including disease-independent host parameters (age, sex, contact rate and compliance with public health recommendations) and disease-dependent host parameters (susceptibility to disease, transmission rate, mode of transmission and recovery rate). In this article, we will focus on disease-independent heterogeneity in host contact rates. That is, the number of potentially disease-causing interactions can vary widely across a host population. Although this is only one aspect of host heterogeneity, it is an important one. Variability in contact patterns can stem from social structure, age, sex, spatial structure and behavioural differences. This heterogeneity is ubiquitous at many scales and can cause variability in disease parameters such as infectivity and susceptibility (Hethcote & Yorke 1984; Addy *et al*. 1991). Such individual-level diversity can profoundly shape population-level disease dynamics.

The simplest of the traditional mathematical models for the spread of infectious disease typically does not take such diversity into account, but assumes that communities are homogeneous; that is, they are made up of individuals who mix uniformly and randomly with each other. This simplifying assumption makes the analysis tractable but may not adequately reflect reality. Despite this assumption, these *homogeneous-mixing compartmental models* have proved to be robust and predictive (Anderson & May 1992; Mollison *et al*. 1993). Although we will focus on the simplest compartmental model, we note that this approach has been successfully extended to capture large-scale host heterogeneities. These extensions of the simple compartmental framework have included age-specific contact patterns and heterogeneities induced by spatial structure (Ball *et al*. 1997; Bjørnstad *et al*. 2002; Grenfell *et al*. 2002), but they do not allow for individual-level resolution.

After nearly a century of successes with these models, mathematical epidemiologists have turned their attention to individual-based approaches and specifically to network modelling. These new approaches, spurred by the availability of data and the maturation of network theory, reject the homogeneous-mixing assumption and explicitly capture the diverse patterns of interaction that underlie disease transmission (Barbour & Mollison 1990; Watts & Strogatz 1998; Pastor-Satorras & Vespignani 2001; Newman 2002; Meyers *et al*. 2005; Shirley & Rushton 2005).

Here, we reconcile the historical successes of homogeneous-mixing compartmental models with the advantages of new individual-based approaches. We start with a brief introduction to the contact network framework and then translate the homogeneous-mixing assumption into network terminology. Using several real and simulated datasets of human contact networks, we show that the homogeneous-mixing assumption may give rough but reasonable approximations for many populations. We then characterize the ‘epidemiological distance’ between these empirical networks and the homogeneous-mixing assumption of the simple model. Finally, we review and evaluate a variety of methodologies for incorporating contact heterogeneity, including several modifications to the simplest compartmental framework. We emphasize that many of the new network-based approaches are as mathematically tractable as the simplest compartmental models; although the compartmental framework can be extended to consider many complexities, the elegance with which the network-based approaches incorporate heterogeneity makes them an attractive option for future epidemiological studies.

## 2. An introduction to contact networks

Many diseases spread through human populations via close physical interactions. The interpersonal contact patterns that underlie disease transmission can naturally be thought to form a network, where links join individuals who interact with each other. During an outbreak, disease then spreads along these links. All epidemiological models make assumptions about the underlying network of interactions, often without explicitly stating them. Contact network models, however, mathematically formalize this intuitive concept so that epidemiological calculations can explicitly consider complex patterns of interactions.

Formally, a contact (or social) network model explicitly represents host interactions that mediate disease spread. A *node* in a contact network represents an individual host, and an *edge* between two nodes represents an interaction that may allow disease transmission. A node's *degree* is the number of edges attached to it (i.e. its number of contacts) and the *degree distribution* of a network is the frequency distribution of degrees throughout the entire population.
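As a small illustration of these definitions, the sketch below computes node degrees and the degree distribution from an undirected edge list. The five-person population and the function name `degree_distribution` are hypothetical, and nodes without any edge are simply not counted in this simplified version.

```python
from collections import Counter

def degree_distribution(edges):
    """Return {degree: fraction of nodes} for an undirected edge list."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1          # each edge adds one contact to both ends
        degree[v] += 1
    counts = Counter(degree.values())
    n = len(degree)             # nodes that appear in at least one edge
    return {k: c / n for k, c in sorted(counts.items())}

# Hypothetical population: node 0 is in contact with everyone else.
edges = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 2)]
dist = degree_distribution(edges)
```

Here two hosts have one contact, two have two contacts and one has four, so the distribution places most of its weight on low degrees with a single well-connected host.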

An exact contact network model requires knowledge of every individual in a population and every disease-causing contact between individuals (e.g. sneezing in the case of airborne diseases or sexual contact in the case of sexually transmitted diseases). For even small populations, this is typically unfeasible, and thus researchers typically work with approximate networks. There are several techniques for gathering the information needed to build realistic contact network models. They include tracing all infected individuals and their contacts during or following an outbreak (e.g. Klovdahl *et al*. 1977), surveying individuals in populations (e.g. Eubank *et al*. 2004) and using census (e.g. Meyers *et al*. 2005), social characteristic (e.g. Halloran *et al*. 2002) or other collected data (e.g. Meyers *et al*. 2003).

Characterizing network structure has become a multidisciplinary cottage industry, with researchers across epidemiology, sociology, biology, computer science and physics searching volumes of data for meaningful patterns. Researchers often look for global statistical properties in network data, and have paid special attention to small-world networks—characterized by high levels of both local clustering and global connectivity (Watts & Strogatz 1998)—and scale-free networks—characterized by degree distributions that follow a power-law distribution with a small fraction of very highly connected hubs (Barabási & Albert 1999). Scale-free networks have been reported in many technological (e.g. the Internet, the World Wide Web) and biological systems (e.g. metabolic, protein interactions, transcription regulation and protein domain; Albert *et al*. 1999; Faloutsos *et al*. 1999; Wuchty 2001; Giot *et al*. 2003; Bork *et al*. 2004; Hatzimanikatis *et al*. 2004; Luscombe *et al*. 2004). These highly structured networks are often contrasted with three classes of ‘null’ networks: (i) *lattices* in which all nodes have the same degree, and any given node is connected to physically proximate nodes, (ii) *regular random networks* in which all nodes have the same degree, but any given node is connected to randomly chosen nodes throughout the network, and (iii) *Poisson random networks* (also called Erdos–Renyi random graphs) in which some specified total number of edges are assigned to nodes completely at random, thus yielding a Poisson degree distribution across the network.

The structures of human contact networks undoubtedly play a crucial role in the transmission of diseases (Pastor-Satorras & Vespignani 2001; Newman 2002; Barthélemy *et al*. 2004, 2005; Meyers *et al*. 2005; Ferrari *et al*. 2006). For example, epidemiologists have long realized that the epidemic threshold, the critical value for the infection rate above which a disease may spread and persist, decreases as the standard deviation of the degree distribution increases (Hethcote & Yorke 1984; Anderson & May 1992). The obvious question is thus: what are the structures of real-world contact networks?
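One way to make the dependence on degree variance concrete is the percolation threshold for random networks with a given degree distribution (Newman 2002), in which the critical transmissibility is the mean degree divided by the difference between the mean squared degree and the mean degree. This is offered as an illustrative assumption: it is not the formulation used by Hethcote & Yorke or Anderson & May, and the function name is hypothetical.

```python
def critical_transmissibility(mean_k, var_k):
    """Percolation threshold T_c = <k> / (<k^2> - <k>) for a random
    network, written in terms of the mean and variance of degree."""
    second_moment = var_k + mean_k ** 2   # <k^2> = variance + mean^2
    return mean_k / (second_moment - mean_k)

# Same mean degree (10), increasing variance in degree.
t_regular = critical_transmissibility(10, 0)      # all nodes identical
t_poisson = critical_transmissibility(10, 10)     # variance equals mean
t_cv_one = critical_transmissibility(10, 100)     # std. dev. equals mean
```

With the mean held at 10, the threshold falls from about 0.11 to 0.10 to roughly 0.05 as the variance grows, which is the qualitative pattern described above.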

The field has focused particularly on scale-free random networks (May & Lloyd 2001; Dezso & Barabasi 2002; Pastor-Satorras & Vespignani 2002), based on the apparent ubiquity of such networks in natural and human-made systems and a (limited) set of studies of epidemiologically relevant contact patterns (Liljeros *et al*. 2001). These networks are characterized by the presence of hosts with anomalously high numbers of potential disease-causing contacts, called super-spreaders, which have important epidemiological implications (Shen *et al*. 2004; Lloyd-Smith *et al*. 2005). With large or infinite variance in degree, scale-free networks can have exceedingly low or non-existent epidemic thresholds; this means that even the sparsest networks are highly vulnerable to epidemics. Despite recent popularity, however, it is not clear that realistic epidemiological networks are generally scale-free, and thus whether they deserve so much attention in epidemiology. Here, we address this by characterizing the structures of several real-world networks.

We limit our discussion to random networks with arbitrary degree distributions, including random networks with regular, Poisson, exponential and scale-free degree distributions (figure 1). These classes of networks have been well studied with respect to the spread of epidemics and are representatives of a spectrum of network structures. We focus exclusively on the epidemiological impact of the degree distribution, although other network characteristics such as clustering (Watts & Strogatz 1998; Keeling 1999; Moore & Newman 2000; Petermann & Rios 2004) and degree correlations (Boguna *et al*. 2003) are also important.

We will also focus exclusively on static networks, i.e. networks in which contacts are assumed to be fixed during the infectious period of an individual. The permanence of contacts captured by static networks offers a more realistic model of human contact behaviour than that by traditional epidemiological models. However, a recent study (Volz & Meyers submitted) suggests that static networks may only be an approximate model for diseases that spread slowly relative to the rate at which individuals change the numbers and identities of their contacts.

## 3. A network interpretation of homogeneous-mixing models

Traditional epidemiological models implicitly assume that contact patterns are highly homogeneous. The most basic model in epidemiology is the SIR compartmental model for the ‘simple’ epidemic without demographic processes of birth and death (Kermack & McKendrick 1991; Anderson & May 1992). This model describes the dynamics of a single epidemic, during the course of which each individual is in one of three disjoint states (or compartments) at any given time: not yet infected and susceptible to disease (*S*); infected and infectious (*I*); or recovered and neither able to spread disease nor be reinfected (*R*). The number of individuals in each of these classes is described by the following set of differential equations:

$$\frac{dS}{dt} = -\lambda S, \qquad \frac{dI}{dt} = \lambda S - \gamma I, \qquad \frac{dR}{dt} = \gamma I, \tag{3.1}$$

where *λ* is the rate at which susceptible individuals become infected (i.e. the *force of infection*) and *γ* is the recovery rate for infected individuals. (The constant recovery rate, *γ*, yields an exponential distribution of infectious periods.)

The parameter *λ* can be thought of as a composite of three factors: $\lambda = \alpha\,(I/N)\,\tau$, where *α* is the number of individuals with which a susceptible individual has effective contact; $I/N$ is the proportion of contacts that are infectious (with *N* the total population size); and *τ* is the per contact rate at which disease is transmitted between an infectious and susceptible individual (Keeling & Eames 2005). This model assumes homogeneous mixing among individuals. Taken literally, the assumption means that at any point in time, every susceptible individual has an equal probability of contacting every other individual in the population. Taken more loosely, it is sufficient for all susceptible individuals to have comparable contact patterns (*α*) and encounter infected individuals at a rate corresponding to the overall prevalence $I/N$.
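The model can be explored numerically with a few lines of code. The sketch below assumes the standard SIR form, in which susceptibles are lost at rate *λS* and infecteds recover at rate *γI*, with a force of infection equal to the product of *α*, *τ* and the prevalence *I*/*N*. The forward-Euler scheme, the function name and all parameter values are illustrative choices rather than anything specified in the text.

```python
def simulate_sir(alpha, tau, gamma, n, i0, dt=0.01, t_max=100.0):
    """Forward-Euler integration of the homogeneous-mixing SIR model.
    Returns a list of (time, S, I, R) tuples."""
    s, i, r = float(n - i0), float(i0), 0.0
    trajectory = [(0.0, s, i, r)]
    steps = int(round(t_max / dt))
    for step in range(1, steps + 1):
        lam = alpha * tau * i / n        # force of infection
        new_inf = lam * s * dt           # S -> I during this step
        new_rec = gamma * i * dt         # I -> R during this step
        s, i, r = s - new_inf, i + new_inf - new_rec, r + new_rec
        trajectory.append((step * dt, s, i, r))
    return trajectory

# Hypothetical parameters: 10 contacts, tau = 0.05, mean infectious period 5.
traj = simulate_sir(alpha=10, tau=0.05, gamma=0.2, n=10_000, i0=10)
final_size = traj[-1][3] / 10_000        # fraction recovered at t_max
```

With these values the ratio of *ατ* to *γ* is 2.5, and the epidemic has essentially run its course by the end of the simulated period.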

We suggest that the standard equations given above can be approximately mapped to a network model in which all individuals have identical numbers of contacts, that is, a regular random network. Consider a regular random network with homogeneous degree *k*. Suppose that disease spreads from infected nodes to susceptible contacts with a rate of *τ*′. Because contacts are random, we can assume that the fraction of contacts that are infected is equal to the fraction of infected individuals in the network as a whole, or $I/N$. Then, the force of infection on any given susceptible individual will be $\tau' k\,I/N$, which is equal to that given by the homogeneous-mixing compartmental model with *α*=*k* and *τ*=*τ*′. This mapping is only approximate because the two models view effective contact differently. In compartmental models, the identity of a contact is determined randomly and instantaneously for each transmission event; in static network models, on the other hand, the identities of contacts are chosen randomly but remain fixed for the length of an individual's infectious period. The models thus yield different values for the basic reproductive ratio and expected equilibrium values (Keeling & Grenfell 2000).

Some authors have argued that the homogeneous-mixing assumption translates into a network where every individual is connected to every other individual in the population (a complete network; Aparicio & Pascual 2007; Roy & Pascual 2006). In fact, a complete network is just a regular random network in which each individual contacts all *N*−1 other individuals, and thus is one of many valid (regular) network interpretations of the equations above. The force of infection in a complete network is $\tau'(N-1)\,I/N$, which is approximately $\tau' I$ for large *N*. For most populations, however, it is unrealistic to imagine that individuals are in constant contact with all other individuals. Thus, it is more realistic to interpret the edges in the complete network as ‘possible’ contacts and the transmission term *τ*′ as a combined effective contact and transmission rate.

Figure 2*a*,*b* demonstrates that, indeed, the dynamics predicted by the homogeneous-mixing compartmental model are a close approximation to stochastic simulations of disease transmission through a regular random network. The predicted final size of an outbreak (figure 2*a*) is shown as a function of *T*, the per contact probability of transmission integrated over the entire infectious period of an infected individual. We generated these and all other random networks described in §5 using the configuration model (Molloy & Reed 1995). The stochastic epidemiological simulations are based on a discrete-time, chain binomial, SIR model (Bailey 1957). (See appendix A in the electronic supplementary material for further details.)
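To make the simulation set-up concrete, the sketch below pairs a simple configuration-model generator with a discrete-time chain-binomial SIR process. It is not the implementation described in appendix A: the handling of self-loops and repeated edges, the order of transmission and recovery within a time step, and the per-step probabilities `tau` and `gamma` are all assumptions made for illustration.

```python
import random

def configuration_model(degrees, rng):
    """Pair up edge stubs uniformly at random; returns neighbour sets."""
    stubs = [node for node, k in enumerate(degrees) for _ in range(k)]
    rng.shuffle(stubs)
    neighbors = [set() for _ in degrees]
    for u, v in zip(stubs[::2], stubs[1::2]):
        if u != v:                       # drop self-loops; sets drop repeats
            neighbors[u].add(v)
            neighbors[v].add(u)
    return neighbors

def chain_binomial_sir(neighbors, tau, gamma, seed_node, rng):
    """Each step, every infectious node infects each susceptible neighbour
    with probability tau and then recovers with probability gamma.
    Returns the number of nodes ever infected."""
    state = ["S"] * len(neighbors)
    state[seed_node] = "I"
    infectious = {seed_node}
    while infectious:
        newly_infected = set()
        for u in infectious:
            for v in neighbors[u]:
                if state[v] == "S" and rng.random() < tau:
                    newly_infected.add(v)
        recovered = {u for u in infectious if rng.random() < gamma}
        for v in newly_infected:
            state[v] = "I"
        for u in recovered:
            state[u] = "R"
        infectious = (infectious - recovered) | newly_infected
    return sum(s == "R" for s in state)

rng = random.Random(1)
net = configuration_model([10] * 2000, rng)      # regular random network
size = chain_binomial_sir(net, tau=0.1, gamma=0.5, seed_node=0, rng=rng)
```

Because self-loops and repeated edges are discarded, a few nodes may end up with slightly fewer than 10 contacts; for large sparse networks this effect is small.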

## 4. Heterogeneity in empirical contact networks

The homogeneous-mixing assumption is reasonable when contact patterns in populations are random and homogeneous; that is, they resemble a regular random network. The contact patterns in real populations, however, may be more heterogeneous than assumed by the simple models. Figure 2*c*–*h* illustrates that the simple models become inadequate as contact patterns become more variable. The predictions of the homogeneous-mixing model are more reasonable for a Poisson random network (figure 2*c*,*d*)—which is more variable than a regular random network, but less so than an exponential or scale-free network—than they are for networks with more variation (figure 2*e*–*h*). Here, we quantify the heterogeneity found in realistic contact networks and its epidemiological implications.

### 4.1 Statistical analysis of realistic networks

Using datasets from the literature, we characterize the variation found in real-world contact networks. In particular, we determine the shape of the degree distributions in six empirical (or semi-empirical) social networks. Network characterizations are often based on graphical methods for fitting data to theoretical distributions. This approach, however, can be flawed, especially if the data are transformed to a log–log scale (Goldstein *et al*. 2004). For each of our datasets, we evaluate four one-parameter candidate distributions (Poisson, exponential, pure power law and truncated power law) using maximum-likelihood estimation (MLE) to fit the distribution parameters. MLE avoids the problems of visual/graphical fitting methods and more accurately estimates the distribution parameters. We then use the Akaike information criterion (AIC) to select the most appropriate distribution for the data (see appendix B in the electronic supplementary material for further methodological details).
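The fitting and selection step can be sketched for two of the four candidates. The example assumes a Poisson distribution and a discrete exponential distribution supported on degrees 0, 1, 2, ..., each with one parameter fitted in closed form by maximum likelihood; the power-law candidates are omitted for brevity, and the degree sample is invented. Appendix B may differ in its details.

```python
import math

def poisson_loglik(degrees):
    """Log-likelihood at the MLE of the Poisson mean (the sample mean)."""
    lam = sum(degrees) / len(degrees)
    return sum(k * math.log(lam) - lam - math.lgamma(k + 1) for k in degrees)

def exponential_loglik(degrees):
    """Log-likelihood at the MLE for p_k = (1 - q) q^k with q = exp(-1/kappa)."""
    mean = sum(degrees) / len(degrees)
    q = mean / (1.0 + mean)              # closed-form MLE of q
    return sum(math.log(1.0 - q) + k * math.log(q) for k in degrees)

def aic(loglik, n_params=1):
    """Akaike information criterion; lower values indicate a better model."""
    return 2 * n_params - 2 * loglik

# Invented, strongly right-skewed degree sample (100 hosts).
sample = [0] * 50 + [1] * 25 + [2] * 12 + [3] * 6 + [4] * 3 + [5] * 2 + [8, 10]
aic_poisson = aic(poisson_loglik(sample))
aic_exponential = aic(exponential_loglik(sample))
best = "exponential" if aic_exponential < aic_poisson else "Poisson"
```

For this skewed sample the exponential candidate has the lower AIC, mainly because the Poisson fit assigns very little probability to the two high-degree hosts.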

Our datasets are chosen from the limited set of empirical contact networks available in the literature and include several different scales and types of contacts. The first is an urban contact network model based on demographic information for the city of Vancouver, British Columbia (Meyers *et al*. 2005). The second is another urban contact network model developed for the city of Portland, Oregon based on surveys and other demographic and geographical data (Eubank *et al*. 2004; Del Valle *et al*. 2006). The third dataset describes social ties (which may be used as a proxy for disease-causing contacts) among members of a university karate club (Zachary 1977). The final three studies provide sexual contact patterns within different adolescent populations (Rothenberg *et al*. 1998; Potterat *et al*. 2002; Bearman *et al*. 2004).

Sampling bias in data collection is an important concern in the study of empirical contact networks, and its impact has yet to be understood for epidemiological applications. Preliminary work by Stumpf *et al*. (2005) has shown that scale-free networks are especially prone to sampling errors. In fitting the datasets above to ideal distributions, we have not made any additional adjustments to correct for possible sampling biases (beyond those made in the original studies), which may obscure the true structure of the population.

Our analysis suggests that all six populations fit an exponential degree distribution best, based on the AIC (figure 3). We note that this is the best choice only among the four single-parameter distributions considered, and that other multiple-parameter distributions like gamma or beta distributions might yield even better approximations of the data. Although the exact shape of these distributions may be uncertain given the small sample sizes, our analysis suggests generally that the variability found in these realistic contact networks appears to lie somewhere between the homogeneity assumed by homogeneous-mixing models and the high heterogeneity of scale-free networks. Recently, Amaral *et al*. (2000) found similarly that adolescent friendship networks have exponentially shaped or Gaussian degree distributions.

### 4.2 Quantifying deviation from assumptions of homogeneity

Given the nearly exponential shape of the empirical contact networks we have considered, we next quantify the epidemiological distance between exponential networks and the homogeneous-mixing model (equivalently, a regular random network) using a structural approach. Starting with a regular random network, we use a greedy rewiring procedure to gradually generate an exponentially distributed network. In particular, we iteratively select a random edge in the network, and change its destination to a new node (chosen in proportion to its degree). This process reduces the degree of the original destination node and increases the degree of the new destination node (which already had high degree), and thus progressively increases the variance in the degree distribution. (Details on the rewiring algorithm can be found in appendix C in the electronic supplementary material.)

We monitor the structural evolution of the network during the rewiring process in terms of the coefficient of variation (CV) in degree—the standard deviation in degree divided by the mean degree. We have observed that when the CV reaches 1, which is the CV of an exponential degree distribution, the modified networks have approximately exponential degree distributions (appendix C in the electronic supplementary material).
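The rewiring step and the CV monitor described above can be sketched as follows. This is a simplified stand-in for the algorithm in appendix C: it starts from a ring lattice in which every node has degree 4 rather than from a regular random network, it rejects moves that would create a self-loop, and it does not check for repeated edges. All names and sizes are hypothetical.

```python
import random
import statistics

def degree_cv(degree):
    """Coefficient of variation: population standard deviation over mean."""
    return statistics.pstdev(degree) / statistics.mean(degree)

def rewire_once(edges, degree, rng):
    """Redirect one random edge to a node chosen in proportion to degree.
    Returns True if the edge was rewired, False if the move was rejected."""
    idx = rng.randrange(len(edges))
    source, old_dest = edges[idx]
    # A random end of a random edge is a degree-proportional node sample.
    new_dest = rng.choice(rng.choice(edges))
    if new_dest in (source, old_dest):
        return False                     # self-loop or no change
    edges[idx] = (source, new_dest)
    degree[old_dest] -= 1
    degree[new_dest] += 1
    return True

rng = random.Random(0)
n = 200
edges = [(i, (i + j) % n) for i in range(n) for j in (1, 2)]
degree = [4] * n
cv_start = degree_cv(degree)             # 0 for a regular network
for _ in range(2000):
    rewire_once(edges, degree, rng)
cv_end = degree_cv(degree)
```

The number of edges, and therefore the mean degree, is unchanged by rewiring, so any change in the CV reflects a change in the spread of the degree distribution alone.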

In figure 4, we show the impact of rewiring on both the structure of the network (figure 4*a*) and the resulting epidemiological dynamics (figure 4*b*). For any given network created by rewiring, we define its *structural distance* from the original regular network as the probability that a randomly selected edge has been rewired. This is estimated by $1 - (1 - 1/m)^{r}$, where *m* is the number of edges in the network and *r* is the number of rewiring events so far. We show results for networks with mean degrees varying from 〈*k*〉=8 to 16, which spans part of the range of contact patterns observed in the realistic networks discussed in §4.1. Figure 4*a* illustrates that the structural progression towards exponential networks slows as the mean degree of the network increases.
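A short calculation shows how the structural distance grows with the number of rewiring events. The sketch assumes that each event picks one of the *m* edges uniformly at random and independently of earlier events, so that the probability that a given edge has been chosen at least once after *r* events is one minus (1 − 1/*m*) raised to the power *r*. The function names and the network size are hypothetical.

```python
import math

def structural_distance(m, r):
    """Probability that a given edge has been rewired at least once
    after r uniformly random rewiring events on m edges."""
    return 1.0 - (1.0 - 1.0 / m) ** r

def rewirings_for_distance(m, d):
    """Smallest number of rewiring events giving structural distance >= d."""
    return math.ceil(math.log(1.0 - d) / math.log(1.0 - 1.0 / m))

m = 4000                                  # e.g. 1000 nodes with mean degree 8
d_after_m = structural_distance(m, m)     # close to 1 - 1/e
r_half = rewirings_for_distance(m, 0.5)   # events needed to reach 0.5
```

After as many rewiring events as there are edges, the distance is only about 0.63, because some edges are selected repeatedly while others have not yet been touched.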

We define the *epidemiological distance* to the target (exponential) network as the relative discrepancy between the expected size of an epidemic on the current network and the expected size of an epidemic on the target network, based on stochastic simulations. We use these quantities to characterize the rate at which epidemiological behaviour changes as a function of structure (contact heterogeneity). Figure 4*b* illustrates the epidemiological consequences of rewiring for a particular probability of transmission (*T*=0.25). The *y*-intercept values in figure 4*b* (structural distance=0) indicate the amount of error realized if one uses a homogeneous-mixing model to make predictions on an exponential network. For example, in a population with a mean degree of 8, the homogeneous-mixing model has a 17% error in the prediction for final size of epidemic; while for a network with a mean degree of 16 (such as the urban contact network for Portland, Oregon discussed in §4.1), the error is approximately 8%. In general, for a given probability of transmission, the discrepancy between the homogeneous-mixing model and the true epidemiology decreases as the mean degree increases. Thus, for highly connected exponential networks, the homogeneous-mixing compartmental models may offer reasonable approximations.

As rewiring progresses, the degree distribution of the network approaches an exponential distribution, and the epidemiological behaviour of the network approaches that of an exponential network, at first slowly and then rapidly. The shape of the curves in figure 4*b* indicates that essentially the entire network must be restructured before the epidemiological consequences begin to resemble those of an exponential network. In other words, epidemiological models (beyond those that make the homogeneous-mixing assumption) need to incorporate almost all of the variation of the true network in order to reduce the error in the epidemiological predictions even slightly.

## 5. Analytical approaches for incorporating heterogeneity

As the tail of a network's degree distribution grows, i.e. as the variability in the number of contacts increases, disease dynamics increasingly differ from those predicted by homogeneous-mixing models in two respects. First, early in an outbreak, the probability of a contact between a susceptible and infected individual is *higher* than expected because outbreaks are initially biased towards individuals who have high numbers of contacts and are thus epidemiologically vulnerable. Second, late in an outbreak, the reverse occurs, and disease-causing contacts are fewer than expected because the newly infected and remaining susceptible populations tend to have low numbers of contacts. Figure 5 illustrates this changing distribution of contacts in the susceptible and infectious populations over the course of an outbreak. As the disease spreads, not only does the average number of contacts per infected individual change, but also the variability in the number of contacts decreases. Consequently, the homogeneous-mixing models underestimate disease burden early in the outbreak and overestimate it towards the end, as illustrated in figure 2*d*,*f*,*h*.

### 5.1 Network and other individual-based models

Incorporating these heterogeneities into analytical models of disease spread has proved to be a critical challenge. Differential equations-based models enjoy the advantages of being tractable, yielding to sensitivity analysis and providing temporal behaviour, but they often lack attention to detail in the underlying process. Individual-based models allow individual-level heterogeneities to be taken into account and can prove to be very useful in making public health decisions, but can often be intractable. Despite the challenges, many powerful analytical approaches have been developed in recent years to make predictions of disease spread in heterogeneous populations. We review some of these approaches here.

Percolation theory methods are based on generating functions, and only require the degree distribution of the network and the average transmissibility *T* of the pathogen, i.e. the probability that an infected individual will transmit disease to a susceptible contact during his or her infectious period. These methods are very general and mathematically tractable. They provide excellent final-state predictions, but do not predict the dynamics of an outbreak (Moore & Newman 2000; Newman 2002; Meyers *et al*. 2003, 2005, 2006). Pair approximation methods (Keeling 1999) are based on a differential equation model of counting the number of pairs of individuals in each disease class. These methods are particularly useful for networks with clustering or spatial networks. There are many useful extensions of pair approximation methods for networks with various structures (e.g. Eames & Keeling 2002), but we consider only the simplest of these models in the analysis below. Moreno *et al*. (2002) and Pastor-Satorras & Vespignani (2002) have developed a dynamical framework based on a system of degree-based differential equations. These models track numbers of individuals in each disease state for each degree class (there are 3*k* differential equations in the system for an SIR model, where *k* is the number of unique degree classes). Finally, Volz (in press) has developed a powerful probability generating function (PGF)-based differential equation model that accurately tracks global epidemiological dynamics in complex random networks (we give details of each of these approaches in appendix D in the electronic supplementary material).
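As an illustration of the percolation approach, the sketch below follows the generating-function formulation of Newman (2002): it solves the self-consistency equation for *u*, the probability that following a random edge does not lead to a large outbreak, by fixed-point iteration, and then evaluates the expected epidemic size. The function name, the iteration settings and the example network are assumptions.

```python
def epidemic_size(pk, T, iterations=10_000, tol=1e-12):
    """Expected fraction infected in a large epidemic on a random network.
    pk maps degree -> probability; T is the transmissibility."""
    mean_k = sum(k * p for k, p in pk.items())

    def g0(x):                            # PGF of the degree distribution
        return sum(p * x ** k for k, p in pk.items())

    def g1(x):                            # PGF of the excess degree
        return sum(k * p * x ** (k - 1) for k, p in pk.items() if k > 0) / mean_k

    u = 0.0                               # iterate u = G1(1 - T + T u)
    for _ in range(iterations):
        u_next = g1(1.0 - T + T * u)
        if abs(u_next - u) < tol:
            break
        u = u_next
    return 1.0 - g0(1.0 - T + T * u)

# Regular random network of degree 3 (threshold transmissibility is 0.5).
size_above = epidemic_size({3: 1.0}, T=0.75)
size_below = epidemic_size({3: 1.0}, T=0.4)
```

For the degree-3 example with *T*=0.75 the equations can be solved by hand and give an epidemic size of 26/27, which the iteration reproduces; below the threshold the predicted size is zero.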

In figure 2, we compare the predictions of these four network-based methods for four different types of random networks: regular; Poisson; exponential; and scale-free. The networks are generated using the configuration model (Molloy & Reed 1995) and the parameters for the degree distributions are chosen so that each network has an identical average degree of 10. In particular, let *p*_{k} be the probability that a randomly chosen node has degree *k*. All nodes in the regular network have degree *k*=10; the Poisson degree distribution is given by $p_k = e^{-\theta}\theta^{k}/k!$, where *θ*=10; the exponential degree distribution is given by $p_k \propto e^{-k/\theta}$, where *θ*=9.49; and the scale-free degree distribution is given by $p_k = k^{-\theta}/\zeta(\theta)$, where *θ*=1.875 and *ζ*(*θ*) is the Riemann zeta function. We thereby hold network density constant and investigate only the effects of contact heterogeneity on disease dynamics.
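The parameter choices can be checked numerically. The sketch below assumes the usual Poisson form and a discrete exponential distribution normalised over degrees *k* ≥ 1 (the support is an assumption, chosen because it reproduces a mean of 10 for *θ*=9.49). The scale-free case is left out because, for an exponent below 2, the mean of an untruncated power law diverges, so its value in a finite network depends on the maximum degree.

```python
import math

def poisson_pmf(k, theta):
    """Poisson probability of degree k with mean theta."""
    return math.exp(-theta) * theta ** k / math.factorial(k)

def exponential_pmf(k, theta):
    """Discrete exponential on k >= 1 (support is an assumption)."""
    return (math.exp(1.0 / theta) - 1.0) * math.exp(-k / theta)

# Truncate the sums where the remaining tail is negligible.
poisson_mean = sum(k * poisson_pmf(k, 10.0) for k in range(0, 100))
exponential_total = sum(exponential_pmf(k, 9.49) for k in range(1, 2000))
exponential_mean = sum(k * exponential_pmf(k, 9.49) for k in range(1, 2000))
```

Under these assumptions both distributions have a mean degree of 10 to within rounding of *θ*, so differences in epidemic behaviour between the networks can be attributed to the shape of the degree distribution.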

The predictions from these models are compared with stochastic simulations of disease transmission on the same networks in terms of both the total burden of morbidity in the population (figure 2*a*,*c*,*e*,*g*, dotted black line) and the disease incidence dynamics (figure 2*b*,*d*,*f*,*h*, grey lines). Accurate predictions of these quantities are critical to efficient and timely implementation of public health measures. Generally, the homogeneous-mixing model does not perform as well as the other methods on complex random graphs. The percolation, heterogeneous-mixing and the dynamical PGF-based methods generally perform well, but may require more information (namely the degree distribution) than the other methods. Recall, however, that the realistic networks described above were reasonably approximated by single-parameter exponential degree distributions. Whenever a network closely resembles a random network with an idealized, low-parameter degree distribution, these methods do not require many parameters, if any more than the corresponding compartmental models.

### 5.2 Modified compartmental models

We have argued that the homogeneous-mixing compartmental model essentially assumes that contact patterns within a population form a regular random network, and shown that real-world contact patterns often exhibit more heterogeneity. The homogeneous-mixing compartmental model fails to make accurate predictions for such networks because it does not account for the evolving structure of the population (figure 5). In particular, when there is significant variability in contact rates, contacts between infected and susceptible individuals will initially be more frequent than predicted by homogeneous-mixing models.

Although there are several powerful network-based analytical methods for predicting disease transmission on complex networks, it would still be desirable to develop a method that works within the simple compartmental framework (Levin & Durrett 1996; Diekmann *et al*. 1998). If successful, this would help bridge the conceptual gaps between the new and old methods and offer an appealing alternative to researchers already comfortable with this methodology. Several epidemiologists have taken steps in this direction (Severo 1969; Liu *et al*. 1987; Heathcote & Nicholls 1990; Hochberg 1991; Keeling 2005; Aparicio & Pascual 2007; Stroud *et al*. 2006; Roy & Pascual 2006).

The most recent effort has been the most direct one: Aparicio & Pascual (2007) have modified the homogeneous-mixing compartmental model with a network-based approximation of the reproductive ratio, *R*_{0}, and have demonstrated the success of the approach on Poisson and small-world random networks. Although effective for these particular networks, the modification does not capture the evolution of contact patterns that occurs over the course of an outbreak. Figure 6 compares this approach with the homogeneous-mixing compartmental model. Aparicio and Pascual do not actually discuss adaptations of their model to exponential or scale-free networks. To evaluate its performance on such networks, we have inserted analytical estimates of *R*_{0}, based on the specific structures of these networks (Meyers *et al*. 2005). That is,

$$R_0 = \frac{\tau}{\tau+\gamma}\left(\frac{\langle k^2\rangle-\langle k\rangle}{\langle k\rangle}\right),$$

where *τ* is the average probability of infection per contact and time step; *γ* is the recovery probability; and 〈*k*〉 and 〈*k*^{2}〉 are the average degree and average squared degree in the networks, respectively. As seen in figure 6*c*–*f*, the modification based on the expected number of secondary cases early in the outbreak yields gross overestimates in its epidemiological predictions.
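
To make the dependence of *R*_{0} on contact heterogeneity concrete, the following minimal sketch evaluates a degree-based estimate for two hypothetical degree sequences with the same average degree. It assumes the estimate takes the form *T*(〈*k*^{2}〉 − 〈*k*〉)/〈*k*〉 with transmissibility *T* = *τ*/(*τ* + *γ*); the function name `network_r0`, the degree sequences and the parameter values are illustrative choices, not taken from the analysis above.

```python
def network_r0(degrees, tau, gamma):
    """Degree-based R_0 estimate for a random network (assumed form).

    degrees: list of node degrees (hypothetical input)
    tau:     per-contact infection probability per time step
    gamma:   recovery probability per time step
    """
    n = len(degrees)
    mean_k = sum(degrees) / n                    # <k>
    mean_k2 = sum(k * k for k in degrees) / n    # <k^2>
    transmissibility = tau / (tau + gamma)       # assumption of this sketch
    return transmissibility * (mean_k2 - mean_k) / mean_k


# Two illustrative degree sequences, both with average degree 10.
regular = [10] * 1000                 # every node has exactly 10 contacts
mixed = [2] * 800 + [42] * 200        # many low-degree, few high-degree nodes

r0_regular = network_r0(regular, tau=0.1, gamma=0.1)   # 4.5
r0_mixed = network_r0(mixed, tau=0.1, gamma=0.1)       # 17.3
```

Although both sequences share 〈*k*〉 = 10, the larger second moment of the mixed sequence raises the estimate substantially, which is consistent with the overestimates seen when an early-outbreak *R*_{0} is held fixed for the whole epidemic.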

A different class of approaches to bridge the gap between homogeneous and individual-based models uses epidemic data. Keeling (2005) suggests modifying the transmission term to be a function of time, *β*(*t*), and fitting this time-dependent term to case-reporting data. Keeling has shown that this approach works well, given sufficient epidemiological data. Using a model similar to those introduced by Severo (1969), Liu *et al*. (1987) and Hochberg (1991), Roy & Pascual (2006) suggest modifying the infection term of the homogeneous-mixing compartmental model to be *τ**k**S*^{p}*I*^{q}, where the ‘heterogeneity parameters’ *k*, *p* and *q* are estimated via least-squares fitting to simulation data. They demonstrate the success of this approach on small-world networks. Similarly, Stroud *et al*. (2006) modify the term representing the proportion of susceptibles in the homogeneous-mixing model by raising it to an empirical exponent, *S*^{v}, where *v* is estimated from simulation data. They successfully demonstrate this approach on several urban contact networks (Stroud *et al*. 2006). We choose to evaluate the Stroud model as a representative of this class of modifications (figure 6).
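
The effect of an empirical exponent on the susceptible term can be illustrated with a short numerical sketch. It assumes a continuous-time SIR model in population fractions with infection term *β**S*^{v}*I*, integrated by a simple Euler scheme; the function name `sir_with_exponent`, the step size and all parameter values are hypothetical, and *v* is set by hand here rather than estimated from simulation data.

```python
def sir_with_exponent(beta, gamma, nu, i0=0.001, dt=0.01, steps=20000):
    """Euler integration of an SIR model with infection term beta * s**nu * i.

    s, i, r are population fractions; nu = 1 recovers homogeneous mixing.
    Returns the final (s, i, r) after `steps` time steps of length `dt`.
    """
    s, i, r = 1.0 - i0, i0, 0.0
    for _ in range(steps):
        new_infections = beta * (s ** nu) * i * dt
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
    return s, i, r


# Illustrative parameters only: beta / gamma = 2.5.
s1, i1, r1 = sir_with_exponent(beta=0.5, gamma=0.2, nu=1.0)  # homogeneous mixing
s2, i2, r2 = sir_with_exponent(beta=0.5, gamma=0.2, nu=2.0)  # exponent on S
```

With *v* > 1 the infection term falls off faster as susceptibles are depleted, so the final epidemic size `r2` is smaller than `r1`. In practice *v* would be fitted to simulation or case data rather than chosen in advance.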

Figure 6 shows that neither of the tested modifications succeeds in capturing the disease dynamics across all network classes and probabilities of transmission. These efforts to account for contact heterogeneity seem somewhat ad hoc, and not justified from ‘first principles’. A flexible modification to the simple framework must be able to capture the evolving distributions of contacts illustrated in figure 5. During the course of an epidemic, the epidemiologically active portion of the network (individuals who are either susceptible or infectious) shrinks and changes in contact structure. The nature of this change fundamentally depends on the initial network structure. The homogeneous-mixing compartmental model takes into account the changing *densities* of the epidemiologically active individuals in the population (changing values of *S* and *ι*_{t}), but not the changing *distribution of contacts* among epidemiologically active individuals (constant *α*).

Looking closer at figure 5, we observe that, in heterogeneous networks, the average degree of infected nodes is initially significantly higher than the average degree in the network as a whole (all four networks have an average degree of 10). This is because high-degree nodes are more likely to be infected than low-degree nodes. This value decreases as the epidemic progresses and high-degree nodes recover from disease and thereby are removed from the epidemiologically active portion of the network. The average degree among susceptible nodes likewise decreases owing to the removal of high-degree nodes. This figure also sheds light on the mapping of the homogeneous-mixing compartmental model onto the regular random network model as discussed in §3. The contact structure of the regular random graph cannot change since all nodes have identical numbers of contacts.
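
The elevated degree of early infections can be illustrated with a standard property of random networks with a given degree sequence and no degree correlations: a node reached by following a randomly chosen edge has degree *k* with probability proportional to *k* times the fraction of nodes with degree *k*, so its expected degree is 〈*k*^{2}〉/〈*k*〉. The sketch below compares this quantity with the plain average degree; the function names and the two degree sequences are hypothetical illustrations, not the networks of figure 5.

```python
def mean_degree(degrees):
    """Average degree <k> over all nodes."""
    return sum(degrees) / len(degrees)


def mean_degree_via_random_edge(degrees):
    """Expected degree of the node at the end of a randomly chosen edge,
    <k^2> / <k>, assuming an uncorrelated random network."""
    return sum(k * k for k in degrees) / sum(degrees)


# Two illustrative degree sequences, both with average degree 10.
regular = [10] * 1000
mixed = [2] * 800 + [42] * 200

regular_edge_mean = mean_degree_via_random_edge(regular)   # 10.0
mixed_edge_mean = mean_degree_via_random_edge(mixed)       # 35.6
```

In the regular case the two averages coincide, so there is no structure for the epidemic to erode; in the heterogeneous case, nodes reached along edges (and hence early infections) have far more contacts than the typical node.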

In theory, the simple compartmental model can be modified to capture this structural evolution. We recall that the equation describing the dynamics of infected individuals in the homogeneous-mixing compartmental model is

$$\frac{dI}{dt} = \lambda S - \gamma I,$$

where $\lambda = \tau\alpha\iota_t$. Our discussion above suggests that *α*, which was originally defined as the number of contacts for a typical node (i.e. the average degree in the original network 〈*k*〉), should be modified to give the average number of contacts among *currently* susceptible nodes, as given by

$$\hat{\alpha}_t = \langle k\rangle_{S,t}.$$

We must also modify *ι*_{t} from the fraction of currently infected individuals to the fraction of all edges (contacts) that lead to infected individuals. The total number of edges leading to infected individuals at time *t* is $I_t\langle k\rangle_{I,t}$, where $\langle k\rangle_{I,t}$ is the average degree among currently infected nodes, and the number of all contacts in the network is *N*〈*k*〉, where 〈*k*〉 is the average degree in the original network. Thus, we define

$$\hat{\iota}_t = \frac{I_t\langle k\rangle_{I,t}}{N\langle k\rangle}.$$

With these modifications to the model parameters, we can redefine the force of infection as

$$\hat{\lambda} = \tau\hat{\alpha}_t\hat{\iota}_t.$$

This modification to the standard compartmental model (replacing the force of infection term *λ* with $\hat{\lambda}$) is both flexible and theoretically motivated. The exact form of $\hat{\lambda}$, however, depends on the contact structure of the population and can be found with one of two methods. One could use analytical methods borrowed from network modelling to estimate (or calculate exactly) the function $\hat{\lambda}$. This would mean, however, that an entire set of network equations would have to be embedded within the compartmental framework, defeating the purpose of a simple modification. Another option would be to simulate exactly a full epidemic on the contact network, and estimate $\hat{\lambda}$ by fitting linear or quadratic functions to the values of $\langle k\rangle_{S,t}$ and $\langle k\rangle_{I,t}$ as a function of outbreak size from the simulation data. Figure 6 shows that this option indeed performs better than existing modifications, but is cumbersome in comparison with the latest generation of network models. Thus, we do not advocate its general use, but present it simply to provide insight into the mechanisms of infection during an epidemic.
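
The verbal definitions above can be sketched in code for a single snapshot of an epidemic. The example assumes that the modified force of infection is the product of *τ*, the average degree among currently susceptible nodes, and the fraction of all edges that lead to currently infected nodes; the function names, the state labels `"S"`, `"I"`, `"R"` and the ten-node example are hypothetical.

```python
def standard_force_of_infection(degrees, states, tau):
    """tau * alpha * iota_t with alpha = <k> and iota_t = fraction infected."""
    n = len(degrees)
    alpha = sum(degrees) / n
    iota = states.count("I") / n
    return tau * alpha * iota


def modified_force_of_infection(degrees, states, tau):
    """tau * (mean degree of susceptibles) * (fraction of edges to infecteds)."""
    susceptible = [k for k, st in zip(degrees, states) if st == "S"]
    infected = [k for k, st in zip(degrees, states) if st == "I"]
    # Average number of contacts among currently susceptible nodes.
    alpha_hat = sum(susceptible) / len(susceptible) if susceptible else 0.0
    # Edges leading to infected nodes divided by all edges, N * <k>.
    iota_hat = sum(infected) / sum(degrees)
    return tau * alpha_hat * iota_hat


# Snapshot early in an outbreak: one high-degree node is infected.
degrees = [2] * 8 + [42] * 2                  # average degree 10
states = ["S"] * 8 + ["I", "S"]
lam = standard_force_of_infection(degrees, states, tau=0.1)
lam_hat = modified_force_of_infection(degrees, states, tau=0.1)

# On a regular network the two definitions coincide.
reg_degrees = [10] * 10
reg_states = ["I"] + ["S"] * 9
reg_lam = standard_force_of_infection(reg_degrees, reg_states, tau=0.1)
reg_lam_hat = modified_force_of_infection(reg_degrees, reg_states, tau=0.1)
```

In the heterogeneous snapshot the modified value exceeds the standard one because the single infected node holds a large share of all edges, whereas on the regular network the two agree. Tracking these averages through a full outbreak would require simulation or network equations, which is the source of the cumbersomeness noted above.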

## 6. Conclusion

Homogeneous-mixing models have had a long history of success (Anderson & May 1992) and continue to produce valuable results. Until recently, individual-based models have been considered computationally prohibitive and unnecessary, given the flexibility of the compartmental framework to analytically divide hosts into multiple demographic classes. With the recent development of several powerful analytical approaches to model disease spread through heterogeneous populations, however, we must ask: what, if anything, have the simplest models been missing, and which models should we use in the future?

The answer to the first question is not as bleak as one might fear. Epidemiologists have recently focused on scale-free contact networks made up of a chaste majority and a small, highly promiscuous minority. Human contact patterns indeed exhibit more heterogeneity than assumed by homogeneous-mixing models, but they do not appear generally to be scale-free. The diverse populations we considered above have exponentially distributed contact patterns (figure 3). The epidemiological behaviour of exponential networks is typically much closer to the predictions of the homogeneous-mixing models than that of scale-free networks, although this depends on the transmission rate of the pathogen (figure 2).

The answer to the second question, in some cases, is a matter of taste. For many studies, there is a single population (and thus a contact network) under consideration. If the network is close to homogeneous, the homogeneous-mixing compartmental model is a reasonable choice, although network models are equally tractable, and can be reduced to a few parameters by assuming an idealized approximation for the degree distribution (figure 2). If the population is heterogeneous, but falls into a few specific classes of networks, there may be simple modifications to the homogeneous-mixing compartmental framework that perform quite well (e.g. Aparicio & Pascual 2007), and again one has a choice of frameworks (figure 6). If, however, the network is exponential or scale-free, or perhaps is not well approximated by any of these common distributions, then neither the homogeneous-mixing model nor the modifications can adequately capture the evolving structure of the host network (figure 5).

We have not considered several classes of more complex compartmental models that incorporate various sources of heterogeneity in the host population. Through additional parameters, these may perform better than the simple models on complex host networks, or be more easily adapted to do so. For example, Hethcote and Yorke successfully modelled the spread of gonorrhoea by dividing populations into several homogeneously mixed groups based on gender and sexual activity (Hethcote & Yorke 1984). Anderson and May evaluated vaccination programmes for childhood infections using age-structured models, in which the transmission term is replaced by a ‘who acquires infection from whom’ matrix of transmission rates (Anderson & May 1985). Bjørnstad and Grenfell have developed the time-series SIR (TSIR) framework to model seasonal changes in contact patterns that influence the spread of measles (Bjørnstad *et al*. 2002; Grenfell *et al*. 2002). Ball and colleagues introduced a general patch model in which hosts mix at high rates locally and at low rates globally (Ball *et al*. 1997). Although these approaches have been highly successful, we note that several of them require an *a priori* categorization of individuals into epidemiological type classes (e.g. by age, sex or location). In general, it may be difficult to identify epidemiologically meaningful groupings, particularly for a newly emerging disease.

When contacts are heterogeneous, it thus makes sense to consider the growing toolkit of tractable network-based methods that explicitly consider individual-level contact patterns and can predict the changing structure of a contact network as disease spreads through it. These methods may appear intimidating, but are actually intuitive upon closer inspection. They enable straightforward mathematical calculations of disease transmission dynamics through non-standard populations, and allow detailed demographic predictions that can serve as vital input to public health decisions.

## Acknowledgments

The authors thank Richard Rothenberg and Sara Del Valle for sharing data; Shashank Khandelwal for algorithm implementation; Erik Volz for technical advice and stimulating discussions; and three anonymous referees for their valuable input. S.B. acknowledges the support of the NASA-Jenkins Fellowship Program and L.A.M. acknowledges grant support from the James S. McDonnell Foundation.

## Footnotes

One contribution of 20 to a Theme Issue ‘Cross-scale influences on epidemiological dynamics: from genes to ecosystems’.

- Received May 11, 2007.
- Accepted June 22, 2007.

- © 2007 The Royal Society