## Abstract

The primary tool for predicting infectious disease spread and intervention effectiveness is the mass action susceptible–infected–recovered model of Kermack & McKendrick. Its usefulness derives largely from its conceptual and mathematical simplicity; however, it incorrectly assumes that all individuals have the same contact rate and partnerships are fleeting. In this study, we introduce *edge-based compartmental modelling*, a technique eliminating these assumptions. We derive simple ordinary differential equation models capturing social heterogeneity (heterogeneous contact rates) while explicitly considering the impact of partnership duration. We introduce a graphical interpretation allowing for easy derivation and communication of the model and focus on applying the technique under different assumptions about how contact rates are distributed and how long partnerships last.

## 1. Introduction

The conceptual and mathematical simplicity of Kermack & McKendrick's [1,2] mass action susceptible–infected–recovered (SIR) model has made the model the most popular quantitative tool to study infectious disease spread for over 80 years. However, it ignores important details of the fabric of social interactions, assuming homogeneous contact rates and negligible partnership duration. Improvements are largely *ad hoc*, spanning the range between mild modifications of the model and elaborate agent-based simulations [2–7]. Increased complexity allows us to incorporate more realistic effects, but at a price. It becomes difficult to identify which variables drive disease spread or to address sensitivity to changing the underlying assumptions. In this study, we show that shifting our attention to the status of an average partner rather than an average individual yields a surprisingly simple mathematical description, expanding the universe of analytically tractable models. This allows epidemiologists to consider more realistic social interactions and test sensitivity to assumptions, improving the robustness of public health recommendations.

We motivate our approach using the standard mass action (MA) SIR model. We are interested in the susceptible *S*(*t*), infected *I*(*t*) and recovered *R*(*t*) proportions of the population as time *t* changes. Under MA assumptions, an infected individual causes new infections at rate , where is the per-infected transmission rate and *S* is the probability that the recipient is susceptible. Recovery to an immune state happens at rate *γ*. The flux of individuals from susceptible to infected to recovered is represented by a flow diagram (figure 1), making the model conceptually simple. This leads to a simple mathematical interpretation, the low-dimensional, ordinary differential equation (ODE) system

The dot denotes differentiation in time. An ODE system allows for easy prediction of details such as early growth rates, final sizes and intermediate dynamics. Using *S* + *I* + *R* = 1, we re-express this as
1.1

The product *IS* measures the proportion of partnerships that are from an infected individual to a susceptible individual.

The MA model often provides a reasonable description of epidemics; however, it has well-recognized flaws, which cause the frequency of infected to susceptible partnerships to vary from *IS*. We highlight two. It neglects both *social heterogeneity*, variation in contact rates which can be quite broad [8,9], and *partnership duration*, implicitly assuming all partnerships are infinitesimally short. Because of these omissions, model predictions can differ from reality. For example, owing to social heterogeneity, early infections tend to have more partnerships [10] and may cause more infections than ‘average’ individuals, enhancing the early spread over that predicted by the MA model [11–15]. When partnership duration is significant, an infected individual may have already infected its partners, reducing its ability to cause new infections. Because of these assumptions, the MA model predicts the same results for a sexually transmitted disease in a completely monogamous population, a population with serial monogamy, and a population with wide variation in contact levels with mean one. Intuitively, we expect these to produce dramatically different epidemics, but no existing mathematical theory allows analytical comparisons.

Over the past 25 years, attempts have been made to eliminate these assumptions without sacrificing analytical tractability. With few exceptions (notably Volz & Meyers [16]), these make an ‘all-or-nothing’ assumption about partnership duration: partnerships are fleeting and never repeated or they never change. With fleeting partnerships, social heterogeneity is introduced by adding multiple risk groups to the MA model: in extreme cases, there are arbitrarily many subgroups (the *mean field social heterogeneity* model) [2,17–20]. This model is relatively well understood and can be rigorously reduced to a handful of equations. With permanent partnerships, social heterogeneity is introduced through static networks [12–14,21,22]. Static network results typically give the final size but no dynamic information. Some attempts to predict dynamics with static networks use pair approximation techniques [23] relying on approximation of network structures. More rigorous approaches avoid these approximations, but are more difficult [24–27]. Of these, only Volz [24] yields a closed ODE system, (see also [28,29]). This model lacks an illustration like figure 1, hampering communication and further development. Finite, non-zero partnership duration is typically handled through simulation, which is often too slow to effectively study parameter space.

Although no coherent mathematical structure to study social heterogeneity and partnership duration exists, there have been studies collecting these data in real-world contexts [9,10,30,31]. Typically, the resulting measurements have been reduced to average contact rates to make the mathematics tractable. Much of the available and potentially relevant detail is therefore discarded because existing models cannot capture the detail collected.

We find that the appropriate perspective allows us to develop conceptually and mathematically simple models that incorporate social heterogeneity and (arbitrary) partnership duration. This provides a unifying framework for existing models and allows an expanded universe of models. Our goal in each case is to calculate the susceptible, infected or recovered proportions of the population, but we find that this can be answered more easily using an equivalent problem. We ask the question, ‘what is the probability that a randomly chosen *test node* *u* is susceptible, infected or recovered?’ Because *u* is chosen randomly, the probability that *u* is susceptible equals the proportion susceptible *S*(*t*), and similarly for *I* and *R*. If we know *S*(*t*), then the initial conditions and , *I* = 1 − *S* − *R* determine *I*(*t*) and *R*(*t*) as in equation (1.1).

The probability that *u* is susceptible is the probability that no partner has ever transmitted infection to *u*. The method to calculate this is the focus of this study. This probability depends on how many partners *u* has, the rate its partners change, and the probability that a random partner is infected at any given time. Because a random partner is likely to have more partners than a random individual, knowing the infected fraction of the population does not give the probability that a partner of *u* is infected. We focus on the probability that a random partner is infected rather than the probability a random individual is infected. Once we calculate this, it is straightforward to calculate that the probability *u* is susceptible. The resulting *edge-based compartmental modelling* approach significantly increases the effects we can study compared with MA models, but with only a small complexity penalty.

In this study, we consider the spread of epidemics in two general classes of networks: *actual degree networks* (based on *configuration model* networks [32–34]) and *expected degree networks* (based on *mixed Poisson* network; commonly called *Chung-Lu* networks [35–37]). In both cases, we can consider static and dynamic networks. In actual degree networks, a node is assigned *k* stubs where *k* is a random non-negative integer assigned independently for each node from some probability distribution. Edges are created by pairing stubs from different nodes. In expected degree networks, a node is assigned *κ*, where *κ* is a random non-negative real number. Edges are assigned between two nodes *u* and *v* with probability proportional to *κ*_{u}*κ*_{v}. We develop exact differential equations for the large population limit, which we compare with simulations. Detailed descriptions of the simulation techniques are given in the electronic supplementary material.

For many assumptions about population structure, the equations we derive are equivalent to those found by others. These previous equations have not been widely used. We speculate that this is because the derivation underlying the earlier equations are not for the faint of heart, and consequently the models have been conceptually difficult as well. The main result of this study is a conceptual shift, after which the derivations become straightforward, and the equations become simpler.

We summarize the populations we consider in table 1. We begin by analysing the simplest edge-based compartmental model in detail, exploring epidemic spread in a static network of known degree distribution, a configuration model (CM) network. To derive the equations, we introduce a flow diagram that leads to a simple mathematical formulation. We next consider disease spreading through dynamic actual degree networks and then static and dynamic expected degree networks. The template shown here allows us to derive a handful of ODEs for each of these populations. Unsurprisingly, the stronger our assumptions, the simpler our formulation becomes. We neglect heterogeneity within the population other than the contact rates, assume that the disease has a very simple structure and assume that the population is at statistical equilibrium prior to disease introduction.

## 2. Configuration model epidemics

We demonstrate our approach with CM networks. A CM network is static with a known degree distribution (the distribution of the number of partners). We create a CM network with *N* nodes as follows: we assign each node *u* its degree *k*_{u} with probability *P*(*k*_{u}) and give it *k*_{u} *stubs* (half-edges). Once all nodes are assigned stubs, we pair stubs randomly into edges representing partnerships. The probability a randomly selected node *u* has degree *k* is *P*(*k*). In contrast, the probability a stub of *u* connects to some stub of *v* is proportional to *k*_{v}. So the probability a randomly selected partner (or neighbour) of *u* has degree *k* is

See earlier studies [15,38] for more detail. A sample CM network is shown in figure 2.

We assume that the disease transmits from an infected node to a partner at rate *β*. If the partner is susceptible, then it becomes infected. Infected nodes recover at rate *γ*. Throughout, we assume a large population, a small initial proportion infected, a small initial proportion of stubs belonging to infected nodes and a growing outbreak. Our equations become correct once the number of infections *NI* has grown large enough to behave deterministically, whereas the proportion infected *I* is still small. Until stochastic effects are unimportant, other methods such as branching process approximations [39] (which apply in large populations with small numbers infected) may be more useful.

To calculate *S*(*t*), *I*(*t*) and *R*(*t*), we note that these are the probabilities that a random *test node* *u* is in each state. We calculate *S*(*t*) by noting that it is also the probability none of *u*'s partners has yet transmitted to *u*. We would like to treat each partner as independent, but the probability that one partner *v* has become infected is affected by whether another partner *w* of *u* has transmitted to *u* as *u* could infect *v*. Accounting for this directly requires considerable bookkeeping. A simpler approach removes the correlation by assuming that *u* causes no infections. This does not alter the state of *u*: the probabilities we calculate for *u* will be the proportion of the population in each state under the original assumption that *u* behaves as any other node, and so this yields an equivalent problem. Further discussion of this modification is given in the electronic supplementary material.

We define *θ*(*t*) to be the probability that a randomly chosen partner has not transmitted to *u*. Initially, *θ* is close to 1. For large CM networks, partners of *u* are independent. So given its degree *k*, *u* is susceptible at time *t* with probability *s*(*k*,*θ*(*t*)) = *θ*(*t*)^{k}. Thus, where
is the probability generating function [40] of the degree distribution (such functions have many useful properties, of which we use three: its derivative is , its second derivative is and *ψ*′(1) = 〈*K*〉). For many important probability distributions, *ψ* takes a simple form, which simplifies our examples. Combining with the flow diagram for *S*, *I* and *R* in figure 3, we have

To calculate the new variable *θ*, we break it into three parts: the probability that a partner *v* is susceptible at time *t*, *ϕ*_{S}; the probability that *v* is infected at time *t* but has not transmitted infection to *u*, *ϕ*_{I}; the probability that *v* has recovered by time *t* but did not transmit infection to *u*, *ϕ*_{R}. Then *θ* = *ϕ*_{S} + *ϕ*_{I} + *ϕ*_{R}. Initially, *ϕ*_{S} and *θ* are approximately 1 while *ϕ*_{I} and *ϕ*_{R} are small (they sum to *θ* − *ϕ*_{S}). The flow diagram for *ϕ*_{S}, *ϕ*_{I}, *ϕ*_{R} and 1 − *θ* (figure 3) shows the probability fluxes between these compartments. The rate an infected partner transmits to *u* is *β*; so the *ϕ*_{I} to 1 − *θ* flux is *β**ϕ*_{I}. We conclude . To find *ϕ*_{I}, we will use *ϕ*_{I} = *θ*− *ϕ*_{S} − *ϕ*_{R} and calculate *ϕ*_{S} and *ϕ*_{R} explicitly.

The rate at which an infected partner recovers is *γ*. Thus the *ϕ*_{I} to *ϕ*_{R} flux is *γ**ϕ*_{I}. This is proportional to the flux into 1 − *θ* with the constant of proportionality *γ*/*β*. As *ϕ*_{R} and 1 − *θ* both begin as approximately 0, we have *ϕ*_{R} = *γ*(1 − *θ*)/*β*. To find *ϕ*_{S}, recall a partner *v* has degree *k* with probability *P*_{n}(*k*) = *kP*(*k*)/〈*K*〉. Given *k*, *v* is susceptible with probability *θ*^{k−1} (we disallow transmission from *u*; so *k* − 1 nodes can infect *v*). A weighted average gives
Thus
Thus our equations become
2.1and
2.2

This captures substantially more population structure than the MA model with only marginally more complexity. This is the system of Miller [28] and is equivalent to that of Volz [24]. It improves on approaches of earlier studies [25,26], which require either 𝒪(*M*) or 𝒪(*M*^{2}) ODEs, where *M* is the (possibly unbounded) maximum degree. This derivation is simpler than Volz [24] because we choose variables with a conserved quantity, simplifying the bookkeeping.

The edge-based compartmental modelling approach we have introduced forms the basis of our study. Depending on the network structure, some details will change. However, we will remain as consistent as possible.

### 2.1. ℛ_{0} and final size

One of the most important parameters for an infectious disease is its basic reproductive number ℛ_{0}, the average number of infections caused by a node infected early in an epidemic. When ℛ_{0} < 1 epidemics are impossible, while when ℛ_{0} > 1 they are possible, though not guaranteed. For this model, we find that ℛ_{0} is (see the electronic supplementary material)
2.3

We want the expected final size if an epidemic occurs. We set in equation (2.1) and solve
2.4for *θ*(∞). If ℛ_{0} > 1 then this has two solutions, the larger of which is *θ* = 1 (the pre-disease equilibrium). We want the smaller solution. The total fraction of the population infected in the course of an epidemic is ℛ(∞) = 1 − *ψ*(*θ*(∞)). These calculations of ℛ_{0} and ℛ(∞) are in agreement with previous observations [11–13,41,42]. When ℛ_{0} < 1, our approach breaks down: full details are given in the electronic supplementary material.

### 2.2. Example

We consider a disease with *β* = 0.6 and *γ* = 1. Figure 4 compares simulations with solutions to our ODEs using four different CM networks, each with 5 × 10^{5} nodes and average degree 〈*K*〉 = 5, but different degree distributions. In order from latest peak to earliest peak, the networks are: every node has degree 5, the degree distribution is Poisson with mean 5, half the nodes have degree 2 and the other half degree 8 and finally a truncated powerlaw distribution in which *P*(*k*) ∝ *k*^{−ν}e^{−k/40}, where *ν* = 1.418. We see that the degree distribution significantly alters the spread, with increased heterogeneity leading to an earlier peak, but generally a smaller epidemic. Our predictions fit, while the MA model using fails.

## 3. Actual degree models

For the CM networks, each node has a specific number of stubs. Edges are created by pairing stubs, and no partner changes occur. In generalizing to dynamic ‘actual degree’ models, we assign each node a number of stubs, but allow edges to break and the freed stubs to create new edges. We consider three limits. In the first, the mean field social heterogeneity (MFSH) model, at every moment a stub is connected to a new partner. In the second, the dynamic fixed-degree (DFD) model, edges last for some time before breaking. When an edge breaks, the stubs immediately form new edges with stubs from other edges that have just broken. In the third, the dormant contact (DC) model, we assume edges break as in the DFD model, but stubs wait before finding new partners.

### 3.1. Mean field social heterogeneity

For the MFSH model, the number of stubs *k* an individual has is assigned using the probability mass function *P*(*k*). At each moment in time, all stubs are rewired to form new edges. A sample network at two times is shown in figure 5.

We analyse the MFSH model similarly. We take *θ* as the probability a stub has never transmitted infection to the test node *u* from any partner. To define *ϕ*_{S}, *ϕ*_{I} and *ϕ*_{R}, we require that the stub has not transmitted infection to *u* and additionally the current partner is susceptible, infected or recovered. As at each moment an individual chooses a new partner, the probability of connecting to a node of a given status is the proportion of all stubs belonging to nodes of that status. We must track the proportion of stubs that belong to susceptible, infected or recovered nodes *π*_{S}, *π*_{I} and *π*_{R}. Because of the rapid turnover of partners, we find that *ϕ*_{S} is the product of the probability that a stub has not transmitted *θ* with the probability it has just joined with a susceptible partner *π*_{S} so *ϕ*_{S} = *θ**π*_{S}. Similarly, *ϕ*_{I} = *θ**π*_{I} and *ϕ*_{R} = *θ**π*_{R}.

We create flow diagrams as before in figure 6. The *S*, *I* and *R* diagram is unchanged, but the diagram for the *ϕ* variables and 1 − *θ* changes. There are no *ϕ*_{S} to *ϕ*_{I} or *ϕ*_{I} to *ϕ*_{R} fluxes because of the explicit assumption that the partners associated with a single stub any two times are independent. The change in partner status is owing to ending and forming partnerships rather than infection or recovery of the partner. The flux into 1 − *θ* from *ϕ*_{I} is *β**ϕ*_{I} as before. We need a new flow diagram for *π*_{S}, *π*_{I} and *π*_{R} similar to that for *S*, *I* and *R*. Stubs belonging to infected nodes become stubs belonging to recovered nodes at rate *γ*, thus . We calculate *π*_{S} explicitly: the probability a stub belongs to a node of degree *k* is *P*_{n}(*k*), and the probability the node is susceptible is *θ*^{k}. A weighted average ∑*kP*_{n}(*k*)*θ*^{k} gives *π*_{S} = *θ**ψ*′(*θ*)/*ψ*′(1). Finally, *π*_{I} = 1 − *π*_{S} − *π*_{R}.

Combining these observations . So, *π*_{R} = −(*γ*/*β*) ln *θ* (the constant of integration is 0). We have
and . Thus
3.1and
3.2

The MFSH model has been considered previously [2,17–20], with the population stratified by degree. Setting *ζ* to be the proportion of all stubs that belong to infected nodes (equivalent to *π*_{I} above), the pre-existing system is
where *S*_{k} and *I*_{k} are the probabilities a random individual with *k* partners is susceptible or recovered. A known change of variables reduces this to a few equations equivalent to ours.

#### 3.1.1. ℛ_{0} and final size

We find
3.3consistent with previous observations [2]. The total proportion infected is *R*(∞) = 1 − *ψ*(*θ*(∞)), where
3.4Full details are given in the electronic supplementary material.

### 3.2. Dynamic fixed-degree

The DFD model interpolates between the CM and MFSH models. We assign each node's degree *k* as before and pair stubs randomly. As time progresses, edges break. The freed stubs immediately join with stubs from other edges that break, a process we refer to as ‘edge swapping’. The rate an edge breaks is *η*. A sample network at two times is shown in figure 8.

We develop flow diagrams (figure 9) as before. The *S*, *I* and *R* diagram is unchanged. We again track the probabilities *π*_{S}, *π*_{I} and *π*_{R} that a random stub belongs to a susceptible, infected or recovered node, respectively. The diagram is unchanged. The diagram for *θ* and the *ϕ* variables changes: we have fluxes from *ϕ*_{S} to *ϕ*_{I} and *ϕ*_{I} to *ϕ*_{R} representing infection or recovery of the partner as in the CM model, but we also have fluxes from *ϕ*_{S} to *ϕ*_{S}, *ϕ*_{I} or *ϕ*_{R} resulting from edge swapping. We have similar edge swapping fluxes from *ϕ*_{I} and *ϕ*_{R}. The flux into *ϕ*_{S} from edge swapping is *η**θ**π*_{S}. The flux out of *ϕ*_{S} from edge swapping is *η**ϕ*_{S}. Similar results hold for *ϕ*_{I} and *ϕ*_{R}.

Our earlier techniques to find *ϕ*_{I} break down. We solve for *ϕ*_{S} and *ϕ*_{I} using ODEs. To complete the system, we need the *ϕ*_{S} to *ϕ*_{I} flux. Consider a partner *v* of our test node *u* such that: the stub belonging to *u* never transmitted to *u* and the stub belonging to *v* never transmitted to *v* prior to the *u*–*v* edge forming. Given this, the probability *v* is susceptible is (1). Thus, given that *v* is susceptible, *v* becomes infected at rate

Thus the *ϕ*_{S} to *ϕ*_{I} flux is the product of *ϕ*_{S}, the probability that a stub has not transmitted infection to the test node and connects to a susceptible node, with *β**ϕ*_{I}*ψ*″(*θ*)/*ψ*′(*θ*), the rate the node becomes infected given that the stub has not transmitted and connects to a susceptible node. This completes figure 9.

The model requires more equations, but remains relatively simple: 3.5 3.6 3.7 3.8and 3.9This is simpler than, but equivalent to, the model of Volz & Meyers [16].

#### 3.2.1. ℛ_{0} and final size

We find 3.10

We do not find a simple expression for final size. Instead, we must solve the ODEs numerically. Full details are given in the electronic supplementary material.

#### 3.2.2. Example

We choose a population having negative binomial degree distribution NB(4,1/3) with size parameter *r* = 4 and probability *p* = 1/3. Thus . The mean is 2 and the variance 3. For negative binomial distributions, *ψ*(*x*) = [(1 − *p*)/(1 − *px*)]^{r}; so

We take *β* = 5/4, *γ* = 1 and *η* = 1/2. The equations accurately predict the spread (figure 10).

Our simulations are slower because we must track edges; so we have used smaller population sizes. To reduce noise, we perform 250 simulations, averaging the 102 that become epidemics.

### 3.3. Dormant contacts

We finally generalize the DFD model, allowing stubs to enter a dormant phase after edges break. This is the most general model we present. It reduces to any model of this study in appropriate limits [43]. It is appropriate for serial monogomy where individuals to not immediately find a new partner.

A node is assigned *k*_{m} stubs using the probability mass function *P*(*k*_{m}). We take . A stub is dormant or active depending on whether it is currently connected to a partner. The maximum degree of a node is *k*_{m} and the ‘active’ and ‘dormant’ degrees are *k*_{a} and *k*_{d}, respectively, *k*_{a} + *k*_{d} = *k*_{m}. In addition to *ϕ*_{S}, *ϕ*_{I} and *ϕ*_{R}, we add *ϕ*_{D} denoting the probability that a stub is dormant and has never transmitted infection from a partner; so *θ* = *ϕ*_{S} + *ϕ*_{I} + *ϕ*_{R} + *ϕ*_{D}. Active stubs become dormant at rate *η*_{2} and dormant stubs become active at rate *η*_{1}. A sample network at two times is shown in figure 11.

We now develop flow diagrams (figure 12). The diagram for *S*, *I* and *R* is as before. The diagram for *θ* and the *ϕ* variables is similar to the DFD model, but with the new compartment *ϕ*_{D}. The fluxes associated with edge breaking are at rate *η*_{2} times *ϕ*_{S}, *ϕ*_{I} or *ϕ*_{R} and go from the appropriate compartment into *ϕ*_{D}, for a total of *η*_{2}(*θ* − *ϕ*_{D}). To describe fluxes owing to edge creation, we generalize the definitions of *π*_{S}, *π*_{I} and *π*_{R} to give the probability that a stub is dormant (and thus available to form a new partnership) and belongs to a susceptible, infected or recovered node, with *π* = *π*_{S} + *π*_{I} + *π*_{R} the probability a stub is dormant. The probability that a new partner is susceptible, infected or recovered is *π*_{S}/*π*, *π*_{I}/*π* and *π*_{R}/*π*, respectively. The fluxes associated with edge creation occur at total rate *η*_{1}*ϕ*_{D}, with proportions *π*_{S}/*π*, *π*_{I}/*π* and *π*_{R}/*π* into *ϕ*_{S}, *ϕ*_{I} and *ϕ*_{R}, respectively.

The flow diagram for the *π* variables is related to that for the DFD model, but we must account for active and dormant stubs. We use *ξ*_{S}, *ξ*_{I} and *ξ*_{R} to be the probabilities that a stub is active and belongs to each type of node, with 1 − *π* = *ξ* =*ξ*_{S} + *ξ*_{I} + *ξ*_{R} the probability that a stub is active. The *π*_{S} to *ξ*_{S} and *ξ*_{S} to *π*_{S} fluxes are *η*_{1}*π*_{S} and *η*_{2}*ξ*_{S}, respectively. Similar results hold for the other compartments. We can use this to show from which we can conclude that (at equilibrium) *π* = *η*_{2}/(*η*_{1} + *η*_{2}) and *ξ* = *η*_{1}/(*η*_{1} + *η*_{2}). The fluxes from *ξ*_{I} and *π*_{I} to *ξ*_{R} and *π*_{R}, respectively, represent recovery of the node the stub belongs to and so are *γ**ξ*_{I} and *γ**π*_{I}, respectively.

We can calculate *ξ*_{S} and *π*_{S} explicitly. The probability a dormant stub belongs to a susceptible node is *π*_{S} = *ϕ*_{D}*ψ*′(*θ*)/*ψ*′(1), where *ϕ*_{D} is the probability the dormant stub has never received infection, and *ψ*′(*θ*)/*ψ*′(1) is the probability that none of the other stubs have received infection. Similarly, the probability that an active stub belongs to a susceptible node is *ξ*_{S} = (*θ* − *ϕ*_{D})*ψ*′(*θ*)/*ψ*′(1). We can use *π*_{I} = *π*− *π*_{S} − *π*_{R} and *ξ*_{I} = *ξ*− *ξ*_{S} − *ξ*_{R} to simplify the system further.

Our new equations are 3.11 3.12 3.13 3.14 3.15 3.16 3.17and 3.18

#### 3.3.1. ℛ_{0} and final size

We can show that 3.19

However, we have not found a simple final size relation. Details are given in the electronic supplementary material.

#### 3.3.2. Example

In figure 13, we consider the spread of a disease through a network with dormant edges. The distribution of *k*_{m} is Poisson with mean 3

The parameters are *β* = 2, *γ* = 1, *η*_{1} = *γ* and *η*_{2} = *γ*/2. Simulations are again slow; so we use a smaller population with 322 simulations, and average the 155 that become epidemics.

## 4. Expected degree models

We now consider SIR diseases spreading through ‘expected degree’ networks. In these networks, each individual has an expected degree *κ*, which need not be an integer. It is assigned using the probability density function *ρ*(*κ*). Edges are placed between two nodes with probability proportional to the expected degrees of the two nodes. In the actual degree models, once a stub belonging to *u* was joined into an edge, it became unavailable for other edges: the existence of a *u*–*v* edge reduced the ability of *u* to form other edges. In contrast, for expected degree models, edges are assigned independently: a *u*–*v* edge does not affect whether *u* forms other edges. Similar to actual degree models, a partner will tend to have larger expected degree than a randomly chosen node. The probability density function for a partner to have expected degree *κ* is *ρ*_{n}(*κ*) = *κ**ρ*(*κ*)/〈*K*〉.

Our approach remains similar. We consider a randomly chosen test node *u* which cannot infect its partners, and calculate the probability that *u* is susceptible. We first consider the spread of disease through static ‘mixed Poisson’ (MP) networks (also called Chung-Lu networks), for which an edge from *u* to *v* exists with probability *κ*_{u}*κ*_{v}/(*N* − 1)〈*K*〉. We then consider the expected degree formulation of MFSH. Following this, we consider the more general dynamic variable-degree (DVD) model, for which a node creates and deletes edges as independent events, unlike the DFD model where deleted edges were instantly replaced. The MP model produces equations effectively identical to the CM equations. The MFSH equations differ somewhat from the actual degree version, but may be shown to be formally equivalent [43]. The DVD equations are simpler than the DFD equations, and it may be more realistic because it does not enforce constant degree for an individual.

### 4.1. Mixed Poisson

We now consider the MP model. In this model, each node is assigned an expected degree *κ* using the probability density function *ρ*(*κ*). A *u*–*v* edge exists with probability *κ*_{u}*κ*_{v}/(*N* − 1)〈*K*〉 independently of other edges. We use the name ‘MP’ because at large *N* the actual degree of nodes with expected degree *κ* is chosen from a Poisson distribution with mean *κ*. The degree distribution is a mixture of Poisson distributions. An example is shown in figure 14.

Consider two nodes *u* and *v* whose expected degrees are *κ*_{u} and *κ*_{v} = *κ*_{u} + Δ*κ* with Δ*κ* ≪ 1. Our question is, how much does the additional Δ*κ* reduce the probability that *v* is susceptible? At leading order, it contributes an extra edge to *v* with probability Δ*κ*, and we may assume it contributes at most one additional edge. We define *Θ* to be the probability that an edge has not transmitted infection. With probability *Θ*Δ*κ* there is an additional edge that has not transmitted. The probability the extra Δ*κ* either does not contribute an edge or contributes an edge that has not transmitted is 1 − Δ*κ* + *Θ*Δ*κ* = 1 − (1 − *Θ*)Δ*κ*. If *s*(*κ*,*t*) is the probability that a node of expected degree *κ* is susceptible, then we have *s*(*κ* + Δ*κ*, *t*) = *s*(*κ*, *t*)[1 − (1 − *Θ*)Δ*κ*]. Taking Δ*κ* → 0, this becomes ∂*s*/∂*κ* =−(1 − *Θ*)*s*. Thus, *s*(*κ*,*t*) = exp[−*κ*(1 − *Θ*)] and *S*(*t*) = *Ψ*(*Θ*(*t*)), where

Note that this is the Laplace transform of *ρ* evaluated at 1−*x*. As before, figure 15 gives
and we need *Θ*(*t*).

We follow the CM approach almost exactly. The value of *Θ* is the probability that an edge has not transmitted to the test node *u*. We define *Φ*_{S}, *Φ*_{I} and *Φ*_{R} to be the probabilities that an edge has not transmitted to *u* and connects to a susceptible, infected or recovered node; so *Θ* = *Φ*_{S} + *Φ*_{I} + *Φ*_{R}. To calculate *Φ*_{S}, we observe that a partner *v* of *u* with expected degree *κ* has the same probability of having an edge to any *w* ≠ *u* as any other node of expected degree *κ* because edges are created independently of one another. Thus given *κ*, *v* is susceptible with probability *s*(*κ*,*t*). Taking the weighted average over all *κ* gives *Φ*_{S} = ∫_{0}^{∞}*ρ*_{n}(*κ*)*s*(*κ*,*t*)d*t* = ∫_{0}^{∞}*κ* exp[−*κ*(1 − *Θ*)]*ρ*(*κ*)d*κ*/〈*K*〉 = *Ψ*′(*Θ*)/*Ψ*′(1). The same techniques as for the CM networks give *Φ*_{R} = *γ*(1 − *Θ*)/*β*. Our equations are

This is almost identical to CM epidemics except that *Ψ* and *Θ* replace *ψ* and *θ* . This similarity is explained in [43].

#### 4.1.1. ℛ_{0} and final size

We find
where is the average of *κ*^{2} (which equals the average of *k*^{2} − *k*) and 〈*K*〉 is the average of *κ* (which equals the average of *k*). The total proportion infected by an epidemic is *R*(∞) = 1 − *Ψ*(*Θ*(∞)), where

Full details are given in the electronic supplementary material.

#### 4.1.2. Example

We consider a population whose distribution of expected degrees satisfies

Half the individuals have an expected degree between 0 and 2 uniformly, and the other half have expected degree between 10 and 20 uniformly. This gives

We take *β* = 0.15 and *γ* = 1, and perform simulations with a population of size 5 × 10^{5} generated using the 𝒪(*N*) algorithm of Miller & Hagberg [44]. We compare simulation and prediction in figure 16.

### 4.2. Mean field social heterogeneity

For the expected degree formulation of the MFSH model, the probability that an edge exists between *u* and *v* at time *t* is *κ*_{u}*κ*_{v}/(*N* − 1)〈*K*〉. Whether this edge exists at one moment is independent of whether it exists at any other moment and what other edges exist. A sample network at two times is shown in figure 17.

As before, we consider two nodes whose expected degrees differ by Δ*κ* and ask how much the additional Δ*κ* reduces the probability of being susceptible. The definition of *Θ* is slightly more problematic here because having an edge at one moment is independent of having one later. So it does not make sense to discuss the probability that an edge did not transmit previously because the edge did not exist previously. Instead, we note that in the MP case, (1 − *Θ*)Δ*κ* could be interpreted as the probability that the additional amount of Δ*κ* ever contributed an edge that had transmitted infection. Guided by this, we define *Θ* so that as Δ*κ* → 0, the extra amount of Δ*κ* has at some time contributed an edge that transmitted to the node with probability (1 − *Θ*)Δ*κ*. This again leads to
The flow diagram for *S*, *I* and *R* (figure 18) is unchanged:

To find the evolution of *Θ*, we define *Φ*_{S}, *Φ*_{I} and *Φ*_{R} to be the probabilities that a current edge connects *u* to a susceptible, infected or recovered node, respectively. The probability that a small extra amount Δ*κ* currently contributes an edge and previously had a different edge that transmitted scales like Δ*κ*^{2}(1 − *Θ*). As Δ*κ*^{2} ≪ Δ*κ*, this is negligible compared with the probability that there is a current edge. We conclude that at leading order, *Φ*_{S}Δ*κ* , *Φ*_{I}Δ*κ* and *Φ*_{R}Δ*κ* give the probability that the Δ*κ* contributes a current edge connected to a susceptible, infected or recovered node and there has never been a transmission owing to this extra Δ*κ*.

We can construct a flow diagram between *Φ*_{S} Δ*κ* , *Φ*_{I}Δ*κ*, *Φ*_{R}Δ*κ* and (1 − *Θ*)Δ*κ*. Because all of these have Δ*κ* in them, we factor it out to just use *Φ*_{S}, *Φ*_{I}, *Φ*_{R} and 1 − *Θ*. Because edges have no duration, there is no *Φ*_{S} to *Φ*_{I} or *Φ*_{I} to *Φ*_{R} flux. Instead, there is flux in and out of these compartments representing the continuing change of edges. The *Φ*_{I} to 1 − *Θ* flux is *β**Φ*_{I}.

Because edges have no duration, the probability that an edge connects to an individual of a given type is the probability a new edge connects to an individual of that type: *Φ*_{S} = *Π*_{S} , *Φ*_{I} = *Π*_{I} and *Φ*_{R} = *Π*_{R} where *Π*_{S}, *Π*_{I} and *Π*_{R} are the probabilities that a newly formed edge connects to a susceptible, infected or recovered node, respectively.^{1} We have *Π*_{S} = *Ψ*′(*Θ*)/*Ψ*′(1) and . As *Π*_{I} = *Φ*_{I}, this means . Integrating gives *Π*_{R} = *γ* (1 − *Θ*)/*β*. So
Since , finally, we have
and

Under appropriate limits, the MFSH equations in *k* reduce to those in *κ* and vice versa; so the models are equivalent [43]. Surprisingly, this system differs from the MP equations only in the first term of the equation.

#### 4.2.1. ℛ_{0} and final size

We find
and the final size is *R*(∞) = 1 − *Ψ*(*Θ*(∞)), where *Θ*(∞) solves

Full details are given in the electronic supplementary material.

#### 4.2.2. Example

For our example, we take a population with *ρ*(*κ*) = e^{κ}/(e^{3} − 1) for 0 < *κ*< 3 and 0 otherwise, giving

We take *γ* = 1 and *β* = 0.435 and compare simulation with theory in figure 19. We choose these parameters to demonstrate that the approach remains accurate for small ℛ_{0} = 1.04. We use a population of 15 × 10^{6}. There is noise as the epidemic does not infect a large number of people.

### 4.3. Dynamic variable-degree

The DVD model interpolates between the MP model and the expected degree formulation of the MFSH model. Each node is assigned *κ* using *ρ*(*κ*) and creates edges at rate *κ**η* (joining to another node also creating an edge). Existing edges break at rate *η*. Thus, a node has expected degree *κ*, though its value varies around *κ*. In fact, it is Poisson-distributed over time. A sample network at two times is shown in figure 20.

We define *Θ* such that the probability that a small Δ*κ* has ever contributed an edge that has transmitted infection is (1 − *Θ*)Δ*κ* . We again have

We generalize the earlier definitions and define *Φ*_{S}, *Φ*_{I} and *Φ*_{R} to be the probabilities a current edge has never transmitted infection and connects to a susceptible, infected or recovered node.

We define *Π*_{S}, *Π*_{I} and *Π*_{R} to be the probabilities a new edge connects to a susceptible, infected or recovered node. We have *Π*_{S} = *Ψ*′(*Θ*)/*Ψ*′(1), *Π*_{I} = 1 − *Π*_{S} − *Π*_{R} and . We build the flow diagram for *Φ*_{S}Δ*κ*, *Φ*_{I}Δ*κ*, *Φ*_{R}Δ*κ* and (1 − *Θ*)Δ*κ*. There is flux into *Φ*_{S}Δ*κ* at rate *η**Π*_{S}Δ*κ* because this is the rate at which the Δ*κ* leads to edge creation. There is flux out of *Φ*_{S}Δ*κ* at rate *η**Φ*_{S}Δ*κ* because existing edges break at rate *η*, and the probability that such an edge exists is *Φ*_{S}Δ*κ*. Similar fluxes exist for *Φ*_{I} and *Φ*_{R}. The flux out of *Φ*_{I}Δ*κ* into *Φ*_{R}Δ*κ* is *γ**Φ*_{I}Δ*κ* as before, and the flux into (1 − *Θ*)Δ*κ* is *β**Φ*_{I}Δ*κ*. We factor out Δ*κ* and the flow diagrams (figure 21) are defined.

Because the existence of an edge from the test node *u* to the partner *v* has no impact on any other edges *v* might have, the partners *v* has—aside from *u*—are indistinguishable from the partners of another node with the same *κ*, and so they are susceptible with the same probability: *Φ*_{S} = *Π*_{S}. The fluxes into and out of *Φ*_{S} from edge creation–deletion balance, and the *Φ*_{S} to *Φ*_{I} flux is simply . Using this and the other fluxes for *Φ*_{I}, we conclude . As and , we can integrate this and find
So can be written in terms of *Θ* and *Π*_{R}. We arrive at
4.1
4.2and
4.3

This is simpler than the DFD case because the arbitrary smallness of Δ*κ* allowed us to assume that no previous transmission occurred for the Δ*κ*, whereas for a discrete stub, we cannot neglect previous transmission.

#### 4.3.1. ℛ_{0} and final size

We find

The total proportion infected is *R*(∞) = 1 − *Ψ*(*Θ*(∞)), where

Full details are given in the electronic supplementary material.

#### 4.3.2. Example

We choose the same distribution of *κ* as of *k* in the DFD example, NB(4,1/3). We find

We take the same parameters *β* = 5/4, *γ* = 1 and *η* = 1/2. Figure 22 shows that the equations accurately predict the spread. As in the DFD and DC case, we use an average as the comparison point, taking 240 simulations and averaging the 92 resulting in epidemics.

The final size is larger than for the DFD model. Although the average numbers of partners are all the same, the increase in transmission routes when an individual has more partners than expected outweighs the decrease when the number is less than expected. The net effect is to increase the final size.

## 5. Discussion

We have introduced a new approach to study the spread of infectious diseases. This edge-based compartmental modelling approach allows us to simultaneously consider the impacts of partnership duration and social heterogeneity. It is conceptually simple and leads to equations of comparable simplicity to the MA model. It produces a broad family of models that contains several known models as special cases. It further allows us to investigate the effect of many behaviours that have previously been inaccessible to analytical study.

A significant contribution of this work is that it allows us to study the spread of a disease in a population for which some individuals have different propensity to form partnerships while allowing us to explicitly incorporate the impact of partnership duration. The interaction of partnership duration and numbers of overlapping partnerships plays a significant role in the spread of many diseases, and in particular may play an important role in the spread of HIV [45]. These techniques open the door to studying these questions analytically rather than relying on simulation.

The edge-based compartmental modelling approach has a simple, graphical interpretation through flow diagrams. This simplifies model description, and guides generalizations. We propose that this is the correct perspective to study the deterministic dynamics of SIR epidemics on random networks because the derivations are straightforward, we need to track only a small number of compartments and we do not require any closure approximations. Other existing techniques rely on approximations [23] or produce complicated or large systems of equations [24–26]. This approach does not offer immediate insight into stochastic effects where methods such as branching processes are more appropriate. In later studies, we use this approach to derive other generalizations, including different correlations in population structure, different disease dynamics and populations that change structure in response to the spreading disease.

Our approach is limited by the assumption that infection of one partner of *u* can be treated as independent of that of another partner. This is a strong assumption, and prevents us from applying this model to susceptible–infected–susceptible (SIS) diseases for which individuals return to a susceptible state. In this case, the assumption that *u* does not infect its partners can alter the future state of *u*. In the real population, if *u* becomes infected, it can infect partners who then infect *u* when *u* returns to a susceptible state. Thus if partnership duration is nonzero, our predictions may be significantly altered. This limitation is often not recognized but may lead to important failures of mean-field or MA models when applied to a population for which partnership duration is important [46].

Treating partners as independent also means that if *v* and *w* are partners of *u*, we assume no alternate short path between *v* and *w* exists. In particular, we assume that clustering [47,48] is negligible: partners are unlikely to see one another. This assumption applies equally to most existing analytical epidemic models, but it can be eliminated in very special cases using techniques similar to those of earlier studies [49–52].

## Acknowledgements

J.C.M. was supported by the RAPIDD programme of the Science and Technology Directorate, Department of Homeland Security and the Fogarty International Center, National Institutes of Health and the Center for Communicable Disease Dynamics, Department of Epidemiology, Harvard School of Public Health under award number U54GM088558 from the National Institute Of General Medical Sciences. Much of this work was a result of ideas catalysed by the China–Canada Colloquium on Modelling Infectious Diseases held in Xi'an, China in September 2009. E.M.V. was supported by NIH K01 AI091440. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institute of General Medical Sciences or the National Institutes of Health. We thank S. Bansal, M. Lipsitch, R. Meza, B. Pourbohloul, P. Trapman and J. Wallinga for useful conversations.

## Footnotes

↵1 Unlike the actual degree formulation, we do not need a factor of

*Θ*in these because the smallness of Δ*κ*allows us to assume that there has never been a previous transmission.

- Received June 22, 2011.
- Accepted September 13, 2011.

- This journal is © 2011 The Royal Society