## Abstract

Network epidemiology often assumes that the relationships defining the social network of a population are static. The dynamics of relationships is only taken indirectly into account by assuming that the relevant information to study epidemic spread is encoded in the network obtained, by considering numbers of partners accumulated over periods of time roughly proportional to the infectious period of the disease. On the other hand, models explicitly including social dynamics are often too schematic to provide a reasonable representation of a real population, or so detailed that no general conclusions can be drawn from them. Here, we present a model of social dynamics that is general enough so its parameters can be obtained by fitting data from surveys about sexual behaviour, but that can still be studied analytically, using mean-field techniques. This allows us to obtain some general results about epidemic spreading. We show that using accumulated network data to estimate the static epidemic threshold lead to a significant underestimation of that threshold. We also show that, for a dynamic network, the relative epidemic threshold is an increasing function of the infectious period of the disease, implying that the static value is a lower bound to the real threshold. A practical example is given of how to apply the model to the study of a real population.

## 1. Introduction

Even though the aim of mathematical modelling in epidemiology is to predict the patterns of spread of infectious diseases, the complexity of real populations has always constrained modellers to use strong assumptions. Even though these do not always guarantee the existence of analytical solutions, at least the models become *tractable*. On the other hand, the search for analytical simplicity has sometimes taken over more practical considerations.

One of the strongest assumptions used in most epidemiological models is the Law of mass action [1]. First proposed by chemists, it postulates that in dynamical equilibrium, the rate of a chemical reaction is proportional to the concentrations of the reactants, and can be derived from the probability of collision between reacting molecules. The analogy between the movements of molecules and living beings, drawn almost a century ago [1], leads to the epidemiological version of this postulate: the ‘force of infection’ is proportional to the densities of infected and uninfected individuals (called ‘susceptibles’ in the epidemiological literature). It implies that the population has no structure, i.e. that every person can be in contact with every other (‘random mixing’).

In general, however, members of a population interact only with a very small subset of it. Thus, one way to go beyond the random mixing assumption is to consider that the members of the population form a social network, which is usually modelled as a graph. Its nodes represent individuals, and each edge joins individuals that could pass an infection to each other. The number of connections of a certain node is called the *degree* of the node. The definition of the network depends strongly on the type of interaction necessary to transmit the disease whose spread is being modelled. The advantage of the network over the random mixing approach is that models can be better adapted to specific populations. Needless to say, this implies having more data about the social structure, as well as new concepts and tools to analyse them. Fortunately, these are provided by social network analysis, a field that has developed rapidly in recent years [2]. The mathematics are not as straightforward as in the analysis of mass-action models, but for some cases, interesting results can be obtained by using approximations (some of them derived from statistical physics). One example is the relationship that exists between the properties of the network and the epidemic threshold *λ*_{c}, defined as smallest value of the infectivity *λ* for which a small number of infectives can trigger an epidemic (also defined as the value which makes *R*_{0}, the basic reproductive ratio or the disease larger than 1). For a disease with infectivity *λ* and an infectious period *α*^{−1}, mean-field approximations give [3,4]:
1.1where 〈*k*〉 and 〈*k*^{2}〉 are the first and second moments of the degree distribution of the network. In this equation, as in the rest of the paper, the tilde over a variable indicates that the corresponding quantity has been calculated for a static network, to distinguish it from the analogous quantity calculated for dynamic networks.

Network epidemiology seems particularly well-suited for the analysis of the spread of sexually transmitted diseases (STDs), as the definition of the network in this case is more straightforward (although not free of problems [5]). The large number of surveys of sexual behaviour carried out in the last three decades provides an invaluable resource for modellers. Interestingly, a common feature of many sexual networks built from survey data is that their degree distribution has a very long tail: a small number of individuals report very large numbers of sexual contacts. Mathematically, this means that, even though 〈*k*〉 is rather small (typically less than 3), 〈*k*^{2}〉 can be very large. Applying equation (1.1) to such networks (which, as explained below, is not altogether correct) would lead to the conclusion that, for those populations, even STDs with very low infectivity can trigger an epidemic. It has even been argued that some sexual networks have power law degree distributions with infinite variance [6,7], which would imply a vanishing epidemic threshold, but there is some controversy about this [8].

One aspect that is usually disregarded in the network approach is the dynamic nature of social interactions. It is reasonable to assume that, even though at all times individuals are free to end their existing relationships and create new ones, these dynamics produce a steady state, in which the *instantaneous* distribution of contacts does not change. One possibility to estimate the epidemic threshold of the system is to use equation (1.1) for the instantaneous distribution. Hereafter, we call this estimate, , the *static* threshold. One of the aims of this paper is to clarify the relationship between and the real threshold for a dynamic network, *λ*_{c}.

A problem of this approach is that the instantaneous distribution is usually not available. In general, we only have the information obtained from sexual behaviour surveys. Respondents to these surveys, however, are usually asked about the number of partners over a certain time period *T*, and the distribution thus obtained is often used as a proxy for the instantaneous distribution. The critical thresholds calculated using equation (1.1) for these distributions, hereafter called , can also be used as estimates of *λ*_{c}, or even of . But it is difficult to ascertain how close distributions of accumulated contacts can be to the instantaneous distribution [9]. It is often suggested that if the time period asked about in the survey is similar to the infectivity period of the disease analysed, epidemic thresholds can be estimated by using the proxy network [6,10,11]. In other words, it is argued that for certain values of *T*, can be a good estimate of *λ*_{c}. But, in general, this argument remains at a qualitative level. In the following sections we show, quantitatively, what is the relationship between all these quantities for a family of dynamic models.

Models that take into account the dynamic nature of social networks usually consider the formation and dissolution of links between individuals as stochastic processes [12]. In the context of classical compartmental models used to study the spread of infectious diseases, pair formation was introduced by Dietz [13] and Dietz & Hadeler [14]. These analytical models were extended to structured (i.e. heterogeneous) populations by Castillo-Chavez and co-workers (see [15] and references therein). More complex models have been studied, but only through numerical simulations [16–19]. More recently, pair formation has also been included in network models, in which the number of partners of the individuals is taken explicitly into account [20–23]. In principle, this would provide a framework to use the information gathered in sexual behaviour surveys. However, the additional complication of dealing with network dynamics has led either to models that have analytical solutions but that are too simple to be applied in a realistic setting, or to models that rely exclusively on numerical simulations, from which it is difficult to draw general conclusions.

The model of network dynamics presented in §2 is an attempt to overcome these limitations. We obtain the predictions for the distributions of contacts in a population using a mean-field approximation, in which it is assumed that in the interaction of each individual with its neighbourhood, the influence of each neighbour can be replaced by an average influence, which is treated as a new variable. In this way, the problem becomes analytically tractable because the interactions now depend only on this new variable and the present number of contacts of each person. The ‘social’ dynamics does not depend on the epidemiological parameters of a given disease, but only on the sexual nature of transmission. In §3, we study two synthetic examples of the model that generate the same instantaneous distribution of contacts and we use computer simulations to test the approximations used. Additionally, we provide a practical example using survey data of a real population to build a model of its social dynamics. These models allow us to show that the static epidemic threshold is significantly underestimated by the estimates obtained using accumulated contact distributions (i.e. ). In §4, we add the spread of an STD to the dynamic network, and we find general equations for the true epidemic threshold *λ*_{c} and the number of infected in the steady state using a mean-field approximation. We show that, for the family of dynamic networks analysed and within the approximation used, it is always true that . We apply these results to the populations analysed in §3. In §5, we allow the social interactions to depend also on intrinsic features of the individuals. We use the same real population as in §3 to show an example of how intrinsic features are included in the model. It is also shown that, in general, the range of possible values for *λ*_{c} is significantly smaller than in the case when such features are not included. In §6, some conclusions are drawn and the possible extensions of the model are discussed.

## 2. Model

We consider a population of *N* epidemiologically identical individuals. It has been shown that static models with individuals placed on a bipartite network give identical predictions to models where the population is not divided into two groups [9], so we assume that partnerships can be established between any two individuals. Thus, even though our model applies strictly only to homosexual populations, its predictions should be qualitatively correct for heterosexual populations, with similar epidemiological variables for both sexes. Simulations of heterosexual and homosexual populations with the same parameters support this conjecture (see the electronic supplementary material, appendix).

We assume that partnerships can be established and dissolved with a rate that depends on features of the two individuals. We consider the number of partners as the only dynamic attribute; we first assume that rates depend only on that number. Thus, the rate of partnership creation between individuals *i* and *j* is *ρ*(*k*_{i}, *k*_{j}, *t*) and the rate of partnership dissolution is *σ*(*k*_{i}, *k*_{j}, *t*), where *k*_{i} and *k*_{j} are the number of current sexual partners of *i* and *j* at time *t*. As we only deal with steady states, hereafter the *t* dependence is dropped from all quantities.

In the steady state, the master equation for the degree distribution *P*(*k*) is:
2.1where *ρ*_{k} = 〈*ρ*(*k*,*k*_{l})〉_{l} is the average probability per unit time that an individual with *k* partners gets a new partner and *σ*_{k} = 〈*σ*(*k*,*k*_{l})〉_{l} is the average rate of break-up of his existing relationships (a detailed derivation of this equation is given in the electronic supplementary material, appendix). The interpretation of the terms in the equation is straightforward. The first and second terms correspond to the increase in the fraction of nodes with *k* neighbours given by, respectively, the fraction of nodes with *k* − 1 neighbours that have gained one link, and the fraction of nodes with *k* + 1 for which one link was deleted. The third and the fourth terms correspond to the decrease in the fraction of nodes with *k* neighbours given by the fraction of nodes that have gained, or lost, one link. In principle, the link-creation probability should be averaged only over those individuals that are not current partners of the individual. However, as in real populations, *k* is much smaller than *N*; this quantity is very well approximated by the average over the entire population:
2.2

For *σ*_{k}, the distribution that should be used to calculate the average is *P*(*k*_{l}|*k*), the degree distribution of the individuals connected to an individual having *k* partners. However, if we assume that the dynamics do not generate a significant assortative mixing by degree, *P*(*k*_{l}|*k*) can be written as *P*(*k*_{l}|*k*) = *k*_{l}*P*(*k*_{l})/〈*k*〉. The resulting average link dissolution is:
2.3

Solving equation (2.1) gives the steady-state degree distribution, for *k* > 0:
2.4where *P*(0) is obtained by normalizing the distribution, and the *N* parameters *x*_{i} (*i* = 0, …, *N* − 1) are obtained by solving the *N* self-consistency equations:
2.5

The number of such equations to be solved imposes a practical constraint on the models that can be effectively analysed. One of the simplest ways to reduce the number of equations to only one is to consider separable functions: *N*^{−1}*ρ*(*k*_{i}, *k*_{j}) = *ρ*(*k*_{i})*ρ*(*k*_{j}) (the scaling factor *N*^{−1} ensures that all terms of equation (2.1) are of the same order) and *σ*(*k*_{i}, *k*_{j}) = *σ*(*k*_{i})*σ*(*k*_{j}). This choice has the added advantage of ensuring that there is no assortative mixing by degree [20]. Note that if *ρ*(*k*) is an increasing function of *k*, individuals with many partners are more likely to attract new ones. This is reminiscent of preferential attachment in models of network growth [24] (note however that in our model, the number of nodes is kept fixed and only the links can vary).

If a model is to be used for understanding the spread of a disease in a real population, its parameters should be adjusted by comparing with available data. For simpler models, it has been suggested that this could be done by using an empirical instantaneous distribution [20]. In our model, however, equation (2.5) shows that rescaling the link creation and dissolution functions does not change the equilibrium distribution. This was to be expected, because changing the timescale cannot change the nature of the steady state reached. Thus, timescales should be obtained from other population measurements. An important problem of this approach is that information about instantaneous degree distributions is usually *not* available. Instead, almost all surveys ask respondents about the number of sexual contacts accumulated over a certain time period. Thus, we need the distribution of accumulated contacts of the model (i.e. the probability of having had *k* contacts during a given time period), *P*_{T}(*k*), which can be written as:
2.6where *P*_{T}(*k* − *k*′,*k*′) is the probability of having *k* − *k*′ *new* contacts over a time period of length *T*, conditional on having *k*′ partners at the beginning of that period. The evolution equations for these conditional probabilities are:
2.7for 0 ≤ *m*, *n* ≤ *N* − 1, with *ρ*_{N−1} ≡ 0 and *σ*_{0} ≡ 0. With the aid of some mathematical software, this recursion can be solved exactly, for any desired value of *T* (see the electronic supplementary material, appendix). Using this solution, the 2(*N* − 1) parameters *ρ*_{n} and *σ*_{n} can be adjusted to fit the distributions obtained in any given survey. For real populations, however, it is evident that the number of contacts cannot scale as *N*.

Thus, for obtaining more realistic parameters, one could allow independent values only for the first few *ρ*_{n} and *σ*_{n}. The values of the remaining parameters can either be set to 0, or given by a function of *n* (depending on a few adjustable parameters) that generates a rapid decay of *P*(*k*). An example of this is given in §3.

## 3. Application examples

First we analyse two different models, called A and B, that generate almost the same instantaneous contact distribution. The distribution has been chosen to have a power law tail with exponent 3 to mimic the long tails observed in some sexual and social networks. Even though both models have the same static threshold, they are dynamically very different, giving very different values for the estimates . This shows that in general, it is not advisable to use accumulated contact distributions as proxies for the instantaneous one. For both models, we use separable creation and destruction functions. Model A is defined by *ρ*(*k*) = *C*_{A}*k*^{3}/(*k* + 1)^{2} (for *k* > 0), *ρ*(0) = 1 and *σ*(*k*) = 1, whereas model B is defined by *ρ*(*k*) = *C*_{B}*k*^{3}/(*k* + 1) (for *k* > 0), *ρ*(0) = 1 and *σ*(*k*) = *k*. *C*_{A} and *C*_{B} are numerical constants. The instantaneous distribution is *P*(*k*) = *P*_{0}*D*_{k}*x*^{k}/*k*^{3}, where *D*_{k} = ∏_{i=1}^{k} (1 − *i*/*N*). *x* is obtained by solving the self-consistency equation for each model. The constants *C*_{A} and *C*_{B} are adjusted to obtain a degree distribution that has a mean value of order 1, and a variance large enough to mimic the long tails observed in sexual networks.

Figure 1 shows that the mean-field approach is a very good approximation for the corresponding stochastic model, both for the instantaneous degree distribution as well as for the accumulated ones. It also shows that, even for models with the same instantaneous degree distribution, the distribution of the number of accumulated partners can be rather different. Using equation (1.1), the accumulated partners distributions can be used to calculate the estimates , which are shown in the inset of figure 1. For both models, decays rapidly with *T*, and therefore even for small values of *T*, these estimates can be very different from the static threshold, . To see whether these differences are relevant in a real setting, we have applied this model to data from the National Survey of Sexual Attitudes and Lifestyles II (NATSAL), carried out in Britain in 2000–2001 [25,26]. Participants were asked about the number of male and female partners during several, overlapping, time periods prior to the survey. From these data, one can build, for each time period, the distribution of the number of accumulated partners. We have only used the data related to homosexual men, as our model deals strictly with one-sex populations. However, as information about sexual orientation was not asked from participants of NATSAL, we have defined men who have sex with men (MSM) as those men having reported at least one male partner within the 5 years prior to interview [27]. This leaves 166 out of 4762 male respondents. Because of recall problems, the accuracy of the reports decreases as the time period asked about increases [28]. This is already apparent in substantial heaping present in the data for the 5 years recall period. In our case, this dataset is further skewed because it has been used to define MSM. Thus, we have adjusted our model to fit only the degree distributions for the recall periods of one month, three months and 1 year, obtaining reasonably good fits (see the electronic supplementary material, appendix). Note that, in contrast with what is done in static models, we use one set of parameters to fit the three sets of data simultaneously. Figure 2 shows the resulting cumulative distributions of accumulated partners. As explained in the electronic supplementary material, appendix, the overestimation of the 5 years distribution can be partially explained from the definition of MSM used.

The instantaneous degree distribution shows that a fraction of the population may have many (more than 10) ‘instantaneous’ partners. One may wonder at such high levels of ‘concurrency’. But, it is important to remember that in our model, the relationships involving such individuals are bound to last rather short times (typically a few hours). This can be thought of as modelling very particular events such as visits to a bathhouse. The inset of figure 2 shows the estimates , calculated using the model degree distributions for several time periods *T* (see the electronic supplementary material, appendix). As in the figure 1, decreases with *T*, which implies that and that, as estimates of , these quantities get worse when large time periods are used. In fact, already the one month distribution leads to an underestimation of by about 50 per cent. To understand whether this underestimation is relevant, the spread of a disease should be analysed taking into account the intrinsic dynamics of the network in order to calculate the real epidemic threshold *λ*_{c}. The question is not only how close *λ*_{c} and are, but which one is larger, because if , then there would be a range of *T*, where could be a good estimate of *λ*_{c}, even if it is not a good estimate of . Unfortunately, in §4 it is shown that, at least within the approximations used, this is not the case: the dynamic threshold is always larger than the static one.

## 4. Epidemic spread

We consider the propagation of a disease that can be cured, and that confers no immunity, i.e. individuals can be reinfected as soon as they become susceptible again. This type of models, called SIS, are considered acceptable models of such STDs as gonorrhoea and chlamydia [29].

It is assumed that, in an existing relationship between a susceptible and an infected individual, infection can be transmitted with a rate *λ* (also called infectivity), and that infected individuals recover at a rate *α*. Thus, the average duration of the disease is 1/*α*. We call the quotient *λ*/*α* relative infectivity, which can be interpreted as the probability that the disease is transmitted from an infected to a susceptible individual, provided that the relationship between these individuals is at least as long as the infectious period of the disease.

We also assume that the social dynamics are not affected by the propagation of the disease. We need to calculate *P*_{x}(*k*, *I*; *t*), the probability that at time *t* an agent *x* has *k* simultaneous relationships and is infected. The master equation for this depends on the two-point probabilities *P*_{xy} (*k*_{x}, *S*; *k*_{y}, *I*; *t*), which in turn depend on three-point probabilities, and so on. To get a closed system, we choose the same ansatz as for equation (2.1) (see its derivation in the electronic supplementary material, appendix): *P*_{xy}(*k*_{x}, *S*; *k*_{y}, *I*; *t*) ≈ (*k*_{x}*k*_{y}/*N*〈*k*〉) *P*_{x}(*k*_{x} *S*; *t*)*P*_{y}(*k*_{y} *I*; *t*), which implies assuming the absence of degree correlations. Averaging over all agents with the same number of partners, *k*, the master equation for *P*(*k*, *I*) becomes:
4.1

This equation is analogous to equation (2.1) but with three terms added, related to the possibility of getting infected or recovering. The third term, for instance, gives the increase in the number of infecteds of degree *k* given by the transmission of infection by one of its *k* contacts. *θ* = ∑_{k}(*k*/〈*k*〉*P*(*k*, *I*) represents the probability that a given neighbour is infected. Since *P*(*k*, *I*) depends on *θ*, the definition of *θ* becomes a self-consistency equation that must be solved to find the correct value of *θ*. The ansatz used implies that this quantity is the same for all nodes. Equation (4.1) can be more compactly rewritten as:
4.2where **I** is the *N* × *N* identity matrix, **A** is a tridiagonal matrix defined by (*A*_{θ})_{i,i+1} =−*i**σ*_{i}, (*A*_{θ})_{ii} = (*N* − *i*)*ρ*_{i−1} + (*i* − 1)*σ*_{i−1} + (*i* − 1)*λ**θ* and (*A*_{θ})_{i+1i} =−(*N* − *i*)*ρ*_{i−1} (0 ≤ *i* ≤ *N* − 1) and the vectors *P*_{I} and ** P**′ are given by (

*P*

_{I})

_{j}=

*P*(

*j*,

*I*) and (

*P*′)

_{j}=

*j*

*P*(

*j*) (0 ≤

*j*≤

*N*− 1).

*P*(

*j*) is given by equation (2.4). The self-consistency condition becomes: 4.3

The epidemic threshold is obtained from this equation by taking the limit : 4.4

The fraction of infected individuals is:
4.5where **1** is the vector with all components set to 1. In the limit where the characteristic times of the disease are much shorter than the ones characterizing the social dynamics (i.e. *λ* → ∞ , *α* → ∞, but keeping the relative infectivity *λ*/*α* constant), the usual result for a static network is obtained (equation (1.1)): . Intuitively, one can think that the disease spreads so fast that it ‘sees’ only the instantaneous network. In the case that the spread of the disease is much slower than the network dynamics (i.e. *λ* → 0, *α* → 0, but keeping *λ*/*α* constant), we obtain (see the electronic supplementary material, appendix) *λ*_{c}^{∞} = *α*/〈*k*〉. Thus in this case, the social dynamics is so fast that, in terms of disease spread, the network is equivalent to an ‘average’ network where all nodes have the same degree, 〈*k*〉. Note that *λ*_{c}^{∞}> *λ*_{c}^{0}.

Although for static networks, the relative threshold *λ*_{c}/*α* is a constant, for dynamic networks, the relative threshold does depend on the infectious period *t*_{I} = 1/*α*. Figure 3 shows that the relative epidemic threshold of the NATSAL model is larger for STDs with larger infectious periods. For *t*_{I} of the order of a few months, as is the case of untreated gonorrhoea, chlamydia and syphilis, the difference between the dynamic and the static threshold can be significant. In terms of the non-normalized epidemic threshold, the inset of figure 3 shows that when the dynamics of the network is taken into account, *λ*_{c} decreases more slowly than , as functions of .

Interestingly, it can be proven (see the electronic supplementary material, appendix) that the effect of the dynamics is always the same for *all* possible choices of the link creation and dissolution functions, *ρ*(*k*_{i}, *k*_{j}) and *σ*(*k*_{i}, *k*_{j}): the relative epidemic threshold is an increasing function of *t*_{I}. Even though the mean-field approximation is in general not very good for sparse networks (as should be the case of most instantaneous sexual networks), simulations carried out for the stochastic analogue of the NATSAL model, shown in figure 4, qualitatively confirm the analytical prediction of the increase of *λ*_{c}/*α* with *t*_{I}. Note that the real epidemic threshold is even larger than the mean-field value and therefore the underestimation mentioned is even worse when compared with simulation values.

For large values of the infectivity, figure 4 shows that *n*_{I}, the fraction of infected individuals in the endemic state, grows with *t*_{I}, and its derivative at the epidemic threshold also grows with *t*_{I}. This too is a general feature of this kind of model. Interestingly, for large *λ*, *n*_{I} does not tend to 1:
4.6

Intuitively, the picture is as follows. In a static network (i.e. *t*_{I} → 0), the disease cannot reach isolated individuals. In the dynamic case, however, even momentarily isolated people get a partner after a time 1/〈*ρ*_{0}〉, on average. But there is a probability that isolated, infected people get cured before they get a partner. This ensures that there is always a fraction of isolated individuals who are not infected. The proportion of partners that are infected, *θ*, is also an increasing function of *λ*/*α* but it tends to 1 for large infectivities, for all values of *t*_{I}.

## 5. Including intrinsic features

Our model can be extended in many ways to make it more realistic. One of them is to consider that the ability to obtain partners can depend not only on the present number of partners, which is a dynamical variable, but also on intrinsic features of each individual that do not change over time (or at least over the times relevant for the problem). Many characteristics have been proposed to account for this ability: beauty, talent, particular social behaviours and even geographical location. The downside to this added realism is that such features are not easy to univocally define [30], let alone quantify. It is interesting, however, to see that some general properties can be derived for our model.

We assume that the features *f* take a finite number of values, whose probability mass function is *Π*(*f*). The rates of partnership creation and dissolution depend now on the *f* of each agent: *ρ*(*k*_{i}, *f*_{i}, *k*_{j}, *f*_{j}) and *σ*(*k*_{i}, *f*_{i}, *k*_{j}, *f*_{j}). The population can be divided into subpopulations with a common value of *f*, with a degree distribution *P*(*k*|*f*) given by equation (2.4). One important difference with the model analysed in the previous sections is that the time average of the number of partners is not the same for all individuals, but depends on their features. The interaction between the subpopulations is encoded in the self-consistency parameters *x*_{i}(*f*), calculated from:
5.1

Note we are assuming that the dynamics introduces no degree correlations within each subpopulation (as in §3), but also that there is no correlation between the intrinsic features of a node and its neighbours. It is also possible to obtain the distribution of accumulated contacts. In this case, *P*_{T}(*m*|*n*, *f*) is the probability that an individual with feature *f*, having *n* partners at the beginning of a given time period of duration *T*, has had *m* partners at the end of that period. There is now a set of equations for each *f*, analogous to equations (2.7), which can be solved independently of each other. The degree distribution for the period *T* is *P*_{T} (*m*) = ∑_{f} ∑_{n=0}^{m} *P*_{T} (*m*|*n*, *f*)*P*(*n*)*Π*(*f*).

The analysis of the spread of an infectious disease can be carried out much in the same way as in §4. Using the mean-field approximation, the epidemic threshold is:
5.2where , denotes an average over the distribution *Π*(*f*) and 〈〉 denotes an average over both *Π*(*f*) and *P*(*n*). It is instructive to compare the cases where different distributions of *f* generate the same instantaneous network. As expected, the static limit (*t*_{i} → 0) does not depend on *π*(*f*). But the opposite limit does depend on the features:
5.3where is the average of *k* over the individuals with the same value of *f*. For a non-trivial feature distribution, it can be shown that this value is strictly smaller than *λ*_{c}^{∞} = *α*/〈*k*〉, the limit found in §4. In other words, the effect of the social dynamics on the spread of the disease is less pronounced if the instantaneous network is (at least partly) generated by the features of the individuals.

In STD epidemiology, it is often assumed that there is a small group of individuals, usually called *core group*, whose contribution to the spread of the disease is disproportionately large. Even though there is some ambiguity in the exact characterization of it [31], this label is frequently applied to people with very many sexual contacts [20]. As mentioned above, for the model presented in §3, the time average of the number of partners is the same for all members of the population. This implies that the people who are in the tail of the instantaneous distribution, which can be considered as a core group, cannot be always the same. In this sense, the core group is a dynamic structure. On the other hand, when intrinsic features are included in the model, the time average becomes dependent on them and therefore the composition of the core group becomes more stable (because people with a larger time average of number of contacts will be more likely to be in the tail of the contact distribution at any given time). Our result suggests that, even though dynamic and static models can have core groups having the same number of individuals at any time (because their instantaneous contact distributions are the same), dynamic core groups might be not as effective as static ones in driving an epidemic.

For a practical example of including intrinsic features, we have again used the NATSAL data. Of the data provided by the survey, one that in principle can be unambiguously related to the ability to acquire contacts is the frequency with which the individuals visit gay pubs. Even though there were 10 possible answers, to have statistically significant sets we have accumulated the populations in only three groups: high frequency, medium frequency and low frequency (see the details in the electronic supplementary material, appendix). We have fitted the accumulated contact distribution of each population and thereby obtained the respective values of *ρ*(*k*, *f*) and *σ*(*k*, *f*). As before, we have assumed that the rate of creation and dissolution of contacts between individuals *i* and *j* is simply multiplicative: *ρ*(*k*_{i}, *f*_{i}, *k*_{j}, *f*_{j}) = *N*^{−1} *ρ*(*k*_{i}, *f*_{i})*ρ*(*k*_{j}, *f*_{j}) and *σ*(*k*_{i}, *f*_{i}, *k*_{j}, *f*_{j}) = *σ*(*k*_{i}, *f*_{i})*σ*(*k*_{j}, *f*_{j}). Note that, as in the previous sections, this implies assuming that there is no assortativity by degree or feature. Figure 5 compares the epidemic threshold obtained for this population when the gay pub visitation frequency is taken into account versus what is obtained when this feature is not considered. As predicted, the range of possible values of the epidemic threshold becomes smaller when intrinsic features are included. However, note that there is still a twofold difference between the largest and smallest values of *λ*_{c}. It is interesting to note that up to approximately four months, adding intrinsic features to the model does not change the epidemic threshold.

## 6. Discussion and conclusion

The use of pair formation models to study the spread of STDs is certainly not new. In the context of the mass action approximation, they have been adapted to take into account many different features of real populations, both analytically [14,15] and numerically [16]. More recently, pair formation has also been included in some network models in which the number of contacts of each individual is explicitly taken into account. In general, however, most models do not use the data about sexual networks that can be obtained from sexual behaviour surveys. A notable exception are the dynamic network models used to study the spread of human immunodeficiency virus [21,32]. They consider dynamics where links are neither created nor destroyed; instead, randomly selected couples simply swap their contacts. This preserves the number of contacts of each individual at all times and thus the distribution of contacts is an input parameter of these models. In the model presented in this paper, the dynamics of stochastic and independent creation and destruction of links generates a steady state with a degree distribution that does not change (except for small stochastic fluctuations), even though individuals are free to have any number of contacts. The parameters of the model can be tuned to obtain almost any desired degree distribution in the steady state. It is important to realize that surveys of sexual behaviour only provide information about the distribution of the number of contacts accumulated during given periods of time, and not about the instantaneous distribution of contacts. We have found analytical expressions that give the distribution function of the number of accumulated contacts, which makes possible a direct comparison with the results obtained in the surveys. This could also be useful in models of social network dynamics not specifically related to the spread of diseases [33]. Interestingly, there have been some recent attempts in determining the rates of partnership formation and dissolution [34,35] in sexual networks. Even though these studies still do not have the level of detail necessary to use their results as inputs of the model presented here, they could be useful to test its predictions.

We have found that, because of the interplay between the social and the epidemic dynamics, the relative epidemic threshold, as a function of the average duration of infection, increases monotonically between the two limit cases, *λ*_{c}^{0}/*α* = 〈*k*〉/〈*k*^{2}〉 and *λ*_{c}^{∞}/*α* = 1/〈*k*〉. Thus, approximating the epidemic threshold by the static network threshold entails an underestimation. Also the examples analysed show that, in real cases, this underestimation can be significant for STDs having an infectious period of the order of months (as §5, although to a lesser degree, this also happens when intrinsic features are included). But, even in the case when is a good approximation, the problem that remains is how to estimate its value from sexual behaviour survey data. Usually, is estimated from the network built by considering the distribution of the number of accumulated partners as a degree distribution for each recall period. We have shown that this approximation does improve as shorter periods are considered. Unfortunately, we have also shown that, in real cases, even the values obtained for rather short recall periods (one month) can be much smaller than .

It is often assumed that to study the spread of STDs with short infectious periods, the relevant information is encoded in the distribution of sexual partners for short recall periods, whereas longer periods (of the order of years) are more relevant for STDs with long infectious periods. The results of the previous sections show that this might not be the case, at least for the epidemic threshold. It is true that sometimes this threshold is well-approximated by the static limit, whose estimation necessitates information about sexual partners in time periods as short as possible. But for STDs with long infectious periods, we find that the epidemic threshold obtained with distribution of partners for long recall periods underestimates the static epidemic threshold, which in turn underestimates the real threshold. Therefore, for these diseases, the best would be to build a good social dynamics model by fitting the empirical data for several recall periods, and to calculate its corresponding epidemic threshold.

Perhaps, the two most important limitations of the model presented here concern the assumptions of no assortativity and the unipartiteness of the network. As mentioned above, the predictions of homosexual and heterosexual populations may differ only if the social dynamics and the epidemiological characteristics of both populations are not the same. But taking this into consideration doubles the number of parameters, and it is not clear that analytical results can be obtained as easily. This problem is compounded by the fact that fitting heterosexual partner distributions is far from trivial because the distributions reported by men and women are usually not even compatible (the total number of contacts reported by men is substantially higher than the total reported by women). Although many explanations have been proposed for this, there is still no consensus on this matter. Thus, many different models should be tested, using several criteria to make the distributions compatible.

Regarding assortativity, our model is only strictly valid for populations with no degree correlations. While this may be true in some cases [36], it must be acknowledged that it is not in many others. One obvious example are populations with prostitutes, where a significant negative assortativity is to be expected. In those cases, the model presented here, combined with computer simulations, could be used as a null model to study the influence of degree correlations. There is also room for improvement in the approximation we have used. One possibility is to go one step further from the mean-field theory and to include the different moment closures that have been successfully applied in simpler models [37]. It is not clear, however, whether in our case such modifications would lead to analytical solutions or approximations, which is one of the main advantages of the model presented here.

While this manuscript was under review, we learned of a work published very recently that deals with dynamic network models whose parameters are adjusted by comparing the resulting networks (obtained with computational simulations) with data taken from NATSAL 2000 [38].

## Acknowledgements

I wish to thank M. N. Kuperman and D. H. Zanette for a critical reading of the manuscript and useful suggestions.

- Received July 7, 2011.
- Accepted October 31, 2011.

- This journal is © 2011 The Royal Society