Contact patterns in group-structured populations determine the course of infectious disease outbreaks. Network-based models have revealed important connections between group-level contact patterns and the dynamics of epidemics, but these models typically ignore heterogeneities in within-group composition. Here, we analyse a flexible mathematical model of disease transmission in a hierarchically structured wildlife population, and find that increased variation in group size reduces the epidemic threshold, making social animal populations susceptible to a broader range of pathogens. Variation in group size also increases the likelihood of an epidemic for mildly transmissible diseases, but can reduce the likelihood and expected size of an epidemic for highly transmissible diseases. Further, we introduce the concept of epidemiological effective group size, which we define to be the group size of a hypothetical population containing groups of identical size that has the same epidemic threshold as an observed population. Using data from the Serengeti Lion Project, we find that pride-living Serengeti lions are epidemiologically comparable to a homogeneous population with up to 20 per cent larger prides.
The structure of animal and human societies is often hierarchical. In their simplest form, societies have two hierarchical levels: individuals form groups, and groups form populations. In some species, such as humans, hamadryas baboons (Papio hamadryas), gelada (Theropithecus gelada) or zebras (Equus burchelli), populations comprise additional hierarchical levels referred to as clans, bands, herds or troops [1–4]. In such multi-level societies, individuals interact preferentially with other members of their own social clade, and may only rarely come into contact with individuals belonging to distant clades. Theoretical work has shown that variable contact rates between individuals or groups of individuals alter the spread of infectious diseases and other transmissible elements, such as information and cultural traits [5–7].
Network models have been used extensively to explore the impact of wildlife contact patterns on disease transmission [8,9]. With this approach, one can represent animal societies by networks in which individuals or groups of individuals constitute nodes (or vertices) that are connected by edges representing their social interactions. The number of connections per node is called the degree, and the distribution of degrees across all nodes in a network is called the degree distribution. In network epidemiology, nodes are commonly characterized by a discrete state variable taking on the values: susceptible (S), infected (I) or removed (R). Nodes transition from one state to another depending on their own state and the states of neighbouring nodes. Network models have provided important insights into the epidemiological effects of contact heterogeneity, but are often limited to a single layer of connectivity. Hierarchically structured populations are often modelled using networks in which nodes represent entire groups [12–16]. Such models ignore both within-group connectivity and variation in group size, both of which can influence disease transmission.
Here, we introduce a network-based model of infectious disease transmission in a two-level hierarchically structured population, and use it to investigate the impact of group size distributions on disease dynamics. We focus specifically on the mean and the variance of group sizes. If the transmission probability between two connected groups is assumed to be positively correlated with the sizes of the groups, then larger mean group sizes are expected to facilitate disease transmission. However, the variance of group size works in subtler ways. We use bond percolation methods [7,17,18] to derive expressions for (i) epidemic thresholds (above which the disease has the potential to infect a large number of individuals); (ii) expected size and variance of small outbreaks; and (iii) the probability and expected size of epidemics in two-level hierarchical networks. We show that when the mean group size is held constant, the variance of group size strongly influences transmission dynamics. Whether it facilitates or hinders disease spread depends on the transmission rate of the disease.
Using our model, we then study the impact of pride size distributions on disease transmission in a well-studied lion population (Panthera leo) in the Serengeti National Park, Tanzania. This population has been affected by multiple infectious disease outbreaks [20,21], some of which have caused substantial mortality and could pose conservation threats, including canine distemper virus and feline immunodeficiency virus [22–25]. The mean pride size is approximately 10 individuals, whereas the variance in pride size is six times that. We show that such high pride size variance has considerable epidemiological consequences, and should be explicitly considered in epidemiological and conservation assessments of wildlife populations.
2. Theoretical model
Consider an infectious disease spreading through an undirected network in which nodes represent social groups of different sizes and edges represent contacts between groups (figure 1a). Disease can be transmitted between groups when infected individuals from one group come into contact with susceptible individuals from another group. In the following model, we assume that the probability of such an event depends on the sizes of the two interacting groups and that, once the disease reaches a group, all members of the group eventually become infected (the latter assumption can be relaxed, see §4).
In this section, we present and provide mathematical motivation for several key epidemiological quantities: (i) the epidemic threshold, which is a value that indicates how contagious a disease must be to cause an epidemic in a particular population, (ii) for diseases below the epidemic threshold, estimates of the mean and variance in outbreak sizes, (iii) for diseases above the threshold, the probability and expected size of an epidemic, and (iv) the epidemiological effective group size, which is a metric that indicates the epidemiological vulnerability of a population via comparison with hypothetical populations with equal-sized groups.
2.1. Epidemic threshold
We will present epidemiological quantities for group-to-group contact networks that rely on two probability distributions: $S_0(n)$, the probability that a randomly chosen group has size $n$, and $D_0(k)$, the probability that a randomly chosen group has degree $k$ (where the degree of a node is the number of edges connecting it to other nodes). We assume that group size and degree are independent variables, that is, the probability that a random group has size $n$ and degree $k$ is simply the product $S_0(n) \cdot D_0(k)$.
Suppose that an entirely infected group of size $n_1$ is connected to an entirely susceptible group of size $n_2$. The probability $T$ that the disease is transmitted between an infected individual from the first group and a susceptible individual from the second group is given by

$$T = 1 - e^{-\beta\tau}, \tag{2.1}$$

where $\beta$ is the rate of disease-causing contacts between individuals in connected groups and $\tau$ is the duration of infectiousness. Each parameter is assumed to be identical for all individuals. The probability that at least one individual from the second group becomes infected is then

$$\theta(n_1, n_2) = 1 - (1 - T)^{n_1 n_2} = 1 - e^{-\beta\tau n_1 n_2}. \tag{2.2}$$

Hereafter, we will assume that $\beta\tau n_1 n_2$ is small compared with 1, so that $\theta(n_1, n_2)$ can be approximated by its first-order Taylor series expansion (to the same order, $T \approx \beta\tau$):

$$\theta(n_1, n_2) \approx \beta\tau\, n_1 n_2 \approx T\, n_1 n_2. \tag{2.3}$$

As described below, this approximation allows the derivation of closed-form expressions of important epidemiological quantities, and is within 10 per cent of the exact value of $\theta(n_1, n_2)$ as long as $\theta(n_1, n_2) < 0.19$, and within 20 per cent if $\theta(n_1, n_2) < 0.37$.
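The accuracy bounds quoted above can be checked numerically. The sketch below assumes that the exact between-group transmission probability has the form $1 - e^{-\beta\tau n_1 n_2}$ and compares it with the linear approximation $\beta\tau n_1 n_2$. The function names and the example values of $\beta\tau$ are illustrative assumptions, not part of the original analysis.

```python
import math


def theta_exact(beta_tau, n1, n2):
    """Exact probability that at least one between-group transmission occurs."""
    return 1.0 - math.exp(-beta_tau * n1 * n2)


def theta_linear(beta_tau, n1, n2):
    """First-order Taylor approximation of theta_exact."""
    return beta_tau * n1 * n2


def relative_error(beta_tau, n1, n2):
    """Relative overestimate of the linear approximation (always >= 0)."""
    exact = theta_exact(beta_tau, n1, n2)
    return (theta_linear(beta_tau, n1, n2) - exact) / exact


if __name__ == "__main__":
    # Two groups of 10 individuals; beta*tau chosen so that the
    # approximate theta equals 0.05, 0.19 and 0.37 (illustrative values).
    for beta_tau in (0.0005, 0.0019, 0.0037):
        approx = theta_linear(beta_tau, 10, 10)
        err = relative_error(beta_tau, 10, 10)
        print(f"theta ~ {approx:.2f}: relative error {err:.3f}")
```

The linear form always overestimates the exact probability, so the closed-form results that rely on it are most reliable for mildly transmissible diseases and small to moderate group sizes.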
Consider an outbreak that originates with a single infected group of size $n_1$. This group is expected to transmit the disease along each of its outgoing edges with average probability given by

$$\theta(n_1) = \sum_{n_2} S_0(n_2)\,\theta(n_1, n_2) \approx T\, n_1 \langle n \rangle, \tag{2.4}$$

where $\langle n \rangle$ is the average group size in the population. Assuming the group has degree $k$, the number of outgoing edges $m$ along which the disease is expected to be transmitted then follows a binomial distribution with parameters $k$ and $\theta(n_1)$:

$$P(m \mid k, n_1) = \binom{k}{m}\,\theta(n_1)^m\,[1 - \theta(n_1)]^{k-m}. \tag{2.5}$$

We will refer to these $m$ edges as ‘infective edges’.
Hereafter, we will make use of probability-generating functions (PGFs), which have been used extensively to study epidemiological properties of networks [7,17,18,26]. PGFs are functions of the form

$$G(x) = \sum_{k=0}^{\infty} p_k x^k, \tag{2.6}$$

where $p_k$ is the probability of a random variable taking discrete value $k$. They have interesting mathematical properties [10,17]. For example, $G(1)$ is the sum of all $p_k$'s, and hence is equal to 1, and the first derivative of the PGF,

$$G'(x) = \sum_{k=0}^{\infty} k\, p_k x^{k-1}, \tag{2.7}$$

is equal to the mean of the distribution when $x = 1$, and can be combined with the second derivative to obtain the variance of the distribution: $G''(1) + G'(1) - [G'(1)]^2$.
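As a small illustration of these PGF properties, the following Python sketch evaluates a PGF and recovers the mean and variance from its derivatives at $x = 1$. The three-point distribution `probs` and the function names are arbitrary choices made for this example.

```python
def pgf(p, x):
    """Evaluate G(x) = sum_k p[k] * x**k for a list of probabilities p."""
    return sum(pk * x ** k for k, pk in enumerate(p))


def pgf_first_derivative_at_one(p):
    """G'(1) = sum_k k * p[k], the mean of the distribution."""
    return sum(k * pk for k, pk in enumerate(p))


def pgf_second_derivative_at_one(p):
    """G''(1) = sum_k k * (k - 1) * p[k]."""
    return sum(k * (k - 1) * pk for k, pk in enumerate(p))


def pgf_variance(p):
    """Variance obtained as G''(1) + G'(1) - G'(1)**2."""
    g1 = pgf_first_derivative_at_one(p)
    g2 = pgf_second_derivative_at_one(p)
    return g2 + g1 - g1 ** 2


# Illustrative distribution: P(0) = 0.2, P(1) = 0.5, P(2) = 0.3.
probs = [0.2, 0.5, 0.3]

if __name__ == "__main__":
    print("G(1)     =", pgf(probs, 1.0))
    print("mean     =", pgf_first_derivative_at_one(probs))
    print("variance =", pgf_variance(probs))
```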
Suppose the first infected group in an outbreak is chosen randomly from an entirely susceptible population. The PGF of the number of other groups infected by that initial group is given by

$$G_0(x) = \sum_{n}\sum_{k} S_0(n)\,D_0(k) \sum_{m=0}^{k} \binom{k}{m}\,\theta(n)^m\,[1 - \theta(n)]^{k-m}\,x^m. \tag{2.8}$$

Making use of the relation

$$\sum_{m=0}^{k} \binom{k}{m}\,[x\,\theta(n)]^m\,[1 - \theta(n)]^{k-m} = [1 + (x - 1)\,\theta(n)]^k, \tag{2.9}$$

we obtain

$$G_0(x) = \sum_{n}\sum_{k} S_0(n)\,D_0(k)\,[1 + (x - 1)\,\theta(n)]^k. \tag{2.10}$$

Let us now determine the PGF of the distribution of the number of infective edges leaving a group reached by following a random infective edge, denoted $G_1(x)$. Intuitively, groups that are highly connected and/or large are more likely to become infected than their less connected and/or smaller counterparts. The probability of reaching a group of degree $k$ by following a random infective edge is proportional to $k \cdot D_0(k)$. Similarly, by approximation (2.3), the probability of reaching a group of size $n$ by following a random infective edge is proportional to $n \cdot S_0(n)$. $G_1(x)$ is therefore proportional to

$$g_1(x) = \sum_{n}\sum_{k} n\,S_0(n)\,k\,D_0(k)\,[1 + (x - 1)\,\theta(n)]^{k-1}. \tag{2.11}$$

$G_1(x)$ is obtained by normalizing $g_1(x)$, which makes the probabilities sum to one:

$$G_1(x) = \frac{g_1(x)}{g_1(1)} = \frac{1}{\langle n \rangle \langle k \rangle} \sum_{n}\sum_{k} n\,S_0(n)\,k\,D_0(k)\,[1 + (x - 1)\,\theta(n)]^{k-1}. \tag{2.12}$$

During the course of an outbreak, an infected group will infect on average $G_1'(1)$ susceptible groups. If $G_1'(1) > 1$, then an outbreak is expected to continue increasing in size, and may grow into an epidemic that is expected to scale with the size of the network (described in detail below). The epidemic threshold $T_c$, beyond which epidemics are possible, can be determined by solving

$$G_1'(1) = T_c\,\langle n^2 \rangle\,\frac{\langle k^2 \rangle - \langle k \rangle}{\langle k \rangle} = 1, \quad \text{that is,} \quad T_c = \frac{\langle k \rangle}{\langle n^2 \rangle\,(\langle k^2 \rangle - \langle k \rangle)}. \tag{2.13}$$

The second moment of the group size distribution, $\langle n^2 \rangle$, is a function of the mean and variance of the group size distribution: $\langle n^2 \rangle = \langle n \rangle^2 + \sigma_n^2$. In agreement with equation (2.13), numerous studies have already shown that larger mean group sizes typically promote the spread of infectious disease outbreaks [27–32]. Here, we focus instead on the role of the variance of the group size distribution.
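The threshold condition can be illustrated with a short Python sketch. It assumes the linearized transmission probability $\theta(n) = T n \langle n \rangle$, under which setting the mean number of secondary infected groups, $G_1'(1)$, equal to one gives $T_c = \langle k \rangle / [\langle n^2 \rangle (\langle k^2 \rangle - \langle k \rangle)]$. The distributions `D0`, `equal_sizes` and `variable_sizes` are hypothetical and chosen only to share the same mean group size.

```python
def moments(dist):
    """First and second raw moments of a {value: probability} distribution."""
    m1 = sum(v * p for v, p in dist.items())
    m2 = sum(v * v * p for v, p in dist.items())
    return m1, m2


def g1_prime_at_one(S0, D0, T):
    """Mean number of groups infected by a group reached along an infective
    edge, assuming theta(n) = T * n * <n> (linear approximation)."""
    mean_n, _ = moments(S0)
    mean_k, _ = moments(D0)
    total = 0.0
    for n, pn in S0.items():
        theta_n = T * n * mean_n
        for k, pk in D0.items():
            total += n * pn * k * (k - 1) * pk * theta_n
    return total / (mean_n * mean_k)


def epidemic_threshold(S0, D0):
    """T_c = <k> / (<n^2> * (<k^2> - <k>))."""
    _, mean_n2 = moments(S0)
    mean_k, mean_k2 = moments(D0)
    return mean_k / (mean_n2 * (mean_k2 - mean_k))


# Hypothetical degree distribution and two group size distributions
# with the same mean (10 individuals) but different variances.
D0 = {2: 0.5, 4: 0.3, 8: 0.2}
equal_sizes = {10: 1.0}
variable_sizes = {2: 0.5, 18: 0.5}

if __name__ == "__main__":
    for label, S0 in (("equal", equal_sizes), ("variable", variable_sizes)):
        print(label, "T_c =", epidemic_threshold(S0, D0))
```

With these inputs the variable-size population has the lower threshold, which is the qualitative effect of group size variance described in the text.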
As the mean and variance of the group size distribution increase, the epidemic threshold decreases. In other words, the epidemiological vulnerability of the population increases. This stems from the assumption that between-group transmission is proportional to the sizes of interacting groups, making larger groups more vulnerable to infection. Figure 2 depicts the approximate probability distribution of the size of newly infected groups after a few transmission events, as given by

$$S_1(n) = \frac{n\,S_0(n)}{\langle n \rangle}, \tag{2.14}$$

with mean

$$\langle n \rangle_1 = \frac{\langle n^2 \rangle}{\langle n \rangle} = \langle n \rangle + \frac{\sigma_n^2}{\langle n \rangle}. \tag{2.15}$$

This equation tells us that the expected size of infected groups early in an outbreak is higher than the average group size, and that it increases with the variance in group size. Because the between-group transmission rate is proportional to the size of the infected group, the early epidemiological vulnerability of a population also increases with variance in group size. Thus, variance in group size enhances transmission via the presence of large groups (albeit relatively rare) that have the unfortunate combination of high probability of catching disease from other groups and high probability of spreading disease to other groups. Note that $S_1(n)$ does not give the probability distribution for groups infected throughout the entire epidemic, but applies to the early stages of transmission. The sizes of groups infected very early and later in the epidemic tend to be lower than those given by $S_1(n)$.
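The size-biasing argument can be made concrete with a minimal Python sketch. It assumes that groups are reached with probability proportional to $n \cdot S_0(n)$, so that the mean size of early-infected groups equals $\langle n \rangle + \sigma_n^2 / \langle n \rangle$. The two-point distribution `S0` is a made-up example.

```python
def mean(dist):
    """Mean of a {value: probability} distribution."""
    return sum(v * p for v, p in dist.items())


def variance(dist):
    """Variance of a {value: probability} distribution."""
    m = mean(dist)
    return sum((v - m) ** 2 * p for v, p in dist.items())


def size_biased(S0):
    """Distribution of the size of groups reached along an infective edge,
    assumed proportional to n * S0(n)."""
    mean_n = mean(S0)
    return {n: n * p / mean_n for n, p in S0.items()}


# Hypothetical population: half the groups have 2 members, half have 10.
S0 = {2: 0.5, 10: 0.5}
S1 = size_biased(S0)

if __name__ == "__main__":
    print("mean group size:               ", mean(S0))
    print("mean size of early-infected:   ", mean(S1))
    print("<n> + var/<n>:                 ", mean(S0) + variance(S0) / mean(S0))
```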
2.2. Mean and variance of outbreak sizes below the epidemic threshold
Functions $G_0(x)$ and $G_1(x)$ can be used to determine the distribution of the total number of nodes that become infected in an outbreak starting from a single infected group. Newman et al. and Newman demonstrated that this distribution is generated by the PGF $H_0(x)$ defined by the following system of equations:

$$H_0(x) = x\,G_0(H_1(x)), \qquad H_1(x) = x\,G_1(H_1(x)). \tag{2.16}$$
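The self-consistent form of the second equation makes closed-form expressions of $H_0(x)$ and $H_1(x)$ difficult to obtain. However, the mean outbreak size $\langle s \rangle$ can be written as a simple function of the mean and mean-squared degree ($\langle k \rangle$ and $\langle k^2 \rangle$), mean and mean-squared group size ($\langle n \rangle$ and $\langle n^2 \rangle$), and transmission rate $T$ (see the electronic supplementary material, S1.1):

$$\langle s \rangle = H_0'(1) = 1 + \frac{G_0'(1)}{1 - G_1'(1)} = 1 + \frac{T\,\langle k \rangle^2 \langle n \rangle^2}{\langle k \rangle - T\,\langle n^2 \rangle\,(\langle k^2 \rangle - \langle k \rangle)}. \tag{2.17}$$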
The second moment of the group size distribution, $\langle n^2 \rangle$, is equal to $\langle n \rangle^2 + \sigma_n^2$, where $\sigma_n^2$ is the variance of the group size distribution. The derivatives of $\langle s \rangle$ with respect to $\langle n \rangle$ and $\sigma_n^2$ are both positive real numbers. The mean outbreak size is hence an increasing function of both the mean and variance of group sizes. The variance of outbreak sizes, $\sigma_s^2$, can also be derived from equation (2.16); the resulting expression is equation (2.18). From this expression, we find that the variance in outbreak size, which indicates the unpredictability of outbreaks, is also an increasing function of both the mean and variance of group sizes.
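The dependence of the mean outbreak size on group size variance can be explored with the sketch below. It assumes the standard result $\langle s \rangle = 1 + G_0'(1) / [1 - G_1'(1)]$, with $G_0'(1) = T \langle k \rangle \langle n \rangle^2$ and $G_1'(1) = T \langle n^2 \rangle (\langle k^2 \rangle - \langle k \rangle) / \langle k \rangle$ under the linear approximation of the transmission probability. The function name and the numerical inputs are illustrative assumptions.

```python
def mean_outbreak_size(T, mean_k, mean_k2, mean_n, var_n):
    """Mean number of infected groups below the epidemic threshold,
    assuming <s> = 1 + G0'(1) / (1 - G1'(1))."""
    mean_n2 = mean_n ** 2 + var_n
    g0_prime = T * mean_k * mean_n ** 2
    g1_prime = T * mean_n2 * (mean_k2 - mean_k) / mean_k
    if g1_prime >= 1.0:
        raise ValueError("at or above the epidemic threshold: the mean diverges")
    return 1.0 + g0_prime / (1.0 - g1_prime)


if __name__ == "__main__":
    # Poisson-like degree moments (<k> = 4, <k^2> = 20), mean group size 10.
    for var_n in (0.0, 20.0, 60.0):
        s = mean_outbreak_size(0.001, 4.0, 20.0, 10.0, var_n)
        print(f"variance {var_n:5.1f}: mean outbreak size {s:.3f}")
```

Holding the mean group size fixed, the computed mean outbreak size grows with the variance until the threshold is reached, at which point the function raises an error instead of returning a meaningless value.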
2.3. Epidemic size and probability of an epidemic
The expression for $\langle s \rangle$ in equation (2.17) diverges when $G_1'(1) = 1$ and thus is valid only when $G_1'(1) < 1$, i.e. when $T < T_c$ (equation (2.13)). Above this epidemic threshold, outbreaks can either have a small, finite size or become epidemic, that is, have a large expected size that scales with the size of the network. By definition, $H_0(x)$ is the PGF of the size of finite outbreaks. Thus, $H_0(1)$ is the expected fraction of the network that, if infected, would yield small, finite-sized outbreaks. Following Newman, the expected size of an epidemic is thus given by

$$S = 1 - H_0(1) = 1 - G_0(H_1(1)), \tag{2.19}$$

where $H_1(1)$, the probability that the node at the end of a random edge does not get infected during an epidemic, can be obtained by solving

$$H_1(1) = G_1(H_1(1)). \tag{2.20}$$

Note that $S$ is both the proportion of the network affected by the epidemic and the probability that an epidemic occurs (as opposed to a small outbreak), under the assumption that the first infected group is randomly determined.
In the epidemic regime, the mean size of small outbreaks that do not reach epidemic proportions is given by (see the electronic supplementary material, S1.2)

$$\langle s \rangle = \frac{H_0'(1)}{H_0(1)} = 1 + \frac{H_1(1)\,G_0'(H_1(1))}{(1 - S)\,[1 - G_1'(H_1(1))]}. \tag{2.21}$$

Note that in the non-epidemic regime, $S = 0$ and $H_1(1) = 1$, which makes equations (2.21) and (2.17) equivalent.
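The self-consistency condition for $H_1(1)$ generally has to be solved numerically. The sketch below does so by fixed-point iteration, assuming $\theta(n) = T n \langle n \rangle$ and generating functions of the form $G_0(x) = \sum_{n,k} S_0(n) D_0(k) [1 + (x-1)\theta(n)]^k$ and $G_1(x) = \sum_{n,k} n S_0(n) k D_0(k) [1 + (x-1)\theta(n)]^{k-1} / (\langle n \rangle \langle k \rangle)$. The truncated Poisson degree distribution, the helper names and the parameter values are assumptions made for illustration; $T$ must be small enough that $\theta(n) \le 1$ for every group size.

```python
import math


def poisson_pmf(mean_k, k_max):
    """Poisson degree distribution truncated at k_max and renormalized."""
    raw = {k: math.exp(-mean_k) * mean_k ** k / math.factorial(k)
           for k in range(k_max + 1)}
    total = sum(raw.values())
    return {k: p / total for k, p in raw.items()}


def make_pgfs(S0, D0, T):
    """Return G0 and G1 for group sizes S0 and degrees D0 at transmissibility T."""
    mean_n = sum(n * p for n, p in S0.items())
    mean_k = sum(k * p for k, p in D0.items())

    def theta(n):
        return T * n * mean_n  # linear approximation; must not exceed 1

    def G0(x):
        return sum(pn * pk * (1.0 + (x - 1.0) * theta(n)) ** k
                   for n, pn in S0.items() for k, pk in D0.items())

    def G1(x):
        total = sum(n * pn * k * pk * (1.0 + (x - 1.0) * theta(n)) ** (k - 1)
                    for n, pn in S0.items() for k, pk in D0.items() if k > 0)
        return total / (mean_n * mean_k)

    return G0, G1


def epidemic_size(S0, D0, T, tol=1e-12, max_iter=100000):
    """Iterate u <- G1(u) from u = 0 (smallest fixed point); return (S, u)."""
    G0, G1 = make_pgfs(S0, D0, T)
    u = 0.0
    for _ in range(max_iter):
        u_next = G1(u)
        if abs(u_next - u) < tol:
            u = u_next
            break
        u = u_next
    return 1.0 - G0(u), u


if __name__ == "__main__":
    D0 = poisson_pmf(4.0, 60)
    S0 = {5: 1.0}  # identical groups of five (illustrative)
    for T in (0.005, 0.02):
        S, u = epidemic_size(S0, D0, T)
        print(f"T = {T}: epidemic size S = {S:.4f}, H1(1) = {u:.4f}")
```

Starting the iteration at zero matters: because $G_1$ is increasing and convex on $[0, 1]$ with $G_1(1) = 1$, the sequence climbs to the smallest fixed point, which is the relevant root above the threshold and equals one below it.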
2.4. Epidemiological effective group size
To further quantify the epidemiological significance of group size variance, we compare populations with variable group sizes to hypothetical populations with constant group sizes. The equations above tell us that all else being equal (same transmission rate, same degree distribution, same mean group size), a population with heterogeneous group sizes will be more vulnerable to epidemics (i.e. have a lower epidemic threshold) than a population with homogeneous group sizes. As the concept of epidemic threshold is specific to the field of epidemiology, the comparison of two epidemic thresholds may only be meaningful for a specialized audience. Instead, let us introduce a more meaningful quantity. For a given network with variable-sized groups, we ask which homogeneous network (with identical-sized groups) has the exact same epidemic threshold, when the mean group size is allowed to change, but not the degree distribution. In other words, if we reassigned all individuals in the original heterogeneous network to equal-sized groups (leaving the number of groups and degree distribution unchanged), then the population would become less vulnerable to epidemics. By how much would we then have to increase the size of those groups to reduce the epidemic threshold back to its original value? The unit of this quantity is a ‘number of individuals per group’, which makes it easily understandable by non-epidemiologists.
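Because all groups in the hypothetical equivalent population have the same size, $n_e$, the second moment of the group size distribution is simply $n_e^2$. Using equation (2.13), we see that the group size in this hypothetical equivalent population, $n_e$, must satisfy

$$\frac{\langle k \rangle}{n_e^2\,(\langle k^2 \rangle - \langle k \rangle)} = \frac{\langle k \rangle}{\langle n^2 \rangle\,(\langle k^2 \rangle - \langle k \rangle)}. \tag{2.22}$$

Because the degree distributions are identical, the mean size of the groups in the homogeneous network is given by

$$n_e = \sqrt{\langle n^2 \rangle} = \sqrt{\langle n \rangle^2 + \sigma_n^2}. \tag{2.23}$$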
This quantity is conceptually analogous to the ‘effective population size’ used in population genetics and defined as the size of the ideal population that would undergo the same amount of random genetic drift as the actual population. Hence, we call this quantity the epidemiological effective group size (EEGS). Notably, it depends only on the original group size distribution, and not on either the degree distribution or the transmission rate of the disease.
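A minimal Python sketch of the EEGS calculation follows. It assumes that matching epidemic thresholds amounts to matching second moments of the group size distribution, so that the EEGS is the square root of $\langle n \rangle^2 + \sigma_n^2$. The function names and the example group sizes are hypothetical.

```python
import math


def eegs(mean_n, var_n):
    """Epidemiological effective group size from the mean and variance
    of group sizes, assuming n_e = sqrt(<n>^2 + variance)."""
    return math.sqrt(mean_n ** 2 + var_n)


def eegs_from_sizes(sizes):
    """EEGS computed directly from a list of observed group sizes
    (root mean square of the sizes)."""
    return math.sqrt(sum(n * n for n in sizes) / len(sizes))


if __name__ == "__main__":
    # Hypothetical population: mean group size 10, variance 21.
    print("EEGS:", eegs(10.0, 21.0))
    # Hypothetical census of six groups.
    print("EEGS from census:", eegs_from_sizes([3, 5, 8, 10, 14, 20]))
```

Because only the first two moments of the group sizes enter, the quantity can be estimated from census data without any information on between-group contacts, provided the network is otherwise sufficiently random.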
We provide two examples of application of this model. The first example is theoretical. We present exact solutions for infinite networks where group sizes follow a negative-binomial distribution and edges are distributed randomly between nodes. Erdős & Rényi have shown that such networks have Poisson degree distributions (figure 1a). We then extend these results to networks with power-law (figure 1b) and exponential degree distributions. As a second example, we use simulations to investigate the impacts of group size distribution for an actual biological system, pride-living Serengeti lions [14,16].
3.1. Negative binomial group size distribution
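Consider a network composed of an infinite number of social groups, where group sizes follow a discrete distribution $S_0(n)$ in which mean and variance can be tuned independently. As an example, we will investigate the negative binomial distribution shifted one unit to the right to avoid groups of size zero:

$$S_0(n) = \frac{\Gamma(n - 1 + x)}{\Gamma(x)\,(n - 1)!}\,p^{x}\,(1 - p)^{n-1}, \qquad n \geq 1. \tag{3.1}$$

This probability distribution has mean given by

$$\langle n \rangle = 1 + \frac{x\,(1 - p)}{p}. \tag{3.2}$$

To help with interpretation, we will henceforth replace parameter $x$ with $(\langle n \rangle - 1)/(c - 1)$ and parameter $p$ with $1/c$. This gives the variance of the distribution a simpler expression:

$$\sigma_n^2 = c\,(\langle n \rangle - 1). \tag{3.3}$$

Parameter $c$ is thus equal to $\sigma_n^2 / (\langle n \rangle - 1)$ and measures the overdispersion of the distribution. When $c$ approaches 1, $S_0(n)$ tends towards

$$S_0(n) = \frac{(\langle n \rangle - 1)^{n-1}\,e^{-(\langle n \rangle - 1)}}{(n - 1)!}, \tag{3.4}$$

which is a Poisson distribution shifted one unit to the right. Assuming the network has a Poisson degree distribution implies that $\langle k^2 \rangle = \langle k \rangle^2 + \langle k \rangle$, which gives

$$G_1'(1) = T\,\langle k \rangle\,\langle n^2 \rangle = T\,\langle k \rangle\,[\langle n \rangle^2 + c\,(\langle n \rangle - 1)]. \tag{3.5}$$

The epidemic threshold can be derived by solving equation (2.13):

$$T_c = \frac{1}{\langle k \rangle\,[\langle n \rangle^2 + c\,(\langle n \rangle - 1)]}. \tag{3.6}$$

$T_c$ is a decreasing function of $\langle k \rangle$, $\langle n \rangle$ and $c$. The epidemic threshold is more easily reached when the mean degree, the mean group size and the coefficient of dispersion of group sizes are high (see figure 3a and electronic supplementary material, figure S2.1a).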
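The parameterization in terms of the mean group size and the coefficient of dispersion $c$ can be checked numerically. The sketch below assumes a negative binomial shifted by one unit with shape $(\langle n \rangle - 1)/(c - 1)$ and success probability $1/c$, which gives variance $c(\langle n \rangle - 1)$, and a Poisson-degree threshold of the form $1 / \{\langle k \rangle [\langle n \rangle^2 + c(\langle n \rangle - 1)]\}$. The function names, the truncation point and the mean degree are assumptions; the mean of 10.3 and dispersion of 6.3 are the lion estimates quoted later in the text, used here purely as inputs.

```python
import math


def shifted_negbin_pmf(n, mean_n, c):
    """P(group size = n) for a negative binomial shifted one unit to the
    right, with mean mean_n (> 1) and coefficient of dispersion c (> 1)."""
    if n < 1:
        return 0.0
    if c <= 1.0 or mean_n <= 1.0:
        raise ValueError("requires c > 1 and mean_n > 1")
    x = (mean_n - 1.0) / (c - 1.0)  # shape parameter
    p = 1.0 / c                     # success probability
    j = n - 1                       # unshifted count
    log_pmf = (math.lgamma(j + x) - math.lgamma(x) - math.lgamma(j + 1)
               + x * math.log(p) + j * math.log(1.0 - p))
    return math.exp(log_pmf)


def poisson_network_threshold(mean_k, mean_n, c):
    """Assumed threshold for a Poisson degree distribution."""
    return 1.0 / (mean_k * (mean_n ** 2 + c * (mean_n - 1.0)))


def numerical_moments(mean_n, c, n_max=600):
    """Total probability, mean and variance from the truncated pmf."""
    probs = [(n, shifted_negbin_pmf(n, mean_n, c)) for n in range(1, n_max + 1)]
    total = sum(p for _, p in probs)
    m1 = sum(n * p for n, p in probs)
    m2 = sum(n * n * p for n, p in probs)
    return total, m1, m2 - m1 ** 2


if __name__ == "__main__":
    total, m1, var = numerical_moments(10.3, 6.3)
    print("sum of probabilities:", total)
    print("mean:", m1, " variance:", var, " c * (mean - 1):", 6.3 * 9.3)
    # Mean degree of 4 is an arbitrary illustrative value.
    print("T_c:", poisson_network_threshold(4.0, 10.3, 6.3))
```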
The EEGS (equation (2.23)) for such a network is given by

$$n_e = \sqrt{\langle n \rangle^2 + c\,(\langle n \rangle - 1)}, \tag{3.7}$$

and the difference between $n_e$ and the actual mean group size $\langle n \rangle$, $n_e - \langle n \rangle$, is an increasing function of $\langle n \rangle$, which asymptotes to the constant value $c/2$. Therefore, when the mean group size is large, $n_e$ varies between $\langle n \rangle + 1/2$ (in the case of a Poisson group size distribution) and $+\infty$ (in the case of a highly overdispersed group size distribution, i.e. $c \to +\infty$).
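In the case of more dispersed degree distributions, the results are slightly different. Following several authors [17,33,34,38,39], we investigated the following degree distribution:

$$D_0(k) = \frac{k^{-\alpha}\,e^{-k/\kappa}}{C}, \qquad k \geq 1, \tag{3.8}$$

which is a power-law distribution with an exponential cut-off. Constant $C$, ensuring $\sum_k D_0(k) = 1$, is equal to $\mathrm{Li}_\alpha(e^{-1/\kappa})$, where $\mathrm{Li}_\alpha$ is the polylogarithm function, defined as

$$\mathrm{Li}_\alpha(z) = \sum_{k=1}^{\infty} \frac{z^k}{k^\alpha}. \tag{3.9}$$

When $\alpha = 0$, $D_0(k)$ becomes an exponential with scale parameter $\kappa$; and when $\kappa \to \infty$, it approaches a pure power-law distribution of exponent $\alpha$. According to equation (2.13), the epidemic threshold is proportional to

$$\frac{\langle k \rangle}{\langle k^2 \rangle - \langle k \rangle} = \frac{\mathrm{Li}_{\alpha-1}(e^{-1/\kappa})}{\mathrm{Li}_{\alpha-2}(e^{-1/\kappa}) - \mathrm{Li}_{\alpha-1}(e^{-1/\kappa})}, \tag{3.10}$$

which is an increasing function of $\alpha$ and a decreasing function of $\kappa$ (see the electronic supplementary material, figure S2.1b). This result is identical to that obtained with networks composed of simple nodes.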
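The degree-dependent factor of the threshold can be evaluated without a special-function library by truncating the polylogarithm series. The sketch below assumes that this factor equals $\langle k \rangle / (\langle k^2 \rangle - \langle k \rangle)$, written as a ratio of polylogarithms evaluated at $e^{-1/\kappa}$. The function names, the number of series terms and the parameter values are illustrative assumptions.

```python
import math


def polylog_at_cutoff(s, kappa, terms=5000):
    """Truncated series for Li_s(exp(-1/kappa)) = sum_{k>=1} exp(-k/kappa) / k**s.
    Converges geometrically because exp(-1/kappa) < 1."""
    return sum(math.exp(-k / kappa) * k ** (-s) for k in range(1, terms + 1))


def degree_factor(alpha, kappa):
    """<k> / (<k^2> - <k>) for D0(k) proportional to k**(-alpha) * exp(-k/kappa)."""
    li_1 = polylog_at_cutoff(alpha - 1.0, kappa)  # proportional to <k>
    li_2 = polylog_at_cutoff(alpha - 2.0, kappa)  # proportional to <k^2>
    return li_1 / (li_2 - li_1)


if __name__ == "__main__":
    for alpha, kappa in ((2.0, 10.0), (2.5, 10.0), (2.0, 20.0)):
        print(f"alpha = {alpha}, kappa = {kappa}: factor = {degree_factor(alpha, kappa):.4f}")
```

For $\alpha = 0$ the series can be summed in closed form, giving $(1 - z)/(2z)$ with $z = e^{-1/\kappa}$, which provides a convenient check on the truncation.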
Quantity S (the epidemic size or, equivalently, the probability of an epidemic) can be computed numerically for different values of the coefficient of dispersion of the group size distribution using equations (2.19) and (2.20). The effect of c depends on the contagiousness of the disease, as shown for a set of networks with Poisson degree distributions (figure 3a). When the contagiousness is low, variation in group size increases the mean epidemic size. Just as group size variability increases epidemiological vulnerability early in an outbreak via enhanced infection to and transmission from large groups, it also leads to increases in sizes of infected groups and resulting probabilities of transmission throughout epidemics of mildly transmissible diseases. However, when the disease is highly contagious, increasing the variance of the group size distribution leads to smaller epidemics. The crossing of the lines in figure 3 may seem surprising, as it shows that networks with the lowest epidemic thresholds (highest vulnerability) and large expected epidemic sizes for mildly contagious diseases are expected to experience the smallest epidemics for highly contagious diseases, and vice versa. For networks with heterogeneous group sizes, the flip side of having some large groups that increase the epidemiological vulnerability of the population as a whole is also having small groups that are unlikely to become infected. Recall that the transmission probability, θ(n1, n2), is proportional to the product of the sizes of the two connected groups. Small groups thus are relatively protected from infection, and may result in stochastic termination of transmission chains.
Interestingly, the quantity $H_1(1)$ computed using equation (2.20) can also be used to derive the probability of a group of size $n$ and degree $k$ remaining uninfected during an epidemic:

$$P(\text{uninfected} \mid n, k) = [1 - \theta(n) + \theta(n)\,H_1(1)]^k, \tag{3.11}$$

where $1 - \theta(n)$ is the probability of an edge remaining uninfected, and $\theta(n) \cdot H_1(1)$ is the probability of an infective edge that is not connected to the part of the network affected by the epidemic. Using Bayes' theorem, we can then calculate the group size and degree distributions of uninfected and infected groups during an epidemic:

$$P(n, k \mid \text{uninfected}) = \frac{S_0(n)\,D_0(k)\,[1 - \theta(n) + \theta(n)\,H_1(1)]^k}{1 - S}, \qquad P(n, k \mid \text{infected}) = \frac{S_0(n)\,D_0(k)\,\{1 - [1 - \theta(n) + \theta(n)\,H_1(1)]^k\}}{S}. \tag{3.12}$$

As expected, groups that become infected during an epidemic have on average a larger size and a larger degree than groups that remain uninfected (see the electronic supplementary material, figure S2.2).
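The Bayes step can be sketched in a few lines of Python. The code assumes that a group of size $n$ and degree $k$ escapes infection with probability $[1 - \theta(n) + \theta(n) u]^k$, where $\theta(n) = T n \langle n \rangle$ and $u$ stands for $H_1(1)$. Here $u$ is supplied as an input rather than solved for, and the distributions and parameter values are hypothetical.

```python
def prob_uninfected(n, k, T, mean_n, u):
    """Probability that a group of size n and degree k stays uninfected,
    given u = H1(1) and theta(n) = T * n * mean_n."""
    theta_n = T * n * mean_n
    return (1.0 - theta_n + theta_n * u) ** k


def split_by_outcome(S0, D0, T, u):
    """Joint (size, degree) distributions of uninfected and infected groups,
    obtained with Bayes' theorem and normalized by their own totals."""
    mean_n = sum(n * p for n, p in S0.items())
    uninfected, infected = {}, {}
    for n, pn in S0.items():
        for k, pk in D0.items():
            q = prob_uninfected(n, k, T, mean_n, u)
            uninfected[(n, k)] = pn * pk * q
            infected[(n, k)] = pn * pk * (1.0 - q)
    z_u = sum(uninfected.values())
    z_i = sum(infected.values())
    uninfected = {key: v / z_u for key, v in uninfected.items()}
    infected = {key: v / z_i for key, v in infected.items()}
    return uninfected, infected


def mean_size(joint):
    """Mean group size under a joint (size, degree) distribution."""
    return sum(n * p for (n, _), p in joint.items())


def mean_degree(joint):
    """Mean degree under a joint (size, degree) distribution."""
    return sum(k * p for (_, k), p in joint.items())


# Hypothetical inputs: two group sizes, two degrees, u assumed known.
S0 = {2: 0.5, 10: 0.5}
D0 = {1: 0.5, 5: 0.5}
uninfected, infected = split_by_outcome(S0, D0, T=0.01, u=0.6)

if __name__ == "__main__":
    print("mean size   (infected, uninfected):", mean_size(infected), mean_size(uninfected))
    print("mean degree (infected, uninfected):", mean_degree(infected), mean_degree(uninfected))
```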
3.2. Disease spread in Serengeti lions
Craft et al. [14,16] have described the structure and properties of the contact patterns among approximately 180 lion prides in Serengeti National Park, Tanzania, using detailed demographic and behavioural data collected since 1965 on a subset of the population. They developed an algorithm to simulate stochastic networks with statistical properties similar to the real pride network. They also showed that pride size and degree did not correlate, as assumed by our model. We used this algorithm to generate hypothetical pride networks (figure 1c).
The distribution of pride sizes in these networks was assumed to be the shifted negative binomial distribution defined in equation (3.1). Maximum-likelihood estimates of parameters $\langle n \rangle$ and $c$ were calculated from composition data on 21 prides recorded in 1992. We estimated a mean pride size of 10.3 individuals (females and cubs older than three months), and a coefficient of dispersion of 6.3. A likelihood-ratio test indicated that the negative binomial fitted the data significantly better than a Poisson distribution ($\chi^2 = 26.2$, d.f. = 1, $p = 3 \times 10^{-7}$; see the electronic supplementary material, figure S2.3).
We performed chain-binomial simulations of disease outbreaks through the hypothetical Serengeti lion populations, keeping mean pride size constant and varying the value of c between 1 and 20 . The lion pride network is relatively small, making it difficult to distinguish between small, non-epidemic outbreaks and full epidemics. Thus, rather than estimating epidemic thresholds and epidemic sizes from the simulations, we instead computed final outbreak sizes, defined as the proportion of prides that became infected. The results, displayed in figure 3b, are qualitatively comparable to the analytical results obtained with a Poisson network of infinite size (figure 3a). When the contagiousness is low (i.e. when outbreaks affect on average less than approx. 20–30% of the prides), group size variability promotes larger outbreaks. When the contagiousness increases, however, the trend reverses and increased group size variability leads to smaller outbreaks. Thus, the epidemiological impact of group size variability can be positive or negative, depending on the infectious properties of the spreading disease.
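A chain-binomial simulation of this kind can be sketched as follows. This is not the pride-network algorithm of Craft et al.: for illustration it assumes an Erdős–Rényi random network, a between-group transmission probability of $1 - e^{-\beta\tau n_1 n_2}$, and groups that remain infectious for a single generation. All names and parameter values are hypothetical.

```python
import math
import random


def random_network(n_groups, mean_degree, rng):
    """Erdos-Renyi network (an assumption, not the real pride network):
    each pair of groups is linked with probability mean_degree / (n_groups - 1)."""
    p_edge = mean_degree / (n_groups - 1)
    neighbours = {i: set() for i in range(n_groups)}
    for i in range(n_groups):
        for j in range(i + 1, n_groups):
            if rng.random() < p_edge:
                neighbours[i].add(j)
                neighbours[j].add(i)
    return neighbours


def simulate_outbreak(neighbours, sizes, beta_tau, rng):
    """Chain-binomial outbreak seeded in one random group.
    Returns the proportion of groups that ever become infected."""
    seed = rng.randrange(len(sizes))
    infected = {seed}
    current = [seed]
    while current:
        next_generation = []
        for g in current:
            for h in neighbours[g]:
                if h in infected:
                    continue
                # Probability that at least one transmission occurs between groups.
                theta = 1.0 - math.exp(-beta_tau * sizes[g] * sizes[h])
                if rng.random() < theta:
                    infected.add(h)
                    next_generation.append(h)
        current = next_generation  # earlier generations are now removed
    return len(infected) / len(sizes)


def mean_final_size(neighbours, sizes, beta_tau, n_runs, rng):
    """Average final outbreak size over n_runs independent outbreaks."""
    runs = [simulate_outbreak(neighbours, sizes, beta_tau, rng) for _ in range(n_runs)]
    return sum(runs) / n_runs


if __name__ == "__main__":
    rng = random.Random(2013)
    network = random_network(180, 4.0, rng)
    equal = [10] * 180
    variable = [2 if i % 2 == 0 else 18 for i in range(180)]  # same mean, higher variance
    for beta_tau in (0.001, 0.005, 0.02):
        print(beta_tau,
              round(mean_final_size(network, equal, beta_tau, 200, rng), 3),
              round(mean_final_size(network, variable, beta_tau, 200, rng), 3))
```

Comparing the two size vectors across a range of transmission rates is one way to look for the reversal described in the text, although the outcome of any single run depends on the random network and seed.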
Because the Serengeti lion population forms a small, finite-sized network, we cannot estimate its epidemic threshold and the EEGS, as defined above. However, an analogous quantity can be calculated using simulations. For 30 different transmissibility values $T$ (corresponding to 30 different initial transmission rates ranging from zero to three), we simulated disease outbreaks in hypothetical networks with homogeneous pride sizes, $n$, ranging from five to 20 individuals per pride. For each of the 480 combinations of $T$ and $n$, we simulated 100 outbreaks and calculated the mean outbreak size. For each $T$, we then determined the homogeneous pride size that yielded the closest mean outbreak size to the empirical population (figure 4). For diseases expected to have total attack rates of less than 23 per cent of all prides, the observed lion population was found epidemiologically equivalent to populations in which all prides include 11 or 12 individuals (figure 4), that is, populations with 10–20% more individuals than the empirical population.
Our analyses demonstrate the epidemiological importance of variation in group size in social species. As the variance of the group size distribution increases, the model demonstrates that the epidemic threshold decreases and that both the mean and variance of small outbreak sizes should increase. Above the epidemic threshold, that is, for diseases capable of causing large epidemics, the effect of group size variability depends on the transmissibility of the disease. For mildly contagious diseases, more variable group sizes promote larger epidemics, whereas for more highly contagious diseases, the effect reverses and group size variability inhibits epidemics.
These findings have important implications for disease-control strategies, including vaccination, quarantine and culling, which often target social groups that are likely to decrease the epidemic threshold. Prior studies have highlighted the importance of targeting groups with high numbers of interacting neighbours or who occupy a central location in the network . Our study suggests that factoring the group size distribution into epidemiological assessments can improve both understanding of disease dynamics and efforts to prevent and mitigate outbreaks. For example, vaccination efforts should perhaps be targeted to reduce the variation in the number of susceptible individuals per group (perhaps by vaccinating a fraction of individuals in large groups) rather than simply reducing the total number of susceptible individuals population-wide or immunizing all individuals in a few groups. For conservation biologists, this work also suggests that assessments of disease risk in endangered animal populations should carefully consider variation in both contact patterns and group sizes.
The effect of the hierarchical structure of a population on the transmission of infectious diseases can similarly apply to the transmission of information. In human populations, the transmission of rumours and cultural traits between families or communities is thus probably influenced by the variance in social group size. One can also predict that the transmission of computer viruses between small networks of interconnected computers (e.g. intranets) is more likely to become epidemic when the size of subnetworks is more variable.
We have introduced EEGS as a simple, intuitive metric for quantifying the epidemiological impact of group size variation and the epidemiological vulnerability of a population. As long as the network structure of the population is sufficiently random (lacks clustering or other local structure), EEGS can be estimated using only the mean and variance of the group size distribution. EEGS is independent of degree distribution and indicates which homogeneous network with same-sized groups has comparable epidemiological properties. The original network and the EEGS homogeneous network share the same epidemic threshold and thus are vulnerable to the same suite of pathogens. However, the two networks will not necessarily experience outbreaks of similar magnitude. Analogous metrics can also be computed by equating the mean outbreak size (as illustrated here with the Serengeti lion network) or the probability of an epidemic instead of the epidemic threshold. However, such metrics can generally only be computed numerically or using Monte Carlo simulations.
The model makes several simplifying assumptions. First, it assumes that between-group transmission is an increasing function of the size of the groups. This happens, for example, when all the individuals of two interacting groups come into contact, or when movements of potentially infected individuals between connected social groups are proportional to the size of the groups (as in the gravity model [42,43]). This may also be the case when disease vectors such as mosquitoes are more likely to detect large social groups than small social groups (e.g. malaria in primates [44,45]). It also assumes that when disease reaches a group, all individuals in the group become infected. This may not hold when the within-group disease transmission rate or the within-group connectivity is low. The model can be modified to handle such scenarios by replacing the group size distribution with the distribution of within-group outbreak sizes. This distribution can, for example, be inferred using stochastic SIR compartmental models for highly intra-connected groups  or network-based models for sparser groups. Within-group network structures may be sensitive to stochastic effects and yield highly variable within-group outbreak sizes, which will, in turn, affect the epidemiological vulnerability of the larger population. In these cases, the dynamics of infectious diseases will depend on the two-level within- and between-group contact network.
Second, the model assumes that group sizes are randomly distributed across the landscape. In reality, group sizes may depend on the quality of locally available resources, with large groups occurring around resource-rich areas and small groups found in more marginal areas. If group sizes are spatially clustered, then the epidemiological impact of the group size variability may be different, depending on the size and spatial distribution of these clusters.
Finally, we assume that group size and degree are not correlated. If there is a positive correlation between the two, that is, larger groups have more inter-group contacts, then we expect the effects of group size variability to be amplified. The epidemic threshold would then be lower and epidemic size for high transmission rates would be smaller.
In conclusion, group size variability strongly impacts disease transmission in hierarchical populations. This pertains not only to group-living wild animals such as the Serengeti lions, but more generally to structured populations, including patch-structured wild plants, cultivated crops, herd-structured domestic livestock and community-structured humans.
We thank Rosalind Eggo and Alexandre Courtiol for useful discussions and suggestions. We also thank three anonymous reviewers for their useful comments. This work was supported by NSF grant (no. OISE-0804186) to M.E.C. and NSF grant (no. DEB-0749097) to L.A.M.
- Received March 4, 2013.
- Accepted March 19, 2013.
- © 2013 The Author(s) Published by the Royal Society. All rights reserved.