## Abstract

A network structure metric is herein suggested for the investigation of the behaviour of epidemic spreading processes in general network-structured populations. This simple measure, based on the algebraic powers of the adjacency matrix associated with the network in question, is shown to admit a heuristic interpretation as a representation of a spreading process similar to standard epidemic models. It is further shown that the values of this metric may be of use in understanding the dynamic pattern of epidemic spread on networks of greatly varying structural properties (e.g. the degree distribution, the assortativity/dissortativity and the clustering).

## 1. Introduction

The last decade has seen an explosion of interest in the properties of ‘complex networks’ (Albert & Barabasi 2002; Boccaletti 2006; Newman *et al*. 2006). The origin of the new focus may be traced essentially to two papers presenting the canonical examples of complex networks: first, the small-world (SW) networks of Watts & Strogatz (1998), and second, the scale-free network model of Barabasi & Albert (1999). These networks are considered ‘complex’ particularly in comparison with the Erdos–Renyi (ER) random graphs (Bollobas 2001), which have a somewhat longer history in the literature.

Network structure has been identified as an important consideration in different aspects of many areas of scientific investigation, including biochemistry and molecular biology (Jeong 2000; Giot 2003; Barabasi & Oltvai 2004), the Internet (Faloutsos *et al*. 1999; Vazquez *et al*. 2002; Eriksen 2003) and ecology (Sole & Montoya 2001; Dunne *et al*. 2002; Montoya & Sole 2002; Sole & Bascompte 2006). Theoretical studies have also highlighted the impact of network structural features on complex dynamical systems, in terms of dynamical synchronization (Nishikawa 2003; Chavez 2005; Arenas *et al*. 2006; Boccaletti 2006) and in the integration of multiple multistable subswitches (Wylie 2007).

This paper is concerned primarily with *epidemic spreading processes* on networks (Sattenspiel & Simon 1988; Hess 1996; Ball *et al*. 1997; Watts & Strogatz 1998; Pastor-Satorras & Vespignani 2001*a*; Moreno *et al*. 2002; Newman 2002; Gross *et al*. 2006). The mathematical modelling of the epidemiology of infectious diseases is an old and established subject (Diekmann & Heesterbeek 2000) that has until recently focused on ‘macroscopic’ differential equation models making some type of mean field/homogeneous mixing assumption (Hethcote 2000; Getz & Lloyd-Smith 2006). The flexibility of this approach has been increased through the use of multicompartment models that allow a population to be divided into subgroups interacting in very distinct ways (Getz 2006).

Nevertheless, the explicit incorporation of an underlying ‘microscopic’ network structure into epidemic models has yielded some very interesting connections and results. Grassberger (1983) compared epidemic spreading on a lattice network with percolation processes of interest to statistical physicists. More recently, Pastor-Satorras & Vespignani (2001*a*) showed that the standard picture of a minimum infectivity threshold required for epidemic outbreak does not hold for the models of infectious disease spreading on scale-free network topologies (this result has been recently extended to scale-free networks with clustering as well; Serrano & Boguna 2006).

The strategy for many theoretical network epidemiology papers is to suggest a specific (set of) algorithm(s) (or dynamics) for constructing the underlying networks, and then, either through simulation or analysis, to see how the process plays out on the resulting graphs (Andersson 1998; Watts & Strogatz 1998; Pastor-Satorras & Vespignani 2001*b*; Gross *et al*. 2006). Various structural metrics may be employed to offer insights into how varying some part of the network construction algorithm leads to a particular change in the pattern of spread of disease, but this is often not done in any systematic manner. That is, it is frequently possible to imagine other networks that yield the same values for the particular metric(s) employed, but which will yield dramatically different ‘spreading dynamics’. (Consider, for example, two networks with the same degree distribution but very different patterns of clustering.)

Alternatively, some papers develop an analytical framework for dealing with certain restricted classes of networks (generally those described either by the degree distribution only (Moreno *et al*. 2002; Newman 2002) or by the degree distribution and the degree–degree correlations, but no other features (Moreno *et al*. 2003)) without the need to specify exactly how such networks might be constructed. Still, the results connecting the input values of the structural parameters to the output measures of epidemic spreading on such networks are in these cases technically restricted to those special networks whose structures may be described entirely by the specific metrics considered (i.e. the degree distribution and the degree–degree correlations).

By contrast, we propose and explore a structure metric that is more directly relevant to spreading processes on general networks than the most commonly available alternatives. Also, it satisfies other useful properties, such as being computationally cheap to evaluate and admitting an intuitive interpretation.

## 2. Methods

### 2.1 Requirements for proposed metric

In order to give more precise meaning to our claim of defining a structure metric directly relevant to the behaviour of epidemic spreading processes on networks, it is necessary to consider what specific requirements must be imposed on such a hypothetical metric.

First, the metric should mimic the time course of spreading processes by consisting of a sequence of increasingly global values, loosely interpretable as the number of nodes (individuals) infected at successive time steps. In a finite network, this sequence should be subject to saturation at what is expected to be the upper limit to the size of the epidemic. Second, a versatile metric should be able to deal with complexities beyond the binary states of susceptible (S) or infected (I), for example, by including recovered/resistant states (R) that may or may not revert to susceptibility again (i.e. it would be useful if the measure could discriminate among the so-called SI, SIS, SIR and SIRS processes). Here, as a start, we focus our attention on a two-state model of the SIS type, but hopefully pointing the way to the development of measures useful for more complex problems.

Even restricting to SIS processes, one must still consider the continuum of the possible parameter values characterizing different SIS models. We confine our attention to two parameters: the transmission probability *p*_{I} that a susceptible node is infected by an infected neighbour in any one time step, and the recovery probability per time step *p*_{S} that an infected node recovers and reverts to susceptibility. Thus, we seek a metric that incorporates these two parameters, *p*_{I} and *p*_{S}, in an appropriate manner.

Finally, of course, the most important requirement must be the predictive power of the proposed metric when applied to epidemic spreading processes. For this paper, the processes to be predicted will be discrete time SIS simulations, as described in §2.6.

What sort of metric might satisfy these criteria? Given that the spreading phenomena must propagate along the paths present in the network, one might consider quantities derived from the numbers of distinct paths connecting all possible pairs of nodes. Unfortunately, these quantities are not particularly easy to calculate. However, a slight generalization from paths to walks—which differ from paths only in that they allow repetition of the same node at different steps—leads to significant mathematical simplification, as discussed in §2 in the electronic supplementary material.

### 2.2 Wellness and unwellness metrics

We hypothesize that the *k*-walk counts (*A*^{k})_{ij} from node *i* to *j* (for more details see §2 in the electronic supplementary material) provide useful information regarding the likelihood of transmission of infection from individual *i* to *j* over a *k*- and *p*_{S}-dependent time period. Here, (*A*^{k})_{ij} signifies the (*ij*)th entry of the adjacency matrix *A* raised to the *k*th power (i.e. matrix multiplied by itself *k* times). As discussed in §2 in the electronic supplementary material, (*A*^{k})_{ij} is also equal to the number of distinct walks of exactly *k* steps starting at node *i* and ending at node *j*: hence, the description ‘*k*-walk counts’.

Noting that *p*_{S}^{−1} defines the mean time for an infected individual to remain infectious before returning to susceptible status, it is natural to relate (*A*^{k})_{ij} to the level of disease transmission over *k* such ‘generation times’ *p*_{S}^{−1}. Thus, we seek an estimate for the probability *p*_{i}(*j*;*k*) of individual *j* being infected at time step *T*_{k}(*p*_{S})=ROUND[*k*^{*}*p*_{S}^{−1}] (ROUND[*x*] denotes rounding *x* to the nearest integer) of the form (the motivation for the introduction of *f*=(1−*g*) will be made clear below)(2.1)Obviously, this requires that *f*((*A*^{k})_{ij})∈[0,1]. We seek also to impose the condition that the *f*((*A*^{k})_{ij}) satisfy(2.2)i.e. the predicted infection status of *j* at time (*k*+1) depends only on the predicted infection statuses of those individuals *q* neighbouring *j* at the previous time *k*. This condition amounts to demanding that the quantities *f*((*A*^{k})_{ij}) form a localized dynamical system with discrete time index *k*.

Note that the interval [0,1] is closed with regard to binary multiplication (i.e. when both the numbers to be multiplied are taken from [0,1]) and to being raised to any positive power (but not with regard to addition). Thus, an obvious manner in which to simultaneously satisfy the saturation condition *f*((*A*^{k})_{ij})∈[0,1] and the localized dynamics condition described by equation (2.2) would be to impose the stronger condition(2.3)where *c*(*p*_{I},*p*_{S}) is a constant (i.e. independent of (*A*^{k})_{ij}) incorporating information regarding the transmission rate of the infection. Note that the presence of uninfected neighbours at time *k* does not influence the infection status of node *j* at time (*k*+1) in the dynamics resulting from equation (2.3), since a definitely uninfected node *q* must have *f*((*A*^{k})_{iq})=1. This is the motivation for using *f* as defined in equation (2.1) instead of *g*, since the latter unduly complicates the expression of equation (2.3).

Is there a simple choice for a function *f* satisfying equation (2.3)? In order to answer this question, consider the following equivalent form of equation (2.3):(2.4)That is, in algebraic terminology, *f* must be a homomorphism from the additive real numbers to the (positive) multiplicative real numbers. This essentially requires that *f* be an exponential function,(2.5)

In this manner, we are led to consider what we refer to as the ‘wellness metric’,(2.6)where *c*(*p*_{I}, *p*_{S}) has been specified as (*p*_{I}/*p*_{S}). This is the simplest possible form for *c* satisfying the requirement that for a large (both in the number of nodes and average number of connections 〈*d*〉 per node) random graph, the wellness metric should be able to accurately predict the epidemic threshold condition (in this case, corresponding to the more commonly written *R*_{0}>1)(2.7)That equation (2.6) does indeed reproduce this result for large random graphs can be demonstrated by considering the fact that the dominant eigenvalue of the adjacency matrix *A* of such a network must be approximately 〈*d*〉 (Farkas 2001). Since *A* is a symmetric matrix, it can be diagonalized by an orthogonal matrix *O*, *A*=*O*^{T}*DO*, where *D* is the diagonal matrix of eigenvalues of *A*. Then *A*^{k}=*O*^{T}*D*^{k}*O*, so that , where *λ*_{m}=*D*_{mm} is the *m*th eigenvalue of *A*. For large *k* then, (*A*^{k})_{ij}=〈*d*〉^{k}×[*O*_{1i}*O*_{1j}+ terms of order (*λ*_{2}*/*〈*d*〉)^{k}], where *λ*_{2} is the second largest magnitude eigenvalue of *A*, which is generally significantly smaller than 〈*d*〉 for a large random graph. Approximating *A*^{k}=〈*d*〉^{k}*O*_{1i}*O*_{1j}, it is then apparent that, if (*p*_{I}〈*d*〉/*p*_{S})<1, *w*_{ij}(*k*) from equation (2.6) will converge to 1 as *k* grows, while if instead (*p*_{I}〈*d*〉/*p*_{S})>1, *w*_{ij}(*k*) will decrease to 0.

Defining the ‘unwellness’ *u*_{ij}(*k*) by (so that *u*_{ij}(*k*)=*g*((*A*^{k})_{ij}) as defined in equation (2.1))(2.8)the sum Σ_{j}[*u*_{ij}(*k*)] is then taken as a crude indicator of the expected total level of infection after *T*_{k}=ROUND[*k*^{*}*p*_{S}^{−1}] time units resulting from an initial infection of individual *i*. Thus, the quantity(2.9)whereis here interpreted as affording a prediction of the expected total level of infection after *T*_{k} time units as a result of the initial infection of a single individual when that individual's identity is undetermined.

### 2.3 Unwellness connectivity variance versus node variance

In equation (2.9), bracket notation was introduced for the average of the quantities *u*_{ij}(*k*) over the indices *i* and *j*. That is, 〈*u*(*k*)〉 was introduced as an average over the degree to which all of the nodes are ‘connected’ to each other by *k*-walks filtered through the (*p*_{I}/*p*_{S})-dependent exponential saturation function defining the unwellness metric. In this sense, one could think of 〈*u*(*k*)〉 as the average value of the ‘unwellness-connectivities’, or for short, *u*-connectivities *u*_{ij}(*k*).

One could think of 〈*u*(*k*)〉 equally accurately, however, as the average(2.10)whereIn this interpretation, 〈*u*(*k*)〉 represents the average over all nodes *i* of what we will term the ‘unwellness node value’ (or *u*-node value) *u*_{i}(*k*). This *u*-node value then represents the average degree to which a particular node *i* is connected to the other nodes of the network by *k*-walks (again filtered by the saturation function of the unwellness metric). Heuristically speaking, the *u*-node value *u*_{i}(*k*) may be thought of as a measure of the average unwellness of any other node in the network as a result of the initial infection of the single node *i*.

The reason for introducing this alternative interpretation of 〈*u*(*k*)〉 becomes clear if one focuses on the *second* moments, instead of the mean or first moment of the distribution of the values *u*_{ij}(*k*). That is, the *u*-connectivity second moment,(2.11)is a very different quantity than the *u*-node second moment,(2.12)where the *u*-connectivity second moment 〈*u*(*k*)^{2}〉 relates to the variation in walk connectivity over all node pairs (*i*, *j*) and the *u*-node second moment {*u*(*k*)^{2}} relates instead to the variation over single nodes *i* of overall connectivity to the remainder of the network.

The manner in which the different types of network structures vary with respect to these measures is discussed further in §2.6.

### 2.4 Modified unwellness metric

In order to construct an improved epidemic metric, it should be noted that the dynamical system described by equation (2.6) (‘dynamic’ in the light of its equivalence to equation (2.3), discussed in §2.2) does differ qualitatively from the usual epidemic models in one very important way: the ‘wellness’ of individual *j* at time (*k*+1) may be arbitrarily close to 0 even if only one of its neighbours at time *k* has wellness differing from 1. That is, the interpretation of *w*_{ij}(*k*+1) as the probability that individual *j* is not infected at time (*k*+1) suffers from the unrealistic feature that this probability may be arbitrarily small, given even just one neighbour *q* with a sufficiently small value of *w*_{iq}(*k*).

We therefore seek to modify equation (2.9) for the predicted level of infection based on the unwellness metric to address this problem. Following the discussion above, it seems reasonable to suspect that the *u*-connectivity second moment 〈*u*(*k*)^{2}〉 (equation (2.11)) might be a particularly relevant quantity to consider. Since *u*_{ij}(*k*)≤1, it follows that 〈*u*(*k*)^{2}〉 must be less than or equal to 〈*u*(*k*)〉, though typically 〈*u*(*k*)^{2}〉 is much less than 〈*u*(*k*)〉 unless a substantial number of the quantities *u*_{ij}(*k*) are sufficiently close to unity (requiring *w*_{ij}(*k*)∼0). Note that this is precisely the condition in which the unwellness metric is expected to overpredict the transmission.

We also incorporate the *u*-node second moment {*u*(*k*)^{2}} (equation (2.12)) related to the variance in the network unwellness resulting from different choices of the initial infected node. Particularly, in the case where these sorts of variations in the unwellness quantities *u*_{ij}(*k*) are of similar or greater magnitude to their average value (e.g. in the scale-free networks with large degree variance), it seems highly relevant to consider this quantity.

We find that both of these quantities may be profitably incorporated into the *modified unwellness metric*,(2.13)The parameters *α*, *β* and *γ* appearing in equation (2.13) were fit by linear regression, as described in §2.8. The specific form of equation (2.13) (particularly, the choice of taking the square root of {*u*(*k*)^{2}} but not 〈*u*(*k*)^{2}〉) was chosen based on the resulting quality of the regression fit.

### 2.5 Application of unwellness metric to disease persistence

In §2.2–2.4, we develop the unwellness metric as a predictor of the dynamic progression of infection on networks for those cases in which the disease does not go extinct within a few generations of introduction. It is perhaps even more important, however, to gain insights into how the network structure influences the likelihood of such early extinction as opposed to persistence. This motivates us to apply a similar regression strategy to that used in obtaining the ‘modified unwellness metric’ to assess the degree to and manner in which the unwellness metric may be used as a predictor of the persistence of infection.

Consider first the logistic model(2.14)where *P*_{persistence} is the probability that an infection initiated at a single randomly chosen node will persist for several (here, 25) generation times and *k*_{early} is a fixed (low) value of the generation time index *k*. From the results of the regression analysis performed (as described in §2.8) on simulated SIS networks, the optimal value of *k*_{early} appears to be 2; this value is assumed below. This regression analysis also indicates that the *u*-connectivity second moment term 〈*u*(*k*_{early})^{2}〉/〈*u*(*k*_{early})〉 is not a significant predictor of persistence rates. Thus, we consider instead the reduced logistic model(2.15)

### 2.6 Network construction

Four different classes of networks are considered in this paper, as follows.

*ER**random graphs*with*n*nodes of average degree*d*were constructed by including each edge*i*↔*j*with (independent) probability (*d*/(*n*−1)).*SW**networks*with*n*nodes of (even) average degree*d*with rewiring probability*p*. These networks were constructed as in Watts & Strogatz (1998): after arranging the*n*nodes in a ring and connecting each node to the (*d*/2) nearest nodes on either side of it, each edge is randomly rewired with probability*p*to have one of its ends connected to a new node chosen with uniform probability from the set of all nodes in the system, which are not already neighbouring the edge's fixed end.*Barabasi–Albert**scale-free networks*with*n*nodes of average degree*d*. These networks were constructed as in Barabasi & Albert (1999). Starting with (*d*+1) fully connected nodes, new nodes are sequentially added to the network, with each new node bringing with it (*d*/2) new edges. Each of these (*d*/2) new edges has one terminus fixed at the new node*j*, with the other terminus selected to be previously existing node*i*(assuming*i*is not already a neighbour of*j*) with relative probability*d*_{i(j)}, where*d*_{i(j)}is the degree of node*i*with respect to the network made up of the first (*j*−1) nodes only. The network is completed after the addition of the*n*th node (and its corresponding edges).*Adjustable networks*with*n*nodes of average degree*d*, described by several additional parameters (table 1) controlling the degree distribution, the community structure/clustering and the assortativity of the networks. The parameter*C*controls the number of distinct communities into which the network will be divided, with the level of interconnection between distinct communities increasing with the parameter*N*_{cluster-swap}(see the electronic supplementary material). In conjunction with these clustering parameters, the parameter*d*_{sf}adjusts the variance of the degree distribution (increasing*d*_{sf}increases this variance). Finally, the sign of the parameter*p*_{sort}controls whether the network will be assortative (*p*_{sort}>0) or dissortative (*p*_{sort}<0), with the magnitude of*p*_{sort}adjusting the magnitude of the degree–degree correlations. The parameters*N*_{sort–swap}and*N*_{cluster–sort–swap}govern the number of iterations used in the assortativity adjustment algorithm (see the electronic supplementary material); as long as these are set to sufficiently high values, the desired*p*_{sort}-dependent assortativity patterns should be generated. The details of the algorithm used to construct these adjustable networks are described in §3 in the electronic supplementary material. Figure 1 shows how the standard deviation of the degree distribution, the transitivity and the assortativity (as measured by the Pearson correlation coefficient of the degree of the node at one end of an edge with that of the degree of the node at the other) vary with regard to the parameters in table 1 for the adjustable networks described here. (The values shown in figure 1 are averaged over 10 networks generated at each distinct combination of control parameters.) It is immediately apparent that the various parameters used to adjust the network structure do not work entirely independently of one another (as judged by these structural metrics). For instance, the variance of the degree distribution depends on both*d*_{sf}and on the number of communities the network is divided into, while the network transitivity depends very strongly on the assortativity rewiring parameter*p*_{sort}as well as the number of communities.View this table:

### 2.7 SIS simulations

The parameters defining the discrete-time network SIS models simulated to generate the data appearing in figures 2–4 are presented in table 2. For all of these simulations, the state of the dynamic system may be described as a network of labelled nodes. The network structure itself is here always taken to be constant (i.e. the network edges are fixed in time), but the ‘labels’ associated with the nodes—in this case, susceptible (S) or infected (I)—are functions of the discrete time index. The stochastic dynamics of the infection status of the nodes may then be described by giving the (independent) probabilities per time step of the various types of state-variable transitions available. There are only two basic types of transitions for the network SIS systems simulated herein.

*S→I transitions*. If node*j*is in the state S and has exactly*m*infected neighbours at time step*t*, node*j*will become infected at time step (*t*+1) with probability (1−(1−*p*_{I})^{m}). Otherwise, node*j*will remain in state S.View this table:*I→S transitions*. If node*j*is in the state I at time*t*, it will transition to state S at time (*t*+1) with probability*p*_{S}. Otherwise, node*j*will remain in the state I.

It should be emphasized that the discrete time step used for these simulations is not the same as the ‘generation time’ (which is equal to *p*_{S}^{−1} times the time step employed within these simulations) discussed in the context of the wellness metrics.

### 2.8 Regression for modified wellness metric and persistence probability models

In order to fit the parameters *α*, *β* and *γ* of the modified unwellness metric (see §2.4), 360 networks were constructed according to the adjustable network method described in §2.6 with systematically varied values of the network structure control parameters (table 1; figure 1). For each of the networks thus constructed, 50 SIS simulations were performed as described in §2.7. Those trajectories resulting from these stochastic simulations that did not result in rapid extinction (i.e. within 1000 time steps, or 25 generation times) of the infection were then used to estimate the parameters *α*, *β* and *γ* of the modified wellness metric. This was done by linear regression using the modelwith random error term *ϵ* and with denoting the number of infected nodes at time step (*k*/*p*_{S}) averaged over those trajectories with persistent infection. The results of this fit are shown in table 2.

The parameters *κ*, *λ* and *ν* of the logistic model for the probability of disease persistence were estimated similarly from the same simulation data. The results of this fit are indicated in table 3.

## 3. Results

Note that in all of figures 2–4, the simulation data presented resulted from running 250 trajectories on a single representative network of each type indicated and averaging the infection levels at each time step only over those trajectories exhibiting disease persistence through the entire 1000 time unit window run. The networks and simulations used to generate figures 2–4 were not used in the regression fits described in §§2.4 and 2.5.

### 3.1 Unmodified unwellness metric results

Figure 2*a–c* compares the results of SIS simulations (described in more detail in §2.7) on a variety of networks with the quantities *I*_{0}(*k*) (§2.2). These results indicate that while the unwellness metric given by equation (2.9) does capture some of the qualitative variation in epidemic progression on distinct network structures, structural features beyond 〈*u*(*k*)〉 must be important to take into account as well.

The quantities *I*_{0}(*k*)=*n*〈*u*(*k*)〉 track the average results of the SIS simulations well for some of the network types at early times, but overshoot the SIS simulations at later times. Particularly, with regard to the fact that the unwellness metric results asymptotically saturate at *u*_{ij}=1 for all *i* and *j* for those (connected) networks pictured, figure 2*a–c* demonstrates the consequences of the difference in the dynamics of the unwellness metric from the SIS simulations described in §2.7. The best match between the unwellness metric results and the SIS simulations is on the relatively featureless random graphs. The unwellness metric tends to show less difference in the epidemic spreading rate (in either direction) between the random graphs and other types of network structure than do the SIS simulations (figure 2*a*). This discrepancy is especially pronounced for the SW networks: while clustering appears to have the same qualitative effect of slowing down spreading in both the unwellness metric and the SIS simulations, the degree of slowdown in the simulations is considerably greater than it is for the unwellness metric.

### 3.2 Modified unwellness metric results

Figure 2*d–f* shows the modified unwellness metric *I*_{1}(*k*) (§2.4) fits rather well for a variety of networks, particularly the Erdos random graph, regular lattice, scale free and scale-free assortative networks. On the SW network featuring both a high degree of clustering and the ‘SW’ property of low characteristic path length, the modified wellness metric predicts the SIS simulation data very well over the first four to five generation times, but overpredicts the epidemic spread beyond this point. Finally, on the dissortative scale-free network, the prediction develops an interesting (but pathological) oscillatory behaviour starting at the fifth generation.

Figure 3 compares the results of the modified unwellness metric model of equation (2.13) with networks as the density of network connections is varied (in these cases, with the disease transmissibility parameter varying inversely so as to hold the expected number of secondary infections resulting from a randomly selected initially infected node constant). Two features of the results shown in figure 3 merit special attention. First, it is apparent that the model predictions generally fare better in comparison with the simulation data in denser networks than sparser networks. This trend is most noticeable for the highly clustered SW and lattice networks. Second, the predicted values for the first-generation time index *k*=1 overshoot the simulated values on the high degree-variance scale-free network topology at higher connectivity densities. Interestingly, as the generation time index *k* increases past 1, this behaviour rapidly vanishes.

Robustness of the fit parameters *α*, *β* and *γ* with respect to the ratio (which becomes equal to the epidemic threshold *R*_{0} in the limit of large random graphs with high average connectivity)was also studied through variation of the disease transmissibility *p*_{I} (table 2; figure 4). For the values of transmissibility considered, all three model parameters were significant, with *β* consistently negative and *γ* consistently positive. The parameter *γ* associated with the *u*-node second moment term appears to be more important for lower values of the transmissibility. This may reflect the larger importance of the bias of persistent disease trajectories towards those initiated from more well-connected nodes when disease extinction is a greater possibility (i.e. when transmissibility is lower). The quantitative value of the fit parameter *β* is considerably more robust to the disease transmissibility (table 2).

Finally, it is interesting to note that the results for the modified unwellness metric predictions with increased transmissibility show a qualitatively similar, albeit smaller, overshoot for the scale-free networks at the first-generation time index *k*=1 as appeared in the increased edge density situation.

### 3.3 Results of unwellness-based logistic model for persistence

The results of the regression fit of the model specified by equation (2.15) in §2.5 are shown in table 3. Table 4 displays the resulting predictions contrasted with the simulated data for various network types (again, these simulations were not used in the regression analysis determining the values of *κ*, *λ* and *ν*). The assortativity/dissortativity of the network structure exhibits a very large impact on the probability of persistence, which the model does a good job of capturing, largely through the *u*-node second moment term. Smaller effects seem to be associated with the clustering and the degree distribution.

Especially in combination with high clustering, reduced network density (again varying the infection transmissibility inversely to hold the product of connectivity and transmissibility constant) can lead to sharply decreased persistence probability. This is demonstrated by the simulation results in table 4. Also apparent in this table is the breakdown of the logistic model of equation (2.15) when applied to such sufficiently sparse (e.g. average degree 4 in the data presented in table 4) and clustered networks. Conversely, at higher network densities (e.g. networks of average degree 20 in table 4), the logistic model for persistence probability appears to perform much more reliably.

The parameter *λ* associated with the 〈*u*(*k*_{early})〉 (recall from §2.5 that *k*_{early}=2 here) term is found to be positive, suggesting that the networks for which the unwellness metric predicts faster epidemic spread tend to be more likely to sustain disease persistence. Interestingly, from the robustness with respect to transmissibility (and hence, with respect to the ratio (*p*_{I}〈*d*〉/*p*_{S})) results displayed in table 5, it is apparent that the impact of this unmodified unwellness metric term is much larger for relatively lower transmissibility.

The parameter *ν* associated with the *u*-node second moment term {*u*(*k*_{early})^{2}}^{1/2}/〈*u*(*k*_{early})〉 is found to be negative (and fairly robust to variation in the disease transmissibility). While considered in isolation, this might be taken to suggest that the networks in which there is especially large variation in the secondary infections produced as a result of infecting different initial nodes tend towards decreased propensity for disease persistence, examination of the results of simulations on (non-assortative) scale-free networks shows that this is not the whole story. In this case, in particular, the effects of the relatively large *u*-node second moment term are offset by a coinciding increase in the *u*-connectivity first moment term 〈*u*(*k*_{early})〉.

However, shifting attention to the assortative scale-free network simulation results, it becomes apparent that the combination of highly variable degree distribution and assortative degree correlations can result in even larger {*u*(*k*_{early})^{2}}^{1/2}/〈*u*(*k*_{early})〉 values without proportionally increased 〈*u*(*k*_{early})〉 values, and in this case there is indeed a dramatically lowered likelihood of disease persistence.

## 4. Discussion and conclusions

The ‘unwellness metric’, incorporating *u*-connectivity, the *u*-connectivity second moment and the *u*-node second moment, is introduced in this paper as a way of directly relating network structure data to epidemic spreading processes. This metric may be calculated in an entirely straightforward manner from the adjacency matrix of a network and herein is shown to contain a great deal of information regarding the rate of spreading of infection, as well as the likelihood of persistence of infection—at least, for networks with a sufficiently high edge density (approx. 10 edges per node).

It is worthwhile to pause for a moment to consider the relationship of the unwellness metric to a few more familiar measures of network structure. The networks with more variability in the distribution of node degrees (e.g. the scale-free networks) tend to lead to larger average values of the *u*-connectivity 〈*u*(*k*)〉 (figure 2*a*), while more highly clustered networks tend towards smaller average *u*-connectivity. These particular trends in *u*-connectivity appear to qualitatively reflect the different speeds with which the infections spread on such networks (figure 2*a*).

Somewhat surprisingly, the scale-free networks with either assortative or dissortative degree correlations were here found to exhibit lower *u*-connectivity values than those with no such correlations (figure 2*b–c*). While this phenomenon does seem to qualitatively match with the infection simulation results at times following the initial epidemic explosion, it does not capture the greater speed of outbreaks during the early transient phase in assortative networks (figure 2*c*). For this case, it appears that the increased variation, not just the decreased mean, of the distribution of *u*-connectivities *u*_{ij}(*k*) must be taken into account in understanding the transmission dynamics.

Two distinct measures of such variation were proposed in §2.4, the *u*-connectivity second moment 〈*u*(*k*)^{2}〉 and the *u*-node second moment {*u*(*k*)^{2}}. The *u*-connectivity second moment is associated with the variation in the strength of coupling of nodes taken in pairs, while the *u*-node second moment corresponds with the variation in connectivity of individual nodes to their surroundings. Large variation in the degree distribution tends to lead to relatively large values of the *u*-connectivity second moment compared with random graph structure, but at early generation times *k*, the network clustering leads to an even larger increase in this quantity (figure 5*a*,*b*). On the other hand, while, as one might intuit, more heterogeneous degree distributions lead to much larger values of the *u*-node second moment (figure 5*c*,*d*; again, for low *k*), clustering tends to depress this particular measure. Assortative degree correlations tend to further increase both the measures of variation for low *k* (figure 5). Dissortativity tends to lead to larger values of the *u*-connectivity second moment at low *k*, but exhibits a somewhat more complex pattern for the *u*-node second moment (figure 5).

The unwellness metric, modified to incorporate the variation measures as discussed, was used to predict both the probability that a single initial infection would lead to a persistent endemic infection (tables 4 and 5) and the transient time course of initial epidemic outbreak in persistent trajectories (figures 2–4). Both infection persistence and speed of outbreak tend to increase with the average *u*-connectivity. The speed of outbreak tends to be reduced for networks with larger *u*-connectivity second moments, but these quantities exhibited no significant effect on the persistence probability. Finally, networks for which the *u*-node second moment was particularly pronounced tended to show increased outbreak speed but decreased likelihood of persistence.

The results presented herein pertain to the SIS-type epidemic spreading processes for which recovery from infection provides negligible immunity to re-infection. The extension of the unwellness metric approach to the SIRS-type epidemic models poses some interesting challenges. The simplest option for such extension would be to apply the regression strategy employed in the modified unwellness metric discussed here to a set of SIRS models. In this manner, one might get a sense of how the structural features introduced here (the average *u*-connectivity, the *u*-connectivity second moment and the *u*-node second moment) influence the dynamics of infection in such models. One complication with this approach is that it would require multiple regressions to be done for situations involving different characteristic time scales for the loss of immunity (i.e. R→S).

A more complex approach to extending the unwellness metric to SIRS processes would be to try and replace the basic definition of *w*_{ij}(*k*) (equation (2.6)) with a quantity naturally incorporating the R→S rate parameter. For instance, one might try and modify *w*_{ij}(*k*) by subtracting out a quantity representing a measure of the number of immune individuals who might have, in the absence of immunity, been expected to have become infected in the interval between time index (*k*−1) and time index (*k*). Such a modification of *w*_{ij}(*k*) would have to be done with care in order to preserve the various properties discussed in §§2.1 and 2.2 (especially boundedness, i.e. 0<*w*_{ij}(*k*)<1).

It is hoped that the results of this study will prove useful in translating the data on social network structures into practicable predictions regarding the effectiveness of proposed intervention strategies such as vaccination, isolation and quarantine strategies that alter the network structures in ways that reduce transmission rates, possibly leading to the eradication of particular infectious diseases. In particular, it is worth emphasizing that the unwellness metric may be easily applied to networks with arbitrary clustering and degree correlation patterns. It also bears mentioning that this is not only a useful property for application to empirical network data, but also for application with regard to the network structures produced by dynamical network models which may not lend themselves to simple analytical solution.

## Acknowledgments

This work was supported by a James S. McDonnell Foundation 21st Century Science Innovation Award and NIH grant no. GM83863 to W.M.G.

## Footnotes

- Received September 28, 2008.
- Accepted November 20, 2008.

- © 2008 The Royal Society