## Abstract

The awareness that pathogens can adapt and evolve over relatively short time-scales is changing our view of infectious disease epidemiology and control. Research on the transmission dynamics of antigenically diverse pathogens is progressing and there is increasing recognition for the need of new concepts and theories. Mathematical models have been developed considering the modelling unit in two extreme scales: either diversity is not explicitly represented or diversity is represented at the finest scale of single variants. Here, we use an intermediate approach and construct a model at the scale of clusters of variants. The model captures essential properties of more detailed systems and is much more amenable to mathematical treatment. Specificities of pathogen clusters and the overall potential for transmission determine the reinfection rates. These are, in turn, important regulators of cluster dynamics. Ultimately, we detect a reinfection threshold (RT) that separates different behaviours along the transmissibility axis: below RT, levels of infection are low and cluster substitutions are probable; while above RT, levels of infection are high and multiple cluster coexistence is the most probable outcome.

## 1. Introduction

Most infections induce an immune response that confers only modest protection against reinfections. This is either owing to host inefficacy in mounting specific immunity (tuberculosis, malaria) or owing to pathogen ability to generate sustainable antigenic diversity (influenza viruses, respiratory syncytial virus, malaria parasites). We have recently generated a series of simple models to explore the epidemiological consequences of different forms of immune failure (Gomes *et al*. 2004, 2005; Breban & Blower 2005). When immunity is partially protective, a threshold in transmission was detected above which reinfection dominates the dynamics. As the transmission coefficient is increased across this so-called ‘reinfection threshold’ (RT), the levels of infection in the population rise steeply which is accompanied by a dramatic drop in the potential effectiveness of vaccination programmes. The specification of pathogen diversity in epidemiological models is the subject of much ongoing research and debate (Andreasen *et al*. 1997; Gupta & Maiden 2001; Girvan *et al*. 2002; Gog & Grenfell 2002; Gomes *et al*. 2002; Boni *et al*. 2004; Grenfell *et al*. 2004). Here, we construct a relatively simple model to demonstrate that the RT can have a crucial role in regulating pathogen diversity and determining pathogen evolution. The influenza viruses provide an excellent illustration.

Influenza viruses have a segmented RNA genome, which makes them particularly prone to mutations and intratypic recombination through genetic reassortment during multiple infections. Mutation generates variants that are antigenically closely related (drift candidates) and, in the case of influenza type A, genetic reassortment involving human and avian subtypes produces antigenically distant variants (shift candidates). The conditions under which drift and shift will effectively occur can be studied by modelling the dynamics of transmission. Previous models represented the transmission dynamics of multiple variants interacting by cross-immunity and reproduced the structuring of pathogen populations into distinct clusters of antigenically similar variants (Gog & Grenfell 2002; Gomes *et al*. 2002; Ferguson *et al*. 2003). Furthermore, in the case of influenza A, the switch between clusters has been shown to occur in discrete steps (Smith *et al*. 2004) reinforcing the notion that the emergence of a new cluster can be modelled as a discrete event.

Motivated by these previous studies, we use the cluster as a modelling unit, providing a complementary approach to variant-based models. The cluster structure of two influenza A subtypes is schematically represented in figure 1. The transmission dynamics of each cluster are modelled as a *SIRI* (susceptible-infected-recovered-infected) model, where reinfection occurs at a reduced rate owing to partial immunity between the constituent variants. Reinfection by a variant in a different cluster is assumed to occur at a higher rate to reflect lower cross-immunity, and increases with increase in the antigenic distance between the two clusters. We describe the conditions under which a resident cluster is replaced by an invader. By varying the parameter that controls cross-immunity between clusters, we implement a range of cluster replacement scenarios and formalize the processes of drift and shift evolution within the same model framework.

When considering only one cluster, the transmission dynamics induces a RT owing to partial immunity (Gomes *et al*. 2004, 2005; Breban & Blower 2005), and this is associated with a steep increase in the prevalence of infection. This threshold occurs at a value of transmission such as *R*_{0}=1/*σ*, where *σ* (between 0 and 1) represents the risk reduction of reinfection resulting from partial immunity in response to a primary infection. Here, we extend the model and consider more than one cluster. We show that the RT has a striking role in the regulation of pathogen antigenic diversity. Cluster replacement is identified as the preferred behaviour when the transmissibility is restricted to a window immediately below the RT. With this motivation, we propose that if the transmissibilities of influenza A viruses lie within this window, then the evolutionary pattern of drift and shift that so uniquely characterize these viruses can be accounted for by the same principles.

## 2. A model with multiple clusters

### 2.1 Cluster definition

We consider a pathogen population with the potential to accommodate antigenic diversity respecting a predefined clustered structure. The pathogen explores this potential structure in a way that is regulated by the epidemic dynamics, which in turn is largely determined by the pathogen population structure itself. Here, we consider two levels of relatedness: variants in the same cluster differ by some average within-cluster antigenic distance, *σ*, while different clusters are separated by some greater between-cluster antigenic distance, *σ*_{x}, as conceptually, greater antigenic distance represents lower cross-immunity. As such, in some appropriate normalized measure, we have 0<*σ*<*σ*_{x}≤1. Each cluster, *i*, is therefore responsible for three distinct rates of infection depending on the host's previous exposures: *Λ*_{i} for naive hosts, *σΛ*_{i} for hosts previously infected by the same cluster, and *σ*_{x}*Λ*_{i} for hosts previously infected by a different cluster.

### 2.2 Deterministic model

It is assumed that individual hosts are born with maximum susceptibility to infection, and acquire some protection as they experience sequential infections. Thus, individuals are characterized by their present infection status and history of past exposures. In analogy with previous models (Andreasen *et al*. 1997), and considering *N*={1, 2, …, *n*} be the set of all possible clusters, there are two types of dynamical variables: *S*^{J} is the proportion of presently uninfected individuals that have been infected by clusters in the subset *J*⊆*N*, and is the proportion of individuals presently infected with cluster *i* and previously infected by clusters in *J*. A given cluster, *i*, infects naive individuals at a *per capita* rate *Λ*_{i}=*β*_{i}*I*_{i}, where , while previously infected individuals are subject to reduced rates of infection as described above. The dynamics of transmission are described by the system of differential equations(2.1)and represented diagrammatically in figure 2 for the case of two clusters. The hierarchical structure of the pathogen population is captured by the parameters *σ* and *σ*_{x}. A crucial component in the analysis will be the difference, Δ*σ*=σ_{x}−*σ*. Other parameters consist of the death rate, *μ*, which is assumed equal to the birth rate and takes the value *μ*=1/70 year^{−1} (representing a life expectancy at birth of 70 years), the rate of recovery from infection *τ*=52 year^{−1} (representing an average duration of infection of approximately one week) and a transmission coefficient which is the same for all clusters, *β*. If time is scaled to units of average duration of infection, 1/(*τ*+*μ*), the transmission coefficient scales to become the basic reproduction number, *R*_{0}=*β*/(*τ*+*μ*). This is the average number of infections generated by a single infected individual in a totally susceptible population, and therefore it must be above 1 for a cluster to invade a susceptible population. We shall use *R*_{0} instead of *β* to represent the results, as this is a more universal measure, and note that the same value is assumed for all clusters.

### 2.3 Stochastic model

We construct a continuous time stochastic model by replacing the differential equation system (2.1) with corresponding Poisson processes with state-dependant rates and exponential waiting time between state transitions. These stochastic models are investigated with Monte Carlo simulation experiments using standard techniques (Renshaw 1991). All stochastic simulations are performed with two clusters only, with the additional assumption that both clusters have the same transmission coefficient. The transition rates of the specific model with two clusters are given in table 1.

## 3. Results

### 3.1 Endemic equilibria

According to the deterministic model, a single cluster invades a naive population and establishes an endemic state given that its transmissibility is above the epidemic threshold (*R*_{0}>1). The endemic state is reached in a damped oscillatory manner if transmissibility is below the RT (*R*_{0}<1/*σ*) and characterized by low prevalence level. Above the RT, equilibrium prevalences are about two orders of magnitude higher and oscillations are no longer observed (see Gomes *et al*. 2004). When two clusters are introduced, the long-term equilibrium is characterized by coexistence where both clusters maintain the same endemic level (Nunes *et al*. 2006). This work is focused on invasion by new clusters and the dynamics on shorter time-scales. The first insight into the associated behaviour is obtained by simulation of the deterministic model and the results are then consolidated by its stochastic counterpart.

### 3.2 Cluster replacement

Simulations consider that a resident cluster (cluster 1) is approaching equilibrium in a host population. An invading cluster (cluster 2) is then introduced. The reduced risk of infection owing to within-cluster acquired immunity, *σ*, was set at 0.25 for every simulation, unless stated otherwise. The antigenic distance between clusters is represented by a weaker risk reduction here modelled by varying *σ*_{x}>*σ*. Therefore, the same framework can be used to model either antigenic drift or antigenic shift, depending on whether Δ*σ* is small or large, respectively. Given the increasing evidence for the existence of heterosubtypic immunity between different subtypes of influenza (Yetter *et al*. 1980; Liang *et al*. 1994; Heinen *et al*. 2001), we assume that *σ*_{x}<1 in all settings.

Figure 3 illustrates the behaviour of the deterministic model, when the RT is placed at 4 (*σ*=0.25). When the invading cluster is introduced, we observe a transient effect: the invader generates an epidemic peak and the resident goes through a trough before initiating convergence towards a new equilibrium. This transient effect, leading to the extinction of one or both clusters, depends essentially on two factors: the position of *R*_{0} relative to the RT and the antigenic distance between the two clusters. Regarding the first factor, as transient oscillations are only observed below the RT, the potential for cluster extinctions is also confined to this parameter regime. Regarding the second factor, by fixing *R*_{0} below the RT (e.g. *R*_{0}=3.5) and varying Δ*σ*, the resident cluster always goes through a deep trough associated with probable extinction (figure 3*a*,*c*,*e*, black line), while the invading cluster induces an initial epidemic followed by a trough whose amplitude increases with Δ*σ*, facilitating the extinction of both clusters (figure 3*a*,*c*,*e*, grey line).

The infinite host population size inherent to the deterministic model cannot, however, accommodate cluster extinctions, as infection can survive the most infinitesimal troughs. To analyse the probable outcome in a finite population, extinction cut-offs are placed at frequency 1/*P*, depending on the population size *P* considered. A cluster is considered extinct when it crosses the assumed cut-off (figure 3*a*,*c*,*e*, dashed lines).

Depending on the parameters and the population size, we observe three possible scenarios: extinction of both clusters, replacement of the resident by the invading cluster, and coexistence of both clusters. These are, hereinafter, referred simply as ‘extinction’, ‘replacement’ and ‘coexistence’, and predominate in this order as *R*_{0} increases, except for small Δ*σ* (e.g. Δ*σ*=0.01), where extinction is never observed. Replacement is typically confined to a window between extinction and coexistence (figure 3*b*,*d*,*f*, black bars) and this window becomes narrower as we increase the antigenic distance between clusters, Δ*σ*. The replacement window is always contained below the RT, determining an important basis for the occurrence of antigenic drift and antigenic shift, as these require the previous cluster to become extinct and the invading one to survive. These results are confronted with their stochastic counterpart (figure 3*b*,*d*,*f*, grey bars) obtained for *R*_{0}>3.2. Below this value, stochastic fluctuations make persistence of the resident cluster alone virtually impossible owing to sustained oscillations inherent to the stochastic model (Nåsell 2002). Stochasticity appears to extend the replacement window towards higher *R*_{0}, reflecting a stronger tendency for cluster replacement.

It is important to quantify the probability of these different events according to the values of the epidemiological parameters. Figures 4 and 5 show detailed results for the stochastic model, with a population of size *P*=10^{6}. There are two main differences when comparing with the deterministic model: for the same *R*_{0} and Δ*σ*, several outcomes are now possible owing to stochasticity and the second cluster may not succeed to invade. For each set of parameter values, simulations were carried out to obtain 100 invasions. Of these successful invasions, the numbers of extinction, replacement and coexistence events were recorded. The results for the probability of each of these events, given that invasion occurs, are depicted in figure 4 for Δ*σ*=0.01 and 0.25. Figure 5*a*,*b* shows the probabilities of replacement, conditional on invasion, for *σ*=0.25 and 0.5, along with the context provided by the equilibrium prevalence curve for a single cluster and the associated RT. The RT at *R*_{0}=1/σ and the nested replacement windows located below this threshold are consistent across different values of sigma. Furthermore, when considering influenza A, the values of *R*_{0} encountered in the literature (Oxford 2000; Mills *et al*. 2004; Ferguson *et al*. 2005) are well accommodated within the parameter region associated with our replacement windows.

The results presented here show that the RT marks an upper bound on *R*_{0} for the occurrence of successive replacements of antigenic clusters that compete by cross-immunity. Above this threshold, the trend is for cluster coexistence and accumulation of antigenic diversity. Based on this, we propose that the RT has a potential role as a regulator of pathogen diversity.

## 4. Discussion

Supported by empirical evidence (Plotkin *et al*. 2002; Levin *et al*. 2004; Smith *et al*. 2004) and theoretical studies (Gog & Grenfell 2002; Gomes *et al*. 2002), a novel approach to the modelling of antigenically diverse pathogens, centred on the cluster of antigenic variants, is presented here, with an application to influenza. This approach intends to capture essential properties of the epidemiology resulting from what is now demonstrated as the emergence of an intermediate scale of aggregation for pathogen variants, the cluster. This is not contradictory to previous models centred on the individual strain, showing the emergence of these clusters, but in fact complementary.

Antigenically distinct clusters are introduced sequentially into circulation in a host population and the outcome of competition monitored. Cluster replacement is a phenomenon restricted to a range of transmissibility values that becomes narrow as we increase the antigenic distance between the clusters. This fundamental region is located just below the RT moving with it in an attached manner. Furthermore, it separates two other types of behaviour: extinction of both clusters at lower transmissibility and coexistence at higher transmissibility. Above the RT, accumulation of antigenic diversity is thus the expected result of any succession of cluster introductions. We expect influenza A to lie within the parameter region where replacement occurs, as the accumulation of diversity is limited for this pathogen, whose evolutionary pattern is characterized by a sequence of drift and shift events.

We find that the crucial factors determining replacement are:

the characteristic cluster size (factor affecting reinfection by a variant in the same cluster,

*σ*) as this determines the position of the RT;the distance between competing clusters (factor affecting reinfection by a variant in the competing cluster,

*σ*_{x}>*σ*) as this determines the width of the replacement window; andthe position of

*R*_{0}relative to the RT (located at*R*_{0}=1/*σ*) and the replacement window determined above.

Reinfection is indeed central to the mechanism of replacement as a simple *SIR* (susceptible-infected-recovered) model with no reinfection, equivalent to setting *σ*=0, exhibits no substitution for any given set of parameters. Our proposal is that the evolutionary pattern of influenza A viruses is largely determined by a characteristic aggregation into clusters of antigenic variants and the subsequent reinfection dynamics regimented by this aggregation.

Although no explicit phylogeny is extracted from the model, a ‘slender-trunked’ phylogenetic tree, such as obtained for the influenza A hemagglutinin, would be most probable if transmissibility takes place in the replacement regime. This extrapolation from antigenic to genetic diversity is supported by the parallel between the two measures in the case of influenza A established by Smith *et al*. (2004). Furthermore, the hemagglutinin gene encodes the most important target of the immune system (the hemagglutinin protein) implying that patterns of genetic diversity will reflect antigenic constraints.

The drift of influenza A as a single lineage and the tendency for subtype replacement has stimulated other theoretical investigations. A highly elaborate epidemiological model produces this behaviour under the hypothesis that infection induces some variant-transcending immunity, which completely prevents reinfections for some time (Ferguson *et al*. 2003). The authors assume this immunity to apply between variants of the same subtype and between subtypes, decaying exponentially with a half-life period of six months. This variant-transcending (hetero- and homosubtypic) immunity was included as a means to enhance competition between lineages and subtypes, and presented as a necessary condition to obtain the patterns of drift as a single lineage and shift evolution characteristic of influenza A.

The cluster-based model proposed here also requires some sort of hetero- and homosubtypic immunity. However, we implement a long-lasting partially protective effect in agreement with immunological studies (Yetter *et al*. 1980; Liang *et al*. 1994; Heinen *et al*. 2001). The only manifestation of a short-lasting total protection in our model is the impediment of co-infection. This protective effect can be attributed not only to immunity during the course of infection but also to other phenomena such as behavioural change associated with disease. As observed by Ferguson *et al*. (2003), some form of temporary variant-transcending protective effect appears necessary to oppose trends of coexistence and accumulation of diversity. However, our framework permits a considerable relaxation of the assumption by reducing the average duration of protection from six months to one week. Despite the lack of more definite experimental results, it is important that different assumptions are explored by modelling studies.

## Acknowledgments

We thank Jorge Carneiro, Ben Cooper and Graham Medley for their valuable comments. This research was funded by the Calouste Gulbenkian Foundation (FCG), the Portuguese Research Council (FCT) and the European Commission, grant MEXT-CT-2004-14338. J.B.O.M. had a studentship from PRODEP III.

## Footnotes

- Received July 26, 2006.
- Accepted August 31, 2006.

- © 2006 The Royal Society