## Abstract

The evolution of viruses to escape prevailing host immunity involves selection at multiple integrative scales, from within-host viral and immune kinetics to the host population level. In order to understand how viral immune escape occurs, we develop an analytical framework that links the dynamical nature of immunity and viral variation across these scales. Our epidemiological model incorporates within-host viral evolutionary dynamics for a virus that causes acute infections (e.g. influenza and norovirus) with changes in host immunity in response to genetic changes in the virus population. We use a deterministic description of the within-host replication dynamics of the virus, the pool of susceptible host cells and the host adaptive immune response. We find that viral immune escape is most effective at intermediate values of immune strength. At very low levels of immunity, selection is too weak to drive immune escape in recovered hosts, while very high levels of immunity impose such strong selection that viral subpopulations go extinct before acquiring enough genetic diversity to escape host immunity. This result echoes the predictions of simpler models, but our formulation allows us to dissect the combination of within-host and transmission-level processes that drive immune escape.

## 1. Introduction

Dynamical mathematical models have played a prominent role in building our current understanding of the mechanisms that drive epidemic and evolutionary dynamics of viruses (Anderson & May 1991; Grenfell & Dobson 1995; Read & Keeling 2006). Epidemiological models (Matthews & Haydon 2007 and references therein) at the population scale have provided mechanistic insight into the epidemic process and spatio-temporal dynamics of viruses; those at within-host scales have described viral infection dynamics (Gupta & Anderson 1999; Nowak & May 2000) through modelling aggregate host-level dynamics (Nelson 2006) and systems biology at the within-cell level (Young *et al*. 2008). Likewise, evolutionary dynamics of virus populations have also been modelled at either the population (Anderson & May 1991; Boots & Sasaki 1999) or within-host scale, with an emphasis on the population-level scale for acute infections and the within-host scale for chronic infections. A relatively small number of studies have developed cross-scale models to explore the role of viral life-history traits on epidemic dynamics (Keeling & Rand 1995; Gilchrist & Coombs 2006; Read & Keeling 2006). However, the role of immunity in viral dynamics has generally been investigated separately at these multiple scales, and no frameworks have been developed to explore the population implications of viral genetic and host immune variation within hosts. These are important processes for viral dynamics, particularly in dissecting the origins of *immune escape*, in which many viruses (and in particular incompletely immunizing RNA viruses such as influenza; Ferguson *et al*. 2003; Koelle *et al*. 2006; Recker *et al*. 2007) evolve to avoid the prevailing immunity of host populations.

In a preliminary attempt at unifying the between- and within-host scales for explaining patterns of viral immune escape, Grenfell *et al*. (2004) proposed a simple ‘phylodynamic’ model which predicts (via the evolutionary infectivity profile (EIP) that intermediate levels of immunity should impose the strongest selection. Specifically, Grenfell *et al*. predicted that intermediate levels of immune pressure create the most favourable conditions for epidemics of partially immunizing viruses. By contrast, strong immunity suppresses immune escape by limiting transmission, while weak immunity does not allow for selection of the escape variant. While the phylodynamic model lays a qualitative basis for integrating the complexities underlying virus dynamics, its description of this interaction is very simplistic. Note that the model of Grenfell *et al*. (2004) does not specifically address at what integrative level (within-host, along transmission chains etc.) the EIP arises. In this paper, we attack this problem by developing a model that explicitly incorporates transmission chains and more mechanistic models of viral evolution within hosts.

Viral quasi-species models have generated powerful insights into possible mechanisms of viral evolutionary dynamics (e.g. identification of the ‘error catastrophe’ wherein viruses mutate too rapidly to retain favourable genotypes) (Eigen 1971; Eigen & Schuster 1979; Nowak 1992; Baake & Gabriel 2000; Kamp 2003). However, most of these models do not in turn include the effects of virus evolution on the structure of adaptive immunity in the host population, making it difficult to understand how their interdependence affects viral evolution and epidemiology in the long term. Recent work by Kamp & Bornholdt (2002) has begun to bridge this gap with a within-host model of the co-evolutionary dynamics of viral and antibody populations, but the dynamics of transmission have not yet been considered. Here, we use a within-host model of viral sequence evolution to explore the between-host evolutionary dynamics occurring throughout repeated infection of the same host. Specifically, we aimed to understand how the interplay of viral replication and immune selection strength affect immune escape.

We begin with the simplest case, where viral variants evolve in a quasi-species sense, but immunity arises only in direct response to circulating virus strains and does not mutate dynamically. In order to study the resulting evolution along transmission chains, we again introduced the simplest model comprising a single host being infected repeatedly, with the viral variants for each new infection being selected from the prevailing variants at the end of the previous infection. During each infection, new viral variants appear in the host via mutations. Upon transmission to the next infection event (and accompanying renewal of the susceptible cell population), viral variants may become dominant, which results in the viral population moving through its genetic space. Although very simplistic, our model captures the basic epidemiological processes occurring at the population level.

## 2. General model

We model the evolutionary dynamics of a viral sequence and the specific host immune response that is generated in response to infection. We consider acute viral infections that generate a spectrum of levels of strain-specific adaptive immunity, from weak responses to ‘sterilizing’ immunity. The levels of specific immunity are maintained throughout multiple transmission events, such that there is immune memory against previously encountered strains that does not wane over time. Neutral genetic variants of the viral sequences are introduced by mutation, but, for simplicity, we assume that all immune variants pre-exist at low levels, replicating only in the presence of their corresponding viral strain. For simplicity in this initial cross-scale model, we do not consider mutation of the immune sequences (Kamp 2003). The interaction of virus and antibody variants is thus the sole source of selection for diversification of the viral sequences by competitive displacement within a given host.

We begin by considering the simplest within-host model that captures the essence of within-host factors that control viral kinetics, including both non-specific (via resource depletion and innate immunity) and specific (adaptive immunity) factors. We then generalize this model to transmission and host reinfection events to examine the long-term evolutionary patterns. Reinfection of hosts occurs with virus strains that originated from the previous infection and a completely replenished reservoir of the susceptible host cells.

### 2.1. Adaptive immunity and virus-immune variation

Our model consists of an ensemble of viral sequence subpopulations and corresponding immune cell subpopulations (with one-to-one correspondence between the virus and immune sequences), which together constitute the virus and immunity populations, respectively. Each variant can be represented as a sequence of length *N* consisting of letters from an alphabet of size λ (for simplicity, we chose λ = 2). In total there are λ^{N} different sequences for both the virus and antibody populations. The interaction between the virus and immune subpopulations determines both the rate at which the viral subpopulation is cleared and the rate at which specific antibody is produced. Note that, although not studied in this paper, the effects of cross-immunity could be captured by extending this interaction to neighbouring viral sequences (Gog & Grenfell 2002; §3).

We used the Hamming distance to measure the genetic distance between sequences; for our alphabet of length 2, this is simply the number of genetic elements by which two sequences differ. Viral variants are generated by point mutations that occur during replication. Assuming that *μ* is the probability that a given element of a sequence will mutate upon replication, then the probability that the *i*th sequence mutates into the *i*th sequence (*i*, *j* ∈ [1…2^{N}]) in one step is
2.1
where *h*_{ij} is the Hamming distance between variants (Kamp & Bornholdt 2002; Kamp 2003).

The specific immune response represents a population of mature B cells, which have been selected for receptors that optimally bind the antigenic regions of a particular viral genotype; antibodies released by these B cells are modelled implicitly via the immune killing of viruses.

### 2.2. Innate immunity and susceptible cell dynamics

Especially in previously naive hosts, acute viral infections such as influenza are typically curtailed by factors other than adaptive immunity (Baccam *et al*. 2006). In particular, innate immunity acts to make susceptible cells refractory to infection (or removes them by apoptosis), while cytopathic infections such as influenza remove cells directly. For simplicity, we combine these processes, introducing a further variable *n*(*t*) representing the population of susceptible host cells that are (directly or indirectly) ‘removed’ by the virus. More complex models, which separate susceptible cell depletion by infection and cell protection by innate immunity, give qualitatively similar results (results not shown); we therefore adopted the simpler model here. Unlike adaptive immunity, this generalized innate protection is not assumed to persist between infections (i.e. susceptible cells are replenished before the next infection).

Dynamics of the viral sequences *v*(*t*), susceptible cells *n*(*t*) and adaptive immunity *z*(*t*) are given by the following set of differential equations:
2.2
2.3
2.4
2.5

The virus replication rate *k*_{0}, the virus depletion rate owing to immunity *k*_{ε}, virus clearance rate *k*_{γ}, susceptible cell depletion rate *k*_{n} and immune cell replication rate *k*_{z} are constants. Equations (2.2) and (2.3) describe the depletion of the number of susceptible cells in the host (we assume that all virus variants have equal access and ability to use susceptible cells) and the dynamics of the adaptive immune system response. The first term in the right-hand side of equation (2.4) corresponds to virus replication owing to infection of available cells in the host, and the other two terms describe clearance of the virus owing to adaptive immunity and the loss of free virions.

In order to simplify the above set of equations, we rescaled the variables (*n* → *n*/*n*_{0}, *z*_{i} → *z*_{i}*k*_{z}/*k*_{n}, *v*_{i} → *v*_{i}*k*_{γ}/*k*_{n}, *t* → *t*/*k*_{γ}) and introduced the new parameters *α* = *k*_{α}/(*k*_{γ}*n*_{0}) and *ε* = *k*_{ε}*k*_{z}(*k*_{n}*k*_{γ}), obtaining the following equations:
2.6
2.7
2.8
2.9

Here, *α* and *ε* correspond to the viral replication and the strength of immunity, respectively. In order to reduce the number of equations (recall that *i* ∈ [1…2^{N}]), only the most abundant sequences and their nearest neighbours (i.e. those strains at a Hamming distance of 1) are tracked. This approximation holds well for the low mutation rate case, which we consider here.

## 3. Results and discussion

### 3.1. Single strain dynamics

In order to establish the basic dynamics of infection within a host, we consider the simplified case of a single genetic type of virus, which we term the wild-type or founding variant, and the corresponding specific immune response, 3.1 3.2 3.3 3.4

We can simplify this system of equations by the partial integration
3.5
3.6
3.7
where is the increase in immunity during infection, and . (Note that these equations are easily adjusted to model systems where the *per capita* growth of immunity is proportional to the viral abundance, . In this case, the exponent e^{−}^{z} in the above equations is replaced throughout by 1/*z*. We have verified that the results are qualitatively similar in this case.)

Further integration produces the following expression for *v*:
3.8

The maximum value of the viral abundance, *v*_{max}, is attained at , where *W* is the product log function (note that when ).

During infection by a single viral strain in a naive host with no pre-existing immunity, the viral load increases with a concomitant decrease in susceptible host cells (figure 1*a*). Owing to the depletion of susceptible cells and rising adaptive immune response, the viral load peaks and then declines asymptotically towards clearance. In contrast, infection of a host with a high level of pre-existing immunity to the infecting strain results in a monotonic decline in viral abundance towards zero owing to the strong levels of adaptive immunity (figure 1*b*). In such secondary infections, there is almost no change in the strength of adaptive immunity, but some depletion of uninfected cells. These infection dynamics capture the essence of many acute infections with a short incubation period (Grenfell *et al*. 2004).

### 3.2. Multiple reinfections of a single host

The impact of transmission on the evolutionary (mutation–selection) dynamics of a virus population in the context of host adaptive immune responses is modelled via the simplest case: reinfecting a single host multiple times. Transmission events are modelled by sampling viral genotypes from the infectious host and reintroducing them into the same host after clearing the infection and restoring the initial number of susceptible cells (to represent recovery during the intervening time), while keeping the adaptive immunity level intact. For simplicity, we sample the average abundance of each sequence during the period of infection and choose the three most abundant viral variants to initiate the next infection (our results are not qualitatively sensitive to the number of variants transmitted). In the absence of immune pressure, the subsequent reinfections of the host ought to progress identically, for the same founding dose—we used this criterion to choose the scaling factor for the size of the transmitted viral dose. In other words, before running the actual simulations, we considered a naive host with zero immune strength and infected it with a viral dose of *v*_{0}. Then, we calculated the mean viral load during infection and used the scaling factor to calculate the transmitted viral load in the actual simulations so that the founding dose in the next infection equals this factor times the average viral abundance during the current infection. This founding dose is composed of the three most abundant viral variants in the previous infection, in proportion to their abundance.

The transmission chain was initialized in the following manner: the host was naive at the start of the infection cycle, and then accumulated specific immunity to the viral variants it had been infected with during the simulation. The model did not include ‘cross-immunity’ to viral variants that were not encountered. As expected, the wild-type variant dominates the first infection where the host is naive (figure 2). In subsequent infections, as the host mounts specific immunity to the viral strains from previous infections, new variants (at progressively increasing genetic distance from the wild-type) gain an advantage.

In the absence of immunity (figure 2*a*), the founding sequence is transmitted upon every reinfection and depletes most of the susceptible cells; therefore, there is no opportunity for the mutants to outcompete the founding sequence, since it is always initiating infections with the highest dose. Weak immunity (figure 2*b*) leads to a gradual decline of the founding sequence, which creates potential for a nearest neighbour variant to predominate, albeit over a relatively long time scale. At higher levels of immune strength, the host transmits a novel sequence at almost every reinfection (figure 2*c*,*d*). Note that evolution of immune escape variants appears to stabilize at distance *N*/2 (figure 2*d*). Appearance of the new dominant sequences might be considered as a random walk in the Hamming space (a hypercube with 2^{N} vertices—see electronic supplementary material). Note that, unlike conventional random walks (e.g. on a two-dimensional plane), this random walk stabilizes and fluctuates at around the distance from the origin that is equal to half of the size of the genome (*N*/2) owing to the fact that the majority of all possible sequences are concentrated around that distance. As the immune pressure becomes very large, it starts to suppress both the founding sequence and any variants, leading to almost complete extinction of the virus population.

Throughout the epidemics, the host gradually accumulates immunity to all variants to which it was exposed in previous infections. Unlike the viral population, in which the old variants are replaced by the new ones, diversification of the immune cell population is cumulative (figure 3), though in reality this would be modulated by the kinetics of immunological memory.

### 3.3. Dynamics of immune escape

Figure 4 (see caption for details) introduces a measure of the effectiveness of evolutionary immune escape by viruses as a function of host immune pressure. (It is important to note that this figure was obtained after extensive averaging. Individual runs are characterized by unpredictable fluctuations owing to random sampling at transmission in cases where several strains were present in equal abundances (a ‘tie-break’ algorithm). Nevertheless, all individual runs show the same qualitative trend depicted in figure 4.) Upon each reinfection, we calculated the Hamming distance between the most abundantly transmitted strain and the one in the previous cycle and plotted it as a function of time. The slope of such a graph corresponds to the speed of propagation of the virus across the genetic space, and is a measure of the strength of selection on particular escape mutants (plotted in red). A slope of 1 corresponds to the maximal attainable velocity for our model, which occurs when new variants occur at each infection. The mean viral abundance during each infection declines as immune strength increases (blue curve). The product of viral abundance and evolutionary speed (black curve) serves as a measure of the effectiveness of immune selection on fixation rates of immune escape mutants. It exhibits a threshold and peaks at an intermediate immune strength. The behaviour is analogous to a rocket moving by ejecting its mass (fuel). By converting a lot of mass into motion, the rocket might acquire a large velocity; however, its total momentum (product of mass and velocity) and kinetic energy would not be large because of the depleting mass. The momentum and kinetic energy will remain small if mass depletion is too small because the velocity would remain small. The optimal situation for maximizing the momentum or kinetic energy occurs at intermediate values of the mass ejection rate. Here, the immune system (which acts as an engine by exerting the selective pressure on the viral population) transforms the viral abundance (mass) into the movement of the virus in genetic space, with the black curve being the analogue of the momentum.

This measure (black curve) is analogous to the EIP introduced by Grenfell *et al*. (2004), which proposes a relationship between immune selection strength and viral evolutionary dynamics. Since viral abundance declines and the rate of genetic divergence increases (approaching an asymptote) with increasing immune strength, the product of these two values maximizes at an intermediate value of the immune strength and drops to zero for both small and large immune strengths. The EIP encompasses the viral dynamics acting on the two scales: within-host (selection of novel variants based on pre-existing immunity) and population-level (transmission and persistence of novel variants between hosts) scales. Both scales are required to capture the EIP explicitly. Indeed, for the acute pathogens modelled here, novel variants produced during the infection cannot survive within host because of the depletion of susceptible cells. Only by coupling with the population-level mechanism of transmission between hosts (in our case, reinfection of the host) can these variants gain the opportunity of outcompeting the dominant strains, thus enabling the virus to circulate persistently in the population.

### 3.4. Entropic effects

It is worthwhile to note that, contrary to naive expectations, even at large selective pressure (large value of the immune strength), the virus does not move to a variant that is the farthest from the initial strain. Instead, it gets ‘jammed’ around a distance of *N*/2, i.e. half the size of the genome (note that such a behaviour is not related to the tie-break mechanism or the number of transmitted strains). This arises from an entropic effect—the majority of the strains are concentrated near *N*/2. Indeed, if the genome of the initial viral strain contained all zeros, then there exist 2^{N}^{−1} strains (fully half of the total number of different variants) that have *N*/2 zeros and *N*/2 ones, i.e. with a Hamming distance of *N*/2 from the original strain. For such a strain, the chances of flipping zero to one or vice versa are the same and thus, even when the virus moves with the fastest speed possible through genetic space (i.e. when new variants are transmitted at each reinfection), it predominantly samples genotypes near the *N*/2 distance from the initial strain. Interestingly, the introduction of cross-immunity could alter such behaviour—indeed, if the host has a partial protection against *all* strains that are genetically close to the previously encountered strains, the entropic log jam would be removed and the virus could travel farther and farther from the initial strain. Our further studies of the system will incorporate these and other important effects of cross-immunity.

### 3.5. Simplifying assumptions

As with all models, we have made numerous assumptions to simplify a complex biological system. The qualitative results presented here do not change on relaxing these assumptions.

There are a number of major simplifications.

—To focus on the dynamics of viral sequences, we ignored mutations in the sequence representing the specific immune response (Kamp 2003). Such mutations might affect the dynamics of the infection by creating immune protection against the neighbouring variants of the founding sequences, i.e. effectively acting as a cross-immunity mechanism (Gog & Grenfell 2002) (see below). As noted above, this might lead to quantitatively different behaviour of viral exploration of the genetic space; however, this effect does not qualitatively affect our conclusions about the interplay between viral and immune systems.

—Only the nearest neighbours of the transmitted strains were considered—this assumption prevented us from looking at the possibility of longer jumps in genetic space. Such jumps will broaden the oscillation around

*N*/2 and will allow the virus to explore regions that are farther away from the original strain.—Variant-specific immunity is cumulative and does not wane over time; this assumption tends to slow down the propagation of the virus across the genetic space, but does not qualitatively alter the conclusions of our study. Allowing for waning of immunity would effectively lower values of the immune strength—this could play an important role in more realistic population-scale models.

—The viral population is symmetric, in that all viral genotypes have equal replication rates. This could artificially inflate fixation rates of variants (since immune escape mutations can have negative effects on intrinsic replication), quantitatively affecting the EIP profile at the extremes of viral abundances. Allowing different fitness values for different viral genotypes would break the symmetry of the evolutionary process and diminish the importance of the entropic effect.

—Our model does not incorporate cross-immunity mechanisms of protection against similar viral strains (Gog & Grenfell 2002). In the absence of cross-immunity, the virus travels to the Hamming distance of half of the genome size and stays there for a prolonged time owing to the entropic arguments that we presented in our paper. As we have shown in the electronic supplementary material, the dynamics of the virus is similar to a random walk in the Hamming space. However, as time progresses, more and more viral strains are getting the counterpart immune variants and, ultimately, the virus will be driven to the larger Hamming distances and eventually will become extinct. Cross-immunity mechanisms would require the viral population to make longer jumps in genetic space in order to escape adaptive immune responses (Gog & Grenfell 2002), and hence may decrease the frequency at which potential immune escape variants arise.

—We considered an extremely simple model of the host population, which consisted of only one host reinfecting itself, such that transmission through the rest of the population was modelled only implicitly. More realistic models, with multiple heterogeneous hosts connected in networks with different topologies, will provide more detailed insight into the influence of population-level processes on immune escape.

## 4. Summary and conclusions

We have investigated the mechanistic basis of immune-mediated changes in the genetic composition of virus populations that cause acute infections (e.g. influenza: Ferguson *et al*. 2003; Koelle *et al*. 2006; norovirus: Patel *et al*. 2009 and references therein). However, the structure we have developed would be more broadly applicable to viral immune escape in general. Our model allows for mutations in the genetic sequence of the virus and includes the dynamics of repeated viral infections as a simple representation of transmission chains. We used a simplified deterministic model consisting of a system of differential equations to describe the within-host replication dynamics of the virus, the pool of host cells susceptible to viral exploitation and the host adaptive immune response. We neglected cross-immunity effects and the within-host mutational dynamics of immune cells in order to study a very simple population model that consisted of repeated infections of a single host.

To our knowledge, our model represents one of the first attempts to study viral (and immune) variation and evolution across scales. Our work has been inspired by earlier papers on quasi-species dynamics (Kamp & Bornholdt (2002), who allowed for quasi-species dynamics, but at the within-host scale only; Read & Keeling (2006) and Keeling & Rand (1995) analysed dynamics across scales, but did not address explicit viral genetic variation within hosts). In spite of its simplicity, we believe that the model introduced here may be a good starting point for studying the rich interplay between viral and immune populations as disease spreads through a population. The ultimate goal is to obtain a *quantitative* understanding of cross-scale dynamics, immune selection and the impact of control strategies such as vaccination.

Our model showed that the escape mutants were most successful at intermediate values of immune strength. At very weak levels of immunity, the founding viral sequence is transmitted repeatedly and the variant sequences have no chance to reach fixation during within-host replication. On the other hand, strong immunity causes suppression of both the founding viral sequence and its variants, leading to extinction of the virus population. Our results bear out the crude EIP argument of Grenfell *et al*. (2004). However, we also show that (at least) two levels of organization (in-host viral/immune evolution and the dynamics of viral abundance during transmission) need to be included in even a minimal model for this process. Further, although the average dynamics echo the predictions of Grenfell *et al*. (2004), evolutionary trajectories along individual transmission chains may deviate considerably. We note that, though this is an initially deterministic model, our ‘tie-breaking’ algorithm does add some stochasticity, and the qualitative EIP result is robust to this variation. Looking at the full effects of stochasticity and the dependence of deviations from the EIP, predictions on underlying biological assumptions are interesting areas for future work.

Future studies will explore the interplay between immune selection and intrinsic differences in fitness. This will entail a careful analysis and assessment of the assumptions of our model. Going beyond the consideration of a single individual to the population level with different topologies of contacts, the incorporation of cross-immunity and waning immunity, and allowing for the simultaneous evolution of the interacting rugged fitness landscapes of viruses and immunity, will greatly enrich our mechanistic understanding of immune escape evolution.

## Acknowledgements

This research was supported by NSF grant 0742373. B.T.G. and J.O.L.-S. were also supported by the RAPIDD programme of the Science and Technology Directorate, Department of Homeland Security, and the Fogarty International Center, National Institutes of Health.

## Footnotes

- Received December 22, 2009.
- Accepted March 3, 2010.

- © 2010 The Royal Society