## Abstract

Repeated measures data for rotavirus infection in children within 14 day care centres (DCCs) in the Oxfordshire area, UK, are used to explore aspects of rotavirus transmission and immunity. A biologically realistic model for the transmission of infection is presented as a set of probability models suitable for application to the data. Two transition events are modelled separately: incidence and recovery. The complexity of the underlying mechanistic model is reflected in the choice of the fixed variables in the probability models. Parameter estimation was carried out using a Bayesian Markov chain Monte Carlo method. We use the parameter estimates obtained to build a profile of the natural history of rotavirus reinfection in an individual child. We infer that rotavirus transmission in children in DCCs is dependent on the DCC prevalence, with symptomatic infection of longer duration, but no more infectious per day of infectious period, than asymptomatic infection. There was evidence that a recent previous infection reduces the risk of disease and, to a lesser extent, reinfection, but not duration of infection. The results provide evidence that partial immunity to rotavirus infection develops over several time scales.

## 1. Introduction

Rotavirus infection is the single most important cause of infectious diarrhoea and death globally in infants and young children (Offit 1998). It is highly transmissible in developed and developing countries alike and virtually all children will have experienced at least one infection by the age of 2 years (Fischer *et al*. 2002) and one episode of rotavirus gastroenteritis by the age of 5 years (Parashar *et al*. 2003). Parashar *et al*. (2006) estimate that each year rotavirus infection is responsible for roughly 600 000 deaths globally.

Reinfection is common in children (Offit 1998; Fischer *et al*. 2002) and adults (Griffin *et al*. 2002; Anderson & Weber 2004). Intestinal secretory antibodies, whose responses are of short duration, are pivotal to protection against rotavirus reinfection (Franco & Greenberg 2001). This would suggest that (even strain-specific) immunity against reinfection is short lived. There is, however, good evidence that repeated rotavirus infection cumulatively protects against infection and associated diarrhoeal disease (Velázquez *et al*. 1996). Furthermore, protection has been shown to correlate with serum IgA antibody titre (Velázquez *et al*. 2000), such that two infections (regardless of symptoms) produce a titre that protects completely against subsequent moderate-to-severe diarrhoea, less so against mild illness and least against asymptomatic infection; the duration of this effect is unknown.

At the individual level, the natural history of infection has yet to be adequately characterized for the purposes of predicting population-level transmission dynamics. Relative risk of reinfection (based on the history of previous infection) was estimated by Fischer *et al*. (2002) using a birth cohort, but their approach did not consider the transmission dynamics of the infection. Individual-level risk depends on population-level factors, such as infection prevalence and seasonal variation in contact rates, which interact with the natural history of infection at the individual level (e.g. duration of infection and immunity).

Here we combine population and individual levels using a generalized mechanistic mathematical model in conjunction with follow-up data. The model takes into account possible effects of an individual child's infection history on the risk of infection, duration of infection, infectiousness and severity of symptoms. This is performed within the dynamic context of the time-varying exposure levels arising from infection within day care centres (DCCs). Parameter estimates are obtained using a Markov chain Monte Carlo (MCMC) estimation method, which has the provision for variable intersample periods. This analysis is designed to establish the most likely natural history of infection and estimate the parameters connected with the resultant profile, thereby characterizing partial immunity.

## 2. Materials and methods

### 2.1 Rotavirus data

Two cohorts of children aged 6–24 months attending Oxfordshire DCCs were enrolled over consecutive rotavirus seasons in 1998–1999 and 1999–2000 (Buttery *et al*. submitted). DCCs were visited twice weekly and faecal samples collected if available in the previous 24 hours. The children were prospectively followed for episodes of diarrhoea or gastroenteritis, upon which additional stool samples were collected. The samples were assayed for rotavirus by reverse transcription and polymerase chain reaction detecting the rotaviral *VP1* (RNA polymerase) gene. One hundred and two children from 11 DCCs participated in the 1998–1999 season, and 80 children from 10 DCCs in 1999–2000.

There were 130 episodes of diarrhoea reported in 80 children. There were 68 episodes of rotavirus diarrhoea and 244 episodes of asymptomatic carriage (ratio 1 : 3.6), representing a rate of 0.37 diarrhoeal and 1.34 asymptomatic infections per child season. The median interval between commencing DCC care and first episode of diarrhoea was 188 days (range 4–641). Most diarrhoeal episodes were mild: fewer than half of the responding families (41 out of 86, 47.7%) saw a primary care practitioner, and no child visited hospital for their illness. Fever was noted in only 17 out of 86 responses. All but four children had at least one positive rotavirus sample. Thirteen children had at least three separate episodes of infection. The raw data are in the form of 2948 sampling events from 185 children within 2 cohorts within 15 DCCs. We discounted one DCC since there were only three samples for just a single child. The data from the remaining 14 DCCs are represented on a timeline for each child (figure 1).
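The ratio and per-child-season rates quoted above can be checked with a few lines of arithmetic. The sketch below assumes that the denominator is the 102 + 80 children enrolled over the two seasons; the text does not state the denominator explicitly, so this is an inference. The variable names are illustrative.

```python
# Episode counts quoted in the text.
rotavirus_diarrhoea_episodes = 68
asymptomatic_episodes = 244

# Assumption: one child season per enrolled child in each cohort (102 + 80).
child_seasons = 102 + 80

ratio = asymptomatic_episodes / rotavirus_diarrhoea_episodes
rate_symptomatic = rotavirus_diarrhoea_episodes / child_seasons
rate_asymptomatic = asymptomatic_episodes / child_seasons

print(f"asymptomatic : symptomatic = {ratio:.1f} : 1")
print(f"symptomatic infections per child season  = {rate_symptomatic:.2f}")
print(f"asymptomatic infections per child season = {rate_asymptomatic:.2f}")
```

Under that assumed denominator the three figures agree with those reported.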

### 2.2 Model structure

We use a conceptual model similar to that developed and analysed in White & Medley (1998; figure 2). At any point in time, children are either uninfected or infected. Within the uninfected population there is a distribution of susceptibility to infection that is dependent on the time since recovery from previous infection. Note that we do not know the infection history of individuals from birth, thus we cannot explicitly model those never infected. The profile of susceptibility in the uninfected population changes over time, as a result of recruitment of individuals just recovered from an infection and their changing susceptibility to infection and disease over time since recovery.

The infected population comprises children with disease (an infection that involves an episode of diarrhoea is termed a symptomatic infection) and children without disease (non-diarrhoeal infection). Upon infection there is some probability that the infection will involve symptoms, which can be dependent on the time between recovery from previous and current infections. The numbers in each of the infected categories change over time as a result of recruitment following infection of uninfected children.

The incidence per child per unit time is the (instantaneous) rate at which uninfected children become infected. The magnitude of this incidence is related to the proportion of infected children sharing the DCC. The model framework also allows for differential infectiousness between diseased and non-diseased children. The rate (per child per unit time) by which infected children move into the uninfected class (recovery) is assumed dependent on disease state, and also related to the time since previous infection. Uninfected individuals become infected at a rate dependent on the time elapsed since their previous infection (partial, temporary immunity) and the level and type (clinical versus subclinical) of infection to which they are exposed. We also include infection by sources outside of the DCC.

A family of models can be derived from this structure by making different assumptions about the dependence of susceptibility and recovery rate on disease state and/or time since previous infection. The model assumes all processes to be independent of child age, gender, the number of previous infections and the genotype of current and past infections. This mechanistic structure can be expressed in the form of a set of probability models. The number of previous infections was not measured in the study, since the children were not followed up from birth. We therefore use time since previous infection as a proxy for immunity level.

We therefore have a family of models for considering the mechanisms associated with each of incidence (models 1–5) and recovery (models 6–8). Model 1 does not include the effect of time since previous infection on susceptibility and only assumes that it is dependent on the nursery-level prevalence of infection regardless of symptoms. Model 1 is extended to model 2 by including different infectiousness for symptomatic versus asymptomatic infections. Model 1 is extended to model 3 by including the effect of time since previous infection on susceptibility. Model 1 is extended to model 4 by including a differential risk of symptomatic infection compared with asymptomatic infection based on the time since previous infection. Model 5 is a combination of models 4 and 3. Model 6 does not include the effect of time since previous infection or the existence of symptoms during the infection on duration of infection. Model 6 is extended to model 7 by including different durations of infection for symptomatic versus asymptomatic infections. Model 7 is extended to model 8 by including the effect of time since previous infection on duration of infection. The models are summarized in table 1.

The parameters were estimated using MCMC within WinBUGS v. 1.4.1 (Spiegelhalter *et al*. 2000). We assumed vague prior distributions for all parameters (i.e. a uniform distribution from zero to unity for parameters defined in that range and a uniform distribution from 0 to 1000 for the other parameters, since they were expected to have values much less than 1000; table 2).
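The vague priors amount to a flat log-density inside the stated bounds. The following is a minimal Python sketch of that idea, not the WinBUGS specification used in the study. The assignment of each parameter to the unit interval or to the 0–1000 range is an assumption based on how the parameters are described in the text (table 2 is not reproduced here), and the dictionary keys are hypothetical names.

```python
import math

# Assumed supports: probabilities and weights on [0, 1], rates and durations on [0, 1000].
PRIOR_BOUNDS = {
    "theta": (0.0, 1.0),      # risk of symptoms given infection
    "eta": (0.0, 1.0),        # relative infectiousness weight
    "rho": (0.0, 1.0),        # relative recovery weight
    "beta": (0.0, 1000.0),    # transmission coefficient
    "lambda0": (0.0, 1000.0), # non-DCC force of infection
    "delta_f": (0.0, 1000.0), # baseline duration of infection
}


def log_prior(params):
    """Sum of independent uniform log-densities; -inf outside the support."""
    total = 0.0
    for name, value in params.items():
        low, high = PRIOR_BOUNDS[name]
        if not low <= value <= high:
            return -math.inf
        total -= math.log(high - low)
    return total
```

For example, `log_prior({"eta": 0.5, "rho": 0.3})` is zero, while any value of `beta` above 1000 gives minus infinity.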

### 2.3 Probability models

For sample *j* of child *i* in DCC *k*, *s*_{ijk}=1 if the sample is positive and *s*_{ijk}=0 if negative for rotavirus. For each sample, we recognize three measures of time: the number of weeks since the beginning of the study, *t*_{ijk}; the days since the previous sample, *δt*_{ijk}; and the weeks since the previous infection, *τ*_{ijk}. The proportions of the DCC experiencing symptomatic and asymptomatic infections in each week are given by *y*_{d}(*t*_{ijk},*k*) and *y*_{nd}(*t*_{ijk},*k*), respectively. The transition events of incidence and recovery are denoted as *I*_{ijk} and *R*_{ijk}, respectively. Table 3 summarizes the relationship between *s*_{ijk} and *s*_{i(j−1)k} and the transition indicator variables.

The function *m*_{ijk}, defined in equation (2.1), allows us to estimate the risk of symptoms during an infection, *θ*(*τ*_{ijk}).

The force of infection, *λ*_{ijk}, acting on child *i* in DCC *k* at time *t*_{ijk} is the hazard of becoming infected and is defined in equation (2.2). It includes a constant non-DCC component, *λ*_{0}. The force of infection can be constant (for all DCCs or for each DCC) if *β*=0 or, if *β*>0, it can depend on the proportions of children in the DCC who are infected with and without diarrhoea. Children with diarrhoea and those without are equally infectious if *η*=0.5. If *η*>0.5, children with diarrhoea are more infectious than those without, and if *η*<0.5 they are less infectious.

The recovery rate (the inverse of the average duration of infection, *δ*_{ijk}) of child *i* at time *t*_{ijk}, without the effect of previous infection, is given by equation (2.3). Children with diarrhoea and those without are assumed to have an equal duration of infection of *δ*_{f} if *ρ*=0.5. If *ρ*>0.5, children with diarrhoea recover more quickly than those without, and if *ρ*<0.5 they recover more slowly.
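Equations (2.1)–(2.3) appear here by number only, so the block below offers one parametrization that is consistent with the verbal definitions. It is a reconstruction under stated assumptions and not necessarily the authors' exact expressions. The indicator *d*_{ijk} (1 if the infection involves diarrhoea, 0 otherwise) is notation introduced for this sketch, and the factor of 2 is an assumed normalization chosen so that *η*=0.5 and *ρ*=0.5 recover the unweighted prevalence and the duration *δ*_{f}, respectively.

```latex
% (2.1), assumed form: probability of the observed symptom state, given infection
m_{ijk} = \theta(\tau_{ijk})^{d_{ijk}} \, \bigl(1 - \theta(\tau_{ijk})\bigr)^{1 - d_{ijk}}

% (2.2), assumed form: non-DCC component plus weighted DCC prevalence
\lambda_{ijk} = \lambda_0 + 2\beta \bigl[ \eta \, y_d(t_{ijk},k) + (1 - \eta) \, y_{nd}(t_{ijk},k) \bigr]

% (2.3), assumed form: recovery rate without the effect of previous infection
\frac{1}{\delta_{ijk}} = \frac{2}{\delta_f} \bigl[ \rho \, d_{ijk} + (1 - \rho)(1 - d_{ijk}) \bigr]
```

Under this form, *η*=0.5 gives *λ*_{ijk} = *λ*_{0} + *β*(*y*_{d} + *y*_{nd}), the symptom-independent dependence on prevalence, and *β*=0 leaves the constant *λ*_{0}.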

We use a survival approach (Clayton & Hills 1998) to model the transition events of incidence and recovery. The transition probabilities for incidence, *I*_{ijk}, and recovery, *R*_{ijk}, are given by equations (2.4) and (2.5), respectively. The function *m*_{ijk} of *θ* is included in equation (2.4) to allow us to differentiate between the incidence of symptomatic and asymptomatic infections. Thus, given that incidence has occurred, the probability that it is symptomatic is given by *θ*.

The incidence function refers only to samples from children with a previous negative sample, and thus the event of incidence (changing from negative to positive) is possible between the previous and current samples. Similarly, the recovery function refers only to samples from children with a previous positive sample.
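The survival approach converts a hazard into a probability for an interval of any length, which is how variable intersample periods are accommodated. The following is a minimal Python sketch, assuming the hazard is constant over the interval and expressed per day. The function names are illustrative and are not taken from the study's WinBUGS code.

```python
import math


def transition_probability(hazard_per_day, interval_days):
    """P(at least one event in the interval) = 1 - exp(-hazard * interval)."""
    if hazard_per_day < 0 or interval_days < 0:
        raise ValueError("hazard and interval must be non-negative")
    # -expm1(-x) equals 1 - exp(-x) but is more accurate for small x.
    return -math.expm1(-hazard_per_day * interval_days)


def bernoulli_log_likelihood(event, probability):
    """Log-likelihood of one observed transition indicator (0 or 1).

    Requires 0 < probability < 1.
    """
    return math.log(probability) if event == 1 else math.log1p(-probability)
```

For a recovery hazard of 1/8 per day (a mean duration of 8 days) and samples 4 days apart, `transition_probability(1 / 8, 4)` is 1 − exp(−0.5), roughly 0.39.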

The function *f*(*τ*_{ijk}) allows us to explore the effect of the time elapsed since a previous infection, *τ*_{ijk}, on the susceptibility of child *i* at time *t*_{ijk} to further infection. The function *g*(*τ*_{ijk}) allows us to explore the effect of the time elapsed since a previous infection, *τ*_{ijk}, on the duration of the current infection. A number of choices were considered for these functions and will be discussed in §2.4.
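With *f* and *g* defined, the transition probabilities (2.4) and (2.5) can be written in the usual constant-hazard survival form. The block below is a sketch consistent with the description in the text, not a verbatim copy of the authors' equations. In particular, letting *f* scale the whole force of infection (rather than only the *β* term) is an assumption, as is measuring hazards per day so that they match *δt*_{ijk}.

```latex
% (2.4), assumed form: incidence of an infection with the observed symptom state
P(I_{ijk} = 1) = m_{ijk} \Bigl[ 1 - \exp\bigl( - f(\tau_{ijk}) \, \lambda_{ijk} \, \delta t_{ijk} \bigr) \Bigr],
\qquad
P(I_{ijk} = 0) = \exp\bigl( - f(\tau_{ijk}) \, \lambda_{ijk} \, \delta t_{ijk} \bigr)

% (2.5), assumed form: recovery, with g scaling the duration of infection
P(R_{ijk} = 1) = 1 - \exp\!\left( - \frac{\delta t_{ijk}}{g(\tau_{ijk}) \, \delta_{ijk}} \right)
```

Setting *f* and *g* to 1 removes any dependence on infection history, so that incidence depends only on *λ*_{ijk} and recovery only on *δ*_{ijk}.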

### 2.4 Data preparation

The time between samplings is variable (figure 3) and some intersample periods are too long for the two samples to be treated as consecutive repeated measures; a child could have been reinfected a number of times during such a period. If we consider becoming infected as an event (*I*_{ijk}), then the first sample (*j*=1) for child *i* is relevant to the incidence dataset only in that it informs the value of *I*_{i2k} for the second sample. All first-sample values, *I*_{i1k}, are therefore unknown and are not present in the final dataset. In addition, all samples taken more than Δ*t*_{max} days after the previous one are discarded. If Δ*t*_{max}=21 days, approximately 90% of the sampling intervals are included; if Δ*t*_{max}=14 days, approximately 68% are included. The proportions *y*_{d}(*t*_{ijk},*k*) and *y*_{nd}(*t*_{ijk},*k*) were calculated before the samples were excluded.

If the intersample period is less than Δ*t*_{max}, *I*_{ijk}=1 if the current sample is positive and the prior sample negative, and *I*_{ijk}=0 if the current and prior samples are negative. If the child was infected on their previous sample, then they are considered unexposed to infection and therefore the current value of *I*_{ijk} would be irrelevant and therefore not present in the dataset.
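The rules for building the transition datasets can be sketched in Python as follows. This is an illustration under assumptions: the recovery indicator is taken to mirror the incidence rule (1 if a positive sample is followed by a negative one), intervals equal to the cut-off are kept, and the function name and tuple layout are hypothetical.

```python
def transition_indicators(days, results, dt_max=14):
    """Derive incidence ('I') and recovery ('R') indicators for one child.

    days    -- sampling days in ascending order
    results -- 0 (negative) or 1 (positive) for each sample
    Returns tuples (position, kind, value, interval_days); positions are 0-based.
    The first sample yields no indicator because it has no predecessor.
    """
    indicators = []
    for j in range(1, len(days)):
        interval = days[j] - days[j - 1]
        if interval > dt_max:
            continue  # too long since the previous sample: discarded
        if results[j - 1] == 0:
            # Previously negative: at risk of infection.
            indicators.append((j, "I", results[j], interval))
        else:
            # Previously positive: at risk of recovery (assumed mirror rule).
            indicators.append((j, "R", 1 - results[j], interval))
    return indicators
```

For samples on days 0, 3, 7, 30 and 33 with results 0, 1, 1, 0, 0, the 23-day gap is dropped at the default cut-off, leaving one incidence event, one non-recovery and one non-incidence.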

In order to consider the effect of low sensitivity (i.e. false negatives) of the test, we prepared two further datasets, *fn14-* and *fn7-data*, with the original dataset being *fn0-data*. In *fn14-data*, a negative sample was changed to positive if the previous and following samples were both positive and were each within 14 days of the sample in question. This occurred in 411 out of 2948 measurements. The procedure was repeated with an interval of plus or minus 7 days to produce *fn7-data*, which changed 238 out of 2948 measurements. We report results in the text only for the *fn7-data* for clarity, but report results for all three datasets graphically. Table 4 gives the numbers of transitions for each dataset.
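The recoding of suspected false negatives can be sketched as a single pass over one child's samples. This is a minimal Python illustration: the function name is hypothetical, and deciding each sample from the original (not already recoded) neighbours is an assumption about a detail the text does not specify.

```python
def recode_false_negatives(days, results, window_days):
    """Change a negative sample to positive when both neighbouring samples are
    positive and each lies within window_days of it."""
    recoded = list(results)
    for j in range(1, len(results) - 1):
        sandwiched = results[j] == 0 and results[j - 1] == 1 and results[j + 1] == 1
        close_before = days[j] - days[j - 1] <= window_days
        close_after = days[j + 1] - days[j] <= window_days
        if sandwiched and close_before and close_after:
            recoded[j] = 1
    return recoded
```

With samples on days 0, 4, 8, 20 and 24 and results 1, 0, 1, 0, 1, a 7-day window recodes only the first negative, whereas a 14-day window recodes both.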

The primary aim of this study was to explore the effect of time elapsed since a previous infection on three main characteristics (risk of infection, risk of disease and duration of infection) and thereby characterize partial immunity to rotavirus infection in the children participating in the study. To achieve this, only uninterrupted data sequences (i.e. those in which each sample was within 14 days of the previous sample) occurring after a child's first measured infection were included. The time since previous infection could then be measured. It was grouped into five categories, defined in weeks (there were no data for the final category for the recovery models), to obtain profiles for each of the three characteristics.
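Measuring and grouping the time since previous infection can be sketched as below. This is an illustration only: time is counted from the most recent earlier positive sample, which is one possible reading of "time since previous infection", and the category edges in the usage note are placeholders, not the boundaries used in the study. Both function names are hypothetical.

```python
from bisect import bisect_right


def weeks_since_previous_infection(days, results):
    """Weeks since the most recent earlier positive sample, per sample.

    Returns None for samples taken before the first measured infection.
    """
    tau = []
    last_positive_day = None
    for day, result in zip(days, results):
        if last_positive_day is None:
            tau.append(None)
        else:
            tau.append((day - last_positive_day) / 7.0)
        if result == 1:
            last_positive_day = day
    return tau


def categorize(tau_weeks, edges):
    """Index of the category containing tau_weeks, for ascending edges (weeks)."""
    return bisect_right(edges, tau_weeks)
```

For example, with placeholder edges `[2, 4, 6, 8]` (giving five categories), a value of 1 week falls in category 0 and a value of 9 weeks in category 4.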

### 2.5 Model selection and convergence

For each model, we computed the value of the deviance information criterion (DIC; Spiegelhalter *et al*. 2002). The model with the smallest DIC is the model that would best predict a replicate dataset of the same structure as that currently observed. We also plotted the distribution of the Pearson residuals (Green *et al*. 2004) to assess the validity of the models. A plot of the residuals can be obtained by arranging the predicted probabilities for each data point in ascending order with their corresponding Pearson residual, then separating them into equal sized groups and taking the sum of the residuals, *S*_{R}, and the mean of the predicted probability, *μ*_{P}. If *S*_{R} is plotted against *μ*_{P}, the points should be randomly distributed around the *x*-axis.
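Both diagnostics can be sketched in a few lines of Python. The DIC function uses the standard definition (mean posterior deviance plus the effective number of parameters), and the residual function assumes binary outcomes, for which the Pearson residual is (*y* − *p*)/√(*p*(1 − *p*)). The function names and the choice to let the last group be smaller when the data do not divide evenly are assumptions of this sketch.

```python
import math


def dic(deviance_samples, deviance_at_posterior_mean):
    """DIC = mean deviance + p_D, where p_D = mean deviance - deviance at the posterior mean."""
    d_bar = sum(deviance_samples) / len(deviance_samples)
    p_d = d_bar - deviance_at_posterior_mean
    return d_bar + p_d


def grouped_pearson_residuals(observed, predicted, group_size):
    """Return (mu_P, S_R) per group after sorting by predicted probability.

    observed  -- 0/1 outcomes
    predicted -- fitted probabilities, strictly between 0 and 1
    """
    pairs = sorted(zip(predicted, observed))
    points = []
    for start in range(0, len(pairs), group_size):
        group = pairs[start:start + group_size]
        s_r = sum((y - p) / math.sqrt(p * (1.0 - p)) for p, y in group)
        mu_p = sum(p for p, _ in group) / len(group)
        points.append((mu_p, s_r))
    return points
```

Plotting the second element of each pair against the first gives the residual plot described above; a well-specified model should show no trend around zero.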

Convergence was tested in each case using the Gelman–Rubin convergence diagnostic (Brooks & Gelman 1998) calculated within WinBUGS (Spiegelhalter *et al*. 2002). For each model, we ran three independent chains for 10 000 iterations following a burn-in period of 10 000 iterations (as this was sufficient to achieve convergence).
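The idea behind the diagnostic is to compare between-chain and within-chain variability for each parameter. The sketch below implements the classic variance-based potential scale reduction factor in Python. It is an illustration of the principle and may differ in detail from the version computed by WinBUGS, which, as far as we are aware, is based on interval widths. The function name is hypothetical.

```python
from statistics import mean, variance


def gelman_rubin(chains):
    """Potential scale reduction factor for one parameter.

    chains -- list of equal-length lists of post-burn-in samples, one per chain.
    Requires at least two chains and non-zero within-chain variance.
    """
    n = len(chains[0])
    chain_means = [mean(chain) for chain in chains]
    within = mean([variance(chain) for chain in chains])  # W
    between = n * variance(chain_means)                   # B
    pooled = (n - 1) / n * within + between / n           # estimate of posterior variance
    return (pooled / within) ** 0.5
```

Values close to 1 suggest that the chains are sampling the same distribution, whereas chains stuck in different regions give values well above 1.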

## 3. Results

The parameter estimates and the DIC of the nested models applied to the *fn7-data* are summarized in table 5.

Model 2 had a slightly higher DIC than model 1, and the credible interval of the parameter *η* (altered infectiousness) included 0.5, indicating that the inclusion of heterogeneous infectiousness based on symptoms does not improve the model fit. Model 6 had a higher DIC than model 7, and the credible interval for the parameter *ρ* (altered duration of infection) did not include 0.5, indicating that the inclusion of heterogeneous duration of infection based on symptoms does improve the model fit.

Model 5 had a lower DIC than both models 3 and 4, indicating that the inclusion of dependence of both susceptibility and risk of symptomatic infection on time since previous infection improves the model fit. The parameter *β*_{i} is the product of *β* and *f*(*τ*_{ijk}) from equations (2.2) and (2.4). The parameter *θ*_{i} is *θ*(*τ*_{ijk}) from equation (2.1). Except for *fn0-data*, when *βf*(*τ*_{ijk}) is plotted against *τ*_{ijk}, the risk of infection reaches a maximum at approximately 5 weeks since previous infection (figure 4*a*) and then falls. When *θ*(*τ*_{ijk}) is plotted against *τ*_{ijk}, the risk of symptoms given infection increases as time since infection increases (figure 4*b*).

Model 8 had a lower DIC than both models 6 and 7, indicating that the inclusion of dependence of duration of infection on time since previous infection improves the model fit. The parameter *δ*_{i} is the product of *δ*_{f} and *g*(*τ*_{ijk}) from equations (2.3) and (2.5). When the duration of infection, *δ*_{f}*g*(*τ*_{ijk}), is plotted against *τ*_{ijk}, it decreases as time since infection increases (figure 4*c*).

## 4. Discussion

The application of models to data from repeated infections of rotavirus within children within DCCs has resulted in the estimation of basic epidemiological parameters and the development of hypotheses on the protective effect of previous infection against infection and disease. The duration of infection is estimated at approximately 15 days and appears to be highly variable even for repeated infections within the same child. Symptomatic infection was estimated to last about three times as long (23 days) as asymptomatic infection (8 days). Note, however, that the prevalence of asymptomatic children was over three times higher than that of symptomatic children. The analysis indicates that children with symptomatic infection are no more infectious than those with asymptomatic infection. This seems to contradict experimental work that demonstrated higher levels of shedding in those individuals with more severe infections (Kang *et al*. 2004). This contradiction could arise for a number of reasons: the number of symptomatic infections may be too low to identify this effect; the definition of symptomatic as an infection involving symptoms may be too broad; the removal of children from their DCC due to symptoms could have occurred. Rotavirus reinfection within a child is dependent on the density of infection within his/her DCC, with infection only very rarely coming from outside the DCC (table 5, estimates of *λ*_{0}).

Previous attempts at investigating the role of acquired immunity (i.e. previous infection) in rotavirus infection have failed to incorporate the transmission dynamic effects included in the model framework used here. In particular, the risk of infection of an individual is determined by both their immune status and their exposure, and not including the latter will result in inaccurate estimates. Unfortunately, information on the first infection, the total number of previous infections and their effect on subsequent infections cannot be obtained since the data are not from a birth cohort. The transmission dynamics of rotavirus repeated infections outside the high prevalence season cannot be explored using the data presented here. The transmission dynamics associated with reinfection, partial and temporary immunity are, generally, highly nonlinear (Gomes *et al*. 2004), and even more so if an effect of dose is included (Gomes *et al*. 2005). Community-based longitudinal data and their appropriate analysis are clearly required if the effect of a given vaccine and vaccination programme is to be understood.

Longitudinal, individual-based data in relatively small populations are increasingly being used to study infection dynamics (Auranen *et al*. 2000, 2004; Eerola *et al*. 2003; Smith & Vounatsou 2003; Basáñez *et al*. 2004; Cooper & Lipsitch 2004; Melegaro *et al*. 2004; Liu *et al*. 2005). Analysis of such data presents several methodological problems. Generally, the estimates of the two processes of incidence and recovery are likely to be highly correlated. If, however, the sampling is dense and testing sufficiently sensitive, then incidence and recovery can be treated separately, as here.

To consider immunity, it is necessary to have information on the history of infection in an individual. The simplest assumption is that an infection ‘resets’ the immune system to maximum. In this case only data from children with multiple infections and only their uninterrupted follow-up since their first measured infection can be used. To consider concepts such as cumulative and cross-immunity, follow-up serological and genotypic data, respectively, from a birth cohort are required.

The maximum measured time since previous infection from our datasets is 119 days and cellular and serological immune responses to rotavirus infection in children have been observed to remain for a few months post infection (Makela *et al*. 2004), thus it should be possible to detect waning effects of previous infection with this dataset. However, the results presented here should be considered within the context of a number of confounding factors. Firstly, the results have not been corrected for age. Secondly, an individual's response to exposure could be dose dependent (if they experience multiple exposures over a short time period or are exposed to a high initial dose) and, since rotavirus is seasonal, responses at the end of the season could be different from those earlier. This effect would be correlated with time since infection. Thirdly, the sensitivity of the diagnostic test for the presence of rotavirus is not 100% and therefore there are a number of choices for the definition of a continued infection.

Time since previous infection, *τ*, was used as a proxy for immunity in our analysis, and the functions *θ*, *f* and *g* of *τ* quantify the risk of symptomatic disease, risk of infection and duration of infection, respectively. The analysis strongly indicates that immunity to rotavirus manifests itself as a waning resistance to disease rather than infection *per se*. That is, if a child has recently recovered from an infection, its next infection will be less likely to involve symptoms if it occurs sooner rather than later. A schematic of partial immunity over time (figure 5) derived from the estimates for models 5 and 8 (table 5) suggests three types of partial immunity developing over different time scales. Immunity to infection (resistance to infection of any kind) is lost within a month and then regained after a few more weeks. This could represent a temporary disruption in protection against disease due to recent infection. Immunity to disease in the form of resistance to symptomatic infection is gained within a couple of weeks and is still waning after three months. Immunity to disease in the form of altered duration of infection is still being gained after two months. A challenge study in pigtailed macaques demonstrated differing profiles for IgA, IgG and IgM antibodies over time since challenge (Westerman *et al*. 2005). Although this occurs over a period of days rather than weeks, and in animals rather than humans, it does provide evidence that immune responses evolve over different time scales *in vivo* in response to challenge.

The third plot of figure 5 (dashed line) indicates the unexpected result that the duration of infection decreases with time since previous infection. That is, higher immunity correlates with longer durations of infection. This is a counter intuitive result that has also been reported indirectly for malaria, where duration of infection was positively correlated with age (a widely accepted correlate of immunity for this disease; Smith & Vounatsou 2003). A simple exploration of the data implied that right censoring is not responsible for this result. If a series of repeated measures for an individual child ended in a positive value, this infection (a series of consecutive positive values) was excluded from the dataset. An identical analysis performed on this reduced dataset gave similar results. A possible explanation (as an alternative to the long-term development of some sort of immunity against disease) for the result is that the sensitivity of the test combined with heterogeneous shedding has resulted in continued infections being defined as series of repeated infections. Richardson *et al*. (1998) observed continued hospital infections of rotavirus which included consecutive negative samples. If this behaviour also occurs for non-severe and asymptomatic infections, a series of shedding events of decreasing duration would be observed—a profile that would correspond to the third profile of figure 5. Another explanation for this result could be that children who are more susceptible (and therefore become reinfected shortly after recovery) also experience longer infections; however, plots of child-level transmission against time since infection and duration of infection did not support this notion.

The sensitivity of detection of rotavirus infection is a potentially significant confounding factor for the parameter estimation process. The definition of continued infection as a series of samples that ends with two sequential negative samples has been used before (see Mohan & Haque 2003). If it is assumed that a negative result is in fact a positive when sandwiched between two positive results within a given time period, the profile of past infection is changed for that child. Where additional data on the existence of current or recent infection, such as antibody level, are available, data augmentation approaches (Eerola *et al*. 2003) can be used to estimate the extent of, and account for, measurement error. For example, antibody level rather than time since previous infection could be used as a measure of immunity level within the individual.

The analysis of follow-up data results in different profiles for the natural history of rotavirus reinfection (depending on the definition of a continued infection). The estimates of the transmission coefficient and the average duration of infection are robust to alternative assumptions on the nature of immunity. However, there is very high variation in these values and larger numbers of measurements per child would be necessary to determine the contribution of variation at the child level. The results suggest that there may be some protection against symptomatic infection, and that symptomatic infections have a longer duration than asymptomatic infections. It is possible that we are observing only minor changes in the immune response due to reinfection, when the most significant change would have occurred during the first infection. Only a birth cohort follow-up study would provide the necessary data to test this hypothesis.

These results are of particular interest in view of the likely introduction of live attenuated rotavirus vaccines in many developed countries where DCCs are prevalent. Two candidate vaccines have completed large-scale safety and efficacy clinical trials (Rotarix, GSK Vaccines, and Rotateq, Merck Vaccines), with good safety and efficacy profiles in industrialized countries and in Latin America (Ruiz-Palacios *et al*. 2006; Vesikari *et al*. 2006; Grimwood & Buttery 2007; World Health Organization 2007). Like natural infection, vaccines do not protect against infection but do protect against disease, with repeated rotavirus exposures probable for most children post-vaccination. Whether the transmission dynamics of wild-type rotavirus will be influenced by vaccine introduction is unclear, especially if a mixed immunized and non-immunized cohort exists in a high-transmission environment such as a DCC. A further confounder is likely to be horizontal transmission of vaccine virus, documented at least in previous rotavirus vaccine trials (Jiang *et al*. 2002). It remains likely that the success of rotavirus vaccine at an individual host and population level relies upon the regular re-exposure of vaccine recipients to asymptomatic infection to maintain immunity, at least in the early years.

## Acknowledgments

The authors gratefully acknowledge the support and advice from Prof. Paul Burton (Department of Health Sciences, University of Leicester, UK) on the model formulation and coding in WinBUGS. They also gratefully acknowledge the financial support of the Wellcome Trust, grant nos 061584 and 076278, the Calouste Gulbenkian Foundation (FCG), the Portuguese Research Council (FCT) and the European Commission, grant MEXT-CT-2004-14338. The original study of rotavirus infection in community child care centres was supported by NHS Executive South East Project Grant Scheme (grant no. SPGS769).

## Footnotes

- Received March 20, 2008.
- Accepted April 22, 2008.

This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

- Copyright © 2008 The Royal Society