## Abstract

Seasonal influenza appears as annual oscillations in temperate regions of the world, yet little is known as to what drives these annual outbreaks and what factors are responsible for their inter-annual variability. Recent studies suggest that weather variables, such as absolute humidity, are the key drivers of annual influenza outbreaks. The rapid, punctuated, antigenic evolution of the influenza virus is another major factor. We present a new framework for modelling seasonal influenza based on a discrete-time, age-of-infection, epidemic model, which allows the calculation of the model's likelihood function in closed form. This framework may be used to perform model inference and parameter estimation rigorously. The modelling approach allows us to fit 11 years of Israeli influenza data, with the best models fitting the data with unusually high correlations in which *r* > 0.9. We show that using actual weather to modulate influenza transmission rate gives better results than using the inter-annual means of the weather variables, providing strong support for the role of weather in shaping the dynamics of influenza. This conclusion remains valid even when incorporating a more realistic depiction of the decay of immunity at the population level, which allows for discrete changes in immunity from year to year.

## 1. Introduction

Annual influenza epidemics in temperate regions are dominated by a distinct seasonal pattern, generally peaking in winter each year and lasting from one to four months [1–3]. Even with this strong seasonal periodicity, recurrent annual influenza epidemics exhibit substantial inter-annual variation, as manifested in the variability of the timing and amplitude of the influenza outbreaks each year (figure 1*a*; Huppert *et al*. [4]). From a healthcare system point of view, predictions of the magnitude and timing of the upcoming influenza epidemic could greatly assist in early preparations and developing mitigation strategies. It is thus not surprising that over the past decade, several key studies have attempted to model and identify the cause of the inter-annual variability of seasonal influenza [5–10]. Here, we advance this goal by developing a new seasonally forced model for studying seasonal influenza, which is used to analyse high-quality data collected in Israel.

The main seasonal drivers that control influenza are still poorly understood. A number of different theories have been proposed in the past. These include [11,12] (i) seasonal variation in contact rates related to school terms and indoor crowding in the winter [13–16], (ii) seasonal variation in the functionality of the immune system related to the dark/light cycle [17–19], and (iii) seasonal variation in the viability and transmissibility of the influenza virus related to weather conditions [20–23]. Recent studies have focused their attention on this last category, particularly on the effect of absolute humidity. Laboratory results suggest that variation in absolute humidity explains the observed variation in influenza virus survival and transmission rates better than temperature and relative humidity [24,25]. At the population level, it was demonstrated that absolute humidity explains the pattern of observed influenza mortality across different states of the USA [7,26]. However, most of these theories are controversial and currently ‘our understanding of the mechanisms underlying influenza seasonal variation remains very limited’ [7, p. 1] and ‘hazy at best’ [12, p. 3645].

Through studies of other recurrent infectious diseases, it is now well understood that a necessary condition for periodic outbreaks is a continuous source of new susceptible individuals into the population [27,28]. In the case of influenza, this source derives from the rapid antigenic evolution of the different influenza virus types/subtypes circulating in the human population (A-H3N2, A-H1N1 and B) [11,29]. In general, infected individuals eventually recover from the disease gaining future immunity to the current strain. These recovered individuals will nevertheless become susceptible to any new influenza strains that evolve via antigenic drift. This source of new susceptibles provides the fuel for a new epidemic each year. Accumulating evidence from theoretical and experimental studies suggests that the antigenic evolution of the influenza virus is not necessarily gradual but sometimes punctuated, and characterized by the emergence of new antigenic clusters that replace each other every 2–8 years [30–32].

Immunity to seasonal influenza varies from year to year as a function of the particular strains of influenza virus circulating each year and the measure of antigenic change they accumulated since their last appearance in the population. The standard approach for incorporating antigenic drift into an epidemic model is by making use of the classical susceptible–infected–recovered–susceptible (SIRS) equations [33]. In this model paradigm, individuals who have recovered from influenza are temporarily immune from future infection. Eventually, recovered individuals become susceptible again, assuming a constant rate of immunity loss. However, in this approach, it is difficult to take into account the various circulating and coexisting strains of the different influenza types while keeping track of the population's different levels of immunity to each strain [34]. In addition, the assumption of a constant rate of immunity loss cannot account for punctuated antigenic evolution of the virus. This complicated reality can be simplified by considering a single dominant virus type in each influenza season; an assumption that is backed up for the most part by surveillance data (figure 1*a*). By incorporating surges of immunity loss at the beginning of each influenza season, it is possible to account for alternating dominant influenza types and punctuated antigenic evolution, while avoiding the complications of a multistrain model [5].

Here, we are interested in understanding the processes that give rise to the observed inter-annual variation in seasonal influenza epidemics. In particular, we wish to investigate the effect of inter-annual variability in weather conditions and immunity loss. For this purpose, we develop a stochastic, seasonally forced SIRS modelling framework, incorporating the different biological mechanisms described earlier. The framework introduces demographic and environmental process stochasticity, as well as a stochastic observation component. Given that the model yields an explicit analytical likelihood function to work with, we avoid the need to make use of complicated particle filtering approaches [35–37]. The likelihood allows us to perform inference and parameter estimation rigorously. The framework is used to generate model variants which we fit to incidence data of seasonal influenza in order to (i) test the capacity of different models to quantitatively reproduce the dynamics of seasonal influenza, (ii) compare the different models using model selection procedures so as to assess the role of weather and loss of immunity in inducing the observed influenza dynamics, and (iii) estimate important epidemiological parameters.

## 2. Methods

### 2.1. The data

The influenza data used in this paper consist of an 11 year time-series, between 1 July 1998 and 30 June 2009, of daily influenza-like illness (ILI) diagnoses in Israel (figure 1*a*). The data were collected by Israel's Maccabi health maintenance organization (HMO). The proportion of Maccabi members within the Israeli population is approximately 24 per cent and its coverage is nationwide. Laboratory tests of sampled ILI cases from sentinel clinics have shown the ILI dataset to be highly correlated with incidence of confirmed influenza cases [38] (see the electronic supplementary material, section C). The raw ILI data were preprocessed first by removing repeated visits and then by fixing the ‘weekend effect’. For the latter, we replaced the incidence during weekends and holidays with the interpolated incidence of the preceding and following days. The ILI incidence was then scaled to represent the nationwide incidence (see the electronic supplementary material, section A for full details).

We noted two distinct periods in the ILI dataset. During the first four seasons (1998–2002), the Maccabi HMO was using the ICD-9 coding system (with the code 487.1 defined as ILI). At the beginning of the fifth season, the Maccabi HMO began to use a more specific coding system, designed to improve the quality of the diagnoses. By comparing the number of ILI cases during summer in the two periods, it is evident that the change in coding resulted in a narrower definition of ILI. This difference between the two periods had to be considered when using the ILI data for inference (see §2.2.4).

Time-series of temperature and absolute humidity in Tel Aviv were obtained for the same time period as that of the ILI dataset (figure 1*b,c*). The absolute humidity time-series was constructed from measurements of the mean daily temperature and relative humidity, as measured by meteorological stations of the Israeli Metrological Society and pollution monitoring stations of the Israeli Ministry of Environmental Protection [39,40]. The city of Tel Aviv's weather data were chosen because they are highly representative of the weather experienced by most of Israel's population. Tel Aviv is located at the centre of Israel's long coastline with the Mediterranean Sea, and is the largest metropolitan area in Israel. The population of greater Tel Aviv alone constitutes approximately 40 per cent of the total population of Israel. In addition, Tel Aviv serves as a major hub, with people commuting in daily for work and entertainment from around the country, thereby creating a strong mix between the population of Tel Aviv and the population of other Israeli cities. Indeed, the ILI time-series for Tel Aviv is highly correlated with that of the rest of Israel (Pearson correlation: *r* = 0.97, *p* < 0.001), indicating the strong mixing between the subpopulations over the country.

### 2.2. Modelling and inference

#### 2.2.1. The modelling framework

In this paper, we extend the stochastic, discrete-time, age-of-infection model framework, first introduced in Katriel e*t al.* [41] to study single epidemics, in order to model seasonal influenza over many years [41]. The general SIRS model assumes each individual at any time is in one of three classes: susceptible, infected or recovered. Susceptible individuals may move to the infected class upon coming into contact with an infected individual. Infected individuals eventually recover and then gradually lose their immunity, thereby becoming susceptible once again and closing the SIRS loop. An initial formulation of the model is as follows:
2.1where
2.2and
2.3Here, *S*(*t*) and *R*(*t*) are the number of susceptible and recovered individuals on day *t*, respectively. Note that *i*(*t*) represents the number of *newly* infected individuals on day *t* and not the total number of infectives on day *t* as in the usual formulation of the SIRS model. *i*(*t*) is taken to be a random variable having a Poisson distribution with mean *E*(*i*(*t*)) = *λ*(*t*). The Poisson distribution was derived as an approximation to a generalization of the classical chain binomial model and the variability it introduces accounts for demographic stochasticity in the number of newly infected (see [41,42]).

In this model formulation, infected individuals are assumed to remain infective for a duration of *d* days, and *i*(*t* − *τ*) represents the number of infected individuals whose age-of-infection is *τ* days. The numbers *P _{τ}*(1≤

*τ*≤

*d*) represent the generation-time distribution;

*P*is the fraction of infections generated by an infective person that occur on day

_{τ}*τ*of infection (thus ). Here, it is assumed that

*P*has a discretized gamma distribution with

_{τ}*d*= 7 and mean generation time of 2.7 days, with a variance of 1.8 days (see [9,42] and electronic supplementary material, section D). The parameter

*R*

_{0}(

*t*) represents the basic reproductive number on day

*t. R*

_{0}is defined as the mean number of secondary infections caused by a single infective individual in a totally susceptible population. In a partially susceptible population, the mean number of secondary infections produced during a single day by an individual who was infected

*τ*days ago is given by

*S*(

*t*− 1)/

*N*(

*t*− 1) ·

*R*

_{0}(

*t*) ·

*P*, where

_{τ}*N*(

*t*) is the size of the population on day

*t*. The model assumes homogeneous mixing in the population; the expected number of new infectives on each day can be understood as the sum of infections caused by all infected individuals. In addition, the model assumes that each day,

*i*

^{0}individuals become infected by individuals outside of the modelled population (representing the arrival of infected individuals from abroad). There are two sources of new susceptibles in the model. The first is population growth given by daily birth rates. The daily birth and mortality rates (

*β*(

*t*) and

*μ*(

*t*)) were interpolated from the annual birth and mortality rates in Israel for the modelled period [43]. A second source of new susceptibles is due to loss of immunity, whose rate is given by

*γ*(

*t*), the portion of susceptible individuals that lose their immunity on day

*t*. The initial conditions for the model are the incidence in the first

*d*days (

*i*(1) …

*i*(

*d*)) and the fraction of susceptibles in the population

*S*

_{0}≡

*S*(0)/

*N*(0) at the beginning of the modelling period.

We extended the model formulation further by introducing additional environmental stochasticity into the infection process. Similar to He *et al*. [37], we incorporated a stochastic factor of the form *ξ* ∼ *Γ*(*k*, 1/*k*) to multiply the expected incidence on day *t.* The expected incidence on day *t* thus itself becomes a stochastic variable; with mean and a constant coefficient of variation . The composition of the gamma and the Poisson distributions results in a new stochastic equation in place of equation (2.1) [44]:
2.4where 1/*k* is a clumping factor for the distribution; as 1/*k* → 0, the model reverts back to the previous formulation (with no environmental stochasticity).

A stochastic observation model was incorporated based on the probability of reporting the ILI incidence data ILI(*t*), from the actual influenza incidence *i*(*t*)
2.5Here, *q* is the reporting rate—that is the probability that an influenza case will be reported as an ILI case. It is a product of the probabilities of developing symptoms, seeking consultation and being assigned with ILI. The parameter *θ* represents a time delay between the time of infection and the time the ILI diagnosis is given (measured in days). Here, we set *θ* = 1, which was found to provide optimum results (see §4). By combining the infection model (equation (2.4)) with the observation model (equation (2.5)), we obtain the following model for ILI incidence, which accounts for both the infection and observation processes (see derivation in electronic supplementary material, section E):
2.6We thus have *E*(ILI(*t*)) = *q* · *E*(*i*(*t* − *θ*)) and Var(ILI(*t*)) = *q* · Var(*i*(*t* − *τ*)), whereas the clumping factor 1/*k* remains unchanged. The stochasticity of the observation model (equation (2.5)) inflates the variance of the combined process and observation model (equation (2.6)). With a simple observation relationship ILI(*t*) = *i*(*t* − *θ*)/*q*, the combined model would have a variance of *q*^{2} · Var(*i*(*t*)) < *q* · Var(*i*(*t*)) as *q* < 1.

We developed model variations from the SIRS model framework in two ways:

(i) By incorporating different weather signals to force the seasonal variation in

*R*_{0}.(ii) By incorporating alternative methods for modelling the loss of immunity

*γ*.

#### 2.2.2. Modelling seasonality

A common approach for incorporating seasonality in epidemiological models is to apply annual seasonal forcing by making the reproductive number, *R*_{0}, time-dependent. Here, in order to study the hypothesis that variation in the reproductive number is caused by variations in weather, we used the expression
2.7The parameter is the value of *R*_{0} in the absence of a seasonal effect. and are the absolute humidity and temperature data (either actual or average; see below) after normalization to have zero mean and unit variance, and after smoothing using an exponential moving average with a 3 day time period to allow recent history to effect *R*_{0} (see the electronic supplementary material, section B). The parameters *δ*_{AH} and *δ*_{T} represent the strength of the seasonal forcing, and are measures of the impact of absolute humidity and temperature on *R*_{0}, respectively. Here, it is assumed that , *δ*_{AH} and *δ*_{T} are constant throughout the modelled period. The difference in *R*_{0}(*t*) between years thus depends only on the variations in the seasonal weather signals.

Using expression (2.7), the following seasonal signals were evaluated:

(1) Absolute humidity (AH): daily mean absolute humidity.

(2) Average absolute humidity (): daily absolute humidity averaged for each day of the year over all 11 seasons (inter-annual mean absolute humidity).

(3) Temperature (

*T*)*:*daily mean temperature.(4) Average temperature (): daily temperature averaged for each day of the year over all 11 seasons (inter-annual mean temperature).

(5) Absolute humidity and temperature (AH +

*T*)*:*using both (1) and (3).(6) Average absolute humidity and temperature ()

*:*using both (2) and (4).

Comparing the performance of average weather data with the actual data is meant to inform us whether inter-annual variation in weather has any role in determining the outcome of the observed ILI data.

#### 2.2.3. Modelling the loss of immunity

We compared two approaches for modelling the decay of immunity at the population level:

(i) Classical SIRS. In this model paradigm, the immune population size is assumed to decay at a constant rate, corresponding to an exponentially distributed period of immunity. In this case, we set

*γ*(*t*) =*Γ*/365, so that*Γ*is the constant yearly rate of loss of immunity in the recovered population (i.e. of moving from the recovered to the susceptible compartment) and 1/*Γ*is the mean duration of immunity (in years).(ii) SIRS with surges of immunity loss. In this model paradigm, the assumption of a constant rate of loss of immunity is relaxed by allowing a surge of loss of immunity every new season. This models the introduction of a new single dominant strain for each influenza season with its own degree of immunity in the population. In this case, we set

*γ*(*t*) =*Γ*/365 +*Δ*(*t*), where Here, we defined a season as 1 July–30 June. The fraction of the recovered population who lose their immunity each season can change from season to season, thus adding one parameter for each season. We used either a season-independent timing for the surges, adding one more parameter to the model , or allowed a varying date for the surge in each season, adding*n*parameters to the model .

#### 2.2.4. Inference

A maximum-likelihood approach was used for estimating model parameters. Given a set of values for the model parameters and the initial conditions *i*(1), *i*(2), …, *i*(*d*), the value of the likelihood function is the probability of obtaining ILI(*d* + 1) … *ILI*(*T*) from the stochastic model (equation (2.6)), and is given by:
Note, however, that the quantities *λ*(*t*) appearing on the right-hand side are defined in terms of the actual incidences *i*(*t*) (equation (2.1)), which are unobservable. Therefore, at this point, we make the approximation *i*(*t*) = ILI(*t* + *θ*)/*q*, in order to explicitly calculate the likelihood. Thus, the stochasticity of the observation model (introduced in equation (2.5)) effects the probability of obtaining the ILI incidence in each current day but does not carry on to effect the likelihood of the data in the following days.

As we have less confidence regarding the correlation between the ILI data during the summer months and actual influenza incidence, we calculate the likelihood of the model only for the months of the influenza season (i.e. 1 November–30 April), when the majority of ILI cases are related to influenza (see electronic supplementary material, section C). We have found that by taking into account only these months we obtain better fits to the entire time-series than the fits obtained by using the entire year.

Because, as discussed earlier, the reporting criteria for ILI changed during the modelled period, two different reporting rates were assumed for the two periods. Estimating a reporting rate using our inference method is difficult, as the likelihood function proved to be flat around a wide range of reporting rate values (see the electronic supplementary material, figure S3 and section F). However, it was found that if the reporting rate value for one period is fixed, then the reporting rate for the other period can easily be estimated. We therefore fixed the reporting rate for the second period in the dataset (2002–2009) to *q*_{2} = 0.1, and estimated the reporting rate *q*_{1} in the first period (1998–2001). With a reporting rate of 10 per cent the range of annual attack rates of influenza in the second period is 10–20% of the population, which is in line with previous estimates [2,9]. Using a different reporting rate for this period affects the values of some of the estimated parameters in a straightforward way (see the electronic supplementary material, section F), but it does not change any of the more general results presented in the paper.

Nonlinear optimization (using Matlab's fmincon routine) was used to maximize the likelihood function and thus find the maximum-likelihood estimates for the model parameters. The likelihood function and optimization program were tested on simulated data and were found to correctly infer the parameters.

## 3. Results

In total, we evaluated 18 model variations defined by: (i) different weather drivers (i.e. the type of the weather signal modulating transmission rates) and (ii) different assumptions regarding the rate of loss of immunity (constant rate, surges on a fixed date each season or surges on a varying date each season). The models were ranked according to their Akaike information criterion (AIC) scores [45]. Table 2 summarizes the rankings of the seasonal models that use both absolute humidity and temperature data (either actual or inter-annual mean). We have found that using actual data of both weather variables always outperforms a similar model using just one of the variables. The complete table with all 18 models can be found in the electronic supplementary material, section G. The highest ranking model was found to be the model incorporating surges of loss of immunity at a different date each season while using actual daily weather to modulate the transmission (model no. 1 in table 1). Figure 2 displays the best fit of the ILI time-series using this model. The fit was obtained by running the infection model (equation (2.4)) and observation model (equation (2.5)) using the maximum-likelihood parameter estimates. We ran 1000 stochastic simulations of the model, calculating the mean incidence and 95 per cent prediction interval (based on the bottom and top 2.5% incidence) for each day. The model was found to fit the 11 year time-series very well (Pearson correlation: *r* = 0.93, *p* < 0.001).

Also shown in figure 2 is the fraction of susceptibles *S*_{f}(*t*) = *S*(*t*)/*N*(*t*), which can be seen to rise every year from a minimum of between 0.20 and 0.26 at the end of an epidemic to a maximum of between 0.29 and 0.38 at the start of a new epidemic. This rise is composed of both the constant loss of immunity rate and the surges of loss of immunity, whose size vary between 1 and 12 per cent of the immune population (table 3). The optimization set the timing for the surges to just precede the incline of the epidemic in every season. The daily values of the reproductive number *R*_{0}(*t*) fluctuate typically every year between a minimum of around 2.75 during summer and a maximum of around 3.75 during winter, constituting a variation of approximately 15 per cent in *R*_{0} due to weather conditions (figure 2*c*). The effective reproductive number, defined as *R*_{e}(*t*) = *R*_{0}(*t*) · *S*_{f}(*t*), which is shown at the bottom of the figure, is a key index that determines the fate of an epidemic. In order for an epidemic to spread, *R*_{e} must be above unity [46]. Here, *R*_{e} climbs to above unity as the surges occur, marking the start of the seasonal epidemic. The timing of the surge models the introduction of the virus into the population. The varying date from year to year, in this case, means that the date of introduction of the virus is part of the explanation for the inter-annual variability.

Figure 3 displays the best fits of the models using surges at a fixed date (*a,b*) or no surges at all (*c,d*), with *R*_{0} modulated by the actual weather variables (models nos. 3 and 5 in table 1). With surges at a fixed date, the fit is still mostly very good (*r* = 0.90, *p* < 0.001). The optimal-fit timing of the surges was found to occur at mid-December for all years, which is typically at the very beginning of the epidemic. However, the significantly early epidemic of 2003–2004, which peaked at the start of December, could not be captured properly using this model. This influenza season was dominated by a new drift variant A/Fujian (H3N2) virus, resulting in a very early epidemic throughout much of the northern hemisphere [47–49]. When using the model without surges, the goodness of fit decreases (*r* = 0.74, *p* < 0.001). Nevertheless, even in this case, using actual weather data to modulate *R*_{0} improves the fit considerably over using average weather (*r* = 0.66, *p* < 0.001).

Table 3 presents the maximum-likelihood parameter estimates for the models that appear in table 1 (the complete table with all the tested models can be found in the electronic supplementary material, section H). Likelihood profiles were used to establish 95% confidence intervals for the estimated parameters of the models using actual weather data (see the electronic supplementary material, section I). When using actual weather data, we find absolute humidity to be the dominant factor impacting the seasonality of *R*_{0} (*δ*_{AH} > *δ*_{T}), whereas temperature is preferred when using average weather. The clumping factor of the additional environmental stochasticity was estimated at 1/*k* ≈ 0.03, which translates to a CV of approximately 17 per cent the daily value of *R*_{0}, due to unspecified environmental factors. The fact that, in all cases, this estimate is significantly different from zero indicates that the environmental stochasticity is an essential component in our modelling framework. The reporting rate for the first period of surveillance was estimated at *q*_{1} ≈ 0.14 − 0.15 in the different models, so that the ratio in the reporting rates between the two surveillance periods is *q*_{1}/*q*_{2} ∼ 1.4 − 1.5. This result confirms our preliminary assessment that the change in the coding system resulted in a smaller probability of being assigned an ILI diagnosis.

The results of the model comparisons suggest that inter-annual variation in weather is an important factor in driving seasonal influenza dynamics. The models using actual absolute humidity and temperature to modulate *R*_{0} (models nos. 1, 3, 5), outperformed the same models using average absolute humidity and temperature (model nos. 2, 4, 6, respectively). A randomization test was implemented to further assess this result on all three models using actual weather data. The time-series of absolute humidity and temperature were divided into 11 seasons (from 1 July to 30 June), reshuffled and randomly reordered to create a permutation of the actual time-series. This procedure was repeated *N* = 1000 times. Each permuted time-series was used to fit the ILI incidence data with the tested model. Table 4 summarizes the results of these tests. In all three cases, the randomized weather nearly always underperformed compared with the actual weather data. This result further supports the hypothesis that daily weather conditions are an important factor in modulating seasonal influenza transmission.

Recently, Truscott *et al.* [9] attempted to identify the key factors responsible for generating the inter-annual variability seen in seasonal influenza data. They concluded that any model that attempts to reproduce this variability must include, as essential components, co-circulating viral strains interacting via cross-immunity and an age-structured population that is not randomly mixed. Their conclusion was based on a study of a deterministic model with simple periodic seasonality. Here, our goal is to similarly explore the essential components in our model responsible for reproducing the required inter-annual influenza variability as seen in Israel's ILI data (figure 1). While until now we have evaluated model fits in terms of time-series correlations and by comparing likelihoods, we now follow Truscott *et al.* and zoom in on some characteristics of the epidemic dynamics.

As in Truscott *et al*., we examined models fits to the annual attack rates, measured as the sum of ILI cases in a season (1 July–30 June) divided by the reporting rate for that season. In addition, we examined the timing of peak incidence in a season, measured as the number of weeks from 1 July up to peak incidence. Figure 4 summarizes the results of this comparison for the three models discussed earlier. The model incorporating the surges of loss of immunity at a different date each season (model no. 1) reproduces these characteristics in each of the seasons very well. The model incorporating the surges of loss of immunity at a fixed date each season (model no. 3) is able to fit the annual attack rates very well but is unable to capture the timing of the peak in two of the 11 seasons. On the other hand, the model without the surges (model no. 5) is unable to mimic the observed variability of the data, and therefore does a poor job of fitting these characteristics in individual seasons. To investigate this further, we ran the two latter models for 1000 years using their estimated parameters, while removing any stochasticity by using the mean of the stochastic equations. Each year, we used the weather data of a random year from our dataset. For the model with the surges we sampled a random surge size, each year, from the estimated surge size range, whereas the timing of the surges was fixed to 15 December. Figure 5 shows the incidence in the last 50 years in each case, as well as a histogram of the annual attack rates. Without the surges, the incidence exhibits some variation due to variation in the weather (using the average weather leads to an annual cycle), yet not enough variation compared with the real ILI time-series. With the surges, we obtain a time-series with realistic variance in the magnitude of influenza epidemics. We therefore conclude that, as far as a deterministic model is concerned, a simple SIRS model is not sufficient to capture the inter-annual variation in influenza epidemics, even when transmission rates are modulated due to weather conditions in place of a periodic function. However, by incorporating surges of loss of immunity, it is possible to more accurately reproduce the characteristics of influenza epidemics without the need for explicitly invoking co-circulating strains or introducing population heterogeneity.

## 4. Discussion

In this work, we extended the stochastic, discrete-time, age-of-infection SIR model framework of Katriel *et al.* [41]. We generalized the framework by incorporating both seasonal variation in transmission and loss of immunity. In addition, we incorporated environmental stochasticity, and a stochastic observation component, to the model framework, which was based only on demographic stochasticity. Using this framework, we have investigated the role of weather and antigenic drift on the dynamics of seasonal influenza in Israel. By evaluating the likelihood of different model variations, we have demonstrated that inter-annual variation, in both weather conditions and the initial fraction of immune population at beginning of a season, are important factors in capturing the dynamics of seasonal influenza. Our study suggests that the assumptions of a simple SIRS model are not sufficient to capture the changes between years in the initial fraction of the population immune to seasonal influenza. We have found, as was suggested by Finkenstädt *et al*. [5], that additional surges of loss of immunity play an important role, by accounting for changing rates of antigenic drift and alternating dominant virus types. We have demonstrated that using a combination of actual data to modulate the reproductive number gives better fit than inter-annual means of the weather variables, providing strong support for the role of weather conditions in modulating influenza transmission rate. Even though absolute humidity is dependent on temperature, and is highly correlated with it, we have found that there is an added value to using temperature together with absolute humidity, suggesting that both factors are involved in modulating influenza transmission (as was also shown in Barreca & Shimshack [26]).

Several studies in the past have examined long-term modelling of seasonal influenza epidemics [5–10]. Finkenstädt *et al*. [5] compared the classical SIRS model with an SIRS model with surges of loss of immunity, when seasonality is given by a complex periodic function. By confronting the models with ILI data from France they found, as we did, more support in the data for the model incorporating the surges. Shaman *et al.* [7] used the classical SIRS model with a seasonality driven by absolute humidity to explain the observed mean annual cycle of deaths attributed to influenza in different states of the USA. In this work, we used a new modelling framework to re-examine the results of these two studies when the two inter-annual varying factors are considered simultaneously. *A priori*, incorporating one inter-annual varying factor could have eliminated the significance of the other factor. Instead, we have found both factors essential in accounting for the inter-annual variability of seasonal influenza.

The surges of loss of immunity incorporated into our model can have two explanatory functions. On the one hand, they allow the fraction of the population of susceptible individuals at the beginning of each season to vary from one year to another. On the other hand, the timing at which these surges occur represents the date of introduction of a new variant of the virus into the population. The fact that the versions of the model in which dates of the surges were allowed to vary from season to season outperformed corresponding versions in which surges occurred at a fixed date (as measured by AIC score) suggests that the timing of introduction of the virus into the population plays a partial role in determining the course of the epidemic. This timing itself could depend on the course of epidemics in other countries, as well as on the randomness in arrival of individuals carrying the virus from abroad.

When incorporating surges of loss of immunity into the models, we have first restricted the surges to positive surges, representing the appearance of an antigenically novel influenza virus. In theory, it is possible to consider negative surges, where a previously circulating virus returns, having more exposure in the population than the virus circulating in the preceding season. However, we found that allowing for such negative surges did not influence our results; in fact, negative surges were not selected for in the best model fits.

Incorporating the observation process into the models made it possible to evaluate the difference in the reporting rate between two distinct periods in the incidence time-series, when changes in the ILI coding definitions were made. In addition, we have used the observation model to introduce a time delay between the time an infection occurs and the time the ILI diagnosis is given (see equation (2.5) in §4). The results of all the models given in the paper use a delay of 1 day which was found to be the optimum delay period. Using no time delay resulted in slightly lower likelihoods for all the models, yet with no significant effect on the results. Using larger time delays resulted in lower likelihoods, which declined with the size of the delay.

The models used in this paper make some simplifying assumptions. We assume that the dominant strains in each season may differ in the associated fraction of immune population but have the same reproductive number when similar weather conditions prevail. In addition, the models assume a homogeneous population (e.g. age classes and a spatial structure were not included). However, we have shown that even with these limiting assumptions, we are able to adequately fit the observed ILI time-series. In particular, we have demonstrated how our best model does well in fitting the magnitude and timing of the influenza outbreak in each season. The fact that we were able to do that without incorporating a partially mixed age-structured population may suggest that age-structure is not as essential a component in modelling seasonal influenza as it is when modelling childhood diseases. This could be the case because, unlike most childhood diseases, influenza does not confer permanent immunity, owing to its fast strain evolution. Further research is necessary to evaluate this matter, perhaps by examining the essential requirements for fitting time-series of influenza incidence stratified by different age groups.

It is interesting to consider the potential of the modelling framework developed in this work, together with the estimated range of parameter values, to provide more realistic forecasts of the course of future seasonal or pandemic influenza epidemics. For a reliable prediction, one would need to estimate the fraction of susceptibles in the population at the start of the epidemic. This information could perhaps be obtained by fitting the observed incidence from the initial stage of the epidemic or by using other sources of data such as serological surveys. Because long-term predictions of weather are unavailable, initial projections would have to rely on inter-annual mean weather data for the seasonal forcing. As the epidemic progresses, corrections to the initial projections could be made using new data of the observed incidence and the observed weather (see also Shaman [50] for another approach of predicting short-term influenza incidence using weather).

## Acknowledgements

We thank the Israel Center for Disease Control (ICDC) and its director Prof. Tamar Shohat, Maccabi Health Services and Dr Varda Shalev. We also thank Prof. David Steinberg, Dr Uri Roll and two anonymous referees for their helpful comments. We are grateful for support from the European FP7, the Israel Science Foundation and the Israel Ministry of Health. R.Y. is supported by the Israel National Institute for Health Policy and Health Services Research.

- Received April 2, 2013.
- Accepted April 25, 2013.

- © 2013 The Author(s) Published by the Royal Society. All rights reserved.