## Abstract

Early estimates of the transmission potential of emerging and re-emerging infections are increasingly used to inform public health authorities on the level of risk posed by outbreaks. Existing methods to estimate the reproduction number generally assume exponential growth in case incidence in the first few disease generations, before susceptible depletion sets in. In reality, outbreaks can display subexponential (i.e. polynomial) growth in the first few disease generations, owing to clustering in contact patterns, spatial effects, inhomogeneous mixing, reactive behaviour changes or other mechanisms. Here, we introduce the *generalized* growth model to characterize the early growth profile of outbreaks and estimate the effective reproduction number, with no need for explicit assumptions about the shape of epidemic growth. We demonstrate this phenomenological approach using analytical results and simulations from mechanistic models, and provide validation against a range of empirical disease datasets. Our results suggest that subexponential growth in the early phase of an epidemic is the rule rather than the exception. Mechanistic simulations show that slight modifications to the classical susceptible–infectious–removed model result in subexponential growth, and in turn a rapid decline in the reproduction number within three to five disease generations. For empirical outbreaks, the generalized-growth model consistently outperforms the exponential model for a variety of directly and indirectly transmitted disease datasets (pandemic influenza, measles, smallpox, bubonic plague, cholera, foot-and-mouth disease, HIV/AIDS and Ebola) with model estimates supporting subexponential growth dynamics. The rapid decline in effective reproduction number predicted by analytical results and observed in real and synthetic datasets within three to five disease generations contrasts with the expectation of invariant reproduction number in epidemics obeying exponential growth. The generalized-growth concept also provides a compelling argument for the unexpected extinction of certain emerging disease outbreaks during the early ascending phase. Overall, our approach promotes a more reliable and data-driven characterization of the early epidemic phase, which is important for accurate estimation of the reproduction number and prediction of disease impact.

## 1. Introduction

There is a long and successful history of using compartmental transmission models to study epidemic dynamics, often calibrated using time-series data describing the progression of the epidemic [1–6]. A fundamental tenet of the classic epidemic theory is that the initial growth phase should be exponential in the absence of susceptible depletion or intervention measures. However, early subexponential (e.g. polynomial) growth patterns have been observed in outbreaks of HIV/AIDS [7–9], Ebola [10] and foot-and-mouth disease (FMD) [11]. Potential mechanisms remain debated but include spatial heterogeneity, perhaps mediated by the route of transmission (i.e. airborne versus close contact) [10–12], clustering of contacts [8] and reactive population behavioural changes that can gradually mitigate the transmission rate [10,11]. Accordingly, a range of mechanistic models can reproduce subexponential growth dynamics before susceptible depletion sets in, including models with gradually declining contact rate over time [13] and spatially structured models such as household–community networks [12], regular lattice static contact networks or small-world networks with weak global coupling [13,14]. For real epidemics, however, the underlying mechanisms governing subexponential growth can be difficult to disentangle and hence to model [13].

Given the relatively common occurrence of subexponential growth dynamics in empirical data and the variety of mechanisms at play, a flexible phenomenological model has been proposed to reproduce a variety of growth profiles. In the generalized-growth model, a tuning parameter (called the deceleration of growth, *p*) can reproduce a range of dynamics from constant incidence (*p* = 0) to exponential growth (*p* = 1) [11]. Application of this generalized-growth model to empirical data supports a notably slow spread (*p* < 1) of the 2014 Ebola outbreaks at district level in parts of West Africa, intermediate spread profiles for historical plague and smallpox outbreaks (*p* = 0.8), and near exponential dynamics for pandemic influenza (*p* ≈ 1) [15]. Departure from standard epidemic theory may be more common than previously thought because transmission heterogeneities are the rule rather than the exception [11]. In this paper, we build on the generalized-growth model concept [11,13] to show that faithful characterization of such departures is important for accurate estimation of the reproduction number.

The basic reproduction number, commonly denoted by *R*_{0}, is a key parameter that characterizes the early epidemic spread in a fully susceptible population, and can be used to inform public health authorities on the level of risk posed by an infectious disease and the potential effects of intervention strategies [16]. According to the classical theory of epidemics, largely based on compartmental modelling [1,2,17,18], *R*_{0} is expected to remain invariant during the early phase of an epidemic that grows exponentially and as long as susceptible depletion remains negligible [2]. More generally, temporal variation in the transmission potential of infectious diseases is monitored via the effective reproduction number, *R*_{t}, defined as the average number of secondary cases per primary case at calendar time *t* [19]. If *R*_{t} < 1, then the epidemic declines, whereas *R*_{t} > 1 indicates widespread transmission.

Here, we expand the generalized-growth method [11] to characterize and estimate the effective reproduction number *R*_{t} during the early growth phase and before susceptible depletion sets in. We illustrate our phenomenological approach, using a combination of analytical results and synthetic datasets derived from mechanistic models with spatial or temporal effects yielding subexponential growth, and apply the approach to empirical data reflecting a variety of historic and contemporary outbreaks. We show that subexponential growth dynamics are both common and important for accurate assessment of the early transmission potential.

## 2. Material and methods

Our study is organized in four sections. First, we describe the phenomenological ‘generalized-growth’ model and the method to estimate the effective reproduction number in this model. Second, we simulate from this phenomenological model to gauge the expected magnitude of temporal variation in the reproduction number and test our analytical predictions. Third, we simulate epidemic datasets based on mechanistic transmission models that reproduce subexponential growth dynamics, and apply the generalized-growth estimation approach to these synthetic data. Fourth, we test a range of empirical outbreak datasets for the presence of subexponential growth and estimate the effective reproduction number.

### 2.1. The effective reproduction number during the early epidemic growth phase

We extend a previously established generalized-growth model [11] to estimate the effective reproduction number *R*_{g} according to disease generations *g*. Briefly, the generalized-growth model is a useful phenomenological model that relaxes the assumption of exponential growth in the early ascending phase of an outbreak, taking the form

$$\frac{\mathrm{d}C(t)}{\mathrm{d}t} = C'(t) = r\,C^{p}(t), \tag{2.1}$$

where *C*′(*t*) describes the incidence at time *t*, the solution *C*(*t*) describes the cumulative number of cases at time *t*, *r* is a positive parameter denoting the growth rate (with units of (people)^{1 − p} per time), and *p* ∈ [0, 1] is a ‘deceleration of growth’ parameter (dimensionless). If *p* = 0, then this equation describes constant incidence over time and the cumulative number of cases grows linearly, whereas *p* = 1 describes exponential growth in the Malthus equation, and the solution is given by $C(t) = C_0 e^{rt}$, where *C*_{0} is the initial number of cases. An equivalent approach to model subexponential growth would be to modulate the growth rate rather than the cumulative number of cases (see electronic supplementary material).

For the exponential growth model, the average number of secondary cases generated by initial cases during the first generation interval *T*_{g} (assumed to be fixed) is estimated by [20,21]
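
$$R_{0}^{\mathrm{exp}} = \frac{C'(T_g)}{C'(0)} = e^{rT_g}. \tag{2.2}$$

The expression for $R_{0}^{\mathrm{exp}}$ depends only on *r* and *T*_{g}. Moreover, during the exponential growth phase, $R_{g}^{\mathrm{exp}}$ remains invariant at $e^{rT_g}$. This can be shown by analysing $R_{g}^{\mathrm{exp}}$, the ratio of case incidences over consecutive generation intervals, which is given by

$$R_{g}^{\mathrm{exp}} = \frac{C'\big((g+1)T_g\big)}{C'(gT_g)} = e^{rT_g}. \tag{2.3}$$

In the case of subexponential growth, i.e. when *p* < 1, we can characterize the effective reproduction number over disease generations, *g*. For such polynomial epidemics, equation (2.1) exhibits an explicit solution that describes the cumulative number of cases over time, *C*_{sub exp}(*t*), in the form of [22]

$$C_{\mathrm{sub\,exp}}(t) = \big[\,r(1-p)\,t + A\,\big]^{1/(1-p)}, \tag{2.4}$$

where $A = C_0^{\,1-p}$. Hence, the corresponding incidence equation is given by

$$C'_{\mathrm{sub\,exp}}(t) = r\,\big[\,r(1-p)\,t + A\,\big]^{p/(1-p)}. \tag{2.5}$$

The analytical expression for the effective reproduction number by disease generation, $R_{g}^{\mathrm{sub\,exp}}$, captures the ratio of case incidences over consecutive disease generations

$$R_{g}^{\mathrm{sub\,exp}} = \frac{C'_{\mathrm{sub\,exp}}\big((g+1)T_g\big)}{C'_{\mathrm{sub\,exp}}(gT_g)} = \left[\,1 + \frac{r(1-p)\,T_g}{r(1-p)\,gT_g + A}\,\right]^{p/(1-p)}. \tag{2.6}$$

In contrast to the exponential growth model (*p* = 1), where $R_{g}^{\mathrm{exp}}$ was independent of disease generation throughout the early growth phase, we observe that $R_{g}^{\mathrm{sub\,exp}}$ in generation *g* varies as a function of *g*. Given that *A* is fixed in equation (2.6), the ratio $r(1-p)T_g / \big(r(1-p)gT_g + A\big)$ declines to zero as *g* increases, and, thus, $R_{g}^{\mathrm{sub\,exp}}$ approaches 1.0 asymptotically. Moreover, $R_{g}^{\mathrm{sub\,exp}} \to e^{rT_g}$ as *p* → 1^{−} (see the electronic supplementary material).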
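The behaviour of the generation-wise reproduction number can be checked numerically. The following is a minimal sketch, assuming the closed-form solution obtained by integrating d*C*/d*t* = *rC*^{p} directly, namely *C*(*t*) = [*r*(1 − *p*)*t* + *C*_{0}^{1 − p}]^{1/(1 − p)} for *p* < 1 and *C*(*t*) = *C*_{0}e^{rt} for *p* = 1. The function names and the parameter values (*r* = 0.2, *T*_{g} = 3, *C*_{0} = 1) are illustrative assumptions, not values from the study.

```python
import math

def cumulative_cases(t, r, p, c0):
    """Closed-form solution of dC/dt = r * C**p (assumed form, see lead-in)."""
    if p == 1.0:
        return c0 * math.exp(r * t)
    a = c0 ** (1.0 - p)
    return (r * (1.0 - p) * t + a) ** (1.0 / (1.0 - p))

def incidence(t, r, p, c0):
    """Incidence C'(t) = r * C(t)**p."""
    return r * cumulative_cases(t, r, p, c0) ** p

def reproduction_number(g, r, p, c0, tg):
    """Ratio of incidences one fixed generation interval tg apart."""
    return incidence((g + 1) * tg, r, p, c0) / incidence(g * tg, r, p, c0)

# Illustrative (hypothetical) parameter values.
r, tg, c0 = 0.2, 3.0, 1.0
exponential = [reproduction_number(g, r, 1.0, c0, tg) for g in range(5)]
subexponential = [reproduction_number(g, r, 0.5, c0, tg) for g in range(5)]
print(exponential)     # constant across generations
print(subexponential)  # declines towards 1
```

Under these assumptions the exponential case returns the same value in every generation, whereas the *p* = 0.5 case starts above 1 and shrinks with each generation.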

### 2.2. Numerical estimation of the effective reproduction number

The effective reproduction number can be estimated from case incidence data simulated from the generalized-growth model and using information about the distribution of the disease generation interval (table 1). Specifically, based on the incidence at calendar time *t*_{i} denoted by *I*_{i}, and the discretized probability distribution of the generation interval denoted by *ρ*_{i}, the effective reproduction number can be estimated using the renewal equation [19,36]

$$R_{t_i} = \frac{I_i}{\sum_{j=0}^{i} I_{i-j}\,\rho_j}, \tag{2.7}$$

where the denominator represents the total number of cases that contribute (as primary cases) to generating the number of new cases *I*_{i} (as secondary cases) at calendar time *t*_{i} [19].
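A renewal-equation estimator of this kind can be sketched in a few lines. The sketch below assumes the form *R* at time *t*_{i} = *I*_{i} / Σ_{j} *I*_{i−j} *ρ*_{j}, with the generation-interval distribution supplied as a list whose *j*th entry is the probability of an interval of *j* time steps. The function name `renewal_estimates` and the toy incidence series are hypothetical.

```python
def renewal_estimates(incidence, gen_pmf):
    """Estimate R at each time step as new cases over weighted past cases.

    incidence: case counts I_0, I_1, ...
    gen_pmf:   gen_pmf[j] = probability the generation interval is j steps
    Returns None where no earlier cases contribute (zero denominator).
    """
    estimates = []
    for i, new_cases in enumerate(incidence):
        max_lag = min(i, len(gen_pmf) - 1)
        infectious_pressure = sum(
            incidence[i - j] * gen_pmf[j] for j in range(max_lag + 1)
        )
        if infectious_pressure > 0:
            estimates.append(new_cases / infectious_pressure)
        else:
            estimates.append(None)
    return estimates

# Toy example: incidence doubling each step, generation interval fixed at 1 step.
toy_estimates = renewal_estimates([1, 2, 4, 8], [0.0, 1.0])
print(toy_estimates)
```

With a delta-distributed generation interval of one step, the estimator reduces to the ratio of consecutive incidences, which is the quantity analysed in equations (2.3) and (2.6).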

### 2.3. Trends in effective reproduction number based on simulations from the phenomenological ‘generalized growth’ model

To gauge the expected temporal variation in the effective reproduction number for a range of growth profiles and test analytic predictions, we simulate incidence data, using the generalized-growth model (equation (2.1)). We fix the growth rate parameter *r*, but assume different distributions of the disease generation interval (e.g. exponential, gamma, uniform, delta), and vary the ‘deceleration of growth’ parameter *p* between 0 and 1 [11]. We analyse the outbreak trajectory in the first five disease generations to estimate the effective reproduction number using equation (2.7) (assuming a fixed generation interval) and compare with estimates obtained from the analytical expressions in equations (2.3) and (2.6).

### 2.4. Trends in early growth dynamics based on mechanistic models simulations

Next, we develop three specific examples of mechanistic transmission models that support early subexponential growth dynamics. These include (i) SIR (susceptible–infectious–removed) dynamics on a spatially structured model, as substantial levels of clustering have been hypothesized to yield early polynomial epidemic growth [8,11,12], (ii) an SIR compartmental model with reactive behavioural changes via a time-dependent transmission rate [10,11,31,37,38], and (iii) an SIR compartmental model with inhomogeneous mixing [39,40]. We briefly describe these models below.

#### 2.4.1. Susceptible–infectious–removed epidemics on a spatially structured model

One of the putative mechanisms leading to early polynomial epidemic growth dynamics is clustering [8,11], a network property that quantifies the extent to which the contacts of one individual are also contacts of each other [14]. Contact networks are particularly useful to explore the impact of clustering; here, we use a network-based transmission model with household–community structure, which has been previously applied to study the transmission dynamics of Ebola [12,41]. In this model, individuals are organized within households of size *H* (each household contains *H* individuals) and households are organized within communities containing a fixed number of households, the community size (each community therefore contains community size × *H* individuals). Network connectivity is identical for every individual. The household reproduction number *R*_{0H} was set at 2.0 and the community reproduction number *R*_{0c} was set at 0.7 based on a previous study [12]. For a fixed household size (*H* = 5) and different values of the community size parameter, we analyse the temporal profile in case incidence and the effective reproduction number during the first few disease generations from 200 independent stochastic realizations.

#### 2.4.2. Susceptible–infectious–removed compartmental model with reactive behavioural changes

In addition to contact clustering, rapid onset of behaviour changes is another mechanism that has been hypothesized to lead to subexponential growth dynamics, as it would result in an early decline in effective reproduction number. For instance, during the 2014–2015 Ebola epidemic, some areas of West Africa exhibited early subexponential growth even before control interventions were put in place [10,42].

To model behaviour change, we consider a classical SIR epidemic model [1,3] with time-dependent contact rate following
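
$$\frac{\mathrm{d}S}{\mathrm{d}t} = -\beta(t)\,\frac{S(t)\,I(t)}{N}, \qquad \frac{\mathrm{d}I}{\mathrm{d}t} = \beta(t)\,\frac{S(t)\,I(t)}{N} - \gamma I(t), \qquad \frac{\mathrm{d}R}{\mathrm{d}t} = \gamma I(t), \tag{2.8}$$

where *S*(*t*), *I*(*t*) and *R*(*t*) denote the number of susceptible, infectious and removed (recovered) hosts in a randomly mixed population of size *N*, *β*(*t*) is the time-dependent transmission rate, the probability that a susceptible individual encounters an infectious individual is given by *I*(*t*)/*N* and 1/*γ* is the mean infectious period.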

In the classical SIR model with constant transmission rate *β*, in a completely susceptible population, *S*(0) ≈ *N* and *I*(*t*) grows exponentially during the early epidemic phase, i.e. $I(t) \approx I(0)\,e^{(\beta-\gamma)t} = I(0)\,e^{\gamma(R_0-1)t}$, where $R_0 = \beta/\gamma$ is the average number of secondary cases generated by a primary case during the infectious period. When susceptible depletion kicks in (*S*(*t*) < *S*(0)), the effective reproduction number, *R*_{t}, declines following $R_t = R_0\,S(t)/N$. During the first few disease generations, where $S(t) \approx S(0)$, the classical SIR model supports a reproduction number that is nearly invariant, i.e. *R*_{t} ≈ *R*_{0}. Here, to capture behaviour change, we model an exponential decline in the transmission rate *β*(*t*) from an initial value *β*_{0} towards *ϕβ*_{0} at rate *q* > 0 following

$$\beta(t) = \beta_0\left[(1-\phi)\,e^{-qt} + \phi\right].$$

Here, *β*(*t*) leads to early subexponential growth dynamics whenever *R*_{0} > 1 and *q* > 0. Assuming that *R*_{0} > 1 in a sufficiently large susceptible population, so that the effect of susceptible depletion is negligible in the early epidemic phase, the quantity 1 − *ϕ* models the proportionate reduction in *β*_{0} that is needed for the effective reproduction number to asymptotically reach 1.0. Hence, *ϕ* can be estimated as 1/*R*_{0}. If *q* = 0, then $\beta(t) = \beta_0$ and we recover the classic SIR transmission model with early exponential growth dynamics. In general, a faster decline of the effective reproduction number towards 1.0 occurs for higher values of *q*, even without susceptible depletion. It is worth noting that prior HIV/AIDS models [43] have incorporated exponential decay in the transmission rate in a similar manner as described here, although the rate of decay was assumed to be a time-dependent function of HIV/AIDS prevalence.

To examine the behaviour of the effective reproduction number *R*_{g} over disease generations in the above model, we analyse the temporal progression in the number of cases at generation *g* based on the following discrete equations [44]
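
$$I_{g+1} = S_g\left[1 - \exp\!\left(-R_0\left((1-\phi)\,e^{-qg} + \phi\right)\frac{I_g}{N}\right)\right], \qquad S_{g+1} = S_g - I_{g+1}, \tag{2.9}$$

where *ϕ* = 1/*R*_{0}, *I*_{g} is the number of new cases at generation *g* and *S*_{g} is the number of remaining susceptibles at generation *g*. We initialize simulations with *I*_{0} = 1 and *S*_{0} = *N* where *N* is set to 10^{8} individuals.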
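The generation-by-generation recursion can be illustrated with a short simulation. This is a sketch only: it assumes a discrete-generation update of the form *I*_{g+1} = *S*_{g}[1 − exp(−*R*_{0}((1 − *ϕ*)e^{−qg} + *ϕ*) *I*_{g}/*N*)] with *S*_{g+1} = *S*_{g} − *I*_{g+1}, and the function name and the values *R*_{0} = 2 and *q* = 0.5 are hypothetical choices for illustration.

```python
import math

def behaviour_change_generations(r0, q, n_pop=1e8, n_gen=8):
    """Return (cases, ratios): cases I_0..I_n and ratios R_g = I_{g+1} / I_g.

    Assumes the transmission potential decays from r0 towards 1 at rate q
    per generation (phi = 1 / r0), as described in the lead-in.
    """
    phi = 1.0 / r0
    cases = [1.0]          # I_0 = 1
    susceptible = n_pop    # S_0 = N
    for g in range(n_gen):
        r_eff = r0 * ((1.0 - phi) * math.exp(-q * g) + phi)
        # -expm1(-x) equals 1 - exp(-x) but stays accurate for very small x
        new_cases = susceptible * -math.expm1(-r_eff * cases[-1] / n_pop)
        susceptible -= new_cases
        cases.append(new_cases)
    ratios = [cases[g + 1] / cases[g] for g in range(n_gen)]
    return cases, ratios

_, constant_ratios = behaviour_change_generations(r0=2.0, q=0.0, n_gen=5)
_, declining_ratios = behaviour_change_generations(r0=2.0, q=0.5, n_gen=8)
print(constant_ratios)   # stays close to r0 while depletion is negligible
print(declining_ratios)  # decays towards 1 when q > 0
```

With *q* = 0 the ratio stays near *R*_{0} for the first few generations because depletion of a population of 10^{8} is negligible; with *q* > 0 it falls towards 1 even though almost everyone remains susceptible.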

#### 2.4.3. Susceptible–infectious–removed compartmental model with nonlinear incidences

Beyond contact clustering and decay in transmission rate, a third mechanism potentially accounting for subexponential growth is departure from mass action, which may be owing to spatial structures or other forms of non-homogeneous mixing. These effects can be incorporated in the SIR models using nonlinear incidence rates [45,46]. For instance, the incidence rate can take the form $\beta\,S(t)\,I^{\alpha}(t)/N$, where *α* is a phenomenological scaling mixing parameter; *α* = 1 models homogeneous mixing, whereas *α* < 1 reflects contact patterns that deviate from random mixing and lead to slower epidemic growth [47]. A related version of this model is the TSIR model [40], which has found applications in various infectious disease systems, including measles [40,48], rubella [49] and dengue [50].

Here, we consider an SIR model with non-homogeneous mixing, with constant transmission rate *β*_{0} and mixing parameter α, following
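
$$\frac{\mathrm{d}S}{\mathrm{d}t} = -\beta_0\,\frac{S(t)\,I^{\alpha}(t)}{N}, \qquad \frac{\mathrm{d}I}{\mathrm{d}t} = \beta_0\,\frac{S(t)\,I^{\alpha}(t)}{N} - \gamma I(t), \qquad \frac{\mathrm{d}R}{\mathrm{d}t} = \gamma I(t). \tag{2.10}$$

To analyse the progression of the reproduction number *R*_{g} over disease generations in the above model, we use the following discrete equations describing the number of cases at generation *g* [13,44]

$$I_{g+1} = S_g\left[1 - \exp\!\left(-\frac{R_0\,I_g^{\alpha}}{N}\right)\right], \qquad S_{g+1} = S_g - I_{g+1}, \tag{2.11}$$

where *I*_{g} is the number of new cases at generation *g* and *S*_{g} is the number of remaining susceptible individuals at generation *g*. We initialize simulations with *I*_{0} = 1 and *S*_{0} = *N* where *N* is set to 10^{8} individuals.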
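A short approximation shows why *α* < 1 produces a declining reproduction number before depletion matters. It assumes a discrete-generation update of the form *I*_{g+1} = *S*_{g}[1 − exp(−*R*_{0}*I*_{g}^{α}/*N*)] and the early-phase conditions *I*_{g} ≪ *N* and *S*_{g} ≈ *N*; the numerical values at the end are illustrative only.

```latex
\begin{aligned}
I_{g+1} &= S_g\left[1 - \exp\!\left(-\frac{R_0 I_g^{\alpha}}{N}\right)\right]
          \approx N \cdot \frac{R_0 I_g^{\alpha}}{N} = R_0 I_g^{\alpha},\\[4pt]
R_g &= \frac{I_{g+1}}{I_g} \approx R_0\, I_g^{\alpha - 1}.
\end{aligned}
```

For *α* = 1 this gives *R*_{g} ≈ *R*_{0} in every early generation. For *α* < 1 the factor *I*_{g}^{α − 1} shrinks as incidence grows, so *R*_{g} falls from *R*_{0} (at *I*_{0} = 1) towards 1, which under this approximation is reached when *I*_{g} ≈ *R*_{0}^{1/(1 − α)}. As an illustration, *R*_{0} = 2 and *α* = 0.9 give 2^{10} = 1024 cases per generation, far below *N* = 10^{8}.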

### 2.5. Application to real outbreak data

Lastly, we analyse a variety of empirical outbreak datasets to test the importance of subexponential growth in observed disease dynamics and the resulting impact on the effective reproduction number estimates. We rely on a convenience sample representing a variety of pathogens, geographical contexts and time periods, and include outbreaks of pandemic influenza, measles, smallpox, bubonic plague, cholera, FMD, HIV/AIDS and Ebola (table 1 and electronic supplementary material for time series). The temporal resolution of the datasets varies from daily to annual. For each outbreak, the onset week corresponds to the first observation associated with a monotonic increase in the case incidence, up to the peak incidence.

We focus on the first three to five disease generations, depending on the length of the available empirical time series. We estimate the effective reproduction number, using a two-step approach. In the first step, we use nonlinear least-squares to fit the generalized growth model to the synthetic mechanistic data, and estimate parameters *r* and *p* (equation (2.1), [11]). The initial number of cases *C*_{0} is fixed according to the first observation. Nominal 95% CIs for parameter estimates *r* and *p* are constructed by simulations of 200 best-fit curves, *C′*(*t*), using parametric bootstrap with a Poisson error structure, as in prior studies [51]. In the second step, we simulate epidemic curves using the generalized-growth model with estimated *r* and *p*, and apply equation (2.7) to the simulated incidence data. We assume a gamma distribution for the generation interval, with means and standard deviations as in table 1 [52–58]. In addition, for each outbreak, we compare the goodness of fit of the phenomenological generalized-growth model versus the exponential growth model.
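The first step of this procedure, estimating *r* and *p* by least squares with *C*_{0} fixed, can be sketched as follows. This is not the fitting routine used in the study: a coarse grid search stands in for a proper nonlinear least-squares optimizer, the bootstrap step is omitted, and the function names, the grid and the synthetic "observed" series (generated with *r* = 0.5, *p* = 0.6, *C*_{0} = 1) are all hypothetical.

```python
import math

def model_incidence(times, r, p, c0):
    """Incidence C'(t) = r * C(t)**p from the closed-form solution of dC/dt = r*C**p."""
    values = []
    for t in times:
        if p == 1.0:
            c = c0 * math.exp(r * t)
        else:
            c = (r * (1.0 - p) * t + c0 ** (1.0 - p)) ** (1.0 / (1.0 - p))
        values.append(r * c ** p)
    return values

def fit_by_grid_search(times, observed, c0, r_grid, p_grid):
    """Return the (r, p) pair on the grid minimising the sum of squared errors."""
    best = None
    for r in r_grid:
        for p in p_grid:
            fitted = model_incidence(times, r, p, c0)
            sse = sum((o - f) ** 2 for o, f in zip(observed, fitted))
            if best is None or sse < best[0]:
                best = (sse, r, p)
    return best[1], best[2]

# Synthetic, noise-free incidence with known (hypothetical) parameters.
times = list(range(21))
observed = model_incidence(times, r=0.5, p=0.6, c0=1.0)

r_grid = [i / 10 for i in range(1, 11)]   # 0.1 ... 1.0
p_grid = [i / 10 for i in range(0, 11)]   # 0.0 ... 1.0
r_hat, p_hat = fit_by_grid_search(times, observed, 1.0, r_grid, p_grid)
print(r_hat, p_hat)
```

On real data a gradient-based optimizer and a finer parameter resolution would normally replace the grid, and uncertainty would be quantified by refitting bootstrap resamples as described above.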

## 3. Results

### 3.1. Trends in effective reproduction number based on simulations from the phenomenological ‘generalized growth’ model

We first analyse simulations of epidemic growth under the generalized growth model in the first five disease generations of the outbreak, for different values of *r* and *p* and a fixed generation interval (figure 1 and electronic supplementary material, figure S2). Our simulations confirm the analytical results described in equations (2.3) and (2.6) in relation to changes in the effective reproduction number under early exponential (*p* = 1) and subexponential growth dynamics (*p* < 1). As expected, the greater the departure from exponential growth (*p* close to 0), the lower the effective reproduction number. However, more importantly, in the case of subexponential growth, and for a given growth rate *r*, the effective reproduction number is a dynamic quantity that approaches 1.0 asymptotically with increasing disease generations. In contrast, for exponential growth (*p* = 1), the effective reproduction number remains invariant during the early epidemic growth phase.

We also run simulations under different assumptions regarding the distribution of the generation interval and vary *p* in the range 0 < *p* ≤ 1 (figure 2). The declining trend in the effective reproduction number associated with the subexponential growth regime (*p* < 1) persists independently of the generation interval distribution. Moreover, as *p* decreases, estimates of the effective reproduction number become less dependent on the generation interval distribution (figure 2). This indicates that for a sufficiently small *p* < 1, the mean of the generation interval distribution provides sufficient information to estimate the reproduction number, without the need to specify a full distribution.

### 3.2. Trends in case incidence and the effective reproduction number based on mechanistic models simulations

#### 3.2.1. Susceptible–infectious–removed epidemics on a spatially structured epidemic model

Figure 3 shows simulations of case incidence and the effective reproduction number *R*_{g} derived from the household–community transmission model for different levels of community mixing, tuned by the community size. As expected, the lower the community mixing, the greater the departure from homogeneous mixing, and hence the greater the departure from early exponential growth dynamics. Early subexponential growth dynamics are observed in all community mixing scenarios tested (community sizes including 45 and 65 households), which is consistent with a declining trend in the effective reproduction number *R*_{g} (figure 3).

#### 3.2.2. Susceptible–infectious–removed compartmental model with reactive behavioural changes

Representative profiles of *R*_{g} for the SIR model with time-dependent transmission rate *β*(*t*) are shown in figure 4 for different values of the speed of transmission decline, tuned by parameter *q*. The decline in effective reproduction number *R*_{g} (*g* = 0 … *n*) is more pronounced when the decline in transmission rate is faster (i.e. for larger *q*). Early subexponential growth dynamics are seen in all simulations where *q* > 0 (figure 4).

#### 3.2.3. Susceptible–infectious–removed compartmental model with nonlinear incidence rates

Simulations for this model display concave down incidence curves in semi-logarithmic scale, supporting the presence of early subexponential growth, even for values of *α* slightly below the homogeneous mixing regime (i.e. α just below 1; electronic supplementary material, figure S3). Accordingly, the effective reproduction number *R*_{g} exhibits a declining trend during the first few disease generations (figure 5). By contrast, the reproduction number remains invariant at *R*_{g} = *R*_{0} = 2 when *α* = 1 (figure 5*b*, black curve).

### 3.3. Application to real outbreak data

Lastly, we apply the concepts of subexponential growth dynamics to a variety of empirical outbreak datasets. The electronic supplementary material, figures S4–S5, provides a comparative analysis of the goodness of fit provided by the generalized growth and the exponential growth models across outbreaks. Our results indicate that the generalized-growth model consistently outperforms the exponential growth model in the early ascending phase of the outbreak, even when *p* is only slightly below 1.0 (i.e. departure from the exponential model is slight). Across outbreaks, we find variability in the deceleration of growth parameter, even for a given pathogen (median *p* = 0.57, interquartile range (IQR): 0.46–0.84; figure 6). Not surprisingly, parameter uncertainty declines with increasing length of the early epidemic phase used for estimation (figures 6 and 7). On the other hand, mean estimates of *p* (table 1) are stable during the first three to five disease generations (ANOVA, *p* = 0.9).

When we use the generalized-growth model to estimate the effective reproduction number, we find a declining trend in the effective reproduction number with increasing disease generation intervals and variability in estimates of the reproduction number across 21 outbreaks representing eight different pathogens (figure 7). Further, estimates of the effective reproduction number are sensitive to small changes in the deceleration of growth parameter across outbreaks (Spearman's *ρ* > 0.62, *p* < 0.002; electronic supplementary material, figure S6).

Model fits to empirical data illustrate a variety of exponential and subexponential growth profiles across pathogens (electronic supplementary material, figures S7–S11). For instance, the autumn 1918 influenza pandemic in San Francisco is characterized by near exponential growth, with *p* ∼ 0.8–0.9 and a relatively stable reproduction number in the range 1.7–1.8 (electronic supplementary material, figure S7). In contrast, the FMD outbreak in Uruguay at the farm level displays slower initial growth with mean *p* ∼ 0.4–0.5 and a more variable reproduction number in the range 1.6–2.8 (electronic supplementary material, figure S8). For the HIV/AIDS epidemic in Japan (1985–2012), we estimated the mean effective reproduction number in the range 1.3–1.6 with *p* ∼ 0.5 assuming a mean generation interval of 4 years, consistent with pronounced departure from exponential growth (electronic supplementary material, figure S9).

The wealth of district-level Ebola data available for the 2014 epidemic in West Africa provides a good opportunity to gauge geographical variations in the growth profiles, and in the resulting effective reproduction numbers. Indeed, we find variability across geographical locations in the effective reproduction number (median = 1.46, IQR: 1.26–1.83) and deceleration parameter *p* (median = 0.58, IQR: 0.46–0.72), and correlation between these parameter estimates (Spearman's *ρ* = 0.81, *p* < 0.001). For comparison, at the fifth disease generation interval, the highest estimate of the effective reproduction number was at 2.5 (95% CI: 2.0–2.7) for the 2014 Ebola outbreak in Montserrado, Liberia, whereas the lowest estimate was at 1.03 (95% CI: 1–1.1) for the outbreak in Bomi, Liberia (figure 7).

## 4. Discussion

In this study, we introduce a quantitative ‘generalized growth’ framework to characterize the transmission potential of pathogens in the early phase of an outbreak, when susceptible depletion remains negligible, without making explicit assumptions about the epidemic growth profile. The phenomenological ‘generalized growth’ model reproduces a range of growth dynamics from polynomial to exponential [11] and is agnostic of the mechanisms affecting growth, which may include contact patterns, spatial effects, non-homogeneous mixing and/or behaviour changes. A phenomenological model can be particularly useful when biological mechanisms are difficult to identify. Using a combination of analytical results, simulations from mechanistic models and analyses of empirical outbreak data, we demonstrate that the effective reproduction number typically displays a downward trend within the first three to five disease generations. Evidence of subexponential growth, and associated reproduction number decline, is found in disease systems as varied as Ebola, pandemic influenza, smallpox, plague, cholera, measles, FMD and HIV/AIDS. Our results indicate that the concept of subexponential growth is both widespread and important to consider for accurate assessment of the reproduction number.

For epidemics that truly depart from exponential growth theory, traditional estimation methods relying on the assumption of exponential growth are expected to inflate reproduction number estimates. The bias between theoretical values and estimates increases as departure from exponential growth becomes more pronounced, i.e. when *p* decreases towards 0, representing slower epidemic spread compared with the exponential case where *p* = 1. For instance, our estimate of the reproduction number for the 1972 smallpox epidemic in Khulna, Bangladesh (approx. 2 (95% CI: 1.6–2.6)) is significantly lower than earlier historic estimates of smallpox based on an exponential growth assumption (range 3.5–6.0) [59]. In contrast, when *p* is near 1.0, indicating near exponential growth, our estimates of the reproduction number remain consistent with those of compartmental models. This is the case for the 1905 bubonic plague epidemic in Bombay, India [60], or the 1918 influenza pandemic in San Francisco [15]. Overall, our estimates for Ebola outbreaks tend to be slightly lower than those reported in prior studies, possibly because of subexponential growth at the district levels [12,31,55,61–66]. It is also worth noting that the incorporation of generalized growth in a phenomenological logistic-type model can substantially increase the performance of the model for short-term forecasting and prediction of the final epidemic size as recently illustrated in the context of the Zika epidemic in Colombia [67].

Here, we have studied simulations from three common types of mechanistic models supporting early subexponential growth dynamics and incorporating characteristics of the host contact network, behaviour changes and inhomogeneous mixing. In all models, relatively small departures from crude SIR dynamics led to subexponential growth profiles, and in turn a quick decline in effective reproduction number, speaking to the generalizability of our findings. With real outbreak case series data, however, it can be difficult to disentangle the mechanism or combination of mechanisms shaping the early epidemic growth profile, especially when case series data are limited to the early epidemic growth phase. In order to assess the contribution of different mechanisms, independent sources of data would be required to quantify the structural characteristics of the contact network [68–70], or the timing and intensity of possible behaviour changes. Further, for comparison purposes, it can be particularly useful to enumerate secondary cases from transmission tree data whenever available, and obtain independent estimates of *R*_{0} [68–70] agnostic of any model form. There is clearly scope for more research work in this area. In the absence of detailed information on the chains of transmission or the biological mechanisms at play, however, a phenomenological approach such as that proposed here with the generalized-growth model may be preferable.

In addition to providing a quantitative framework for estimation of the reproduction number based on a phenomenological approach, this study has implications for disease control, particularly our understanding of herd immunity and extinction thresholds [1,2]. In simple SIR models, the critical fraction of the population that needs to be effectively vaccinated to prevent an epidemic is given by 1 − 1/*R*_{0}, which is in the range 50–90% of the population for most epidemic diseases [1,66]. However, this fraction may be considerably lower for epidemics exhibiting subexponential growth, where the effective reproduction number naturally declines towards unity, irrespective of other intervention measures and before susceptible depletion sets in. For example, the 2014 West African Ebola outbreak ended with less than 1% of the population registered as cases, which defies expectations from SIR models, and the contribution of large-scale interventions to these low attack rates remains debated [71]. These data-driven observations suggest that more attention should be paid to the shape of the early ascending phase of emerging infectious disease outbreaks, and the associated uncertainty in the reproduction number estimates should be considered.
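As a quick arithmetic check of the quoted range, the two values of *R*_{0} below are illustrative choices (not estimates from the datasets analysed here) that reproduce its end points under the simple SIR threshold.

```latex
1 - \frac{1}{R_0}\bigg|_{R_0 = 2} = 0.5 \;(50\%), \qquad
1 - \frac{1}{R_0}\bigg|_{R_0 = 10} = 0.9 \;(90\%).
```

If the effective reproduction number has already declined towards unity for reasons other than immunity, the same expression evaluated at that lower value gives a correspondingly smaller fraction.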

A related consequence of subexponential growth dynamics, and the associated decline in effective reproduction number, is the effect on the extinction threshold. Indeed, it is natural to expect a higher probability of extinction owing to stochastic effects for epidemics governed by subexponential growth. This may in part explain the small magnitude and short duration of most Ebola outbreaks since 1976 [72–74], as Ebola showed substantial departure from exponential growth (0.6 < *p* < 0.72). In fact, simulations using an individual-level stochastic model for Ebola with household and community contact network structure are consistent with early subexponential growth dynamics, and yield a probability of approximately 40% that an outbreak dies out spontaneously within the first month of transmission [12]. This model [12] is also consistent with an effective reproduction number that asymptotically declines towards unity as the virus spreads through the population. From a public health perspective, outbreaks characterized by subexponential growth dynamics may provide a greater window of opportunity for implementation of control interventions compared with those following exponential or near-exponential growth dynamics [12].
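To give a sense of how far the *p* values quoted for Ebola depart from exponential growth, the sketch below compares cumulative case counts. It assumes the generalized-growth model takes the form d*C*/d*t* = *rC*^*p*, whose closed-form solution for *p* < 1 is used in the code. The growth rate, initial case count and time horizon are hypothetical values chosen only for illustration.

```python
import math


def ggm_cumulative_cases(t: float, r: float, p: float, c0: float = 1.0) -> float:
    """Cumulative cases C(t) solving dC/dt = r * C**p with C(0) = c0.

    Assumes 0 <= p <= 1: p = 1 gives exponential growth, p < 1 gives
    polynomial (subexponential) growth.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if p == 1.0:
        return c0 * math.exp(r * t)
    return (r * (1.0 - p) * t + c0 ** (1.0 - p)) ** (1.0 / (1.0 - p))


# Hypothetical parameters: r = 0.5 per day, one initial case, 30 days.
r, days = 0.5, 30.0
for p in (0.6, 0.72, 1.0):
    cases = ggm_cumulative_cases(days, r, p)
    print(f"p = {p:.2f}: C({days:.0f}) = {cases:,.0f}")
```

With the same *r* and initial condition, the trajectories for *p* between 0.6 and 0.72 stay orders of magnitude below the exponential case over this horizon, which is consistent with the greater window for intervention described above.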

Overall, our results underscore the need to carefully characterize the shape of the epidemic growth phase in order to accurately assess early trends in reproduction number. Consideration of the subexponential growth phenomenon will improve our ability to appropriately model transmission scenarios, assess the potential effects of control interventions, and provide accurate forecasts of epidemic impact. Looking to the future, new mechanistic transmission models are needed to provide a better understanding of the factors shaping early epidemic growth. Such models would allow for systematic evaluation of epidemic outcomes and disease control policies. A recent review of forecasting models for the West African Ebola epidemic highlighted a range of approaches to investigating disease spread, from simple phenomenological models, to compartmental epidemic models, to intricate contact networks [75]. The vast majority of these approaches assumed early exponential growth dynamics, an assumption that led to substantial overestimation of Ebola epidemic size, peak timing and peak intensity. In the light of these findings, Chretien *et al*. [75] stress the need for new mechanistic models that incorporate ‘dampening approaches’ to improve characterization of the force of infection and provide uniform forecasting approaches and evaluation metrics. We believe this study represents a significant step in this direction.

## Data accessibility

All of the epidemic incidence data employed in this paper are being made publicly available in the electronic supplementary material.

## Authors' contributions

G.C. designed the study. G.C. and S.M. contributed to methods. G.C. carried out simulations, analysed the data and wrote the first draft of the manuscript. G.C., C.V., L.S. and S.M. contributed to the writing and revisions of the manuscript, and approved its final version.

## Competing interests

We have no competing interests.

## Funding

G.C. acknowledges financial support from NSF grant no. 1414374 as part of the joint NSF-NIH-USDA Ecology and Evolution of Infectious Diseases programme; UK Biotechnology and Biological Sciences Research Council grant no. BB/M008894/1; NSF-IIS RAPID award no. 1518939; NSF grant no. 1318788 III: Small: Data Management for Real-Time Data-Driven Epidemic simulation; and the Division of International Epidemiology and Population Studies, The Fogarty International Center, US National Institutes of Health. C.V. and L.S. acknowledge financial support from the RAPIDD Programme of the Science and Technology Directorate and the Division of International Epidemiology and Population Studies, The Fogarty International Center, US National Institutes of Health. L.S. also acknowledges generous support from an EC Marie Curie Horizon 2020 fellowship. S.M. acknowledges support from the Natural Sciences and Engineering Research Council of Canada (NSERC), and the Mathematics of Information Technology and Complex Systems (Mitacs).

## Footnotes

Electronic supplementary material is available online at https://dx.doi.org/10.6084/m9.figshare.c.3473634.

- Received August 17, 2016.
- Accepted September 7, 2016.

- © 2016 The Author(s)

Published by the Royal Society. All rights reserved.