## Abstract

Understanding the growth and spatial expansion of (re)emerging infectious disease outbreaks, such as Ebola and avian influenza, is critical for the effective planning of control measures; however, such efforts are often compromised by data insufficiencies and observational errors. Here, we develop a spatial–temporal inference methodology using a modified network model in conjunction with the ensemble adjustment Kalman filter, a Bayesian inference method equipped to handle observational errors. The combined method is capable of revealing the spatial–temporal progression of infectious disease, while requiring only limited, readily compiled data. We use this method to reconstruct the transmission network of the 2014–2015 Ebola epidemic in Sierra Leone and identify source and sink regions. Our inference suggests that, in Sierra Leone, transmission within the network introduced Ebola to neighbouring districts and initiated self-sustaining local epidemics; two of the more populous and connected districts, Kenema and Port Loko, facilitated two independent transmission pathways. Epidemic intensity differed by district, was highly correlated with population size (*r* = 0.76, *p* = 0.0015) and a critical window of opportunity for containing local Ebola epidemics at the source (*ca* one month) existed. This novel methodology can be used to help identify and contain the spatial expansion of future (re)emerging infectious disease outbreaks.

## 1. Introduction

The 2014–2015 West African Ebola epidemic is the most severe Ebola outbreak on record. It is believed to have emerged in the Guéckédou region of Guinea during December 2013 [1], and spread to adjacent nations, Liberia in March [2] and Sierra Leone (SL) in May [3,4]. On 25 May 2014, SL reported its first confirmed Ebola case from Kailahun [3], a district on the border, south of Guinea and west of Liberia. By 26 April 2015, 12 371 Ebola cases and 3899 deaths had been reported in SL [5].

Quantification of local growth rates and the geographical spread of (re)emerging infectious diseases are crucial for determining the level and speed of intervention needed to contain an epidemic. Previous studies have used models to simulate and project the propagation of infectious diseases, such as influenza, over national and larger scales and to infer key spatial and temporal epidemiological characteristics of these outbreaks [6–10]. Detailed data resolving population structure and movement are typically needed to calibrate these models; however, such data are not readily available in West Africa [11,12]. Furthermore, observations of Ebola incidence and mortality have been error-laden and biased; indeed, the US Centers for Disease Control and Prevention estimated that only 40% of Ebola cases have been reported [13], and the underreporting rate could vary by region over time. To address these challenges, we here develop a modified patch network model of intermediate complexity for Ebola transmission, which we use in conjunction with a Bayesian inference method, implemented via data assimilation, and district-level incidence data for SL. The combined model-inference system enables simulation and inference of the spatial–temporal spread and characteristics of Ebola in SL.

## 2. Results

### 2.1. Framework of the spatio-temporal inference system

A number of epidemiological model structures have been used to simulate Ebola transmission [6,7,13–20]. For Ebola, a person, initially susceptible (*S*) to the disease, becomes infected when exposed (*E*) to the virus, then becomes infectious (*I*) following an incubation period, and is finally removed (*R*) from the diseased pool owing to recovery or death. Accordingly, we model the propagation of Ebola through a population using a susceptible–exposed–infectious–removed (SEIR) compartmental model [21]. This choice is also based on our previous work indicating that more parsimonious model structures, such as the SEIR model, tend to be more easily optimized than more complex modelling frameworks [22,23]. In a network of multiple districts, residents within each district may contract the disease locally or when traveling to other districts. For instance, new infections at district 1 within a three-district network would occur at a rate of
2.1where the single digit subscripts are district indices; *β* is the transmission rate; respectively, denote the number of infectious, susceptible and total number of people *present* in each district at a given time, i.e. including both local residents and visitors; *c*_{11}, *c*_{21} and *c*_{31} are, respectively, the proportions of residents of district 1 that stay local and that travel to districts 2 and 3. The summation (equation (2.1)) includes the number of cases infected locally (first summand) and those infected outside their home district (the second and third summands). This formula assumes that the subpopulation within each district is well mixed, and that new infections in a given district are allocated to local residents and visitors in proportion to their corresponding percentages among the total number of susceptibles; for instance, the total new infection rate in district 1 is with a portion, local residents.

The proportions form a matrix, e.g. for a three-district network, which represents the connectivity among districts. Previous studies [10,24] have computed the raw interdistrict commute flow using a gravity formula
2.2where *N _{i}* and

*N*are the population size for the two districts,

_{j}*d*is the distance between the two districts,

_{ij}*θ*is a proportionality constant and the exponents

*τ*

_{1},

*τ*

_{2}and

*ρ*together determine the connectivity between the two districts [8,10]. The model is then calibrated using detailed human mobility data (e.g. ground commute flow) and the raw numbers are adjusted to compute the [

*c*] matrix (as detailed in the Material and methods section). As commuter data are not available for SL, we normalize all inputs for population size and distance such that the proportionality constant,

_{ij}*θ*, which heavily relies on mobility data, is eliminated. Further, we use the area of each district to gauge local mobility (see Material and methods for details of the modified gravity model).

We then couple this modified gravity connection formula with the SEIR network model, and use the ensemble adjustment Kalman filter (EAKF), in conjunction with district-level incidence data, to estimate all state variables (i.e. *S*, *E*, *I*, for each district) and model parameters, including the transmission rate *β* for each district, the incubation period, the infectious period and the three exponents (*τ*_{1}, *τ*_{2} and *ρ*) for the gravity model. The EAKF [15,23,25–27] is a data assimilation method that uses an ensemble of system replicas to represent the distribution of possible model state and parameter values. It approximates a Bayesian update to the model states and parameters using the observational data and an estimate of the errors in the data. When tested against a model-synthesized dataset resembling the incidence record for SL, our inference method was able to sensibly estimate the state variables and model parameters (see electronic supplementary material, figures S1–S3).

### 2.2. Model fitting to the district epidemic curves

We applied the inference method to district-level incidence data from the week ending 25 May 2014, the first week of the Ebola epidemic in SL, to the week ending 25 January 2015. Of the 14 SL districts, four in the northwest (Western Area Urban, Western Area Rural, Port Loko and Bombali) recorded the greatest numbers of Ebola cases; the peripheral districts, Bonthe and Pujehun, only had sporadic cases; the remaining districts had moderate outbreaks. The epidemic started from Kailahun [3], then spread west to Kenema and Bo, and then to the rest of SL. Our inference framework is able to recreate the epidemic trajectories for each of the 14 districts (figure 1), and our estimates of key model parameters (table 1) are in line with past studies [28,29]. Note, however, that estimates for the three exponents in the gravity model are not comparable to previous studies [10,30]; this discrepancy is not unexpected as our gravity model is framed differently (see Material and methods).

### 2.3. The Ebola transmission network in Sierra Leone

To sort out the transmission path of Ebola from district to district, we compute the number of new cases infected locally versus externally for each district during each week. This computation allows identification of the source of self-sustained local epidemic transmission, as well as the source of the first cases in each district (see electronic supplementary material, table S3). Here, we focus on the former issue. We posit that importation of cases is critical at the very beginning of each local outbreak before transmission is sustained locally. Accordingly, we define the week that a district acquired over 80% of its new infections locally, following the onset of local outbreak, as the onset of self-sustained transmission (*T*_{ss}), and identify the district that contributed the largest number of new infections prior to *T*_{ss} as the most likely source of infection (see Material and methods and electronic supplementary material for sensitivity analysis).

A clear transmission network emerges (figure 2). Two major transmission paths exist, one spreading from Kailahun to the west and the other from Western Area Urban to the east. The first path emerged on the week ending 25 May 2014 in Kailahun, then spread to adjacent Kenema, over the weeks ending 15 June through 6 July 2014 and produced 21−67% of incidence each week in Kenema during those four weeks prior to *T*_{ss} (the range shows the minimum and maximum weekly contributions over the period, same elsewhere). The other adjacent district, Kono, also imported a substantial number of cases (32−41%) during 25 August−7 September 2014, followed by local, sustained transmission. From Kenema, the Ebola epidemic spread farther westwards to Bo (23 June−7 September 2014, 30−72%), southwards to Pujehun (18 August−21 September 2014, 13−27%) and northwards to Tonkolili (4−24 August 2014, 11−14%). These findings are consistent with a recent field investigation that reported the emergence of Ebola epidemic in Sierra Leone, in particular, the spread from Kailahun to Kenema [31].

The second path initiated from Western Area Urban, which encompasses the Capital Freetown, during 5 July−3 August 2014. This was a relatively quiet period with only two to six cases recorded each week. Ebola incidence then increased rapidly and spread to other districts in western SL. Western Area Rural likely imported a small number of cases from the capital area during the week ending 17 August 2014, prior to *T*_{ss} (see electronic supplementary material, figure S5 and figure S7). Ebola emerged in Port Loko, the largest district near Western Area Urban, around the same time; our estimates indicate that 14−17% of Port Loko cases each week during 21 July−3 August 2014 were imported from Western Area Urban. Port Loko then served as a source of transmission for the region: the epidemic spread from there southwards to Moyamba (1−21 September 2014, 14−23%), likely contributed to the spread of Ebola in Kambia (8 September 2014−11 January 2015, 20−62%), and may have seeded Bombali (the week ending 24 August 2014, 17%). The importation to Bombali was less obvious; two-thirds of simulations (*n* = 300) suggest that the epidemic in the district started locally rather than from Port Loko. These first cases could have originated zoonotically, from outside the country (a source not included in our model), or could have been imported from other districts through longer-distance travel. Regardless, the epidemic then spread from Bombali to Koinadugu (13 October−28 December 2014), and importation from Bombali may have enhanced the transmission in Tonkolili during 1−21 September 2014 (15−17%).

### 2.4. Spatial expansion characteristics

The two transmission paths converged in Tonkolili; however, it took the Kailahun path three months (25 May−24 August 2014) to reach Tonkolili, compared with less than two months for the Western Area Urban path (28 July−21 September 2014). For both paths, spread occurred more readily to adjacent districts, and a strong source region existed along the path, i.e. Kenema in the east and Port Loko in the west, which borders and facilitated spread to many other districts. This finding indicates that control of outbreaks prior to their spread to these critical, well-connected source districts might delay or reduce importation to surrounding districts and reduce overall case levels. Here, we estimate this window of opportunity as having been approximately one month (25 May−6 July 2014, i.e. the time lag of self-sustained transmission between Kailahun and Kenema) for the Kailahun transmission path, but much shorter for the Western Area Urban path, which traversed a region of greater population and connectedness.

To explore potential differences in local transmission characteristics, we estimated the effective reproductive number, *R*_{e}, independently for each district. *R*_{e} is the average number of secondary cases arising from a primary case, and thus an indicator of force of infection. To sustain an epidemic, *R*_{e} should be above 1. *R*_{e} is marginally above 1 for most districts at the onset of local epidemic (table 1); at its maximum, *R*_{e} ranges from 0.85 ± 0.20 (mean ± s.d., same elsewhere) in Bonthe, where only sporadic cases were recorded, to 2.22 ± 0.09 in Western Area Urban, where the largest number of cases were reported. These maximum *R*_{e} estimates are highly correlated with the population size of each district (*r* = 0.76, *p* = 0.0015), and less so with population density (*r* = 0.46, *p* = 0.10). Note this finding is not an artefact of the built-in population information in the gravity network connectivity model (see equation (4.2) in Material and methods), as *R*_{e} was evaluated at its maximum when local transmission has become the major force of infection. Indeed, the most populated districts (figure 2), i.e. Western Area Urban (16.4% of the nation's total population), Kenema (10.3%), Port Loko (8.8%) and Bombali (7.8%) also had the largest *R*_{e} and the highest incidence. This finding indicates that areas with larger populations were more likely to sustain more intense outbreaks and serve as major source regions (e.g. Western Area Urban, Kenema and Port Loko) to neighbouring areas, which is consistent with the transmission network shown in figure 2; it again highlights the importance of early control in these areas. Western Area Rural, where the third largest number of cases was reported, has the second smallest population size. Our estimates indicate that 7.2 ± 2.7% of cases were infected while travelling in the neighbouring epidemic centre, Western Area Urban (table 2).

For each district, the relative contributions of new infections from other districts varied over the course of the epidemic. For instance, Kenema initially seeded infection in Pujehun; however, later in the epidemic, Bo became the primary source of imported infections for Pujehun. In contrast, some districts remained strong sources throughout the epidemic. For example, Kailahun produced 9.0 ± 0.95% of all cases in Kenema, more than half of all imported cases (table 2). Throughout the epidemic, imported infections remained a significant force of infection for most districts, particularly those with moderate outbreaks, e.g. Koinadugu (39 ± 7.9%), Kambia (37 ± 14%) and Moyamba (33 ± 10%) (table 2).

## 3. Discussion

During the 2014–2015 Ebola epidemic, a number of issues complicated and potentially obfuscated modelling efforts. First, owing to the scope of the outbreak and the limited surveillance and public health infrastructure in the three most affected West African countries—Guinean, Liberia and SL—observational error, particularly underreporting [13], was unavoidable. This observational error undermines model inference; however, data assimilation methods, such as the EAKF used here, are equipped to handle and explicitly account for observational error [25]. To further allow for observational error effects, we considered six filter settings, running simulations with three different levels of observation error variance (see Material and methods), and used the agreement of inference among the six settings as an indicator of the confidence, or reliability, of the inference (figure 2).

A second concern involves system stochasticity [32], seemingly random processes that arise owing to variations of behaviour or changes in the intensity of intervention measures. To account for these effects, modelling studies have used stochastic model structures [14,17,20,32]. The SEIR network model used here is deterministic; however, the EAKF, used in conjunction with the model, to some extent introduces stochasticity to the system through its ensemble formulation and the random selection of initial state variable and parameter conditions. Indeed, we alternatively built and tested our network model-inference framework using a stochastic SEIR model as the core (see electronic supplementary material). Findings with this alternative stochastic structure generated similar transmission networks and parameter estimates (electronic supplementary material, figure S4 and table S1).

Additional complicating issues, which were not resolved observationally nor represented in the model structure, include differences in transmission risk in different settings (e.g. community, hospital, versus funeral), heterogeneity in population mixing and contact patterns and resulting variations in infection risk at differing geographical scales, potential asymptomatic infections and pre-existing immunity owing to undetected prior circulation of Ebola in the region. Our parsimonious SEIR-network model does not represent these processes; however, despite its simplicity, the system was able to identify major transmission pathways of Ebola spread between districts as well as key characteristics of source regions. For instance, our inference suggests that districts with higher spatial connectivity and larger populations (e.g. Kenema and Port Loko) could be key regional source districts for control. Future work may construct more complex models to investigate the aforementioned issues, were detailed data available to constrain the system. Future work may also apply the model-filter framework to district-level data for Guinea and Liberia, two other countries with intense Ebola transmission, to further test the findings reported here. In addition, our findings may be tested as more detailed viral sequence data, and field investigations become available.

Detailed, timely understanding of the spatial–temporal progression of (re)emerging infectious disease is needed to devise effective control and containment strategies of these outbreaks. Here, we have presented a novel spatial inference system that enables estimation of this spatial–temporal progression of disease. The method uses only district-level incidence data, as well as the area and population of each district and interdistrict distances. All these data can be compiled in near real-time, even during a public health crisis. The core network model presented here uses a normalized, non-dimensional framework that could be easily applied to other diseases and different regions. In the future, this methodology can be used to help public health officials more effectively combat (re)emerging infectious disease.

## 4. Material and methods

### 4.1. Data

Weekly incidence data for each of the 14 districts in SL were obtained from the Sierra Leone Ministry of Health and Sanitation. These data were compiled based on each patient's district of origin, as opposed to the district of report. For instance, although the index case in SL was sent for testing at Kenema Government Hospital and reported in Kenema district [1], this patient case was originally from Kailahun district and thus counted as a case for Kailahun. Incidence records are the same as released in the Ebola situation reports [33] and include numbers of suspected, confirmed and probable cases, as defined by the World Health Organization [34], from the week ending 25 May 2014 to the week ending 25 January 2015. We combined these three categories for the simulations reported in the main text.

### 4.2. Network transmission model

We used an SEIR model to simulate the propagation of Ebola in each district, as our previous work suggests that parsimonious model forms, when used in conjunction with data assimilation methods and observations, can be more easily constrained [22]. Transmission between districts is formulated as a patch network model [35]. We assume that interdistrict transmission occurs when susceptible individuals travel from their home district to other districts and interact with infectious individuals therein. Specifically, the SEIR-network model is formulated as follows
4.1where the single digit subscripts (*i* or *j*) denote district; *S _{i}*,

*E*,

_{i}*I*are, respectively, the number of susceptible, exposed and infectious people in district

_{i}*i*;

*β*is the local transmission rate in district

_{i}*i*;

*α*is transmission from outside the network domain, e.g. from outside SL or zoonotic spillover, and is arbitrarily set to one per 10 days in this study (we also tested lower values for

*α*and the difference in model estimates was nominal);

*Z*is the incubation period,

*D*is the infectious period, and both variables are assumed the same for all districts; the basic reproductive number for district

*i*,

*R*

_{0i}, is linked to the transmission rate and infectious period through the relationship

*R*

_{0i}=

*β*;

_{i}·D*c*is the proportion of residents of district

_{ji}*i*travelling to district

*j*; a hat sign (^) over a variable denotes the number of people

*present*in the district at a given time point, regardless of home district; is the number of people

*present*in district

*i*at time

*t*, i.e. with

*N*as the population size for district

_{j}*j*and

*n*

_{dist}as the total number of districts (i.e.

*n*

_{dist}= 14 for SL); is the number of infectious people present in district

*i*at time

*t*, and is approximated as

*I*, because symptomatic individuals are less likely to travel to other districts (except for medical transfer for treatment) and cause infections elsewhere. Note that the approximation of with

_{i}*I*here is not a choice of convenience; rather, without this approximation, e.g. calculated as done for , symptomatic, infectious individuals are assumed to have the same interdistrict mobility, which is less likely given the severity of Ebola virus disease.

_{i}### 4.3. Modified gravity model

The quantities *c _{ij}* (

*i*= 1,…,

*n*

_{dist},

*j*= 1, … ,

*n*

_{dist}) in equation (4.1) form a matrix,

*C*= [

*c*]

_{ij}

_{i}_{= 1, …},

*n*

_{dist, j = 1, …},

*n*

_{dist,}that represents the strength of connection between each district pair. Past studies [8,10,24] have used gravity type formulations to compute interlocale commuter flow rates (e.g. equation (2.2)). Population size for each district in SL and interdistrict distances are publicly available [33,36]; however, mobility data, previously used to calibrate the four parameters in the gravity model (equation (2.2)) [10] are not available for SL. As such, we reframed equation (2.2) using scaled, proportions of population size and distance. In so doing, the proportionality parameter

*θ*, which is heavily dependent on commuter data, is eliminated and equation (2.2) becomes 4.2where the new dimensionless model inputs are (note that

*d*, the within-district distance, is computed as , with

_{ii}*N*

_{tot}as the national population in SL, and a proxy radius for the country). These quantities are then further adjusted, so that the

*proportions*of all residents in a given home-district, i.e. those staying local (

*c*) plus those travelling to each of the other districts (

_{ii}*c*,

_{ij}*i*≠

*j*), sum to 1. That is, 4.3

We use the proxy radius for each district (i.e. ) to gauge within-district versus interdistrict mobility, which partly accounts for the effect of population density; that is, residents of more densely populated districts (e.g. the capital region) are less likely to travel to less populated regions (e.g. a remote district). Furthermore, as all inputs are normalized by the characteristic quantities of the country (e.g. area and population of the country), the model framework can be readily applied to other regions. For instance, by using characteristic quantities for Guinea or Liberia, the same model could also be used without relying on mobility data for these countries.

Equations (4.2) and (4.3) are then used to calculate the connectivity matrix needed for equation (4.1). Note, however, by combing equations (2.2), (4.2) and (4.3), the scaling done in equation (4.2) is not necessary, as the normalization in equation (4.3) would cancel out the common factor . The exponents *τ*_{1}, *τ*_{2} and *ρ*, which determine the influence of donor and recipient population size and distance, are estimated by the data assimilation method described below.

### 4.4. Data assimilation method

We applied the EAKF [15,25] in conjunction with the SEIR-network model and weekly incidence records for all 14 districts. The EAKF uses multiple system replicas, termed ensemble members, to represent the prior and posterior distributions of each state variable and parameter. Each ensemble member is initialized as a random draw from an initial distribution for each of the model state variables and model parameters; it is then propagated per the SEIR-network model (prediction step), and adjusted using weekly incidence data from the 14 districts per the EAKF algorithm (update step). This prediction–update cycle is done sequentially, and an update is triggered by the arrival of new data (e.g. weekly in this study). Specifically, the EAKF computes the posterior ensemble mean per the Kalman filter update algorithm [37] as follows
4.4where the subscript *k* denotes week, and obs, prior and post, denote the observation, prior and posterior, respectively; *z* is the observed weekly incidence; *x* is the observed state variable, i.e. the model counterpart of *z* calculated by the SEIR-network model; and are, respectively, the posterior and prior ensemble mean; *σ ^{2}* is variance. Each ensemble member, , is then adjusted towards the ensemble mean as follows
4.5

This adjustment (equation (4.5)) ensures that the posterior variance equals the value predicted by Bayes' theorem [26,38]. Note that the posterior ensemble mean is computed using the linear formula (equation (4.4)) as prescribed by the Kalman filter [37]; however, the propagation of the ensemble per the underlying dynamical model (e.g. the SEIR-network model in this study) preserves much of the nonlinear dynamics of the system.

Initial priors for the model-filter runs were as follows: *S _{i}* ∼ Unif [40%

*N*, 100%

_{i}*N*];

_{i}*R*∼ Unif [0.5, 3.5];

_{i}*Z*∼ Unif [2, 14];

*D*∼ Unif [5, 14];

*τ*

_{1}∼ Unif [0.3, 0.7];

*τ*

_{2}∼ Unif [0.3, 0.7];

*ρ*∼ Unif [2, 8]; initial prior values for

*E*and

_{i}*I*were randomly drawn from a truncated normal distribution (restricted to non-negative values) with mean equal to the first observation, most of which were zero, and standard deviation equal to twice the observation standard deviation. Note that the ensemble can migrate outside these prior ranges over the course of filtering.

_{i}To account for observational errors in the data, the EAKF was run with or without adaptive covariance inflation [39,40], and at three observation error covariance (OEV, i.e. in equations (4.4) and (4.5)) levels, i.e. six settings in total. Covariance inflation intentionally widens the spread of the ensemble prior to avoid filter divergence [41]. The three OEV levels were one, four or four times the observation and thus varied through time with observed incidence. Five hundred ensemble members were used for each ensemble run. To account for the stochasticity in initialization of the model, 50 model-filter ensemble runs, each with a different set of initial model state variables and parameters, were performed for each filter setting (i.e. combination of covariance inflation and OEV), i.e. 300 ensemble runs in total.

### 4.5. Inference of key epidemiological parameters

We applied the SEIR-network-EAKF framework to district-level incidence data from the week ending 25 May 2014, the very beginning of the Ebola epidemic in SL, to the week ending 25 January 2015. The SEIR-network model, used in conjunction with the EAKF and data, constitutes a state-space, or hidden Markov, model that allows estimation of unobserved, or latent, state variables (e.g. numbers of susceptible and exposed people) [42]. Estimates of all state variables (i.e. number of susceptible, infected and infectious, at each week for each district) and model parameters (i.e. the basic reproductive number *R*_{0} for each district, the incubation period, *Z*, the infectious period, *D* and the three exponents in the gravity model) are updated each week through the EAKF filtering process. That is, the parameters are time-varying and reflect potential fluctuations owing to exogenous effectors such as timeliness of reporting/treatment and travel restrictions over the course of the epidemic. However, when the model-filter framework was run using fixed values for the incubation period and infectious period (electronic supplementary material, figure S5), the inferred transmission network was similar as reported in figure 2. We treated *R*_{0} as an independent parameter for each district to explore differences in the force of transmission among districts.

In our model-filter framework, the filter can adjust the population susceptibility to reflect changes in population vulnerable to infection. For a disease such as Ebola, a decrease in population susceptibility over time may occur owing to an increase of the number of infected (i.e. depletion of actual susceptibles) or from an increase in awareness of the disease and precautions and intervention measures taken that reduce the chance of transmission and effectively remove individuals from the susceptible pool [43]. In addition, our previous study suggests that population susceptibility and the basic reproductive number tend to compensate for each other, whereas estimates of the effective reproductive number (calculated as *R*_{e} = *R*_{0}*S*/*N* for a non-network model) are generally more accurate [23]. As such, we here focus on the effective reproductive number *R*_{e}, and did not analyse the population susceptibility or the basic reproductive number *R*_{0}. The effective reproductive number, *R*_{e}, represents the number of secondary cases arising from a primary case during the course of the epidemic. It is an indicator of the observed force of transmission, given population susceptibility, intervention measures in place and the transmissibility of the infecting pathogen. In this study, we calculated *R*_{e} for each district as

We defined the week with the maximum *R*_{e} as the time of maximum epidemic forcing. Estimates of *R*_{e} at local outbreak onset, *T*_{ss}, the time of maximum epidemic forcing and the week ending 25 January 2015 for each district are shown in table 1. The incubation period, *Z*, infectious period *D*, and the three exponents for the gravity model were assumed the same for all districts; these parameters are presented using the time points determined for the Western Area Urban, the district with the largest number of Ebola cases in SL.

### 4.6. Construction of the transmission network among districts

We posit that cross-district transmission is critical at the very beginning of a local outbreak prior to sustained local transmission. Accordingly, the most likely source of infection (referred to as the source-district) is identified as the outside district that contributes the largest number of new infections during the period from the onset of the local outbreak to the week prior to self-sustained transmission (*T*_{ss}). We defined the onset of a local outbreak as the first of three consecutive weeks with non-decreasing incident cases and the onset of *T*_{ss} as the week in which more than 80% of incidence occurs locally. These identified timings for each district are shown in electronic supplementary material, table S2.

To construct the transmission network among the 14 districts in SL, we first computed the numbers of new cases infected locally as well as in each of the other 13 districts during each week for each district based on equations (4.1)–(4.3) and the state variables and parameters estimated by the filter. We then recursively identified each district's source-district until no further source-district could be identified. The terminal source-district is the source-district of a transmission path. For each identified source-district, we also calculated the percentage of new infections contributed to a given sink-district from local outbreak onset to the week prior to *T*_{ss}. This route-searching procedure was carried out separately for each of the 300 ensemble runs.

Some pathways were identified by all ensemble runs, whereas some only by a fraction of runs. Transmission paths identified for ensemble runs carried out with the same filter setting (i.e. the same covariance inflation and OEV options) showed greater agreement than among those with different filter settings. Therefore, we grouped the ensemble runs to identify transmission paths by each filter setting and used the agreement among filter settings as an indicator of the reliability, or confidence, in the inference (figure 2). Greater agreement for a particular transmission path (e.g. all six filter settings identifying the same path) suggests higher confidence in the inferred path and vice versa.

## Authors' contributions

W.Y. and J.S. conceived the study; W.Z., D.K., R.Y., Y.C., Z.C., A.Kam, B.K. and C.L. collected the data; W.Y., S.K., A.Kar. and J.S. developed the method; W.Y. performed the experiment; W.Y. and J.S. analysed the inference and drafted the paper; all other authors commented on the paper.

## Competing interests

J.S. discloses consulting for J.W.T. and Axon Advisors, as well as partial ownership of SK Analytics.

## Funding

This study was supported by US National Institutes of Health (GM100467, GM110748, GM088558 and ES009089) and the Research and Policy for Infectious Disease Dynamics (RAPIDD) programme of the Science and Technology Directorate, US Department of Homeland Security.

## Acknowledgements

We thank all personnel involved in collecting the data and combating the Ebola epidemic.

- Received June 16, 2015.
- Accepted October 22, 2015.

- © 2015 The Author(s)