## Abstract

Ecological systems with threshold behaviour show drastic shifts in population abundance or species diversity in response to small variation in critical parameters. Examples of threshold behaviour arise in resource competition theory, epidemiological theory and environmentally driven population dynamics, to name a few. Although expected from theory, thresholds may be difficult to detect in real datasets due to stochasticity, finite population size and confounding effects that soften the observed shifts and introduce variability in the data. Here, we propose a modelling framework for threshold responses to environmental drivers that allows for a flexible treatment of the transition between regimes, including variation in the sharpness of the transition and the variance of the response. The model assumes two underlying stochastic processes whose mixture determines the system's response. For environmentally driven systems, the mixture is a function of an environmental covariate and the response may exhibit strong nonlinearity. When applied to two datasets for water-borne diseases, the model was able to capture the effect of rainfall on the mean number of cases as well as the variance. A quantitative description of this kind of threshold behaviour is of more general application to predict the response of ecosystems and human health to climate change.

## 1. Introduction

Ecology is pervaded by models with threshold behaviour, that is, models showing drastic shifts in population abundance or species diversity in response to relatively small changes in the parameters that control their dynamics. Resource competition theory, for example, predicts that microbial community should change drastically when resource ratio crosses a critical point where growth limitation shifts from one resource to another (Grover 1997). Epidemic theory predicts that parasites should persist only if host population density increases above a critical value (Anderson & May 1992). Population dynamic models also describe drastic population responses to environmental factors. Ellis & Post (2004), for example, show that the effect of temperature and density on wolf population is dramatically different below and above a critical density value. Below this threshold, population density is positively and linearly associated with temperature due to its effect on prey availability. Above the threshold, the effect of temperature becomes weak when compared with density effects. Grenfell *et al*. (1998) and Stenseth *et al*. (2004) describe a similar threshold behaviour for soya sheep in islands. Jacobson *et al*. (2004) show that the rate of population increase of ungulates changes drastically as snow depth crosses a critical point. The existence of such thresholds has significant consequences for the impact of climate change on ecological systems.

Classical approaches for modelling systems with threshold behaviour assume the existence of two (or more) non-interacting regimes: ‘low-density’ and ‘high-density’ (Ellis & Post 2004) or ‘regular’ and ‘explosive’ (Engel *et al*. 2001). Depending on the current regime, the system's response to covariates is governed by a different probability distribution or a different dynamic equation. The value of the threshold is often fixed and deterministic, and identified from data by visual inspection of scatter plots (Jacobson *et al*. 2004) or by investigating the threshold value that optimizes the fit of piecewise functions (Grenfell *et al.* 1998; Engel *et al*. 2001). This approach, however, often hides several difficulties that arise when either looking for theoretically predicted threshold phenomena in real datasets or representing observed thresholds using mathematical models. First, in real data, the existence of thresholds may be blurred by stochasticities and inadequate data resolution (Lloyd-Smith *et al*. 2005) which may soften the observed transition between regimes, making the identification of critical points a difficult task. Moreover, the variability in the system's response and not just its mean can vary drastically across a threshold. Figure 1 shows two examples of this pattern that will be considered in this paper. Similar patterns arise in the abundance of blue-green algae as a function of N : P ratios in lakes (Smith, 1983).

Here we propose a statistical modelling framework that allows for a flexible treatment of the transition between regimes, including different degrees of sharpness in the transition as well as drastic changes in system variability. The proposed model assumes two underlying stochastic processes, each of which dominates at one end of the environmental spectrum. Between these two extremes, the observations are a mixture of the two processes and the mixture proportion is a function of the driver. Thus, the transition between states can be more or less gradual and the actual shape of the transition is determined from the data as a function of the covariates. With this approach, we model situations where shifts between regimes do not follow a simple on and off response.

We illustrate the model with an application to time-series data for two water-borne diseases, leptospirosis and cholera, and their seasonal response to rainfall. The proposed threshold model captures not only the change in mean disease levels, but also their variances, as a function of rainfall, providing a functional form for the stochastic transition between regimes.

## 2. Two examples of environmental thresholds

Figure 1 shows scatter plots of disease versus rainfall for leptospirosis in Salvador, Brazil (Flannery *et al*. 2001) and cholera in Parganas, Bangladesh (Pascual *et al.* 2002). Leptospirosis is a water-borne acute bacterial disease that affects both humans and animals. Human infection occurs by contact with rodent urine-contaminated environment, surface water, soil and objects (WHO 2005). In many tropical urban centres, leptospirosis is an endemic disease, presenting a baseline incidence throughout the year and sporadic outbreaks after heavy rains, possibly due to the increased contact of people with contaminated flood water (Sarkar *et al*. 2002; Karande *et al*. 2003).

Historically, large cholera outbreaks in Bangladesh have been associated with deficient monsoon rainfalls (see Pascual *et al*. 2002, for a review of environmental drivers). This pattern has been classically attributed to the short supply of potable water during droughts. Similarly, the typical seasonal pattern in this endemic region involves two peaks per year with a drastic reduction in cases during the monsoons. A number of other hypotheses have been proposed linking rainfall to seasonal cholera cases through environmental factors that affect the concentration of the pathogen in aquatic environments (brackish and freshwater). Larger inocula of *Vibrio cholerae* in the environment would result from warming up of the water and associated plankton blooms outside the rainy season (Lobitz *et al*. 2000). Increased rainfall would dilute the concentration of the pathogen through an increase in volume of the environmental reservoir, lower salinity (below 15 0/00; Miller *et al*. 1982), lower pH (below 8.5; Huq *et al*. 1984) and reduced numbers of lysogenic vibriophages in the water (Faruque *et al*. 2005). Since a heavy bacterial load must be consumed to initiate a life-threatening case of cholera, the number of total deaths by cholera goes down in response to this dilution effect. A negative effect of rainfall has also been described for other regions of the world where periodic floods occur, as in the region of Madras, India (Pascual *et al*. 2002) and the Brazilian Amazon (Codeço 2001). However, the relationship between rainfall and cholera is complex with historical patterns of positive seasonal association in dry regions such as the Punjab and the northern districts of Madras where intermittent outbreaks occur (Pascual *et al*. 2002), as well as in more endemic, less dry regions at longer lags in southern Madras (Ruiz-Moreno *et al*. 2007). Evidence for positive interannual associations have also been recently described between floods and cholera transmission following El Niño events in recent decades in a rural area south of Dhaka (Koelle *et al*. 2005).

Despite the anecdotal evidence and the many known mechanisms by which rainfall can drive the seasonal outbreaks of water-borne diseases, a quantitative analysis of the nonlinear response of incidence to this environmental driver is still missing. Here, we illustrate a positive response of leptospirosis, with increased incidence after heavy rainfall in Salvador, Brazil, and a negative response of cholera, with a decrease in cases during the monsoon season in Bangladesh.

Both diseases show a different response at low and high values of the environmental driver and a sharp transition between these behaviours. At one extreme, the number of cases are consistently low (relatively to the average), while at the other extreme, a mixture of high and low cases is found. With the threshold model, the objective is to describe the effect of the environmental driver not only on the expected number of cases but also on the system's variability.

### 2.1 Threshold models: description and inference

To identify and describe the threshold behaviour suggested by the above disease patterns, we introduce a threshold model that incorporates two underlying processes and considers that the observed response is a mixture of these two processes. At extreme values of the environmental driver, one of the processes predominates. Between these two extremes, the observations are a mixture of the two processes and the mixture proportion is a function of the values of the environmental covariate.

McLachlan & Peel (2000) provide a good introduction to the general theory of probabilistic mixtures of distributions and likelihood-based inference. We use here the theory of estimating functions (Godambe 1991), instead of applying the full probabilistic modelling approach and likelihood-based inference, and consider only the mean and the variance functions of the process to estimate the parameters of the model.

Let *μ*_{1}, *μ*_{2} denote the mean responses and , denote the variances of the two processes, respectively. Let *π*(*z*) denote the mixture proportion that is a function of the observed covariates *Z*. Then, the mean and the variance of the observed mixed process are given by(2.1)

(2.2)

In the water-borne diseases we consider here, we model the individual processes such that and . This is similar to modelling the component processes as having a Poisson distribution. One may also consider other mean and variance functions. Furthermore, note that if one takes *μ*_{1}=*μ*_{2}, this approach allows one to model observations where the variance of the observed process is a mixture of the variance of the component processes depending on the covariates, but the mean is unaffected by these values. We model the proportion in the mixture using the logistic function, namely,(2.3)This flexible function takes values in the (0,1) range and allows for sharp or slow changes in the mixture proportion. Of course, any other function that takes values between 0 and 1 may also be used. In our examples, the change in the mixture proportion as a function of the covariate is fairly sharp and hence we call these models ‘threshold models’.

Let us denote the set of parameters that we want to estimate as *θ*. To estimate *θ*, we minimize the objective function as follows (Lele & Taper 2002):(2.4)with respect to the parameters. The theory of estimating functions can be used to prove that the estimators obtained by this procedure are consistent and asymptotically normal. The standard errors and confidence intervals (CI) for the parameters are obtained by the parametric bootstrap approach (Efron & Tibshirani 1993).

## 3. Results

Table 1 shows the point estimates and their bootstrap 95% CI for the two datasets. All calculations were performed using the R v. 1.5 statistical package (scripts are available upon request).

For leptospirosis, we found an expected incidence of 1.77 cases per week in the drought scenario, and 13.8 cases per week after extreme floods. The other parameters, *a* and *b*, describe the effect of the environmental driver on the mixture of the two processes. Since rainfall is associated with increasing reporting of leptospirosis cases, parameter *b* is positive. Figure 2*a* shows the estimated mixture function for leptospirosis. The abrupt change in the mixture proportion when log(rainfall) exceeds 2 supports the hypothesis of a threshold response to rainfall.

For cholera, the expected number of deaths in the drought scenario is 2214 (95% CI: 2132, 2348) per month. During extreme floods, deaths go down to an average of 157 (95% CI: 156,159). The mixture function shows a negative association between rainfall and deaths (figure 2*b*), and the estimated response is less steep than that for leptospirosis.

To evaluate goodness of fit, we compared empirically measured and model-derived estimates of mean number of cases per level of rainfall intensity, as well as its standard deviation (figure 3). Empirical estimates were obtained by calculating mean and standard deviations for the observations within the rainfall intervals: [0,1], [1,2], …, [5,6]. For the threshold model, mean and average were calculated using fitted values within the same intervals. We further included, for the sake of comparison, estimates from a standard Poisson regression model in which expected cases are considered as a function of rainfall. From the results, it is clear that the threshold model captures the effect of rainfall reasonably well on the expected number of cases and the variance, for both diseases. By contrast, the simple Poisson regression model captures well only the expected value but not the variability.

## 4. Discussion

The probabilistic model introduced here allows the estimation of nonlinear functional responses of ecological systems to environmental variables. Contrary to previous approaches, this model is capable of capturing sharp transitions between system states without resorting to *a priori* definition of cut points. The presence of thresholds is left to the user as it is not imposed by the model. For example, in the leptospirosis case, the steep mixing function suggests the presence of a threshold response to rainfall; hypothetically, when rainfall exceeds the structure of the city's run-off system, flooding occurs. In the absence of flooding, leptospira contamination is restricted to a few sites; in its presence, contamination spreads to large areas and the number of exposed individuals increases drastically. From the estimated mixing function, a threshold point for hospital alerts could be determined and used in combination with rainfall forecasts.

On the other hand, a less sharp mixing function is estimated for cholera and may not be compatible with a mechanistic threshold (on–off) response. The relationship between rainfall and cholera is known to be more elusive, involving both positive and negative responses depending on geographical region, temporal lag and overall water availability in the environment (Pascual *et al*. 2002). The negative nonlinear response at zero lag described here for seasonal cholera to rainfall in Parganas is also present in other districts of former Bengal (results not shown). Future work should examine the basis for a positive effect of rainfall in other regions (e.g. Punjab), for longer lags than that of the dilution effect itself (e.g. Madras; Ruiz-Moreno *et al*. 2007), and at longer (interannual) temporal scales as indicated by the recently reported association between floods and cholera transmission in Matlab, Bangladesh (Koelle *et al*. 2005).

As mentioned in §1, sharp transitions between regimes have been observed in many ecological and epidemiological systems. For epidemiological systems, the identification of thresholds is important for the development of climate-based alert systems (Curriero *et al*., 2001; Charron *et al*., 2004; Grover-Kopec *et al*. 2005; Platonov 2006); for ecological systems, to evaluate the impact of environmental changes on ecological dynamics. More generally, a characterization of the nature of the transition between states may serve as a basis for the development of more mechanistic models for intervention analysis.

The threshold model we have developed is static in the sense that it does not involve the temporal dynamics of the system itself. As such, the model is concerned with the estimation of functional responses to environmental drivers and differs in scope from the dynamical models of Grenfell *et al*. (1998) and Stenseth *et al*. (2004). The latter are time-series autoregressive models in which the threshold is not a function of an environmental covariate but of the state of the system itself, specifically of the (log) population density in the previous year. The population evolves in time according to a piecewise linear equation whose parameters and additive noise level vary across the threshold. Future developments of our model should investigate two possible directions. The first one is to incorporate probabilistic threshold responses to environmental covariates within time-series models for ecological dynamics. This could be done within the framework of new inference methods for stochastic dynamical systems (e.g. Ionides *et al*. 2006) to examine, for example, the association of residuals with environmental covariates. The second one is to incorporate the type of threshold considered here, involving a mixture of stochastic processes and changes in the variance, within time-series models in which the autoregressive variable itself exhibits a threshold, as in Grenfell *et al*. (1998) and Stenseth *et al*. (2004). More generally, the resulting approaches should be applicable to threshold dynamics in other types of ecological systems (e.g. Scheffer & Carpenter 2003).

## Acknowledgments

This project was carried out as part of the working group on ‘Seasonality and the population dynamics of infectious diseases’ supported by the National Center for Ecological Analysis and Synthesis, a centre funded by NSF, the University of California, and UC Santa Barbara. The leptospirosis project is supported by grants of the Brazilian National Research Council (300.861/96-6) and the National Institutes of Health (AI-052473 and TW-00919). The cholera research was supported by joint funding from the National Science Foundation, National Institutes of Health (Ecology of Infectious Diseases Grant EF 0430 120) and the National Oceanic and Atmospheric Administration (Oceans and Health Grant NA 040 AR 460019; to MP). The first author held an International Fellowship from the Santa Fe Institute during this project.

## Footnotes

- Received June 10, 2007.
- Accepted July 12, 2007.

- © 2007 The Royal Society