## Abstract

Population extinction is a fundamental biological process with applications to ecology, epidemiology, immunology, conservation biology and genetics. Although a monotonic relationship between initial population size and mean extinction time is predicted by virtually all theoretical models, attempts at empirical demonstration have been equivocal. We suggest that this anomaly is best explained with reference to the transient properties of ensembles of populations. Specifically, we submit that under experimental conditions, many populations escape their initially vulnerable state to reach quasi-stationarity, where effects of initial conditions are erased. Thus, extinction of populations initialized far from quasi-stationarity may be exposed to a two-phase extinction hazard. An empirical prediction of this theory is that the fit Cox proportional hazards regression model for the observed survival time distribution of a group of populations will be shown to violate the proportional hazards assumption early in the experiment, but not at later times. We report results of two experiments with the cladoceran zooplankton *Daphnia magna* designed to exhibit this phenomenon. In one experiment, habitat size was also varied. Statistical analysis showed that in one of these experiments a transformation occurred so that very early in the experiment there existed a transient phase during which the extinction hazard was primarily owing to the initial population size, and that this was gradually replaced by a more stable quasi-stationary phase. In the second experiment, only habitat size unambiguously displayed an effect. Analysis of data pooled from both experiments suggests that the overall extinction time distribution in this system results from the mixture of extinctions during the initial rapid phase, during which the effects of initial population size can be considerable, and a longer quasi-stationary phase, during which only habitat size has an effect. These are the first results, to our knowledge, of a two-phase population extinction process.

## 1. Introduction

Factors believed to affect a biological population's persistence include spatial fragmentation [1], habitat size [2–5], Allee effects [6], the degree of genetic inbreeding [7,8], environmental and demographic stochasticity [9–15], and initial population size [16,17]. While the effects of most of these are complex and depend on interactions with other conditions, there is broad theoretical consensus that the direction of effect for one of these factors, initial population size, is general [17,18]: populations starting with a small initial size will reach extinction sooner, on average, than larger ones owing to their greater vulnerability to demographic stochasticity [19–21].

Indeed, a positive monotonic relationship between initial population size and expected extinction time is predicted by virtually all standard population models. The paradigmatic model in which small size vulnerability is due to demographic stochasticity is the general birth–death process [22]. This model is a special case of the continuous-time Markov process where states represent the current size of the population and where the transitions are limited to births and deaths of biological individuals. For this model, the effect of initial population size is proved by monotonicity in the expression for the time-dependent probability of extinction [22,23]. By comparison, diffusion processes are standard when the extinction hazard is affected by random environmental variability [10], in which case the effect of initial population size is implied by a continuity condition assumed in the derivation of the model [24]. The majority of stochastic population models fall into one of these two classes. While exceptional counter-examples might be contrived (for instance, stochastic discrete-time recursion equations with strong nonlinearity), since both the general birth–death process and the general diffusion process entail the initial population size hypothesis, we conclude that the phenomenon is a general prediction of theory.

Despite this generality, few studies have sought to empirically test this theory and the limited evidence to date is mixed. Specifically, observations and experiments conducted in the field have supported the prediction that time-to-extinction increases with initial population size, while laboratory experiments have failed to detect an effect. Examples of supporting studies include 355 populations of birds on the British coast [16], bighorn sheep in western North America [25], and experimental introductions of mice [26], voles [27] and chrysomelid beetles [28]. In summary, both observations and experiments in field settings support the initial population size hypothesis, although the mechanisms by which small initial size exposed these populations to the extinction hazard are obscure. By contrast, laboratory experiments, in which the mechanisms of growth and decline have been elucidated, have typically contradicted this pattern (reviewed in [4]). Thus, for instance, Belovsky *et al*. [29] and Burkey [30] failed to obtain evidence that initial population size influenced extinction risk in experiments with brine shrimp (*Artemia franciscana*) and predatory ciliates (*Euplotes aediculatus*) that were specifically designed to measure the magnitude of the effect.

We propose to explain both sets of observations as special cases of a more general extinction process, which (for small populations) consists of two phases. In an early phase, small populations are vulnerable to rapid extinction, the probability of which scales with initial population size according to the power law, *p* = (1/*ξ*)^{x0}, where *ξ* > 1 is the ratio of the birth rate to death rate at small population sizes and *x*_{0} is the initial population size [31,32]. While we do not aim to test this law directly, it underlies the two-phase phenomenon which is our main focus. The second phase occurs in those populations that either begin at population sizes close to their theoretical long-run means or escape early extinction with probability 1 − *p* to achieve quasi-stationarity where effects of initial population size are vanishingly small [32–34].

The need for the concept of quasi-stationarity derives from the fact that, for closed populations, extinction is an absorbing state so that fluctuations in all non-extinct populations are transient in the sense that ultimate extinction is certain [35,36]. To be more precise, we will model population dynamics as a stochastic process. Defining a population as a group of organisms of a species occupying a defined area and isolated from other similar groups [37], we denote the number of individuals in population *i*, its abundance at time *t* by *X*_{i}(*t*) ∈ IN_{0}, IN_{0} = {0,1,2,…}. The fluctuations of the population are recorded in the vector *X*_{i} = {*X*_{i}(*t*), *t* ∈ *T*}, where *T* is the index set (‘time’), which may be either discrete or continuous. The preceding definition assumes that the process is confined to a discrete state space, as in the standard birth–death theory [22,23]. In the diffusion theory (which is sometimes derived as a continuous-state approximation (e.g. [10]), the state space IN_{0} is replaced by the non-negative real numbers IR_{0}. The extinction time of the *i*th population is defined as . Together, these definitions express the belief that population fluctuations may be interpreted as realizations of a stochastic process and that the observed extinction time is a stopping time of the stochastic process. While in theory populations are terminated only by extinction, in practice there may be other stopping criteria. Particularly, censoring occurs if observation is ceased prior to extinction. Following the conventions of survival analysis, we denote the censoring time of the *i*th population by *c*_{i} and let , the follow up time, and , the status indicator which takes the value 1 if is observed and 0 otherwise. Importantly, biological theory primarily concerns the distribution of *τ** from realizations of the stochastic process ** X** (and particularly its expectation, the mean extinction time), while what is actually observed is instances of the pair (

*τ*

_{i},

*δ*

_{i}). The quasi-stationary distribution of

*X*may be defined as . (In the diffusion theory, IN

_{1}is replaced by the non-negative real numbers IR

_{+}.) When the characteristic time of convergence to quasi-stationarity is sufficiently less than the extinction time (convergence to stationary state), a notion which may be made precise in the (finite) discrete space case by noting that these two characteristic times are given by the dominant and subdominant eigenvalues of that portion of the intensity matrix of the stochastic process

**corresponding to the set of transient states [35], a two-phase extinction process may occur. Indeed, in special cases, the extinction time distribution may even be bimodal [32].**

*X*As an illustration, we simulated populations subject to a density-dependent birth–death process with birth rate 1.1and death rate 1.2

The names ‘birth rate’ and ‘death rate’ for the functions *b*(*x*) and *d*(*x*) are standard in the stochastic theory to refer to rates of increase and decrease in the population with respect to time, despite the fact that they may not be directly measurable from observations of individuals. In this model, the parameter *b*_{0} is known as the *per capita birth rate* because it expresses the speed of reproduction on an individual basis, which typically is measurable. Where *d*_{0} = *b*_{0}, the difference of the rates gives the average growth rate of the population in continuous time and is readily rearranged to reveal that the average process is equivalent to the familiar logistic model . The parameter *k* > 0 is an upper equilibrium of this model and is therefore often referred to as the *carrying capacity*. This model has been well studied in the context of ecological applications, for instance by Allen [31]. We use it here because, despite its simplicity, it gives rise to dynamics that are qualitatively similar to those observed in our experimental system and because it demonstrates the two-phase extinction process, unambiguously illustrated here by bimodality in the extinction time distribution, when initialized at *x*_{0} = 1, far from quasi-stationarity (figure 1*a*).

The two-phase process can be understood more generally by distinguishing populations on an individual basis that ‘reach quasi-stationarity’ from those that go extinct rapidly (despite the fact that the boundary between these two phases may not be sharp) and view the total extinction time distribution as a mixture of extinction times for the two groups. For populations initialized below the quasi-stationary mean, the probability that a population goes extinct rapidly decreases as the initial population size approaches the mean. The effect on the extinction time distribution for both classes together is that an early signal of initial population size will be lost as the distribution of population size in remaining populations increasingly approaches the quasi-stationary distribution. It follows that populations going extinct early in the process should reveal an effect of initial population size on extinction, while later extinctions will mostly be populations from the quasi-stationary group and will not exhibit an effect of initial population size. Equivalently, a two-phase extinction hazard in which an initial effect of initial population size is washed out over time may be interpreted as the signature of escape to quasi-stationarity.

This argument provides both an explanation for the discrepancy between extinctions observed in the field and extinctions in laboratory populations and a method for detecting effects of small initial population size on extinction in replicated populations. Specifically, the difference may be that the effect is extremely subtle when either initial population size is large or population growth rate is large (or both), i.e. the initial population sizes of 5, 10, 15 or 20 individuals [29] and 4–91 individuals [30] used in laboratory experiments are too large relative to the population growth rate (or too close to the quasi-stationary mean population size) to exhibit detectable variation. In laboratory populations, environmental conditions are often ideal for reproduction, exaggerating *per capita* birth rate, but also spatially restrictive, imposing a low carrying capacity. In general, when population growth rates are large or carrying capacities are small (relative to initial population size), the variation in mean extinction time will be minimized, dramatically reducing the statistical power of experiments to detect an effect of initial population size on extinction time even when it exists. This suggests that experimentally detecting an effect of initial population size on extinction risk will require conducting studies with extremely low initial population sizes under relatively hostile conditions.

We experimentally investigated this explanation using laboratory populations of the water flea *Daphnia magna. Daphnia magna* is a cyclically parthenogenetic, planktonic crustacean inhabiting lakes and ponds throughout North America and Europe. Populations were started with *x*_{0} ∈ [1,2,3,4,5] and reared on a food resource of the relatively non-nutritious blue-green alga *Spirulina* sp. Two trials were conducted. One investigated the role of initial population size in isolation. The other trial combined initial population size with habitat size, a factor that has known effects on extinction in this system [5]. The outcome provided evidence for a very rapid early transient phase during which initial population size had a pronounced effect on the extinction hazard, followed by a second phase during which the extinction hazard was virtually constant, consistent with having settled to a quasi-stationary distribution.

## 2. Material and methods

### 2.1. Study species

Study organisms originated from a single female of the freshwater cladoceran *D. magna* reared on a solution of freeze-dried *Spirulina. Daphnia magna* is well suited to experimental studies of population extinction owing to its short generation time (approx. 10 days), tolerance to laboratory conditions and ability to reproduce through parthenogenesis. Further, since reproduction under laboratory conditions is clonal, this system enabled focus on the demographic mechanisms postulated here, excluding confounding genetic explanations.

### 2.2. Experiment

A few days prior to each trial, newborn *D. magna* females were selected from laboratory stock populations and isolated, grouped by day of birth. Meanwhile, 700 ml and/or 1400 ml (depending on trial) Plexiglas experimental chambers (figure 2) were arranged on a laboratory bench and filled with synthetic hard water medium [38]. Once a sufficient number of *D. magna* neonates had been obtained, experimental chambers were inoculated with an initial population of randomly selected individuals (number of individuals determined by treatment) and blocked by neonate age (either 1, 2 or 3 days old at the start of the experiment). In an effort to study interactions between initial population size and habitat size, experimental chambers used for the two trials differed. Specifically, trial I comprised *n* = 89 experimental chambers (all 700 ml), divided into three blocks of 30 based on the date of birth evenly distributed among treatments (*n* = 6 per treatment per block, except for one replicate of *x*_{0} = 1, which was not inoculated owing to laboratory error); trial II crossed initial population size with a habitat size treatment to determine which factor had a greater effect at different times following inoculation. In this trial, 30 small (700 ml) and 30 large (1400 ml) experimental chambers were all inoculated on the same day (*n* = 6 for each of *x*_{0} ∈ [1,2,3,4,5]).

Each chamber was visually inspected every day to determine whether extinction had occurred. Some chambers in each trial (35 in trial I, eight in trial II) became contaminated by a live alga part way through the experiment. These chambers were considered to be censored in the sense defined above on the day contamination was first detected. Thus, our data are in the form of pairs (*τ*_{i}, *δ*_{i}) from which we wish to learn about the distribution of *τ**. Both trials were continued until all populations went extinct or were censored owing to algal invasion. Populations were fed daily 200 µl of a suspension produced by mixing 0.1 g dried alga (JEHM Co., Inc.; 10.16% N and 44.98% C) into 50 ml of deionized water. Thus, this experiment satisfied the anticipated requirements for detecting an effect of initial population size on extinction risk by using very low initial population sizes and by establishing a hostile environment where low-quality food was provided at minimal levels.

### 2.3. Statistical analysis

Treatment effects were tested using Cox proportional hazards regression models [39]. This family of models assumes that all experimental subjects (populations of *Daphnia* in our case) are exposed to a hazard function of the form
2.1where *λ*_{0} is an unknown baseline hazard shared by all populations, *X*_{i} is a vector of covariates for population *i* and ** β** is a column vector of estimated coefficients. The hazard function may be interpreted as the instantaneous extinction rate or (by analogy to the concept of the force of mortality in demography) the

*force of extinction*. The force of extinction may be interpreted probabilistically as the limit of

*p*

_{Δ}

_{t}, the probability of extinction in the interval (

*t*,

*t*+

*Δ*

*t*) conditional on persistence of the population to time

*t*, as

*Δ*

*t →*0. Equation (2.1) encapsulates the key assumption, referred to in the literature as

*proportional hazards*. This model assumes that covariates act multiplicatively to affect the hazard function perceived by a cohort of identical subjects. While this assumption is sometimes questionable on theoretical grounds [40], it is often the case that transformations of the covariates enable the proportional hazards assumption to be met on empirical grounds (i.e. statistical tests fail to reject the null hypothesis that the proportional hazards assumption is met). Additional assumptions of this model include the requirements of non-informative censoring and homogeneous initial conditions (i.e. lack of heterogeneity, which would require an accelerated failure time model), achieved in our case through experimental control and design, and non-time-covarying covariates (i.e. temperature and food supply were constant through time). The main benefit of the proportional hazards model is that all effects may be estimated relative to an unknown baseline hazard via maximization of the partial likelihood function.

In our study, changes in population size are large at the beginning of the experiment and our analysis therefore focuses on distinguishing extinctions realized during this period from later times. Following Therneau & Grambsch [39], we used analysis of the scaled Schoenfeld residuals to test the proportional hazards assumption and to diagnose the time-varying effects that we have argued are diagnostic of a two-phase extinction process. Scaled Schoenfeld residuals are partial residuals obtained from the differences between the time of observed events and their conditional expectations based on the risk set (the set of subjects under observation at the time of each extinction event) scaled by the variance of the covariate vector. Suppose the true model has a time-varying coefficient *β*(*t*). Then, the expected value of the residual is approximately equal to this true value minus the fit coefficient from the standard Cox model (unnumbered equation at the bottom of Therneau & Grambsch [39, p. 130]). It follows that violation of proportional hazards may be inspected by serially plotting the scaled Schoenfeld residuals (plus an offset for the coefficient from the ordinary Cox regression) against time. Our interpretation follows Therneau & Grambsch [39] in their example study of lung cancer in 137 patients participating in a clinical trial [39, p. 135 ff.]. Following Lin [41], a time-rescaled process is obtained from the Kaplan–Meier estimates at the observation times (e.g. test 4 in §2 of Grambsch & Therneau [42]), which Therneau & Grambsch [39, p. 136] report to be relatively less sensitive to censoring patterns and to more evenly distribute the scaled residuals than alternatives. For diagnosis, we plot the spline-based smooth of the residuals against time (solid line) ±1 s.e. intervals (dashed lines). If the proportional hazards assumption is not violated, this smooth fit will be a horizontal line with intercept equal to the constant coefficient *β*. Estimates were obtained with the function `plot.cox.zph` in the R package `survival`.

To illustrate the expected outcome of this analysis under the assumptions of the demographic theory, we simulated populations using the density-dependent birth–death process described by equations (1.1) and (1.2) above, inoculated at *x*_{0} ∈ [1,5] with parameters *b*_{0} = 0.015 and *k* = 10. Figure 1*b* shows that non-stationarity of the scaled Schoenfeld residuals of a Cox proportional hazards regression is diagnostic of the transient behaviour of ensembles initialized far from stationarity. Specifically, the levelling-off of the mean (solid line in figure 1*b*) indicates that the escape to quasi-stationarity has occurred. We attempted to use piecewise regression to statistically estimate the location of these turning points, but found available estimation procedures to be unstable with respect to the relatively small datasets collected here. As described below, log-transformations of extinction time data commonly helped to meet the proportional hazards assumption and were typically applied. Trials I and II were first analysed separately and then pooled for a final analysis.

## 3. Results

### 3.1. Trial I

Initial population size was the only covariate in trial I. As a baseline for comparison, we first fit a standard Cox proportional hazards model (natural scale, stratified by block), which failed to detect an experimental effect (*p* = 0.24; electronic supplementary material, table S1). Neither a spline model, used to check for nonlinearity, nor log-transformation of initial population size improved on this evidence for an effect at the statistically significant level of *α* = 0.05 (*p*_{spline} = 0.18, electronic supplementary material, table S2; *p*_{log} _{x}_{-form} = 0.21, electronic supplementary material, table S3). A Schoenfeld residual analysis to test the proportional hazards assumption revealed strong evidence that the proportional hazards assumption was violated by these data (*χ*^{2}-test, *p* = 0.011), and further inspection of the residual plot (figure 3) suggested that a systematic deviation from the proportional hazards assumption occurred particularly during the first 8 days of the experiment, i.e. the mean of the residuals (solid line in figure 3) levels off at *t ≈* 8. The residual pattern shown by this plot closely follows the analysis of the simulated data in figure 1*b*. Particularly, the estimated negative coefficient during the initial period indicates that the deviation was in the expected direction: as initial population size increased across treatments, the estimated effect of initial population size approached zero, as expected for a population initialized close to quasi-stationarity. For times later than *t* ≈ 8, the estimated effect of initial population size was negligible.

We therefore repeated our original Cox analysis, censoring all data by day 8 (persistent or extinct) and found a significant effect of log-transformed initial population size on extinction time (*p* = 0.01; electronic supplementary material, table S4). Analysis restricted to populations persisting beyond day 8 (*n* = 66) failed to provide evidence for an effect (*p* = 0.42; electronic supplementary material, table S5). These results provide evidence of a two-phase extinction process with stationarity achieved around day 8, presumably driven by a high rate of extinction among those populations inoculated with just one individual (figure 4), although a post hoc test controlling the family wide error rate (*α* = 0.05) failed to detect a difference between this and any other treatment.

### 3.2. Trial II

As in trial I, a Cox proportional hazards model was used to test for experimental effects, including the interaction between initial population size (natural scale) and habitat size. Again, no evidence for an effect was found for either the interaction (*p* = 0.54; electronic supplementary material, table S6) or main effects of initial population size (*p* = 0.58) and habitat size (*p* = 0.067). Motivated by the marginal non-significance of habitat size, we excluded the interaction and tested only for the main effects, finding a significant effect of habitat size (*p* = 0.003; electronic supplementary material, table S7), but not of initial population size (*p* = 0.21). Analysis was repeated using log-transformed initial population size with similar results (habitat size: *p* = 0.003; initial population size: *p* = 0.18; electronic supplementary material, table S8). Note that both latter tests were significant even after performing Bonferroni correction for multiple tests.

As in trial I, Schoenfeld residual analysis achieved a maximum (though not an obvious asymptote), this time around day 17 (figure 5*a*,*b*). We therefore repeated analysis censoring at day 17. Contrary to expectation, this analysis obtained a significant effect of habitat size (*p* = 0.032; electronic supplementary material, table S9), but not of initial population size (log-transformed; *p* = 0.24). One explanation for this discrepancy is the low power of the experimental design, illustrated in figure 5 by the relatively wide standard error bands in each plot. Indeed, if we interpret the mean of the residual plot as the ‘estimated effect’ (without regard for whether the analysis rejects an arbitrarily chosen null hypothesis, e.g. *H*_{0}: *β*(*t*) = 0), then the evidence we have points to an increase in *β*(*t*) during the interval from *t* = 0 to *t* ≈ 22 (figure 5*a*). Further, the intercept of the mean line *β*(0) ≈ −1 is virtually indistinguishable from the analysis of trial I; cf. figure 5*a* with figure 3. The main difference is in the dispersion of the residuals, presumably owing to the additional habitat size treatment. Unfortunately, the design of the experiment does not enable any more conclusive inference to be drawn from these data.

As in trial I, there was no evidence for an effect of (log-transformed) initial population size (*p* = 0.52; electronic supplementary material, table S10), when analysis was restricted to the subset that went extinct after day 17 (*n* = 25), while the evidence for an effect of habitat size was marginally non-significant (*p* = 0.08).

### 3.3. Combined data from trials I and II

A final analysis was performed on the pooled data from both trials. As in preceding analyses, we failed to detect an effect of initial population size when all extinction time data were considered together (*p* = 0.064; electronic supplementary material, table S11), though the effect of habitat size was highly significant (*p* = 0.003). For the pooled data, Schoenfeld residual analysis demonstrated a deviation from the Cox model prior to day 10 (figure 6) and agreement thereafter. Analysis with censoring imposed at day 10 showed that the effect of log-transformed initial population size was highly significant (*p* = 0.003; electronic supplementary material, table S12), while there was no evidence for an effect of habitat size (*p* = 0.98). Separate analysis for populations that went extinct after day 10 (*n* = 106) showed the opposite (electronic supplementary material, table S13). In this case, there was no evidence for an effect of log-transformed initial population size (*p* = 0.98) and strong evidence for an effect of habitat size (*p* < 0.001). In no case was there a significant interaction between initial population size and habitat size (*α* > 0.05), indicating that the effect of population size in the combined analysis was not owing to the predominance of small habitats when the data were combined.

## 4. Discussion

To our knowledge, this is the first study to detect an effect of initial population size on extinction time in laboratory populations. Consistent with the standard theory, we found that populations initiated with a smaller number of individuals exhibited higher rates of extinction. However, this effect was strongly driven by populations initialized at the smallest possible population size (figure 4), presumably owing to large clutch sizes of *D. magna*. This could explain why Belovsky *et al*. [29] were unable to observe this effect in a species with similar life history. We suspect that for populations with high intrinsic rates of increase, i.e. populations with high resilience and therefore the ability to rebound from perturbations, initial population size will only influence population extinction during the first generation (approx. 10 days in *D. magna*). We expect that for such species, environmental factors such as habitat size and environmental variability will have the greatest effects on population persistence.

This explanation puts our results into the broader context of life-history evolution. In this experiment, the limitation of the effect of initial population size to the first generation may reflect *D. magna*'s high reproductive potential that allows it to quickly escape small population sizes, perhaps coupled to the small carrying capacity of the constructed environment. Thus, the first phase of this two-phase extinction risk (where the population is especially vulnerable to demographic stochasticity) may be most pronounced in species with long generation times or low reproductive capacity. These results are consistent with the recent finding that the mean durations of expansion and decline are equal in populations where environmental variation is not too great [43]. Future studies to disentangle the mechanisms of extinction should therefore consider the subtle effects that life-history strategies may have on extinction phenomena, including the exaggeration or reduction of the hazard owing to small population size.

## Acknowledgements

This research was sponsored by the Odum School of Ecology. We thank Andrea Silletti for assistance with the experiment. We thank five reviewers for their comments, which improved greatly both the article and our understanding of these results. We particularly thank one reviewer for the alternative interpretation of figure 5.

- Received January 18, 2011.
- Accepted February 28, 2011.

- This Journal is © 2011 The Royal Society