## Abstract

Model sensitivity is key to the evaluation of mathematical models in ecology and evolution, especially for complex models with numerous parameters. In this paper, we use some recently developed methods for sensitivity analysis to study the parameter sensitivity of a model of vector-borne bubonic plague in a rodent population proposed by Keeling & Gilligan. The new sensitivity tools are based on a variational analysis involving the adjoint equation. The new approach provides a relatively inexpensive way to obtain derivative information about model output with respect to parameters. We use this approach to determine the sensitivity of a quantity of interest (the force of infection from rats and their fleas to humans) to various model parameters, determine a region over which linearization at a specific parameter reference point is valid, develop a global picture of the output surface, and search for maxima and minima in a given region in the parameter space.

## 1. Introduction

Keeling & Gilligan (2000) analysed a deterministic model of vector-borne bubonic plague en route to developing a stochastic metapopulation model. The goal is to use the disease and population dynamics of rats and fleas to explain how sporadic epidemics can occur in human populations. The behaviour and sensitivity of the deterministic model with respect to its parameters constitute an important part of Keeling & Gilligan's considerations. Their sensitivity analysis is essentially based on the properties of the linearization of the model at a single reference value for the parameters, which is claimed to represent the behaviour of the model over a large region in parameter space. We explore the underlying basis for the Keeling & Gilligan analysis and the consequences of their conclusions.

Adjoint-based analysis (Lanczos 1996; Marchuk *et al*. 1996) is a classic way to obtain derivatives of a quantity of interest computed from the solution of a differential equation with respect to parameters in the model. In recent work (Estep & Neckels 2006, 2007), we used the derivative information obtained from the adjoint problem to develop computationally efficient sensitivity analysis tools. In this paper, we apply adjoint-based analysis and the sensitivity analysis tools from Estep & Neckels (2006, 2007) to Keeling & Gilligan's model. We determine the sensitivity of a specific quantity of interest to model parameters, determine a region over which linearization at a given parameter value is valid, develop a global picture of the output surface and search for maxima and minima in the quantity of interest in a given region in the parameter space.

### 1.1 A model of plague

We consider the Keeling & Gilligan (2000) SIRNF model that describes the dynamics of outbreaks of plague caused by the bacterium *Yersinia pestis* in populations of rats and humans. We follow their description rather closely. They begin with an SIR model for the rat population that gives the changes in the number of susceptible (*S*_{R}), infectious (*I*_{R}) and resistant (*R*_{R}) rats,

$$\begin{aligned}
\dot{S}_{R} &= r_{R}S_{R}\Bigl(1-\frac{T_{R}}{K_{R}}\Bigr)+r_{R}R_{R}(1-p)-d_{R}S_{R}-\beta_{R}\frac{S_{R}}{T_{R}}F\bigl(1-\mathrm{e}^{-aT_{R}}\bigr),\\
\dot{I}_{R} &= \beta_{R}\frac{S_{R}}{T_{R}}F\bigl(1-\mathrm{e}^{-aT_{R}}\bigr)-(d_{R}+m_{R})I_{R},\\
\dot{R}_{R} &= r_{R}R_{R}\Bigl(p-\frac{T_{R}}{K_{R}}\Bigr)+m_{R}g_{R}I_{R}-d_{R}R_{R},
\end{aligned}\tag{1.1}$$

where the dot denotes the time derivative and *T*_{R}=*S*_{R}+*I*_{R}+*R*_{R} is the total size of the rat population. The net reproduction rate of susceptible and infected rats is given by *r*_{R} with carrying capacity *K*_{R}. The proportion of offspring that inherit resistance to the disease from their parents is given by *p*. The natural death rate of all rats is *d*_{R}. The number of free infected fleas looking for a host is *F*. The infection term, (1−exp(−*aT*_{R})), corresponds to infected fleas randomly searching for a new rat host for some given time period (Nicholson & Bailey 1935). If they find a host and it is susceptible, then the rat becomes infected with a given probability. Thus, *β*_{R} is the transmission rate from fleas to rats and *a* is a measure of the searching efficiency of the fleas. Rats leave the infected class at a rate of *m*_{R} and a fraction *g*_{R} of these survive to become resistant; the remainder die and release their infected fleas back into the environment.

The flea population dynamics are modelled by the flea index *N*, which is the average number of fleas living on a rat (each rat is assumed to have the same number of fleas), and the number of free infectious fleas *F* that are searching for a new host,

$$\begin{aligned}
\dot{N} &= r_{F}N\Bigl(1-\frac{N}{K_{F}}\Bigr),\\
\dot{F} &= \bigl(d_{R}+m_{R}(1-g_{R})\bigr)I_{R}N-d_{F}F.
\end{aligned}\tag{1.2}$$

The net reproduction rate of the fleas is given by *r*_{F} with a carrying capacity of *K*_{F}, while the death rate is *d*_{F}.

The human population dynamics do not affect the number of new human cases, since subsequent transmission between humans is rare (Gage & Kosoy 2005). The potential for human cases is directly related to the number of infected fleas that fail to find a rat host and instead choose a human host,

$$\lambda_{I}=d_{F}F\,\mathrm{e}^{-aT_{R}}.\tag{1.3}$$

The first *quantity of interest*, *λ*_{I} (labelled *λ*_{H} in Keeling & Gilligan 2000), represents the upper bound or the *potential* force of infection to humans.
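To make the model concrete, the rat–flea system can be integrated numerically. The sketch below uses `scipy.integrate.solve_ivp` with purely illustrative parameter values (not the reference values of table 1), implementing the terms exactly as described in the text above; the final line evaluates one plausible form of the quantity of interest, taken here as *λ*_{I}=*d*_{F}*F*e^{−*aT*_{R}}.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Illustrative parameter values -- chosen for demonstration only,
# NOT the Keeling & Gilligan reference values of table 1.
P = dict(r_R=5.0, p=0.975, K_R=2500.0, d_R=0.2, beta_R=4.7, m_R=20.0,
         g_R=0.02, r_F=20.0, K_F=6.57, d_F=10.0, a=0.004)

def sirnf(t, y):
    S, I, R, N, F = y
    T = max(S + I + R, 1e-9)                      # total rats (guard against 0)
    find = 1.0 - np.exp(-P['a'] * T)              # chance a free flea finds a rat
    infect = P['beta_R'] * (S / T) * F * find     # new rat infections
    dS = (P['r_R'] * S * (1 - T / P['K_R']) + P['r_R'] * R * (1 - P['p'])
          - P['d_R'] * S - infect)
    dI = infect - (P['d_R'] + P['m_R']) * I
    dR = (P['r_R'] * R * (P['p'] - T / P['K_R']) + P['m_R'] * P['g_R'] * I
          - P['d_R'] * R)
    dN = P['r_F'] * N * (1 - N / P['K_F'])        # flea index on rats
    dF = (P['d_R'] + P['m_R'] * (1 - P['g_R'])) * I * N - P['d_F'] * F
    return [dS, dI, dR, dN, dF]

y0 = [2500.0, 15.0, 0.0, 6.57, 0.0]               # seed a few infectious rats
sol = solve_ivp(sirnf, (0.0, 10.0), y0, rtol=1e-8, atol=1e-8)

T_R = sol.y[0] + sol.y[1] + sol.y[2]
lam_I = P['d_F'] * sol.y[4] * np.exp(-P['a'] * T_R)  # potential force of infection
print(float(lam_I.max()))
```

Because the parameter values are hypothetical, the output is qualitative only; it illustrates the epizootic-then-spillover shape of the solution rather than reproducing Keeling & Gilligan's computations.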

Keeling & Gilligan used a stochastic model derived from (1.1) and (1.2) to explain how plague can lie dormant but then break out into an epidemic in the human population. Their explanation depends on consideration of plots of *λ*_{I} versus *F* and *T*_{R}, as well as plots of *λ*_{I} and *I*_{R}/*m*_{R} versus time. These show that their model predicts that the highest potential for a human epidemic occurs after a major epizootic in the rat population, which leaves a large number of infected fleas looking for a host. This is discussed in terms of the basic reproductive ratio of the disease, *R*_{0}.

### 1.2 Four assertions

Keeling & Gilligan made assertions about the behaviour of the model (1.1) and (1.2) over a large region in the parameter space centred at specific reference values (table 1).

Our analysis addresses four assertions that depend on claims about the sensitivity of the model with respect to the parameters. As a measure of the sensitivity of an output variable *V* with respect to a parameter *λ*, Keeling & Gilligan used the scaled derivative $(\mu/V)\,\partial V/\partial\lambda$, where *μ* is the reference value. The four assertions are as follows.

(i) From multiple simulations, we note that, even when parameters are changed by a factor of 2, the essential pattern of sensitivity… remains, showing that this is a robust measure of the effects of parameter change.

(ii) Only *K*_{F}, the carrying capacity of fleas per rat, has any effect (on *λ*_{I} and *I*_{R}/*m*_{R}) that is much stronger than linear, although *r*_{R}, 1−*p*, *K*_{R} and *a* all have effects on the number of rat or human cases that are close to linear.

(iii) … it is sufficient to consider whether results are robust to changes in the parameter *a*.

(iv) The basic reproductive ratio of the disease, *R*_{0}, is related to the parameters through the product *aK*_{R}. From standard theory, rat epizootics can occur when *R*_{0}>1, which corresponds to *aK*_{R}>0.39. It can also be seen that a large human outbreak is possible when 0.5<*aK*_{R}<20.

These claims depend on the underlying fact that the linearization of the model with respect to the parameters at the reference values describes the behaviour of the model over a large region in the parameter space. We point out that there is no *a priori* reason to believe that Taylor's theorem can be used in this way.

### 1.3 A quick overview of sensitivity analysis

Describing response to variations in model input, e.g. initial and boundary data and parameters, is key to understanding a mathematical model of a complex system. Inputs into a model are subject to many sources of uncertainty, variability and measurement error. The model itself may be subject to uncertainty arising from incomplete information or poor understanding of the physical processes and driving forces. Sensitivity analysis is the organized study of the way in which the output of a model responds to different sources and kinds of variations in the input into the model and in the model itself.

We consider the problem of computing a quantity of interest,

$$q(\boldsymbol{\lambda})=\int_{0}^{T}\bigl(\boldsymbol{x}(t;\boldsymbol{\lambda}),\boldsymbol{\psi}(t)\bigr)\,\mathrm{d}t,\tag{1.4}$$

where **ψ** is a function of time corresponding to the desired information and $\boldsymbol{x}\in\mathbb{R}^{n}$ solves the initial value problem

$$\dot{\boldsymbol{x}}=\boldsymbol{f}(\boldsymbol{x},t;\boldsymbol{\lambda}),\quad 0<t\le T,\qquad \boldsymbol{x}(0)=\boldsymbol{x}_{0}.\tag{1.5}$$

Here, **f** is the model and $\boldsymbol{\lambda}\in\mathbb{R}^{d}$ (*d*=*p*+*n*) are the parameters, with *λ*_{1}, …, *λ*_{p} representing model parameters and *λ*_{p+1}, …, *λ*_{d} the initial conditions. The quantity of interest could be a statistical quantity, values of the solution at a given time or a more complicated function determined from the solution values.

Two fundamental tools for sensitivity analysis are *density estimation* and computing partial derivatives with respect to the parameters. Density estimation is concerned with computing the probability distribution associated with the quantity of interest *q*(**x**(**λ**)) computed from a solution of a model, where the model inputs, e.g. parameters **λ**, are assumed to be random variables with associated distributions. Density estimation describes how the model output varies as the parameters vary over the *entire* parameter space. Parametric density estimation is concerned with the case in which the output distribution is known, in which case the goal is to determine best values for the parameters defining the distribution. Non-parametric density estimation is used when the output distribution is unknown or not recognizable, as often happens, e.g. for nonlinear models. In this case, random samples are drawn from the parameter space, the model is solved and the quantity of interest is computed for each sample; finally, a histogram is computed from the values to form an approximate distribution. Various smoothing techniques are often used to reduce the effects of the discrete binning associated with a histogram.
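The non-parametric recipe just described — sample the parameters, evaluate the quantity of interest, bin the values into a histogram — can be sketched in a few lines; here the closed-form `q_of_interest` is a hypothetical stand-in for solving the differential equation at each sample.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the model: a nonlinear map from two
# parameters to a scalar quantity of interest.
def q_of_interest(lam):
    return lam[0] ** 2 + np.sin(3.0 * lam[1])

# Draw random samples from the (here uniform) parameter distribution ...
samples = rng.uniform(low=[0.5, 0.5], high=[2.0, 2.0], size=(5000, 2))
values = np.array([q_of_interest(s) for s in samples])

# ... and bin the computed values to approximate the output density.
density, edges = np.histogram(values, bins=40, density=True)
widths = np.diff(edges)
print(float(np.sum(density * widths)))   # the histogram density integrates to 1
```

Kernel smoothing (mentioned above) would replace the hard histogram bins with smooth bumps centred at the sampled values; the sampling loop is unchanged.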

There are two classic approaches to computing partial derivatives of model output with respect to parameters in the model. In forward linearization analysis (Caswell 2007), equations for the partial derivatives of the solution of a model with respect to parameters are determined by differentiating the model and using the Chain Rule,

$$\frac{\mathrm{d}}{\mathrm{d}t}\frac{\partial\boldsymbol{x}}{\partial\lambda_{1}}=D_{x}\boldsymbol{f}\,\frac{\partial\boldsymbol{x}}{\partial\lambda_{1}}+D_{\lambda_{1}}\boldsymbol{f},$$

where $\partial\boldsymbol{x}/\partial\lambda_{1}$ is the gradient of **x** with respect to parameter *λ*_{1}, and $D_{x}\boldsymbol{f}$ and $D_{\lambda_{1}}\boldsymbol{f}$ are the Jacobians of **f** with respect to **x** and *λ*_{1}, respectively. The resulting (large) system of equations for the solution and its partial derivatives is integrated.
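For a scalar toy problem the augmented forward system can be written out explicitly. The sketch below integrates ẋ = −λx together with its sensitivity s = ∂x/∂λ, which satisfies ṡ = −λs − x with s(0) = 0, and checks the result against the closed-form derivative ∂x/∂λ = −t·x₀e^{−λt}.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Toy model x' = f(x; lam) = -lam * x.  Differentiating in lam gives the
# sensitivity equation s' = D_x f * s + D_lam f = -lam * s - x, s(0) = 0.
lam, x0, T = 0.7, 2.0, 3.0

def augmented(t, y):
    x, s = y
    return [-lam * x, -lam * s - x]

sol = solve_ivp(augmented, (0.0, T), [x0, 0.0], rtol=1e-10, atol=1e-12)
s_T = float(sol.y[1, -1])

exact = -T * x0 * np.exp(-lam * T)   # closed form for dx/dlam at t = T
print(s_T, exact)
```

The point of the example is the bookkeeping: the model and all its sensitivity equations are integrated together as one (for many parameters, very large) system.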

The second classic approach uses the mathematical notions of duality and adjoint operators (Lanczos 1996; Marchuk *et al*. 1996; Estep & Newton in preparation). This approach is not as easy to describe as forward linearization analysis. The most familiar example is provided by the technique of Green's functions in differential equations (Lanczos 1996). In the adjoint-based approach, a specific adjoint problem,

$$-\dot{\boldsymbol{\phi}}=D_{x}\boldsymbol{f}^{\top}\boldsymbol{\phi}+\boldsymbol{\psi},\quad T>t\ge 0,\qquad \boldsymbol{\phi}(T)=0,\tag{1.6}$$

is defined for each quantity of interest *q*(**λ**) to be computed from the model solution. This adjoint problem yields partial derivatives of the quantity of interest with respect to model parameters relatively cheaply (see appendix A).

Asymptotically, values computed from the two approaches are equal. There are circumstances under which each approach is preferred. The forward linearization technique yields values of the partial derivatives of the solution at *all* points in time. From this, one can compute partial derivatives of particular quantities of interest using the Chain Rule. On the other hand, the augmented system is very large and expensive to solve. If the values of the partial derivatives at all points in time are not required, then much of the computational work is wasted. When only the partial derivatives of particular quantities of interest are desired, the adjoint-based technique is computationally more efficient.
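The same toy sensitivity from the forward example can be recovered the adjoint way: solve the model forward, solve the adjoint backward from a terminal condition, and evaluate a single quadrature. This is only a scalar illustration of the general machinery described in appendix A; for q(λ) = x(T; λ) the adjoint is −φ̇ = −λφ with φ(T) = 1, and dq/dλ = ∫₀ᵀ φ(t)(∂f/∂λ)(t) dt.

```python
import numpy as np
from scipy.integrate import solve_ivp, trapezoid

# Toy model x' = -lam * x with quantity of interest q(lam) = x(T; lam).
lam, x0, T = 0.7, 2.0, 3.0

# Forward solve for x(t).
x = solve_ivp(lambda t, y: -lam * y, (0.0, T), [x0],
              dense_output=True, rtol=1e-10, atol=1e-12)

# Adjoint solve, BACKWARD in time: -phi' = -lam * phi, phi(T) = 1.
phi = solve_ivp(lambda t, y: lam * y, (T, 0.0), [1.0],
                dense_output=True, rtol=1e-10, atol=1e-12)

# Sensitivity via quadrature: dq/dlam = int_0^T phi(t) * (df/dlam)(t) dt,
# where df/dlam = -x(t) for this model.
t = np.linspace(0.0, T, 2001)
dq_dlam = float(trapezoid(phi.sol(t)[0] * (-x.sol(t)[0]), t))

exact = -T * x0 * np.exp(-lam * T)
print(dq_dlam, exact)
```

Note the economy: one adjoint solve yields the derivative with respect to *every* parameter through quadratures, whereas the forward approach needs one sensitivity equation per parameter.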

In recent work, Estep and co-workers have developed new computational tools for sensitivity analysis using the adjoint equation. This work exploits the fact that partial derivatives of a quantity of interest with respect to parameters are computed relatively cheaply by solving the adjoint equation. In Estep & Neckels (2006, 2007), efficient numerical methods for density estimation were developed. The derivative information provided by the adjoint equation is used either to create a higher order approximation or to drive adaptive sampling. This adaptive sampling is not probability based. In appendix A, we describe this work briefly.

The computations in this paper were performed with an earlier version of the globally accurate adaptive sampling package (GAASP; Estep *et al*. 2006). GAASP provides computational tools for solving ordinary differential equations with quantitative estimates of numerical accuracy, and tools for analysing the parameter and data sensitivity of differential equation models.

## 2. Analysis of the model

We use adjoint-based sensitivity analysis and the new computational tools described in appendix A to analyse Keeling & Gilligan's model in several ways.

- We find a (relatively small) region in the parameter space over which the linearization at the reference value provides a reasonably accurate description of the behaviour of the model.

- We construct a piecewise constant approximation of the quantity of interest over a large region in the parameter space that is close in size to the region over which Keeling & Gilligan wanted to draw conclusions about the model behaviour.

- We implement a method of steepest ascent and descent to find the local maxima and minima in a large region in the parameter space.

We then draw several conclusions about the model sensitivity and examine the validity of assertions (i)–(iv).

We make some initial observations. First, Keeling & Gilligan stated that their conclusions hold for parameter values that vary by as much as a factor of 2. This is imprecise since several parameters have physical limitations. Instead, we consider a large rectangle *Λ*_{L} with sides [*μ*_{i}/2, 2*μ*_{i}] for each *i* (if either endpoint goes past a physical bound, we use the physical bound instead). Even a region of that size allows a wide range of behaviours (figure 1). Second, *λ*_{I} depends on the final time *T*. Keeling & Gilligan used *T*=100. However, (1.1) and (1.2) are used to construct a stochastic model that, as Keeling & Gilligan noted, has a different long-time behaviour. Indeed, it is the transient behaviour of (1.1) and (1.2) that is relevant to the stochastic model. Therefore, we use a smaller time, *T*=10. This affects the conclusions quantitatively but not qualitatively.

### 2.1 Linearization at the reference value

Sampling *λ*_{I} over the large rectangle *Λ*_{L} shows that the linearization at the reference point *μ* does not describe the behaviour of the model over *Λ*_{L}. For example, if we use a uniform distribution on *Λ*_{L}, the partial derivatives ∂*λ*_{I}/∂*λ*_{i} have approximately normal distributions centred at 0 for *i*=4, 5, 6, 7, but have variances 210, 3300, 0.19 and 49 000, respectively (figure 2). This reflects the fact that, when we sample over the entire region *Λ*_{L}, varying more than one parameter at a time results in more complicated behaviour than that observed when varying each parameter individually, and that *λ*_{I} depends nonlinearly on some parameters. Figure 3 provides evidence of this; it also demonstrates that there are enormous scale differences in the responses to the various parameters.

Since the quantity of interest depends smoothly on the parameters, Taylor's theorem shows that linearization at the reference value provides a good approximation on all *sufficiently small* rectangles centred at the reference value. The goal is to determine the largest possible rectangle on which linearization produces a reasonable approximation. To find such a region, we successively refine *Λ*_{L} one parameter dimension at a time until we find a smaller rectangle, *Λ*_{S}, over which linearization appears to be valid.

One way to do this would be to assume a uniform distribution on the parameters; then, holding all parameters but one fixed, compute a least-squares linear approximation to the observed values of the output as that one parameter varies, and check the goodness of fit using the *R*^{2}-statistic. We can decrease the size of the rectangle in that parameter dimension until we achieve a reasonable fit.

However, this approach does not provide a way to deal with scale differences in the parameters. We wish not only to determine a relatively large region on which linearization is valid but also to determine a region over which the *relative* variation in parameters is roughly the same. Our criteria for deciding whether linearization is valid are based on the observation that a one-dimensional linear transformation maps a normal input distribution to another normal distribution. We fix all the parameters at the reference values except one and place a normal distribution on the one parameter that is allowed to vary. The normal distribution is centred at the reference value, and the variance is computed so that the range for the parameter in the current rectangle represents a 95% confidence interval. The normal is then truncated to the current rectangle. We then compute the corresponding output distribution of *λ*_{I} using a kernel density estimator based on (A 5). If the output distribution is not approximately normal (which can be checked crudely using a least-squares line fit and the *R*^{2}-statistic after changing variables), we refine the rectangle by halving its size in the dimension of the parameter in question, and try again. Once we determine a range of valid linearization for a parameter, we proceed to the next parameter. After working through all the parameters, we obtain a smaller rectangle, *Λ*_{S}.
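The refinement criterion can be illustrated on a hypothetical one-parameter response: place a truncated normal on the parameter, fit a least-squares line to the output, and watch the *R*^{2} goodness of fit approach 1 as the interval shrinks and the response becomes effectively linear.

```python
import numpy as np

rng = np.random.default_rng(1)

def model_output(lam):
    # Hypothetical stand-in for lambda_I from the ODE solve; the cubic
    # term makes the response visibly nonlinear over wide intervals.
    return 2.0 * lam + 0.5 * lam ** 3

def r_squared(x, y):
    # Goodness of fit of the least-squares line y ~ a + b x.
    b, a = np.polyfit(x, y, 1)
    resid = y - (a + b * x)
    return 1.0 - resid.var() / y.var()

mu = 1.0                                   # reference value of the parameter
for half_width in (2.0, 1.0, 0.5, 0.25, 0.125):
    sigma = half_width / 1.96              # interval = 95% CI of the normal
    x = rng.normal(mu, sigma, 20000)
    x = x[np.abs(x - mu) <= half_width]    # truncate to the current interval
    y = model_output(x)
    print(half_width, round(r_squared(x, y), 4))
```

Each halving of the interval plays the role of one refinement step in the text; refinement stops once the fit (equivalently, the near-normality of the output distribution) is acceptable.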

While *Λ*_{S} is a region on which the linearization of *λ*_{I} provides an accurate approximation, the variances of the distributions of *λ*_{I} corresponding to the distributions on the different parameters on *Λ*_{S} still vary over a very large range. We refine *Λ*_{S} further in order to reduce the variances corresponding to the different parameters so that they all have relative size 1, where we scale each output distribution by the size of the reference value for the corresponding parameter. We stop after 16 iterations to obtain *Λ*_{S}. Further refinement did not lower the variances for *λ*_{2} and *λ*_{8}. However, sampling on *Λ*_{S} suggests that the time behaviour of *λ*_{I} and *I*_{R}/*m*_{R} is relatively constant over *Λ*_{S} (figure 1). On the other hand, if we increase the dimensions of *Λ*_{S} by 50% in each dimension, this is no longer true.

### 2.2 Adaptive sampling and dimension reduction

Recall that, in the adaptive sampling algorithm, we add points in the parameter direction in which the partial derivative of *λ*_{I} is largest. As the sampling proceeds, the number of sample points added in each parameter direction indicates the sensitivity of *λ*_{I} with respect to the parameters. In figure 4, we plot the value of the parameters at the centre of each division in fast adaptive parameter sampling (FAPS) against the number of samples used to create the piecewise constant approximation at each iteration. This shows that *λ*_{I} is highly sensitive to *λ*_{1}, *λ*_{3}, *λ*_{5}, *λ*_{10} and *λ*_{11} and not very sensitive to *λ*_{2}, *λ*_{4}, *λ*_{6}, *λ*_{7}, *λ*_{8} and *λ*_{9} on *Λ*_{L}. We emphasize that the same conclusion can be drawn from a FAPS computation with just 50 sample points.
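A heavily simplified caricature of this derivative-driven refinement (not the actual FAPS algorithm) can be written as follows; the analytic `grad_q` is a hypothetical stand-in for adjoint-computed derivatives, with the quantity of interest far more sensitive to the first parameter than to the second.

```python
import numpy as np

# Greedy refinement: repeatedly split the (cell, direction) pair whose
# estimated variation |dq/dlam_d| * width is largest.
def grad_q(lam):
    # stand-in derivatives: strong dependence on lam_0, weak on lam_1
    return np.array([4.0 * np.cos(4.0 * lam[0]), 0.1])

cells = [(np.array([0.0, 0.0]), np.array([1.0, 1.0]))]   # (lower, upper) corners
splits = [0, 0]                                           # splits per direction
for _ in range(50):
    scores = [(abs(grad_q((lo + hi) / 2.0)[d]) * (hi - lo)[d], i, d)
              for i, (lo, hi) in enumerate(cells) for d in (0, 1)]
    _, i, d = max(scores)                 # most promising cell and direction
    lo, hi = cells.pop(i)
    mid = 0.5 * (lo[d] + hi[d])           # bisect the cell in direction d
    hi1, lo2 = hi.copy(), lo.copy()
    hi1[d], lo2[d] = mid, mid
    cells += [(lo, hi1), (lo2, hi)]
    splits[d] += 1

print(len(cells), splits)   # refinement concentrates in the sensitive direction
```

As in figure 4, counting how many splits land in each direction reveals the relative sensitivities without any probabilistic sampling.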

If *F*(*x*, *ω*) is a target distribution and $\hat{F}_{n}(x)$ is an approximation to *F*(*x*) computed using *n* samples, the standard Kolmogorov–Smirnov (K–S) statistic,

$$d_{n}=\sup_{x}\bigl|\hat{F}_{n}(x)-F(x)\bigr|,$$

is used to check the accuracy. In figure 5, we plot the K–S statistic for FAPS versus the number of sample points. By way of comparison, we plot the K–S statistic for the error of the mean of many densities determined by computing histograms of Monte Carlo samples drawn from uniform random sampling of the parameter space, along with the asymptotic upper bound given by the law of the iterated logarithm (Estep & Neckels 2006). Note that the error of a single Monte Carlo density has a much larger variance than that of the average of many Monte Carlo computations, as can be seen in the plot.
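The K–S statistic is straightforward to compute from sorted samples. This sketch checks the empirical CDF of uniform samples against the exact uniform CDF F(x) = x on [0, 1], a simple stand-in for the computed densities compared in figure 5; the statistic decays roughly like n^{−1/2}.

```python
import numpy as np

rng = np.random.default_rng(2)

def ks_statistic(samples):
    # sup_x |F_n(x) - F(x)| for F(x) = x on [0, 1]: the supremum is
    # attained at a jump of the empirical CDF, just before or after
    # a sample point.
    x = np.sort(samples)
    n = len(x)
    above = np.arange(1, n + 1) / n - x   # F_n above F, just after x_i
    below = x - np.arange(0, n) / n       # F above F_n, just before x_i
    return float(max(above.max(), below.max()))

for n in (100, 1000, 10000):
    print(n, ks_statistic(rng.uniform(size=n)))
```

For a non-uniform target, one would replace `x` in the `above`/`below` lines by `F(x)` for the target CDF F; the jump structure of the empirical CDF is the same.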

For a small number of sample points, FAPS converges at the same rate as the *average* of a number of Monte Carlo computations as well as the theoretical bound. Restricting the addition of sample points to the directions with the largest derivatives leads to the staircase appearance of the plots and the levelling out of the statistic for moderate sample sizes.

In figure 5, we also plot the K–S statistic for FAPS when *λ*_{2}, *λ*_{4}, *λ*_{6}, *λ*_{7}, *λ*_{8} and *λ*_{9} are held fixed at the reference value. This confirms the observation from figure 4 that *λ*_{I} is sensitive to the parameters *λ*_{1}, *λ*_{3}, *λ*_{5}, *λ*_{10} and *λ*_{11}, and that we can safely reduce the dimension of the model by considering variations in these parameters alone.

### 2.3 Searching for extreme values in parameter space

As part of predicting human epidemics, Keeling & Gilligan examined *R*_{0} at extremal values of *a* and *K*_{R} while fixing the other parameters at the reference values. There is a very narrow window, 0.39<*aK*_{R}<0.5, for which plague can affect the rat population significantly without causing a human epidemic. Their approach is reasonable if the model is insensitive to parameter changes, as they claim, given that ∂*R*_{0}/∂*a* is very large at the reference value. However, we have seen that the model is sensitive to five parameters and also to simultaneous parameter changes.

Alternatively, we search for local maxima and minima in the parameter domain using a gradient search algorithm. Starting at a point, we can take a small step in the direction of the gradient (or minus the gradient) and recompute new values before stepping again. In this way, we create a path of linked sample points along which we eventually find either an extremal value where the gradient is zero or a point where the path leaves the parameter domain. The final stopping point depends on the initial point as, in general, we expect to find many local extrema.
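A minimal version of such a gradient search, on a hypothetical smooth surface with a known interior maximum, looks like the sketch below; in the actual computations the gradient at each sample point comes from the adjoint solve.

```python
import numpy as np

# Steepest ascent: step up the gradient until it (nearly) vanishes or the
# path leaves the parameter rectangle [0, 1]^2.  The toy surface
# q = -(l0 - 0.3)**2 - (l1 - 0.7)**2 has its maximum at (0.3, 0.7).
def grad_q(lam):
    return np.array([-2.0 * (lam[0] - 0.3), -2.0 * (lam[1] - 0.7)])

lo, hi = np.zeros(2), np.ones(2)
lam = np.array([0.9, 0.1])                    # starting point
for _ in range(1000):
    g = grad_q(lam)
    if np.linalg.norm(g) < 1e-10:             # stationary point found
        break
    step = lam + 0.05 * g                     # small step up the gradient
    if np.any(step < lo) or np.any(step > hi):
        break                                 # path exits the rectangle
    lam = step

print(lam)                                    # close to the maximum (0.3, 0.7)
```

Following −∇q instead finds minima; starting from several nearby points, as in the text, probes whether the resulting paths and stopping points are stable.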

We use this in two ways. First, we start at the reference value and conduct searches on both *Λ*_{S} and a larger rectangle centred at the reference value. Keeling & Gilligan's claim amounts to stating that any search starting from the reference value for the parameters should leave along the *λ*_{8}=*a* axis. Leaving the reference value, the first few steps of the gradient search are roughly parallel to the *λ*_{8}=*a* axis, as ∂*R*_{0}/∂*λ*_{8} is the dominant component in the gradient of *R*_{0} near the reference value. If ∂*R*_{0}/∂*λ*_{8} remains the dominant component in the gradient of *R*_{0} at all points, then the gradient search will always move roughly parallel to the *λ*_{8}=*a* axis. If, however, *R*_{0} becomes more sensitive to changes in other parameters away from the reference value, the gradient search will change direction.

Second, we choose five points near the reference value and follow those paths to see if they stay close to the reference trajectory.

On *Λ*_{S}, the searches starting from all six initial points exit the rectangle through the face perpendicular to the *λ*_{8} axis. However, the exit points vary greatly over this face because variations in the other parameters have an important effect. This affects the gradient searches as they move in directions that are not roughly parallel to the *λ*_{8} axis. Therefore, Keeling & Gilligan's claim is partially correct on *Λ*_{S}. On the larger rectangle, the searches starting at the reference value leave through the lower face perpendicular to *λ*_{7} and the upper face perpendicular to *λ*_{2}. In other words, the gradient search does not follow the *λ*_{8} axis, as derivatives with respect to other parameters come to dominate the gradient. The searches that begin from the five other starting points all reach different local minima inside the larger rectangle. On the other hand, all of the searches that follow the negative gradient leave through the face perpendicular to *λ*_{2}, though at six different points.

## 3. Conclusions

We begin by addressing Keeling & Gilligan's assertions (i)–(iv).

Regarding assertion (i), the linearization at the reference values simply does not represent the behaviour of the nonlinear model over the large rectangle *Λ*_{L}. In particular, there is an enormous variance in the sensitivity on *Λ*_{L}; simultaneously varying multiple parameters on *Λ*_{L} leads to unpredictable results and there are some definite nonlinear dependencies. The linearization at the reference values does accurately represent the behaviour of the model over a much smaller rectangle *Λ*_{S}.

Regarding assertion (ii), we find that the quantity of interest *λ*_{I} is most sensitive to variations in the parameters *λ*_{1}=*r*_{R}, *λ*_{3}=*K*_{R}, *λ*_{5}=*β*_{R}, *λ*_{10}=*d*_{F} and *λ*_{11}=*K*_{F}, in contrast to the findings of Keeling & Gilligan, which predict the most sensitive dependence on *λ*_{1}=*r*_{R}, *λ*_{2}=*p*, *λ*_{3}=*K*_{R}, *λ*_{8}=*a* and *λ*_{11}=*K*_{F}. That the model is sensitive to three parameters (*λ*_{1}=*r*_{R}, *λ*_{3}=*K*_{R} and *λ*_{11}=*K*_{F}) under both analyses is no surprise, as these directly control growth rates and sizes of the rat and flea populations, and thus have the greatest potential for altering disease dynamics. It is also not surprising that FAPS found sensitivity to *λ*_{5}=*β*_{R}, the transmission rate from fleas to rats, and *λ*_{10}=*d*_{F}, the death rate of fleas. Both of these parameters influence the number of infected fleas and thus *λ*_{I}. A recent model of bubonic plague dynamics by Webb *et al*. (2006) is also found to be sensitive to transmission rates and infectious periods.

Regarding assertion (iii), we cannot predict the behaviour of the model, even on the smallest rectangle *Λ*_{S}, by fixing all the parameters except *a* at the reference values and considering the effect of varying *a*. Moreover, there is no apparent correlation between the direction of the gradient at the reference point and nearby points and the location of extremal values along paths given by the method of steepest ascent and descent. We conclude that the model is sensitive to a number of parameters in addition to *a*, and that in some regions of the parameter space it is more sensitive to parameters other than *a*.

Regarding assertion (iv), when predicting the behaviour of the model for values of *aK*_{R}, we have to consider the ranges of the other parameter values. Thus, predicting plague outbreaks in rats that could spill over into human populations depends on more parameters than just *a*, the searching efficiency of fleas, and *K*_{R}, the carrying capacity of the rat population (Gage & Kosoy 2005).

As a practical consequence of our analysis of the Keeling & Gilligan model, we conclude that the general danger of large-scale outbreaks of plague in humans predicted by the model was overstated. The reason is that the force of infection from rats and their fleas to humans, *λ*_{I}, depends strongly on more parameters than just the rat carrying capacity, *K*_{R}, and the searching efficiency of fleas, *a*. In particular, Keeling & Gilligan concluded that the reproductive ratio of the disease, *R*_{0}, is predicted to be greater than 1 for even fairly low carrying capacities of rat populations, provided that the searching efficiency *a* of the fleas is high. Our analysis shows, on the other hand, that in large parts of the parameter space the model is relatively insensitive to *a*, and that achieving *R*_{0}>1 and a risk to humans depend more on large rat populations than on flea behaviour. Large populations of rodents still pose the risk of human outbreaks, but details of the epidemiology within the rat and flea plague reservoir, especially flea survival and transmission rates, must also be specified (see Gage & Kosoy 2005).

We emphasize that our observations do not necessarily contradict the scientific conclusions of Keeling & Gilligan (2000) because their main results are based on a stochastic model, not (1.1) and (1.2). However, our results do raise questions about the connection between the stochastic model analysed by Keeling & Gilligan and the underlying deterministic model that was used to create the stochastic model. The properties of the stochastic and deterministic models are clearly different.

Our analysis emphasizes the fact that nonlinear models like (1.1) and (1.2), even when composed of well-understood mechanisms, can exhibit complicated nonlinear sensitivity to parameters that are not predictable from linearization at a single point in the parameter space. Keeling & Gilligan's assertions about the model are accurate on a relatively small region in the parameter space surrounding the reference value, but do not hold true in a larger region in the parameter space. Whether this is relevant or not depends on the region in the parameter space on which predictions are needed and how the model results are used. However, in any situation, it is scientifically important to determine the limitations affecting the accuracy of predictions of model behaviour.

When analysing the sensitivity of a model with respect to parameters, the ability to compute gradient information at sample points is invaluable. In this paper, we have used adjoint-based analysis tools to carry out a detailed sensitivity analysis of a nonlinear model of the plague over a large parameter domain. We have also demonstrated that the FAPS and high order parameter sampling numerical methods based on the adjoint problem, variational analysis and the generalized Green's function provide a very efficient way to perform non-parametric density estimation.

## Acknowledgments

The research activities of M.B. were partially supported by the National Science Foundation (DGE-0221595003). The research activities of D.N. were partially supported by the National Science Foundation (DGE-0221595003, MSPA-CSE-0434354). The research activities of M.F.A. were partially supported by the National Science Foundation (EF-0327052 and DEB0217631). D.E.'s work was supported in part by the Department of Energy (DE-FG02-04ER25620, DE-FG02-05ER25699, DE-FC02-07ER54909), the National Aeronautics and Space Administration (NNG04GH63G), the National Science Foundation (DMS-0107832, DMS-0715135, DGE-0221595003, MSPA-CSE-0434354, ECCS-0700559), Idaho National Laboratory (00069249), the Sandia Corporation (PO299784) and the United States Department of Agriculture (58-5402-3-306).

## Footnotes

- Received December 16, 2007.
- Accepted January 22, 2008.

- © 2008 The Royal Society