Abstract
Estimating parameters from data is a key stage of the modelling process, particularly in biological systems where many parameters need to be estimated from sparse and noisy datasets. Over the years, a variety of heuristics have been proposed to solve this complex optimization problem, with good results in some cases yet with limitations in the biological setting. In this work, we develop an algorithm for model parameter fitting that combines ideas from evolutionary algorithms, sequential Monte Carlo and direct search optimization. Our method performs well even when the order of magnitude and/or the range of the parameters is unknown. The method refines iteratively a sequence of parameter distributions through local optimization combined with partial resampling from a historical prior defined over the support of all previous iterations. We exemplify our method with biological models using both simulated and real experimental data and estimate the parameters efficiently even in the absence of a priori knowledge about the parameters.
1. Introduction
The increasing drive towards quantitative technologies in biology has brought with it a renewed interest in the modelling of biological systems. Models of biological systems and other complex phenomena are generally nonlinear with uncertain parameters, many of which are often unknown and/or unmeasurable [1,2]. Crucially, the values of the parameters dictate not only the quantitative but also the qualitative behaviour of such models [3,4]. A fundamental task in quantitative and systems biology is to use experimental data to infer parameter values that minimize the discrepancy between the behaviour of the model and experimental observations. The parameters thus obtained can then be cross-validated against unused data before employing the fitted model as a predictive tool [2]. Ideally, this process could help close the modelling–experiment loop by: suggesting specific experimental measurements; identifying relevant parameters to be measured; or discriminating between alternative models [5–7].
The problem of parameter estimation and data fitting is classically posed as the minimization of a cost function (i.e. the error) [8]. In the case of overdetermined linear systems with quadratic error functions, this problem leads to least-squares solutions, convex optimizations that can be solved efficiently and globally based on the singular value decomposition of the covariance matrix of the data [9]. However, data fitting in nonlinear systems with small amounts of data remains difficult, as it usually leads to nonconvex optimizations with many local minima [10].
A classic case in biological modelling is the description of the time evolution of a system through ordinary differential equations (ODEs), usually based on mechanistic functional forms. Examples include models of biochemical reactions, infectious spread and neuronal dynamics [1,11]. Typically, optimal parameters of the nonlinear ODEs must be inferred from experimental time courses but the associated optimization is far from straightforward. Standard optimization techniques that require an explicit cost function are unsuitable for this problem because of the difficulty in obtaining full analytical solutions for nonlinear ODEs [4,12,13]. Spline-based methods, which approximate the solution through an implicit integration of the differential equation [10], require linearity in the parameters and are therefore not applicable to models with nonlinear parameter dependencies, e.g. Michaelis–Menten and Hill kinetics.
Implicit techniques, such as direct search methods [14], simulated annealing [15], evolutionary algorithms (EAs) [16,17] or sequential Monte Carlo (SMC) [18], do not require an explicit cost function. However, if, as is usually the case, the cost function is a complicated (hyper)surface in parameter space with many local minima, gradient and direct search methods tend to get trapped in local minima because of their use of local information. Although still a local method, simulated annealing alleviates some of the problems related to local minima through the use of stochasticity. However, this comes at the cost of high computational overhead and slow convergence and, yet, with no guarantee of finding the global minimum.
Instead of an optimization based on local criteria, EAs produce an ensemble of possible answers and evolve them globally through random mutation and crossover followed by ranking and culling of the worst solutions [16,17,19]. This heuristic has been shown to provide an efficient protocol for parameter fitting in the life sciences [20,21]. However, EA methods can be inefficient when the feasible region in parameter space is too large, a case typical of models with large uncertainty in the parameters.
Probabilistic methods, such as SMC [18], propose a different conceptual framework. Rather than finding a unique optimal parameter set, SMC maps a prior probability distribution of the parameters onto a posterior constructed from samples with low errors until reaching a converged posterior. Recently, SMC has been combined with approximate Bayesian computation (ABC) and applied to data fitting and model selection [22]. However, methods such as ABC–SMC are not only computationally expensive but also require that the starting prior include the true value of the parameters. This requirement limits its applicability to many biological models, in which not even the order of magnitude of the parameters is known. In that case, the support of the starting priors must be made overly large (leading to extremely slow convergence) in order to avoid the risk of excluding the true parameter value from the search space.
In this work, we present a novel optimization algorithm for data fitting that takes inspiration from EA, SMC and direct search optimization. Our method iterates and refines samples from a probability distribution of the parameters in a ‘squeeze-and-breathe’ sequence. At each iteration, the probability distribution is ‘squeezed’ by the consecutive application of local optimization followed by ranking and culling of the local optima. The parameter distribution is then allowed to ‘breathe’ through a random update from a historical prior defined over the union of all past supports of the solutions (figure 1). This iteration proceeds until convergence of the distribution of solutions and their average error. A key, distinctive feature of our algorithm is the accelerated step-to-step convergence through a combination of local optimization and of culling of local solutions. Importantly, the method can also find parameters that lie outside of the range of the initial prior, and can deal with parameter values that extend across several orders of magnitude. We now provide definitions and a full description of our algorithm and showcase its applicability to different biological models of interest.
2. Algorithm
2.1. Formulation of the problem
Let X(t) = [x_{1}(t), … , x_{d}(t)] denote the state of a system with d variables at time t. The time evolution of the state is described by a system of (possibly nonlinear) ODEs:

Ẋ(t) = F(X(t); θ).  (2.1)
Here, θ = [θ_{1}, … , θ_{N}] is the vector of N parameters of our model.
The experimental dataset is formed by observations of some of the variables of the system:

𝒟 = {X̃(t_{i})}_{i=1}^{M},  (2.2)

where X̃(t_{i}) corresponds to the real value of the system plus observational error. Ideally, M > 2N + 1, as 2N + 1 experiments are enough for unequivocal identification of an ODE model with N parameters when no measurement error is present [23].
The cost function (i.e. the error) to be minimized is:

E_{𝒟}(θ) = Σ_{i=1}^{M} ∥X̃(t_{i}) − X(t_{i}; θ)∥,  (2.3)

where ∥ · ∥ is a relevant vector norm. A standard choice is the Euclidean norm (or 2-norm), which corresponds to the sum of squared errors:

E_{𝒟}(θ) = Σ_{i=1}^{M} Σ_{j=1}^{d′} [x̃_{j}(t_{i}) − x_{j}(t_{i}; θ)]²,  (2.4)

where we assume that d′ variables are observed. The cost function E_{𝒟}: ℝ^{N} → ℝ_{+} maps an N-dimensional parameter vector onto its corresponding error, thus quantifying how far apart the data and the model predictions are for that particular parameter set.
The aim of the data-fitting procedure is to find the parameter vector θ** that minimizes the error globally, subject to restrictions dictated by the problem of interest:

θ** = arg min_{θ ∈ ℝ_{+}^{N}} E_{𝒟}(θ).  (2.5)
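To make the cost function of (2.3)–(2.4) concrete, a minimal Python sketch is given below. The one-parameter decay model, the observation times and the data are hypothetical placeholders (not from this paper); the pattern of integrating the ODE at the observation times and summing squared residuals applies to any parametrized right-hand side.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Hypothetical one-parameter model dx/dt = -theta * x, standing in for (2.1).
def rhs(t, x, theta):
    return -theta[0] * x

def cost(theta, t_obs, x_obs, x0):
    """Sum-of-squared-errors cost E_D(theta) of equation (2.4)."""
    sol = solve_ivp(rhs, (0.0, t_obs[-1]), x0, args=(theta,),
                    t_eval=t_obs, rtol=1e-8)
    if not sol.success:
        return np.inf  # penalize parameter sets where integration fails
    return float(np.sum((sol.y[0] - x_obs) ** 2))

# Noiseless 'data' generated with theta = 0.5: the cost is numerically zero
# at the true parameter and grows away from it.
t_obs = np.linspace(0.1, 5.0, 10)
x_obs = 2.0 * np.exp(-0.5 * t_obs)
```

Returning an infinite cost on solver failure is a common practical guard, since an optimizer will otherwise propose parameter regions where the ODE is stiff or divergent.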
2.2. Definitions

— Data set: 𝒟, a set of M observations, as defined in equation (2.2).

— Parameter set: θ = [θ_{1}, … , θ_{N}] ∈ ℝ_{+}^{N}. Owing to the nature of the models considered, θ_{i} ≥ 0, ∀i.

— Objective function: E_{𝒟}(θ), the error function to be minimized, as defined in equation (2.4).

— Set of local minima of E_{𝒟}(θ): 𝕄 = {θ* : E_{𝒟}(θ*) ≤ E_{𝒟}(θ), ∀θ ∈ 𝒩(θ*)}, where 𝒩(θ*) is a neighbourhood of θ*.

— Global minimum of E_{𝒟}(θ): θ**, a parameter set such that E_{𝒟}(θ**) ≤ E_{𝒟}(θ), ∀θ. Clearly, θ** ∈ 𝕄.

— Local minimization mapping: L : ℝ_{+}^{N} → 𝕄. Local minimization maps θ onto a local minimum: L(θ) = θ* ∈ 𝕄.

— Ranking and culling of local minima: {θ^{†}}_{1}^{B} = ℛ𝒞_{B} ({θ}_{1}^{J}). This operation ranks J parameter sets and selects the B parameter sets with the lowest E_{𝒟}.

— Joint probability distributions of the parameters at iteration k: π_{k}(θ) (prior) and ϖ_{k}(θ) (posterior).

— Marginal probability distribution of the ith component of θ: for instance, π(θ_{i}) = ∫π(θ)∏_{r≠i} dθ_{r}.

— Historical prior at iteration k: ζ_{k}(θ) = ∏_{i=1}^{N} ζ_{k}(θ_{i}), where

ζ_{k}(θ_{i}) ∼ U(min S_{k}(θ_{i}), max S_{k}(θ_{i})).  (2.6)

Here U(a, b) is a uniform distribution with support in [a, b] and S_{k}(θ_{i}) = supp(ζ_{k−1}(θ_{i})) ∪ supp(ϖ_{k}(θ_{i})) is the union of the supports of ϖ_{k}(θ_{i}) and ζ_{k−1}(θ_{i}).

— Update of the prior at iteration k: π_{k}(θ) = ∏_{i=1}^{N} π_{k}(θ_{i}) with

π_{k}(θ_{i}) = p_{m} ϖ_{k}(θ_{i}) + (1 − p_{m}) ζ_{k}(θ_{i}),  (2.7)

that is, a convex mixture of the posterior and the historical prior with weight p_{m}, from which a new population is sampled in iteration k + 1.

— Repopulation: obtain population of J random points simulated from the prior π_{k−1}(θ).

— Convergence criterion for the error: the difference between the means of the errors of the posteriors in consecutive iterations is smaller than the predetermined tolerance:

ϕ_{k} = |⟨E_{𝒟}⟩_{ϖ_{k}} − ⟨E_{𝒟}⟩_{ϖ_{k−1}}| < Tol.  (2.8)

— Convergence criterion for the empirical distributions: the samples of the posteriors in consecutive iterations are indistinguishable at the 5 per cent significance level according to the nonparametric Mann–Whitney rank sum test:

ℳ𝒲(ϖ_{k}(θ), ϖ_{k−1}(θ)) = 0.  (2.9)
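The two convergence criteria above translate directly into code. The sketch below is an illustrative reading (the function names and the per-marginal application are our assumptions, not the authors' implementation); SciPy's `mannwhitneyu` provides the rank sum test used in (2.9).

```python
import numpy as np
from scipy.stats import mannwhitneyu

def error_converged(errors_k, errors_km1, tol=1e-5):
    """Criterion (2.8): difference of mean posterior errors below tolerance."""
    return abs(np.mean(errors_k) - np.mean(errors_km1)) < tol

def distributions_converged(sample_k, sample_km1, alpha=0.05):
    """Criterion (2.9) for one marginal: consecutive posterior samples are
    indistinguishable at the 5% level by the Mann-Whitney rank sum test."""
    _, p = mannwhitneyu(sample_k, sample_km1, alternative='two-sided')
    return p >= alpha
```

For a vector parameter, `distributions_converged` would be applied to each marginal sample in turn, and the 0–1 flag ℳ𝒲 is zero only when every marginal passes.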
2.3. Description of the algorithm
Algorithm 1 presents the pseudocode for our method using the definitions above. The iterations produce progressively more refined distributions of the parameter vector. At each iteration k, a population simulated from the prior distribution π_{k−1}(θ) is locally minimized, followed by ranking and culling of the local minima to create a posterior distribution ϖ_{k}(θ) (squeeze step). This distribution is then combined with an encompassing historical prior to generate the updated prior π_{k}(θ) (breathe step). The iteration loop terminates when the difference between the mean errors of consecutive posteriors is smaller than the tolerance and the samples of the posteriors are indistinguishable. We now explain these steps in detail (figure 1) through the Bliss–Painter–Marr (BPM) model (see §3.1).

— Formulation of the optimization: the dataset 𝒟 and the model equations parametrized by θ allow us to define an error function E_{𝒟}(θ) whose global minimum corresponds to the best model.
In our illustrative example, the BPM model (3.1) has the parameter vector θ = [α, β] and the error function is depicted in figure 1a. The global optimization on the rugged landscape of this function is computationally hard.

— Initialization:

Set the running parameters of the algorithm: the size of the simulated population, J; the size of the surviving population after culling, B; the update probability, p_{m}; and the tolerance, Tol. In this example, J = 500, B = 50, p_{m} = 0.95 and Tol = 10^{−5}.

Choose π_{0}(θ), the initial prior distribution of the parameter vector. In this case, we take α and β to be independent and uniformly distributed: π_{0}(θ) ∼ U(0, 100) × U(0, 100).

Initialize ζ_{0}(θ) = π_{0}(θ), the historical prior of the parameters.

Simulate J points from π_{0}(θ) to generate the initial sample {θ̂_{0}}_{1}^{J}.


— Iteration (step k): repeated until termination criterion is satisfied. Figure 1 shows the first iteration of our method applied to the BPM example.

Local minimization: apply local minimization to the simulated parameters from the ‘prior’ {θ̂_{k−1}}_{1}^{J} and map them onto local minima of E_{𝒟}(θ) to generate {L(θ̂_{k−1})}_{1}^{J} ∈ 𝕄.
Here, we use the Nelder–Mead simplex method [24], though others can be used. Figure 1b shows the simulated points from π_{0}(θ) (squares in plot) and their corresponding histograms (top and left). After local minimization, this sample is mapped onto the dark triangles in figure 1b (dark histograms bottom and right). Note how the local minima align with the level curves of E_{𝒟}, with a markedly different distribution to the uniform prior. Note also that many of the optimized values of α lie outside the range of the prior (0, 100) and are now distributed over the interval (0, 200). On the other hand, the values of β have collapsed inside (0, 1).
Algorithm 1. Squeeze-and-breathe optimization.

Set running parameters of algorithm: B, J ∈ ℕ, p_{m} ∈ [0, 1], Tol.
Choose initial priors π_{0}(θ) and ζ_{0}(θ).
Set ℋ_{0} = ∅ and k ← 1.
repeat
  Let ℋ_{k} = ℋ_{k−1}.
  Simulate J points from π_{k−1}(θ) through repopulation.
  for ℓ = 1 → J do
    Obtain local minimum θ_{ℓ}^{*} = L(θ_{ℓ}).
    Store the pair [θ_{ℓ}^{*}, E_{𝒟}(θ_{ℓ}^{*})] in ℋ_{k}.
  end for
  Rank and cull the set of local minima: ℋ_{k} = ℛ𝒞_{B}(ℋ_{k}).
  Define the posterior ϖ_{k}(θ) from the sample ℋ_{k}.
  Update ζ_{k}(θ) from ζ_{k−1}(θ) and ϖ_{k}(θ).
  Update the prior: π_{k}(θ) ∼ p_{m} ϖ_{k}(θ) + (1 − p_{m}) ζ_{k}(θ).
  k ← k + 1.
until ϕ_{k} < Tol and ℳ𝒲(ϖ_{k}(θ), ϖ_{k−1}(θ)) = 0
Ranking and culling: rank the J + B local minima from the k − 1 and k iterations, select the B points with the lowest E_{𝒟} and cull (discard) the rest:

{θ̂_{k}^{†}}_{1}^{B} = ℛ𝒞_{B}({L(θ̂_{k−1})}_{1}^{J} ∪ {θ̂_{k−1}^{†}}_{1}^{B}).

We consider {θ̂_{k}^{†}}_{1}^{B} to be a sample from the optimized (‘posterior’) distribution ϖ_{k}(θ), and we denote the best parameter vector of this set as θ_{k}^{‡}. The B = 50 best parameter sets are shown (light stars in plot) in figure 1c (bottom histograms).

Termination criterion: check that the difference between the mean errors of the consecutive optimized samples is smaller than the tolerance: ϕ_{k} ≤ Tol. We also gauge the ‘convergence’ of the posteriors through the Mann–Whitney (MW) test to determine if the samples from consecutive posteriors are distinguishable: ℳ𝒲(ϖ_{k}(θ), ϖ_{k−1}(θ)) = 0, where ℳ𝒲 is a 0–1 flag. The MW test gives additional information about the change of the optimized posteriors from one iteration to the next.
Figure 1d shows the convergence check for the first iteration of the BPM model: (i) top—errors of the sampled prior (left) with errors of the local minima (right) and the B surviving points (light stars); (ii) bottom—histograms of the prior and the posterior. Clearly, in this iteration neither the error nor the distributions have converged and so the algorithm does not stop.

Update of historical prior and generation of new sample: if convergence is not achieved, update the historical prior ζ_{k}(θ) as a uniform distribution over the union of the supports of the existing historical prior and the calculated posterior (2.6). Equivalently, the support of the historical prior extends over the union of the sequence of all historical priors {ζ_{0}(θ), … , ζ_{k−1}(θ)} and of all posteriors {ϖ_{1}(θ), … , ϖ_{k}(θ)}.
As shown in figure 1e for the BPM example, the marginal of the historical prior for α is expanded to U(0, 200), as the optimized parameter sets have reached values as high as 200. Meanwhile, the β marginal of the historical prior remains unchanged as U(0, 100) because there has been no expansion of the support.
The historical prior is used to mutate the updated prior before the next iteration by constructing a weighted mixture of the posterior and the historical prior with weight p_{m}, as shown in (2.7). We repopulate from this updated prior by simulating from the posterior with probability p_{m} = 0.95 and from the historical prior with probability (1 − p_{m}) to generate the new sample {θ̂_{k}}_{1}^{J} and iterate back.
Figure 1e shows the sample of J points simulated from the new prior. The αcomponents of most points are between 100 and 200 and the βcomponents are between 0.1 and 1, but there are a few that lie outside the support of the posterior. The process in figure 1(b–e) is iterated for this new set of points.


— Output of the algorithm: when the convergence criteria have been met, the iteration stops at iteration k* and the minimum of this last iteration, θ_{k*}^{‡}, is presented as the optimal parameter set for the model (i.e. the estimation of the global minimum θ** provided by the algorithm). We can also examine the sequence of optimized parameter distributions {ϖ_{1}(θ), … , ϖ_{k*}(θ)} obtained for all iterations (figure 1f).
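The full squeeze-and-breathe loop described above can be sketched in Python as follows. This is a simplified reading of algorithm 1, not the authors' implementation: the posterior is treated as an empirical sample that is resampled directly, the historical prior is a uniform box that only ever expands, and a toy cost function whose optimum lies outside the initial prior stands in for E_{𝒟}.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(1)

def squeeze_and_breathe(cost, lo, hi, J=100, B=10, p_m=0.95,
                        tol=1e-5, max_iter=10):
    """Sketch of algorithm 1. lo/hi bound the initial uniform prior;
    the historical prior may later expand beyond this box."""
    lo, hi = np.asarray(lo, float), np.asarray(hi, float)
    h_lo, h_hi = lo.copy(), hi.copy()       # support of the historical prior
    sample = rng.uniform(lo, hi, size=(J, lo.size))
    survivors = np.empty((0, lo.size))
    prev_err, prev_surv = np.inf, None
    for _ in range(max_iter):
        # Squeeze: Nelder-Mead local minimization of every sampled point,
        # then rank the pooled minima and cull all but the best B.
        minima = np.array([minimize(cost, th, method='Nelder-Mead',
                                    options={'maxfev': 300}).x
                           for th in sample])
        pool = np.vstack([minima, survivors])
        errs = np.array([cost(th) for th in pool])
        keep = np.argsort(errs)[:B]
        survivors, surv_errs = pool[keep], errs[keep]
        # Breathe: extend the historical support to cover the new minima.
        h_lo = np.minimum(h_lo, survivors.min(axis=0))
        h_hi = np.maximum(h_hi, survivors.max(axis=0))
        # Termination: error tolerance (2.8) plus Mann-Whitney test (2.9).
        if prev_surv is not None and abs(surv_errs.mean() - prev_err) < tol:
            if all(mannwhitneyu(survivors[:, i], prev_surv[:, i]).pvalue >= 0.05
                   for i in range(lo.size)):
                break
        prev_err, prev_surv = surv_errs.mean(), survivors
        # Repopulate from the mixture prior (2.7): resample survivors with
        # probability p_m, draw from the historical prior otherwise.
        from_post = rng.random(J) < p_m
        resampled = survivors[rng.integers(0, B, size=J)]
        from_hist = rng.uniform(h_lo, h_hi, size=(J, lo.size))
        sample = np.where(from_post[:, None], resampled, from_hist)
    return survivors[0], surv_errs[0]

# Toy problem: the optimum (3, 0.2) lies outside the initial prior box
# [0, 1]^2, which local minimization can nevertheless escape.
best, err = squeeze_and_breathe(
    lambda th: (th[0] - 3.0) ** 2 + (th[1] - 0.2) ** 2, [0, 0], [1, 1])
```

On this convex toy cost every local minimization reaches the global optimum; the ruggedness that motivates the ranking, culling and historical prior only appears for genuinely multimodal E_{𝒟}, but the control flow is the same.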
3. Application to biological examples
We apply our algorithm to three biological examples of interest. The first two correspond to simulated data from models in the literature, while in the third example, we apply our algorithm to unpublished experimental data of the dynamical response of an inducible genetic promoter constructed for an application in synthetic biology.
3.1. Bliss–Painter–Marr model of gene–product regulation
The BPM model [25] describes the behaviour of a gene–enzyme–product control unit with a negative feedback loop (3.1). Here, R, E and P are the concentrations (in arbitrary units) of mRNA, enzyme and product, respectively. The degradation rate of the product has an explicit time dependence, which in this case has the form of a ramp saturation.
The model represents a gene that codes for an enzyme which in turn catalyses a product that inhibits the transcription of the gene. This selfinhibition can lead to oscillations, which have been shown to occur in the tryptophan operon in Escherichia coli [25].
We construct a dataset from simulations of this model with θ_{real} = [α, β] = [240, 0.15] and initial conditions R(0) = E(0) = P(0) = 0. The dataset 𝒟 consists of 10 measurements of R(t) at particular times with added Gaussian noise drawn from 𝒩(0, 15^{2}) (table 1 in the electronic supplementary material). The error function E_{𝒟}(θ) (2.4) corresponds to a nonconvex optimization landscape:^{1} a complex rugged surface with many local minima making global optimization hard (figure 1a).
We use algorithm 1 to estimate the ‘unknown’ parameter values from the ‘measurements’ of R, as illustrated in §2c and figure 1. Feigning ignorance of the true values, we choose a uniform prior distribution with range [0, 100] for both parameters: π_{0}(θ) ∼ [U(0, 100), U(0, 100)]. The rest of the parameters are set to: J = 500, B = 50, p_{m} = 0.95 and Tol = 10^{−5}. Note that the true value of α falls outside of the assumed range of our initial prior, while the range of β in our initial prior is two orders of magnitude larger than its true value. This level of uncertainty about parameter values is typical in data fitting for biological models.
Figure 1 highlights a key aspect of our algorithm: the local minimization can lead to local minima outside of the range of the initial prior. Furthermore, our definition of the historical prior ensures that successive iterations can find solutions within the largest hypercube of optimized solutions in parameter space. In this example, the algorithm moves away from the U(0, 100) prior for α and finds a distribution around 240 (the true value) after three iterations, while in the case of β, the distribution collapses to values around 0.15 after one iteration. Although the algorithm finds the minimum θ^{‡} after five iterations, it is terminated after nine iterations, when the posterior distributions are similar (according to the MW test) and the mean errors have also converged (table 1). The estimated parameters for this noisy dataset are θ_{k*}^{‡} = [251.7189, 0.1530]. In fact, the error of the estimated parameter set is lower than that of the real parameters, E_{𝒟}(θ^{‡}) = 26.65 < E_{𝒟}(θ_{real}) = 28.26, because of the noise introduced in the data. When a dataset without noise is used, the algorithm finds the true value of the parameters to nine significant digits (not shown).
3.2. Susceptible–Infected–Recovered epidemics model
Susceptible–Infected–Recovered (SIR) models are widely used in epidemiology to describe the evolution of an infection in a population [11]. In its simplest form, the SIR model has three variables: the susceptible population S, the infected population I and the recovered population R:

Ṡ = α − γIS − dS,
İ = γIS − vI − dI,   (3.2)
Ṙ = vI − dR.
The first equation describes the change in the susceptible population, growing with birth rate α and decreasing by the rate of infection γIS and the rate of death dS. The infected population grows by the rate of infection γIS and decreases by the rate of recovery vI and the rate of death dI. The recovered population grows by the rate of recovery vI and decreases by the death rate dR. Here, we use the same form of the equations as Toni et al. [22].
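The right-hand side described above translates directly into code for a numerical solver. In the sketch below the parameter values and initial conditions are illustrative placeholders of our own, not those underlying the fitted dataset of Toni et al.

```python
import numpy as np
from scipy.integrate import solve_ivp

def sir_rhs(t, y, alpha, gamma, v, d):
    """Right-hand side of system (3.2): births at rate alpha, infection
    gamma*I*S, recovery v*I and death at rate d for each class."""
    S, I, R = y
    return [alpha - gamma * I * S - d * S,
            gamma * I * S - v * I - d * I,
            v * I - d * R]

# Illustrative integration with made-up parameters and initial state.
sol = solve_ivp(sir_rhs, (0, 50), [20.0, 1.0, 0.0],
                args=(1.0, 0.05, 0.2, 0.05), t_eval=np.linspace(0, 50, 11))
```

Fitting the initial conditions alongside the rate constants, as done in the text, simply means treating `[S0, I0, R0]` as three extra entries of θ when evaluating the cost.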
The data generated from model (3.2) (table 2 in the electronic supplementary material) were obtained directly from the study of Toni et al. [22]. Hence, the original parameter values were not known to us and further we assumed the initial conditions also to be unknown and fitted them as parameters. We used algorithm 1 to estimate α, γ, v and d, and initial conditions S_{0}, I_{0} and R_{0}. The prior marginal distributions for all parameters were set as U(0, 100). The other parameters were set to: J = 1000, B = 50, p_{m} = 0.95 and Tol = 10^{−5}. The algorithm converged after six iterations. Figure 2a shows the prediction of the model (3.2) with the best parameters estimated by our algorithm. The fit is good with little difference between the curves obtained using the real initial conditions and those estimated by our method.
The posterior distributions after six iterations of the algorithm are shown in figure 2b. The errors obtained after each local minimization in a decreasing order on each iteration are shown on a semilogarithmic scale in figure 2c. We can observe how the errors decrease by several orders of magnitude over the first three iterations and converge steadily during the last three iterations until ϕ_{k} ≤ Tol.
3.3. An inducible genetic switch from synthetic biology
The use of inducible genetic switches is widespread in synthetic biology and bioengineering as building blocks for more complicated gene circuit architectures. An example is shown schematically in the inset of figure 3a. This environment-responsive switch is used to control the expression of a target gene G (usually tagged with green fluorescent protein or gfp) through the addition of an exogenous small molecule I_{1} (e.g. isopropyl thiogalactopyranoside or IPTG). The input–output behaviour of this system can be described by the following ODE [2,27]:

Ġ = αk_{1} + k_{1} I_{1}^{n_{1}}/(K_{1}^{n_{1}} + I_{1}^{n_{1}}) − dG.  (3.3)
Here, αk_{1} is the basal activity of the promoter P_{1} and dG is the linear degradation term. The second term is a Hill function that models the cooperative transcription activation in response to the inducer I_{1} with maximum expression rate k_{1}, constant K_{1} and Hill coefficient n_{1}.
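The terms described above can be integrated numerically as follows. The parameter values here are made up for illustration and are not the estimates reported in table 2; at long times G approaches the stationary value (k_{1}/d)(α + I_{1}^{n_{1}}/(K_{1}^{n_{1}} + I_{1}^{n_{1}})).

```python
from scipy.integrate import solve_ivp

def switch_rhs(t, G, alpha, k1, n1, K1, d, I1):
    """Equation (3.3): basal production alpha*k1, Hill-type activation by
    the inducer I1, and linear degradation d*G."""
    hill = I1 ** n1 / (K1 ** n1 + I1 ** n1) if I1 > 0 else 0.0
    return [alpha * k1 + k1 * hill - d * G[0]]

# Illustrative (made-up) parameter values; I1 is the inducer dose in mM.
alpha, k1, n1, K1, d = 0.05, 10.0, 2.0, 0.1, 0.5
sol = solve_ivp(switch_rhs, (0, 40), [0.0], args=(alpha, k1, n1, K1, d, 1.0))
```

Repeating the integration over a grid of doses I_{1} reproduces a dose–response curve like figure 3a, while stopping the integration at intermediate times gives the transient behaviour that the full dynamical fit exploits.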
The lacI–P_{lac} switch has been characterized experimentally in response to different doses of IPTG in different studies [26,28]. Equation (3.3) can be solved explicitly, and one can use nonlinear least squares and the analytical solution to fit data at stationarity (i.e. at long times) and estimate α, n_{1}, K_{1} and the ratio k_{1}/d. These estimates were obtained in an earlier study [28], assuming equilibrium (Ġ = 0) and initial condition G(0) = 0 (table 2).
In fact, the experiments measured time series of the expression of G every 20 min from t = 140 to 360 min for different doses of inducer I_{1} = 0, 3.9 × 10^{−4}, 1.6 × 10^{−3}, 6.3 × 10^{−3}, 2.5 × 10^{−2}, 0.1, 0.4, 1.6, 6.4, 12.8 mM, with two different reporters (gfp30 and gfp34; see tables 3 and 4 in the electronic supplementary material). Instead of assuming equilibrium and using only the data for t > 300 min as done previously [28], we apply algorithm 1 to all the data with the full dynamical equation (3.3) to estimate θ = [α, k_{1}, n_{1}, K_{1}, d]. In this case, we used initial priors U(0, 1) for α and n_{1}; and U(0, 20) for k_{1}, K_{1} and d. The other parameters were set to: J = 1000, B = 50, p_{m} = 0.95 and Tol = 10^{−5}.
Our algorithm converged after five iterations to the parameter values in table 2. The parameter estimates provide good fits both to the time courses (figure 3b) and to the dose–response data (figure 3a). The values of K_{1}^{‡} and n_{1}^{‡} obtained here are similar to those obtained by Wang [26] using only stationary data. This is reassuring, as these parameters are related to the dose threshold for half-maximal response and to the steepness of the sigmoidal response, both static properties. On the other hand, the values of α and the ratio k_{1}/d differ to some extent owing to the (imperfect) assumption by Wang [26] that steady state had been reached at t = 300 min. As figure 3b shows, G is not at steady state then. Hence, the parameter values obtained with our method should give a more faithful representation of the true dynamical response of the switch.
4. Discussion
In this work, we have presented an optimization algorithm that brings together ingredients from EAs, local optimization and SMC. The method is particularly useful for determining parameters of ODE models from data. Our approach can also be used in other contexts where an optimization problem has to be solved on complex landscapes, or when the objective function cannot be written explicitly. The algorithm proceeds by generating a population of solutions through Monte Carlo sampling from a prior distribution and refining those solutions through a combination of local optimization and culling. A new prior is then created as a mixture of a historical prior (which records the broadest possible range of solutions found) and the distribution of the optimized population. This iterative process induces a strong concentration of the Monte Carlo sampling through local optimization which accelerates convergence and increases precision, while at the same time the presence of the historical prior allows the possibility that solutions can be found outside of the initial presumed ranges for the parameter values.
We have illustrated the application of the algorithm to ODE models of biological interest and have found it to perform efficiently. The algorithm also works well when applied to larger problems with tens of parameters in a signal transduction model (paper in preparation). The efficiency of the algorithm hinges on selecting appropriate running parameters and priors. For instance, the number of samples from the prior J should be large enough to allow for significant sampling of the parameter space while small enough to limit the computational cost. We have found that simulating J = 350–500 points in models of up to 10 parameters and keeping the best 15 per cent of the local minima leads to termination within fewer than 20 iterations. In our implementation, the Nelder–Mead minimization is capped at 300 evaluations. These guidelines would result in up to 300 000 evaluations of the objective function per iteration. Therefore, our method can become computationally costly if the objective function is expensive to evaluate, e.g. in stiff models that are difficult to solve numerically. In essence, our algorithm proposes a trade-off: fewer but more costly iterations. It is important to remark that, as with any other optimization heuristic for nonconvex problems, there are no strict guarantees of convergence to the global minimum. Therefore, it is always advisable to run the method with different starting points and different settings with enough sampling points in parameter space to check for consistency of the solutions obtained.
The generation of iterative samples of the parameters draws inspiration from Monte Carlo methods [6,18,22], but without pursuing the strict guarantees that the nested structure of the distributions provides in ABC–SMC. Our evolutionary approach adopts a highly focused Monte Carlo sampling driven by a sharp local search with culling. Hence, our iterative procedure generates samples that only reflect properties of the set of local minima (up to numerical cut-offs) without any focus on the global convergence of the distributions. As noted by Toni and co-workers [22], the distributions of the parameters (both their sequence and the final distributions) give information about the sensitivity of the parameters: parameters with narrow support will be more sensitive than those with wider support. Future developments of the method will focus on establishing a suitable theoretical framework that facilitates its use in model selection. Broadening the choice of historical priors may be a way of establishing such a framework. Currently, we make no assumptions about the parameter space; hence, we use uniform distributions over the support of all the posteriors. However, other distributions (e.g. exponential or log-normal) may be considered as a way to bias the historical prior towards regions of particular interest. Other work will consider the possibility of incorporating a stochastic ranking strategy in the selection of solutions, similar to the one present in the SRES algorithm [17], in order to solve more general optimization problems with complex feasible regions.
Acknowledgments
The authors would like to thank C. Barnes, T. Ellis, E. Garduño, H. Harrington, M. Owen, M. Stumpf and S. Yaliraki for their comments and suggestions. M.B.D. is supported by a BBSRC–Microsoft Research Dorothy Hodgkin Postgraduate Award. This work was partly supported by the US Office of Naval Research and by BBSRC through LoLa grant BB/G020434/1 (M.B.) and SABR grant BB/F005210/2 (M.B.), and by EPSRC through grant EP/I017267/1 under the Mathematics underpinning the Digital Economy programme (M.B.).
Footnotes
1. We thank Markus Owen of the University of Nottingham for suggesting this example.
 Received November 8, 2011.
 Accepted December 16, 2011.
 This journal is © 2012 The Royal Society