## Abstract

Bayesian methods are advantageous for biological modelling studies due to their ability to quantify and characterize posterior variability in model parameters. When Bayesian methods cannot be applied, due either to non-determinism in the model or limitations on system observability, approximate Bayesian computation (ABC) methods can be used to similar effect, despite producing inflated estimates of the true posterior variance. Owing to generally differing application domains, there are few studies comparing Bayesian and ABC methods, and thus there is little understanding of the properties and magnitude of this uncertainty inflation. To address this problem, we present two popular strategies for ABC sampling that we have adapted to perform exact Bayesian inference, and compare them on several model problems. We find that one sampler was impractical for exact inference due to its sensitivity to a key normalizing constant, and additionally highlight sensitivities of both samplers to various algorithmic parameters and model conditions. We conclude with a study of the O'Hara–Rudy cardiac action potential model to quantify the uncertainty amplification resulting from employing ABC using a set of clinically relevant biomarkers. We hope that this work serves to guide the implementation and comparative assessment of Bayesian and ABC sampling techniques in biological models.

## 1. Introduction

Parametric models have been hugely popular in the fields of systems and cellular biology since the inception of mathematical biological modelling [1]. Relative to phenomenological models, which simply seek to reproduce observations, parametric models boast advantages in interpretability, as the components and parameters of a well-designed model are meant to correspond directly to biological entities. Because of this correspondence, intuition into the target system can aid in formulation and parametrization of models, and conversely, the results obtained from the fitting of these models to biological data can give insights into the operation of the target system [2–5]. The so-called ‘inverse problem’ of determining model parameters from model observations, generally by automated means, can give insight into biological systems not only by determining the values of biologically relevant parameters that are most likely to reproduce the observed data, but also by determining to what degree these parameters are unique [6].

Variation over model parameter estimates can come from one of several sources, and informs either our understanding of the biological system or our faith in the suitability of the model to explain the system. We expect measurement noise to contribute to model uncertainty in a well-defined manner proportional to its magnitude [7]. Greater uncertainty can be attributed either to inherent biological variability, such as that between individuals in a population, or, in cases of extreme variation, to ‘model unidentifiability’—the presence of potentially infinitely many equally valid parametrizations of a model given a set of observations [8,9]. Such unidentifiabilities may arise from attempting to model portions of the system which are impossible to observe experimentally, and should lead either to a reformulation of the model or a refinement of the data collected to fit it. Thus, while the form of a parametric model suggests a belief in a single set of ‘true’ parameters, this is unrealistic when fitting to most biological data. Consequently, automated fitting methods such as maximum likelihood and least squares that output a single set of ‘optimal’ values may lose a great deal of information from the data.

In the face of these concerns, Bayesian inference has become an increasingly attractive solution for parameter fitting in biological models [10]. Instead of single parameter values, Bayesian methods output a posterior *distribution* over the model parameters. This allows quantification of the spread over the parameter values as well as showing potential dependencies between model parameters. In addition to highlighting potentially unidentifiable model parameters, which would be characterized by extreme variance and/or correlation with another parameter [11], Bayesian methods can qualify and quantify inherent biological variation about parameters, such as the expected distribution of model parameters across a population of cells from which data were aggregated [10]. While other methods of uncertainty quantification exist, such as calculation of standard confidence intervals or regression-based sensitivity analyses, these methods can perform poorly in the face of large amounts of noise, large parameter spaces and highly nonlinear models [8]. Because Bayesian methods are more resistant to these conditions, they are a far more reliable and general approach for systems biology modelling [10].

One field that can benefit greatly from Bayesian modelling approaches is the field of cardiac action potential modelling. Over the last half century, these models have undergone an evolution in complexity commensurate with the increase in quality and quantity of cellular and sub-cellular data available [5,12]. As a result, the days of Hodgkin and Huxley, where models were small enough to be fitted by hand to data collected independently from a single source [13], are gone. The difficulty in gathering enough data to fit these larger models has driven many modellers to adapt parameters or even entire components from studies conducted under differing experimental conditions, or even from studies on cells from different species [14,15]. While care is taken to adjust these models so that overall outputs match experimental data, these practices raise concerns as to model identifiability, or the degree to which the parameters for the model are unique [16]. Some modellers are addressing these concerns by the well-documented application of Bayesian inference techniques, which unambiguously link models to the data used to fit them, and provide a comprehensive characterization of posterior variability in the model parameters [17].

Studies involving cardiac models, however, may face restrictions on system observation that prevent the application of Bayesian techniques. Bayesian inference requires a calculable ‘likelihood function’—a function quantifying how likely it is that a given set of parameters would generate a given model output. When we cannot directly observe the quantity we are attempting to model, we cannot calculate such a likelihood. In cardiac and other biological systems, such observational constraints can be encountered when recording data *in vivo*. Such *in vivo* recordings are important for current efforts to create personalized heart models, which aim to advance clinical treatment by allowing for *in silico* patient-specific risk assessment for atrial fibrillation and other complications [18]. Although the output of a cardiac model is generally either the current passing through one or more ion channels or the voltage across the cell membrane over time, it is generally impossible to record such time-dependent data *in vivo*, forcing experimentalists to instead record or report only lower-dimensional biomarkers that attempt to characterize the action potential [19,20]. Thus, data recorded or reported in these clinical studies may be limited to so-called ‘summary statistics’—data such as the time required for a cell to repolarize, the peak membrane potential or the steepest change in membrane potential—rather than the full model output. Because these summary statistics are often highly nonlinear functions of the membrane voltage, it is difficult to reason about the statistical properties of the error on these outputs even when such properties are well understood for the underlying voltage trace. Without such knowledge, the specification of a likelihood function, and consequently, the application of Bayesian inference, becomes impossible. Even when a system is fully observable, if one chooses to employ a stochastic mathematical model of the underlying system, the randomness in the model can make the deterministic calculation of the likelihood of a set of parameters intractable, and Bayesian inference is similarly impossible. There are currently proposed stochastic models for ion channels [21], which would face this problem if employed.

In these situations where we cannot perform *exact* Bayesian inference, we can instead perform *approximate* Bayesian inference. Approximate Bayesian computation (ABC) methods, appropriately, approximate the true likelihood with another proximity metric—such as a custom distance function between vectors of summary statistics, or the squared distance between two realizations of a stochastic model observation—and seek to maximize this instead. The variance of the posteriors produced by these methods is generally inflated relative to the true value, as uncertainty resulting from measurement noise or the underlying system is compounded with uncertainty resulting from incomplete system observation, randomness in the model, or both. Despite this, the incredible generality of ABC methods has made them an area of current interest for the modelling of cardiac cells as well as other biological systems [22–24].

Owing to their generally differing spheres of use, little theoretical or practical literature exists comparing Bayesian and approximate Bayesian inference on a given class of models (although an example on a toy model may be seen in [25]). This might prove frustrating to biological modellers. Cardiac modellers, for example, may wish to know how much uncertainty is introduced by the application of approximate Bayesian methods on clinical summary statistics relative to true Bayesian posteriors that would be obtained using full time-course data. The evaluation of the information content of summary statistics is currently an area of focus in ABC research. Because individually ‘sufficient’ summary statistics—ones that represent the model output without loss of information—are nearly non-existent in biological studies, modellers generally employ ensembles of summary statistics with the intention of capturing as much information as possible [26]. Because the selection of optimal summary statistics is often data-dependent, there is not yet a universally agreed upon means of doing so in an approximate context [27,28]. Cardiac action potentials, however, may be observed fully or partly depending on the system of study, and thus a comparative analysis of the approximate and exact posteriors could give *a priori* quantification of the information retained by a given set of summary statistics.

To enable such comparative analyses in cardiac models, we present two ABC sequential Monte Carlo (ABC-SMC) samplers that we have generalized to perform either Bayesian or approximate Bayesian inference with minimal adjustment. The method for generalizing these samplers was proposed originally by Wilkinson [29] for ABC rejection and ABC Markov chain Monte Carlo (ABC-MCMC) samplers, but are extended in this paper to the more powerful class of SMC samplers, which use importance sampling to provide performance gains [24,25,30,31]. We compare the Bayesian and approximate Bayesian inference capabilities of two sampling approaches, based on ABC-SMC algorithms proposed by Toni *et al.* [32] and Del Moral *et al.* [33], on a range of model problems, highlighting the relative merits of each sampling scheme and their sensitivities to key algorithmic parameters. We culminate with a novel *a priori* assessment of the information content within a set of commonly reported summary statistics of the cardiac action potential, which we determine by comparing the posteriors produced over the O'Hara–Rudy action potential model [34] by the approximate Del Moral sampler to those produced by the exact Del Moral sampler using simulated time course data. We limit ourselves to single action potential recordings, which have been shown to contain a great deal of information about the parameters we will be considering [17], though the analyses presented could be easily extended to include multiple repeats. We hope that this will serve as an example of both exact and approximate SMC implementation and application, as well as a novel study on summary statistic evaluation in the field of cardiac modelling, the methodology of which may be extended to other areas of time-series analysis.

In §2, we present and compare the two SMC samplers, provide pseudocode implementations for each and discuss the interpretation of key algorithmic parameters. In §3, we outline the three models we chose for study along with our simulation conditions and common algorithmic settings for each posterior inference experiment. In §4, we present the results of posterior inference on each of the model problems, and in §5, we provide a full computational specification and a brief discussion of computational complexity. In §6, we discuss the implications of the results of our posterior inferences for choosing between the two samplers for a given system, and for the choice of summary statistics in a cardiac action potential model. Finally, in §7, we conclude and discuss potential future directions for this work.

## 2. Two approximate Bayesian computation sequential Monte Carlo samplers

### 2.1. The acceptance kernel

In the absence of a calculable likelihood function, most ABC methods employ the squared error distance metric:
2.1to define the proximity of simulated data—in this instance, simulated according to model *f*( · , · ) at *n* points **x** under parameters *θ*—to (noisy) experimental data **y**. While minimization of the squared error function is appealing due to its equivalency to maximizing a Gaussian likelihood, which is commonly employed in biological models to represent observational noise, ABC methods make no assumptions about the distribution of this error. Instead, they typically uniformly accept or reject points in parameter space according to the binary function
2.2where *ɛ* represents a ‘threshold’ used to account for model error and/or observational noise. This deterministic kernel function is sometimes referred to as a ‘top-hat kernel’ due to its shape.

Inference using such a top-hat kernel is exact only when model/system error is uniform over a sphere of radius *ɛ*. Because this is generally not the case for models of biological systems, employing a top-hat kernel may lead to an overestimation of posterior variance. Wilkinson suggests [29] that by replacing the indicator function in equation (2.2) with a probability density function (PDF) representing the true distribution of error, ABC rejection and ABC-MCMC samplers can be adjusted to perform exact inference.

If our observation error is instead independently Gaussian with constant variance *σ*_{e} at each time point:
2.3where are the ‘true’ system parameters, we should instead adopt the following smooth acceptance kernel to match that error distribution and allow for us to perform exact inference:
2.4where *c*_{ɛ} is a normalizing constant and *ɛ* is an adjustable ‘tolerance’ controlling the width of the acceptance kernel. The simplest means of performing exact inference using such a kernel would be rejection sampling (algorithm 1).

The efficiency of a rejection sampler employing the acceptance kernel in equation (2.4) is highly dependent on the choice of *c*_{ɛ}. If the constant is set too high, the algorithm will require too many iterations, and if the constant is set too low, then the likelihood function will be ‘decapitated’, exceeding a value of 1 over some range of parameter space and resulting in uniform sampling over this range regardless of the true distribution (see figure 1 for a graphical illustration).

ABC-SMC samplers employ a sequence of decreasing thresholds *ɛ*_{0} > *ɛ*_{1} > … > *ɛ*_{T} to sample from a sequence of intermediate distributions, smoothing the difference between the prior and posterior. Adopting the probabilistic acceptance kernel in equation (2.4) will allow us to adjust these algorithms to perform exact inference as the iteration proceeds, as when . In this case, the tolerance parameter *ɛ* can be analogized to the ‘temperature’ parameter in simulated annealing. Thus, the adjusted algorithm similarly ‘heats’ the likelihood to an initial value *ɛ*_{0}, at which point it is highly smooth, then slowly ‘cools’ the system towards the true posterior through a sequence of intermediate distributions defined by *ɛ*_{t}. Such sequential strategies have shown benefits even for so-called ‘static’ systems, where the likelihood function is known and MCMC methods would involve only a single posterior [30].

### 2.2. The generalized Toni algorithm

The first sampler we consider is based on the one proposed by Toni *et al.* [32]. This sampler maintains a constant number of particles *N* that, at each intermediate distribution, are updated by rejection sampling (along with local perturbation according to a fixed proposal distribution) from the previous posterior estimate, but with a lower error threshold. This is represented schematically in figure 2.

We propose a generalized version of the Toni sampler in algorithm 2 that employs an arbitrary acceptance kernel *ρ*_{ɛ}(*θ*). Let *π*(*θ*) represent the prior distribution on parameters *θ*. We will draw a series of intermediate distributions composed of *n* particles {*θ*^{(i)}_{t}} estimating the posterior, where the subscript *t* denotes the index of the intermediate distribution and the superscript *i* indicates the index of the particle *within said estimate*. We define a (series of) proposal distribution(s) *K*_{t}(*θ*′|*θ*) which describe the probability of a particle moving from *θ* to *θ*′ in parameter space between iterations. Finally, we define weights for these particles, {*w*^{(i)}_{t}}, that are updated after each iteration according to the following scheme:
2.5These particle weights {*w*^{(i)}_{t}} are calculated according to a scheme proposed by Del Moral *et al*. [35] that minimizes their variance with respect to those of the previous estimate {*w*^{(i)}_{t−1}}.

The *α* parameter in algorithm 2 can be thought of as the parameter defining the ‘cooling schedule’ in simulated annealing, as it controls the per cent reduction of *ɛ*_{t} at each iteration. If the posterior estimate update does not succeed within a given number of iterations (*I*_{max} = 5 × 10^{6}), we reattempt with a more conservative reduction of *ɛ*_{t} for the next iteration. This allows adaptive adjustment of the threshold and has given us good results with the top-hat acceptance kernel, where the value of the squared error to be minimized is related to the magnitude of the model output and would otherwise require much problem-specific adjustment.

A major advantage of this scheme is its amenability to parallelization. Because rejection sampling (lines 7–14) is performed independently for each particle, sampling can be divided between different threads/processes. The calculation of weights (lines 20–22) can also be parallelized once rejection sampling for the current posterior estimate has completed. In our Python implementation, this parallelization was accomplished through the use of the Pathos multiprocessing module [36,37].

### 2.3. The generalized Del Moral algorithm

A potential drawback of the Toni algorithm is the complexity of its weighting scheme. We note that this weighting scheme (equation (2.5)) has an asymptotic cost of *O*(*N*) for each particle, leading to an *O*(*N*^{2}) cost for updating weights during distribution update. While the cost of simulation is usually assumed to dominate the runtime of the sampler, this expensive weighting step may add notable overhead for large numbers of particles regardless of the model chosen.

This computational complexity in part motivated Del Moral *et al.* to propose an alternative ABC-SMC implementation based on Metropolis–Hastings updates, rather than rejection sampling, that allowed them to employ an approximation of the weighting scheme in equation (2.5):
2.6This scheme boasts *O*(1) calculation of particle weights, leading to only *O*(*N*) cost for a single distribution update. While this approximation overestimates the variance of the incremental weights relative to the scheme in equation (2.5), the authors believe the approximation should be sufficiently close as long as sequential posterior estimates are sufficiently close (i.e. *π*_{t}(*θ*) ≈ *π*_{t−1}(*θ*)).

The Del Moral algorithm effectively maintains a set of *n* independent Markov chains that are periodically resampled when the effective sample size (ESS):
2.7falls below some threshold (generally taken to be *n*/2), indicating a significant drop in entropy. This implementation is represented schematically in figure 3, and a generalized version employing an arbitrary acceptance kernel *ρ*_{ɛ}(*θ*) is described in algorithm 3.

The adaptive *ɛ*_{t} schedule (controlled by a cooling parameter *α* identically to our implementation of the Toni algorithm in algorithm 2) remains the same as in the Toni implementation, but as this sampling scheme performs a fixed number of Metropolis–Hastings steps for each intermediate distribution update, there is no need to set a maximum number of iterations *I*_{max} for the posterior update step. Instead, as the weighting scheme is effectively counting the number of particles that would ‘survive’ the *ɛ*_{t} update, the conditional on line 10 determines whether this update would result in all particles being rejected by the kernel *ρ*_{ɛt}. One might replace this check with a more general condition, where the update fails if the ESS drops below some critical threshold, which we expect would increase resistance of the posterior estimates to underdispersion resulting from an overly aggressive choice of cooling schedule.

We have generally been using a Gaussian proposal distribution of fixed width, chosen to be 10% of the variance of the prior for each parameter. However, we have found that adaptively adjusting the width of the kernel (scaling the covariance matrix by a constant) based on the acceptance rate of draws, as is done for many MCMC samplers, has helped our Del Moral implementation. We use an update scheme adopted from the PyMC [38] implementation of tuned MCMC that increases/decreases proposal variance proportionally to deviation of the acceptance rate from the 20–50% range at fixed intervals (every 100 proposals)—see electronic supplementary material, algorithm A.1, for more details.

## 3. Models and methods

### 3.1. The linear model

The following linear model was chosen as an initial test bed for the two algorithms:
3.1This model has a known Gaussian posterior over the slope parameter *a* when a conjugate Gaussian prior *π*(*a*) is assumed over the slope parameter:
3.2

For each fitting experiment, data were simulated as a single vector of *n* noisy observations **y** taken at fixed time points **x;** *n* = 301 time points were chosen, symmetrically about 0: **x** = {−150, −149, …, 0, …, 149, 150}. The true value of *a* was chosen as , while the width of the prior *σ*_{a} was set to 10, and the magnitude of Gaussian noise *σ*_{e} was set to 1.

A Gaussian proposal distribution was chosen for both samplers during both exact and approximate inference. The width of this distribution was periodically updated in the Del Moral scheme as described in electronic supplementary material, algorithm A.1, in order to heuristically attain a 20–50% acceptance rate of Metropolis–Hastings steps.

When employing the top-hat kernel (equation (2.2)), the initial tolerance *ɛ*_{0} for both samplers was set to , the maximum error among *N* samples drawn from the prior. Both algorithms were set to terminate when , or when *ɛ*_{t} < 301, the sample size of the data. This second criterion was introduced because the expected value of at the true solution, , is equal to the number of observations *n*, which thus sets a reasonable estimate for the final error *ɛ*_{T} and would ensure timely termination. In the presence of noisy observations, the acceptance probability of the top-hat kernel (equation (2.2)) approaches 0 for all *θ* as when a finite number of observations are employed, and thus a positive limit is expected on *ɛ*_{T}.

When employing a Gaussian kernel (equation (2.4)), the initial tolerance *ɛ*_{0} for both samplers was set to 1000, as it gave a reasonable acceptance rate for the first round of rejection sampling. Both algorithms were set to terminate when , or when *ɛ*_{t} = 1, as in the probabilistic kernel (equation (2.4)) the interpretation of *ɛ* is a multiplier of the true standard deviation of the error model, and thus decreasing below 1 would cause underdispersion in the final estimate.

### 3.2. The polynomial model

The following polynomial model was chosen for subsequent comparison to investigate the effect of high-dimensional parameter space on the samplers:
3.3where **x** ranges from −1 to 1 in 301 equally spaced increments, *σ*^{2}_{e} = 1, and *c*_{0} = 6, *c*_{1} = −6, *c*_{2} = 5, *c*_{3} = −5, *c*_{4} = 4, *c*_{5} = −4, *c*_{6} = 3, *c*_{7} = −3, *c*_{8} = 2, *c*_{9} = −2, *c*_{10} = 1, *c*_{11} = −1, *c*_{12} = 2, *c*_{13} = −2, *c*_{14} = 3, *c*_{15} = − 3. Output from this model is depicted graphically in electronic supplementary material, figure B.1.

All parameters were given uniform priors over the range (−10, 10) to remove any effects of the prior PDF from particle weighting in the SMC algorithms. An independent Gaussian proposal distribution was set over each model parameter for both samplers. The width of this distribution was periodically updated in the Del Moral scheme as described in electronic supplementary material, algorithm A.1.

For both samplers, the number of particles was set to 1000 and the cooling schedule *α* was set to 0.2. For approximate inference using the top-hat kernel (equation (2.2)), the initial and terminal conditions were the same as those used for the linear model in the previous section. For exact inference using the Gaussian kernel (equation (2.4)), the initial tolerance was set to *ɛ*_{0} = 100, and the algorithms were set to terminate when *ɛ*_{t} = 1.

### 3.3. The O'Hara–Rudy action potential model

The final model chosen for examination was the O'Hara–Rudy cardiac action potential model. Like most action potential models, the O'Hara–Rudy formulation models the change in membrane potential as a linear function of the total transmembrane current *I*_{tot}(*t*):
3.4which in turn is treated as the sum of *m* currents. Each such current is associated with a particular ion channel—a protein spanning the membrane that specializes in the transport or exchange of specific types of ions. These channels generally go through changes of shape in response to changes in cellular conditions (generally transmembrane potential), which affect the rate at which ions may pass through them. This allows for the restriction of ion channel activity to certain portions of the action potential, which is essential for restoring the cell to an excitable state between firings.

The O'Hara–Rudy model employs the so-called Hodgkin–Huxley formalism to model 13 ionic currents. This formalism, pioneered by the eponymous authors in 1952 [13], models transmembrane ionic conductance as a polynomial function of one or more ‘gating variables’, which are in turn modelled by time- and voltage-dependent ordinary differential equations, as exemplified below:
3.5and
3.6where all *s*_{x,i} are gating variables associated with current *I*_{x}, *V*_{x} is the ‘reversal potential’ of the channel (the potential at which no net ionic flow is observed through the channel) and *G*_{x} is the maximum conductance attainable through the channel. This formalism treats each ion channel as a collection of *M*_{x} types of subunits, each with multiplicity *k*_{x,i}, that must all be active in order to allow for the transport/exchange of ions. The activation probability of each of these subunits is modelled by a gating variable *s*_{x,i}, which can also be thought of as the active fraction of each subunit across the whole membrane. While the more general Markov model formulations can be (and have been) employed for ion channel models in an effort to more accurately capture their kinetics, the added computational complexity limits their use in large-scale tissue simulations [39]. Thus, Hodgkin–Huxley style models retain their popularity due to their simplicity and scalability.

In this study, inference was performed over the 13 maximum ionic conductance parameters (*G*_{x} in equation (3.5)). These are linked directly to expression levels of the associated genes, which are in turn a major source of variation between cell types and even individuals. While inference could additionally be performed on the parameters controlling gating (equations (3.5) and (3.6)), experimentalists generally perform these types of fitting using current data gathered from patch clamp experiments [40]. As these studies are typically performed prior to whole-cell action potential recordings, we treat these gating parameters as known in order to be congruous with modern practices.

For each current, a uniform prior of between 50 and 200% of the values reported in the original paper was assumed. An independent Gaussian proposal distribution was set over each conductance parameter, with standard deviation equal to 10% of its reported value (table 1).

Action potentials were simulated using the functional curation extension to Chaste [41,42], using a CellML [43] model file for the O'Hara–Rudy model annotated with metadata tags to allow the adjustment of maximum conductance parameters. The transmembrane potential was recorded every 0.25 for 500 ms.

For exact inference (SMC), the full time course data were considered by the sampler. ‘Experimental’ data were generated by simulating the action potential using the parameter values reported in the original publication and applying independent Gaussian noise according to at each time point [34]. Assuming knowledge of this noise model, the Gaussian acceptance kernel (equation (2.4)) with *σ*_{e} = 0.25 was employed.

For approximate inference (ABC-SMC), five commonly reported clinical summary statistics of the action potential were considered:

(1) APD90—the time required for the membrane to repolarize to 90% of its resting membrane potential.

(2) APD50—the time required for the membrane to repolarize to 50% of its resting membrane potential.

(3) Peak potential—the largest positive membrane potential observed over the course of the action potential.

(4) Resting potential—the steady-state membrane potential (approximated by the most negative membrane potential observed during the action potential).

(5) Maximum upstroke velocity—the largest value of d

*V*/d*t*observed during the action potential.

These quantities, represented visually in figure 4, were calculated using the standard cardiac library available in functional curation Chaste [41]. ‘Experimental’ data were generated by simulating the action potential using the parameter values reported in the original publication, adding noise as was done for exact inference data, and then calculating the five summary statistics [34]. The five summary statistics were treated as a single vector, with each being normalized by its ‘experimental’ value to control for differences in scale. Assuming no knowledge of the noise model, the top-hat acceptance kernel (equation (2.2)) was employed over the vector of five statistics.

Owing to the computational complexity added by calculating summary statistics, the Del Moral sampler was limited to 1000 particles for both probabilistic and approximate inference. To ensure gradual evolution of the posterior, and consequently make equation (2.6) a good approximation of the true importance sampling weights, the cooling schedule *α* was set to 0.2, and resampling was triggered at 60% ESS (as opposed to 50% on line 16 of algorithm 3). An additional minimum of 30% ESS was set for all posterior updates (as opposed to the simpler non-zero requirement on line 10).

The approximate Del Moral sampler was set to terminate when the threshold *ɛ* fell below the value of the objective function evaluated for data simulated under the generating parameters (table 1). The probabilistic solver was set to terminate at *ɛ* = 1, the point at which exact posterior sampling was being performed.

## 4. Results

### 4.1. The linear model

The first model chosen for a comparative study of the two samplers was the linear growth model with Gaussian observation error (equation (3.1)). Using the known theoretical posterior on the model parameter (equation (3.2)), we can verify if the empirical posteriors produced by our exact SMC algorithms are correct. We note that the theoretical posterior mean, as shown in figures 5, 7 and 8, differs from the generating value *a* = 5 by less than 0.01%.

We began with a comparison of the ABC inference capabilities of both samplers on the model, performing posterior inference with the Toni and Del Moral samplers using the standard top-hat kernel (equation (2.2)), which does not match the true (Gaussian) distribution of observation error (equation (3.1)). We first compared the two samplers with a common number of particles (1000) and *α* = 0.5, aiming to halve the maximum error among particles in the population after each update. Both samplers terminated at *ɛ*_{T} = 301 and the resulting posteriors are shown in figure 5. We see that while both posteriors capture the theoretical mean and span the same range of values, the posterior produced by the Del Moral sampler shows significantly higher variance than the one produced by the Toni sampler, which shows the uniformity expected when employing a top-hat kernel.

We sought to investigate the effects of increased population size and relaxed cooling schedule on the uniformity of the Del Moral posterior estimates for the linear model. Both of these changes would increase the similarity between subsequent posterior estimates, and thus would be expected to improve the approximation of the true importance sampling weights employed by the Del Moral algorithm [33]. In figure 6, we see that either by increasing the number of particles from 1000 to 10 000 or by decreasing the cooling schedule *α* below 0.3, the variance in the resulting Del Moral posterior estimate decreases dramatically and it approaches the uniformity of the Toni posterior. While the variance in the final posterior decreases proportionally to *α* for both 1000 and 10 000 particle samplers, the effect is much more pronounced at the lower population size.

We next ran both ABC-SMC variants on the same linear model, but employing the probabilistic acceptance kernel that matches the observational error distribution (equation (2.4)). Informed by the results of the approximate study of the linear model, we decreased the cooling schedule parameter *α* to 0.25 for both samplers and increased the number of particles employed by the Del Moral sampler to 10 000 to improve the quality of the weighting approximation. We continued to use 1000 particles for the Toni sampler due to its reasonable performance under these settings when performing approximate inference.

Applying the probabilistic Del Moral sampler, we see in figure 7 that the resulting posterior estimate shows a high degree of homology with the theoretical posterior. The histogram of values of the slope parameter *a* appears normally distributed, and captures the theoretical mean almost exactly. Additionally, the quantile–quantile plot, which plots the per cent point function (PPF) of the theoretical posterior distribution (equation (3.2)) against that of the empirical posterior, shows a linear relationship with slope 1, indicating that the two distributions are Gaussian with equal variance.

To similarly employ the probabilistic Toni sampler, however, we must additionally consider the value of the normalizing constant *c*_{ɛt} in equation (2.4). In the Del Moral sampler, the constant is cancelled by the Metropolis–Hastings update, but in rejection-based samplers like algorithm 2, it controls the trade-off between a reasonable acceptance rate (line 13) and correctness of the sampler. Wilkinson suggests choosing for *c*_{ɛt} the modal value of the acceptance kernel, where SE(*θ*) = 0 [29], which would improve the overall acceptance rate but still bound the function between 0 and 1. In the presence of noise, however, no single parameter set will generate such error-free predictions, so we instead employed the empirical mode , where is the maximum-likelihood estimate of the parameters given the noisy data.

In figure 8, we see the sensitivity of the Toni sampler to three values for *c*_{ɛt} chosen relative to this empirical mode. When , deliberately below the modal value, we see the posterior take on a nearly uniform character around the mean, with high nonlinearity in the quantile–quantile relationship confirming disagreement with the theoretical Gaussian posterior. Conversely, when , the algorithm terminates early with final *ɛ*_{T} > 3 due to repeatedly exceeding the criterion (line 15 of algorithm 2), and the marginal and quantile–quantile plots reveal the final posterior estimate to be overdispersed relative to the theoretical posterior in equation (3.2). Finally, when , we see overall agreement between the empirical posterior and the theoretical one, despite some levelling off near the mode. In both cases of and the value of the acceptance kernel was observed to take on values greater than 1, although with far less frequency in the latter, and only at later iterations.

### 4.2. The polynomial model

We next sought to explore the effect of the dimension of parameter space on the convergence of both algorithms by performing inference on the polynomial model detailed in equation (3.3) with an increasing number of free parameters, *P*. For a number of free parameters increasing from 3 up to 16, while keeping all other parameters constant between algorithms, we measured the convergence of both algorithms by recording the standard deviation of the posterior weights as the iteration proceeds, which should approach a constant value. For the Toni sampler, we performed only approximate inference using the top-hat kernel (equation (2.2)) over the full trace to avoid the confounding effects of the choice of the normalizing constant (as illustrated in the previous section). For the Del Moral sampler, we performed inference using both the exact and approximate variants.

As we see in figure 9, all samplers show an initial increase in posterior weight variance from the uniform prior across all values of *P*. In both the approximate and exact Del Moral samplers, we see that regardless of the value of *P*, the sampler maintains a ceiling on weight variance as the importance sampling proceeds, owing to resampling steps which redistribute weight across the posterior (indicated by sudden drops in weight variance). It is difficult to make a quantitative comparison between the exact and approximate versions of the sampler at a given iteration, due to the differing interpretations of the *ɛ* tolerance parameter, but for a fixed set of particles, we would expect the weights of the approximate Del Moral sampler to have a higher variance due to the approximation of the Gaussian likelihood function with the top-hat kernel.

In the approximate Toni sampler, we see that, at *P* ≤ 6, the variance of posterior weights largely decreases as the iterations proceed, yet at large values of *P* the variance swings wildly between iterations, with effects becoming more pronounced at higher values of *P*. Upon termination, the posterior returned by the Toni sampler when *P* = 16 contained a single particle whose weight accounted for over 34% of the distribution.

### 4.3. The O'Hara–Rudy action potential model

As with the high-dimensional polynomial model described in the previous section, we found that the Toni sampler struggled with highly variable particle weights when performing inference on the 13-parameter O'Hara–Rudy model (described below), resulting in posterior estimates collapsing to single-point estimates (not shown). Consequently, we have chosen to employ only the Del Moral sampler in order to compare probabilistic inference on action potential traces with approximate inference on summary statistics of the action potential. In this analysis, we employ simulated data for all solvers, as we aim to investigate the identifiability of the model *a priori*, before any experimental recordings are carried out.

We began our analysis of the O'Hara–Rudy action potential model by performing posterior inference over the full 13-parameter model, using simulated action potential data with added noise. From the selection of marginal distributions presented in figure 10 (a full visual summary of the marginal posteriors can be found in electronic supplementary material, figure B.3), we see that the marginal posteriors are well constrained for both major currents such as *G*_{Na} and minor currents such as *G*_{Nab}, but may show noticeable mean deviation from the values used to generate the data. We expect deviations of the posterior means from generating values due to the effects of random noise (as in the linear model), as well as redundancies in the system, in which parameters may vary in a compensatory manner so that model output remains largely unchanged. The presence of the latter effect can be confirmed by electronic supplementary material, figure B.2, in which we observe great consistency between action potentials simulated under the mean posterior values and the data used to fit the model despite their differing values. It is also worth noting that the width of these posterior distributions is a function of the number of sample points employed, and thus the location of the generating values outside of some ‘credible interval’ of the posterior mean can be most accurately attributed to low variance in the estimator.

To ensure that the solver was not becoming stuck in a local minimum, we repeated the fitting 10 times and have reported the results in table 2. For nearly all model parameters, particularly the major currents *G*_{Na}, *G*_{to}, *G*_{Kr} and *G*_{K1}, the posterior mean averaged over these ten repeated inferences showed strong agreement with the generating values, as well as low standard deviation across the repeats. As such, the mean posterior deviations observed in single instances of inference are probably due to the difficulty of covering the 13-dimensional space with only 1000 sampler particles. While greater consistency between the posterior mean and the generating values may be obtained by increasing the population size of the sampler, computational restrictions imposed by the calculation of summary statistics in our subsequent analyses limited us to 1000 particles in the approximate sampler. As such, we maintained this number across both exact and approximate samplers to allow for maximal homology between the two during our comparisons.

We next fitted the 13-parameter model to five summary statistics of the same noisy action potential—APD50, APD90, peak potential, resting potential and maximum upstroke velocity (see §3.3)—using the approximate Del Moral sampler. In figure 11 (and electronic supplementary material, figure B.4), we see a general widening of the approximate marginal posteriors relative to those produced using full time course data, as well as multimodality approaching uniformity in the posteriors over parameters such as *G*_{Nab} and *G*_{NaCa}. In table 2, we quantify the increase in uncertainty resulting from the use of summary statistics by presenting the ratio of posterior to prior standard deviation, both when full data are employed (*σ*_{P}/*σ*_{0}) and when only summary statistics are employed (*σ*_{A}/*σ*_{0}). These ratios normalize against the scale and initial spread of the parameters, allowing for the assessment of relative uncertainty about a parameter after inference.

In table 2, we see that *G*_{Na} and *G*_{Kr} remain the two best constrained parameters, as indicated by the value of *σ*_{A}/*σ*_{0}, with nothing else even reaching 50% reduction in uncertainty. If we compare the uncertainty reduction for these two descriptors to that when full time course data were used (*σ*_{P}/*σ*_{0}), however, we see that *G*_{Na} only shows less than two-fold relative increase in uncertainty when summary statistics were employed, while *G*_{Kr} shows a nearly 10-fold increase in relative uncertainty when these summary statistics are employed. Thus, while we can tell simply by examining posterior–prior variance ratios that the five summary statistics contain the most information about *G*_{Na}, and to a lesser extent *G*_{Kr}, we can tell by comparing to the corresponding ratios from exact inference over time-course data that a disproportionate amount of information is being lost by some major (*G*_{Kr}, *G*_{to}, *G*_{K1}) and minor (*G*_{Nab}, *G*_{NaK}) currents when these summary statistics are employed.

To control for under determination in the system (the 13 free parameters exceeding the five data points) when summary statistics are employed, we repeated both fittings—using full time trace data for the exact Del Moral sampler and using summary statistic data for the approximate Del Moral sampler—allowing only the five currents whose conductances were best constrained by the summary statistics in our previous inference (as indicated by *σ*_{A}/*σ*_{0} in table 2) to vary, and holding all other parameters at their reported values. While *G*_{Na} and *G*_{Kr} were clearly the two best constrained, the next best three—*G*_{pCa}, *G*_{to} and *G*_{Cab}—belonged to a class of parameters with roughly identical values of *σ*_{A}/*σ*_{0}, and thus their selection was somewhat arbitrary.

We see in table 3, as well as electronic supplementary material, figures B.5 and B.6, that, under these conditions, there is better agreement between the posterior mean and the reported value for both exact and approximate inference, demonstrating the solver's increased ability to cover the parameter space in lower dimensions for a fixed number of particles.

In table 3, we see that relative to the posteriors on the 13-parameter model, the three major currents in this model—*G*_{Na}, *G*_{Kr}, *G*_{to}—show similar or decreased posterior uncertainty when time-series data are employed, and dramatically reduced uncertainty when summary statistics are employed. This suggests that when compensatory interactions between model currents are removed, the approximate sampler in particular is better able to constrain these individual major currents. We notice, however, by comparing *σ*_{A}/*σ*_{0} to *σ*_{P}/*σ*_{0} in the 5-parameter model, that *G*_{Kr} and *G*_{to} still exhibit a disproportionate increase in posterior uncertainty relative to *G*_{Na} when employing summary statistics, just as they did in the 13-parameter model. *G*_{Na} in fact exhibits a slight decrease in *σ*_{A}/*σ*_{0} relative to *σ*_{P}/*σ*_{0}, though this is probably an artefact of the solver as the information contained in the summary statistics is necessarily a subset of the information contained in the full time trace.

Even when time trace data are available, however, it is relatively more difficult to estimate background currents such as *G*_{pCa} and *G*_{Cab} in the 5-parameter model. In table 3, we see that the value of *σ*_{A}/*σ*_{0} for these minor currents is much higher than that of the major currents, and actually shows an average-case increase in uncertainty relative to the 13-parameter model. This may be due to the effects of noise added to the action potential, which causes the true maximally likely model parameters to differ slightly from the parameters used to generate the data when only a finite number of observations are used during fitting. Thus, when we fixed eight of the model parameters to the values from table 1 (as opposed to the best values found when fitting to the full trace), we may have influenced the solver to explore a region of ‘flatter’ likelihood for the parameters *G*_{pCa} and *G*_{Cab}, resulting in an increase in uncertainty. Because summary statistics were not shown to have significantly more information about these parameters than others in the 13-parameter model, it is unsurprising that the posteriors produced by approximate inference in this case are completely uninformative relative to the prior, as indicated by a *σ*_{A}/*σ*_{0} of 1 in table 3. The degree of apparent constraint of these minor current parameters in the approximate posterior over the 13-parameter model was probably an artefact of under determination of the system and an insufficient number of particles in the solver.

## 5. Computation

All code used in this study was written in Python.

We found that, on a 6-core (hyperthreaded) Intel Xeon machine at 3.5 GHz with 16 GB RAM, performing inference on the linear model using the approximate Toni algorithm took approximately 3090 s, while inference using the Del Moral algorithm took approximately 400 s. The calculation of weights for our implementation of the Toni algorithm contributed significantly to this runtime, taking of the order of 60 s per iteration. We recognize inefficiencies in this particular implementation brought on by an abstraction layer that we have enforced, and as such future implementations may significantly improve this step in particular, which amounts to evaluating and summing *N*^{2} PDFs. While the exact Del Moral algorithm was almost identical in speed to the approximate version, we found that the performance of the exact Toni algorithm varied greatly depending on the normalizing constant, which controls the acceptance rate and, consequently, the accuracy of the algorithm, as we have shown.

On the same machine, a single forward simulation of the O'Hara–Rudy model took 0.064 s. Pairing this simulation with summary statistic calculation brought the overall forward simulation time up to 0.45 s, as calculation of APD with post-processing functions in functional curation is particularly expensive. As a result, execution of the approximate Del Moral algorithm on the full 13-parameter model took 72 716 s, while the exact Del Moral algorithm, which did not perform summary statistic calculation, took only 7891 s. On the 5-parameter model, the approximate Del Moral algorithm took 63 509 s, while the exact variant took only 7512 s.

## 6. Discussion

When we performed approximate inference on the linear model, we saw that the Del Moral sampler performed noticeably worse than the Toni sampler under identical settings, returning highly varying posteriors inconsistent with the uniformity expected under such an inference strategy. We presumed that this increase in variance was due to the approximation employed by equation (2.6), which only holds when the evolution of the posterior estimates is very gradual. We confirmed this by showing that increasing the number of particles in the sampler (to mitigate the effects of randomness) and/or relaxing the cooling schedule (controlled by the *α* parameter), both of which reduce the difference between subsequent estimates, led to a drastic increase in posterior uniformity.

The performance of the Del Moral sampler on this problem was probably additionally hindered by the fact that we limited ourselves to a single noisy dataset. When using the top-hat kernel and a single data realization, the weights of the Del Moral sampler (equation (2.5)) become binary indicators of whether or not the particle meets the final threshold. This scheme has inherently high variation relative to the Toni weighting scheme (equation (2.5)), in which all particles are guaranteed to have non-zero weight at all times due to rejection sampling, and weights are smoothed by a denominator informed by the previous posterior estimate and proposal distribution. When repeated experimental data are available, the variance of the Del Moral weighting scheme can be reduced, as the weights may instead indicate the *fraction* of experimental data that is within a threshold of the simulated data rather than a binary quantity. Similarly, when employing a stochastic model, one may generate multiple simulations (at added computational cost) for each proposed parameter set and weight particles based on the fraction of simulated data that falls within a threshold of the experimental data. In the case of deterministic models and singular data, however, our results indicate that the approximate Del Moral sampler may require a larger number of particles and/or a reduced cooling schedule to match the performance of the approximate Toni sampler.

Despite this advantage, our investigation into the linear and polynomial models revealed that the Toni sampler is highly sensitive to the choice of a kernel normalizing constant when performing exact inference, and relatively sensitive to the dimension of the model when performing either exact or approximate inference. When performing Gaussian inference on the linear model, we observed that setting the kernel normalizing constant too high resulted in a large number of rejected samples in later iterations that eventually caused the algorithm to terminate before converging to the true posterior (achieved when *ɛ* = 1), while setting the constant too low resulted in a uniform posterior due to decapitation of the kernel. While, in theory, setting the normalizing constant to the modal value of the distribution will cap the value of the acceptance kernel at 1 and lead to exact estimation, we have found that, in practice, small errors in estimation resulting from experimental noise can result in underestimation and decapitation of the posterior even for the linear model. For more complex models, locating the true modal value will be harder and more expensive, and without extensive system-specific knowledge it would be difficult to choose a coefficient that avoids truncation while ensuring a reasonable acceptance rate.

As the model increases in complexity, as in the case of the polynomial model, as we increased the dimensionality of the parameter space, we demonstrated the tendency of particle weights in the Toni sampler to fluctuate highly between iterations. When the parameter space is large relative to the number of particles in the posterior, we expect the denominator of the Toni weighting scheme (equation (2.5)) to have a high variance due to poor coverage of the proposal distribution *K*(*θ* | *θ**) by the posterior particles. As such, even with a uniform prior, certain particles may become severely upweighted between iterations and the posterior estimate may collapse to one or several points. The Del Moral weighting scheme (equation (2.6)) is immune to this effect, and the resampling step in the algorithm itself (algorithm 3, lines 16–21) effectively caps the variance that the particle weights may attain. Beyond simply increasing the number of particles, which comes at a computational cost in the weighting scheme, the performance of the Toni sampler might be improved by employing an adaptive proposal distribution that can redistribute weight on distributions that become too peaked, or by explicit resampling as in the Del Moral scheme, but under current formulation the Del Moral sampler is preferable for high-dimensional parameter spaces.

The effects of model dimensionality were also seen during our investigation into the O'Hara–Rudy cardiac action potential model. For a fixed number of 1000 particles, the Toni sampler struggled with convergence due to the aforementioned weighting difficulties, but our results from the probabilistic Del Moral sampler suggest that it may be inherently hard to cover the 13-dimensional parameter space with this number of particles due to the presence of local minima. Indeed, we saw that fixing a number of the model currents allowed for the more reliable recovery of the globally optimal values for the remaining free parameters in both the exact and approximate Del Moral samplers. We believe this result highlights the importance of *a priori* analyses of high-dimensional biological models, where fitting to simulated data with a known solution can highlight the ability of the solver to recover a known set of parameters in the face of system redundancy.

By examining the posteriors produced by approximate Bayesian inference over the summary statistics APD50, APD90, peak potential, resting potential and maximum upstroke velocity, we found that these measurements contained the greatest amount of information about the conductances of the fast sodium (*G*_{Na}) and rapid delayed rectifier (*G*_{Kr}) currents, as measured by the relative width of their marginal posterior distributions after approximate inference. This makes intuitive sense, as fast sodium current activity is extremely localized to the rapid depolarization phase of the action potential, which is well characterized by maximum upstroke velocity and peak potential, while the rapid delayed rectifier current is involved in repolarizing the membrane in the later stages of the action potential, a phase which may be well characterized by ADP50, APD90 and resting potential. Other model currents whose phases of activity were not characterized well by the summary statistics, such as the slow delayed rectifier current *G*_{Ks} which describes a small repolarization of the membrane occuring after reaching peak potential but long before APD50 or APD90, showed very little decrease in posterior variability relative to the prior. This suggests that a certain amount of domain-specific knowledge can be employed by experimentalists when choosing the summary statistics to use in fitting currents within a model. However, as we have shown, once these summary statistics have been chosen, ABC methods should then be used to assess the extent to which these statistics can capture the information necessary to fit to all of the parameters of interest. In our case, it is clear that the chosen set of five summary statistics contain sufficient information about only one of the parameters of interest, *G*_{Na}, as this was the only parameter that could be recovered to a precision similar to that achievable with the full dataset. Care should therefore be taken not to over-interpret changes in the observed values of such summary statistics in experimental data as being in any way indicative of changes in the underlying values of the conductance parameters other than *G*_{Na}, and to a more limited extent *G*_{Kr} and *G*_{to}.

Future work on these samplers will move beyond the realm of simply assessing the amount of information contained in a given experiment, such as the fixed set of summary statistics presented here, and into the automated design of maximally informative experiments (often referred to as ‘optimal experimental design’). This may consist of optimization over a parametrized input to the system, such as a voltage signal to be applied to the membrane before observing its response over time, or the selection of the subset of summary statistics that contain the most information about the system parameters from a list of possible measurements. Summary statistical data might also be employed for reliable model selection in an ABC context, if multiple formulations are being considered to describe the same phenomenon [44]. Such automated methods could aid experimentalists in complex systems where the most informative measurements to take are not immediately obvious, or may be expensive to record/calculate, and a trade-off between information content and simplicity must be reached for experimental measurements. Algorithms for automated experiment design would involve the repeated execution of posterior analyses as presented in this paper, and thus would be aided by improvements to sampler efficiency, either through additional parallelization or speed-ups in model simulation and summary statistic calculation.

## 7. Conclusion

We have presented two SMC samplers, originally proposed for approximate Bayesian inference, that have been adapted to perform exact inference by replacing the kernel function. While both samplers demonstrated sensitivity to the number of parameters in the model to be fitted, we demonstrated the additional sensitivity of the Toni sampling scheme to the choice of the kernel normalizing constant during probabilistic inference, which limits the appeal of this algorithm for the purposes of performing exact inference. The Del Moral sampler, which does not share this sensitivity due to its use of Metropolis–Hastings update steps, was able to perform Bayesian inference on the O'Hara–Rudy cardiac action potential model, as well as approximate Bayesian inference on the same model using a set of five common summary statistics. Comparison of the posteriors produced under these inference settings allowed us to conclude that the summary statistics chosen contained a large amount of information about the conductances of the fast sodium current, rapid delayed rectifier current and inward rectifier current, as represented in the model, but were insufficient to constrain parameters controlling many of the background currents or currents primarily active between the time of peak membrane voltage and substantial membrane repolarization. Future developments on these methods may be focused on including larger bodies of data, as well as improvements to the efficiencies of the samplers in order to facilitate the efficient exploration of large parameter spaces.

We hope that this study serves as an example of an identifiability analysis of cardiac models, as well as other biologically relevant models, that may be performed under both Bayesian and approximate Bayesian schemes prior to gathering experimental data. Performing this type of analysis would allow modellers to assess the sufficiency of experimental recordings to capture information about model parameters and subcomponents, and could therefore lead to the design of experiments that constrain the model most effectively before much time and money are expended in laboratories or clinics. Thus, we believe that the understanding and informed use of sampling techniques as described in this paper can both facilitate and accelerate the production of cardiac models when *in vitro* or *in vivo* conditions place limitations on one's ability to observe the system.

## Data accessibility

All code and data used in the study can be found (along with instructions for installation and execution of analyses) in the following ‘paper tutorial’ repository: https://chaste.cs.ox.ac.uk/trac/wiki/PaperTutorials/DalySMC.

## Authors' contributions

A.C.D. conceived of the study, participated in the design of the study, carried out all analyses and drafted the manuscript. C.H. participated in the design of the study, validation of results and editing of the manuscript. D.J.G. and J.C. participated in the design of the study and editing of the manuscript.

## Competing interests

We declare we have no competing interests.

## Funding

A.C.D. was funded by the Rhodes Trust, UK. J.C. gratefully acknowledges research support from the ‘2020 Science’ programme funded through the EPSRC Cross-Disciplinary Interface Programme (EP/I017909/1).

## Acknowledgements

A.C.D. thanks the Rhodes Trust for its funding and support.

## Footnotes

Electronic supplementary material is available online at https://dx.doi.org/10.6084/m9.figshare.c.3876085.

- Received May 9, 2017.
- Accepted August 29, 2017.

- © 2017 The Author(s)

Published by the Royal Society. All rights reserved.