## Abstract

As systems approaches to the development of biological models become more mature, attention is increasingly focusing on the problem of inferring parameter values within those models from experimental data. However, particularly for nonlinear models, it is not obvious, either from inspection of the model or from the experimental data, that the inverse problem of parameter fitting will have a unique solution, or even a non-unique solution that constrains the parameters to lie within a plausible physiological range. Where parameters cannot be constrained they are termed ‘unidentifiable’. We focus on gaining insight into the causes of unidentifiability using inference-based methods, and compare a recently developed measure-theoretic approach to inverse sensitivity analysis to the popular Markov chain Monte Carlo and approximate Bayesian computation techniques for Bayesian inference. All three approaches map the uncertainty in quantities of interest in the output space to the probability of sets of parameters in the input space. The geometry of these sets demonstrates how unidentifiability can be caused by parameter compensation and provides an intuitive approach to inference-based experimental design.

## 1. Introduction

Biologists, in collaboration with mathematicians, statisticians, computer scientists and engineers, are building increasingly sophisticated mathematical models of both individual biological processes and the complex behaviour that results through interactions between these processes (often termed ‘systems biology’). In fields such as cardiac mechanics, where modelling approaches have been developed over many decades, comprehensive, state-of-the-art models are often constructed from smaller models of individual phenomena, such as ion channel kinetics. Increasingly, researchers are understanding the importance of validating model predictions at all levels. Employing component models without a complete understanding of the means by which they were developed and parametrized can lead to unexpected consequences in new settings [1,2]. For mathematical models of biological systems to be of greatest utility, we need to understand not only the process by which parameters were fit using experimental data, but the degree to which the parameters of such models can be constrained to unique values given the experimental data, or at least constrained to a range of values within acceptable physiological limits. If multiple model parametrizations can produce nearly identical behaviour, then the variation across these equally valid parameters might inform us about variation in the biophysical entities they represent. In extreme cases however, such variation may reduce our faith in the suitability of the model to represent the system.

The degree to which a set of model parameters can be determined from experimental data is known as the *identifiability* of the model. When all model parameters can be uniquely determined from the data in the absence of noise, and can be constrained within a reasonable range of values when noise *is* present in the data, the model is said to be *identifiable*. When the parameters cannot all be uniquely determined from the data in the absence of noise, or when parameters cannot be constrained within a reasonable range in the presence of noise, the model is said to be *unidentifiable*. ‘Structural unidentifiability’ describes a fundamental lack of information about one or more model parameters and results in unbounded variation of the parameters even in the absence of noise. ‘Practical unidentifiability’ describes extreme sensitivity of the recovered model parameters to noise. Unlike structural unidentifiability, this *may* be remedied by an increase in recording precision or experimental control [3]. Concerns regarding identifiability, its causes and means to address unidentifiability have been raised in many areas of science and engineering (e.g. [4,5]), including many of biological significance such as cardiac modelling [6,7]. A discussion of the consequences of model identifiability on prediction appears in [8,9]. The relationship between identifiability and ‘sloppiness’, the notion that parameters can vary substantially without significantly influencing the outcome of the model, is discussed by Chis *et al.* [4]. Those authors conclude that ‘structural and practical identifiability analyses are better tools for assessing confidence in parameter estimates’.

The desire to avoid unidentifiability when building and parametrizing models leads naturally to consideration of the design of experiments that are maximally informative about the parameters of a given model. Optimal experimental design is concerned with determining the best times to observe the system and the best quantities to observe at those times, in order to avoid structural unidentifiability and minimize practical unidentifiability. Some of the most popular approaches to experimental design are the family of methods that focus on the spectral properties of the Fisher information matrix (FIM), which quantify the sensitivity of the model output to changes in system parameters. These are the same indices used in the calculation of classical confidence estimates, which will become very wide when the recorded model output is insensitive to parameter changes as a result of structural or practical unidentifiability. Because these indices are relatively inexpensive to compute, a family of methods to maximize FIM-based indices of identifiability has been proposed in [10,11]. We do not conduct a detailed exploration of the relationships between optimal design and identifiability. The interested reader is referred to [10]. This paper focuses instead on an intuitive approach to experimental design based on the inferential analyses described below.

When the observable model output is not analytically differentiable with respect to the parameters of interest, as often occurs in cardiac modelling, it may be difficult to apply methods based on the FIM. While numerical approximations of these derivatives may be calculated, such approximations are sensitive to numerical error and scale poorly with parameter dimension. Instead, we may prefer to consider methods that are based on inference from experimental data. We may perform Bayesian or approximate Bayesian inference given a set of experimental data, and draw conclusions about the identifiability of the model based on the size and character of the variation in the posterior distribution over the model parameters. The utility of inference techniques for assessing model identifiability is being increasingly recognized in the field of biochemical modelling. Discussions and evaluations of competing approaches appear in [12–16]. For example, Hines *et al.* [17] demonstrate the use of posteriors produced by a Markov chain Monte Carlo (MCMC) sampler to assess the identifiability of several competing models of reaction kinetics for a fixed set of data.

Complicating all these approaches are the facts that: (i) experimental data are often very noisy, (ii) measurements may only be possible at a limited set of time points, (iii) only summary statistics of the data may be available (these are often termed ‘biomarkers’ in physiological system modelling). Further, (iv) the model may contain many unknown parameters, and (v) the full biological model may be expensive to compute. Any or all of the above may limit our ability to calculate the FIM or perform parametric inference.

In this paper, we review and compare methods for assessing model identifiability of *deterministic* models. We are particularly interested in the under-determined case where there are more parameters to be estimated than observations, and consider methods based on an *effective* FIM, a ‘measure theoretic’ approach and a commonly used Monte Carlo-based approach. We deliberately consider small models in order to concentrate on the concepts rather than the details of the implementations. While more optimized variants of both ‘measure theoretic’ and Monte Carlo algorithms have been formulated (the reader is referred to [18,19], respectively), the implementations considered in this paper have been shown to be robust across a number of modelling studies. The relative computational cost of the inferential methods will vary according to algorithm, implementation details and scale, so we choose the number of function evaluations as a machine-independent assessment of computational effort within this study. In §2, we introduce the effective FIM- and inference-based methods for assessing model identifiability. In §3, we present the logistic growth model and assess its identifiability using all three techniques. In §4, we present the Hodgkin–Huxley model and demonstrate the capabilities of inference-based methods when observation is limited to summary statistics of the model output. In §5, we present our conclusions and discuss future directions for this research. We hope that this study serves as a guide to the selection and correct interpretation of identifiability metrics for nonlinear biological models, as well as an exploration of inference-based methods for the design of maximally informative experiments.

## 2. Mathematical and computational tools

### 2.1. Fisher information

The Fisher information and the FIM can be defined for a random variable or variables whose distributions depend upon an unknown parameter or parameters. When model outputs **q**(*t*) are differentiable with respect to the parameters **θ**, the FIM can be used to quantify the amount of information about unknown model parameters one can expect to gain from observing **q**(*t*). Assume that we observe the system outputs *y*_{i} at a discrete set of times *t*_{i}, *i* = 1, …, *n*, and compare them to the model predictions *x*(*t*_{i}). A common approach to the inverse problem is to solve the least-squares minimization problem: find parameters **θ** to minimize the least-squares error

$$J(\boldsymbol{\theta}) = \sum_{i=1}^{n} \bigl( y_i - x(t_i; \boldsymbol{\theta}) \bigr)^2. \tag{2.1}$$

This is equivalent to a maximum-likelihood approach under the assumption of Gaussian error. Where applicable, we can then use standard asymptotic methods based on the FIM to obtain confidence intervals for **θ** (e.g. [20]). It should be noted, however, that standard confidence intervals are only exact when the model output is linear in its parameters, and may fail to provide a good estimation of uncertainty when the amount of experimental data is not large with respect to the number of parameters, or if the magnitude of noise is too great [21]. This limits the appeal of standard confidence intervals for studies involving modern biological models, which frequently violate all three of these conditions.
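As a concrete illustration of the procedure above, the following sketch minimizes the least-squares error (2.1) over a parameter grid and uses the (here, scalar) Fisher information to form an asymptotic confidence interval. The one-parameter exponential decay model, the time points and the noise level are all illustrative assumptions, chosen only to keep the example self-contained:

```python
import math

def model(t, k):
    """Toy observable x(t; k) = exp(-k t); a hypothetical stand-in
    for the model predictions x(t_i) in equation (2.1)."""
    return math.exp(-k * t)

# Synthetic, noise-free observations y_i at times t_i (true k = 0.5)
times = [0.5 * i for i in range(1, 11)]
k_true, sigma = 0.5, 0.05
data = [model(t, k_true) for t in times]

def sse(k):
    """Least-squares error J(k) from equation (2.1)."""
    return sum((y - model(t, k)) ** 2 for y, t in zip(data, times))

# Crude grid-search minimizer (adequate for a 1-D illustration)
k_hat = min((0.001 * j for j in range(1, 2000)), key=sse)

# Scalar Fisher information F = (1/sigma^2) * sum_i (dx/dk)^2,
# with dx/dk = -t * exp(-k t) computed analytically
F = sum((t * model(t, k_hat)) ** 2 for t in times) / sigma ** 2

# Asymptotic 95% confidence interval: k_hat +/- 1.96 / sqrt(F)
half_width = 1.96 / math.sqrt(F)
print(k_hat, k_hat - half_width, k_hat + half_width)
```

With noise-free data the minimizer recovers the true rate, and the interval width shows directly how low sensitivity (small F) widens the confidence interval, which is the mechanism behind the practical unidentifiability discussed below.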

### 2.2. An effective Fisher information matrix

Throughout our analyses, we will be considering the *deterministic* time-dependent systems *x*(*t*, **θ**) with state vector $\mathbf{x} \in \mathbb{R}^{N_x}$ and system parameters $\boldsymbol{\theta} \in \mathbb{R}^{N_\theta}$. The evolution of these systems will be described by systems of ordinary differential equations of the form

$$\frac{\mathrm{d}\mathbf{x}}{\mathrm{d}t} = \mathbf{f}(\mathbf{x}, t; \boldsymbol{\theta}). \tag{2.2}$$

We will also consider cases where we observe ‘quantities of interest’ of the system state, rather than observing the state directly, when these quantities of interest are functions of the solution and parameters, i.e. when

$$\mathbf{q}(t) = \mathbf{q}(\mathbf{x}(t; \boldsymbol{\theta}), \boldsymbol{\theta}), \tag{2.3}$$

where $\mathbf{q} \in \mathbb{R}^{N_q}$. We define an effective FIM for the deterministic case, examine the ability of this effective FIM to assess model identifiability and compare the results of these assays to inference-based approaches.

#### 2.2.1. The sensitivity matrix

The sensitivity matrix **M** defines the rate of change of the solution, or the rate of change of quantities of interest derived from the solution, with respect to the parameters. In general, the sensitivity matrix is defined as

$$M_{ij} = \frac{\partial q_i}{\partial \theta_j}, \quad i = 1, \ldots, N_q, \; j = 1, \ldots, N_\theta. \tag{2.4}$$

#### 2.2.2. The effective Fisher information matrix

An effective FIM can be defined as

$$\mathbf{F} = \mathbf{M}^{\top}\mathbf{M}. \tag{2.5}$$

In practice, we never construct **F** but perform all operations on the sensitivity matrix **M**. Consider the singular value decomposition

$$\mathbf{M} = \mathbf{U}\mathbf{S}\mathbf{V}^{\top}, \tag{2.6}$$

where *U* and *V* are unitary matrices and *S* is a diagonal matrix with real (diagonal) entries.

We need to consider situations in which *N*_{q} ≥ *N*_{θ} and *N*_{q} < *N*_{θ}, and define *N*_{S}, the maximal possible rank of **M**, as

$$N_S = \min(N_q, N_\theta). \tag{2.7}$$

This is the maximum number of non-zero singular values of **M**.

A simple calculation shows that the FIM **F** and the *N*_{θ} × *N*_{θ} diagonal matrix *S*^{⊤}*S* are unitarily similar. The non-zero real eigenvalues of the FIM **F**, typically labelled *σ*_{i}, *i* = 1, …, *N*_{S}, are related to the singular values of the sensitivity matrix **M**, labelled *s*_{i}, *i* = 1, …, *N*_{S} (the diagonal entries of matrix *S*), as

$$\sigma_i = s_i^2, \quad i = 1, \ldots, N_S. \tag{2.8}$$

If the singular values are ordered such that *s*_{1} ≥ *s*_{2} ≥ ⋯ ≥ *s*_{N_S}, and if *U* has columns {*u*_{i}}, *i* = 1, …, *N*_{q} and *V* has columns {*v*_{i}}, *i* = 1, …, *N*_{θ}, then for *ν* ≤ *N*_{S} the matrix $\mathbf{M}_{\nu} = \sum_{i=1}^{\nu} s_i \mathbf{u}_i \mathbf{v}_i^{\top}$ is the best *ν*-rank approximation to **M** as measured in the 2-norm. Specifically, defining *s*_{N_S+1} = 0,

$$\|\mathbf{M} - \mathbf{M}_{\nu}\|_2 = s_{\nu+1}.$$

For vector-valued quantities of interest observed at discrete times *t*_{obs} = {*t*_{1}, …, *t*_{n}}, the sensitivity matrix is

$$\mathbf{M}_n = \begin{pmatrix} \partial \mathbf{q}(t_1)/\partial \boldsymbol{\theta} \\ \vdots \\ \partial \mathbf{q}(t_n)/\partial \boldsymbol{\theta} \end{pmatrix}. \tag{2.9}$$

The corresponding effective FIM is given by equation (2.5).
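To make these relations concrete, the following minimal sketch (with the entries of the sensitivity matrix chosen arbitrarily for illustration) forms the effective FIM of an *n* × 2 sensitivity matrix and recovers its singular values from the FIM eigenvalues, together with the ratio of largest to smallest singular value used later in the paper:

```python
import math

def singular_values_2col(M):
    """Singular values of an (n x 2) sensitivity matrix M, computed
    from the eigenvalues of the 2 x 2 effective FIM F = M^T M
    (equation (2.5)); by equation (2.8), sigma_i = s_i^2."""
    a = sum(row[0] ** 2 for row in M)          # F[0][0]
    b = sum(row[0] * row[1] for row in M)      # F[0][1] = F[1][0]
    c = sum(row[1] ** 2 for row in M)          # F[1][1]
    disc = math.sqrt((a - c) ** 2 + 4 * b ** 2)
    eigs = [(a + c + disc) / 2, (a + c - disc) / 2]  # sigma_1 >= sigma_2
    return [math.sqrt(max(e, 0.0)) for e in eigs]

# Example: three observations of a two-parameter model
M = [[1.0, 0.0],
     [0.0, 2.0],
     [0.0, 0.0]]
s1, s2 = singular_values_2col(M)
kappa = s1 / s2  # ratio of largest to smallest singular value
print(s1, s2, kappa)  # 2.0 1.0 2.0
```

A rank-deficient **M** would yield *s*_{2} = 0 and an infinite ratio, the signature of structural unidentifiability discussed in §3.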

### 2.3. Inferential methods

Inferential methods use experimental data to estimate *distributions* of the model parameters, and seek to simultaneously infer the unknown parameters and assess the degree to which they are unique.

#### 2.3.1. A Monte Carlo method

Monte Carlo methods are a class of algorithms that generate numerical results through repeated stochastic sampling. When a ‘likelihood function’, which quantifies the probability of observing a set of experimental data under a given set of model parameters, is available, Monte Carlo methods can be used to sample from the posterior distribution over model parameters through the application of Bayes' rule,

$$P(\boldsymbol{\theta} \mid D) \propto P(D \mid \boldsymbol{\theta})\, \pi(\boldsymbol{\theta}), \tag{2.10}$$

where *D* are the data and *π*(**θ**) is a prior distribution over model parameters. The most common class of Bayesian Monte Carlo methods for inference are MCMC methods, and in this study we employ an ‘adaptive covariance’ MCMC sampler (electronic supplementary material, algorithm A.1) [22] to infer posteriors over model parameters, thereby gaining insight into the magnitude and character of uncertainty in model parameters. When the likelihood function *P*(*D* | **θ**) is unknown or incalculable, Monte Carlo methods can be employed for *approximate* Bayesian computation (ABC) [23], which approximates the missing likelihood function with a distance metric between simulated and experimental data, often resulting in an inflation of uncertainty in the posterior. In this study, we employ a sequential Monte Carlo ABC sampler [24] to perform inference on the Hodgkin–Huxley model when experimental conditions prevent us from being able to calculate *P*(*D* | **θ**) (electronic supplementary material, algorithm A.3).

Because there are recognized concerns about the convergence of Monte Carlo methods in the face of model unidentifiability [25], we have verified that all our MCMC chains terminate in ‘well-mixed’ conditions (i.e. demonstrating unbiased exploration about a mean value), and have repeated all our inferences to ensure consistency. While this may be sufficient for the smaller models considered in this study, more sophisticated models present a risk of MCMC chains becoming ‘stuck’ in local minima, giving the outward appearance of convergence. In these scenarios, more advanced approaches to convergence assessment must be taken, often involving multiple random restarts of the MCMC chain to ensure convergence to a global optimum. Bayesian approaches to inverse problems are discussed extensively in [26–28].
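The adaptive covariance sampler itself is given in the electronic supplementary material; as a minimal sketch of the underlying mechanics, the following plain random-walk Metropolis sampler (with an arbitrary Gaussian toy posterior standing in for *P*(*D* | **θ**)*π*(**θ**)) illustrates how posterior samples are generated and how burn-in is discarded:

```python
import math
import random

def log_posterior(theta):
    """Toy unnormalized log-posterior: a standard normal centred at 3.
    In practice this would be log P(D | theta) + log pi(theta)."""
    return -0.5 * (theta - 3.0) ** 2

def metropolis(log_post, theta0, n_steps, step=1.0, seed=1):
    """Random-walk Metropolis: propose theta' ~ N(theta, step^2) and
    accept with probability min(1, exp(log_post' - log_post))."""
    rng = random.Random(seed)
    chain, theta, lp = [], theta0, log_post(theta0)
    for _ in range(n_steps):
        prop = theta + rng.gauss(0.0, step)
        lp_prop = log_post(prop)
        if rng.random() < math.exp(min(0.0, lp_prop - lp)):
            theta, lp = prop, lp_prop
        chain.append(theta)
    return chain

chain = metropolis(log_posterior, theta0=0.0, n_steps=5000)
burned = chain[2500:]  # discard burn-in, as in the analyses below
mean = sum(burned) / len(burned)
print(mean)  # close to the posterior mean of 3.0
```

Adaptive covariance samplers improve on this sketch by tuning the proposal distribution from the chain history, which matters greatly when the posterior is strongly correlated, as for the banana-shaped posteriors seen in §3.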

#### 2.3.2. A measure theoretic method

Consider a deterministic map *g* from an input (parameter) domain *Θ* onto an output range. The measure theoretic method recognizes that when *N*_{q} < *N*_{θ} (i.e. there are fewer observed quantities of interest than parameters), the inverse *g*^{−1} is not a one-to-one map from the output space to the input space *Θ*, but rather a one-to-one map from the output space onto a space of *equivalence classes*, where each element in the space is the *set* of points *x* ∈ *Θ* that map to the same point (value) in output space. Under mild assumptions, these sets form manifolds or generalized contours in *Θ*, which are the analogue of an elevation contour on a topographic map. Observations of the quantities of interest provide experimental probabilities for each possible value to occur, and from these and the inverse map *g*^{−1} we can compute probabilities on the space of equivalence classes. Probabilities on the space of equivalence classes do not directly provide probabilities for the system parameters **θ** ∈ *Θ*, so in the absence of further information we assume the probability is uniformly distributed along the contours in parameter space corresponding to each equivalence class. An algorithmic implementation of this approach to posterior inference is presented in the electronic supplementary material, algorithm B.1.
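A minimal numerical sketch of this idea (not the full algorithm B.1; the map *g*, the grid and the observed output density here are all hypothetical) partitions a parameter grid into approximate equivalence classes by binning output values, assigns each bin the observed probability of its output value, and spreads that probability uniformly over the grid cells in the class:

```python
import math

def g(r, K):
    """Hypothetical scalar quantity of interest of two parameters."""
    return r * K

# Regular grid over the parameter domain Theta
rs = [0.1 + 0.01 * i for i in range(100)]
Ks = [10.0 + 0.1 * j for j in range(100)]
cells = [(r, K) for r in rs for K in Ks]

# Bin cells into approximate equivalence classes (generalized contours)
# by the value of g; cells in one bin are indistinguishable from data.
n_bins, lo, hi = 20, 1.0, 22.0
classes = {}
for (r, K) in cells:
    b = min(int((g(r, K) - lo) / (hi - lo) * n_bins), n_bins - 1)
    classes.setdefault(b, []).append((r, K))

# Observed probability of each output bin (hypothetical density),
# then spread uniformly across the cells of the equivalence class.
def output_prob(b):
    mid = lo + (b + 0.5) * (hi - lo) / n_bins
    return math.exp(-0.5 * ((mid - 8.0) / 2.0) ** 2)

total = sum(output_prob(b) for b in classes)
cell_prob = {}
for b, members in classes.items():
    p = output_prob(b) / total
    for cell in members:
        cell_prob[cell] = p / len(members)

print(sum(cell_prob.values()))  # ~1.0
```

The resulting probabilities are constant along each approximate contour of *g*, which is exactly the banded geometry seen in the recovered distributions of figure 2.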

### 2.4. Software

Simulations involved in Monte Carlo analyses were performed using the Chaste simulation environment [29], which employs the CVODE solver [30]. All Monte Carlo methods employed custom Python code. Simulations involved in measure theoretic analyses were performed using the standard Matlab (https://www.mathworks.com/) ODE solvers.

Because Monte Carlo and measure theoretic analyses were performed on different computers using in-house code optimized to varying degrees, we do not provide wall clock run times as a means of comparison. Instead, we employ a fixed number of function evaluations for all comparative analyses.

## 3. The logistic growth model

Having introduced the theoretical approaches that we will compare in §2, we begin our analysis by considering parameter inference for the following two-parameter logistic model, which is commonly employed to study bacterial proliferation:

$$\frac{\mathrm{d}x}{\mathrm{d}t} = r x \left( 1 - \frac{x}{K} \right), \quad x(0) = x_0. \tag{3.1}$$

In figure 1, we present the essential geometry of the solution to the above equation using growth rate *r* = 0.7, carrying capacity *K* = 17.5, initial population density *x*_{0} = 0.1 and final time *T* = 30. For the remainder of this study, we will consider *K* and *r* to be free parameters, treating *x*_{0} as known, and generate synthetic data according to the parametrization listed above.
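Equation (3.1) has a well-known closed-form solution, which the following sketch evaluates at the nominal parametrization to confirm the geometry described above (sigmoidal growth from *x*_{0} towards the carrying capacity *K*):

```python
import math

def logistic(t, r=0.7, K=17.5, x0=0.1):
    """Closed-form solution of dx/dt = r x (1 - x/K), x(0) = x0."""
    e = math.exp(r * t)
    return K * x0 * e / (K + x0 * (e - 1.0))

print(logistic(0.0))   # 0.1  (initial population density)
print(logistic(30.0))  # ~17.5 (approaches the carrying capacity K)
```

The early part of the curve is governed almost entirely by *r* and the plateau by *K*, which foreshadows the identifiability results below.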

### 3.1. Fisher information matrix-based identifiability analysis

We begin our examination of the logistic model by considering approaches to identifiability assessment based on the FIM, as detailed in §2. To apply these methods, we must first formulate the sensitivity matrix for the system.

The sensitivity matrix **M** for the logistic equation, when the quantity of interest is **q**(*t*) = *x*(*t*), is given by

$$\mathbf{M} = \left( \frac{\partial x}{\partial r}, \; \frac{\partial x}{\partial K} \right), \tag{3.2}$$

where the sensitivities

$$x_r = \frac{\partial x}{\partial r}, \quad x_K = \frac{\partial x}{\partial K} \tag{3.3}$$

are solutions of the two initial value problems

$$\frac{\mathrm{d}x_r}{\mathrm{d}t} = r\left(1 - \frac{2x}{K}\right) x_r + x\left(1 - \frac{x}{K}\right), \quad x_r(0) = 0,
\qquad
\frac{\mathrm{d}x_K}{\mathrm{d}t} = r\left(1 - \frac{2x}{K}\right) x_K + \frac{r x^2}{K^2}, \quad x_K(0) = 0. \tag{3.4}$$
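The initial value problems (3.4) can be integrated alongside the state equation itself. The sketch below does this with a simple forward-Euler scheme (the step sizes are illustrative choices) and cross-checks the computed sensitivity ∂*x*/∂*r* against a central finite difference of the closed-form logistic solution:

```python
import math

def simulate(r=0.7, K=17.5, x0=0.1, T=10.0, dt=1e-3):
    """Euler integration of the logistic equation together with the
    sensitivity equations (3.4) for x_r and x_K."""
    x, xr, xK = x0, 0.0, 0.0
    for _ in range(int(T / dt)):
        jac = r * (1.0 - 2.0 * x / K)            # df/dx
        dx = r * x * (1.0 - x / K)
        dxr = jac * xr + x * (1.0 - x / K)       # df/dr forcing
        dxK = jac * xK + r * x * x / (K * K)     # df/dK forcing
        x, xr, xK = x + dt * dx, xr + dt * dxr, xK + dt * dxK
    return x, xr, xK

def logistic_exact(t, r, K=17.5, x0=0.1):
    e = math.exp(r * t)
    return K * x0 * e / (K + x0 * (e - 1.0))

x, xr, xK = simulate()
# Central finite difference of the exact solution as a cross-check
h = 1e-5
fd = (logistic_exact(10.0, 0.7 + h) - logistic_exact(10.0, 0.7 - h)) / (2 * h)
print(xr, fd)  # the two estimates of dx/dr at t = 10 should agree
```

The same rows evaluated at a set of observation times *t*_{i} stack directly into the matrix *M*_{n} of equation (3.5) below.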

The singular value of the sensitivity matrix and the components of the corresponding singular vector are plotted in figure 1, and we may use these plots to draw conclusions about system sensitivity and identifiability. When measurements are taken at times *t* ≤ 10, the singular vector (which is also the eigenvector corresponding to the single non-zero eigenvalue of the FIM) is oriented in the direction of the growth rate *r* in parameter space. For *t* ≤ 10, the system is therefore sensitive to changes in the growth rate *r*, but largely insensitive to changes in the carrying capacity *K*. Conversely, for measurements taken at times *t* ≥ 20, the singular vector of the sensitivity matrix is oriented in the direction of the carrying capacity *K*, and the system is sensitive to changes in the carrying capacity *K* but largely insensitive to changes in the growth rate *r*. Both these conclusions are physically intuitive. Between *t* = 10 and *t* = 20, the singular vector rotates through *π*/2 and contains components of both parameters.

In the more realistic scenario where we consider *n* measurements simultaneously, the (*n* × 2) sensitivity matrix (2.9) at *n* discrete times *t*_{i}, *i* = 1, …, *n* is

$$\mathbf{M}_n = \begin{pmatrix} x_r(t_1) & x_K(t_1) \\ \vdots & \vdots \\ x_r(t_n) & x_K(t_n) \end{pmatrix}. \tag{3.5}$$

A common approach for collecting multiple measurements is to space them uniformly at a given granularity. In order to investigate the effects of sample size on model identifiability under these conditions, we present in table 1 the singular values of the sensitivity matrix *M*_{n} based on the solution at *n* = 1, 2, 3, 6 and 12 equally spaced points in [0, *T*]. In order to compare this uniform sampling strategy to one that incorporates system-specific information, the last row in table 1 presents an approach in which the system is measured repeatedly at the point of maximum sensitivity, i.e. at the time at which the (leading) singular value of the sensitivity matrix is maximized (figure 1). Such sampling strategies are often recommended by FIM-based methods for optimal experimental design, which are discussed in [10,11].

In table 1, we additionally provide the ratio *κ* of the largest to smallest singular values, which is also the square root of the condition number of the corresponding FIM. Raue *et al.* [21] define rank-deficiency of the FIM as indicating *structural* unidentifiability in the system. The FIM is rank-deficient and the system is structurally unidentifiable when fewer than two observations are taken. Given two or more observations, the FIM is full rank and the ratio *κ* may inform us as to the *practical* identifiability of the system, with large values of *κ* indicating practical unidentifiability. We note that determining precise thresholds for practical identifiability is difficult, as factors such as measurement precision may vary greatly between application domains and affect what we consider to be acceptable levels of confidence about parameter estimates. Nonetheless, both rank and condition number of the sensitivity matrix (or FIM) are important to consider when analysing identifiability.

Unsurprisingly, we see in table 1 that when a single measurement is collected, the sensitivity matrix (and by extension the FIM) of the logistic solution is rank-deficient, and thus the model is structurally unidentifiable. Table 1 shows that, in general, the condition number (*κ*) decreases as the number of uniformly spaced measurements in [0, *T*] increases. The small value of *κ* when the system is measured exclusively at *t*_{i} = 12, 24 is anomalous and may be explained by the fact that neither of these measurements fall near the point of inflection (*t* ≈ 7.9), where the system is maximally sensitive. In addition to the number of measurements, table 1 shows us that the *location* of measurements affects model identifiability. Clustering observations near the time at which the singular value of the sensitivity matrix is maximized results in a condition number several orders of magnitude greater than that using the same number of uniformly spaced measurements, and suggests practical unidentifiability. For this reason, modellers should take care when designing experiments based on simple spectral properties of the FIM, and should consider including realistic constraints such as minimum inter-measurement distances in their sampling strategies.

### 3.2. Inference-based identifiability analysis

Having explored the use of FIM-based approaches for considering parameter identifiability for the logistic equation, we now consider the measure-theoretic and MCMC inferential techniques. Throughout this section, we will be operating in the presence of experimental noise which introduces uncertainty in the measured outputs.

#### 3.2.1. Measure theoretic method

The probability distributions recovered by electronic supplementary material, algorithm B.1, for observations at various single time points in [0, *T*] are plotted in figure 2. We have assumed that the (experimental) measurements of the population are normally distributed with mean equal to the population given the nominal parameter values and standard deviation equal to 5% of the nominal population.

The conclusions that figure 2 supports are consistent with those obtained from an examination of the singular vector of the sensitivity matrix. At small times, the sets of equal probability are parallel to the *K*-axis, indicating that a single observation at small times enables us to assign different probabilities to different growth rates, but is completely uninformative about the carrying capacity *K*. At large times, the sets of equal probability are parallel to the *r*-axis, indicating that a single observation at large times enables us to assign different probabilities to carrying capacities *K*, but is completely uninformative about the growth rate *r*. A single solution at intermediate times enables us to assign different probabilities to different linear combinations of *r* and *K*, and we see for the first time the notion of parameter compensation (whereby interactions between the parameters can lead to identical changes in the measured outputs for ranges of values of *K* and *r*).

#### 3.2.2. Monte Carlo method

We may construct similar visualizations by applying the MCMC algorithm to compute the posterior distribution. We have assumed a Gaussian error model for observations *y*(*t*) for the logistic model, i.e.

$$y(t_i) = x(t_i; r, K) + \varepsilon_i, \quad \varepsilon_i \sim \mathcal{N}(0, \sigma^2), \tag{3.6}$$

and assumed uniform priors on both model parameters (equation (3.7)). In this section, we consider a noise model with constant Gaussian variance rather than the proportional noise model employed in the previous subsection. We will demonstrate how these error models, which are both biologically realistic (depending on the manner in which measurements are taken), produce qualitatively similar recovered distributions, in order to show the robustness of inference-based approaches in assessing model identifiability.
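Under the error model (3.6) with uniform priors, the unnormalized log-posterior that the sampler explores takes a simple form. A sketch follows; the prior bounds and noise level here are illustrative assumptions, and the logistic solution is evaluated in closed form:

```python
import math

def logistic(t, r, K, x0=0.1):
    e = math.exp(r * t)
    return K * x0 * e / (K + x0 * (e - 1.0))

# Illustrative (assumed) prior bounds and noise level
R_LO, R_HI, K_LO, K_HI, SIGMA = 0.1, 2.0, 5.0, 30.0, 0.5

def log_posterior(r, K, times, data):
    """log P(r, K | D) up to a constant: Gaussian likelihood (3.6)
    plus flat priors (zero inside the bounds, -inf outside)."""
    if not (R_LO <= r <= R_HI and K_LO <= K <= K_HI):
        return -math.inf
    return -0.5 * sum((y - logistic(t, r, K)) ** 2
                      for t, y in zip(times, data)) / SIGMA ** 2

times = [2.0, 7.0, 23.0]
data = [logistic(t, 0.7, 17.5) for t in times]  # noise-free synthetic data
lp_true = log_posterior(0.7, 17.5, times, data)
lp_off = log_posterior(0.9, 14.0, times, data)
print(lp_true, lp_off)  # the true parametrization scores higher
```

Passing this function to any Metropolis-type sampler yields posteriors of the kind shown in figure 3; only the choice of observation times differs between panels.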

Visualizations of posteriors produced by an MCMC sampler using the noise model (3.6) are presented in figure 3. Multiple samples were collected from the beginning (*t*_{i} = 1, 2, 3), middle (*t*_{i} = 6, 7, 8) or end (*t*_{i} = 22, 23, 24) of the time trace, or from all three locations (*t*_{i} = 2, 7, 23). Here, we employed three time points for all sampling schemes so that we may compare the identifiability of the system when measured at three distinct regions in ensemble to the identifiability of the system when measured at each region exclusively while maintaining a consistent sample size. For all analyses, we used *N* = 6000 steps for each sampler with a burn-in of 3000, which was sufficient to obtain unbiased sampling.

The MCMC posteriors in figure 3 again show vertical banding when *t* is small and horizontal banding when *t* is large. When *t* = {6, 7, 8}, a set of values near the maximum of the singular value of the sensitivity matrix, we observe a distinct maximum of the posterior distribution near the true parameter values. However, we also observe that the posterior distribution is highly curved, with tails extending to two extreme regions of parameter space. This suggests that collecting data only at this point allows for compensatory interactions between parameters, with different combinations of *K* and *r* producing outputs that are equally likely under the observation model. Thus, while the model parameters could be uniquely identified under this observational scheme in the absence of noise, the presence of noise causes confidence regions to become increasingly large and non-elliptical. If we were concerned with the 90% confidence region of these parameters, we would conclude them to be practically unidentifiable, as in a similar study by Hines *et al*. [17]. It is only when we observe the system at points taken from disparate phases of the curve (*t* = {2, 7, 23}) that we see an elliptical posterior concentrated within a small region of parameter space, indicating identifiability even in the presence of noise. Note the scale of the axes in the bottom right-hand plot compared to the axes of the other three plots.

## 4. The Hodgkin–Huxley model

We now revisit the analyses undertaken in §3 for a slightly more complex system—the three-parameter Hodgkin–Huxley action potential model, an historical model of excitation in the squid giant axon that forms the basis for current state-of-the-art models of action potentials in neuronal and cardiac cells. The Hodgkin–Huxley model is a system of differential equations that represents the transmembrane voltage *V* as a function of the membrane capacitance *C*_{m} and four separate transmembrane currents: sodium (*I*_{Na}), potassium (*I*_{K}), leakage (*I*_{l}) and stimulus (*I*_{Stim}). *I*_{Na} and *I*_{K} are controlled by transmembrane proteins, each comprising multiple subunits, all of which must be in an ‘open’ configuration to allow current to flow. These currents are modelled as polynomial functions of state variables *m*, *h* and *n*, which represent the time- and voltage-dependent probabilities of each type of subunit being open. This leads to the following system of four differential equations:

$$C_m \frac{\mathrm{d}V}{\mathrm{d}t} = -\bigl(I_{\mathrm{Na}} + I_{\mathrm{K}} + I_l + I_{\mathrm{Stim}}\bigr), \qquad \frac{\mathrm{d}w}{\mathrm{d}t} = \alpha_w(V)(1 - w) - \beta_w(V)\, w, \quad w \in \{m, h, n\}, \tag{4.1}$$

where *I*_{Na} = *G*_{Na}*m*^{3}*h*(*V* − *V*_{Na}), *I*_{K} = *G*_{K}*n*^{4}(*V* − *V*_{K}), *I*_{l} = *G*_{l}(*V* − *V*_{l}) and *I*_{Stim} = −20. *I*_{Stim} is generally modelled as a square wave to represent an infrequent, instantaneous application of pacemaker current that triggers the action potential, but we have instead considered a continuously applied current of magnitude 20 μA cm^{−2} so as to ensure smooth solutions. As we are only considering single action potentials in this study, we are not concerned with the ramifications of this simplification on long-term steady-state behaviour. The values of the reversal potentials *V*_{Na}, *V*_{K} and *V*_{l} (so called because they represent the membrane potential at which the associated current reverses direction) and the rate constants *α* and *β* are assumed known and are taken as reported in [31]. For the purposes of this study, the three unknown parameters are *G*_{Na}, *G*_{K} and *G*_{l}, which represent the maximum conductance through each channel type; their nominal values (in mS cm^{−2}) are given in equation (4.2). The initial conditions are assumed to be fixed and are given in equation (4.3). The solution of the membrane voltage using the nominal parameter values is shown in figure 4, while the solution for all four state variables is visualized in electronic supplementary material, figure C.1.
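A minimal sketch of the system (4.1) in code: the rate functions, reversal potentials, nominal conductances and resting initial conditions below are the classic values from the original Hodgkin–Huxley formulation [31], taken here as an assumption (in this convention *V* is the displacement from the resting potential, so the simulation starts at *V* = 0), integrated with forward Euler:

```python
import math

def vtrap(x, y):
    """x / (exp(x/y) - 1), guarded against the removable singularity."""
    if abs(x / y) < 1e-6:
        return y * (1.0 - x / (2.0 * y))
    return x / (math.exp(x / y) - 1.0)

# Standard Hodgkin-Huxley rate functions (assumed, from [31])
alpha = {'m': lambda V: 0.1 * vtrap(25.0 - V, 10.0),
         'h': lambda V: 0.07 * math.exp(-V / 20.0),
         'n': lambda V: 0.01 * vtrap(10.0 - V, 10.0)}
beta = {'m': lambda V: 4.0 * math.exp(-V / 18.0),
        'h': lambda V: 1.0 / (math.exp((30.0 - V) / 10.0) + 1.0),
        'n': lambda V: 0.125 * math.exp(-V / 80.0)}

def simulate(T=20.0, dt=0.005):
    """Forward-Euler integration of equation (4.1) with a continuously
    applied stimulus current I_Stim = -20."""
    V, m, h, n = 0.0, 0.05, 0.6, 0.325   # assumed resting initial state
    Cm, GNa, GK, Gl = 1.0, 120.0, 36.0, 0.3
    VNa, VK, Vl, Istim = 115.0, -12.0, 10.613, -20.0
    trace = []
    for _ in range(int(T / dt)):
        INa = GNa * m ** 3 * h * (V - VNa)
        IK = GK * n ** 4 * (V - VK)
        Il = Gl * (V - Vl)
        V += dt * (-(INa + IK + Il + Istim) / Cm)
        m += dt * (alpha['m'](V) * (1.0 - m) - beta['m'](V) * m)
        h += dt * (alpha['h'](V) * (1.0 - h) - beta['h'](V) * h)
        n += dt * (alpha['n'](V) * (1.0 - n) - beta['n'](V) * n)
        trace.append(V)
    return trace

V_trace = simulate()
print(max(V_trace))  # peak of the action potential (mV above rest)
```

With the continuous 20 μA cm^{−2} stimulus the model fires immediately, reproducing the upstroke, repolarization and recovery phases referred to in the analysis below.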

### 4.1. Direct observation of the voltage signal

While four state variables exist in the Hodgkin–Huxley model, the membrane voltage *V* is (generally) the only one that can be observed in practice, and we therefore consider only the effects of the parameters on this single output measure.

#### 4.1.1. Fisher information matrix-based identifiability analysis

We again start our analysis by considering a Fisher information-based approach to identifiability analysis. The sensitivity matrix **M** when the quantity of interest is the voltage, i.e. **q**(*t*) = *V*(*t*), is given by

$$\mathbf{M} = \left( \frac{\partial V}{\partial G_{\mathrm{Na}}}, \; \frac{\partial V}{\partial G_{\mathrm{K}}}, \; \frac{\partial V}{\partial G_l} \right), \tag{4.4}$$

where the sensitivities ∂*V*/∂*G*_{Na}, ∂*V*/∂*G*_{K} and ∂*V*/∂*G*_{l} (equation (4.5)) are solutions of the three initial-value problems (equation (4.6)) obtained by differentiating the system (4.1) with respect to each conductance parameter.

The solution, the singular value of the sensitivity matrix based on the membrane potential (only) and the corresponding singular vector are shown in figure 4. The conclusions to be drawn are less intuitive than in the logistic case. Spikes in the components of the singular vector associated with the sodium and potassium conductance parameters occur during the upstroke, repolarization and recovery, but the singular vector is almost always dominated by the component associated with the leakage current. This initially surprising result can be understood by constructing the FIM and noting that *m*, *n*, *h* < 1. Because the *G*_{Na} and *G*_{K} components of the FIM are high-order polynomial functions of these small variables, the constant-valued *G*_{l} component dominates over most of the time course, and thus controls the direction of the leading eigenvector. Phenomena such as this highlight the difficulties of interpreting the FIM directly, even in relatively simple systems.

We now turn our attention to cases involving *n* independent measurements to investigate the effects of sample size on model identifiability. The (*n* × 3) sensitivity matrix (2.9) at *n* discrete times *t*_{i}, *i* = 1, …, *n* is

$$\mathbf{M} = \begin{pmatrix} \dfrac{\partial V}{\partial G_{\mathrm{Na}}}(t_{1}) & \dfrac{\partial V}{\partial G_{\mathrm{K}}}(t_{1}) & \dfrac{\partial V}{\partial G_{\mathrm{l}}}(t_{1}) \\ \vdots & \vdots & \vdots \\ \dfrac{\partial V}{\partial G_{\mathrm{Na}}}(t_{n}) & \dfrac{\partial V}{\partial G_{\mathrm{K}}}(t_{n}) & \dfrac{\partial V}{\partial G_{\mathrm{l}}}(t_{n}) \end{pmatrix}. \tag{4.7}$$

The singular values of the sensitivity matrix for *n* equally spaced points are shown in table 2. The system is structurally unidentifiable for *n* = 1, 2 observations, and the ratio *κ* decreases with increasing numbers of uniformly spaced measurements.
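The rank analysis above can be sketched numerically: build the (*n* × *p*) sensitivity matrix by central finite differences and inspect its singular values. For brevity this uses a hypothetical three-parameter surrogate model rather than the Hodgkin–Huxley equations; the mechanism (rank at most *n* for *n* < *p* observations) is the same.

```python
import numpy as np

def model(theta, ts):
    """Hypothetical three-parameter surrogate: y(t) = a*exp(-b*t) + c."""
    a, b, c = theta
    return a * np.exp(-b * ts) + c

def sensitivity_matrix(f, theta, ts, rel_step=1e-6):
    """(n x p) matrix of dy(t_i)/dtheta_j by central finite differences."""
    theta = np.asarray(theta, dtype=float)
    M = np.empty((ts.size, theta.size))
    for j in range(theta.size):
        h = rel_step * max(abs(theta[j]), 1.0)
        up, dn = theta.copy(), theta.copy()
        up[j] += h
        dn[j] -= h
        M[:, j] = (f(up, ts) - f(dn, ts)) / (2.0 * h)
    return M

theta0 = (2.0, 0.5, 1.0)
ranks = []
for n in (1, 2, 3):
    ts = np.linspace(1.0, 9.0, n)
    s = np.linalg.svd(sensitivity_matrix(model, theta0, ts), compute_uv=False)
    ranks.append(int(np.sum(s > 1e-8 * s[0])))   # numerical rank of M
# rank < 3 for n = 1, 2: three parameters cannot all be constrained
```

The ratio of largest to smallest singular value of the full-rank matrix then plays the role of *κ* above.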

#### 4.1.2. Inference-based identifiability analysis

We now turn to an MCMC approach to investigate the effect of the temporal location of measurements. In this study, the prior distribution for each parameter is taken to be uniform between 50% and 200% of its nominal value,

$$G_{j} \sim \mathcal{U}(0.5\,\bar{G}_{j},\; 2\,\bar{G}_{j}), \quad j \in \{\mathrm{Na}, \mathrm{K}, \mathrm{l}\}, \tag{4.8}$$

where $\bar{G}_{j}$ denotes the nominal value in (4.2). Noisy observations of the membrane voltage were simulated by adding zero-mean Gaussian noise to the computed voltage according to

$$V^{\mathrm{obs}}(t_{i}) = V(t_{i}) + \varepsilon_{i}, \quad \varepsilon_{i} \sim \mathcal{N}(0, \sigma^{2}). \tag{4.9}$$

For this study, we considered three measurements in each MCMC iteration, as this was the minimum number of measurements required to produce a structurally identifiable system. These measurements were collected according to two differing strategies: (i) uniformly spaced points, *t*_{i} = 3.0, 6.0, 9.0 and (ii) local maxima of the singular value of the sensitivity matrix, *t*_{i} = 1.2, 2.3, 7.2 (figure 4). One local maximum of the singular value occurs at the same time as the maximum rate of increase in membrane potential, or ‘maximum upstroke’ (*t* ≈ 1.2 ms). Two other local maxima of the singular value occur during the ‘repolarization’ period after peak membrane potential is reached (*t* ≈ 2.3 ms), and during the ‘recovery’ period during which the membrane recovers its initial potential (*t* ≈ 7.2 ms). The sodium current is most active during the upstroke, when the cell rapidly depolarizes, while the potassium current is involved mainly in repolarizing the cell, and the leakage current is a catch-all for ion exchanges occurring close to resting potential. As such, we would expect measurements taken at upstroke, repolarization and recovery to constrain *G*_{Na}, *G*_{K} and *G*_{l}, respectively, and an ensemble of these measurements to constrain the system as a whole.

In figure 5, we see the results of posterior inference under these two measurement strategies using 40 000 MCMC steps with 30 000 discarded as burn-in. When the system is observed naively at uniform time intervals, the leakage current is unidentifiable as indicated by bands of nearly equal posterior likelihood spanning the entire prior range of the parameter. The posterior distributions for *G*_{Na} and *G*_{K} are broad due to the lack of a measurement during the upstroke period of the system. Using the measurements at *t*_{i} = 1.2, 2.3, 7.2, which includes a measurement at the maximum upstroke, produces a significant decrease in the uncertainty of *G*_{Na}. Additionally, under this measurement scheme, *G*_{l} becomes identifiable, as indicated by a bounded elliptical posterior distribution in all parameter dimensions, although it does still exhibit greater posterior uncertainty than either of the other conductances.
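The sampling scheme behind these posteriors can be sketched as a random-walk Metropolis algorithm. The example below uses a hypothetical one-parameter forward map standing in for the voltage solver, a uniform prior in the style of (4.8) and a Gaussian likelihood in the style of (4.9); it is illustrative only, not the sampler used in the study.

```python
import numpy as np

rng = np.random.default_rng(0)

def forward(g, ts):
    """Hypothetical scalar forward map standing in for the voltage solver."""
    return g * np.exp(-0.1 * ts)

nominal, sigma = 1.0, 0.05
ts = np.array([1.2, 2.3, 7.2])                      # measurement times
data = forward(nominal, ts) + rng.normal(0.0, sigma, ts.size)

def log_post(g):
    # Uniform prior on 50-200% of the nominal value, cf. (4.8).
    if not (0.5 * nominal <= g <= 2.0 * nominal):
        return -np.inf
    r = data - forward(g, ts)
    return -0.5 * np.sum((r / sigma) ** 2)          # Gaussian likelihood, cf. (4.9)

n_steps, burn, step = 40_000, 30_000, 0.05
chain = np.empty(n_steps)
g, lp = 1.3, log_post(1.3)
for i in range(n_steps):
    prop = g + step * rng.normal()                  # random-walk proposal
    lp_prop = log_post(prop)
    if np.log(rng.uniform()) < lp_prop - lp:        # Metropolis accept/reject
        g, lp = prop, lp_prop
    chain[i] = g
posterior = chain[burn:]                            # discard burn-in
```

The retained sample concentrates near the nominal value; in the three-conductance problem the same machinery runs over *θ* = (*G*_{Na}, *G*_{K}, *G*_{l}) with a multivariate proposal.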

### 4.2. Indirect observation via biomarkers

In experimental scenarios where we are able to record membrane voltage directly (through the use of transmembrane electrodes on cells *in vitro*), it is easy to take many measurements at very fine time scales. As such, it is less important to choose the location of these measurements in time. A more realistic case of limited observability occurs when studies are performed *in vivo* and such transmembrane recordings are not possible. In these scenarios, experimentalists are generally limited to recording one or more summary statistics of the action potential. We investigate (a) APD50 (time for the cell to cross and re-cross the voltage corresponding to 50% of the difference between rest and peak potentials); (b) APD90 (time for the cell to cross and re-cross the voltage corresponding to 10% of the difference between rest and peak potentials); (c) maximum upstroke velocity (maximum rate of voltage increase); (d) peak potential (maximum voltage); (e) rest potential (minimum voltage). These five summary statistics are represented in figure 6. Even when full voltage traces are recorded, action potentials may be reported as an ensemble of these summary statistics, particularly in clinical drug safety assays [32–34].
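The five summary statistics can be extracted from a sampled voltage trace roughly as follows; this is a sketch applied to a synthetic pulse, following the threshold conventions defined above, and is not the study's code.

```python
import numpy as np

def summary_stats(ts, Vs):
    """APD50, APD90, maximum upstroke velocity, peak and rest potential."""
    rest, peak = Vs.min(), Vs.max()
    stats = {"peak": peak,
             "rest": rest,
             "upstroke": np.max(np.gradient(Vs, ts))}   # max dV/dt
    # APD50: re-crossing at 50% of (peak - rest); APD90: at 10% above rest.
    for name, frac in (("APD50", 0.5), ("APD90", 0.1)):
        above = np.nonzero(Vs > rest + frac * (peak - rest))[0]
        stats[name] = ts[above[-1]] - ts[above[0]]
    return stats

# Synthetic, action-potential-like pulse for illustration.
ts = np.linspace(0.0, 20.0, 2001)
Vs = 100.0 * np.exp(-0.5 * (ts - 3.0) ** 2)   # peak of 100 at t = 3
s = summary_stats(ts, Vs)
```

For this Gaussian pulse the analytic values are APD50 = 2√(2 ln 2) ≈ 2.35 and APD90 = 2√(2 ln 10) ≈ 4.29, which the sketch recovers to grid precision; real traces require more care with multiple threshold crossings and baseline drift.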

#### 4.2.1. Identifiability and inferential analysis

Because many of these summary statistics are non-differentiable functions of the model output, we can no longer employ Fisher information indices, or indeed any Jacobian-based sensitivity metrics. We may still, however, employ both the measure theoretic inverse sensitivity approach and Bayesian/approximate Bayesian inference (depending on the assumptions about model error) strategies in this situation. We begin by assuming a situation under which we are able to observe any or all of these summary statistics with a known noise distribution. For each statistic, we take this noise to be independent, zero-centred and normally distributed with standard deviation equal to 1% of its value when calculated under reported parameters (see (4.2)). This relative noise model was chosen to correct for the differing magnitudes of the summary statistics in the calculation of the likelihood function, giving equal weighting to each observation.

We present the results from both measure theoretic and MCMC methods for varying numbers of summary statistic measurements below, holding the number of function evaluations employed by each to be equal at 10^{6} to facilitate comparison. This corresponds to using 10^{6} steps in each MCMC chain (with the first 10% discarded as burn-in) and using 100 subdivisions of each dimension of parameter space for the measure theoretic method.

##### 4.2.1.1. One summary statistic

As expected for such an underdetermined system, the posteriors produced by both methods demonstrate structural unidentifiability in the system, as evidenced by the unconstrained relationships between model parameters in the likelihood surface. Despite this, we can ascertain the information each summary statistic contains about each model parameter by examining the shape of the likelihood surface. Figures 7 and 8 indicate the probability distributions on the input space computed using the measure theoretic inverse sensitivity and MCMC methods, respectively.

Despite the unidentifiability present in the system, these figures give us valuable insights into the relative merits of the two approaches to posterior inference. We see that the measure theoretic approach resolves the posterior into clearly defined bands of equal probability, while the corresponding features in the MCMC posterior distribution show less definition. This is unsurprising, as MCMC methods rely on random sampling to approximate the posterior distribution, while the measure theoretic approach deterministically inverts the parameter-output map, and thus can be considered to be ‘exact’ for a given granularity (100 divisions of each dimension in parameter space, in this instance). This may also explain why figure 7 shows better resolution of areas of low probability than figure 8—MCMC posteriors are more concentrated in areas of high probability (as evidenced by their relatively constrained area and higher modal probability) due to the use of random sampling, while the deterministic measure theoretic algorithm shows no such bias.
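The deterministic inversion step can be caricatured on a hypothetical two-parameter scalar map: partition the parameter domain into a uniform grid, push each cell through the forward map, and split each output bin's probability evenly over the cells that land in it (the standard uniform ansatz on contour events). This is a simplified sketch of the idea, not the authors' implementation.

```python
import numpy as np

def inverse_density(f, bounds, n_grid, out_edges, out_probs):
    """Push an output-space probability back onto a uniform parameter grid.

    Each output bin's probability is split evenly over the parameter
    cells whose (midpoint) forward-map value lands in that bin.
    """
    axes = [np.linspace(lo, hi, n_grid, endpoint=False) + (hi - lo) / (2 * n_grid)
            for lo, hi in bounds]
    grid = np.meshgrid(*axes, indexing="ij")
    q = f(*grid)                                  # forward map on cell midpoints
    which = np.digitize(q, out_edges) - 1         # output bin of each cell
    dens = np.zeros_like(q)
    for b, pb in enumerate(out_probs):
        cells = (which == b)
        if cells.any():
            dens[cells] = pb / cells.sum()        # uniform-ansatz split
    return dens / dens.sum()                      # normalized cell probabilities

# Hypothetical map q(a, b) = a * b with observed q uniform on [0.9, 1.1].
edges = np.linspace(0.9, 1.1, 11)
probs = np.full(10, 0.1)
dens = inverse_density(lambda a, b: a * b, [(0.5, 2.0), (0.5, 2.0)], 100,
                       edges, probs)
```

The resulting cell probabilities form a sharply banded region along the contour *ab* ≈ 1, the discrete analogue of the bands of equal probability seen in figure 7, and are exact for the chosen granularity up to the uniform ansatz.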

In figure 9, we show the Kullback–Leibler divergence [35] between the measure theoretic and MCMC posteriors when employing APD90 data only for an increasing number of MCMC steps. We see that the divergence between the posteriors decreases approximately log-linearly with the number of steps, and thus an MCMC approach would require several orders of magnitude more function evaluations than the measure theoretic approach to achieve the same resolution. Furthermore, as mentioned in electronic supplementary material, appendix B, the measure theoretic approach requires the full forward map (10^{6} voltage trace simulations in this instance) to be computed only once across all analyses. One can then save this mapping of parameter sets to either full model outputs (a more costly approach) or supersets of summary statistics, which can then be re-used to evaluate the inverse map for any combination of summary statistics being considered in the future. MCMC approaches cannot make similar re-use of information, as all Metropolis–Hastings steps rely on the value of the likelihood function, and thus will depend on the specific values of the summary statistics being considered.
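With both posteriors discretized on a common grid, the Kullback–Leibler divergence used for this comparison can be computed as, for example:

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions on a common grid (in nats)."""
    p = np.asarray(p, dtype=float).ravel()
    q = np.asarray(q, dtype=float).ravel()
    p, q = p / p.sum(), q / q.sum()
    mask = p > 0                       # 0 * log(0/q) = 0 by convention
    return float(np.sum(p[mask] * np.log(p[mask] / (q[mask] + eps))))

# E.g. a concentrated MCMC-style histogram against a broader reference.
p = np.array([0.05, 0.2, 0.5, 0.2, 0.05])
q = np.full(5, 0.2)
d = kl_divergence(p, q)   # positive; zero only when the histograms agree
```

Note that KL divergence is asymmetric; the choice of which posterior plays the role of reference should be fixed across the comparison, as it is here with increasing MCMC chain lengths against the fixed measure theoretic posterior.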

While the measure theoretic approach to inversion may be preferable for the reasons outlined above, the geometric relationships between parameters in the presence of parameter compensation can be seen very clearly in both figures 7 and 8, greatly aiding the intuitive understanding of the nature of the unidentifiability. We see that APD50, APD90 and peak potential each define a nearly linear relationship between *G*_{Na} and *G*_{K}, but do not provide enough information to uniquely identify either of these parameters or *G*_{l}. The maximum upstroke velocity provides a good deal of information about *G*_{Na}, as we observe tight banding in the likelihood along this axis, but very little information about the other two parameters. This matches our intuition, as the sodium current is the dominant factor controlling the upstroke.

By considering the geometry of the inverse sets corresponding to individual outputs, both methods provide insight into which combinations of outputs might be most informative. The timing/placing of any additional experimental measurements can then be tested *in silico* to determine whether, based on the geometric structure of the inverse sets (posterior distributions), they may potentially provide information to improve identifiability. This approach exploits the notion of *geometrically distinct* quantities of interest developed in [36] which means the Jacobian of the map *g* is full rank everywhere in *Θ*.

As discussed in [36], the generalized contours corresponding to scalar-valued quantities of interest in a three-dimensional parameter space are two-dimensional surfaces. Here, these surfaces depend only weakly on the *G*_{l} parameter, as is seen most clearly in the cases of APD50, maximum upstroke velocity and peak potential. Two distinct regions of high probability are observed in the inverse sets recovered from the observation of APD90. The perspective of the measure theoretic approach, which constructs *sets* in the input space corresponding to *sets* in the output space, encompasses such situations in a very natural manner.

##### 4.2.1.2. Two summary statistics

As discussed in [36], the generalized contours corresponding to a two-dimensional quantity of interest in a three-dimensional parameter space are one-dimensional objects. Geometrically, we can consider the contours corresponding to a two-dimensional quantity of interest as the intersection of the two surfaces corresponding to each of the (independent) scalar-valued quantities of interest. We may use this information in constructing a maximally informative vector of summary statistics to observe. In figures 10 (measure theoretic inverse sensitivity) and 11 (MCMC), we see that when we combine summary statistics whose confidence regions were roughly overlapping in parameter space when observed separately, as with APD50 and APD90, we observe almost negligible decreases in uncertainty relative to independent observation. However, when we combine statistics whose individual confidence regions were more orthogonal, such as APD50 and maximum upstroke velocity, we see reduction in uncertainty to the point where we can uniquely identify *G*_{Na} and *G*_{K}, as indicated by the elliptical posterior likelihood. Note however, that if the contours corresponding to the two quantities of interest considered independently are each only weakly dependent upon the parameter *G*_{l}, their intersection will remain only weakly dependent upon the parameter *G*_{l}.

##### 4.2.1.3. Three summary statistics

The generalized contours corresponding to a three-dimensional quantity of interest in a three-dimensional parameter space are zero-dimensional objects, provided the Jacobian of the map *Q* is full rank. However, if the generalized contour corresponding to two of the quantities of interest (summary statistics) is a curve that is weakly dependent upon *G*_{l}, and we seek the intersection of this curve with the generalized contour corresponding to a third quantity of interest which is a surface that is only weakly dependent upon *G*_{l}, then it is clear that the location of the intersection point is highly sensitive to error.

In the MCMC context, we also find that no combination of three summary statistics from this set is able to constrain the leakage current, as these descriptors separately failed to constrain the parameter past its uniform prior likelihood. Example posteriors when employing APD50, peak potential and rest potential are visualized in figures 12 and 13, while the posteriors produced when observing all possible combinations of three summary statistics can be found in electronic supplementary material, appendix figures C.2 and C.3.

#### 4.2.2. Computations in the absence of a likelihood function

In the previous section, the summary statistics were calculated from a noise-free action potential. In practice, summary statistics are calculated from a voltage trace that has its own observational error. It is difficult to predict how the (point-wise) error distribution on the voltage trace (generally assumed to be Gaussian) will propagate to the error distribution of the summary statistics. Electronic supplementary material, figure C.4 in the appendix, provides the distributions of the five summary statistics calculated from 1000 independent noisy realizations of a Hodgkin–Huxley action potential (simulated using the noise model in (4.9)). These are clearly non-Gaussian and it is difficult to know the correct error model (likelihood function) to use when employing either the deterministic measure theoretic inverse sensitivity algorithm or MCMC algorithms for inference.

ABC methods [23,37] have been developed to perform Bayesian computations in the absence of a likelihood function. These methods instead employ a *distance function* between the model output and repeated observations of the system. ABC methods determine the subset of parameter space for which simulated data fall within some tolerance *ɛ* of the experimental data in the given distance metric. The algorithm used for this study appears in electronic supplementary material, algorithm A.3, with the distance metric at a point **θ** in parameter space calculated according to (4.10), where *Y*^{sim}_{i}, *i* = 1, …, *n*_{ss} are the values of the *n*_{ss} summary statistics computed without error using parameters **θ**, and *Y*^{exp}_{ij}, *i* = 1, …, *n*_{ss}, *j* = 1, …, *n*_{obs} are the *n*_{obs} experimental observations of the same statistics. The distance metric (4.10) fits naturally within the measure theoretic inverse sensitivity framework as a single new quantity of interest with a uniform distribution.
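The likelihood-free idea can be sketched with plain rejection ABC (simpler than the ABC-SMC algorithm A.3 used in the study) on a hypothetical one-parameter model with two summary statistics. The distance below, a mean relative discrepancy over statistics and observations, is an assumed stand-in for the metric (4.10).

```python
import numpy as np

rng = np.random.default_rng(1)

def stats(theta):
    """Hypothetical forward map from one parameter to two summary statistics."""
    return np.array([theta, theta ** 2])

def distance(y_sim, y_exp):
    """Mean relative discrepancy over n_ss statistics and n_obs observations
    (an assumed stand-in for the distance metric (4.10))."""
    return np.mean(np.abs(y_exp - y_sim[:, None]) / np.abs(y_sim[:, None]))

# n_obs = 25 noisy observations of the statistics at a "true" theta of 1.5.
y_exp = stats(1.5)[:, None] * (1.0 + 0.01 * rng.normal(size=(2, 25)))

# Rejection ABC: sample the prior, keep parameters with distance < eps.
eps = 0.075
draws = rng.uniform(0.75, 3.0, 20_000)               # uniform prior
accepted = np.array([th for th in draws if distance(stats(th), y_exp) < eps])
```

The accepted sample approximates the ABC posterior; shrinking *ɛ* trades acceptance rate for accuracy, which is why ABC-SMC steps through a decreasing schedule of tolerances rather than rejecting from the prior directly.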

Because the calculation of maximum upstroke velocity from an action potential trace is based on a finite difference approximation to a noisy voltage trace, it exhibits fluctuations large enough to make it unsuitable for inference (electronic supplementary material, appendix figure C.4). While we could attempt to mitigate this issue by fitting a spline through the voltage trace before calculating summary statistics, we have decided for simplicity to remove it from consideration when performing approximate inference, despite the substantial information it carries about *G*_{Na}. Instead, we consider the effects of approximation on inference over the ensemble of APD50, peak potential and rest potential, as these three descriptors showed the best ability to structurally constrain *G*_{Na} and *G*_{K} in figures 12 and 13.

Figures 14 (measure theoretic) and 15 (ABC) provide the results of measure theoretic inference and ABC-SMC inference (using 100 000 particles with 10 iterations for 10^{6} total function evaluations), respectively, employing data from *n*_{obs} = 25 independent noisy action potentials (equation (4.10)) and with the tolerance for the distance function arbitrarily set to *ɛ* = 0.075. Relative to the posteriors produced by exact inference, we see an increase in posterior uncertainty to the point where *G*_{Na} and *G*_{K} are once again unidentifiable. This indicates that, under these limited observation conditions, we would be able to gain very little insight into the ‘true’ parameter values of the system, and should consider either an alternative observation strategy—such as collecting data from an *in vitro* system where direct voltage measurements can be obtained—or a simplified model.

## 5. Conclusion

By examining the logistic growth and Hodgkin–Huxley action potential models, we have demonstrated the application and discussed the relative merits of both (effective) Fisher information and inference-based approaches to model identifiability analysis in a biological context. Structural unidentifiability, indicated by rank-deficiency in the FIM, may lead to unbounded regions of equal likelihood in the probability distributions on the input space. Practical unidentifiability, indicated by ill-conditioning of the FIM, is typically manifested as a functional (geometric) relationship between large regions of equal probability in parameter space. Two-dimensional projections of the probability distributions on the input space produced by measure theoretic inverse sensitivity and MCMC provide information as to which parameters are well constrained and which are poorly constrained. While this can also be achieved by examining the direction of the eigenvectors of the FIM, such an examination becomes non-trivial for even a small number of parameters. Thus, while posterior inference strategies incur additional cost relative to operations on the FIM, they do offer benefits in interpretability, and do not require knowledge of the ‘true’ model parameters.

When the model output is not differentiable and the FIM is not available, we have shown how both measure theoretic inverse sensitivity and Monte Carlo methods can still be applied to assess model identifiability, although the former approach has several distinct advantages. We demonstrated the increased ability of the measure theoretic approach to resolve areas of low posterior probability when compared with an MCMC approach which produced more localized posteriors due to random sampling favouring areas of high probability. The ability to save the forward map computed by the measure theoretic algorithm allows modellers to perform inverse sensitivity analysis on a range of possible summary statistics with minimal simulation cost, while Monte Carlo approaches require separate simulations for each measurement strategy considered. MCMC methods are easy to implement and scale better with dimension than naive implementations of the measure theoretic approach, although recent work [18,38] seeks to overcome this issue. In future work, we will seek to develop these methods to scale better with increasing problem size, and hence enhance further their range of applicability within biology.

When output is restricted to summary statistics, we have shown how the output of posterior inference methods can be used to construct informative vectors of measurements based on the geometry of the marginal posterior distributions. Combining measurements such as APD50 and maximum upstroke velocity that individually constrain the system in differing manners (as indicated by the shapes of the posteriors produced by measure theoretic or MCMC approaches) resulted in an overall decrease in uncertainty. By contrast, a poor choice such as APD50 and APD90, which individually provided very similar information, did not significantly reduce uncertainty. This demonstrates the utility of inference-based methodology not only for assessing identifiability prior to experimentation, but also for the principled design of future experiments. While such design was performed in this study by manually constructing subsets of observations based on the geometry of the posterior distribution, this could easily be automated by choosing subsets of observations with maximally divergent posteriors, as quantified by an inter-distributional distance metric such as Kullback–Leibler divergence, and optionally weighted by the relative difficulty (in terms of cost, time or invasiveness) of making each recording.

Finally, we demonstrated how both measure theoretic and Monte Carlo methods for identifiability analysis and experimental design may be applied when the error model for the system is unknown, as is often the case for biological summary statistics. Many Monte Carlo methods for approximate Bayesian inference have been formulated to handle these scenarios and the measure theoretic method can easily be generalized to produce approximate posteriors. Examining the Hodgkin–Huxley model under these conditions, we found that the inflation of posterior uncertainty resulting from the use of approximate inference caused previously informative measurements, such as the ensemble of APD50, peak potential and rest potential, to be no longer able to uniquely identify any parameters of the model. This suggests that different or additional measurements may be required when knowledge of observational error structure is limited, and showcases the ability of approximate inference methods as diagnostic tools for the information content of measurements.

## Data accessibility

Code for all Monte Carlo analyses can be found at https://chaste.cs.ox.ac.uk/trac/wiki/PaperTutorials/DalyID, while code for measure theoretic analyses can be found at http://www.math.colostate.edu/tavener/SIP.

## Authors' contributions

A.C.D. and S.T. conceived of the study, participated in the design of the study, carried out all analyses and drafted the manuscript. D.G. and J.T. participated in the design of the study and editing of the manuscript.

## Competing interests

We declare we have no competing interests.

## Funding

This work was partially supported by the UK EPSRC through ‘The 2020 Science Program’ grant no. EP/I017909/1. A.C.D. was funded by the Rhodes Trust, UK. S.T. was partially supported by the US National Science Foundation under grant DMS-1720473/1720402.

## Footnotes

Electronic supplementary material is available online at https://dx.doi.org/10.6084/m9.figshare.c.4153676.

- Received May 8, 2018.
- Accepted June 21, 2018.

- © 2018 The Author(s)

Published by the Royal Society. All rights reserved.