## Abstract

A new class of functions, called the ‘information sensitivity functions’ (ISFs), which quantify the information gain about the parameters through the measurements/observables of a dynamical system, is presented. These functions can be easily computed through classical sensitivity functions alone and are based on Bayesian and information-theoretic approaches. While marginal information gain is quantified by the decrease in differential entropy, correlations between arbitrary sets of parameters are assessed through mutual information. For individual parameters, these information gains are also presented as marginal posterior variances, and, to assess the effect of correlations, as conditional variances when other parameters are given. The easy-to-interpret ISFs can be used to (a) identify time intervals or regions in dynamical system behaviour where information about the parameters is concentrated; (b) assess the effect of measurement noise on the information gain for the parameters; (c) assess whether sufficient information in an experimental protocol (input, measurements and their frequency) is available to identify the parameters; (d) assess correlation in the posterior distribution of the parameters to identify the sets of parameters that are likely to be indistinguishable; and (e) assess identifiability problems for particular sets of parameters.

## 1. Introduction

Sensitivity analysis [1] has been widely used to determine how the parameters of a dynamical system influence its outputs. When one or more outputs are measured (observed), it quantifies the variation of the observations with respect to the parameters to determine which parameters are most and least influential towards the measurements. Therefore, when performing an inverse problem of estimating the parameters from the measurements, sensitivity analysis is widely used to fix the least influential parameters (as their effect on the measurements is insignificant and removing them reduces the dimensionality of the inverse problem) while focussing on estimation of the most influential parameters. Sensitivity analysis is also used to assess the question of parameter identifiability, i.e. how easy or difficult it is to identify the parameters from the measurements. This is primarily based on the idea that if the observables are highly sensitive to perturbations in certain parameters then these parameters are likely to be identifiable, and if the observables are insensitive then the parameters are likely to be unidentifiable. However, the magnitude of the sensitivities is hard to interpret, except in the trivial case when the sensitivities are identically zero. Lastly, parameter identifiability based on sensitivity analysis also assesses correlation/dependence between the parameters—through principal component analysis [2], the correlation method [3], the orthogonal method [4] and the eigenvalue method [5]—to identify which pairs of parameters, owing to high correlation, are likely to be indistinguishable from each other (also see [6] and the references therein). Another method to assess correlations is based on the Fisher information matrix [6–8], which can be derived from asymptotic analysis of nonlinear least-squares estimators [9,10]. Ashyraliyev & Blom [11] suggested that a singular value decomposition of the Fisher information matrix can be used to identify linear combinations of parameters that can be well identified given the observables and measurement noise. Another class of methods to assess identifiability, proposed by Raue *et al.* [12–14], is based on exploiting the curvature of the likelihood function or the flatness of the profile likelihood, i.e. minimization of the likelihood with respect to all parameters but one. Li & Vu [15,16] proposed that pairwise and higher-order correlations between the parameters may be identified by assessing linear dependencies between the columns of the sensitivity matrix [16] or the matrix of first-order partial derivatives of the state equations [15]. Thomaseth and Cobelli [27] extended the classical sensitivity functions to ‘generalized sensitivity functions’ (GSFs), which assess information gain about the parameters from the measurements. This method has been widely used to assess identifiability of dynamical systems [10,17–20], where regions of high information gain show a sharp increase in the GSFs while oscillations imply correlation with other parameters. There are two drawbacks of GSFs: first, they are designed to start at 0 and end at 1, which leads to the so-called ‘force-to-one’ phenomenon, where even in the absence of information about the parameters the GSFs are forced to end at a value of 1; and second, oscillations in GSFs can be hard to interpret in terms of identifying which sets of parameters are correlated.
Based on a purely information-theoretic approach, Pant & Lombardi [21] proposed to compute information gain through a decrease in Shannon entropy, which alleviates the shortcomings of GSFs. However, since their method relies on Monte Carlo sampling, the computational effort associated with the computation of information gains can be quite large. In this article, a novel method which combines the approach of Pant & Lombardi [21] with the classical sensitivity functions to compute information gain about the parameters is presented. The new functions are collectively called ‘information sensitivity functions’ (ISFs), which assess parameter information gain through sensitivity functions alone, thereby eliminating the need for Monte Carlo runs. These functions (i) are based on Bayesian/information-theoretic methods and do not rely on asymptotic analysis; (ii) are monotonically non-decreasing and therefore do not oscillate; (iii) can assess regions of high information content for individual parameters; (iv) can assess parameter correlations between an arbitrary set of parameters; (v) can reveal potential problems in identifiability of system parameters; (vi) can assess the effect of the experimental protocol on the inverse problem, for example, which outputs are measured, the associated measurement noise, and the measurement frequency; and (vii) are easily interpretable.

In what follows, first the theoretic developments are presented in §§2–8, followed by their application to three different dynamical systems in §9. The three examples are chosen from different areas in mathematical biosciences: (i) a Windkessel model, which is a widely used boundary condition in computational fluid dynamics simulations of haemodynamics; (ii) the Hodgkin–Huxley model for a biological neuron, which has formed the basis for a variety of ionic models describing excitable tissues; and (iii) a kinetics model for the influenza A virus.

## 2. The dynamical system and sensitivity equations

Consider the following dynamical system governed by a set of parametrized ordinary differential equations (ODEs):
$$\frac{\mathrm{d}\mathbf{x}}{\mathrm{d}t} = \mathbf{f}(\mathbf{x}, \boldsymbol{\theta}, t), \qquad \mathbf{x}(t_0) = \mathbf{x}_0, \tag{2.1}$$

where *t* represents time, **x** is the state vector, ***θ*** is the parameter vector of dimension *p*, the function **f** represents the dynamics and **x**_{0} represents the initial condition at time *t*_{0}. The initial conditions may depend on the parameters, and therefore

$$\mathbf{x}_0 = \mathbf{x}_0(\boldsymbol{\theta}). \tag{2.2}$$

The above representation subsumes the case where the initial condition may itself be seen as a parameter. The RHS of the dynamical system, equation (2.1), can be linearized at a reference point (**x**_{r}, *θ*_{r}, *t*_{r}) to obtain

$$\mathbf{f}(\mathbf{x}, \boldsymbol{\theta}, t) \approx \mathbf{f}\big|_{r} + \nabla_{x}\mathbf{f}\big|_{r}\,(\mathbf{x} - \mathbf{x}_r) + \nabla_{\theta}\mathbf{f}\big|_{r}\,(\boldsymbol{\theta} - \boldsymbol{\theta}_r), \tag{2.3}$$

where ( · )|_{r} represents ( · ) evaluated at the reference point. Henceforth, in order to be concise, the explicit dependence of **f**(**x**, ***θ***, *t*) on its arguments is omitted and **f**, without any arguments, is used to denote **f**(**x**, ***θ***, *t*). Following this notation, equation (2.3) is concisely written as

$$\mathbf{f} \approx \mathbf{f}\big|_{r} + \nabla_{x}\mathbf{f}\,(\mathbf{x} - \mathbf{x}_r) + \nabla_{\theta}\mathbf{f}\,(\boldsymbol{\theta} - \boldsymbol{\theta}_r). \tag{2.4}$$

The above linearization will be used in the next section to study the evolution of the state covariance matrix with time. Let **S** denote the matrix of sensitivity functions for the system in equation (2.1), i.e. **S** = ∇_{θ}**x**, or

$$S_{ij} = \frac{\partial x_i}{\partial \theta_j}. \tag{2.5}$$

It is well known that **S** satisfies the following ODE system, which can be obtained by applying the chain rule of differentiation to equation (2.1):

$$\frac{\mathrm{d}\mathbf{S}}{\mathrm{d}t} = \nabla_{x}\mathbf{f}\,\mathbf{S} + \nabla_{\theta}\mathbf{f}. \tag{2.6}$$

The goal is to relate the evolution of the sensitivity matrix to the evolution of the covariance of the joint vector of the state and the parameters. Let the subscript *n* denote all quantities at time *t*_{n}; for example, **x**_{n} denotes the state vector at time *t*_{n}, **S**_{n} the corresponding sensitivity matrix, and so on. To relate the sensitivity matrix **S**_{n+1} at time *t*_{n+1} with **S**_{n}, a first-order discretization of equation (2.6) is considered,

$$\mathbf{S}_{n+1} = \mathbf{S}_n + \Delta t\left(\nabla_{x}\mathbf{f}\,\mathbf{S}_n + \nabla_{\theta}\mathbf{f}\right) = \left(\mathbf{I} + \Delta t\,\nabla_{x}\mathbf{f}\right)\mathbf{S}_n + \Delta t\,\nabla_{\theta}\mathbf{f}, \tag{2.7}$$

and, therefore, the matrix product **S**_{n+1}**S**^{T}_{n+1} can be written as

$$\mathbf{S}_{n+1}\mathbf{S}_{n+1}^{\mathrm{T}} = \left[\left(\mathbf{I} + \Delta t\,\nabla_{x}\mathbf{f}\right)\mathbf{S}_n + \Delta t\,\nabla_{\theta}\mathbf{f}\right]\left[\left(\mathbf{I} + \Delta t\,\nabla_{x}\mathbf{f}\right)\mathbf{S}_n + \Delta t\,\nabla_{\theta}\mathbf{f}\right]^{\mathrm{T}}. \tag{2.8}$$

Next, it is hypothesized that under certain conditions **S**_{n+1}**S**^{T}_{n+1} can be seen as the covariance matrix of the state vector at time *t*_{n+1}. These developments are presented in the next two sections.
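As a concrete illustration of equations (2.1)–(2.6), the sensitivity system can be integrated alongside the state equations. The sketch below is a minimal example, not taken from the original paper: the model `f`, its Jacobians and the parameter values are placeholder choices, and `scipy.integrate.solve_ivp` is used to integrate the augmented system so that the sensitivity matrices **S**_{n} needed later are available at each measurement time.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Minimal sketch: integrate dx/dt = f(x, theta, t) together with the
# sensitivity system dS/dt = (grad_x f) S + (grad_theta f), equation (2.6).
# The model below (a linear toy system) is a placeholder, not the paper's.

def f(x, theta, t):
    a, b = theta
    return np.array([-a * x[0] + b, a * x[0] - b * x[1]])

def dfdx(x, theta, t):                      # Jacobian w.r.t. the state
    a, b = theta
    return np.array([[-a, 0.0], [a, -b]])

def dfdtheta(x, theta, t):                  # Jacobian w.r.t. the parameters
    return np.array([[-x[0], 1.0], [x[0], -x[1]]])

def augmented_rhs(t, z, theta, m, p):
    x, S = z[:m], z[m:].reshape(m, p)
    dx = f(x, theta, t)
    dS = dfdx(x, theta, t) @ S + dfdtheta(x, theta, t)
    return np.concatenate([dx, dS.ravel()])

m, p = 2, 2
theta = np.array([1.5, 0.3])
x0 = np.array([1.0, 0.0])
S0 = np.zeros((m, p))                       # x0 independent of theta here
z0 = np.concatenate([x0, S0.ravel()])
sol = solve_ivp(augmented_rhs, (0.0, 5.0), z0, args=(theta, m, p),
                t_eval=np.linspace(0.0, 5.0, 101), rtol=1e-8)
S_n = [sol.y[m:, k].reshape(m, p) for k in range(sol.y.shape[1])]  # S at each t_n
```

Any standard solver could be substituted; the only requirement is that the state and the sensitivities are advanced consistently so that ∇_{x}**f** and ∇_{θ}**f** are evaluated on the same trajectory.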

## 3. Forward propagation of uncertainty

As the objective is to study the relationship between the parameters and the state vector, a joint vector of all the state vectors until the current time *t*_{n} and the parameter vector is considered. Assume that at time *t*_{n}, this joint vector [**x**^{T}_{n}, **x**^{T}_{n−1}, …, **x**^{T}_{0}, *θ*^{T}]^{T} is distributed according to a multivariate Normal distribution as follows:

$$\begin{bmatrix}\mathbf{x}_{n}\\ \mathbf{x}_{n-1}\\ \vdots\\ \mathbf{x}_{0}\\ \boldsymbol{\theta}\end{bmatrix}\sim\mathcal{N}\left(\begin{bmatrix}\mu_{x_{n}}\\ \mu_{x_{n-1}}\\ \vdots\\ \mu_{x_{0}}\\ \mu_{\theta}\end{bmatrix},\;\Sigma_{n}=\begin{bmatrix}\Sigma_{n,n} & \Sigma_{n,n-1} & \cdots & \Sigma_{n,0} & \Lambda_{n,\theta}\\ \Sigma_{n-1,n} & \Sigma_{n-1,n-1} & \cdots & \Sigma_{n-1,0} & \Lambda_{n-1,\theta}\\ \vdots & \vdots & \ddots & \vdots & \vdots\\ \Sigma_{0,n} & \Sigma_{0,n-1} & \cdots & \Sigma_{0,0} & \Lambda_{0,\theta}\\ \Lambda_{n,\theta}^{\mathrm{T}} & \Lambda_{n-1,\theta}^{\mathrm{T}} & \cdots & \Lambda_{0,\theta}^{\mathrm{T}} & \Sigma_{\theta,\theta}\end{bmatrix}\right). \tag{3.1}$$

To obtain the joint distribution of [**x**^{T}_{n+1}, **x**^{T}_{n}, …, **x**^{T}_{0}, *θ*^{T}]^{T} (all the state vectors until time *t*_{n+1} and the parameter vector), the linearized dynamical system, equation (2.4), is used. Considering the reference point (**x**_{r}, *θ*_{r}, *t*_{r}) in equation (2.4) to be (*μ*_{xn}, *μ*_{θ}, *t*_{n}), i.e. considering the linearization around the mean values of the parameter vector and the state at time *t*_{n}, one obtains
$$\frac{\mathrm{d}\mathbf{x}}{\mathrm{d}t} \approx \mathbf{f} + \nabla_{x}\mathbf{f}\,(\mathbf{x} - \mu_{x_n}) + \nabla_{\theta}\mathbf{f}\,(\boldsymbol{\theta} - \mu_{\theta}) + \text{higher-order terms}. \tag{3.2}$$

Ignoring the higher-order terms, and employing a forward Euler discretization, one obtains

$$\mathbf{x}_{n+1} = \left(\mathbf{I} + \Delta t\,\nabla_{x}\mathbf{f}\right)\mathbf{x}_{n} + \Delta t\,\nabla_{\theta}\mathbf{f}\,\boldsymbol{\theta} + \Delta t\left(\mathbf{f} - \nabla_{x}\mathbf{f}\,\mu_{x_n} - \nabla_{\theta}\mathbf{f}\,\mu_{\theta}\right). \tag{3.3}$$

### Remark 3.1.

**x**_{n+1} is completely determined by **x**_{n} and ***θ***, i.e. given **x**_{n} and ***θ***, nothing more can be learned about **x**_{n+1}. Hence, the forward propagation forms a Markov chain.

### Remark 3.2.

∇_{x}**f** and ∇_{θ}**f** are evaluated at (*μ*_{xn}, *μ*_{θ}, *t*_{n}).

### Remark 3.3.

In equation (3.1), *Σ*_{α,β} = *Σ*^{T}_{β,α} and *Λ*_{α,β} = *Λ*^{T}_{β,α}.

The joint vector [**x**^{T}_{n+1}, **x**^{T}_{n}, …, **x**^{T}_{0}, *θ*^{T}]^{T} can be written from equations (3.1) and (3.3) as

$$\begin{bmatrix}\mathbf{x}_{n+1}\\ \mathbf{x}_{n}\\ \begin{bmatrix}\mathbf{x}_{n-1}\\ \vdots\\ \mathbf{x}_{0}\end{bmatrix}\\ \boldsymbol{\theta}\end{bmatrix} = \begin{bmatrix}\mathbf{I}_{q} + \Delta t\,\nabla_{x}\mathbf{f} & \mathbf{O}_{q,r} & \Delta t\,\nabla_{\theta}\mathbf{f}\\ \mathbf{I}_{q} & \mathbf{O}_{q,r} & \mathbf{O}_{q,p}\\ \mathbf{O}_{r,q} & \mathbf{I}_{r} & \mathbf{O}_{r,p}\\ \mathbf{O}_{p,q} & \mathbf{O}_{p,r} & \mathbf{I}_{p}\end{bmatrix}\begin{bmatrix}\mathbf{x}_{n}\\ \begin{bmatrix}\mathbf{x}_{n-1}\\ \vdots\\ \mathbf{x}_{0}\end{bmatrix}\\ \boldsymbol{\theta}\end{bmatrix} + \begin{bmatrix}\mathbf{c}_{n}\\ \mathbf{O}\\ \mathbf{O}\\ \mathbf{O}\end{bmatrix}, \tag{3.4}$$

where **I**_{q} represents an identity matrix of size *q*, **O**_{q,r} represents a zero matrix of size *q* × *r*, and

$$\mathbf{c}_{n} = \Delta t\left(\mathbf{f} - \nabla_{x}\mathbf{f}\,\mu_{x_n} - \nabla_{\theta}\mathbf{f}\,\mu_{\theta}\right) \tag{3.5}$$

is a term that does not depend on **x**_{n} and ***θ***. The distribution of [**x**^{T}_{n+1}, **x**^{T}_{n}, …, **x**^{T}_{0}, *θ*^{T}]^{T} can be written from equation (3.4) as

$$[\mathbf{x}_{n+1}^{\mathrm{T}}, \mathbf{x}_{n}^{\mathrm{T}}, \ldots, \mathbf{x}_{0}^{\mathrm{T}}, \boldsymbol{\theta}^{\mathrm{T}}]^{\mathrm{T}} \sim \mathcal{N}\left(\mu_{n+1}, \Sigma_{n+1}\right) \tag{3.6}$$

and the covariance *Σ*_{n+1} can be expanded as

$$\Sigma_{n+1} = \left[\begin{array}{c|c}\Sigma_{n+1,n+1} & \begin{array}{cccc}\Sigma_{n+1,n} & \cdots & \Sigma_{n+1,0} & \Lambda_{n+1,\theta}\end{array}\\ \hline \begin{array}{c}\Sigma_{n,n+1}\\ \vdots\\ \Sigma_{0,n+1}\\ \Lambda_{n+1,\theta}^{\mathrm{T}}\end{array} & \Sigma_{n}\end{array}\right], \tag{3.7}$$

where

$$\Sigma_{n+1,n+1} = \left(\mathbf{I} + \Delta t\,\nabla_{x}\mathbf{f}\right)\Sigma_{n,n}\left(\mathbf{I} + \Delta t\,\nabla_{x}\mathbf{f}\right)^{\mathrm{T}} + \left(\mathbf{I} + \Delta t\,\nabla_{x}\mathbf{f}\right)\Lambda_{n,\theta}\left(\Delta t\,\nabla_{\theta}\mathbf{f}\right)^{\mathrm{T}} + \left(\Delta t\,\nabla_{\theta}\mathbf{f}\right)\Lambda_{n,\theta}^{\mathrm{T}}\left(\mathbf{I} + \Delta t\,\nabla_{x}\mathbf{f}\right)^{\mathrm{T}} + \left(\Delta t\,\nabla_{\theta}\mathbf{f}\right)\Sigma_{\theta,\theta}\left(\Delta t\,\nabla_{\theta}\mathbf{f}\right)^{\mathrm{T}}, \tag{3.8}$$

$$\Lambda_{n+1,\theta} = \left(\mathbf{I} + \Delta t\,\nabla_{x}\mathbf{f}\right)\Lambda_{n,\theta} + \Delta t\,\nabla_{\theta}\mathbf{f}\,\Sigma_{\theta,\theta} \tag{3.9}$$

and

$$\Sigma_{n+1,n} = \left(\mathbf{I} + \Delta t\,\nabla_{x}\mathbf{f}\right)\Sigma_{n,n} + \Delta t\,\nabla_{\theta}\mathbf{f}\,\Lambda_{n,\theta}^{\mathrm{T}}. \tag{3.10}$$

If the above evolution of the covariance matrix can be related to the evolution of the sensitivity matrix, as presented in §2 and equation (2.8), then the dependencies between the state vector and the parameters can be studied. This concept is developed in the next section.
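To make the covariance recursion concrete, the following sketch (an illustrative fragment; the Jacobian arguments `Fx` and `Fth` stand for ∇_{x}**f** and ∇_{θ}**f** evaluated at the propagated mean, and the function name is not from the paper) advances the blocks *Σ*_{n,n}, *Λ*_{n,θ} and *Σ*_{n+1,n} through one forward-Euler step, mirroring the block updates of equations (3.8)–(3.10).

```python
import numpy as np

def propagate_covariance_step(Sigma_xx, Lambda_xth, Sigma_thth, Fx, Fth, dt):
    """One forward-Euler step of the joint covariance, equations (3.8)-(3.10).

    Sigma_xx   : current state covariance Sigma_{n,n}
    Lambda_xth : current state-parameter cross-covariance Lambda_{n,theta}
    Sigma_thth : parameter covariance Sigma_{theta,theta}
    Fx, Fth    : grad_x f and grad_theta f at the propagated mean (mu_xn, mu_theta, t_n)
    """
    A = np.eye(Fx.shape[0]) + dt * Fx        # I + dt * grad_x f
    B = dt * Fth                             # dt * grad_theta f
    Sigma_xx_new = (A @ Sigma_xx @ A.T + A @ Lambda_xth @ B.T
                    + B @ Lambda_xth.T @ A.T + B @ Sigma_thth @ B.T)   # eq. (3.8)
    Lambda_xth_new = A @ Lambda_xth + B @ Sigma_thth                   # eq. (3.9)
    Sigma_cross = A @ Sigma_xx + B @ Lambda_xth.T                      # eq. (3.10)
    return Sigma_xx_new, Lambda_xth_new, Sigma_cross
```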

## 4. Relationship between sensitivity and forward propagation of uncertainty

In this section, the relationship between the evolution of the sensitivity matrix and the evolution of the covariance matrix of the joint distribution between all the state vectors until time *t*_{n} and the parameters is developed. Equation (3.8) can be expanded as follows:

$$\Sigma_{n+1,n+1} = \Sigma_{n,n} + \Delta t\left(\nabla_{x}\mathbf{f}\,\Sigma_{n,n} + \Sigma_{n,n}\,\nabla_{x}\mathbf{f}^{\mathrm{T}} + \nabla_{\theta}\mathbf{f}\,\Lambda_{n,\theta}^{\mathrm{T}} + \Lambda_{n,\theta}\,\nabla_{\theta}\mathbf{f}^{\mathrm{T}}\right) + \Delta t^{2}\left(\nabla_{x}\mathbf{f}\,\Sigma_{n,n}\,\nabla_{x}\mathbf{f}^{\mathrm{T}} + \nabla_{x}\mathbf{f}\,\Lambda_{n,\theta}\,\nabla_{\theta}\mathbf{f}^{\mathrm{T}} + \nabla_{\theta}\mathbf{f}\,\Lambda_{n,\theta}^{\mathrm{T}}\,\nabla_{x}\mathbf{f}^{\mathrm{T}} + \nabla_{\theta}\mathbf{f}\,\Sigma_{\theta,\theta}\,\nabla_{\theta}\mathbf{f}^{\mathrm{T}}\right). \tag{4.1}$$

Assume the following:

$$\Sigma_{\theta,\theta} = \mathbf{I}_{p}, \tag{4.2}$$

$$\Lambda_{n,\theta} = \mathbf{S}_{n} \tag{4.3}$$

and

$$\Sigma_{n,n} = \mathbf{S}_{n}\mathbf{S}_{n}^{\mathrm{T}}. \tag{4.4}$$

Under the above assumptions, it can be deduced from equations (4.1) and (2.8) that

$$\Sigma_{n+1,n+1} = \mathbf{S}_{n+1}\mathbf{S}_{n+1}^{\mathrm{T}}. \tag{4.5}$$

Furthermore, equation (3.9) reads

$$\Lambda_{n+1,\theta} = \left(\mathbf{I} + \Delta t\,\nabla_{x}\mathbf{f}\right)\mathbf{S}_{n} + \Delta t\,\nabla_{\theta}\mathbf{f},$$

which, as evident from equation (2.7), is the standard forward propagation of the sensitivity matrix. Hence

$$\Lambda_{n+1,\theta} = \mathbf{S}_{n+1}. \tag{4.6}$$

Finally, the term *Σ*_{n+1,n} from equation (3.10) can be written as

$$\Sigma_{n+1,n} = \left(\mathbf{I} + \Delta t\,\nabla_{x}\mathbf{f}\right)\mathbf{S}_{n}\mathbf{S}_{n}^{\mathrm{T}} + \Delta t\,\nabla_{\theta}\mathbf{f}\,\mathbf{S}_{n}^{\mathrm{T}} = \mathbf{S}_{n+1}\mathbf{S}_{n}^{\mathrm{T}}. \tag{4.7}$$

From equations (4.5), (4.6) and (4.7), it can be concluded that if the initial *prior* uncertainty in [**x**^{T}_{0}, *θ*^{T}]^{T} is assumed to be Gaussian with covariance

$$\Sigma_{0} = \begin{bmatrix}\mathbf{S}_{0}\mathbf{S}_{0}^{\mathrm{T}} & \mathbf{S}_{0}\\ \mathbf{S}_{0}^{\mathrm{T}} & \mathbf{I}_{p}\end{bmatrix}, \tag{4.8}$$

then the joint vector of the parameters ***θ*** and [**x**^{T}_{n}, **x**^{T}_{n−1}, …, **x**^{T}_{0}]^{T}, the state vectors corresponding to the time instants [*t*_{0}, *t*_{1}, …, *t*_{n}], can be approximated, by considering only the first-order terms after linearization, to be a Gaussian distribution with the following covariance:

$$\Sigma_{n} = \begin{bmatrix}\mathbf{S}_{n}\mathbf{S}_{n}^{\mathrm{T}} & \mathbf{S}_{n}\mathbf{S}_{n-1}^{\mathrm{T}} & \cdots & \mathbf{S}_{n}\mathbf{S}_{0}^{\mathrm{T}} & \mathbf{S}_{n}\\ \mathbf{S}_{n-1}\mathbf{S}_{n}^{\mathrm{T}} & \mathbf{S}_{n-1}\mathbf{S}_{n-1}^{\mathrm{T}} & \cdots & \mathbf{S}_{n-1}\mathbf{S}_{0}^{\mathrm{T}} & \mathbf{S}_{n-1}\\ \vdots & \vdots & \ddots & \vdots & \vdots\\ \mathbf{S}_{0}\mathbf{S}_{n}^{\mathrm{T}} & \mathbf{S}_{0}\mathbf{S}_{n-1}^{\mathrm{T}} & \cdots & \mathbf{S}_{0}\mathbf{S}_{0}^{\mathrm{T}} & \mathbf{S}_{0}\\ \mathbf{S}_{n}^{\mathrm{T}} & \mathbf{S}_{n-1}^{\mathrm{T}} & \cdots & \mathbf{S}_{0}^{\mathrm{T}} & \mathbf{I}_{p}\end{bmatrix}. \tag{4.9}$$

### Remark 4.1.

Note that a *prior* mean for the vector [**x**^{T}_{0}, *θ*^{T}]^{T} is assumed to be

$$\mu_{0} = \begin{bmatrix}\mu_{x_0}\\ \mu_{\theta}\end{bmatrix}, \tag{4.10}$$

based on which the mean vector of the state will propagate according to equation (3.3), essentially according to the forward Euler method. While this propagated mean does not directly influence the *posterior* uncertainty of the parameters, which depends only on the covariance matrix, it is important to note that the sensitivity terms in the covariance matrix of equation (4.9) are evaluated at the propagated means. The propagated mean of the joint vector [**x**^{T}_{n}, **x**^{T}_{n−1}, …, **x**^{T}_{0}, *θ*^{T}]^{T} is referred to throughout this manuscript as *μ*_{n} = [*μ*^{T}_{xn}, *μ*^{T}_{xn−1}, …, *μ*^{T}_{x0}, *μ*^{T}_{θ}]^{T}.

### Remark 4.2.

The required conditions presented in equations (4.2), (4.3) and (4.4), can also be derived without temporal discretization of the sensitivity and linearized forward model. This is presented in appendix A, which presents a differential equation describing the evolution of the joint covariance matrix, leading to the conditions derived above without temporal discretization. Even though the method presented in appendix A may be considered more general, the author first conceived the idea using the arguments shown above, and hence these ideas are presented in the main text.
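The consistency between the covariance recursion and the sensitivity recursion can also be checked numerically: starting from *Σ*_{θ,θ} = **I**_{p}, *Λ*_{n,θ} = **S**_{n} and *Σ*_{n,n} = **S**_{n}**S**^{T}_{n}, one Euler step of each gives *Σ*_{n+1,n+1} = **S**_{n+1}**S**^{T}_{n+1} and *Λ*_{n+1,θ} = **S**_{n+1}. A small self-contained sketch with arbitrary placeholder Jacobians:

```python
import numpy as np

rng = np.random.default_rng(0)
m, p, dt = 3, 2, 1e-2
Fx = rng.standard_normal((m, m))            # placeholder for grad_x f
Fth = rng.standard_normal((m, p))           # placeholder for grad_theta f
S_n = rng.standard_normal((m, p))           # current sensitivity matrix

A = np.eye(m) + dt * Fx                     # I + dt * grad_x f
B = dt * Fth                                # dt * grad_theta f

# Sensitivity update, equation (2.7)
S_np1 = A @ S_n + B

# Covariance update of equations (3.8)-(3.9) under assumptions (4.2)-(4.4)
Sigma_nn, Lam = S_n @ S_n.T, S_n
Sigma_new = A @ Sigma_nn @ A.T + A @ Lam @ B.T + B @ Lam.T @ A.T + B @ np.eye(p) @ B.T
Lam_new = A @ Lam + B @ np.eye(p)

assert np.allclose(Sigma_new, S_np1 @ S_np1.T)   # equation (4.5)
assert np.allclose(Lam_new, S_np1)               # equation (4.6)
```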

## 5. Measurements (observations)

Having established how the covariance of the state and the parameters evolves in relation to the sensitivity matrix, the next task is to extend this framework to include the measurements. Eventually, one wants to obtain an expression for the joint distribution of the measurements and the parameters, so that conditioning this joint distribution on the measurements (implying that measurements are known) will yield information about how much can be learned about the parameters.

Consider a linear observation operator through which **y**_{n} is measured at time *t*_{n} according to

$$\mathbf{y}_{n} = \mathbf{H}_{n}\,\mathbf{x}_{n} + \varepsilon_{n}, \tag{5.1}$$

where **H**_{n} is the observation operator at time *t*_{n} and *ε*_{n} is the measurement noise. Let *ε*_{n} be independently (across all measurement times) distributed as

$$\varepsilon_{n} \sim \mathcal{N}\left(\mathbf{O}_{m}, \Upsilon_{n}\right), \tag{5.2}$$

where **O**_{m} is a zero vector and *Υ*_{n} is the covariance structure of the noise. From equations (4.9) and (5.1), it is easy to see that [**y**^{T}_{n}, **y**^{T}_{n−1}, …, **y**^{T}_{0}, *θ*^{T}]^{T} follows a Gaussian distribution with the following mean and covariance:

$$\begin{bmatrix}\mathbf{y}_{n}\\ \vdots\\ \mathbf{y}_{0}\\ \boldsymbol{\theta}\end{bmatrix} \sim \mathcal{N}\left(\begin{bmatrix}\mathbf{H}_{n}\mu_{x_n}\\ \vdots\\ \mathbf{H}_{0}\mu_{x_0}\\ \mu_{\theta}\end{bmatrix},\; \begin{bmatrix}\mathbf{A}_{n} & \mathbf{B}_{n}\\ \mathbf{B}_{n}^{\mathrm{T}} & \mathbf{I}_{p}\end{bmatrix}\right), \tag{5.3}$$

where **A**_{n} is the covariance of the stacked measurement vector, with blocks

$$[\mathbf{A}_{n}]_{ij} = \mathbf{H}_{i}\,\mathbf{S}_{i}\mathbf{S}_{j}^{\mathrm{T}}\,\mathbf{H}_{j}^{\mathrm{T}} + \delta_{ij}\,\Upsilon_{i}, \qquad i, j \in \{n, n-1, \ldots, 0\}, \tag{5.4}$$

where *δ*_{ij} is the Kronecker delta, and **B**_{n} is the cross-covariance between the stacked measurements and the parameters,

$$\mathbf{B}_{n} = \left[(\mathbf{H}_{n}\mathbf{S}_{n})^{\mathrm{T}}, (\mathbf{H}_{n-1}\mathbf{S}_{n-1})^{\mathrm{T}}, \ldots, (\mathbf{H}_{0}\mathbf{S}_{0})^{\mathrm{T}}\right]^{\mathrm{T}}. \tag{5.5}$$

### Remark 5.1.

A nonlinear observation operator **h**(**x**_{n}) in equation (5.1), as opposed to the linear operator **H**, does not present any technical challenges to the formulation as it can be linearized at the current mean values. Following this, in equations (5.4) and (5.5), **H** would need to be replaced by the tangent operator ∇_{x}**h** evaluated at the propagated mean.

## 6. Conditional distribution of the parameters

The quantity of interest is the conditional distribution of parameters; i.e. how the beliefs about the parameters have changed from the *prior* beliefs to the *posterior* beliefs (the conditional distribution) by the measurements. More than the mean of the conditional distribution, the covariance is of interest. This is due to two reasons: (i) owing to the Gaussian approximations, the covariance entirely reflects the amount of uncertainty in the parameters; and (ii) while the mean of the conditional distribution depends on the measurements, the covariance does not. The latter is significant because *a priori*, the measurement values are not known. Consequently, the average (over all possible measurements) uncertainty in the parameters too is independent of the measurements, and hence can be studied *a priori*.

From equation (5.3), since the joint distribution of the parameter vector and the observables is Gaussian, the conditional distribution of the parameter vector given the measurements is also Gaussian and can be written as

$$p(\boldsymbol{\theta}\,|\,\mathbf{y}_{n}^{o}, \mathbf{y}_{n-1}^{o}, \ldots, \mathbf{y}_{0}^{o}) = \mathcal{N}\left(\mu_{\theta|z_n}, \Sigma_{\theta|z_n}\right), \tag{6.1}$$

with

$$\mu_{\theta|z_n} = \mu_{\theta} + \mathbf{B}_{n}^{\mathrm{T}}\mathbf{A}_{n}^{-1}\left(\mathbf{z}_{n}^{o} - \mu_{z_n}\right) \tag{6.2}$$

and

$$\Sigma_{\theta|z_n} = \mathbf{I}_{p} - \mathbf{B}_{n}^{\mathrm{T}}\mathbf{A}_{n}^{-1}\mathbf{B}_{n}, \tag{6.3}$$

where **y**^{o}_{i} denotes the measurement value (the realization of the random variable **y**_{i} observed) at *t*_{i}, **z**^{o}_{n} = [**y**^{oT}_{n}, …, **y**^{oT}_{0}]^{T} is the stacked vector of these values and *μ*_{zn} is the corresponding stacked mean from equation (5.3). Note that the conditional covariance is independent of these measurement values **y**^{o}_{i}. Furthermore, since the uncertainty in a Gaussian random variable, quantified by the differential entropy, depends only on the covariance matrix, the uncertainty of the posterior distribution does not depend on the measurements.
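In this Gaussian setting, equations (6.2) and (6.3) are the standard conditioning formulae, so the posterior covariance can be computed directly from the joint covariance blocks. A minimal sketch, assuming the blocks **A**_{n} and **B**_{n} of equation (5.3) have already been assembled from the sensitivities and the noise covariances (function and argument names are illustrative, not from the paper):

```python
import numpy as np

def posterior_parameter_covariance(A_n, B_n, prior_theta_cov):
    """Covariance of theta given all measurements up to t_n, equation (6.3).

    A_n: covariance of the stacked measurement vector (blocks of equation (5.4)),
    B_n: cross-covariance between the stacked measurements and theta (equation (5.5)),
    prior_theta_cov: prior covariance of theta (the identity in equation (4.8)).
    """
    # Solve a linear system instead of forming the inverse of A_n explicitly.
    return prior_theta_cov - B_n.T @ np.linalg.solve(A_n, B_n)
```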

## 7. Conditional covariance when *n* → ∞ and when *n* is finite

For the asymptotic case when *n* → ∞, it can be shown that (for proof see appendix B)

$$\Sigma_{\theta|z_n} \approx \mathcal{F}_{n}^{-1}, \tag{7.1}$$

where $\mathcal{F}_{n}$ is the Fisher information matrix defined as

$$\mathcal{F}_{n} = \sum_{i=0}^{n}\mathbf{S}_{i}^{\mathrm{T}}\mathbf{H}_{i}^{\mathrm{T}}\,\Upsilon_{i}^{-1}\,\mathbf{H}_{i}\mathbf{S}_{i}. \tag{7.2}$$

Furthermore, for finite *n*, the conditional covariance can be written as (for proof see appendix C)

$$\Sigma_{\theta|z_n} = \left(\mathbf{I}_{p} + \mathcal{F}_{n}\right)^{-1}. \tag{7.3}$$
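Equation (7.3) is also the cheaper route computationally, since only a *p* × *p* matrix needs to be inverted rather than the full measurement covariance **A**_{n}. A sketch of this accumulation, assuming the sensitivity matrices, observation operators and noise covariances at the measurement times are available as lists (the function name is illustrative):

```python
import numpy as np

def conditional_covariance(S_list, H_list, Y_list):
    """Posterior covariance of theta for finite n, equation (7.3):
    (I_p + sum_i S_i^T H_i^T Y_i^{-1} H_i S_i)^{-1}, with a unit-variance prior.
    Returns the covariance and the accumulated Fisher information of equation (7.2)."""
    p = S_list[0].shape[1]
    fisher = np.zeros((p, p))
    for S_i, H_i, Y_i in zip(S_list, H_list, Y_list):
        HS = H_i @ S_i
        fisher += HS.T @ np.linalg.solve(Y_i, HS)   # S^T H^T Y^{-1} H S
    return np.linalg.inv(np.eye(p) + fisher), fisher
```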

## 8. Information gain

In this section, the gain in information about the parameters by the measurements is considered. For details of such an information-theoretic approach the reader is referred to [21]. The gain in information about the parameter vector ***θ*** by the measurements **z**_{n} = [**y**^{T}_{n}, **y**^{T}_{n−1}, …, **y**^{T}_{0}]^{T} is given by the mutual information between **z**_{n} and ***θ***, which is equal to the difference between the differential entropies of the prior distribution *p*(***θ***) and the conditional distribution *p*(***θ*** | **z**_{n}). From equations (4.8), (6.1) and (6.3), this gain in information can be written as

$$\mathcal{I}(\boldsymbol{\theta}; \mathbf{z}_{n}) = \frac{1}{2}\ln\frac{|\mathbf{I}_{p}|}{\left|\Sigma_{\theta|z_n}\right|} = -\frac{1}{2}\ln\left|\mathbf{I}_{p} - \mathbf{B}_{n}^{\mathrm{T}}\mathbf{A}_{n}^{-1}\mathbf{B}_{n}\right|, \tag{8.1}$$

where | · | denotes the determinant. The above can be expanded through equation (7.3) as

$$\mathcal{I}(\boldsymbol{\theta}; \mathbf{z}_{n}) = \frac{1}{2}\ln\left|\mathbf{I}_{p} + \mathcal{F}_{n}\right| = \frac{1}{2}\ln\left|\mathbf{I}_{p} + \sum_{i=0}^{n}\mathbf{S}_{i}^{\mathrm{T}}\mathbf{H}_{i}^{\mathrm{T}}\,\Upsilon_{i}^{-1}\,\mathbf{H}_{i}\mathbf{S}_{i}\right|. \tag{8.2}$$

Note that the above represents the information gain for the joint vector of all the parameters. Commonly, one is interested in individual parameters, for which the information gain is now presented. Let $\boldsymbol{\theta}^{\mathcal{A}}$ denote the vector of a subset of parameters indexed by the elements of set $\mathcal{A}$ and $\boldsymbol{\theta}^{\bar{\mathcal{A}}}$ denote the vector of the remaining parameters, indexed by $\bar{\mathcal{A}}$, the complement of set $\mathcal{A}$. Hence, *θ*^{{i}} denotes the *i*th parameter, *θ*^{{i,j}} denotes the vector formed by taking the *i*th and *j*th parameters, and so on. The conditional covariance matrix can be decomposed into the components of $\boldsymbol{\theta}^{\mathcal{A}}$ and $\boldsymbol{\theta}^{\bar{\mathcal{A}}}$ as

$$\Sigma_{\theta|z_n}^{-1} = \mathbf{I}_{p} + \mathcal{F}_{n} = \begin{bmatrix}\mathbf{I}_{|\mathcal{A}|} + \mathcal{F}_{n}^{\mathcal{A},\mathcal{A}} & \mathcal{F}_{n}^{\mathcal{A},\bar{\mathcal{A}}}\\ \mathcal{F}_{n}^{\bar{\mathcal{A}},\mathcal{A}} & \mathbf{I}_{|\bar{\mathcal{A}}|} + \mathcal{F}_{n}^{\bar{\mathcal{A}},\bar{\mathcal{A}}}\end{bmatrix} \tag{8.3}$$

and

$$\mathcal{F}_{n}^{\mathcal{A},\mathcal{B}} = \sum_{i=0}^{n}\left(\mathbf{S}_{i}^{\mathcal{A}}\right)^{\mathrm{T}}\mathbf{H}_{i}^{\mathrm{T}}\,\Upsilon_{i}^{-1}\,\mathbf{H}_{i}\,\mathbf{S}_{i}^{\mathcal{B}}, \tag{8.4}$$

where $|\mathcal{A}|$ is the cardinality of $\mathcal{A}$, and $\mathbf{S}^{\mathcal{A}}$ and $\mathbf{S}^{\bar{\mathcal{A}}}$ are the sensitivity matrices for $\boldsymbol{\theta}^{\mathcal{A}}$ and $\boldsymbol{\theta}^{\bar{\mathcal{A}}}$, respectively, i.e. $\mathbf{S}^{\mathcal{A}} = \nabla_{\theta^{\mathcal{A}}}\mathbf{x}$ and $\mathbf{S}^{\bar{\mathcal{A}}} = \nabla_{\theta^{\bar{\mathcal{A}}}}\mathbf{x}$. Given the above decomposition, the marginal covariance of $\boldsymbol{\theta}^{\mathcal{A}}$ given the measurements can be written as the inverse of the Schur complement of the block corresponding to $\bar{\mathcal{A}}$ in the precision matrix of equation (8.3) as follows:

$$\Sigma_{\theta^{\mathcal{A}}|z_n} = \left[\left(\mathbf{I}_{|\mathcal{A}|} + \mathcal{F}_{n}^{\mathcal{A},\mathcal{A}}\right) - \mathcal{F}_{n}^{\mathcal{A},\bar{\mathcal{A}}}\left(\mathbf{I}_{|\bar{\mathcal{A}}|} + \mathcal{F}_{n}^{\bar{\mathcal{A}},\bar{\mathcal{A}}}\right)^{-1}\mathcal{F}_{n}^{\bar{\mathcal{A}},\mathcal{A}}\right]^{-1} \tag{8.5}$$

and the information gain as

$$\mathcal{I}(\boldsymbol{\theta}^{\mathcal{A}}; \mathbf{z}_{n}) = -\frac{1}{2}\ln\left|\Sigma_{\theta^{\mathcal{A}}|z_n}\right|. \tag{8.6}$$

Another quantity of interest is the correlation between two subsets of parameters $\boldsymbol{\theta}^{\mathcal{A}}$ and $\boldsymbol{\theta}^{\mathcal{B}}$. In an information-theoretic context this can be assessed by how much more information is gained about the parameters $\boldsymbol{\theta}^{\mathcal{A}}$, in addition to the measurements, if $\boldsymbol{\theta}^{\mathcal{B}}$ was also known, i.e. the mutual information between $\boldsymbol{\theta}^{\mathcal{A}}$ and $\boldsymbol{\theta}^{\mathcal{B}}$ given the measurements. Similar to the procedure employed in equation (8.4), by splitting the precision matrix into three components for $\mathcal{A}$, $\mathcal{B}$ and $\mathcal{C}$ (the complement of $\mathcal{A}\cup\mathcal{B}$), one can write the conditional covariance of the parameters $\boldsymbol{\theta}^{\mathcal{A}}$ given the measurements and, additionally, the parameters $\boldsymbol{\theta}^{\mathcal{B}}$ as follows:

$$\Sigma_{\theta^{\mathcal{A}}|z_n,\theta^{\mathcal{B}}} = \left[\left(\mathbf{I}_{|\mathcal{A}|} + \mathcal{F}_{n}^{\mathcal{A},\mathcal{A}}\right) - \mathcal{F}_{n}^{\mathcal{A},\mathcal{C}}\left(\mathbf{I}_{|\mathcal{C}|} + \mathcal{F}_{n}^{\mathcal{C},\mathcal{C}}\right)^{-1}\mathcal{F}_{n}^{\mathcal{C},\mathcal{A}}\right]^{-1}, \tag{8.7}$$

where $|\mathcal{B}|$ and $|\mathcal{C}|$ are the cardinalities of $\mathcal{B}$ and $\mathcal{C}$, respectively, and the blocks $\mathcal{F}_{n}^{\mathcal{A},\mathcal{C}}$, $\mathcal{F}_{n}^{\mathcal{C},\mathcal{C}}$ and $\mathcal{F}_{n}^{\mathcal{C},\mathcal{A}}$ are defined as in equation (8.4); conditioning on $\boldsymbol{\theta}^{\mathcal{B}}$ amounts to deleting the rows and columns of $\mathcal{B}$ from the precision matrix, and the remaining parameters $\boldsymbol{\theta}^{\mathcal{C}}$ are then marginalized out through the Schur complement.

The information gain about the parameters $\boldsymbol{\theta}^{\mathcal{A}}$ given both the measurements and the parameters $\boldsymbol{\theta}^{\mathcal{B}}$ is

$$\mathcal{I}(\boldsymbol{\theta}^{\mathcal{A}}; \mathbf{z}_{n}, \boldsymbol{\theta}^{\mathcal{B}}) = -\frac{1}{2}\ln\left|\Sigma_{\theta^{\mathcal{A}}|z_n,\theta^{\mathcal{B}}}\right|. \tag{8.10}$$

Lastly, the conditional mutual information (CMI), i.e. the additional (after the measurements are known) information gained about the parameters $\boldsymbol{\theta}^{\mathcal{A}}$ due to the knowledge of $\boldsymbol{\theta}^{\mathcal{B}}$, is

$$\mathcal{I}(\boldsymbol{\theta}^{\mathcal{A}}; \boldsymbol{\theta}^{\mathcal{B}}\,|\,\mathbf{z}_{n}) = \mathcal{I}(\boldsymbol{\theta}^{\mathcal{A}}; \mathbf{z}_{n}, \boldsymbol{\theta}^{\mathcal{B}}) - \mathcal{I}(\boldsymbol{\theta}^{\mathcal{A}}; \mathbf{z}_{n}) = \frac{1}{2}\ln\frac{\left|\Sigma_{\theta^{\mathcal{A}}|z_n}\right|}{\left|\Sigma_{\theta^{\mathcal{A}}|z_n,\theta^{\mathcal{B}}}\right|}. \tag{8.11}$$

### Remark 8.1.

$\mathcal{I}(\boldsymbol{\theta}^{\mathcal{A}}; \mathbf{z}_{n})$ is the gain in information about the parameters $\boldsymbol{\theta}^{\mathcal{A}}$ given the measurements and when nothing is known about the parameters $\boldsymbol{\theta}^{\bar{\mathcal{A}}}$.

### Remark 8.2.

$\mathcal{I}(\boldsymbol{\theta}^{\mathcal{A}}; \mathbf{z}_{n}, \boldsymbol{\theta}^{\mathcal{B}})$ is the gain in information about the parameters $\boldsymbol{\theta}^{\mathcal{A}}$ given the measurements and the parameters $\boldsymbol{\theta}^{\mathcal{B}}$, when nothing is known about the parameters $\boldsymbol{\theta}^{\mathcal{C}}$.

### Remark 8.3.

In [21], the authors suggested a method to interpret the information gains $\mathcal{I}(\boldsymbol{\theta}^{\mathcal{A}}; \mathbf{z}_{n})$ and $\mathcal{I}(\boldsymbol{\theta}^{\mathcal{A}}; \mathbf{z}_{n}, \boldsymbol{\theta}^{\mathcal{B}})$ when the set $\mathcal{A}$ contained a single parameter by proposing a hypothetical measurement device. This is not necessary in the current formulation as all the distributions are approximated to be Gaussian. Therefore, when $\mathcal{A}$ contains only a single parameter, the conditional covariances $\Sigma_{\theta^{\mathcal{A}}|z_n}$ and $\Sigma_{\theta^{\mathcal{A}}|z_n,\theta^{\mathcal{B}}}$ are scalar quantities representing the posterior variances of the parameter $\boldsymbol{\theta}^{\mathcal{A}}$. When $\mathcal{A}$ contains more than one parameter, the quantities $\mathcal{I}(\boldsymbol{\theta}^{\mathcal{A}}; \mathbf{z}_{n})$ and $\mathcal{I}(\boldsymbol{\theta}^{\mathcal{A}}; \mathbf{z}_{n}, \boldsymbol{\theta}^{\mathcal{B}})$ are scalars that quantify the gains in information.

The above developed functions for information gains (and associated variances) are collectively referred to as ‘ISFs’. From this point onwards, the term *marginal posterior variance*, or just *marginal variance*, for a parameter subset refers to the variance conditioned only on the measurements, equation (8.5), and the corresponding information gain, equation (8.6), is referred to as the *marginal information gain*. Similarly, the term *conditional variance* is used to refer to the variance when the measurements and, additionally, a parameter subset are given, equation (8.7), and the corresponding information gain, equation (8.10), is referred to as the *conditional information gain*. Lastly, the information shared between two subsets of parameters given the measurements, equation (8.11), is referred to as the *conditional mutual information* or just the *mutual information*. Finally, the vector **z**_{n} = [**y**^{T}_{n}, **y**^{T}_{n−1}, …, **y**^{T}_{0}]^{T} is used to denote the collection of all measurement vectors up to time *t*_{n}.
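Computationally, all of the ISFs defined above reduce to log-determinants and Schur complements of the *p* × *p* posterior precision matrix **I**_{p} + 𝓕_{n}. A sketch of these operations is given below; the argument `precision` is that matrix at a given time *t*_{n}, parameter subsets are passed as index lists and are assumed disjoint, and the function names are illustrative, not from the paper.

```python
import numpy as np

def marginal_covariance(precision, A):
    """Covariance of theta^A given the measurements, equation (8.5):
    invert the Schur complement of the complement block in the precision."""
    A = list(A)
    comp = [i for i in range(precision.shape[0]) if i not in A]
    P_AA = precision[np.ix_(A, A)]
    if not comp:
        return np.linalg.inv(P_AA)
    P_Ac = precision[np.ix_(A, comp)]
    P_cc = precision[np.ix_(comp, comp)]
    return np.linalg.inv(P_AA - P_Ac @ np.linalg.solve(P_cc, P_Ac.T))

def information_gain(cov_posterior):
    """Gain in nats relative to a unit-variance prior, equations (8.6)/(8.10)."""
    return -0.5 * np.linalg.slogdet(cov_posterior)[1]

def conditional_covariance_given(precision, A, B):
    """Covariance of theta^A given the measurements and theta^B, equation (8.7):
    delete the rows/columns of B (conditioning), then marginalize the rest.
    A and B are assumed disjoint."""
    keep = [i for i in range(precision.shape[0]) if i not in set(B)]
    sub = precision[np.ix_(keep, keep)]
    A_local = [keep.index(i) for i in A]
    return marginal_covariance(sub, A_local)

def conditional_mutual_information(precision, A, B):
    """CMI between theta^A and theta^B given the measurements, equation (8.11)."""
    gain_marginal = information_gain(marginal_covariance(precision, A))
    gain_conditional = information_gain(conditional_covariance_given(precision, A, B))
    return gain_conditional - gain_marginal
```

For a single parameter these covariances are scalar posterior variances, and the additive relation of equation (8.11), conditional gain equals marginal gain plus CMI, is recovered by construction.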

## 9. Results and discussion

In this section, the theory developed above is applied to study three dynamical systems.

### 9.1. Three-element Windkessel model

Windkessel models are widely used to describe arterial haemodynamics [24]. Increasingly, they are also being used as boundary conditions in three-dimensional computational fluid dynamics simulations to assess patient-specific behaviour [20,25]. To perform patient-specific analysis, it is imperative that the parameters of the Windkessel model are estimated from measurements taken in each patient individually. A three-element Windkessel model is shown in figure 1*a* and consists of three parameters: *R*_{p} (proximal resistance), which represents the hydraulic resistance of large vessels; *C* (capacitance), which represents the compliance of large vessels; and *R*_{d} (distal resistance), which represents the resistance of small vessels in the microcirculation. Note that these models use the electric analogy to fluid flow where pressure *P* is seen as voltage and flow-rate *q* is seen as electric current. Typically, the inlet flow-rate *q*^{i} is measured (via magnetic resonance imaging or Doppler ultrasound) and the inlet pressure *P*^{i} is measured by pressure catheters. The goal then is to estimate the parameters (*R*_{p}, *C* and *R*_{d}) by assuming *q*^{i} is deterministically known and minimizing the difference between the *P*^{i} reproduced by the model and the *P*^{i} that was measured. The model dynamics is described by the following differential algebraic equations, which may also be rewritten as a single ODE:
$$q^{i} = \frac{P^{i} - P^{c}}{R_{p}}, \qquad C\,\frac{\mathrm{d}(P^{c} - P_{ext})}{\mathrm{d}t} = q^{i} - q^{o}, \qquad q^{o} = \frac{P^{c} - P_{ven}}{R_{d}}, \tag{9.1}$$

where *P*^{i} and *P*^{c} are the inlet and mid-Windkessel pressures, respectively (figure 1*a*); *P*_{ext} and *P*_{ven} are the reference external and venous pressures, respectively, which are both set to zero; and *q*^{i} and *q*^{o} are the inlet and outlet flow-rates, respectively. The measurement model is written as follows:

$$y_{n} = P^{i}_{n} + \varepsilon_{n}, \tag{9.2}$$

where *ε*_{n} is the noise (normally distributed with zero mean and variance *σ*^{2}_{noise}) in measuring *P*^{i}_{n} to give the measurement *y*_{n} at time *t*_{n}. The measurement vector, therefore, has only one component **y**_{n} = [*y*_{n}]. The nominal values of *R*_{p}, *C* and *R*_{d} are 0.838 mmHg · s cm^{−3}, 0.0424 cm^{3} mmHg^{−1} and 9.109 mmHg · s cm^{−3}, respectively. Note that these units are chosen so that the results are comprehensible in typical units used in the clinic: millilitres for volume and millimetres of mercury for pressure. Figure 1*b* shows the inlet flow-rate *q*^{i} (taken from [20,26] where it was measured in the carotid artery of a healthy 27-year-old subject), and the resulting pressure curves obtained by the solution of equation (9.1) with *P*^{i}_{0} = 85 mmHg and nominal parameter values. To put a zero-mean and unit-variance prior on the parameters, see equation (4.8), the following parameter transformation is considered:

$$\xi = \xi_{0} + \varsigma_{\xi}\,\theta_{\xi}, \tag{9.3}$$

where *ξ* represents the real parameter, *ξ*_{0} and *ς*_{ξ} are the transformation parameters, and *θ*_{ξ} represents the transformed parameter on which a *prior* of zero mean and unit variance is considered. Therefore, the prior considered on the real parameter *ξ* has mean *ξ*_{0} and variance *ς*^{2}_{ξ}. The posterior variances for the transformed parameter *θ*_{ξ} and the real parameter *ξ* are represented by *σ*^{2}_{θ} and *σ*^{2}, respectively. A total of 150 time-points, evenly distributed between *t* = 0 s and *t* = *T*_{c} (where *T*_{c} = 0.75 s is the time period of the cardiac cycle), are used for the computation of ISFs and conditional variances.
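A minimal, runnable sketch of this set-up is given below. It integrates the Windkessel of equation (9.1) for the mid-Windkessel pressure *P*^{c} (with *P*_{ext} = *P*_{ven} = 0) and recovers *P*^{i} = *P*^{c} + *R*_{p}*q*^{i}; the inflow waveform is a crude synthetic placeholder rather than the measured carotid flow-rate of [20,26], and the transformation of equation (9.3) is included for completeness.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Sketch of the three-element Windkessel, equation (9.1), with P_ext = P_ven = 0.
# The mid-Windkessel pressure P_c is the state; the inlet pressure is
# P_i = P_c + R_p * q_i.  The inflow below is a synthetic half-sine burst,
# a placeholder for the measured carotid flow-rate of [20,26].

Rp, C, Rd = 0.838, 0.0424, 9.109          # nominal values (mmHg s cm^-3, cm^3 mmHg^-1)
Tc = 0.75                                  # cardiac period (s)

def q_in(t):
    t = np.mod(t, Tc)
    return 20.0 * np.sin(np.pi * t / 0.3) if t < 0.3 else 1.0   # cm^3 s^-1 (placeholder)

def rhs(t, y):
    Pc = y[0]
    return [(q_in(t) - Pc / Rd) / C]

Pi0 = 85.0                                 # initial inlet pressure (mmHg)
Pc0 = Pi0 - Rp * q_in(0.0)
t_eval = np.linspace(0.0, Tc, 150)         # the 150 measurement time-points
sol = solve_ivp(rhs, (0.0, Tc), [Pc0], t_eval=t_eval, max_step=1e-3)
P_i = sol.y[0] + Rp * np.array([q_in(t) for t in t_eval])

# Parameter transformation of equation (9.3): xi = xi_0 + varsigma_xi * theta_xi,
# so a N(0, 1) prior on theta_xi induces a N(xi_0, varsigma_xi^2) prior on xi.
def to_physical(theta_xi, xi0, varsigma_xi):
    return xi0 + varsigma_xi * theta_xi
```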

Figure 2 shows the marginal posterior variances (conditional only on the measurements, (*a*–*c*)) and the corresponding information gains (*d*–*f*) for individual parameters at four different levels of measurement noise. The conditional variances when all measurements are taken into account, i.e. at *t* = *T*_{c}, are also summarized in table 1. An immediate utility of figure 2 is in identifying intervals of time where information about a parameter is concentrated. For example, from the first column it is clear that most of the information about the parameter *θ*_{Rp} is concentrated in the interval *t* ∈ [0.3, 0.4], as this is the interval that shows the maximum reduction in the marginal variance and the highest information gain. This interval corresponds to the rising peak of the inlet flow-rate curve, see figure 1*b*, and from equation (9.1) it is clear that the parameter *θ*_{Rp} should have most effect on the pressure *P*^{i} in this interval. For the parameter *θ*_{C}, it appears from figure 2 that while information is available in the entire cardiac cycle, a larger amount of information is concentrated in the latter half of the cardiac cycle, *t* ∈ [0.4, 0.75]. For *R*_{d}, information is available throughout the cardiac cycle. These observations have also been presented in [20] through the computation of GSFs [27] and in [21] through a Monte Carlo type computation of information gain. However, as opposed to GSFs, which can be non-monotonic and therefore hard to interpret, the ISFs are always monotonic. Furthermore, since the GSFs are normalized by design, they are forced to start at 0 and end at 1, thereby making the assessment of measurement noise difficult. On the other hand, the effect of measurement noise is inherently built into the ISFs. Figure 2 quantifies how increasing measurement noise results in a decreasing amount of information gained about the parameters. While this behaviour is intuitively expected, its quantification with respect to each individual parameter is made possible with the proposed method. For example, while at *σ*^{2}_{noise} = 100.0 mmHg^{2} the conditional variance of the parameter *θ*_{Rp} after considering all the measurements is 0.158 square units, at *σ*^{2}_{noise} = 4900.0 mmHg^{2} this conditional variance is 0.887 square units. Comparing this to the prior variance of 1.0 square units, one may conclude that at a measurement noise of 4900.0 mmHg^{2} (standard deviation of 70.0 mmHg), the parameter *R*_{p} is extremely difficult to identify relative to when the measurement noise is 100.0 mmHg^{2} (standard deviation of 10.0 mmHg). A similar argument can be made for the parameter *θ*_{C}, even though its identifiability is better than that of *θ*_{Rp} (*θ*_{C} has a posterior variance of 0.672 square units at a measurement noise of *σ*^{2}_{noise} = 4900.0 mmHg^{2}). However, the parameter *θ*_{Rd} appears to be well identifiable even at *σ*^{2}_{noise} = 4900.0 mmHg^{2}, with a final posterior variance of 0.07 square units. This behaviour can be explained by the fact that the measurement noise is assumed to be independent and identically distributed with zero mean at all measurement times. Therefore, the mean pressure is measured much more precisely than the individual pressure measurements, irrespective of the noise levels, as when the mean/expectation of equation (9.2) is taken, the expectation of the noise component is zero:
$$\mathbb{E}[y_{n}] = \mathbb{E}[P^{i}_{n}] + \mathbb{E}[\varepsilon_{n}] = P^{i}_{n}, \tag{9.4}$$

where $\mathbb{E}$ denotes the expectation operator. From equation (9.1) and figure 1*a*, the inlet mean pressure $\bar{P}^{i}$ is equal to the inlet mean flow-rate $\bar{q}^{i}$ times the sum of both resistances, i.e. $\bar{P}^{i} = \bar{q}^{i}\,(R_{p} + R_{d})$. Approximating $\bar{P}^{i}$ by the sample mean of the measurements as $\bar{P}^{i} \approx (1/N)\sum_{n} y_{n}$, one obtains

$$\frac{1}{N}\sum_{n} y_{n} \approx \bar{q}^{i}\,(R_{p} + R_{d}). \tag{9.5}$$

As *q*^{i} is assumed deterministic, from the above equation it can be seen that *R*_{p} + *R*_{d} is indirectly measured with high precision. As *R*_{d} is approximately an order of magnitude larger than *R*_{p}, it is natural that *R*_{d} dominates the sum (*R*_{p} + *R*_{d}) and hence, irrespective of the noise levels, a large amount of information is obtained about *R*_{d} (figure 2*c*,*f*). The orders of magnitude of the resistances are dictated by the physics of the circulation, where the resistance of small vessels and the microcirculation is significantly higher than that of large vessels [20,26], and this is reflected in the chosen priors for the problem.
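This averaging argument can be checked with a toy calculation (hypothetical numbers; the assumed mean flow-rate is not the paper's): the sample mean of *N* noisy pressure measurements has a standard deviation *σ*_{noise}/√*N*, so the combination *R*_{p} + *R*_{d} in equation (9.5) is pinned down far more tightly than any single noisy sample would suggest.

```python
import numpy as np

rng = np.random.default_rng(1)
N, sigma_noise = 150, 70.0                 # samples per cycle, noise s.d. (mmHg)
q_mean = 6.0                               # assumed mean inlet flow-rate (cm^3 s^-1), illustrative only
Rp, Rd = 0.838, 9.109
P_mean_true = q_mean * (Rp + Rd)           # mean inlet pressure, equation (9.5)

# Even with 70 mmHg noise per sample, the sample mean has s.d. 70/sqrt(150) ~ 5.7 mmHg,
# so (Rp + Rd) = P_mean / q_mean is recovered to within roughly 1 mmHg s cm^-3.
samples = P_mean_true + sigma_noise * rng.standard_normal(N)
R_sum_estimate = samples.mean() / q_mean
```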

Equation (9.5) and the arguments presented above imply that a significant amount of correlation must have been built up between the parameters *R*_{p} and *R*_{d} in the posterior distribution, as the sum (*R*_{p} + *R*_{d}) is measured with high precision. This correlation determines how much additional information could be gained about one of the parameters *R*_{p} or *R*_{d} if the other were known. The CMI presented in equation (8.11) precisely measures this additional information. CMIs for all three pairs of the parameters are shown in figure 3. It is clear that at the end of the cardiac cycle, the largest CMI is for the parameter pair *θ*_{Rp} and *θ*_{Rd}. It is sensible to compare the magnitude of the CMIs with the marginal information gains (figure 2). For example, for the case of *σ*^{2}_{noise} = 100.0, the marginal gain in information about the parameter *R*_{p} is approximately 0.9 nats and the mutual information between *R*_{p} and *R*_{d} is 0.35 nats; therefore, one may conclude that approximately 40% extra information about the parameter *R*_{p} is locked up in the correlation with *R*_{d}. For the pair *R*_{d} and *C*, it appears that correlation is built up in the interval *t* ∈ [0.0, 0.4], the diastole, and destroyed in the remaining part, the systole, of the cardiac cycle. This can be explained by the fact that the exponential decay *e*^{−t/τ}, with time constant *τ* = *R*_{d}*C*, dominates the diastolic phase [21], leading to a build-up of correlation, and as independent information about *C* and *R*_{d} is acquired in systole (figure 2) this correlation is destroyed. It should be noted that these aspects, even without knowing the physics (or solution) of the problem, can be naturally inferred from figures 2 and 3.

The effect of correlations can be further assessed by looking at the conditional variances (*a*–*c*) and conditional information gains (*d*–*f*) as depicted in figure 4. For *σ*^{2}_{noise} = 625.0, this figure shows the conditional posterior variances and the conditional information gains for individual parameters when other parameters are given. For the parameter *θ*_{Rp}, it can be seen that the conditional variance given *θ*_{Rd} is lower than the marginal variance in the interval *t* ∈ [0.4, 0.75], as this is the region where mutual information (correlation) is built between these parameters (figure 3). Similarly, in diastole, *t* ∈ [0.0, 0.4], it can be seen that the conditional variance of the parameter *θ*_{C} given *θ*_{Rd} is significantly lower as correlation is built up, but this gain quickly diminishes to zero in systole, *t* ∈ [0.4, 0.75]. For the parameter *R*_{d}, as a large amount of individual information is obtained marginally, the conditional variances are not too different from the marginal variances. Note that the variances show an opposite behaviour to the information gains, as a decrease in variance implies a gain in information. Therefore, even though the two measures appear to be similar, information gain is a better measure as it can be readily applied to cases where the behaviour of a set of parameters is required to be studied. For example, if one were interested in the joint information gain for a set of two parameters given a third, the information gain measure would be a scalar but the joint covariance would be a matrix. Furthermore, the relation between conditional information gain, marginal information gain and mutual information is additive, see equation (8.11), whereas the relation between conditional variance and marginal variance is, in general, not additive. As a demonstration, it can be observed that the conditional information gain curves in figure 4 can be obtained by the addition of the corresponding curves from figures 2 and 3.

### 9.2. The Hodgkin–Huxley model of a neuron

The Hodgkin–Huxley model [28] describes ionic exchanges and their relationship to the membrane voltage in a biological neuron. This model has also been used as the basis for several other ionic models to describe a variety of excitable tissues such as cardiac cells [29]. The model is described by the following ODEs:

$$C_{m}\,\frac{\mathrm{d}V_{m}}{\mathrm{d}t} = I_{ext} - I_{Na} - I_{K} - I_{L}, \qquad \frac{\mathrm{d}x}{\mathrm{d}t} = \alpha_{x}(V_{m})\,(1 - x) - \beta_{x}(V_{m})\,x, \quad x \in \{m, h, n\}, \tag{9.6}$$

with

$$I_{Na} = g_{Na}\,m^{3}h\,(V_{m} - V_{Na}), \qquad I_{K} = g_{K}\,n^{4}\,(V_{m} - V_{K}), \qquad I_{L} = g_{L}\,(V_{m} - V_{L}), \tag{9.7}$$

where *V*_{m} is the membrane voltage, *C*_{m} is the membrane capacitance, *I*_{ext} is the external current applied; *I*_{Na}, *I*_{K} and *I*_{L} are the sodium, potassium and leakage currents, respectively; *V*_{Na}, *V*_{K} and *V*_{L} are the equilibrium potentials for sodium, potassium and leakage ions, respectively; *g*_{Na}, *g*_{K} and *g*_{L} are the maximum conductances for the channels of sodium, potassium and leakage ions, respectively; and *m*, *h* and *n* are the dimensionless gate variables, *m*, *h*, *n* ∈ [0, 1], with voltage-dependent rate functions *α*_{x} and *β*_{x}, that characterize the activation and inactivation of the sodium and potassium channels. *C*_{m} is set to 1 μF cm^{−2}, and the equilibrium potentials are defined in millivolts (mV) relative to the membrane resting potential, *E*_{R}, as follows [30,31]:

$$V_{Na} = E_{R} + 115, \qquad V_{K} = E_{R} - 12, \qquad V_{L} = E_{R} + 10.613. \tag{9.8}$$

The inverse problem is of estimating the three parameters *g*_{Na}, *g*_{K} and *g*_{L} by measuring the membrane voltage *V*_{m} when a constant external current *I*_{ext} = 20 μA cm^{−2} is applied to the neuron. It is well known that when a relatively high constant external current is applied, the neuron exhibits a tonic spiking pattern in the membrane voltage *V*_{m} [32–34]. With nominal parameter values of *g*_{Na} = 120.0 mS cm^{−2}, *g*_{K} = 36.0 mS cm^{−2} and *g*_{L} = 0.3 mS cm^{−2}, and initial conditions of *V*_{m}(0) = −75 mV, *m*(0) = 0.05, *h*(0) = 0.6 and *n*(0) = 0.325, this tonic spiking behaviour, generated by solving equation (9.6), is shown in figure 5. The observation model reads

$$y_{n} = V_{m,n} + \varepsilon_{n}, \tag{9.9}$$

where *V*_{m,n} is the membrane voltage at time *t*_{n} and *ε*_{n} is the zero-mean measurement noise with variance *σ*^{2}_{noise}. As only *V*_{m} is measured, the observation vector is **y**_{n} = [*y*_{n}]. As opposed to the Windkessel case, where the effect of noise is evaluated, in this case the effect of the number of observations, i.e. the observation frequency, is evaluated. *N*_{obs} measurement time-points, evenly distributed in the time interval *t* ∈ [0.0, 40.0] ms, are studied. Four levels of observation frequency, resulting in *N*_{obs} ∈ {100, 200, 400, 800}, are used while *σ*^{2}_{noise} is set to 100.0 mV^{2} (standard deviation of 10.0 mV). Similar to the Windkessel example, the following parametrization is used to impose zero-mean and unit-variance priors on the parameters:

$$\xi = \xi_{0} + \varsigma_{\xi}\,\theta_{\xi}, \tag{9.10}$$

where *ξ*_{0} is the nominal parameter value and a zero-mean and unit-variance normal prior is imposed on the transformed parameter *θ*_{ξ}, resulting in the prior distribution imposed on the real parameter *ξ* being a normal distribution with mean *ξ*_{0} and variance *ς*^{2}_{ξ}. The parameters *ς*_{ξ} are set to 10.0, 6.0 and 0.1 mS cm^{−2} for *g*_{Na}, *g*_{K} and *g*_{L}, respectively.
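A runnable sketch of this stimulation protocol is given below. The gating rate functions *α*_{x} and *β*_{x} are the standard squid-axon expressions with the resting potential assumed near −65 mV; they are stand-ins, since equations (9.6)–(9.8) fix the currents and equilibrium potentials but the exact rate expressions of [30,31] are not reproduced in this section.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Sketch of the Hodgkin-Huxley protocol of section 9.2.  The alpha/beta rate
# functions below are the standard squid-axon expressions (resting potential
# assumed near -65 mV); they are stand-ins for the expressions of [30,31].

g_Na, g_K, g_L = 120.0, 36.0, 0.3          # nominal conductances (mS cm^-2)
C_m, I_ext = 1.0, 20.0                     # uF cm^-2, uA cm^-2
E_R = -65.0                                # assumed resting potential (mV)
V_Na, V_K, V_L = E_R + 115.0, E_R - 12.0, E_R + 10.613   # equation (9.8)

def alpha_m(V): return 0.1 * (V + 40.0) / (1.0 - np.exp(-(V + 40.0) / 10.0))
def beta_m(V):  return 4.0 * np.exp(-(V + 65.0) / 18.0)
def alpha_h(V): return 0.07 * np.exp(-(V + 65.0) / 20.0)
def beta_h(V):  return 1.0 / (1.0 + np.exp(-(V + 35.0) / 10.0))
def alpha_n(V): return 0.01 * (V + 55.0) / (1.0 - np.exp(-(V + 55.0) / 10.0))
def beta_n(V):  return 0.125 * np.exp(-(V + 65.0) / 80.0)

def rhs(t, y):
    V, m, h, n = y
    I_Na = g_Na * m**3 * h * (V - V_Na)    # equation (9.7)
    I_K = g_K * n**4 * (V - V_K)
    I_L = g_L * (V - V_L)
    dV = (I_ext - I_Na - I_K - I_L) / C_m  # equation (9.6)
    dm = alpha_m(V) * (1.0 - m) - beta_m(V) * m
    dh = alpha_h(V) * (1.0 - h) - beta_h(V) * h
    dn = alpha_n(V) * (1.0 - n) - beta_n(V) * n
    return [dV, dm, dh, dn]

y0 = [-75.0, 0.05, 0.6, 0.325]             # initial conditions from the text
N_obs = 200
t_eval = np.linspace(0.0, 40.0, N_obs)     # measurement time-points (ms)
sol = solve_ivp(rhs, (0.0, 40.0), y0, t_eval=t_eval, max_step=0.01)
V_m = sol.y[0]                             # the measured quantity, equation (9.9)
```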

Figure 6 shows the posterior marginal variances (*a*–*c*) and the marginal information gains (*d*–*f*) for the three parameters for all four observation frequencies. In all these plots, an arbitrarily scaled *V*_{m}(*t*) curve is shown in light grey for ease of interpretation relative to *V*_{m}(*t*) variations. As expected, increasing the measurement frequency results in larger amounts of information (and consequently a larger reduction in the posterior variances). However, it is observed that the parameters *θ*_{gNa} and *θ*_{gL} benefit most from an increase in measurement frequency, as opposed to the parameter *θ*_{gK}, which benefits only marginally. This implies that at low observation frequencies the identifiability of *θ*_{gK} is good, while only a small amount of information is available for the parameters *θ*_{gNa} and *θ*_{gL}. The behaviour for the parameter *θ*_{gK} (figure 6*b*,*e*) shows that the information about this parameter is concentrated mostly in the sharp rising phase of the action potential *V*_{m}. A similar behaviour, although less salient, is observed for the parameters *θ*_{gNa} and *θ*_{gL} (figure 6*a*,*d*,*c*,*f*). While the Hodgkin–Huxley model is quite complex, with gating variables of different time-constants and dependence of ionic currents on powers (up to the fourth power) of the gating variables, it is widely understood that the rising phases of the action potential *V*_{m} are related to the sodium and potassium currents. This may explain why information about the parameters *θ*_{gNa} and *θ*_{gK} is mostly concentrated in this region. Furthermore, if we accept that the sodium and potassium currents, in combination, are responsible for the rising action potential, then we should also expect a substantial amount of correlation between the parameters *θ*_{gNa} and *θ*_{gK}, as it should be hard to distinguish between these two parameters. This is precisely what is observed in the CMI analysis, figure 7, where a large amount of mutual information is developed between these two parameters. For the case of *N*_{obs} = 100, the marginal information gain in the parameter *θ*_{gNa}, figure 6, is approximately 0.3 nats, and it is observed from figure 7 that approximately 0.7 nats of mutual information exists between *θ*_{gNa} and *θ*_{gK}. This implies that the amount of information that can be gained about *θ*_{gNa} by knowing *θ*_{gK}, in addition to the measurements, is larger than the amount of information gained by the measurements alone. Indeed, as the observation frequency is increased, more information becomes available about all the parameters individually. Figure 7 also shows that a significant amount of correlation is built between the parameters *θ*_{gK} and *θ*_{gL} during the sharp rising part of *V*_{m}. For example, for *N*_{obs} = 200, the amount of CMI between *θ*_{gK} and *θ*_{gL} is approximately 0.14 nats (figure 7), approximately the same magnitude as the marginal information gain of 0.15 nats (figure 6) for the parameter *θ*_{gL}. At the same time, since the marginal information gain for *θ*_{gK} is approximately 1.25 nats (figure 6), the effect of this correlation, amounting to an information gain of 0.14 nats (figure 7), is not too significant for estimating *θ*_{gK}.

Finally, the effect of the CMI, i.e. the correlation, can also be seen in terms of the conditional variances and conditional information gains, as shown in figure 8 for *N*_{obs} = 200. As discussed above, owing to the correlations within the pairs (*θ*_{gNa}, *θ*_{gK}) and (*θ*_{gK}, *θ*_{gL}), the conditional variances are significantly lower (and the conditional information gains larger) for one parameter of a pair when the other parameter is additionally known. It should be noted that the correlations and information gains presented are specific to the protocol, i.e. a constant external current resulting in tonic spiking of the neuron and only *V*_{m} being measured. The information gains will behave differently if the protocol is changed, for example to intermittent step currents or continuously varying external currents. Therefore, one application of the methods proposed in this article can be in optimal design of experiments, where one may design the protocol such that maximal information gain occurs for individual parameters while CMIs (correlations in the posterior distribution) are minimized.

### 9.3. Influenza A virus kinetics

The final example presented is for the kinetics of the influenza A virus. The following model was proposed by Baccam *et al.* [35] to describe viral infection:

$$\frac{\mathrm{d}T}{\mathrm{d}t} = -\beta\,T\,V, \qquad \frac{\mathrm{d}I}{\mathrm{d}t} = \beta\,T\,V - \delta\,I, \qquad \frac{\mathrm{d}V}{\mathrm{d}t} = p\,I - c\,V, \tag{9.11}$$

where *V* is the infectious virus titre (measured in TCID_{50} ml^{−1} of nasal wash), *T* is the number of uninfected target cells, *I* is the number of productively infected cells and {*β*, *δ*, *p*, *c*} are the model parameters. The parameter *p* represents the average rate at which the productively infected cells, *I*, increase the viral titres, and the parameter *δ* represents the rate at which the infected cells die. The parameter *β* characterises the rate at which the susceptible cells become infected and *c* represents the clearing rate of the virus.

As opposed to the previous example, where the initial conditions were assumed to be known, in this example the initial conditions for the virus titre *V*_{0} and the number of uninfected target cells *T*_{0} are considered unknown and hence form parameters of the dynamical system. Time is measured in days (d) and the initial condition for the number of infected cells *I*_{0} is assumed to be known and equal to 0.0. Hence, there are six parameters [*β*, *δ*, *p*, *c*, *V*_{0}, *T*_{0}] in total. The nominal values of the parameters are chosen to be *β* = 2.7 × 10^{−5} (TCID_{50} ml^{−1})^{−1} d^{−1}, *δ* = 4.0 d^{−1}, *p* = 0.012 TCID_{50} ml^{−1} · d^{−1}, *c* = 3.0 d^{−1}, *V*_{0} = 0.1 TCID_{50} ml^{−1} and *T*_{0} = 4 × 10^{8}, based on the average patient parameters identified by Baccam *et al.* [35]. As in the previous examples, the following parametrization is used to impose zero-mean and unit-variance priors on the transformed parameters:

$$\xi = \xi_{0} + \varsigma_{\xi}\,\theta_{\xi}, \tag{9.12}$$

where *θ*_{ξ} represents the transformed version of the real parameter *ξ* and *ξ*_{0} represents the nominal value of the parameter; hence, with a zero-mean and unit-variance prior on the transformed parameter *θ*_{ξ}, the prior imposed on the real parameter is of mean *ξ*_{0} and variance *ς*^{2}_{ξ}. The scaling parameters *ς*_{ξ} are set to 9 × 10^{−6}, 1.3, 0.004, 1.0, 0.03 and 2.0 × 10^{8} for *β*, *δ*, *p*, *c*, *V*_{0} and *T*_{0}, respectively, in their respective units. The solution to equation (9.11) for the nominal parameter values is shown in figure 9. It is observed that both the virus titre *V* and the number of infected cells *I* increase sharply until they peak at the 2–3 day mark. After this a decrease in both values is observed. The number of uninfected target cells *T* remains approximately constant until the 2 day mark, after which a sharp decrease (approx. 4 orders of magnitude) is observed over the next 2 days, leading to a plateau.

To study the sensitivity and information gain two cases are considered: first, when only *V* is measured; and second, when both *V* and *I* are measured. In the first case, the observation model reads:
$$y_{n} = V_{n} + \varepsilon_{n}, \tag{9.13}$$

where *V*_{n} is the virus titre concentration at time *t*_{n} and *ε*_{n} is the zero-mean measurement noise with variance *σ*^{2}_{noise} = 2.5 × 10^{7} (TCID_{50} ml^{−1})^{2}, i.e. a standard deviation of 5 × 10^{3} TCID_{50} ml^{−1}. A total of 200 measurements are evenly distributed between 0 days and 10 days for the computation of marginal variances and information gains.
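The target-cell-limited model of equation (9.11) is straightforward to integrate with the nominal values quoted above; a log-scale plot of the resulting *V*, as in figure 9, shows the rise to a peak around day 2–3 followed by decay. A minimal sketch:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Sketch of the Baccam et al. target-cell-limited model, equation (9.11):
#   dT/dt = -beta*T*V,   dI/dt = beta*T*V - delta*I,   dV/dt = p*I - c*V.
beta, delta, p, c = 2.7e-5, 4.0, 0.012, 3.0
T0, I0, V0 = 4.0e8, 0.0, 0.1               # initial conditions (I0 known, = 0)

def rhs(t, y):
    T, I, V = y
    return [-beta * T * V, beta * T * V - delta * I, p * I - c * V]

t_eval = np.linspace(0.0, 10.0, 200)       # the 200 measurement time-points (days)
sol = solve_ivp(rhs, (0.0, 10.0), [T0, I0, V0], t_eval=t_eval, rtol=1e-8)
T, I, V = sol.y                            # V is the measured titre, equation (9.13)
```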

Figure 10 shows the marginal variances for all the parameters in solid lines and the conditional variances for four pairs of parameters in dashed lines. Given the dynamics of the problem as shown in figure 9, it is not surprising that most of the information gain about all the parameters occurs in *t* ∈ [0, 4] days. The parameters *θ*_{β}, *θ*_{δ} and *θ*_{c} appear to be well identifiable given the large decreases in their marginal variances. However, the initial conditions *θ*_{V0} and *θ*_{T0} show less decrease in the variances, indicating problems in their identifiability. Finally, the parameter *p* appears to be unidentifiable, given that its marginal variance decreases from 1.0 (standard deviation 1.0) to only 0.7 square units (standard deviation 0.84 units). Figure 11 shows the mutual information between all the pairs of the parameters, where the parameter pairs that show a high mutual information are plotted in dashed lines. For the parameters in these pairs of high mutual information, (*θ*_{δ}, *θ*_{c}) and (*θ*_{p}, *θ*_{T0}), the conditional variances are plotted in figure 10. The parameter pair (*θ*_{p}, *θ*_{T0}) is particularly interesting as the parameter *θ*_{p}, although unidentifiable individually, becomes very well identifiable, owing to the large mutual information it shares with *T*_{0}, if the initial condition *T*_{0} is known. This observation was proved through classical methods by Miao *et al.* [6], where it was shown that, by taking higher-order derivatives of equation (9.11) and eliminating the unmeasured variables, *T* and *I*, one obtains the following differential equation:
9.14

As the above equation does not contain the parameter *p*, in the absence of any other measured quantity, i.e. *T* and *I*, and the corresponding initial conditions, the parameter *p* is not identifiable. Miao *et al.* [6] also reported that when *T*_{0} is known, the parameter *p* becomes identifiable, which is consistent with the large mutual information. In the Bayesian approach adopted in this manuscript, a non-zero amount of knowledge (non-infinite variance) is inherently assumed in the prior for *θ*_{T0}, which results in a small amount of information gain (and hence a small reduction in the marginal variance from 1.0 to 0.7 square units). This small amount of information gain is a result of the knowledge assumed in the prior; however, it is not significant enough to hide the identifiability problem for *θ*_{p}. One can choose to impose a prior of higher ignorance by increasing the prior variance of the real parameter *T*_{0}, i.e. by increasing the scaling factor *ς*_{T0}. The effect of four different values of *ς*_{T0} on the marginal variance of the parameter *θ*_{p} is shown in figure 12*b*. It is clear that a higher value of *ς*_{T0}, which implies higher ignorance in the prior for *T*_{0}, results in a decreasing amount of information gained about the parameter *θ*_{p}. This example shows how the information-theoretic approach can provide conclusions about parameter identifiability similar to those of classical analytical methods (see, for example, those presented in [6]), which may not be easily applicable to all dynamical systems. Lastly, the classical sensitivity of the measurable *V* to the parameter *p* is shown in figure 12*a*; its large magnitude does not indicate any problems of parameter identifiability. Finally, Miao *et al.* [6] reported that all the parameters of the influenza dynamical system are well identifiable if both *V* and *I*, or both *V* and *T*, are measured. For the case when both *V* and *I* are measured, the marginal variances are shown in figure 13, which also shows that no identifiability problems persist in this case. Note that the error structure in the measurement of *I* was assumed to be identical to that in the measurement of *V*, equation (9.13).

## 10. Conclusion

A new class of functions called the ‘ISFs’ has been proposed to study parametric information gain in a dynamical system. Based on a Bayesian and information-theoretic approach, such functions are easy to compute through classical sensitivity analysis. Compared with the previously proposed generalized sensitivity functions (GSFs) [27] to measure such information gain, the ISFs do not suffer from the force-to-one behaviour and are easy to interpret, as correlations are measured through separate measures of mutual information as opposed to oscillations in GSFs. Furthermore, as opposed to GSFs, which are normalized, the ISFs can be used to compare information gain between different parameters and hence can be used to rank the parameters on ease of identifiability. They can be used to identify regions of high information content and indicate identifiability problems for parameters which show little to no information gain, or high mutual information (correlation) with other parameters. The application of ISFs is demonstrated on three models. For the Windkessel model, the effect of measurement noise is illustrated and it is shown that the insights provided by ISFs are consistent with those of a significantly more expensive Monte Carlo type approach [21]. For the Hodgkin–Huxley model, the effect of measurement frequency is illustrated, and finally, for the influenza A virus, it is shown how, even when classical sensitivity analysis fails to assess identifiability issues, the ISFs correctly reveal identifiability problems, which have been analytically proven through classical methods.

## Data accessibility

This article has no additional data.

## Competing interests

I declare I have no competing interests.

## Funding

This study was funded by Swansea University as part of the author's employment. Part of my time is funded by an EPSRC grant.

## Acknowledgements

The author thanks Dr Damiano Lombardi (Inria Paris) for fruitful discussions and insights.

## Appendix A. Differential analysis for equivalence of sensitivity and covariance evolution

From equation (3.2), the linearized dynamical system is
$$\frac{\mathrm{d}\mathbf{x}}{\mathrm{d}t} \approx \mathbf{f} + \nabla_{x}\mathbf{f}\,(\mathbf{x} - \mu_{x}) + \nabla_{\theta}\mathbf{f}\,(\boldsymbol{\theta} - \mu_{\theta}). \tag{A 1}$$

Separating the random variables ***θ*** and **x** gives

$$\frac{\mathrm{d}\mathbf{x}}{\mathrm{d}t} \approx \nabla_{x}\mathbf{f}\,\mathbf{x} + \nabla_{\theta}\mathbf{f}\,\boldsymbol{\theta} + \left(\mathbf{f} - \nabla_{x}\mathbf{f}\,\mu_{x} - \nabla_{\theta}\mathbf{f}\,\mu_{\theta}\right). \tag{A 2}$$

Combined with trivial dynamics for the parameters, d***θ***/d*t* = **O**, the dynamics for the combined vector [**x**^{T}, *θ*^{T}]^{T} can be written as

$$\frac{\mathrm{d}}{\mathrm{d}t}\begin{bmatrix}\mathbf{x}\\ \boldsymbol{\theta}\end{bmatrix} = \begin{bmatrix}\nabla_{x}\mathbf{f} & \nabla_{\theta}\mathbf{f}\\ \mathbf{O}_{p,q} & \mathbf{O}_{p,p}\end{bmatrix}\begin{bmatrix}\mathbf{x}\\ \boldsymbol{\theta}\end{bmatrix} + \begin{bmatrix}\mathbf{f} - \nabla_{x}\mathbf{f}\,\mu_{x} - \nabla_{\theta}\mathbf{f}\,\mu_{\theta}\\ \mathbf{O}\end{bmatrix}, \tag{A 3}$$

where **O**_{a,b} represents a zero matrix of size *a* × *b*. The above is concisely written as

$$\frac{\mathrm{d}\boldsymbol{\varkappa}}{\mathrm{d}t} = \mathbf{F}\,\boldsymbol{\varkappa} + \mathbf{c}, \qquad \boldsymbol{\varkappa} = \begin{bmatrix}\mathbf{x}\\ \boldsymbol{\theta}\end{bmatrix}. \tag{A 4}$$

For the above stochastic differential equation, it is well known, see for example [36], that the covariance matrix of **ϰ**, denoted by ***Ξ***, evolves according to the following differential equation:

$$\frac{\mathrm{d}\boldsymbol{\Xi}}{\mathrm{d}t} = \mathbf{F}\,\boldsymbol{\Xi} + \boldsymbol{\Xi}\,\mathbf{F}^{\mathrm{T}}. \tag{A 5}$$

Therefore, if the covariance matrix of **ϰ**_{n} is

$$\boldsymbol{\Xi}_{n} = \begin{bmatrix}\Sigma_{n,n} & \Lambda_{n,\theta}\\ \Lambda_{n,\theta}^{\mathrm{T}} & \Sigma_{\theta,\theta}\end{bmatrix}, \tag{A 6}$$

then ***Ξ*** evolves according to equation (A 5) as

$$\frac{\mathrm{d}}{\mathrm{d}t}\begin{bmatrix}\Sigma_{n,n} & \Lambda_{n,\theta}\\ \Lambda_{n,\theta}^{\mathrm{T}} & \Sigma_{\theta,\theta}\end{bmatrix} = \begin{bmatrix}\nabla_{x}\mathbf{f}\,\Sigma_{n,n} + \Sigma_{n,n}\,\nabla_{x}\mathbf{f}^{\mathrm{T}} + \nabla_{\theta}\mathbf{f}\,\Lambda_{n,\theta}^{\mathrm{T}} + \Lambda_{n,\theta}\,\nabla_{\theta}\mathbf{f}^{\mathrm{T}} & \nabla_{x}\mathbf{f}\,\Lambda_{n,\theta} + \nabla_{\theta}\mathbf{f}\,\Sigma_{\theta,\theta}\\ \left(\nabla_{x}\mathbf{f}\,\Lambda_{n,\theta} + \nabla_{\theta}\mathbf{f}\,\Sigma_{\theta,\theta}\right)^{\mathrm{T}} & \mathbf{O}_{p,p}\end{bmatrix}. \tag{A 7}$$

The next task is to relate the above evolution of the covariance matrix with the evolution of sensitivity matrix. From equation (2.6), the sensitivity matrix **S** evolves as
$$\frac{\mathrm{d}\mathbf{S}}{\mathrm{d}t} = \nabla_{x}\mathbf{f}\,\mathbf{S} + \nabla_{\theta}\mathbf{f}. \tag{A 8}$$

Therefore, taking the transpose of equation (A 8) yields

$$\frac{\mathrm{d}\mathbf{S}^{\mathrm{T}}}{\mathrm{d}t} = \mathbf{S}^{\mathrm{T}}\,\nabla_{x}\mathbf{f}^{\mathrm{T}} + \nabla_{\theta}\mathbf{f}^{\mathrm{T}}. \tag{A 9}$$

The derivative of the matrix product **SS**^{T} can be written as follows:

$$\frac{\mathrm{d}(\mathbf{S}\mathbf{S}^{\mathrm{T}})}{\mathrm{d}t} = \frac{\mathrm{d}\mathbf{S}}{\mathrm{d}t}\,\mathbf{S}^{\mathrm{T}} + \mathbf{S}\,\frac{\mathrm{d}\mathbf{S}^{\mathrm{T}}}{\mathrm{d}t}. \tag{A 10}$$

Substituting the derivatives from (A 8) and (A 9) into the above equation gives

$$\frac{\mathrm{d}(\mathbf{S}\mathbf{S}^{\mathrm{T}})}{\mathrm{d}t} = \nabla_{x}\mathbf{f}\,\mathbf{S}\mathbf{S}^{\mathrm{T}} + \mathbf{S}\mathbf{S}^{\mathrm{T}}\,\nabla_{x}\mathbf{f}^{\mathrm{T}} + \nabla_{\theta}\mathbf{f}\,\mathbf{S}^{\mathrm{T}} + \mathbf{S}\,\nabla_{\theta}\mathbf{f}^{\mathrm{T}}. \tag{A 11}$$

It is easy to see that if *Σ*_{θ,θ} = **I**_{p}, *Λ*_{n,θ} = **S** and *Σ*_{n,n} = **SS**^{T}, then the state covariance, *Σ*_{n,n}, and the cross-covariance, *Λ*_{n,θ}, from equation (A 7) evolve as

$$\frac{\mathrm{d}\Sigma_{n,n}}{\mathrm{d}t} = \nabla_{x}\mathbf{f}\,\mathbf{S}\mathbf{S}^{\mathrm{T}} + \mathbf{S}\mathbf{S}^{\mathrm{T}}\,\nabla_{x}\mathbf{f}^{\mathrm{T}} + \nabla_{\theta}\mathbf{f}\,\mathbf{S}^{\mathrm{T}} + \mathbf{S}\,\nabla_{\theta}\mathbf{f}^{\mathrm{T}} = \frac{\mathrm{d}(\mathbf{S}\mathbf{S}^{\mathrm{T}})}{\mathrm{d}t} \tag{A 12}$$

and

$$\frac{\mathrm{d}\Lambda_{n,\theta}}{\mathrm{d}t} = \nabla_{x}\mathbf{f}\,\mathbf{S} + \nabla_{\theta}\mathbf{f} = \frac{\mathrm{d}\mathbf{S}}{\mathrm{d}t}. \tag{A 13}$$
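The equivalence derived above can also be verified numerically: integrating the sensitivity ODE (A 8) together with the covariance equation obtained from (A 7) (with *Σ*_{θ,θ} = **I**_{p} and *Λ*_{n,θ} = **S**) from consistent initial conditions keeps *Σ*_{n,n} = **SS**^{T} at all times. A small sketch with constant placeholder Jacobians:

```python
import numpy as np
from scipy.integrate import solve_ivp

rng = np.random.default_rng(2)
m, p = 3, 2
Fx = rng.standard_normal((m, m))           # constant placeholder for grad_x f
Fth = rng.standard_normal((m, p))          # constant placeholder for grad_theta f
S0 = rng.standard_normal((m, p))

def rhs(t, z):
    S = z[:m * p].reshape(m, p)
    Sigma = z[m * p:].reshape(m, m)
    dS = Fx @ S + Fth                                            # equation (A 8)
    dSigma = Fx @ Sigma + Sigma @ Fx.T + Fth @ S.T + S @ Fth.T   # equation (A 12)
    return np.concatenate([dS.ravel(), dSigma.ravel()])

z0 = np.concatenate([S0.ravel(), (S0 @ S0.T).ravel()])
sol = solve_ivp(rhs, (0.0, 1.0), z0, rtol=1e-10, atol=1e-12)
S_T = sol.y[:m * p, -1].reshape(m, p)
Sigma_T = sol.y[m * p:, -1].reshape(m, m)
assert np.allclose(Sigma_T, S_T @ S_T.T, atol=1e-6)              # Sigma stays S S^T
```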

## Appendix B. Asymptotic analysis of the conditional covariance

In this section, the behaviour of the conditional covariance matrix as *n* → ∞ is considered. From equation (5.4), *A*_{n} can be written as
B 1where *Γ*_{n} is a diagonal matrix with elements as follows:
B 2By applying the Kailath variant of the Sherman–Morrison–Woodbury identity the inverse of **A**_{n} can be expanded as
B 3Plugging this in equation (6.3) yields
B 4
B 5where
B 6

The matrix **D**_{n} is symmetric, and can be factorized by singular value decomposition (SVD) as follows:

$$\mathbf{D}_{n} = \mathbf{U}_{n}\,\Phi_{n}\,\mathbf{U}_{n}^{\mathrm{T}}, \tag{B 7}$$

with

$$\mathbf{U}_{n}\mathbf{U}_{n}^{\mathrm{T}} = \mathbf{U}_{n}^{\mathrm{T}}\mathbf{U}_{n} = \mathbf{I}_{p}, \tag{B 8}$$

and *Φ*_{n} is a diagonal matrix with diagonal entries equal to the eigenvalues, *λ*_{i}, of **D**_{n},

$$\Phi_{n} = \mathrm{diag}\left(\lambda_{1}, \lambda_{2}, \ldots, \lambda_{p}\right). \tag{B 9}$$

Owing to the symmetric nature of **D**_{n}, all the eigenvalues are real. Furthermore, if **D**_{n} is positive-definite then all eigenvalues are positive. Substituting **D**_{n} from equation (B 6) in equation (B 4) yields, after simplification through equations (B 10)–(B 13),

$$\Sigma_{\theta|z_n} = \mathbf{U}_{n}\left(\mathbf{I}_{p} - \mathbf{P}_{n}\right)\mathbf{U}_{n}^{\mathrm{T}}. \tag{B 14}$$

In the above, **P**_{n} is a diagonal matrix with the entries

$$[\mathbf{P}_{n}]_{ii} = \frac{\lambda_{i}}{1 + \lambda_{i}}, \qquad i = 1, \ldots, p. \tag{B 15}$$

If the minimum eigenvalue of **D**_{n} is much larger than 1, i.e.

$$\lambda_{\min}(\mathbf{D}_{n}) \gg 1, \tag{B 16}$$

then the approximations of equations (B 17) and (B 18) hold and, consequently, equation (B 10) yields equation (B 19). Finally, from the above and equations (B 5), (B 2) and (5.5), the conditional covariance matrix can be written as

$$\Sigma_{\theta|z_n} \approx \mathbf{D}_{n}^{-1} = \left(\sum_{i=0}^{n}\mathbf{S}_{i}^{\mathrm{T}}\mathbf{H}_{i}^{\mathrm{T}}\,\Upsilon_{i}^{-1}\,\mathbf{H}_{i}\mathbf{S}_{i}\right)^{-1}. \tag{B 20}$$

It can hence be concluded that if the minimum eigenvalue of **D**_{n} monotonically increases as *n* increases, then the asymptotic result of equation (B 21) follows. Let the eigenvalues of **D**_{n} be denoted in decreasing order as *λ*_{1}(**D**_{n}) ≥ *λ*_{2}(**D**_{n}) ≥ … ≥ *λ*_{p}(**D**_{n}). The behaviour of the minimum eigenvalue *λ*_{p}(**D**_{n}) is of concern. Note that **D**_{n} can be written as

$$\mathbf{D}_{n} = \sum_{i=0}^{n}\mathbf{Q}_{i}, \qquad \mathbf{Q}_{i} = \mathbf{S}_{i}^{\mathrm{T}}\mathbf{H}_{i}^{\mathrm{T}}\,\Upsilon_{i}^{-1}\,\mathbf{H}_{i}\mathbf{S}_{i}, \tag{B 22}$$

and, consequently,

$$\mathbf{D}_{n+1} = \mathbf{D}_{n} + \mathbf{Q}_{n+1}. \tag{B 23}$$

**D**_{n} and **Q**_{n} are both symmetric matrices. Let the eigenvalues of **Q**_{n+1} be denoted in decreasing order as *λ*_{1}(**Q**_{n+1}) ≥ *λ*_{2}(**Q**_{n+1}) ≥ … ≥ *λ*_{p}(**Q**_{n+1}). Taking the trace of equation (B 23) one has

$$\mathrm{Tr}(\mathbf{D}_{n+1}) = \mathrm{Tr}(\mathbf{D}_{n}) + \mathrm{Tr}(\mathbf{Q}_{n+1}), \tag{B 24}$$

where Tr denotes the trace. Expressed in terms of the eigenvalues of the respective matrices, the above reads

$$\sum_{i=1}^{p}\lambda_{i}(\mathbf{D}_{n+1}) = \sum_{i=1}^{p}\lambda_{i}(\mathbf{D}_{n}) + \sum_{i=1}^{p}\lambda_{i}(\mathbf{Q}_{n+1}). \tag{B 25}$$

From several inequalities on the sums of eigenvalues of Hermitian matrices, specifically the *Ky Fan inequality* [37,38], one has

$$\sum_{i=1}^{r}\lambda_{i}(\mathbf{D}_{n+1}) \leq \sum_{i=1}^{r}\lambda_{i}(\mathbf{D}_{n}) + \sum_{i=1}^{r}\lambda_{i}(\mathbf{Q}_{n+1}), \qquad r = 1, \ldots, p. \tag{B 26}$$

Substituting *r* = *p* − 1 in equation (B 26) and subtracting it from equation (B 25) results in

$$\lambda_{p}(\mathbf{D}_{n+1}) \geq \lambda_{p}(\mathbf{D}_{n}) + \lambda_{p}(\mathbf{Q}_{n+1}). \tag{B 27}$$

Consequently, if **Q**_{n+1} is full rank then *λ*_{p}(**Q**_{n+1}) > 0 and

$$\lambda_{p}(\mathbf{D}_{n+1}) > \lambda_{p}(\mathbf{D}_{n}), \tag{B 28}$$

which implies that the minimum eigenvalue of **D**_{n} is monotonically increasing.

The above results are put in the perspective of classical nonlinear regression analysis [9,10] by assuming that the observation operator **H**_{i} is equal to the identity for all *i*. Then, under the further assumption that **S**_{i} is full rank for all *i*, the conditional covariance matrix of the parameters is

$$\Sigma_{\theta|z_n} \approx \mathcal{F}_{n}^{-1}, \tag{B 29}$$

where $\mathcal{F}_{n}$ is the Fisher information matrix defined as

$$\mathcal{F}_{n} = \sum_{i=0}^{n}\mathbf{S}_{i}^{\mathrm{T}}\,\Upsilon_{i}^{-1}\,\mathbf{S}_{i}. \tag{B 30}$$

## Appendix C. Conditional covariance for finite *n*

From equation (B 10) one has equation (C 1), where **P**_{n} is given by equation (B 11). Consider the matrix (**D**_{n} + **I**_{p})^{−1}, which can be expanded as in equations (C 2)–(C 6). Following the above and equation (B 18), the conditional covariance matrix can be written as

$$\Sigma_{\theta|z_n} = \left(\mathbf{D}_{n} + \mathbf{I}_{p}\right)^{-1}. \tag{C 7}$$

- Received November 21, 2017.
- Accepted April 26, 2018.

- © 2018 The Authors.

Published by the Royal Society under the terms of the Creative Commons Attribution License http://creativecommons.org/licenses/by/4.0/, which permits unrestricted use, provided the original author and source are credited.