## Abstract

One of the main challenges in the development of mathematical and computational models of biological systems is the precise estimation of parameter values. Understanding the effects of uncertainties in parameter values on model behaviour is crucial to the successful use of these models. Global sensitivity analysis (SA) can be used to quantify the variability in model predictions resulting from the uncertainty in multiple parameters and to shed light on the biological mechanisms driving system behaviour. We present a new methodology for global SA in systems biology which is computationally efficient and can be used to identify the key parameters and their interactions which drive the dynamic behaviour of a complex biological model. The approach combines functional principal component analysis with established global SA techniques. The methodology is applied to a model of the insulin signalling pathway, defects of which are a major cause of type 2 diabetes, and a number of key features of the system are identified.

## 1. Introduction

Biological systems typically consist of large numbers of interacting components and involve processes operating across a variety of spatial, temporal and biological scales. Systems biology aims to understand such systems by integrating information from all functional levels into a single cohesive model. Mathematical and computational modelling are a key part of the systems biology approach providing a method for formally defining and analysing the structure and the function of a system.

One of the greatest challenges when building mathematical or computational models of biological systems is the precise estimation of parameter values [1]. Values for specific parameters measured *in vivo* are rare [2] and parameters are often estimated from experimental measurements made *in vitro* or by fitting of model simulations to experimental data. Consequently, parameter estimates are often associated with significant uncertainty. Examining the influence of these uncertainties on the model behaviour is crucial to the successful use of mathematical and computational models of biological systems both as predictive tools and as part of an iterative cycle of modelling and experimentation to understand the function of biological systems.

Sensitivity analysis (SA) provides a quantitative approach for investigating the impacts of parameter uncertainty on model outputs. SA is used in a variety of disciplines from environmental science to software engineering and in many fields is seen as a prerequisite for model building [3]. In systems biology, SA can be used in two key areas: quantifying the variation in model outputs due to parameter uncertainty (often referred to as uncertainty analysis (UA)), thereby providing a measure of confidence in the predictive capacity of the model; and identifying the parameters that contribute most to the variation in the model outputs, allowing us to generate hypotheses, which can be tested experimentally, about the biological mechanisms that drive the system behaviour. By identifying the most influential parameters, we can also improve the predictive capacity of the model by refining our estimates for those parameters.

There are two main classes of SA techniques, local and global. Local SA techniques investigate the effects of small variations in individual parameters around some nominal point and have been applied to a number of signal transduction and metabolic pathway models [4,5]. Where parameter values have large uncertainties or models are nonlinear, global SA (GSA) techniques are more appropriate. GSA techniques investigate the effects of simultaneous parameter variations over large but finite ranges and allow the effects of interactions between parameters to be explored [3].

There has been an increased interest in the use of GSA in biological modelling in recent years. A comparative study of local and global SA techniques (partial rank correlation coefficients (PRCC) [6], Sobol's method [7] and the extended Fourier amplitude sensitivity test (eFAST) [8]) applied to a model of the Erk-MAPK signalling pathway by Zheng & Rundell [2] demonstrated the disparity between the results of the two approaches, and Marino *et al.* [9] developed a methodology for performing global SA of both deterministic and stochastic systems biology models which combined PRCC and eFAST.

In this paper, we consider two aspects of the application of GSA in systems biology: the application of GSA to time-dependent model outputs and the computational efficiency of the commonly used global SA techniques. We describe an alternative methodology for performing global SA of time-dependent model outputs and within that methodology compare existing GSA methods to evaluate their computational performance and utility in systems biology.

In biological systems, we are typically interested in understanding the dynamic behaviour of the system. To investigate how this behaviour depends on the model parameters, we must apply SA to model outputs which are functions of time. The application of SA to time-dependent model outputs is well established. Sensitivity indices are calculated at each time-point to produce a set of time-varying sensitivity indices. These indices provide information on which parameters are influential at particular times and can be integrated over time to identify those parameters which are most important in terms of the entire model output. This approach has been applied to a variety of biological systems [2,5,9,10]. However, by looking at individual time-points, we may miss interesting features in the model output. For example, it would be difficult to infer the impact of a parameter on the period of oscillations or the time at which a peak concentration occurs.

Alternatively, one may define a set of scalar features of the dynamic model output, such as the maximum concentration or the period of oscillations, and calculate sensitivity indices for these derived outputs [4,11]. To use this approach, we need to identify an appropriate set of features for the model of interest and construct algorithms to extract their values from the model outputs.

Here, we present an alternative methodology for the global SA of biological models based on an idea proposed by Campbell *et al.* [12]. The methodology combines existing GSA techniques with functional principal components analysis (fPCA) to investigate the parameter sensitivity of time-dependent model outputs. fPCA is used to transform the model outputs into an alternative format that captures the most important features in the model output. Existing GSA techniques are then used to identify the parameters which are important in generating these features and hence in driving the associated system behaviour.

To address the second issue, the computational efficiency of GSA methods, we compare the use of two existing GSA techniques within our methodology. When a model contains a large number of uncertain parameters, a common occurrence in biological models, or has a significant run time, the computational cost of many GSA techniques can be significant. The efficiency of GSA techniques becomes increasingly important if they are to be used as part of an iterative cycle of model development and experimentation in which repeated analysis of the model may be required. Here, we compare the use of the variance-based method of Sobol with a more computationally efficient, but less quantitative design, the Morris method [13], to evaluate the utility of both techniques in a systems biology context.

## 2. Methods

This section presents an overview of the methodology including the use of functional principal components (PCs) and introduces the GSA techniques used in the methodology. The methodology is demonstrated by application to a mathematical model of the insulin signalling pathway which is described in §2.3.

### 2.1. Sensitivity analysis methodology

Consider a model of the form *y*(*t*) = *f*(*u*, ***P***, *t*). The output of the model *y*(*t*) is a set of curves describing the variation in the model variables over time. The output is some function (*f*) of the external model input (*u*) and a set of *k* model parameters, ***P*** = (*p*_{1}, *p*_{2}, … , *p*_{k}). Often this relationship between the model input and parameters and the model output will be described by a set of differential equations that describe the change in the model variables over time.

We assume that the value of each parameter is uncertain and that this uncertainty can be expressed as a probability distribution. The aim of the SA is to identify how the output of the model depends on the uncertainty in the parameters.

We begin by generating *N* parameter vectors ***P***_{i} = (*p*_{i1}, *p*_{i2}, … , *p*_{ik}) for *i* = 1, … , *N* from the probability distributions for each parameter. The number of parameter sets and the way in which they are sampled depend on the chosen GSA technique (see §2.2 and figure 1*a*). The model is then evaluated for each ***P***_{i} to generate the corresponding model outputs *y*_{i}(*t*) (*i* = 1, … , *N*) (figure 1*b*). For convenience, we assume here that the output of interest is a single model variable, but the following can be replicated for any number of model variables.
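As a concrete illustration, the sampling step might be sketched as follows. This is a minimal example, not the authors' code: `sample_parameters` and the nominal values are hypothetical, and the uniform ±50% ranges anticipate those used in §3.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_parameters(nominal, n_samples, spread=0.5):
    """Draw parameter vectors uniformly within +/- spread of nominal values.

    nominal : (k,) sequence of nominal parameter values
    returns : (n_samples, k) array, one parameter vector P_i per row
    """
    nominal = np.asarray(nominal, dtype=float)
    lo = nominal * (1.0 - spread)
    hi = nominal * (1.0 + spread)
    return rng.uniform(lo, hi, size=(n_samples, len(nominal)))

# Hypothetical model with three rate constants
P = sample_parameters([1.0, 0.2, 5.0], n_samples=1000)
```

Each row of `P` would then be passed to the model solver to generate one output curve *y*_{i}(*t*).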

fPCA is then applied to this set of *N* model outputs to generate the functional PCs and associated scores. Principal component analysis (PCA) is a multivariate statistical procedure which is commonly used to identify the dominant modes of variation in a set of data consisting of *N* observations of *M* variables. PCA works by transforming the original data onto a new set of variables known as PCs. The new variables are such that they describe decreasing amounts of the variation in the data, i.e. the data vary most in the direction of the first PC, the second PC describes the second most important type of variation and so on. A consequence of this is that we can approximate the data in a subset of the new variables, thus reducing the dimensionality of the data.

When our data consist of *N* observations of some function *y*(*t*), we can use fPCA to transform the data into a new set of functions (the functional principal components (fPCs)) which capture the most important types of variation in the data. As with multivariate PCA, the functions describe decreasingly small amounts of the variance in the data such that we can typically represent the original data using only the first few fPCs. The fPCs can be viewed as a set of curves ***ξ***(*t*) = (*ξ*_{1}(*t*), *ξ*_{2}(*t*), … , *ξ*_{q}(*t*)) such that each model output *y*_{i}(*t*) can be written as a sum of those curves about the mean output $\bar{y}(t)$:

$$y_i(t) = \bar{y}(t) + \sum_{j=1}^{q} \omega_{ij}\,\xi_j(t), \qquad (2.1)$$

where *ω*_{ij} is the PC score for model output *i* on component *j*. The PC scores (*ω*_{ij}) tell us the amount of component *j* contained in output *i*. For example, if the first PC, *ξ*_{1}(*t*), represents an oscillatory time course of period *T*, model outputs *y*_{i}(*t*) which display this type of dynamics will have high values for the first PC score (*ω*_{i1}). Details of the computation of the fPCs are given in the electronic supplementary material.

GSA techniques are then used to calculate the sensitivity of the PC scores (*ω*_{ij}) to the model parameters (figure 1*d*). If the scores for a given PC are sensitive to a particular parameter, then that parameter is important in producing the type of behaviour in the model output described by the corresponding component.
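The fPCA step can be approximated by ordinary PCA applied to the output curves discretized on a common time grid. The following sketch (assuming Python with NumPy; `fpca_scores` is a hypothetical helper, not the authors' implementation) recovers the component curves, the scores *ω*_{ij} and the explained-variance fractions via the singular value decomposition:

```python
import numpy as np

def fpca_scores(Y, n_components=3):
    """Approximate fPCA of N output curves sampled on a common time grid.

    Y : (N, T) array, one model output curve per row.
    Returns (mean, components, scores, explained), where
    Y is approximately mean + scores @ components.
    """
    Y = np.asarray(Y, dtype=float)
    mean = Y.mean(axis=0)
    # SVD of the centred data: rows of Vt are the principal component curves
    U, s, Vt = np.linalg.svd(Y - mean, full_matrices=False)
    components = Vt[:n_components]           # (q, T) curves xi_j(t)
    scores = (Y - mean) @ components.T       # (N, q) scores omega_ij
    explained = s[:n_components] ** 2 / np.sum(s ** 2)
    return mean, components, scores, explained
```

The columns of `scores` are the scalar quantities on which the GSA techniques of §2.2 would then operate, one sensitivity analysis per retained component.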

### 2.2. Sensitivity analysis techniques

The methodology uses existing SA techniques to generate the uncertain model parameters and calculate the sensitivities of the PC scores to these parameters. Two techniques, the method of Sobol [7] and the Morris method [14] are compared to evaluate their utility in the analysis of biological systems.

#### 2.2.1. The method of Sobol

Variance-based methods are a class of GSA techniques that quantify the importance of a parameter by the reduction in the model output variance which is obtained by ‘fixing’ the parameter to its ‘true’ value [15]. Unlike regression and correlation-based indices, which are only suitable when the relationship between parameters and model outputs satisfies certain conditions of linearity or monotonicity, variance-based techniques are not dependent on such assumptions and hence are sometimes referred to as ‘model-free’ methods [3]. Marino *et al.* [9] identified variance-based techniques as a key tool in their methodology for applying GSA in systems biology owing to their ability to deal with any type of relationship between model parameters and model outputs. The method of Sobol [7] is one of the most commonly used variance-based methods, in part because it is relatively easy to implement compared with the other approaches [16] and because, in its modified form, it is as efficient as any other variance-based technique including the eFAST method [17].

The method of Sobol is based on a decomposition of the model output *y*(*t*) = *f*(*u*, ***P***, *t*) into terms of increasing dimensionality. The function *f*(*u*, ***P***, *t*) can be written as the sum (note that we drop the dependence on *u* and *t* from the following for clarity of presentation):

$$f(\mathbf{P}) = f_0 + \sum_{i=1}^{k} f_i(p_i) + \sum_{1 \le i < j \le k} f_{ij}(p_i, p_j) + \cdots + f_{12\ldots k}(p_1, \ldots, p_k), \qquad (2.2)$$

where

$$f_0 = \int_{\Omega^k} f(\mathbf{P})\,\mathrm{d}\mathbf{P}, \qquad (2.3)$$

where *Ω*^{k} is the *k*-dimensional space of the parameters. The total variance of *f*(***P***) can be written as:

$$V = \int_{\Omega^k} f^2(\mathbf{P})\,\mathrm{d}\mathbf{P} - f_0^2 \qquad (2.4)$$

and can be decomposed in the same manner as the function:

$$V = \sum_{i=1}^{k} V_i + \sum_{1 \le i < j \le k} V_{ij} + \cdots + V_{12\ldots k}. \qquad (2.5)$$

The terms of this decomposition are the contributions to the variance from each term in (2.2) and are given by:

$$V_{i_1 \ldots i_s} = \int f_{i_1 \ldots i_s}^2(p_{i_1}, \ldots, p_{i_s})\,\mathrm{d}p_{i_1} \cdots \mathrm{d}p_{i_s}. \qquad (2.6)$$

The Sobol indices are defined as:

$$S_{i_1 \ldots i_s} = \frac{V_{i_1 \ldots i_s}}{V}. \qquad (2.7)$$

The term $S_{i_1 \ldots i_s}$ gives the fraction of the total variance which is due to the combination of parameters $p_{i_1}, \ldots, p_{i_s}$. The first-order sensitivity indices, *S*_{i} = *V*_{i}/*V*, describe the effect of individual parameters on the model output while higher-order terms describe the effect owing to interactions between parameters.

As an alternative to calculating the entire set of indices, we can calculate the ‘total effects’ [15] which describe the effect on the model output of a parameter, both individually and in all its possible interactions. If we define one set of parameters to contain only *p*_{i} and the other set ***P***_{∼i} to contain all other parameters except *p*_{i}, then the total variance can be written as:

$$V = V_i + V_{\sim i} + V_{i,\sim i}, \qquad (2.8)$$

the sum of the variance owing to *p*_{i} alone (*V*_{i}), the variance owing to the set of parameters ***P***_{∼i} which excludes *p*_{i} (*V*_{∼i}) and the variance owing to the interaction between *p*_{i} and all other parameters (*V*_{i,∼i}). The total variance owing to *p*_{i} (on its own and in interactions with other parameters) is then:

$$V_{Ti} = V_i + V_{i,\sim i} = V - V_{\sim i} \qquad (2.9)$$

and the total effect index is defined as:

$$S_{Ti} = \frac{V_{Ti}}{V}, \qquad (2.10)$$

which describes the total variance accounted for by parameter *i* individually and in all possible interactions with other parameters. The set of *S*_{i}s and *S*_{Ti}s provide an efficient way to quantify the importance of individual parameters and interaction terms. The Sobol indices have a number of useful properties. The first-order effects quantify the amount of variance accounted for by uncertainty in each individual parameter. The sum of the first-order effects *∑*_{i}*S*_{i} will be ≤1 and the difference between *∑*_{i}*S*_{i} and unity provides a measure of the amount of variance which is accounted for by interactions. The ‘total effects’ quantify the variance accounted for by uncertainty in each parameter both on its own and in combination with uncertainty in other parameters. For any individual parameter, *i*, the difference between *S*_{i} and *S*_{Ti} indicates the extent to which it is involved in interactions with other parameters.

The integrals in (2.3), (2.4) and (2.6) can be evaluated using Monte Carlo integrals of the model outputs generated using random or quasi-random sampling of the model parameters. Using the procedure proposed by Saltelli [17], an estimate of both the first-order and total effects can be obtained at the cost of *N*(*k* + 2) model evaluations, where *N* is of the order of a few thousand. The main weakness of variance-based techniques is their high computational cost. If the model is expensive to evaluate or the number of parameters, *k*, is large, the use of variance-based techniques can be impractical. An alternative is the use of more efficient screening techniques such as the Morris method [14].

#### 2.2.2. Morris' screening design

Screening techniques have a lower computational cost than the variance-based techniques; the trade-off for this economy is that they tend to provide only qualitative measures of sensitivity, that is, they rank the parameters in terms of importance but give no information about how much more important one parameter is than another.

A number of screening designs have been proposed in the literature including the Morris method [14], iterated fractional factorial design (IFFD) [18], sequential bifurcation (SB) [19] and Cotter's design [20]. IFFD and SB are both group-screening techniques [3] in which the model parameters are combined into groups prior to the analysis. IFFD produces good results only if a very small number of parameters determine the variability in the model output while SB requires the user to know the signs of the effects (i.e. whether a parameter has a positive or negative influence on the output) before the analysis is performed [21]. Cotter's design may miss important parameters if they have effects on the model output which cancel each other out [15,21]. Morris' method does not depend on any of these assumptions and is therefore regarded as the most widely applicable of the screening designs [22].

The Morris method is a one-at-a-time (OAT) approach. OAT designs are typically forms of SA in which the parameters are varied individually by small amounts around the nominal point. The results of the analysis only identify the model behaviour in the small region of the parameter space around this point and if the model contains strong nonlinearities selection of a different nominal point can produce vastly different outcomes. In addition, standard OAT approaches do not measure the effect of parameter interactions. These problems can be overcome by using the OAT method proposed by Morris [14].

Consider a model with *k* parameters where ***P*** = (*p*_{1}, *p*_{2}, … , *p*_{k}) is a vector of parameter values and *y*(***P***) is the output of the model at parameter point ***P***. Each parameter may take one of *q* values. The Morris method defines the elementary effect of the *i*th parameter at nominal point ***P*** as:

$$d_i(\mathbf{P}) = \frac{y(p_1, \ldots, p_{i-1}, p_i + \Delta, p_{i+1}, \ldots, p_k) - y(\mathbf{P})}{\Delta}, \qquad (2.11)$$

where *Δ* is selected such that *p*_{i} + *Δ* is still in the set of allowable values for the parameter.

Morris's method removes the dependence of OAT methods on the choice of nominal point, ***P***, and captures the importance of parameter interactions by calculating *r* elementary effects for each parameter at different nominal points, ***P***_{1}, … , ***P***_{r}. The nominal points are chosen such that each parameter is varied over its entire range; *r* is typically in the region of 10 [23,24]. The key to the Morris method is an efficient design that requires *r*(*k* + 1) model runs to generate the *r* elementary effects for each of the *k* parameters [14]. Details of the algorithm can be found in the electronic supplementary material.

The mean and s.d. of the sample of *r* elementary effects for each parameter provide an approximate global sensitivity measure. They summarize the effects of varying the parameter across its entire range and capture the dependence of its sensitivity on the values of other parameters. A high mean, *μ*, indicates a parameter with an important overall effect on the output. A high s.d., *σ*, indicates a parameter with nonlinear effects on the output or one which is involved in interactions with other parameters.

If the sample of elementary effects for a given parameter contains both positive and negative elements, that is, the relationship between the parameter and the model output is non-monotonic, they may cancel out, producing a low value of *μ* for an important parameter. To overcome this, it has been suggested [22] that the mean of the absolute values of the elementary effects, denoted *μ*\*, should be used. This modified Morris measure has been shown empirically to be a good proxy for the total effect indices, *S*_{Ti}, of the variance-based measures [24].

It is also important to consider the scaling of the elementary effects. Because the calculation of the elementary effects involves division by the parameter step size *Δ* (see equation (2.11)), the size of *Δ* will influence the sensitivity results; parameters with small values will produce larger effects. This can cause the incorrect classification of the importance of parameters. The use of relative sensitivities is well established and has been emphasized in the context of the Morris method by Sin & Gernaey [25]. The relative elementary effect for parameter *i* at point ***P*** is given by:

$$d_i^{\,r}(\mathbf{P}) = d_i(\mathbf{P})\,\frac{\sigma_{p_i}}{\sigma_y}, \qquad (2.12)$$

where *σ*_{y} and *σ*_{pi} are the s.d. of the model output *y* (in this case the PC scores) and parameter *p*_{i}, respectively. The scaling of the elementary effects also removes the dependence on the magnitude of the model output. This allows the effects of a parameter on different outputs to be compared.
The use of the Morris and Sobol techniques in the analysis of time-dependent model outputs is compared by applying the methodology presented in §2.1 to a model of the insulin signalling pathway which is described below.

### 2.3. Insulin signalling model

The SA methodology described in §2.1 is applied to a model of the insulin signalling pathway. The actions of insulin are initiated when the hormone binds to its cell surface receptors triggering a signalling pathway which has pleiotropic effects in virtually all tissues [26]. The pathway is a key part of the blood glucose regulatory system and defects in the pathway are important in the development of insulin resistance, a major cause of type-2 diabetes. Identifying the parameters which drive the dynamic behaviour of the model may further our understanding of the mechanisms underlying insulin resistance and other related pathologies.

The model is based on a previously published model [27] which has been used in a number of other studies [28–31] and was originally developed to study the regulation of glucose uptake by GLUT4, the major insulin-dependent glucose transporter in fat and muscle tissue [32]. Here, the model has been modified as part of efforts to model the liver–pancreas blood glucose regulatory system using a modular approach to model construction [33]. While there is some evidence that insulin may regulate glucose transport in hepatocytes via increased internalization of GLUT2 [34], the primary role of insulin in the liver is to regulate the synthesis of glycogen. The model was therefore extended to consider the inactivation of GSK3, a key regulator of glycogen synthesis.

The model consists of three subsystems representing receptor binding, receptor recycling and the post-receptor signalling pathway. The processes included in the model are shown schematically in figure 2 and described briefly below. The mathematical details of the model can be found in the electronic supplementary material together with a more detailed description of the processes included in the model.

Binding of insulin to its receptor results in increased tyrosine kinase activity of the receptor towards its substrates. This activity is negatively regulated by dephosphorylation of the receptor by protein tyrosine phosphatases (PTPs) [35]. There are at least nine substrates of the insulin receptor, four of which are varieties of the insulin-receptor substrate (IRS) protein [36]. IRS proteins are phosphorylated by the kinase activity of the receptor and act as docking sites for molecules which contain specific sequences known as SH2 domains, including phosphoinositide 3-kinase (PI3K). The recruitment of PI3K to the plasma membrane places it in the vicinity of its physiological substrate, phosphatidylinositol (4,5) bisphosphate (PI(4,5)P2), which it phosphorylates to produce PI(3,4,5)P3 (PIP3) [37]. PIP3 binds a variety of signalling molecules including phosphoinositide-dependent kinase 1 (PDK1) and protein kinase B (PKB), also known as Akt, modifying their activity and intracellular location. The co-localization of these molecules allows PDK1 to phosphorylate and activate Akt which in turn phosphorylates and inactivates glycogen synthase kinase 3 (GSK3). GSK3 downregulates the activity of glycogen synthase, hence the inactivation of GSK3 results in an increased rate of glycogen synthesis.

## 3. Results

This section presents the results of applying the new SA methodology described in §2.1 to the insulin signalling model to analyse the sensitivity of the GSK3 dynamics to uncertainty in the model parameters. The methodology produces a set of fPCs describing the key types of variation in the GSK3 output and, for each component, a set of sensitivity indices which measure the importance of the model parameters in producing the variation in GSK3 described by the corresponding component.

The methodology is applied using both the Sobol (*N* = 2000) and Morris (*r* = 20; *q* = 4) methods and the results compared. For each method, 21 parameters were allowed to vary in a uniform range of ±50% of their nominal values (see the electronic supplementary material). The external input to the model, *u*, is the concentration of insulin and is modelled as a step function of magnitude 1 × 10^{−6} M from *t* = 0 to *t* = 30 min. The model is evaluated to *t* = 60 min, sufficient time for dephosphorylation of GSK3 following removal of the insulin stimulus.

### 3.1. Principal components

The left-hand panels of figure 3 show the first three fPCs of the GSK3 dynamics generated via the Sobol method. The fPCs produced from the Morris method are qualitatively the same, that is, they describe the same types of variation in the GSK3 output (plots are included in the electronic supplementary material). It should be noted that, because the parameter sampling is different for the two GSA methods, the PCs generated by the two methods may differ, particularly if insufficient sample sizes are used. Together, the first three PCs account for 99.7 per cent of the variation in the model output; we therefore present results only for these three components. The remaining components describe minimal amounts of the variation in the model outputs.

The right-hand panel of figure 3 shows the mean GSK3 output (averaged over all model runs) plus and minus some multiple of each of the fPCs. Such plots are useful in visualizing the variation in the GSK3 time-course described by the PCs [38]. Here, they clearly show that: the first PC, which captures 90.6 per cent of the variation, describes a vertical shift in the time-course of inactive GSK3 with the greatest increase in the steady-state concentration; the second PC, which accounts for a much smaller amount of variation (8.4%), captures variation in the dynamics after the stimulus is removed (*t* > 30) and GSK3 is re-activated owing to dephosphorylation; the third PC (0.7% of the total variation) describes variation in the inactivation phase of GSK3 by phosphorylation.

### 3.2. Sensitivity indices

Figure 4 shows the SA results. The results of both methods are consistent, identifying the same subset of 10 parameters as important. All the important parameters are involved in the post-receptor signalling pathway with the exception of *k*_{−3}, the rate of insulin dissociation from the receptor. This is consistent with the view that post-insulin receptor defects represent the primary sites leading to insulin resistance [39]. Studies have found that insulin binding is normal in diabetic individuals and no difference has been found in receptor numbers in over half of type 2 diabetic patients [40]. The lack of importance of the receptor recycling subsystem parameters supports the recent findings of Giudice *et al.* [41] that internalization of insulin receptors is not important in the activation of the Akt pathway.

The first PC primarily affects the maximal phosphorylation of GSK3. The production and breakdown of PI(3,4,5)P3 (*k*_{9st}) and the de/phosphorylation of GSK3 (*k*_{−15} and *k*_{15}) are shown to be particularly important in producing variation in the scores on this component. The lipid phosphatases PTEN and SHIP2, which hydrolyse PI(3,4,5)P3 to PI(4,5)P2 and PI(3,4)P2, respectively, have been shown to be negative regulators of insulin signalling and have been proposed as possible therapeutic targets for type 2 diabetes [42,43]. The lesser importance of PI3K regulation (*k*_{8} and *k*_{−8}) supports the experimental observations that reduced insulin-stimulated activation of PI3K does not affect the downstream activation of Akt [44,45]. There is little interaction between parameters, as indicated by the minimal difference between the first-order and total effects for individual parameters and the small difference between the sum of the first-order indices (*∑*_{i}*S*_{i} = 0.97) and unity. Similarly, the low values of *σ* obtained via the Morris method imply that interactions between parameters are not significant.

The uncertainty in the second PC score is dominated by *k*_{−3}, which accounts for approximately 65 per cent of the variation in this component according to the Sobol method. *k*_{−3} describes the deactivation of insulin receptors resulting from the dissociation of insulin and the dephosphorylation of the receptors by PTPs. This explains its importance in controlling the reactivation of GSK3 following removal of the external insulin input (the behaviour described by the second PC). This result is consistent with experimental evidence that insulin signalling can be enhanced by reducing the activity of PTPs [46]. As a consequence of this, there has been a focus on the study of PTPs as potential therapeutic targets for the management of insulin resistance [35]. One particular study has shown that inhibition of PTP-1B expression in obese, insulin-resistant non-human primates can reduce fasting concentrations of glucose and insulin [47]. Figure 4 also shows that there is an increased role of interactions in the second PC (*∑*_{i}*S*_{i} = 0.74, higher values for *σ*).

The third PC (which describes the initial phosphorylation of GSK3) is largely controlled by *k*_{−15}, the dephosphorylation rate of the kinase. This is in line with the view that processes downstream of Akt are crucial in propagating the insulin signal [48]. Hoehn *et al*. [49] have shown that only relatively small amounts of Akt phosphorylation are required to produce a maximal downstream response, suggesting that defects in the regulation of Akt may be of less importance. As with the second PC, there is a significant interaction effect, especially for the parameters *k*_{8}, *k*_{9st}, *k*_{11d} and *k*_{15d} (which describe the association of the IRS protein with PI3K, the conversion of PI(4,5)P2 to PI(3,4,5)P3, the phosphorylation (activation) of Akt and the rate of phosphorylation (inactivation) of GSK3, respectively). The importance of interactions highlights the need to use GSA methods to understand the behaviour of biological systems. Local methods, in which parameters are varied one at a time, do not allow the effects of interactions to be explored.

### 3.3. Computation times

The Sobol method requires *N*(*k* + 2) model evaluations. With *N* = 2000, *k* = 21 and a model evaluation time on the order of 2 s, this equates to approximately 1.06 days of computation on a desktop computer. This compares with a computational time of approximately 15 min for the Morris method (*r*(*k* + 1) model evaluations, *r* = 20).
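The evaluation counts behind these figures are simple to check (a back-of-the-envelope sketch, assuming the 2 s per-evaluation figure quoted above):

```python
# Rough cost comparison of the two designs (2 s per model evaluation assumed)
N, r, k, t_eval = 2000, 20, 21, 2.0
sobol_runs = N * (k + 2)         # Saltelli scheme: N(k + 2) evaluations
morris_runs = r * (k + 1)        # Morris design: r(k + 1) evaluations
sobol_days = sobol_runs * t_eval / 86400
morris_minutes = morris_runs * t_eval / 60
```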

## 4. Discussion

We have presented a new approach for the SA of time-dependent model outputs which combines fPCA with established global SA techniques. The application of the approach to a model of insulin-dependent GSK3 inactivation identified a number of interesting features of the system. The important model parameters, with the exception of the rate of dissociation of insulin from the receptor, were restricted to the post-receptor signalling sub-model supporting the view that defects in the intracellular pathway represent the primary sites leading to insulin resistance. The analysis also identified an important role for insulin receptor deactivation suggesting that the PTPs which dephosphorylate the receptor may be potential targets for therapeutic interventions. These results support many of the existing theories about the function of the insulin signalling network which have been uncovered through experimental studies.

Liu *et al.* [31] have also analysed a version of the Sedaghat insulin pathway model [27] embedded in a larger model of glucose regulation. They performed a local SA (varying one parameter at a time) to calculate sensitivity indices at multiple time points in the simulated concentration of glucose. These indices were averaged to provide a single sensitivity measure for each parameter describing its total effect on the glucose output. Consistent with our results, they found that parameters governing PI(3,4,5)P3 conversion were particularly important. However, by combining fPCA with SA, we were able to identify different sets of parameters which affected different aspects of the model output. This is the key advantage of our methodology over the standard application of SA to time-dependent model outputs, from which it is not so straightforward to infer the qualitative effect that parameters have on the model output. However, given that the main computational cost of most SA techniques lies in the evaluation of the model and not the calculation of sensitivity indices, it is possible to calculate and compare both time-varying and PCA-based sensitivity indices for a model at minimal additional cost.

If we are interested in the sensitivity of a particular scalar feature of a dynamic model output (for example, peak concentration), it may be possible to evaluate that feature from each model run and calculate indices for it. This approach has been used in biological modelling but has its limitations. In their local SA of the oscillatory behaviour of the NF-κB signalling pathway, Ihekwaba *et al.* [50] found that for certain parameter values, the chosen features (amplitude and period of oscillations) could not be evaluated. This would cause problems in GSA techniques such as the Sobol method, where the complete set of model outputs is required to calculate the sensitivity indices. By using fPCA, we avoid this problem because the features on which the SA is performed are identified from the data itself. The limitation of the fPCA-based approach is that it may fail to identify a type of output behaviour which is of biological interest if that behaviour does not contribute sufficiently to the variation in the model outputs. Similarly, features of biological interest may be combined into single PCs.
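The fPCA step on which the methodology rests can be sketched in a few lines: each model run yields a time course sampled on a common grid, the centred ensemble is decomposed into principal components, and the PC scores become the scalar outputs on which sensitivity indices are computed. This is a minimal illustration with toy exponential curves standing in for actual pathway simulations; the parameter ranges and curve shapes are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0.0, 60.0, 121)                 # common time grid
amp = rng.uniform(0.5, 1.5, size=200)           # stand-in parameter samples
rate = rng.uniform(0.05, 0.2, size=200)

# 200 model runs x 121 time points; one row per run
Y = amp[:, None] * (1.0 - np.exp(-rate[:, None] * t))

# fPCA via SVD of the centred output matrix
Y_centred = Y - Y.mean(axis=0)
U, s, Vt = np.linalg.svd(Y_centred, full_matrices=False)
explained = s**2 / np.sum(s**2)                 # variance fraction per PC
scores = Y_centred @ Vt.T                       # PC scores, one column per PC

print(explained[:3])   # a few leading PCs capture most of the variation
```

Sensitivity indices (Sobol or Morris) are then calculated treating each column of `scores` as a scalar model output, exactly as for any other quantity of interest.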

A number of other studies have used mathematical modelling approaches to investigate insulin signalling. Giri *et al.* [28] used steady-state analysis of an insulin signalling model to investigate the translocation of GLUT4. They looked at the effect of changing the concentrations of various components in the model to replicate the variation in protein expression levels observed in pathological states. While we have focused on parameter sensitivities, GSA techniques can be applied to the initial conditions of a model in the same way to investigate similar questions. Cedersund *et al.* [51] and Brännmark *et al.* [52] used a model-based hypothesis testing approach, in which they compared different classes of models to experimental data, to identify the mechanisms which must be included to generate the changes observed in the insulin receptor substrate (IRS1) following exposure of cells to insulin. The GSA techniques presented in this paper are not commonly used to study the impact of qualitative inputs (such as model structure) on model outputs and instead typically focus on quantitative factors such as parameter values or initial conditions.

The selection of appropriate parameter distributions is an important part of global SA methods because the results of the analysis may depend on these choices [53]. This dependence should be remembered when interpreting the results of the analysis of the insulin signalling model presented above. It can also be the most difficult and time-consuming stage of performing the analysis [3]. The choice of distribution is often governed by the availability of data but should also be guided by the purpose of the analysis. In the case of a biological or physiological system, an SA may have a number of different aims. If we are interested in understanding the behaviour of the system under normal conditions, we need to select ranges that represent the variation in the parameters observed in normal subjects. Alternatively, we may wish to investigate the important parameters in a particular disease state or condition. In this case, we should extend the ranges to include plausible values associated with the pathology of interest. More generally, we may be interested in identifying the parameters to which the model output is sensitive, for example, to find potential targets for therapeutic interventions. In this type of analysis, the uncertainty distributions need not be based on the experimentally observed uncertainty in the model parameters. Instead, we may include parameters whose value is not regarded as uncertain to investigate the potential effects of artificially perturbing those parts of the system. A convenient form of input distribution in these cases is a uniform range based on a percentage of the nominal parameter values.
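The last option above, uniform ranges at a fixed percentage of nominal values, is straightforward to sample. A minimal sketch follows; the nominal values and the ±20% fraction are placeholders, not the insulin-model parameters.

```python
import numpy as np

rng = np.random.default_rng(1)

nominal = np.array([0.1, 2.5, 0.033])   # illustrative nominal parameter values
frac = 0.2                              # +/- 20% uniform range around each

lo = nominal * (1.0 - frac)
hi = nominal * (1.0 + frac)

# 1000 parameter vectors drawn independently and uniformly within the ranges
samples = rng.uniform(lo, hi, size=(1000, nominal.size))

print(samples.min(axis=0), samples.max(axis=0))
```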

The comparison of two SA methods, the Sobol method and Morris' screening design, has demonstrated that the Morris method is capable of providing qualitative sensitivity information for biological models which is consistent with the Sobol method at a greatly reduced computational cost. The Morris method, therefore, provides a practical approach for the analysis of models with large numbers of parameters or long computational run times. It also offers a useful approach for studying parameter sensitivities under multiple scenarios (for example, different external model inputs) in a realistic time frame. When more quantitative sensitivities are required, for example, to assess the relative reduction in output variance which could be achieved by interventions that target different steps in a signal transduction pathway, variance-based techniques such as the method of Sobol must be used. The computational cost of a quantitative analysis can potentially be reduced by first performing parameter screening with the Morris method to identify non-influential parameters, thereby reducing the dimension of the parameter space to be studied in further analysis.
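The screening step can be illustrated with a minimal one-at-a-time version of the Morris design: for each of *r* trajectories, each parameter is perturbed in turn and the mean absolute elementary effect (μ*) is used to rank parameters. The toy model, the threshold and the step size below are illustrative assumptions, not part of the original analysis; note the cost is *r*(*k* + 1) evaluations per run, matching the formula quoted above.

```python
import numpy as np

rng = np.random.default_rng(2)

def model(x):
    """Toy model: x[2] is deliberately non-influential."""
    return x[0] + 2.0 * x[1] + 0.001 * x[2] + x[0] * x[1]

k, r, delta = 3, 20, 0.1
ee = np.zeros((r, k))                      # elementary effects

for j in range(r):
    base = rng.uniform(0.0, 1.0 - delta, size=k)
    f0 = model(base)                       # 1 base + k steps = k+1 evals
    for i in range(k):
        step = base.copy()
        step[i] += delta
        ee[j, i] = abs(model(step) - f0) / delta

mu_star = ee.mean(axis=0)                  # mean absolute elementary effect
influential = np.where(mu_star > 0.1)[0]   # illustrative screening threshold

print(mu_star, influential)                # parameter 2 is screened out
```

In a two-stage analysis, a variance-based Sobol computation would then be run over only the `influential` parameters, with the screened-out ones fixed at their nominal values.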

Systematically exploring the impact of parameter uncertainty on model outputs is an important part of the modelling cycle. Global SA provides a powerful tool which allows us to identify the model parameters, and hence the biological mechanisms, that drive the system outputs. This information may support the existing understanding of the system or be used to suggest new hypotheses which can be tested experimentally. The methodology presented here provides a new way to apply global SA to investigate the function of dynamic models of biological systems in a computationally efficient manner.

- Received December 19, 2011.
- Accepted March 6, 2012.

- This journal is © 2012 The Royal Society