## Abstract

Most systems can be represented as networks that couple a series of nodes to each other via one or more edges, with typically unknown equations governing their quantitative behaviour. A major question then pertains to the importance of each of the elements that act as system inputs in determining the output(s). We show that any such system can be treated as a ‘communication channel’ for which the associations between inputs and outputs can be quantified via a decomposition of their mutual information into different components characterizing the main effect of individual inputs and their interactions. Unlike variance-based approaches, our novel methodology can easily accommodate correlated inputs.

## 1. Introduction

The analysis of networks represents a crucial focus of modern systems biology (Barabási & Oltvai 2004; Hwang *et al*. 2005; Kitano *et al*. 2005; Klipp *et al*. 2005; Wagner 2005; Alon 2006; Davidson 2006; Doyle & Stelling 2006; Kell 2006*a*,*b*; Palsson 2006). For many areas of interest, models of complex networks can be taken to have the form of a deterministic mapping from a set of *n* inputs to one or more output(s) (figure 1). The outputs can be considered separately so that for each output *Y*_{k} there is a mapUsually, the input–output mapping is not available in explicit form but can be evaluated numerically for any given inputs.

Global sensitivity analysis aims to rank the inputs *X*_{1}, …, *X*_{n} according to the degree to which they influence the output, individually and conjointly. Here, ‘inputs’ may also refer to intrinsic model parameters whose influence on the output is to be determined as in figure 1*b*. This type of global sensitivity analysis is commonly performed in a probabilistic manner by evaluating the model for multiple sets of randomly and independently selected input values drawn, for instance, from uniform distributions over suitable intervals. The output, being a function of the randomized inputs, thus also becomes a random variable. If the inputs are sampled independently, the variance of the output distribution can be decomposed into contributions by individual inputs, pairs, triplets and so forth. This procedure is well known in statistics as ‘analysis of variance’ (ANOVA; e.g. Box *et al*. 1978), and several authors have contributed to improve its computational efficiency for sensitivity analysis (e.g. Rabitz & Aliş 1999; Sobol 2001).

Rather than analysing the *variance* of the output distribution, we take a different route measuring output uncertainty in terms of Shannon's entropy (Shannon & Weaver 1949). Our starting point is the concept of the ‘communication channel’ (Cover & Thomas 2006), which enables us to view the model as a transmitter of information between inputs and outputs (figure 1*b*).

The *mutual information* of two variables is a quantity that measures their mutual dependence (Cover & Thomas 2006). Determining the mutual information *I*(*X*_{i};*Y*) between random sampling sequences of individual inputs *X*_{i} and their output counterpart can elucidate first-order input–output relations. Mutual information provides a general measure of association that is applicable regardless of the shape of the underlying distributions and—unlike linear- or rank-order correlation—insensitive to non-monotonic dependence among the random variables. Further insight can be obtained by unravelling *conditional dependencies* among the system inputs. Here, we define novel and general sensitivity measures of second and higher order by evaluating input correlations induced by conditioning on the output. To our knowledge, only a first-order information-based analysis has been discussed in the literature to date (Critchfield *et al*. 1986; Dalle Molle & Morris 1993, pp. 402–407).

While variance adequately quantifies the variability of distributions that are symmetrical and unimodal, entropy is calculated directly from the probability distribution function and thus provides a more general measure of output variability. Therefore, we further develop an information-theoretic framework for the sensitivity measures thus derived, based on the observation that their sum is bounded from above by the output entropy *H*(*Y*). From this viewpoint, the (information-theoretic) sensitivity indices quantify the amount of output uncertainty removed by the knowledge of individual inputs and combinations thereof.

Sensitivity analysis of this kind is also an analysis of the *total* mutual information *I*(*X*_{1}, *…*,*X*_{n};*Y*), which subsumes *all* input–output associations including interactions. The resultant summation theorem for the sensitivity measures is an information balance in which the sum equals *I*(*X*_{1}, *…*, *X*_{n};*Y*). Although in practice only effects of up to third- or fourth-order can easily be calculated explicitly, the joint impact of all higher order terms is provided by the remaining difference to *I*(*X*_{1}, *…*, *X*_{n};*Y*). We can therefore assign credit or influence fully to all the parameters of a system over a wide range of operating conditions.

For all variance-based approaches, the absence of input correlations is a critical prerequisite for the uniqueness of the variance decomposition (Saltelli *et al*. 2000, 2004). As will be demonstrated in our methodology, independent inputs merely simplify the analysis. If input correlations exist (e.g. due to non-orthogonal sampling), their effect can easily be taken into account. We apply the methodology successfully to a model of the NFκB signalling pathway and thereby define how to modify its behaviour to provide a designed maximum effect.

## 2. Methods

By randomly sampling the input space, a genuinely deterministic system can be analysed in stochastic terms. Random perturbation of the inputs creates a randomized output *Y* with a probability density *p*(*y*). Rather than attempting to find some parametric model of *p*(*y*), the output density is approximated by a histogram, and the output becomes a *discrete* random variable. The corresponding entropy(2.1)measures a (hypothetical) receiver's uncertainty about *Y* due to input perturbation. For instance, if one input *X*_{i} is fixed, the receiver's remaining uncertainty can be quantified by the *conditional* entropy(2.2)which is the average uncertainty in *Y* over all possible discrete values *x* that the input variable *X*_{i} can assume. The discretization of *X*_{i} and *Y* is, of course, arbitrary and should be chosen in relation to the number of system evaluations (simulation runs).

The mutual information is defined as the difference in output uncertainty with and without knowledge of *X*_{i}, and characterizes the influence *X*_{i} exerts on *Y*:(2.3)The link between uncertainty and association established by equation (2.3) is one of the fundamental concepts of Shannon's information theory and forms the basis of our framework for sensitivity analysis. Calculating the mutual information *I*(*X*_{i};*Y*) for each *X*_{i} constitutes a form of first-order sensitivity analysis, assessing only the influence of individual inputs.

### 2.1 An information-theoretic first-order sensitivity index

Critchfield *et al*. (1986) defined the *mutual information index* (MII), which in our notation is the mutual information normalized by the entropy of the output variable:(2.4)A first-order sensitivity analysis can be performed by calculating the MII of all inputs, where the mutual information is obtained by computing(2.5)Though *X*_{i} and *Y* are continuous variables, equation (2.5) contains discrete sums, indicating that, in practice, the probability densities are evaluated via the joint histogram and the marginal histograms of the input and output sequences.

### 2.2 Pairwise interactions

If we assume that, by design of the simulation, random input values are drawn independently, there will be no *a priori* correlations among the sequences of input values. However, if inputs interact in their influence on an output, one would expect to find associations in input sequences when conditioning on a particular value of that output. We show that the output-induced conditional dependence among two inputs, characterized by the *conditional mutual information*(2.6)provides a measure of the joint influence of the pair (*X*_{i}, *X*_{j}) on the output *Y*, on average.

To understand why this is indeed an appropriate measure, we consider the degree of association among (*X*_{i},*X*_{j}) and *Y*, that is the mutual information *I*(*X*_{i},*X*_{j}*;Y*). Since this quantity subsumes first- and second-order effects, one has to subtract the influence of the individual inputs, *I*(*X*_{i}*;Y*) and *I*(*X*_{j}*;Y*), in order to obtain the pure second-order effect of *X*_{i} and *X*_{j} on *Y*. Using an auxiliary formula proved in appendix A.3, one obtains(2.7)The second term on the right-hand side subtracts the effect of any *a priori* input associations due to the applied sampling scheme. If inputs are sampled independently, the term vanishes and the conditional mutual information by itself captures the joint effect of *X*_{i} and *X*_{j} on *Y*. Note, the simple structure of equation (2.7) makes it possible to apply arbitrary input sampling schemes, without having to be concerned about statistical independence. This reveals a considerable advantage of the information-theoretic approach over variance-based methods, which are not easily extended to non-orthogonal samples (Saltelli *et al*. 2000).

### 2.3 Higher order interactions

Capturing interactions among three or more inputs in information-theoretic terms requires generalizing the concept of mutual information beyond two variables. To characterize the genuine three-way interaction of input triplets, we apply the same rationale as in §2.2 and consider a decomposition of the mutual information of an input triplet (*X*_{1}, *X*_{2}, *X*_{3}) and the output (derivation provided in appendix A.3)(2.8)Having identified the first- and second-order terms on the right-hand side, the decomposition suggests interpreting the remainder as the genuine third-order sensitivity measure. Using the notation *I*_{3}, we define(2.9)This quantity is the conditional form of what McGill (1954) called ‘interaction information’. McGill also showed that interaction information is symmetric with respect to permutations of its arguments, meaning for the conditional form that(2.10)Note that interaction information can be negative. By virtue of equation (2.10), this can only happen when all three pairwise interactions have non-zero conditional mutual information. Negative interaction information then indicates an inner redundancy of the triplet (*X*_{1},*X*_{2},*X*_{3}), in the sense that the pairs (*X*_{1},*X*_{2}), (*X*_{2},*X*_{3}) and (*X*_{1},*X*_{3}) do not provide entirely independent pieces of information about *Y*. This situation rarely occurs in natural systems, although appendix A.1 presents a contrived and artificial example with negative interaction information.

Generalizing the definition of *I*_{3}, one can analogously quantify the fourth-order sensitivity via the fourth-order conditional interaction information(2.11)Higher order interactions can be defined accordingly.

### 2.4 The information balance: a summation theorem for sensitivity indices

Having identified measures for first-, second- and higher order sensitivities, we consider a decomposition of the total mutual information for arbitrary number of inputs. Generalizing equation (2.3), one obtains the general form of the information balance for a system with *n* inputs(2.12)In addition to *H*(*Y*), which is straightforward to compute, the ‘noise entropy’ *H*(*Y*|*X*_{1}, …, *X*_{n}) has to be evaluated. In a deterministic system, this quantity vanishes for continuous random variables. However, computing information-theoretic quantities in a continuous fashion would require parametric models of all random variables. Since the input–output mapping is often not given in closed form, it will generally be impossible to derive such models analytically. Moreover, there is no general parametrization scheme that can be fitted to the multitude of possible empirical distributions arising in various systems. Rather, the parametric distributions must be selected on a case-by-case basis, with no guarantee of obtaining a close fit. In our experience, it proved very difficult to match the heavy-tailed output histograms arising in our particular application (cf. §3) with any standard distribution function.

Hence, to make the information-theoretic quantities measurable, we choose to discretize all variables. The noise entropy then takes the role of the *residual uncertainty* in *Y* that persists, given all inputs with a finite precision determined by the imposed discretization. We shall refer to this residual uncertainty as the *discretization entropy*, denoted by *H*_{Δ.} In §3.4, we show how *H*_{Δ} can be estimated via Monte Carlo simulation.

Normalizing by *H*(*Y*)−*H*_{Δ}—the maximum amount by which the output uncertainty can be reduced by the parameters—one can rewrite the information balance for a discretized deterministic system as(2.13)Equation (2.13) provides the basis for a summation theorem, since it is possible to express the left-hand side in terms of the previously defined sensitivity indices, as shown in appendix A.3. Decomposition up to third order yields(2.14)Here, summations extend over all index combinations excluding permutations. While related decompositions of the information of an ensemble of variables have been considered previously (Watanabe 1960; Fano 1961; Panzeri *et al*. 1999; Amari 2001; Schneidman *et al*. 2006), they have never been applied in the context of sensitivity analysis (see appendix A.1 for a discussion of alternative decompositions).

Calculation of the conditional mutual information requires knowledge of the underlying marginal and joint probability density functions. In practice, these densities must be estimated empirically by means of marginal and joint histograms. Particularly, the empirical estimate of a joint density can be problematic when the amount of available data is insufficient to populate all the bins in its joint histogram. This leads to a systematic error (‘bias’) in the limited-sampling estimation of information; the higher the dimensionality of the histograms to be sampled, the larger the bias. Thus, the estimation of higher order interactions is particularly difficult. However, reliable correction of the sampling bias is possible using advanced statistical techniques (Panzeri & Treves 1996; Nemenman *et al*. 2004; Montemurro *et al*. in press). Given the amount of simulations we could produce in the particular application presented below, these techniques allowed an accurate elimination of the bias for up to third-order interactions. Only the first-order quantification would have been possible without using such bias reduction techniques.

Even though, for the practical reasons described above, sensitivity indices can only be evaluated up to a certain order, the remainder Δ*I*—the combined effect of all higher order interactions—can be assessed since all other terms in the equation are known. If the lower order sensitivity indices capture the essence of the dependence structure, the remainder will be a small fraction of *H*(*Y*)−*H*_{Δ}. A significant value of Δ*I* would indicate that important higher order interactions exist, which is generally not expected in most simple systems (Rabitz & Aliş 1999). In large networks, higher order interactions require an extreme number of connections, unless the degree of connectivity varies strongly across the network. Hence, one would expect to find a small number of local ‘hubs’ forming highly connected subnetworks. While this is still the subject of debate, we note that the complex networks arising in biological systems do indeed tend to have sparse intrinsic connectivity patterns (Wagner & Fell 2001; Barabási & Oltvai 2004; Csete & Doyle 2004).

### 2.5 Total sensitivity indices

A very useful concept in variance-based sensitivity analysis is the so-called *total sensitivity index* (Saltelli *et al*. 2005), which measures the overall influence that a particular input exerts on the output, comprising main effects and *all* interactions. In the ANOVA framework, the total sensitivity expresses the remaining output variance when all other inputs are kept fixed. The idea is to calculate this quantity without relying on the other sensitivity indices (first, second, third-order and so forth). If a total sensitivity index is zero, the corresponding input is irrelevant; if not, it is interesting to relate it to the other indices. For instance, comparing the total sensitivity index of an input with its first-order index reveals the degree to which the input is interacting with others.

This concept can be readily applied to information-based sensitivity analysis. The information-theoretic total sensitivity index for variable *X*_{i} is given by(2.15)The total sensitivity index can also be expressed as the sum of all sensitivity indices involving *X*_{i}:(2.16)Note, the sum of all total sensitivity indices is generally greater than 1 since expansions for different input variables will share certain sensitivity indices if the variables interact.

## 3. Information-theoretic sensitivity analysis of a model of the NFκB signalling pathway

As an example, we apply our methodology to parameter sensitivity analysis in systems biology. We consider a model of the (IκB)/NFκB signalling pathway (Hoffmann *et al*. 2002) and investigate the interdependencies among intrinsic parameters (in this case 64 reaction rate constants) with respect to their influence on the time course of the concentration of a particular metabolite, the nuclear transcription factor NFκB, which is a key component in early immune response.

In a nutshell, the pathway model works as follows. There are three main components: NFκB, IκB inhibitory proteins (IκBα and its isoforms IκBβ and IκBε) and the IκB kinase (IKK). The model describes the kinetics of interaction between these components, their transport between nucleus and cytoplasm, the inhibitor IκB degradation, as well as the NFκB-regulated gene expression and subsequent resynthesis of the inhibitors (figure 2). NFκB is normally bound in an IκB–NFκB complex. Following a step increase in the concentration of IKK, which models the effect of an extracellular stimulus (e.g. tumour necrosis factor (TNFα)), NFκB is released from the IκB–NFκB complex and enters the nucleus. The IκBs are rapidly degraded. In the nucleus, NFκB regulates the expression of genes leading to a resynthesis of the IκB inhibitor proteins. The newly synthesized IκB binds to the nuclear NFκB forming an IκB–NFκB complex and subsequently shuttles NFκB back to the cytoplasm, thus initiating a negative feedback loop. The cycle is repeated until all IKK has decayed. As a result of the delayed negative feedback, the concentration of nuclear NFκB exhibits an oscillatory behaviour (figure 3) that can be characterized in terms of features such as peak amplitude, frequency or phase (Nelson *et al*. 2004; Kell 2006*a*).

To exemplify the sensitivity analysis, we select one feature as output *Y*, namely the time difference between the first two peaks of the nuclear NFκB oscillation, *P*_{1}=*T*_{2}−*T*_{1}. Evaluating the input–output function thus involves numerically solving a system of 24 ordinary nonlinear differential equations (ODEs) corresponding to the reaction equations and subsequently determining the first two maxima in one component of the solution. The entire analysis is performed with respect to the selected feature, based on 680 000 simulations, which yield sufficiently accurate information measures up to third order. All parameters were varied simultaneously, each drawn independently from a uniform distribution over the interval from 0.9 to 2.0 times the nominal value, and discretized into 15 bins. The lower bound of the sampling interval is dictated by the empirical observation that oscillations are only guaranteed to occur for parameter values above about 0.9 of the nominal values given in Hoffmann *et al*. (2002).

### 3.1 First-order sensitivity indices

The first-order analysis reveals that only a small subset of about eight parameters out of 64 (or at most 11, if one includes parameters 1, 19 and 37) significantly influence the output feature *P*_{1} (figure 4). The parameters thus identified either directly affect the amount of available NFκB (9, cytoplasmic release; 19, nuclear import; 1, IκBα–NFκB association) or control the strength of the negative feedback, which depends on the production of the inhibitor protein IκBα (28, transcription; 36, constitutive translation), its transport into nucleus (38, nuclear import of IκBα), its destruction (37, degradation; 29, IκBα mRNA degradation; 62, IKK–IκBα catalysis). In addition, the feedback is *indirectly* affected by the IκB kinase (IKK). Therefore, its decay rate (61) and the rate of binding IKK to the IκBα–NFκB complex (52), a step prior to the release of NFκB, are also important. Parameters specifically relating to the inhibitor isoforms IκBβ or IκBε are insignificant, which is in accordance with experiments indicating that only the knockout of IκBα is lethal (Gerondakis *et al*. 1999).

The result is consistent with previous local sensitivity analyses of a different output feature (Ihekwaba *et al*. 2004, 2005) and in accordance with global sensitivity analysis of the overall time course of the nuclear NFκB concentration (Yue *et al*. 2006).

### 3.2 Second-order sensitivity indices

At the level of pairwise interactions, a rather small number of relevant pairs emerge (figure 5). No significant synergies are observed, meaning that only those pairs wherein at least one partner is individually relevant have significant interactions. The predominant contributions are by pairs where both partners have significant individual impact. The degree of interactivity, that is the number of relevant pairs in which a parameter appears, varies strongly. For instance, parameter 29 (the IκBα mRNA degradation rate) seems to play the role of a ‘super parameter’, in the sense that it is involved in most, and the strongest, interactions. Note that pairs with very small yet statistically significant sensitivities are not visualized in the interaction matrix (figure 5), due to limited diagram resolution.

Statistical significance can be assessed via a bootstrap test, which involves repeated random shuffling of the parameter sampling sequences and recalculation of the conditional mutual information from these shuffled sequences. Several hundred repetitions produce a bell-shaped distribution of random sensitivity values, the mean and standard deviation of which characterize the range of ‘chance values’ of the particular sensitivity index under consideration. If the index calculated from the original non-shuffled data is two or three standard deviations above its bootstrap mean, it can be considered statistically significant.

While the particular parameter set identified as most relevant is very biologically plausible, the strongly varying degree of parameter interactivity is surprising. For instance, it is not obvious why the degradation of the IκBα mRNA transcript (parameter 29) should play such a predominant role when compared with transcription (parameter 28) and translation (parameter 36). In fact, it has been suggested (P. Paszek & M. R. H. White 2007, personal communication) that this result might be an artefact of the model. The arising debate illustrates the usefulness of a detailed sensitivity analysis as a means of highlighting potential design flaws in complex models.

### 3.3 Third-order interactions

The sparseness of the interaction structure continues at the third order (figure 6). It is again the combinations of individually relevant parameters that exhibit the strongest tripletwise interactions. The assessment of statistical significance is analogous to the procedure described in §3.2. All third-order indices are positive.

### 3.4 Monte Carlo estimation of discretization entropy

Let *n* be the number of system inputs. Assume a discretization scheme where each input range is partitioned into the same number of bins, denoted by *n*_{bins.} Let *j*_{1}, …, *j*_{n} be the bin indices of the inputs *X*_{1}, …, *X*_{n}, with *j*_{1}=1, …, *n*_{bins}; *j*_{2}=1, …, *n*_{bins} and *j*_{n}=1, …, *n*_{bins}; and let Δ*x*_{1}, …, Δ*x*_{n} be the corresponding bin widths. Then the bins are defined as(3.1)The discretization entropy is the averaged conditional entropy of *Y*, for which uniformly distributed input values is simply(3.2)The summation extends over the total number of input bin combinations, which is (*n*_{bins})^{n}. Since this number can be very large, equation (3.2) cannot be generally evaluated. However, Monte Carlo estimates can provide excellent approximations. Figure 7 shows the estimated discretization entropy as a function of the number of bin combinations used to compute the average. Only several hundred bin combinations are required to obtain a reliable estimate. For each bin combination, about 100 evaluations of the feature *Y* were performed to estimate the local conditional histogram from which the conditional entropy in equation (3.2) is computed. The figure can be understood as a consequence of the output/feature uncertainty being small, for reasonably fine input discretization. Hence, the conditional histograms approximating tend to have only very few bins with non-vanishing probability. In our case, the output is discretized into 15 bins out of which usually only one or two have non-zero counts, which can therefore be estimated properly with about 100 data. Thus, 600×100=60 000 evaluations provide a reasonably accurate estimate of the discretization entropy.

### 3.5 Information balance

The block diagram of the information balance (figure 8) shows that higher order interactions do contribute significantly to the total sensitivity. Moreover, only a small subset of parameter pairs and triplets interact significantly (figures 5 and 6), and we expect such sparse connectivity to continue at higher orders.

### 3.6 Total sensitivity indices

Total sensitivity indices consist of conditional entropies of the type *H*(*Y*|{*X*_{1}, …, *X*_{n}}\*X*_{i}), which therefore can be estimated in a similar fashion to the discretization entropy, except that the value of the input *X*_{i} under consideration is allowed to vary over its entire range. All other inputs are evaluated within their bins, and the conditional entropy is again averaged over (in theory) all bin combinations. For the system under investigation, the Monte Carlo estimates exhibit a convergence comparable to that in figure 7.

Figure 9 shows the estimated total sensitivity indices of the eight most relevant parameters of the NFκB pathway model next to their first-order indices, revealing the different degrees of interaction. The diagram leads to two main conclusions. First, parameter 29 stands out in terms of its overall significance, since it has the strongest individual impact and also the highest degree of total impact, in the sense that almost 80% of the output uncertainty is removed by the information contributions of 29 and its interactions. Second, the fractional contribution of the interactions to the total sensitivity is higher in the other parameters, but, with the exception of parameter 36, their interaction impact (the difference of total and first-order sensitivity) is lower than that of 29.

A total sensitivity index equalling unity would indicate that the corresponding input quantity is ‘fully connected’, in the sense that it participates in *all* relevant interactions; sensitivity indices not involving this parameter would be irrelevant. For parameter 29, this is almost the case.

## 4. Conclusions

With the advent of advanced estimation techniques, mutual information has become a viable means of characterizing input–output interactions in complex networks. The framework developed in this paper lays the theoretical foundations for an information-theoretic sensitivity analysis that assigns credit or influence to input variables in terms of their overall contribution to a system's output entropy. However, our method is far more than a replacement of *analysis of variance* by *analysis of entropy*. The information-theoretic approach does not rely on implicit assumptions of normally distributed outputs and is easily generalized to include non-orthogonal input sequences. Moreover, the information-theoretic approach lends itself well to the analysis of systems with an intrinsically stochastic structure, such as biochemical reactions with small numbers of molecules (Wilkinson 2006). In this case, the noise entropy provides a combined representation of the output uncertainty due to discretization and all sources of intrinsic stochasticity.

## Acknowledgments

We thank the UK BBSRC for financial support, and Mike White, Pawel Paszek and Caroline Horton for useful discussion. This is a contribution from the Manchester Centre for Integrative Systems Biology (www.mcisb.org).

## Footnotes

- Received April 17, 2007.
- Accepted May 30, 2007.

This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

- Copyright © 2007 The Royal Society