## Abstract

The field of systems biology has attracted the attention of biologists, engineers, mathematicians, physicists, chemists and others in an endeavour to create systems-level understanding of complex biological networks. In particular, systems engineering methods are finding unique opportunities in characterizing the rich behaviour exhibited by biological systems. In the same manner, these new classes of biological problems are motivating novel developments in theoretical systems approaches. Hence, the *interface* between systems and biology is of mutual benefit to both disciplines.

## 1. Introduction

The term ‘complexity’ is often invoked in the description of biophysical networks that underlie gene regulation, protein interactions and metabolic networks in biological organisms. There are categorically two distinct characterizations of complexity: (i) the classical notion of behaviour associated with the mathematical properties of chaos and bifurcations, and (ii) the descriptive or topological notion of a large number of constitutive elements with non-trivial connectivity. In both biological and more general contexts, a key implication of complexity is that the underlying system is difficult to understand and verify (Wen *et al*. 1998). Simple low-order mathematical models can be constructed that yield chaotic behaviour, and yet rich complex biophysical networks may be designed to reinforce reliable execution of simple tasks or behaviours (Lauffenburger 2000).

A systematic approach for analysing complexity in biophysical networks was previously untenable owing to the lack of suitable measurements and the limitations imposed in simulating complex mathematical models. Advances in molecular biology over the past decade have made it possible to probe experimentally the causal relationships between microscopic processes initiated by individual molecules within a cell and their macroscopic phenotypic effects on cells and organisms. These studies provide increasingly detailed insights into the underlying networks, circuits and pathways responsible for the basic functionality and robustness of biological systems and create new and exciting opportunities for the development of quantitative and predictive modelling and simulation tools. Model development involves the translation of identified biological processes to coupled dynamical equations, which are amenable to numerical simulation and analysis. These equations describe the interactions between various constituents and the environment, and involve multiple feedback loops responsible for system regulation and noise attenuation and amplification.

The discipline of *Systems Biology* has emerged in response to the challenges mentioned earlier (Kitano 2002*b*), and combines approaches and methods from systems engineering, computational biology, statistics, genomics, molecular biology, biophysics and other fields (Klipp *et al*. 2005; Palsson 2006; Szallasi *et al*. 2006). The recurring themes include: (i) integrative viewpoints towards unravelling complex dynamical systems, and (ii) tight iterations between experiments, modelling and hypothesis generation (figure 1).

The central thesis of this paper is that systems engineering methods are finding unique opportunities in characterizing the rich behaviour exhibited by biological systems. In the same manner, these new classes of biological problems are motivating novel developments in theoretical systems approaches. Hence, the *interface* between systems and biology is of mutual benefit to both disciplines.

## 2. Elements of systems biology

### 2.1 Networks and motifs in gene regulation

Biophysical networks can be decomposed into modular components that recur across and within given organisms. One hierarchical classification is to label the top level as a *network*, which is comprised of interacting regulatory *motifs* consisting of groups of 2–4 genes (Lee *et al*. 2002; Shen-Orr *et al*. 2002; Zak *et al*. 2003). At the lowest level in this hierarchy is the *module* that describes transcriptional regulation, of which a nice example is given in Barkai & Leibler (2000).

At the *motif* level, one can use pattern searching techniques to determine the frequency of occurrence of these simple motifs (Shen-Orr *et al*. 2002), leading to the postulation that these are basic building blocks in biological networks. Of relevance to the present discussion is the fact that many of these components have direct analogues in system engineering architectures. Consider the three dominant network motifs found in *Escherichia coli* (Shen-Orr *et al*. 2002):

coherent feedforward loop: in this, one transcription factor regulates another factor, and in turn the pair jointly regulates a third transcription factor,

single input module (SIM): in systems terminology, a single-input multiple output block architecture and

densely overlapping regulons: in systems terminology, a multiple-input multiple output block architecture.

Similar studies in a completely different organism, *Saccharomyces cerevisiae*, yielded six-related or overlapping network motifs (Lee *et al*. 2002):

autoregulatory motif: in which, a regulator binds to the promotor region of its own gene,

feedforward loop: as described earlier,

multi-component loop. effectively, a closed-loop with two or more transcription factors,

regulator chain: a cascade of serial transcription factor interactions,

single input module: as described earlier (SIM) and

multi-input module: a natural extension of preceding motif.

In effect, these studies prove that, in both eukaryotic and prokaryotic systems, cell function is controlled by sophisticated networks of control loops, which are cascading onto and interconnected with, other (transcriptional) control loops. The noteworthy insight is that the complex networks, which underlie biological regulation, appear to be made of elementary *systems* components like a digital circuit. This lends credibility to the notion that analysis tools from systems engineering should find relevance in this problem domain.

As emphasized in the introduction, an important point in systems biology is the *integrative* perspective, that is to say, the analysis of the system considered as a whole and across the different levels (gene, protein, metabolite, etc.), and not the reductionist analysis of individual components. So while it is useful to categorize the elements and levels of a hierarchical regulatory scheme, it is more useful to analyse such schemes for behaviours that emerge from combinations of motifs. Some simple examples of canonical regulatory constructs that yield specific classes of behaviour in gene networks include (Smolen *et al*. 2000):

positive feedback: multistability, oscillations, state-dependent response,

integral feedback: robust adaptation,

negative feedback: steady-state (homeostasis, adaptation),

time delay: complex response, oscillations and

protein oligomerization: multistability, oscillations, resonant stimulus frequency response.

In addition, stochastic fluctuations can induce random response to stimuli, random outcomes, as well as stochastic focusing. Such properties are characteristic of general networks, including social networks, communication networks and biological networks (Committee on Network Science for Future Army Applications 2006).

### 2.2 Dynamic models

While the consideration of motifs and network topology is essential for unravelling design principles in complex biophysical networks, it is necessary to understand the role of *dynamic* behaviour in ascribing meaning to the rich hierarchies of regulation. Some of the intrinsically dynamic features of biophysical networks have been analysed in a recent paper that shows the close relationship between dynamic measures of robustness and the abundance of particular network motifs for a wide range of organisms (Prill *et al*. 2005).

Attempts to detail dynamic behaviour in these networks have fallen into three broad classes of modelling techniques: (i) first-principles approaches, (ii) empirical model identification and (iii) a hybrid approach that combines minimum metabolic network knowledge with an objective function to yield a predictive model. In this section, we outline some key results in the development of mechanistic models, and in the following sections, we will address the other two topics as they are related to network inference and constraints.

Given detailed knowledge of a biological architecture, mathematical models can be constructed to describe the behaviour of interconnected motifs or transcriptional units (TUs). A number of excellent review papers have been detailed in recent years (Smolen *et al*. 2000; Hasty *et al*. 2001). In the majority of these studies, gene expression is described as a continuous-time biochemical process, using combinations of algebraic and ordinary differential equations (ODEs; Goldbeter 1996; Cherry & Adler 2000; Smolen *et al*. 2000). In a similar manner, models at the signal transduction pathway level have been developed in a continuous-time framework, yielding ODEs (Kholodenko *et al*. 1999). At the TU level, a detailed mathematical treatment of transcriptional regulation is described in Barkai & Leibler (2000). Mechanistic models for a number of specific biological systems have been reported, including basic operons and regulons in *E. coli* (trp, lac and pho) and bacteriophage systems (T7 and *λ*; e.g. Gilman & Arkin 2002).

Systems theory has found an enabling role in the analysis of the complex mathematical structures that result from the previously described modelling approaches. The language of systems theory now dominates the quantitative characterization of biological regulation, as robustness, complexity, modularity, feedback and fragility are invoked to describe these systems. Even classical control theoretic results, such as the Bode sensitivity integral, are being applied to describe the inherent tradeoffs in sensitivity across frequency (Csete & Doyle 2002). Robustness has been introduced as both a biological system-specific attribute, as well as a measure of model validity (Ma & Iglesias 2002). In the sections that follow, brief accounts of systems-theoretic analysis of biological regulatory structures are given, emphasizing where new insights into biological regulation have been uncovered.

## 3. Interfaces: examples

### 3.1 Network identification

Currently, our knowledge of essentially all biological systems is incomplete. Despite genome projects that allow enumeration—and, to a certain extent, characterization—of all genes in a system, this does not imply knowledge about all network components (for instance, all protein variants that can be derived from a single gene), interactions, and properties thereof (Kitano 2002*a*). Hence, an important task in systems biology consists of specifying network interactions, which can concern qualitative or quantitative properties (existence and strength of couplings), or detailed reaction mechanisms, for genome-based inventories of components.

Essentially, this is a systems identification problem. Given a set of experimental data and prior knowledge, the network generating the data is to be determined (Ljung 1999). Alternatively termed ‘reverse engineering’ (Tegner *et al*. 2003), ‘network reconstruction’ (MacCarthy *et al*. 2005) or ‘network inference’ (Gardner *et al*. 2003), the general network identification problem provides a key interface between science and engineering. Several, qualitatively different approaches for biological systems have been proposed, which can be roughly classified into three categories: data-driven, approximative and mechanistic.

#### 3.1.1 Data-driven methods

Empirical or data-driven methods rely on large-scale datasets that can be generated, for instance, through microarray analysis for gene regulatory networks. They include singular value decomposition analysis of microarray data (Alter *et al*. 2000; Holter *et al*. 2000), self-organizing maps (Tamayo *et al*. 1990), *k*-means clustering or hierarchical clustering (D'haeseleer *et al*. 2000), protein correlation and dynamic deviation factors (You & Yin 2000), and robust statistics approaches (Thomas *et al*. 2001; Zhao *et al*. 2001). For instance, clustering methods are routinely applied for identifying groups of co-regulated genes from microarray data. The interpretation of clustering results employs (implicit) models such as ‘co-expressed genes are likely to have a common regulator’. Data quality and algorithmic choices (for instance, of distance measures) critically influence the clustering results; in addition, validation of clustering results and techniques is an open issue (Datta & Datta 2003; Handl *et al*. 2005; Allison *et al*. 2006).

In contrast to the mechanistic approaches discussed later, most empirical approaches employ discrete-time grey box models (D'haeseleer *et al*. 1999; Weaver *et al*. 1999; Wessels *et al*. 2001; Hartemink *et al*. 2002). For instance, inference methods based on probabilistic graphical (e.g. Bayesian) models help to elucidate causal couplings between the network components (Friedman 2004). Their scalability for large systems and the ability to integrate heterogeneous datasets make them attractive (Lee *et al*. 2004; Klipp *et al*. 2005). Yet, these approaches deliver only qualitative descriptions of network function, and have inherent limitations. For instance, Bayesian models cannot cope with the ubiquitous feedback in cellular networks, since causal relationships have to be represented by directed acyclic graphs (Friedman 2004).

However, a number of challenges are present in treating experimental data for such problems: (i) the sampling rate is rarely uniform, and may be exponentially spaced by design, and (ii) data from multiple research groups are often combined (e.g. from WWW-posted data) to yield data records with inconsistent sampling, experimental bias, etc. From a systems engineering perspective, another critical point is the potentially divergent qualitative behaviour between continuous-time and discrete-time models of corresponding order (Pearson 1999). Recent work has shown the promise of continuous-time formulations of empirical models using modulating function approaches (Zak *et al*. 2003).

More generally, correctly identifying network topologies (corresponding to the model structure) clearly does not suffice for establishing predictive mathematical models. Experiences with engineered genetic circuits illustrate this point: with identical topology, qualitatively different behaviour can result and vice versa (Guet *et al*. 2002). Hence, quantitative characteristics, which are usually incorporated through model parameters in deterministic models, are also required. Corresponding identification methods are rooted in systems and information theory and, thereby, also provide the largest intersection among biology, other sciences and engineering.

#### 3.1.2 Linear approximations

The identification of dynamically changing interactions requires corresponding dynamic models. In a first approximation, we can consider linear systems, i.e. systems with additive responses to perturbations. In systems engineering, a standard form for linear time-invariant (where the shape of the output does not change with a delay in the input) systems with *n* states and *m* inputs is given by(3.1)with *n*×1 state vector ** x**(

*t*),

*n*×

*n*system matrix

**,**

*A**n*×

*m*input matrix

**and**

*B**m*×1 input vector

**(**

*u**t*). Linearization of the general dynamic system d

**(**

*x**t*)/d

*t*=

**(**

*f***,**

*x***,**

*p**t*) with parameter

**provides first approximations to the network dynamics, even for highly nonlinear systems such as those encountered in biological networks. Linear models capture the local dynamics, for instance, in the vicinity of a steady state, instead of aiming at more complicated global behaviours.**

*p*Mathematically, most methods reconstruct the system matrix ** A**, which corresponds to the Jacobian matrix

**=∂**

*J***(**

*f***,**

*x***)/∂**

*p***, from the measured effects of (sufficiently small) perturbations. However, direct recovery of the system matrix**

*x***will be unreliable with noisy data and inputs. In one of the studies using linear models and perturbation experiments to identify the structure of genetic networks, Tegner**

*A**et al*. (2003) therefore proposed an iterative algorithm that uses rational choices of perturbations to improve the identification quality. For a developmental circuit, despite high nonlinearities in the system, the reverse engineering algorithm, which involves building and refining an ‘average’ connectivity matrix in successive steps, recovered all genetic interactions (Tegner

*et al*. 2003). A related approach that uses linear models and multiple linear regression showed similar performance. The algorithm attempts to exploit the sparsity of systems matrices for biological networks owing to, for example, (estimated) upper bounds on the number of connections per node (Gardner

*et al*. 2003; Bansal

*et al*. 2006). Both algorithms are scalable—a central concept in engineering, but until recently considered of less importance in biology.

Newer approaches to systems identification aim at exploiting modularity in biological networks. For a modular system with one output per module, the method employs inversion of the global response matrix for identification of network connectivities and of local responses from perturbation experiments (Kholodenko *et al*. 2002). It requires a reduced number of measurements compared with other methods because only changes in so-called ‘communicating intermediates’ have to be recorded. Apparently, some simplifying assumptions have to be made; for example, modules are coupled by information flow only, and mass flow is negligible (Kholodenko *et al*. 2002). An important result of extending the modular identification to time-series data is that, for identifying all connections of a node, it is not necessary to perturb this node directly—inference can rely on detecting the network responses to remote perturbations (Sontag *et al*. 2004). Extensions to include the effects of uncertainties in experimental data and prior knowledge (Andrec *et al*. 2005), as well as the possibility of a unified mathematical framework (Cho *et al*. 2005) make modular identification methods particularly promising.

#### 3.1.3 Mechanistic models, identifiability and experimental design

Mechanistic models, owing to effects such as saturation in enzymatic reactions, pose particular challenges because they involve identification of nonlinear systems. Depending on whether model structure and parameters, or only the parameters have to be identified, the problems fall into the classes of mixed-integer nonlinear programs or nonlinear programs, respectively. As a clear limitation, finding a unique global optimum in the estimation, or convergence of the algorithms cannot be guaranteed. In addition, model identification comes at high computational costs owing to numerous model simulations (Maria 2004).

In terms of parameter estimation, which is a common problem in different scientific domains (Ljung 1999), realistic modelling of complex, nonlinear dynamics of biological networks has given new impulses for the evaluation of existing methods and development of new methods. For instance, though stochastic algorithms show superior performance over deterministic methods for parameter optimization in these systems, they are computationally expensive (Moles *et al*. 2003). Novel hybrid methods try to exploit synergies between both approaches in order to increase robustness and efficiency (e.g. Rodriguez-Fernandez *et al*. 2006 and references therein).

More fundamentally, ‘identifiability’ and design of informative experiments need to be addressed. Unstructured approaches to model identification are completely ill-posed when faced with, for instance, modelling a yeast cell with 6200 genes and four possible states per gene; we obtain an overall expression state dimension in excess of 10^{15} (Lockhart & Winzler 2000)! Clearly a number of *a priori* constraints and correlations must be exploited. For discrete models, usage of the experimentally observed upper bound on the number of interactions per species brings the amount of data needed for identification into realistic dimensions (Selinger *et al*. 2003). However, mere extrapolation of current high-throughput technology will not solve these high dimensional data issues. Several recent studies have highlighted the importance of proper design of perturbations to reveal the logical connectivity of gene networks (Wagner 2004; MacCarthy *et al*. 2005). Systems engineering concepts of experimental design to provide ‘rich’ datasets can be exploited to develop predictive mechanistic models.

Parameter estimation accuracies are central to measuring identifiability of mechanistic models. Low accuracies mean that the corresponding parameters may be varied to a greater extent—and still describe the data—than it is possible for parameters with high estimation accuracy (low associated error). They combine information on model sensitivities with experimental data (figure 2). More specifically, the Fisher information matrix (FIM) ** F**(

**) (Emery & Nenarokomov 1998), for a point in parameter space**

*p***, links model and experiment via state sensitivities**

*p***(**

*S**t*)=∂

**/∂**

*x***(see §3.4.1) and measurement covariance matrix for a discrete sampling time**

*p**t*

_{i},

**(**

*C**t*

_{i}). For an unbiased estimator, the Cramér–Rao theorem then gives a lower bound for the estimation error.

FIM-based approaches, for instance, yielded insight into the importance of suitable design of input perturbations for signalling networks (Zak *et al*. 2003), optimality criteria for the design of such inputs (Faller *et al*. 2003) and algorithms for the optimization of sampling times for dynamic experiments (Kutalik *et al*. 2004). New hybrid parameter estimation methods (Rodriguez-Fernandez *et al*. 2006) and closed-loop (i.e. integrating iterations between estimation, evaluation of the identifiability and experimental design) optimal identification procedures (Feng *et al*. 2004) rely on the FIM formalism. Note, however, that while these proof-of-concept studies with small models and synthetic data are valuable, the performance for real biological problems awaits assessment.

Information-rich datasets for integrative models will have to be derived from sources across all levels of biological regulation, such as the transcriptome, proteome and metabolic fluxes. Concomitantly, we need novel statistical frameworks for data integration (Hwang 2005). Systems identification would greatly benefit from the direct *in vivo* determination of kinetic parameters; the work by Ronen *et al*. (2002) for transcriptional control is a first step into this direction. As a complement, synthetic genetic circuits could provide means for controlled excitation of the system, for instance, by inducible genetic switches (Tegner *et al*. 2003), or through genetic oscillators to incorporate analysis methods in the frequency domain (Ljung 1999). Novel methods could also take known uncertainties associated with measurements—such as experimentally determined characteristics of stochastic noise (see §3.3)—explicitly into account. Finally, identification depends on adequate specification of the system and model (e.g. Kim & Tidor 2003). While models are currently either set up *ad hoc*, or through manual comparison of few alternative structures (including kinetic terms in the equations), uncertainties in biology pose a major challenge for systems sciences: deriving advanced approaches to model discrimination for the simultaneous identification of model structures and parameters.

### 3.2 Constraints and optimality

To understand complex biological systems, instead of starting from actual implementations and observations, one can reduce the problem by first separating the possible from the impossible, such as configurations and behaviours that would violate constraints. Systems approaches try to exploit three broad classes of constraints:

empirical: large-scale experimental analysis can provide constraints on possible network structures, such as the average or maximal number of interactions per component (see §3.1),

physico-chemical: laws of physics such as conservation of mass and thermodynamics impose constraints on cellular and network behaviours. These are used, in particular, for structural network analysis (SNA) with roots in the analysis of chemical reaction networks (Clarke 1988) and

functional: biological systems perform certain functions and their building blocks are confined to a large, yet finite set. Network structures and behaviours have to conform with both aspects.

Functional constraints constitute the main differences between complex physics and biology. In physics, they do not exist. Biological (as well as engineered) systems evolve to fulfil functions, and are constantly evaluated for their performance. Insufficient performance will lead to extinction, and better solutions are likely to survive. Hence, it is reasonable to assume some kind of optimality in biological systems. The immediate consequence of a purpose is a considerably smaller design space, in which effective and reliable network are rare and presumably highly structured. Understanding complexity in biology could, thus, employ a ‘calculus of purpose’—by asking teleological questions such as *why* cellular networks are organized as observed, given their known or assumed function (Lander 2004).

#### 3.2.1 Physico-chemical constraints in metabolism

Essential constraints for the operation of metabolic networks are imposed by (i) reaction stoichiometries, (ii) thermodynamics that restrict flow directions through enzymatic reactions and (iii) maximal fluxes for individual reactions. For instance, metabolism usually involves fast reactions and high turnover of substances when compared with regulatory events. Therefore, on longer time-scales, it can be regarded as being in quasi-steady state. The metabolite balancing equation for a system of *m* internal metabolites and *q* reactions(3.2)with the *m*×*q* stoichiometric matrix ** N** and the

*q*×1 vector of reaction rates (fluxes)

**formalizes this main constraint in SNA. As for most real networks**

*r**q*≫

*m*, the system of linear equation in equation (3.2) is underdetermined. However, all possible solutions are contained in a convex vector space, or flux cone (figure 3). Methods from convex analysis allow to investigate this space (Rockafellar 1970; Heinrich & Schuster 1996).

Two broad classes of methods for SNA have been developed: metabolic pathway analysis (MPA) and flux balance analysis (FBA; see (Papin *et al*. 2004; Price *et al*. 2004; Borodina & Nielsen 2005) for recent reviews). MPA computes and uses the set of independent pathways—generating rays in figure 3—that uniquely describe the entire flux space; owing to the algorithmic complexity, it can currently only handle networks of moderate size. FBA, in contrast, determines a single flux solution through linear optimization (Varma & Palsson 1993*c*), often assuming that cells try to achieve optimal growth rates. The computational costs are modest, even for genome-scale models. The approach was successful, for instance, in predicting the effects of gene deletions and the outcomes of convergent evolution in micro-organisms (Fong *et al*. 2003; Fong & Palsson 2004; Price *et al*. 2004). FBA, however, has to reverse-engineer and operate with an essentially unknown objective function. While maximal growth proved a reasonable assumption for lower organisms, higher cells may tend to minimize overall fluxes in the network (Holzhütter 2004). In general, FBA has proven effective for simpler organisms, and when the steady-state assumption is valid. However, there are many situations where these conditions do not apply, many of which are biophysically meaningful, such as the dynamic diauxic shift in *E. coli*.

#### 3.2.2 Extensions: dynamics and control

Stoichiometric constraints restrict the systems dynamics. Thus, the stoichiometric matrix ** N** is fundamental, not only for SNA, but also for dynamic processes in reaction networks, in which the reaction rates

**in equation (3.2) are time-dependent. For biological systems, the conservation of total amounts of certain molecular subgroups (‘conserved moieties’ such as ATP, ADP and AMP) is characteristic, and can be exploited for systems analysis. Classical work in chemical engineering addressed this topic for chemical reaction networks. For instance, Feinberg derived theorems to determine the possible dynamic regimes, such as multistability and oscillations, based on network structure alone (Feinberg 1987, 1988). Challenges posed by biological systems lead to renewed interest in these approaches and induced further theory development (Sontag 2001). Application areas in biology include stability analysis (Sontag 2001) and model discrimination by safely rejecting hypotheses on reaction mechanisms, thus, identifying crucial reaction steps (Conradi**

*r**et al*. 2005). Algorithms for the identification of dependent species in large biochemical systems—to be employed, for instance, in model reduction—have recently become available (Vallabhajosyula

*et al*. 2006).

Enabling FBA to deal with dynamics and regulation proceeded by incorporating additional time-dependent constraints that reflect knowledge on the operation of cellular control circuits—an approach termed ‘regulatory FBA’ (Covert *et al*. 2001). For instance, using superimposed Boolean logic models to capture transcriptional regulatory events has extended the validity of the methodology for a number of complex dynamic system responses (Covert *et al*. 2001) and for data integration (Covert *et al*. 2004). Other dynamic extensions of the FBA algorithm have been proposed in Mahadevan *et al*. (2002). With these more detailed models, steady-state analysis suggested that the complex transcriptional control networks operate in a few dominant states, i.e. generate simple behaviour (Barrett *et al*. 2005). Finally, pathway analysis also allows to approach features of intrinsically dynamic systems: for instance, it helps to identify feedback loops in cellular signal processing (Klamt *et al*. 2006). Hence, SNA-related approaches are about to extend to non-classical domains, in particular, through theory development induced by new challenges in systems biology.

#### 3.2.3 Functional constraints, optimality and design

In analysing living systems, one possibility is to start from the assumption that they have to fulfil certain functions, and that cells have been organized over evolutionary time-scales to optimize their operations in a manner consistent with mathematical principles of optimality. FBA demonstrates the utility of this assumption; note that its implicit functional constraint, i.e. steady-state operation of metabolic networks, is *not* self-explanatory. Similarly, other approaches invoking principles of optimal control theory have opened new avenues for systems analysis in biology.

The cybernetic approach developed by Ramkrishna & co-workers (Kompala *et al*. 1986; Varner & Ramkrishna 1998) is based on a simple principle: evolution has programmed or conditioned biological systems to optimally achieve physiological objectives. This straightforward concept can be translated into a set of optimal resource allocation problems that are solved at every time-step in parallel with the model mass balances (basic metabolic network model). Thus, at every instant in time, gene expression and enzyme activity are rationalized as choice between sets of competing alternatives, each with a relative cost and benefit for the organism. Mathematically, this can be translated into an instantaneous objective function. The researchers in this area have defined several postulates for specific pathway architectures, and the result is a computationally tractable (i.e. analytical) model structure. The potential shortcoming is a limited handling of more flexible objective functions that are commonly observed in biological systems (Savinell & Palsson 1992*a*,*b*; Varma & Palsson 1993*a*,*b*; Bonarious *et al*. 1997).

Instead of focusing on a single objective function, mathematical models and experimental data can be used to test hypotheses on optimality principles, given a specific cellular function to be fulfilled. For instance, extensions of FBA suggested that *E. coli* optimizes the tradeoff between achieving high growth rates and maintaining wild-type metabolic fluxes after gene deletions (Segre *et al*. 2002). MPA showed that the interplay between the metabolic network (the controlled plant) and gene regulation (the controller) in *E. coli* might be designed to achieve optimal tradeoffs between long-term objectives, such as metabolic flexibility, and short-term adjustment for metabolic efficiency (Stelling *et al*. 2002). Optimal production pipelines for biomass components, with fast responses to environmental changes and minimal additional efforts for enzyme synthesis, were predicted in detail to employ wave-like gene expression programs, which was later confirmed experimentally (Klipp *et al*. 2002; Zaslaver *et al*. 2004). Hence, at least certain cellular design principles can be revealed by evaluating assumptions on cellular optimality principles.

Finally, without assuming optimality, we can ask how functions in biological systems could be established in principle. Among others, drawing from analogies with engineered systems helps to understand more general design principles in biology. From nonlinear dynamics, for instance, it is well-known that functions such as oscillators and switches require some source of nonlinearity. Establishing such a function with biological building blocks, thus, allows only for certain circuit designs (Tyson *et al*. 2003; Kholodenko 2006). Similar ideas can prove powerful at different levels of abstraction. For instance, highly structured ‘bow-ties’ with multiple inputs, channelled through a core with standardized components and protocols to multiple outputs, could be the common organizational principles to establish complex production systems in engineering and biology (Csete & Doyle 2004). On the other hand, El-Samad & colleagues studied the bacterial heat-shock response, pointing out that the intertwined feedback and feedforward loops present can be assigned individual functions parallel to those loops in designed control circuits that have to yield fast responses in highly fluctuating environments (El-Samad *et al*. 2005). Notably, most of the examples discussed here involved new developments in theory to address challenges posed by biology; with respect to robustness as an important functional constraint, we will discuss these interfaces in more detail in §3.4.

### 3.3 Stochastic systems

Discrete stochastic modelling has recently gained popularity owing to its relevance in biological processes (McAdams & Arkin 1997; Arkin *et al*. 1998) that achieve their functions with low copy numbers of some key chemical species. Unlike the solutions to stochastic differential equations, the states/outputs of discrete stochastic systems evolve according to discrete jump Markov processes, which naturally lead to a probabilistic description of the system dynamics. A Markov process is a random process in which the future probabilities are dependent only on the present value, and not on past values. Such descriptions can find relevance in systems biology when the magnitude of the fluctuations in a stochastic system approaches the levels of the actual variables (e.g. protein concentrations). In addition, there are qualitative phenomena that are intrinsic to such descriptions that arise in biological systems, as mentioned later.

The idea that stochastic phenomena are essential for understanding complex transcriptional processes was nicely illustrated by Arkin & co-workers in the analysis of the phage *λ* lysis–lysogeny decision circuit (Arkin *et al*. 1998). The probabilistic division of the initially homogeneous cell population into subpopulations corresponding to the two possible fate outcomes was shown to require stochastic description (and could not be described with a continuous deterministic model). In particular, the coexistence of the two subpopulations necessitated such a formal characterization, and the relative sensitivity of the subpopulations to model parameters including external variables could be analysed with the resulting models. In a more recent work, Arkin & co-workers (Samoilov *et al*. 2005) have shown another example of a biological behaviour that is intrinsically stochastic in nature—namely the dynamic switching behaviour in a class of biochemical reactions (enzymatic futile cycles). In this case, the behaviour is more subtle than the lysis–lysogeny switch described earlier, where the existence of a bifurcation was at least evident in the continuous differential equation model. In the enzymatic futile cycle problem, the deterministic model gives no indication of multiplicity, yet the discrete stochastic model generates behaviours, including switching as well as oscillations, that indicate characteristics of bifurcation regimes. It is suggested that such noise-induced mechanisms may be responsible for control of switch and cycle behaviour in regulatory networks.

In the discrete stochastic setting, the states and outputs are random variables governed by a probability density function, which follows a chemical master equation (CME) (Gillespie 1976). The rate of reaction no longer describes the amount of chemical species being produced or consumed per unit time in a reaction, but rather the likelihood of a certain reaction to occur. Though analytical solution of the CME is rarely available, the density function can be constructed using the stochastic simulation algorithm (SSA; Gillespie 1976).

The discrete stochastic system of interest is described by a CME (Gillespie 1977)(3.3)where *f*(** x**,

*t*|

*x*_{0},

*t*

_{0}) is the conditional probability of the system to be at state

**and time**

*x**t*, given the initial condition

*x*_{0}at time

*t*

_{0}. Here,

*a*

_{k}denotes the propensity functions,

*ν*

_{k}denotes the stoichiometric change in

**when the**

*x**k*th reaction occurs and

*m*is the total number of reactions. The propensity function

*a*

_{k}(

**,**

*x***)d**

*p**t*gives the probability of the

*k*th reaction to occur between time

*t*and

*t*+d

*t*, given the parameters

**. As the state values are typically unbounded, the CME essentially consists of an infinite number of ODEs, whose analytical solution is rarely available except for a few simple problems. The SSA provides an efficient numerical algorithm for constructing the density function (Gillespie 1976). The algorithm follows a Monte Carlo approach based on the joint probability for the time to and the index of the next reaction, which is a function of the propensities. The SSA indirectly simulates the CME by generating many realizations of the states (typically of the order of 10**

*p*^{4}) at specified time

*t*, given the initial condition and model parameters, from which the distribution

*f*(

**,**

*x**t*|

*x*_{0},

*t*

_{0}) can be constructed.

This renewed interest in discrete stochastic simulation has motivated a number of *systems* engineering developments for the analysis of, and more efficient computation of, stochastic models. These include detailed analysis of the underlying assumptions invoked in using the SSA, with an emphasis on the distinction between separating fast and slow components as opposed to fast and slow reactions (Haseltine & Rawlings 2005). Gillespie has remained active in this area, and is currently collaborating with Linda Petzold to develop methods for accelerated tau-leaping methods, as well as effective numerical methods for step size selection (Cao *et al*. 2005, 2006). Kevrekidis & co-workers have introduced so-called ‘equation-free’ modelling approaches, which avoid the need for extensive Monte Carlo simulations. In the area of analysis, methods for formal sensitivity analysis of discrete stochastic equations enable the characterization of robustness properties of biological systems (Gunawan *et al*. 2005).

There has been simultaneous advancement in experimental methods for quantifying the characteristics of biological noise (Elowitz *et al*. 2002; Swain *et al*. 2002; Raser & O'Shea 2004) along with advances in computing and simulation. A number of groups have recently used dual reporter methods to track identical genes in the same cell to measure the impact of noise on expression. In the work of Elowitz & co-workers, the separate effects of stochastic behaviour in the transcriptional and translational processes in prokaryotes (so-called ‘intrinsic’ noise) are distinguished from noise effects arising from other cellular components that influence the rate of gene expression (so-called ‘extrinsic’ noise; Elowitz *et al*. 2002; Swain *et al*. 2002). O'Shea analyses eukaryotic systems with both cis- and trans-acting mutations to distinguish between the noise effects that are intrinsic to transcription as opposed to upstream processes that might ultimately influence expression (Raser & O'Shea 2004).

The *interface* of discrete stochastic systems and biology has clearly led to new insights into stochastic phenomena in biological systems, and has also spurred the development of more efficient computational methods for stochastic simulation, as well as analysis methods for these models. This *interface* will continue to motivate developments in systems engineering, with improved methods for imaging biological systems that include the ability to resolve spatial behaviours. Distributed stochastic models will require more sophisticated algorithmic developments, particularly as one builds models to truly address ‘systems-scale’ phenomena.

### 3.4 Robustness

In biology, as in engineering, robust performance refers to the attainment of a particular behaviour or response by a system in the presence of uncertainty. This appears to be a ubiquitous property of biological processes that are subject to constant uncertainty in the form of stochastic phenomena (McAdams & Arkin 1999), fluctuating environment and genetic variation (for a recent review on robustness in cellular functions, see Stelling *et al*. 2004*b*). Biology has adapted a number of approaches for coping with these sources of uncertainty, which include:

back-up systems (redundancy),

disturbance attenuation through feedback and feedforward control,

structuring of networked systems into semi-autonomous functional units (modularity) and

reliable coordination of network elements through hierarchies and protocols.

The robustness problems in systems biology have only begun to yield, in recent years, formal quantitative analyses, owing largely to their nonlinear (and non-stationary) nature. As with engineering systems, robust performance requires the precise specification of both a performance metric and the type/size of uncertainty. When both these elements are specified, it may be possible to analyse biological systems with the engineering tools, as will be shown in this paper. It is important to note that the performance metric is often difficult to be defined precisely in biology, as it is an implicit element of an evolved entity.

#### 3.4.1 Parametric sensitivity approaches to robustness analysis

Parametric sensitivity has found widespread application in the analysis and design of both scientific and engineering systems (Varma *et al*. 1999). In the field of systems biology, sensitivity analysis has been employed in a number of applications, including optimized design of synthetic circuits (Feng *et al*. 2004), design of experiment for optimal parameter estimation (Zak *et al*. 2003; Gadkar *et al*. 2005) and robustness analysis to provide insights into design principles (Stelling *et al*. 2004*a*). The sensitivity operator describes the change in the system's outputs owing to variations in the parameters that affect the system dynamics. High sensitivity to a parameter suggests that the system's performance (e.g. growth, temperature, etc.) can drastically change with small variations in the parameter. Conversely, a small value of the sensitivity suggests that the system is not strongly affected by the parameter.

The classical parametric sensitivity analysis applies to continuous deterministic systems, e.g. systems described by differential (or differential-algebraic) equations. The first order sensitivity coefficients are given by Varma *et al*. (1999)(3.4)where *y*_{i} denotes the *i*th output; *t*, the time; and *p*_{j}, the *j*th parameter. Equation (3.4) follows directly from the definition of parametric sensitivity, and assumes implicitly that the output *y*_{i} is continuous with respect to the parameter *p*_{j}. Sensitivity analysis for stochastic systems can be applied to problems in which the stochastic effects enter as additive Gaussian white noise (e.g. Langevin-type problems; Costanza & Seinfeld 1981; Dacol & Rabitz 1984 or as uncertainty in the parameters Feng *et al*. 2004). Recent extensions allow the treatment of discrete stochastic systems (see §3.3).

Using the sensitivity operator, one can computer the FIM (§3.1.3), thus indicating robust elements (large variances) and fragile elements (tight variances). Corresponding to such a characterization are parametric sensitivities, which are high for fragile elements and low for robust elements. Previous work has shown the utility of this approach for analysing robustness in complex biophysical networks (Stelling *et al*. 2004*a*). The FIM allows flexibility in choosing the appropriate criterion for optimality depending on the goal of both robustness and model identification. D-optimal design aims to maximize the degree of informativeness in data by maximizing the determinant of the FIM, which corresponds to the area/volume of an information hyperellipsoid (Emery & Nenarokomov 1998). On the other hand, A-optimal design is equivalent to reducing the hyperellipsoid of uncertainty in parameter estimates.

One limitation to parametric sensitivity for the analysis of biological systems is the inherently nonlinear character of such systems, while the classical sensitivity methods yield linear (i.e. local) results. One can improve upon this by performing analyses in a neighbourhood of operating points, thus extending the region of validity of the method.

#### 3.4.2 Systems engineering approaches to robustness

In control engineering, a standard tool for robustness analysis is the structured singular value (SSV), which allows to determine whether a particular dynamical system, subject to a specified (structured) uncertainty, is able to remain stable or to achieve a particular performance metric (e.g. Doyle 1982; Skogestad & Postlethwaite 1996; Zhou 1998). The two problems are known as *robust stability* and *robust performance*, respectively, and there are standard software packages available to facilitate this computation (e.g. Balas *et al*. 1995). The key idea is to transform the perturbed system into a new closed-loop operator, and then to test the stability of the operator. The basic idea is illustrated in figure 4, where the M operator denotes a nominal process system, and the Δ operator denotes the uncertainty in the system. Stability of the depicted system is equivalent to robust stability of the original problem, and if a feedback loop between suitably transformed input and output signals is closed, then an operator whose stability characteristics coincide with the attainment of robust performance in the original problem is obtained.

There are straightforward computational algorithms for determining the solution to the corresponding linear time-invariant stability problem in figure 4, for example, MATLAB's μ-Analysis and Synthesis Toolbox (Balas *et al*. 1995). Note, however, that extensions for time-varying and nonlinear uncertainty (Doyle *et al*. 1989) are necessary before applying the SSV analysis on nonlinear systems. The SSV-based methods have begun to find application in biological systems, with recent papers on ‘robustness’ properties of models (Ma & Iglesias 2002) and robust performance analysis of a signal transduction cascade (Doyle & Stelling 2005). One of the more important messages from the engineering robust control literature is the notion of ‘performance’, which requires a precise description in order to calculate the so-called ‘robust performance’. This idea has important consequences in biology, as robustness in one performance attribute may be quite different from the behaviour of another attribute (Bagheri *et al*.). The unique attributes of problems encountered in biology are also motivating the development of new algorithms for formal theoretic analysis (Sontag 2004).

Although the aforementioned engineering methods outline a formal framework for robustness analysis, with potential application for simple biological systems, there are intrinsic scaling problems with such methods. In particular, in the case of larger and more complex nonlinear systems, computational tractability of the corresponding analysis problem is a limiting concern, as is the fact that the methods are inherently conservative for nonlinear systems. This suggests that scalability of methods for robustness analysis is a major opportunity for future research.

#### 3.4.3 Robust yet fragile systems

There is a coexistence of fragility along with robustness in complex networked systems. This property is observed in a wide variety of systems that include forest fires, the Internet, metabolic networks and protein regulatory networks (Carlson & Doyle 1999, 2000, 2002). A nice treatment of the theoretical issues and algorithms in analysing robustness in biological networks is provided in the review paper (El-Samad *et al*. 2006). The concept of ‘Highly Optimized Tolerance’ has emerged to describe the optimal design of such networks in which robustness is maintained across a wide range of expected perturbations, while fragility (often catastrophic) is observed for the very rare and unlikely perturbations. This is accomplished through hierarchies of regulation that are built to withstand some expected class of disturbances. As an engineering system ‘evolves’ to incorporate ancillary layers of stabilizing control, these layers often expose the system to novel points of fragility, thus leading to so-called ‘spiralling complexity’.

In simple (linear) engineering systems, such tradeoffs are analysed using theory such as the Bode sensitivity integral to calculate the ‘conservation’ of robustness (captured by the sensitivity operator) across the frequency domain. This reveals that reduced sensitivity in some frequency range is exactly balanced by a heightened sensitivity in another range (i.e. ‘robust yet fragile’; Csete & Doyle 2004; Doyle & Csete 2005). The significance of the Bode sensitivity integral for biological (nonlinear) dynamics needs to be clarified, as the result is traditionally applied to linear systems. Yet it is widely understood that biological networks exhibit the same robust yet fragile tradeoffs. As an example, consider the exquisite timekeeping of circadian rhythm in neuronal cells. These clocks are well known to be *robust* to large fluctuations in temperature (Ruoff *et al*. 2005), and yet recent evidence has shown that timekeeping in cellular networks is *fragile* to the blocking of key receptors for intracellular synchronization (Aton *et al*. 2005). The former is an expected disturbance for which the system is designed to be robust, while the latter represents a highly unusual and unexpected disturbance for which the system is ill-equipped to handle. Furthermore, quantitative studies in systems such as the *Drosophila* circadian gene network have revealed similar tradeoffs between global (core cellular machinery) and local (circadian specific) functions (Stelling *et al*. 2004*a*).

#### 3.4.4 Two biological examples of robustness

The signal transduction system that mediates chemotaxis exhibits a type of adaptation in which the response to a persistent stimulus is reset to the pre-stimulus value, thereby enabling an enhanced sensitivity. For a number of years, researchers speculated a mechanistic explanation for this robust behaviour, and two hypotheses had emerged: (i) precise fine-tuning of several parameters to yield a consistent (robust) response under varied conditions, and (ii) structural organization that yielded this robust behaviour intrinsically (Barkai & Leibler 1997). The assumptions in this work require a specific mechanism of fine-tuning of the network structure so as to produce integral feedback, which is sufficient to make ‘adaptation’ perfectly robust to all remaining network parameters. John Doyle & co-workers at Caltech were able to use the internal model principle to demonstrate that the regulatory system was exploiting integral feedback control to achieve the robust level of adaptation exhibited in chemotaxis, and more generally in systems with such behaviour (Yi *et al*. 2000; Lander 2004). In other words, they showed that integral control is a necessary condition for robust perfect adaptation, and if the mechanism described in Barkai & Leibler (1997) is incorrect in some aspects, then some other fine-tuned structure must be present. This understanding suggests that many seemingly complex biological networks may employ redundancy and other structural motifs or modules (enumerated in an earlier section) to achieve relatively simple overall system behaviour (Lauffenburger 2000).

The gene network underlying circadian rhythm in flies and mammals has been the focus of detailed analysis in recent years (Goldbeter 1996; Reppert 2000; Winfree 2001; Young & Kay 2001; Goldbeter 2002). The biological details are coming into sharper focus, as new experiments yield clues to the detailed (and somewhat overlapping) molecular circuitry of both flies and mammals (Panda *et al*. 2002). Building upon the evolving biological knowledge, there have been many postulated mathematical models (Leloup & Goldbeter 1998; Scheper *et al*. 1999; Tyson *et al*. 1999; Lema *et al*. 2000; Smolen *et al*. 2001) that range in complexity from simple two-state oscillators to more biophysically detailed transcriptional feedback schemes. As with adaptation in chemotaxis, *robustness* is the dominant characteristic often associated with the circadian rhythm regulatory loop (e.g. Vilar *et al*. 2002), although formal systems-theoretic treatment of this behaviour is a notable absence among the published reports. In recent work, we have shown that systems engineering tools, notably robustness analysis, shed light on the underlying design principles in the gene regulatory architectures (Stelling *et al*. 2004*a*). In particular, the organization of fragility and robustness between global cellular components and circadian-specific components enables precision in circadian clock function.

## 4. Summary

Biological regulation has been reviewed and analysed from the perspective of systems engineering. Mathematical modelling approaches, both empirical and fundamental, have yielded descriptions of many complex systems, and systems-theoretic tools have been employed to provide hypotheses for biological behaviour, such as system robustness. Open challenges were described in the areas of network identification, constraints and optimality, stochastic systems modelling and robustness analysis. Synthetic biology represents one of the more promising future directions in this field (Arkin 2001; Benner & Sismour 2005), which requires a fusion of methods from both engineering and molecular biology in the design of biological circuits in order to achieve the aims at the interface of these disciplines.

## Acknowledgments

We acknowledge the financial support to F.J.D. from the Institute for Collaborative Biotechnologies through grant DAAD19-03-D-0004 from the US Army Research Office.

## Footnotes

- Received June 1, 2006.
- Accepted July 3, 2006.

- © 2006 The Royal Society