## Abstract

Dynamical systems describing whole cells are on the verge of becoming a reality. But as models of reality, they are only useful if we have realistic parameters for the molecular reaction rates and cell physiological processes. There is currently no suitable framework to reliably estimate hundreds, let alone thousands, of reaction rate parameters. Here, we map out the relative weaknesses and promises of different approaches aimed at redressing this issue. While suitable procedures for estimation or inference of the whole (vast) set of parameters will, in all likelihood, remain elusive, some hope can be drawn from the fact that much of the cellular behaviour may be explained in terms of smaller sets of parameters. Identifying such parameter sets and assessing their behaviour is now becoming possible even for very large systems of equations, and we expect such methods to become central tools in the development and analysis of whole-cell models.

## 1. Introduction

John von Neumann was famously dismissive of over-eager curve-fitting. ‘With four parameters I can fit an elephant, and with five I can make him wiggle his trunk’ encapsulates his position [1], and is a view shared by many. However, computational science is now exploring models of great complexity with hundreds and even thousands of parameters, and it is perhaps fitting that one of von Neumann's many great legacies, the modern computer, is enabling this. Whole-cell models (WCMs) are perhaps the most ambitious example of large-scale models in the life sciences, and here we discuss the statistical and parameter estimation challenges intrinsic to such efforts.

Mathematical models can have manifold uses in biology. First and foremost, they show that complexity can arise even from very simple dynamical systems [2]. At their best, such simple models capture essential aspects of biological systems and afford us fundamental new insights. *Lotka*–*Volterra* models in ecology, the *Wright*–*Fisher* model in population genetics, *Turing* [3] and *French-flag* [4] models in developmental biology [5], the *repressilator* in synthetic biology [6] and the *standard model of gene expression* [7] in molecular biology are all examples of powerful yet simple models that have substantially contributed to our understanding of biological processes from the molecular to the eco-system level.

Despite the simplicity of these models—often to the extent that substantial and non-trivial analytical solutions are available for key aspects of their behaviour—their validity or relevance has been probed and demonstrated repeatedly. Models are not reality, nor are they meant to represent all aspects of reality faithfully [8]. Nevertheless, these simple models have essentially framed how we understand many key biological processes, and can serve as useful guides as to how we should best explore them in more detail. However, such models are quickly found wanting when more detailed data become available, or when related but more complicated processes are studied. For example, spatial structure is known to affect the validity of both ecological and population genetic processes, and at the very least models have to be modified to account for these changes. As models become more complex and incorporate often complex feedback structures [9,10], we can no longer rely on analytical techniques, and instead start to require computer simulations to explore their behaviour.

### 1.1. From simple to complex models

Some models aim to capture the essential hallmarks of life—such as metabolism, nutrient uptake, gene expression regulation and replication—but in a simplified representation that does not aim to replicate the true complexity of a whole organism [11–16]. These *coarse-grained* models have shown great promise and allow us to integrate molecular, cellular and population level/scale processes into a coherent—and analytically tractable—modelling framework. While real cells will be much more complicated, these simple model systems have successfully provided insight into fundamental cell physiology, e.g. processes affecting microbial growth rates [12,13,15,16].

Increasingly, there is interest in generating more realistic and complicated models that, rather than aiming to provide abstract representations of key features, incorporate extensive details of known components and interactions (or reactions) present in a system. In cell biology, for example, there are now numerous attempts at modelling aspects of metabolism, gene regulation and signalling at cellular level [17–24]. Perhaps the best established are metabolic models, where a powerful set of tools, based around *flux balance analysis* (FBA) [25], allows us to explore metabolic phenotypes *in silico* at a genomic level for an increasing range of organisms (and some individual cell types) [24,26,27]. However, such models are stoichiometric and thus give us information about biochemical reaction schemes and fluxes, but not details about the system dynamics.

Advances in both high-throughput experimentation and computational power have opened up the possibility of creating and analysing more complex dynamic models of biological systems, including many which represent processes occurring at different scales [28,29]. Numerous models now face the challenge of being *large* (in terms of numbers of species and parameters represented), multi-scale and/or *hybrid* in nature (incorporating multiple different mathematical representations) [23,28–30]. The most ambitious models to date—the *WCMs*—aim to provide faithful *in silico* representations of real biological cells, including all major cellular processes and components, and are both very large scale and hybrid (figure 1) [31–33].

There are several potential uses for such WCMs:

(1) To gain mechanistic insights, by serving as an *in silico* ‘blueprint’ through which we study the behaviour of real cells.

(2) As a rational screening and predictive tool, to explore *in silico* what might be hard or impossible to study *in vivo*.

(3) To drive new biological discoveries, by showing where we lack sufficient understanding, and identifying promising future directions to pursue experimentally.

(4) To study *emergent* phenomena which are only apparent when we consider a system as a whole.

(5) To integrate heterogeneous datasets and amalgamate our current knowledge into a single modelling framework.

(6) Perhaps eventually to study, via virtual competitions between different cell architectures, evolutionary dynamics in unprecedented detail (but at enormous, currently crippling, computational cost).

(7) In the meantime, as the community strives to develop viable WCMs, the technological, computational and statistical challenges of model building will, no doubt, give rise to much fruitful research and re-usable methodology.

Here, we focus on the inference and statistical modelling challenges inherent to developing WCMs (and other complex models), in terms of model construction, parameter estimation, uncertainty and sensitivity analyses, and model validation and refinement. Some of these are generic modelling challenges—but worth reiterating—while others are specific to large-scale, multi-scale and hybrid models.

## 2. Combining models

The first example of a comprehensive WCM, published in 2012, describes the life cycle of a simple bacterium, *Mycoplasma genitalium*, accounting for the functions of all 525 known annotated genes [31]. This model was constructed by combining 28 submodels that describe cellular processes, including metabolism, gene expression regulation, protein synthesis, biomolecule assembly, signalling and cell division functions. Such a hybrid model requires a sophisticated and non-trivial simulation framework, as the different modelling modalities required, e.g. for FBA and stochastic gene expression, need to be reconciled with one another and staged appropriately such that cell–physiological processes are realistically scheduled.

For some of the obvious next candidates to generate WCMs, such as *Escherichia coli* or *Bacillus subtilis*, we already have vast amounts of metabolomic, transcriptomic and proteomic data, as well as knowledge summarized in databases. There are, for example, the genome-scale metabolic models that we have already touched upon above. These are complemented by gene regulation and protein–protein interaction networks that capture interactions that have been experimentally substantiated to different degrees. Construction of the next WCMs will probably follow similar approaches that combine existing submodels; the challenge is to (i) construct and parametrize appropriate submodels and (ii) intercalate these networks to enable computational analyses. An even more ambitious goal is to extend such approaches to study eukaryotic species such as *Saccharomyces cerevisiae* or mammalian cells, where we have to deal not only with much larger genomes, but also with features such as subcellular compartments and more complex regulatory mechanisms.

Approaches and difficulties relating to submodel construction and parametrization are discussed in subsequent sections, but even once we have extensively characterized, carefully parametrized and validated models describing different aspects of a complete system, combining these will remain a formidable challenge [28,29,34]. Submodels were successfully integrated in the original WCM by assuming independence on short time scales, and defining a collection of cell variables that could be shared among the various distinct cellular processes [31]. However, we should bear in mind the difficulties faced in other scientific fields, where complex systems are frequently studied using multi-scale [29] and multi-physics modelling approaches [35]. In climate modelling, for example, multi-physics approaches combine models of atmospheric chemistry with models of ocean currents, which are then coupled, using, for example, partial differential equations. But in those cases the processes of connecting the different subsystems can introduce uncertainty and bias, as the feedback between different constituent parts of the larger system can also be complicated, and coupling different systems requires considerable fine-tuning of the equations linking the different subsystems.

It will be crucial to develop ways to assess how uncertainties and errors may be propagated through a complex model. Any inaccuracies in linking different submodels can severely compromise the compound model, irrespective of how carefully the submodels have been calibrated and tuned. We should also consider how else we can incorporate the impact of cellular context when modelling subsystems. The notion of extrinsic noise was introduced precisely to account for cell-to-cell heterogeneity in factors that are not explicitly captured by a model, but which may differ between cells [36]. Frequently, it is possible to capture the leading effects of such extrinsic factors by allowing for differences in the rate parameters between cells [37–39].

## 3. Parameter estimation

The models we are after, *f*(*Y*; *θ*, *t*), capture the behaviour of our organism/cell, or all of its constituent parts subsumed in the vector, *Y*, over time, *t*; *θ* denotes the vector of parameters (e.g. rate constants for metabolic and kinetic processes). Depending on the nature of the process being studied, and the level of knowledge we have about a system, we may require different modelling formalisms (e.g. deterministic, stochastic, logical or stoichiometric); frequently, we will make use of ordinary differential equations of the form

$$\frac{\mathrm{d}Y}{\mathrm{d}t} = f(Y; \theta, t) + \xi(Y, t), \tag{3.1}$$

where *ξ*(*Y*, *t*) is an optional additional term to denote stochastic processes (e.g. due to random timings of collisions between molecules in the cellular interior).
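To make the model form above concrete, the following is a minimal sketch of how such a system might be simulated. The two-species gene expression model, the parameter values, the constant-amplitude additive noise standing in for *ξ*(*Y*, *t*), and the names `f` and `simulate` are all illustrative assumptions rather than part of any published WCM.

```python
import math
import random

def f(y, theta, t):
    """Right-hand side f(Y; theta, t) of a toy two-species gene expression model."""
    m, p = y                      # mRNA and protein abundances
    k_m, g_m, k_p, g_p = theta    # synthesis and degradation rate constants
    return [k_m - g_m * m, k_p * m - g_p * p]

def simulate(y0, theta, t_end, dt=0.01, noise=0.0, seed=0):
    """Euler-Maruyama integration; noise=0.0 reduces to the deterministic Euler scheme."""
    rng = random.Random(seed)
    y = list(y0)
    n_steps = int(round(t_end / dt))
    for i in range(n_steps):
        drift = f(y, theta, i * dt)
        # Additive noise of constant amplitude stands in for xi(Y, t);
        # abundances are clipped at zero to keep them physical.
        y = [max(0.0, yi + di * dt + noise * math.sqrt(dt) * rng.gauss(0.0, 1.0))
             for yi, di in zip(y, drift)]
    return y

theta = (2.0, 0.5, 1.0, 0.1)  # hypothetical values, not measured rates
m_end, p_end = simulate([0.0, 0.0], theta, t_end=200.0)
m_noisy, p_noisy = simulate([0.0, 0.0], theta, t_end=200.0, noise=0.5)
print(m_end, p_end)  # approaches the steady state (k_m/g_m, k_m*k_p/(g_m*g_p))
```

A fixed-step scheme is used here only for transparency; stiff or large systems would in practice call for adaptive or dedicated stochastic solvers.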

The structure of the model—in terms of the mathematical representation of the function *f*(*Y*; *θ*, *t*) which describes the system components and relationships—is defined according to our current knowledge, perhaps in combination with data-driven *network inference* techniques that aim to learn the likely structure of a system from observations of its variables [40–46]. However, we also need to obtain suitable estimates for the parameters, either from experimentally determined values or by using statistical approaches to estimate (or infer) these values by fitting model simulations to observed data.

### 3.1. Experimental estimates

The authors of the first WCM [31], and others [47], stress the need to only use experimentally measured parameters in biological models. This may appear a rigorous way to ensure that a model is properly calibrated against the available information, and that free (wiggle) parameters are avoided. There are, however, a number of pitfalls to such a strategy [48,49].

First, the mathematical models that we are considering are abstractions of much more complex processes (even within WCMs that attempt to incorporate functions of all known genes); therefore, the meaning of a parameter needs to be carefully evaluated in each case. For example, Michaelis–Menten kinetics are frequently used to model enzymatic reactions, but these model assumptions may be far removed from the true biophysical processes occurring inside a crowded cellular environment, and may grossly simplify the complex catalytic regulation that occurs in the real system. The model parameters may therefore not really reflect the biophysical constants that are experimentally accessible using *in vitro* or *in vivo* assays.

Second, biochemical reaction rates depend on numerous environmental (e.g. temperature, acidity, ionic strengths) and cellular (e.g. viscosity, allosteric regulation) factors. While we can aim to design experiments so that our measurements are as relevant as possible [50–53], few biological parameters can be measured precisely in their appropriate *in vivo* context. We are often forced to resort to parameter estimates obtained from *in vitro* assays or from other (related) species, but there are good reasons to be wary of such estimates: the thermodynamic and ecological differences can lead to pronounced differences between, for example, catalytic rates; similarly, differences in the architecture of the cell membrane and embedded transporters and receptors can affect transport as well as cell–environment interactions.

Third, any modelling using fixed parameters ought to be viewed with a healthy dose of scepticism: uncertainty and noise pervade all of cell biology, and failure to account for this appropriately can compromise analyses and further uses of the resulting model [48]. Such shortcomings may not necessarily be detectable in validation experiments—particularly when dealing with such complex models, numerous parameter combinations may provide a reasonable match to the data [54], yet the complexity renders these models intractable to existing approaches designed to identify these situations.

Finally, not all relevant parameters may be known (even when we include data from related species) and, in some cases, may not be experimentally accessible. A key advantage of modelling is that it enables us to explore the influence of processes that we cannot directly observe, by linking these processes in a mathematical framework to variables we can probe experimentally. If we restrict our models to only include parameters that we can estimate experimentally, we surely risk biasing our models (and thus conclusions) according to our current experimental limitations.

Ideally, we should make use of *in vivo* data from the target organism of interest where available to help estimate parameters in the correct cellular context. However, we should also exploit the diverse array of techniques available that allow us to infer the most probable parameter values from the observed behaviour of a system.

### 3.2. Statistical inference

Here by *inference* we mean the use of sound statistical methods to learn the parameters and, where possible, include an explicit assessment of the associated uncertainty. The *likelihood* is a central quantity for such methods [55]; it is defined as the probability of observing the data, $\mathcal{D}$, given a parameter, *θ*,

$$L(\theta) = \Pr(\mathcal{D} \mid \theta) = \prod_{d \in \mathcal{D}} \Pr(d \mid \theta). \tag{3.2}$$

Here Pr(*d* | *θ*) is the probability of an experimental observation, *d*, given the mathematical model *f*(*Y*; *θ*, *t*), for a given value of *θ*; the product form assumes that the observations are independent given *θ*. Note that *θ* will for all interesting problems, including WCMs, be a vector containing all the model parameters.

The *maximum-likelihood estimate* (MLE) of *θ*, denoted by $\hat{\theta}$, is obtained by varying *θ* until the likelihood becomes maximal,

$$\hat{\theta} = \underset{\theta}{\arg\max}\; L(\theta), \tag{3.3}$$

and it is the best estimate of *θ* given the data, $\mathcal{D}$, and the assumed model, *f*(*Y*; *θ*, *t*). Frequentist inference approaches aim to identify the MLE of *θ*; for most non-trivial biological models, we expect the likelihood surface (the likelihood function evaluated over the parameter space) to be complex and multi-modal in nature and we thus rely on numerical optimization algorithms rather than analytical approaches to find the maximum. *Local* optimization algorithms risk identifying local maxima, so there is strong reason to prefer *global* optimization approaches that aim to explore the parameter space more broadly [56,57]. For sufficiently simple models (those where the likelihood function is concave around a single maximum), even global uncertainty statements can be made. Experience suggests that this latter case is the exception rather than the rule for dynamical systems in cell and molecular biology [54,58]. Some approaches, such as profile-likelihood methods, try to assess the uncertainty for each parameter [59,60], which may hold particular appeal for those interested in inferring specific parameters with accuracy.
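The contrast between local and global optimization can be illustrated with a small sketch. It assumes a one-parameter oscillatory model with Gaussian measurement noise, whose likelihood surface has several local maxima, and uses a hand-rolled derivative-free local search restarted from a grid of starting points (a simple multi-start strategy, which is only one of many global approaches). All names and values (`TRUE_THETA`, `SIGMA`, `local_minimise`, the search bounds) are hypothetical.

```python
import math
import random

TRUE_THETA = 1.3   # hypothetical 'true' parameter used to generate synthetic data
SIGMA = 0.1        # assumed known measurement noise (standard deviation)

rng = random.Random(1)
times = [0.5 * i for i in range(20)]
data = [math.sin(TRUE_THETA * t) + rng.gauss(0.0, SIGMA) for t in times]

def neg_log_likelihood(theta):
    """Negative Gaussian log-likelihood, up to an additive constant."""
    return sum((d - math.sin(theta * t)) ** 2 for d, t in zip(data, times)) / (2 * SIGMA ** 2)

def local_minimise(cost, x0, step=0.1, tol=1e-6, lo=0.0, hi=3.2, max_iter=10_000):
    """Derivative-free local search: move downhill, halve the step when stuck."""
    x, fx = x0, cost(x0)
    for _ in range(max_iter):
        if step <= tol:
            break
        for cand in (x - step, x + step):
            if not lo <= cand <= hi:
                continue                  # stay inside the search bounds
            fc = cost(cand)
            if fc < fx:
                x, fx = cand, fc
                break
        else:
            step /= 2.0                   # no downhill neighbour: refine the step
    return x, fx

# Multi-start: run the local search from a grid of starting points and keep the best.
starts = [0.1 + 0.1 * k for k in range(30)]
results = [local_minimise(neg_log_likelihood, s) for s in starts]
theta_hat, best_cost = min(results, key=lambda r: r[1])
n_distinct = len({round(x, 2) for x, _ in results})  # number of distinct end points found
print(theta_hat, n_distinct)
```

Each individual run only guarantees a local optimum; it is the comparison across starting points that gives some protection against being trapped, at a cost that grows quickly with the number of parameters.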

The likelihood is also used in *Bayesian inference*, where it is combined with the *prior*, *π*(*θ*) (a probability distribution over the parameter space that reflects the level of existing knowledge, or lack thereof), to arrive at the posterior distribution given the data, $\mathcal{D}$,

$$\Pr(\theta \mid \mathcal{D}) = \frac{\Pr(\mathcal{D} \mid \theta)\, \pi(\theta)}{\int_{\Omega_{\theta}} \Pr(\mathcal{D} \mid \theta')\, \pi(\theta')\, \mathrm{d}\theta'}. \tag{3.4}$$

Here, instead of providing a single estimate (plus potentially associated confidence intervals), we specify the probability distribution over the whole potential parameter space considered, *Ω*_{θ}. We can, if preferred, also choose to report a single point estimate for pragmatic reasons, but typically we find it preferable to consider the whole posterior distribution (at least conceptually).

The Bayesian framework offers considerable interpretational advantages over the traditional likelihood approach (reviewed elsewhere [61]), but comes, in its full form, with a computational burden that can prove prohibitive. Generally, for any half-way realistic model the denominator in equation (3.4) will be hard to evaluate (hence the need for Markov chain Monte Carlo and related methods in Bayesian inference). For many scientifically interesting problems, even evaluation of the likelihood is computationally unfeasible and a range of methods, including approximate Bayesian computation and other likelihood-free inference methods have risen to prominence [62–65], which extend the applicability of the Bayesian framework to problems that have computationally intractable likelihood functions.
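The idea behind likelihood-free inference can be sketched with the simplest member of the family, an approximate Bayesian computation (ABC) rejection sampler. The example assumes a toy simulator (exponential waiting times between reaction events), a uniform prior, the sample mean as summary statistic and a fixed tolerance; these choices, and the names `simulate_waiting_times` and `abc_rejection`, are illustrative assumptions only.

```python
import random
import statistics

rng = random.Random(42)

def simulate_waiting_times(rate, n, rng):
    """Toy stochastic simulator: n exponential waiting times between reaction events."""
    return [rng.expovariate(rate) for _ in range(n)]

def summary(sample):
    """Summary statistic used to compare simulated and observed data."""
    return statistics.fmean(sample)

# Synthetic 'observations' generated with a hypothetical true rate of 2.0.
observed = simulate_waiting_times(2.0, 200, rng)
s_obs = summary(observed)

def abc_rejection(n_draws, epsilon, prior=(0.1, 5.0)):
    """Keep prior draws whose simulated summary lies within epsilon of the observed one."""
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(*prior)                       # draw from a uniform prior
        s_sim = summary(simulate_waiting_times(theta, len(observed), rng))
        if abs(s_sim - s_obs) < epsilon:                  # likelihood-free comparison
            accepted.append(theta)
    return accepted

posterior_sample = abc_rejection(n_draws=2000, epsilon=0.05)
post_mean = statistics.fmean(posterior_sample)
post_sd = statistics.stdev(posterior_sample)
print(len(posterior_sample), post_mean, post_sd)
```

The likelihood is never evaluated: only forward simulations are needed, which is what makes the approach attractive for intractable models, although the number of simulations required rises steeply with the number of parameters.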

### 3.3. Application to large-scale, hybrid models

For WCMs (and other large-scale, hybrid models), we will require further advances and improvements in both experimental techniques, and computational and statistical methods in order to ensure that these models are built on solid foundations.

Despite the wealth of ‘omics’ level data available for the most well-studied organisms, e.g. *E. coli* and *S. cerevisiae*, we still lack the comprehensive *in vivo* measurements needed to allow us to obtain the relevant parameter estimates in a systematic and automated way. The original *M. genitalium* WCM relied on the authors painstakingly compiling information from over 900 publications in order to parametrize the model (despite purposefully choosing a fairly small organism for their proof-of-concept study) [31]. Making the development of such complex biological models more mainstream will require community-wide efforts to establish accepted tools and standards for collating and annotating data from heterogeneous sources [66–68] as well as improved means of harvesting existing literature and data sources [23,69]. New experiments can expand the coverage of existing datasets and ensure consistency in terms of experimental conditions [70], as well as providing us with a better understanding of the relationships between *in vivo* and *in vitro* parameter estimates (these may differ by several orders of magnitude, thus *in vitro* estimates can be misleading) [71].

At present, none of the statistical inference methods outlined above are applicable at the scale of WCMs. However, smaller subsystems, such as individual pathways, regulatory motifs, receptor complexes or systems comprising small sets of metabolic reactions and the associated regulatory processes can be effectively parametrized using such methods [72]. For such systems, we can often estimate parameters, including uncertainty; and we are frequently able to assess parameter sensitivity (typically measured as the change in some model output, e.g. predicted protein abundance, in response to varying a single parameter). In some cases, experimental measurements of species concentrations may allow us to effectively decompose our models into smaller modules for efficient parameter estimation [73]. Bayesian inference methods in particular are limited in terms of scale and are generally only feasible for models with up to tens to hundreds of species and parameters [74,75]. Some optimization approaches are much more scalable though, with recent advances allowing parametrization of ODE models comprising hundreds to thousands of species and parameters [65,76,77]. As always, however, the chance of being trapped in local optima is high for such large-dimensional problems.

A combination of both inference and experimental estimation will probably be needed to parametrize complex biological models. It is currently impractical to use inference techniques within the context of a full WCM, unless considering very small pre-defined subsets of the parameters and, even then, the computational costs are enormous [67]. We can, however, make use of scalable inference techniques [78] to help us parametrize the component submodels, using experimental information where available as prior knowledge for the inference procedures. This will allow us to avoid some of the potential pitfalls outlined above of experimental estimates, and generate parameter estimates that take into account—to the best of our ability—the influences of cellular and system context, and make use of the most appropriate *in vivo* datasets. Crucially, rigorous statistical inference also enables us to explore the relationships between model parameters and start to understand and quantify the uncertainties inherent to any mathematical model.

## 4. Model and parameter uncertainty

There are uncertainties in both the structure and parameters associated with any mathematical model of a biological system. We often do not know the exact components and interactions that make up a given subcellular system (such as signalling or metabolic pathways) and, in particular, the crosstalk that occurs between such pathways. We necessarily use abstract and simplified mathematical representations of the dynamics, and frequently rely on phenomenological models, rather than modelling the fundamental physical and chemical processes that occur in the cell. For example, when modelling enzymatic reactions or gene regulation at a large scale, we often rely on Michaelis–Menten kinetics—even when the assumptions behind this modelling formalism do not hold (e.g. assumptions of irreversibility and time-scale separations)—or Hill kinetics, the latter of which has no established mechanistic interpretation [79,80]. Even when we attempt to include molecular details of the complete system (such as in a WCM) we are still forced to ignore many of the true complexities of the processes occurring, e.g. post-translational modifications or complex regulation of enzymatic reactions, as it is simply infeasible to represent these in such a large model.

### 4.1. Structural uncertainty

The structural and mechanistic assumptions inherent to our chosen model will influence the conclusions and predictions we draw. While uncertainties in parameter values are generally acknowledged and explored to some extent, structural uncertainty—the *inherent* ambiguity as to the ‘correct’ (least wrong) structure of the mathematical model—is often overlooked. However, there are methods we can use to explore how our choices about model definition—in terms of the system components we include, and the way we represent these mathematically—may be influencing our conclusions.

*Model selection* methods enable us to compare several proposed models (which correspond to our different hypotheses about a system) and determine which are best supported by the available data. Depending on the nature of the set of models, there is a range of methods available to rank our models within both frequentist and Bayesian inference frameworks, e.g. likelihood ratio tests, Akaike's (or other) information criterion, Bayes factors, or estimation of the marginal likelihood (the denominator in equation (3.4)) [81]. We can either choose to select the best-ranked model for our analyses, or use *model averaging* techniques to generate conclusions from a pool of models, with their relative contributions weighted according to how well they fit the observed data [82,83].
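As a small illustration of ranking models with an information criterion, the sketch below compares a constant and a straight-line model using Akaike's information criterion, AIC = 2*k* − 2 ln *L*, and converts the scores into Akaike weights of the kind that can be used for model averaging. The data, the Gaussian error assumption and the two candidate models are hypothetical and chosen only to keep the arithmetic transparent.

```python
import math

x = [float(i) for i in range(10)]
noise = [0.1, -0.2, 0.05, 0.15, -0.1, 0.0, 0.2, -0.15, 0.1, -0.05]  # fixed, illustrative
y = [1.0 + 0.5 * xi + e for xi, e in zip(x, noise)]
n = len(x)

def gaussian_aic(rss, n_params):
    """AIC = 2k - 2 ln L at the MLE, for Gaussian errors with fitted variance."""
    sigma2 = rss / n                                   # MLE of the error variance
    log_lik = -0.5 * n * (math.log(2 * math.pi * sigma2) + 1.0)
    return 2 * (n_params + 1) - 2 * log_lik            # +1 counts the variance

# Model A: constant level (1 parameter), fitted by least squares.
mean_y = sum(y) / n
rss_const = sum((yi - mean_y) ** 2 for yi in y)

# Model B: straight line (2 parameters), fitted by least squares.
mean_x = sum(x) / n
slope = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
         / sum((xi - mean_x) ** 2 for xi in x))
intercept = mean_y - slope * mean_x
rss_line = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))

aic = {"constant": gaussian_aic(rss_const, 1), "line": gaussian_aic(rss_line, 2)}

# Akaike weights: relative support for each model, usable for model averaging.
best = min(aic.values())
raw = {name: math.exp(-0.5 * (a - best)) for name, a in aic.items()}
weights = {name: r / sum(raw.values()) for name, r in raw.items()}
print(aic, weights)
```

The extra parameter of the line model is penalized, but here the improvement in fit outweighs the penalty; with noisier or fewer data the ranking could be much less clear-cut.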

Increasingly, there are techniques that enable us to consider a collection of good models when making predictions or drawing conclusions. Particularly when working with data-driven models (i.e. inferred network structures consistent with the data), we can consider the model structure within a probabilistic framework, rather than assuming a fixed model structure before making predictions [84,85]. We can explore the *robustness* of our model predictions—whether the conclusions we draw are consistent across a set of good models, or whether they rely on one specific set of model assumptions [86,87]; in the latter case, we may want to be wary of such conclusions if we cannot be confident in the validity of those assumptions. *Ensemble modelling* approaches—which analyse the behaviour of a population of distinct models—have been central to the success of model predictions in other fields, e.g. climate forecasting, by allowing us to understand and quantify the uncertainties in our conclusions [88]; such methods are starting to be applied to biological models [42,89–91]. In situations where we cannot be sure of the best choice of model (even when using model selection techniques) and many scenarios are consistent with the available data, it is crucial to understand how much our assumptions may be influencing our results.

Although such techniques are not currently feasible to apply to WCMs, we should make sure that we consider these issues when constructing the constituent submodels, and when deciding what components we should include in our system, and how we represent these mathematically. It should be clear, however, that any WCM, no matter how carefully it has been constructed, will be subject to considerable structural uncertainty. That also means that there will be a potentially large number of model modifications and alternative models that are equally capable of describing the available data and that make essentially indistinguishable predictions about the system behaviour in many cases.

### 4.2. Parameter uncertainty

Regardless of how we estimate our parameters—whether through experimental determination or statistical inference—there will be some degree of uncertainty in the resulting values. In all cases, it is important to quantify the level of this uncertainty and, ideally, consider the relationships between model parameters, as well as determining to what extent these uncertainties influence our conclusions.

For experimental estimates, we face the difficulties discussed earlier in terms of how to best approximate the true cellular environment when carrying out measurements; there will be uncertainties in the methods we use for quantification; and of course many sources of environmental and cellular heterogeneity, only some of which we can control. With inferred parameter estimates, we again rely on observed experimental data (with their associated uncertainties) but also need to consider the limitations of the specific inference method we use. Particularly as models get larger, we are also likely to face issues around identifiability [42,77,92]. *Structural identifiability* is a property of the model structure and considers whether this allows us to uniquely determine the parameter values from system observations (assuming ideal conditions and data). This is a prerequisite for *practical identifiability*, which is dependent on the data available and whether these are sufficient to allow parameter determination (i.e. this reflects the information content of the data). Assessing these properties can help us modify our model structure and/or experimental design to deal with lack of identifiability [50,93].
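A deliberately simple sketch of a structurally non-identifiable model: if two rate parameters only ever enter the dynamics as a product, no amount of ideal data can separate them. The decay model, the parameter names `theta1` and `theta2` and the candidate values are hypothetical.

```python
import math

def model_output(theta1, theta2, t, y0=10.0):
    """Solution of dY/dt = -theta1 * theta2 * Y: only the product enters."""
    return y0 * math.exp(-theta1 * theta2 * t)

times = [0.0, 0.5, 1.0, 2.0, 4.0]
candidates = [(1.0, 2.0), (2.0, 1.0), (0.5, 4.0)]   # hypothetical values with equal product
trajectories = [[model_output(a, b, t) for t in times] for a, b in candidates]

# Largest difference between any trajectory and the first one.
max_gap = max(abs(u - v) for traj in trajectories for u, v in zip(traj, trajectories[0]))
print(max_gap)
```

All three parameter pairs give the same trajectory, so only the product is identifiable; reparametrizing the model in terms of that product, or measuring one of the two rates independently, would resolve the ambiguity.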

For models parametrized with point estimates (single values for each model parameter), we can use sensitivity analyses to explore the impact of uncertainty in those estimated values. *Sensitivity analysis* determines how uncertainties in a model's inputs (e.g. parameter values or initial conditions) contribute to uncertainty in the output of the model (e.g. simulated dynamics) [58,94,95]. To perform a parametric sensitivity analysis, we perturb a single parameter—or better, combinations of parameters—and test how much this affects our model output, e.g. simulations of system dynamics. This allows us to quantify (to some extent) how the uncertainties and potential errors in our parameter estimates might propagate through our modelling analysis. Ideally, if we have assigned confidence intervals to our parameters (e.g. by estimating the potential magnitude of errors in our experimental measurements, or inferring confidence intervals along with point estimates in a frequentist framework) we can test the influence of perturbations of these magnitudes. The simplest methods—perturbing each parameter in turn—ignore potential dependencies between parameters, yet these may be strongly correlated, so ideally we should explore the multi-dimensional parameter space in more detail using global rather than local sensitivity analyses. However, this is of course computationally very demanding, particularly as models increase in size and complexity, and stochasticity becomes important.
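The difference between perturbing parameters one at a time and in combination can be sketched as follows. The example assumes a toy model output (the steady-state protein level of a simple gene expression model) and hypothetical point estimates; `relative_sensitivities` computes normalized local sensitivities by central finite differences.

```python
def steady_state_protein(theta):
    """Steady-state protein level of a toy gene expression model (illustrative output)."""
    k_m, g_m, k_p, g_p = theta
    return k_m * k_p / (g_m * g_p)

def relative_sensitivities(output, theta, rel_step=1e-4):
    """Central-difference estimate of d ln(output) / d ln(theta_i) for each parameter."""
    base = output(theta)
    sens = []
    for i, value in enumerate(theta):
        up = list(theta)
        down = list(theta)
        up[i] = value * (1 + rel_step)
        down[i] = value * (1 - rel_step)
        sens.append((output(up) - output(down)) / (2 * rel_step * base))
    return sens

theta = [2.0, 0.5, 1.0, 0.1]   # hypothetical point estimates
local = relative_sensitivities(steady_state_protein, theta)

# Joint perturbation: raising k_m and g_m together by 20% leaves the output unchanged,
# which one-at-a-time perturbations cannot reveal.
joint = [theta[0] * 1.2, theta[1] * 1.2, theta[2], theta[3]]
joint_change = steady_state_protein(joint) / steady_state_protein(theta) - 1.0
print(local, joint_change)
```

Each parameter on its own appears equally influential (relative sensitivities close to +1 or −1), yet a coordinated change in two of them has essentially no effect on the output, which is the kind of dependency that only multi-parameter or global analyses expose.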

Bayesian inference methods provide us with far more information as they allow us to infer the full, joint posterior probability distribution for the model parameters. This not only gives us details about the uncertainties in single parameters (from the shapes of the marginal probability distributions), but also allows us to see the full dependencies between different model parameters—allowing us to, for example, detect groups of parameters that can vary in coordinated ways while still providing a good match between model simulations and observed data (figure 2). The joint posterior distribution therefore provides a comprehensive assessment of the uncertainties in our inferred parameter values (albeit these are, of course, like any parameter estimates dependent on the structural assumptions made in our model). To understand how these uncertainties influence our conclusions, we can sample probable point estimates from the posterior distribution and compare the results obtained from our model using these different combinations of parameters—i.e. generate *posterior predictive distributions*.
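A minimal sketch of propagating posterior samples through a model is given below. Because no real inference is run here, the joint posterior is replaced by a hypothetical stand-in in which the ratio of two rate parameters is tightly constrained while each parameter individually is not; the model output, the sampling distributions and the helper `cv` are assumptions made for illustration.

```python
import random
import statistics

rng = random.Random(7)

def steady_state(k, g):
    """Model output: steady-state level of a species made at rate k and degraded at rate g."""
    return k / g

# Stand-in for a joint posterior sample (hypothetical, not obtained from real inference):
# the data constrain the ratio k/g tightly, while g alone is poorly determined.
posterior = []
for _ in range(5000):
    g = rng.uniform(0.1, 1.0)
    ratio = rng.gauss(4.0, 0.1)
    posterior.append((ratio * g, g))

k_samples = [k for k, _ in posterior]
predictions = [steady_state(k, g) for k, g in posterior]

def cv(values):
    """Coefficient of variation: spread relative to the mean."""
    return statistics.stdev(values) / statistics.fmean(values)

print(cv(k_samples), cv(predictions))
```

The marginal spread of `k` is large, but the predicted output is narrowly distributed, because the samples preserve the dependency between the two parameters; sampling each parameter independently from its marginal would destroy this structure and inflate the predictive uncertainty.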

The *M. genitalium* WCM was parametrized using experimentally derived point estimates [31]. These are of course subject to uncertainties to various degrees—and, during model refinement, several parameters needed to be updated in order to reduce discrepancies between model simulations and data observations. However, with such a complex model, we expect there to be strong dependencies between model parameters, so the refined estimates will always be conditional on the fixed values assumed for all other model parameters (which are of course also subject to uncertainty). Modern sensitivity and robustness analysis methods (e.g. [96,97]) may come to the rescue here and allow us to assess and mitigate parametric uncertainty even for models with many parameters.

Bayesian methods, while providing the richest information, are generally feasible only for relatively small-scale models (e.g. tens to hundreds of species and parameters) because of their computational demands. Methods to assess the identifiability of models and to characterize the impact of parameter uncertainties tend to be limited to models of similar size, although some more recent developments can deal with somewhat larger models of a few hundred parameters [92,98]. Such methods are currently not compatible with the size and complex hybrid nature of WCMs. However, using these approaches to quantify parametric uncertainty in smaller scale constituent submodels, and assessing the impact that this has on our model conclusions, will allow us to be more confident in the quality of the WCM components.

Overall, a host of recent analyses have shown that from an estimation/inverse problem perspective it is important to focus on the joint distributions over parameters, which cannot be fully understood by looking at the uncertainties associated with individual parameters. If we have two parameters, *θ*_{1} and *θ*_{2}, with high uncertainties (whether this is expressed by broad marginal posteriors, flat likelihoods, flat profile-likelihoods, or some other flat cost function), once one of them, e.g. *θ*_{1}, is known we may already have a very good idea as to what the value of *θ*_{2} is going to be. Such conditional certainty in the presence of otherwise considerable (marginal) uncertainty appears to be a hallmark of many dynamical, including stochastic, systems [96,99].
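As an illustrative special case (an assumption made here for concreteness, not a result taken from the cited analyses), suppose the joint posterior over *θ*_{1} and *θ*_{2} is bivariate Gaussian with means *μ*_{1}, *μ*_{2}, marginal standard deviations *σ*_{1}, *σ*_{2} and correlation *ρ*. Conditioning on *θ*_{1} then gives

$$
\theta_2 \mid \theta_1 \;\sim\; \mathcal{N}\!\left(\mu_2 + \rho\,\frac{\sigma_2}{\sigma_1}\,(\theta_1 - \mu_1),\;\; \sigma_2^{2}\,\bigl(1 - \rho^{2}\bigr)\right).
$$

The conditional standard deviation is *σ*_{2}√(1 − *ρ*²), which no longer depends on how broad the marginal of *θ*_{1} is. For *ρ* = 0.99, for instance, it is approximately 0.14 *σ*_{2}: once *θ*_{1} is known, the remaining uncertainty in *θ*_{2} is roughly one-seventh of its marginal uncertainty.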

In this context, the notion of *sloppy models* has gained some prominence, and some notoriety [100]. But statistical inference provides a natural framework in which such issues can be resolved straightforwardly: sloppiness and identifiability problems notwithstanding, the sets of parameters that affect system behaviour profoundly will be inferred relatively easily [101,102]. By contrast, parameters that are hard to infer exert less influence on the system's dynamics. A crucial question in this context is how many parameters fall into each of the two categories, inferable and non-inferable. For small systems (up to 60 parameters), typically one-third of the parameters are inferable using conventional likelihood criteria, and they suffice to understand and model the system dynamics [58]. Exploring such high-dimensional and complex posteriors is challenging, especially when visual inspection becomes infeasible or at least problematic. Principal component analysis on posterior samples [63] and the Fisher information matrix [96,99] (which quantifies how much information the data carry about the parameters, and hence the uncertainty in parameter estimates) offer potential routes, e.g. to identify those parameters that exert the greatest influence on system dynamics.
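The use of the Fisher information matrix (FIM) to separate well-determined from poorly determined parameter combinations can be sketched on a two-parameter toy model. The sketch assumes independent Gaussian measurement noise and works in log-parameters; the model, the measurement times, the noise level and the function names are hypothetical choices for illustration only.

```python
import math

def simulate(log_k, log_d, t):
    """Toy model m(t) for dm/dt = k - d*m, m(0) = 0, in log-parameters."""
    k, d = math.exp(log_k), math.exp(log_d)
    return (k / d) * (1.0 - math.exp(-d * t))

def fisher_information(log_k, log_d, times, sigma=1.0, h=1e-4):
    """2x2 FIM for independent Gaussian noise (std sigma) on each
    measurement; sensitivities come from central finite differences."""
    a = b = c = 0.0
    for t in times:
        s_k = (simulate(log_k + h, log_d, t) - simulate(log_k - h, log_d, t)) / (2 * h)
        s_d = (simulate(log_k, log_d + h, t) - simulate(log_k, log_d - h, t)) / (2 * h)
        a += s_k * s_k / sigma**2
        b += s_k * s_d / sigma**2
        c += s_d * s_d / sigma**2
    return a, b, c  # entries of the symmetric matrix [[a, b], [b, c]]

def eigen_2x2(a, b, c):
    """Eigenvalues (largest first) and the unit eigenvector of the largest."""
    mean = (a + c) / 2.0
    radius = math.hypot((a - c) / 2.0, b)
    lam_max, lam_min = mean + radius, mean - radius
    if b != 0.0:
        vx, vy = b, lam_max - a
    else:
        vx, vy = (1.0, 0.0) if a >= c else (0.0, 1.0)
    norm = math.hypot(vx, vy)
    return lam_max, lam_min, (vx / norm, vy / norm)

log_k, log_d = math.log(10.0), math.log(0.5)  # hypothetical point estimates
late_only = eigen_2x2(*fisher_information(log_k, log_d, [20.0, 30.0, 40.0]))
mixed = eigen_2x2(*fisher_information(log_k, log_d, [0.5, 1.0, 2.0, 4.0, 20.0]))
```

With measurements taken only near steady state, the FIM is almost singular. The leading eigenvector points along the difference of the log-parameters (the well-determined ratio `k/d`), while the orthogonal direction is barely constrained. Including early time points raises the smaller eigenvalue relative to the larger one. For realistic models the same eigen-decomposition would be carried out numerically on a much larger matrix.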

Two caveats are in order at this stage: (i) stitching different smaller and well-parametrized models together to form a single integrated model is likely to introduce complicated correlation structures among the set of parameters in the model which deserve closer attention and (ii) the parameters of an incorrect model may be inferred with relative precision. It is therefore important to consider both structural and parametric uncertainty in every biological modelling study. Scaling such approaches up to WCMs will be a technical challenge and require the development of suitable approximations, or potentially emulation or surrogate modelling approaches [28,103,104].

## 5. Model improvement and validation

Mathematical models represent our best current understanding and representation of the true biological systems. We should aim to continually refine and improve them as new data become available and we gain knowledge about the underlying systems. Model selection methods, outlined above, allow us to propose several alternative mechanistic representations and select those that are best supported by the available experimental data [81,82]. There are also *experimental design* approaches that aim to identify the most informative experiments to perform in order to improve our models and distinguish between different hypotheses or reduce uncertainty in parameter estimates (figure 3) [50–53,105]. Iterative cycles of model prediction, experimental data collection and model refinement enable us to gradually improve models—using well-targeted experiments—and gain mechanistic insight into a given system [53,75,93].
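One common way to formalize "most informative experiment" is D-optimality: choose the design that maximizes the determinant of the Fisher information matrix. The sketch below applies this criterion to a toy model to pick one additional measurement time. It is only an illustration under stated assumptions (Gaussian noise, current point estimates treated as correct, hypothetical function names and values) and is not necessarily the criterion used in the cited works.

```python
import math

def log_sensitivities(k, d, t):
    """Analytic sensitivities of m(t) = (k/d)(1 - exp(-d t)) with respect
    to log k and log d."""
    m = (k / d) * (1.0 - math.exp(-d * t))
    return m, -m + k * t * math.exp(-d * t)

def fim_determinant(k, d, times, sigma=1.0):
    """Determinant of the 2x2 Fisher information matrix for independent
    Gaussian noise with standard deviation sigma."""
    a = b = c = 0.0
    for t in times:
        s_k, s_d = log_sensitivities(k, d, t)
        a += s_k * s_k
        b += s_k * s_d
        c += s_d * s_d
    return (a * c - b * b) / sigma**4

def best_extra_time(k, d, existing_times, candidates):
    """Pick the candidate time point that maximizes det(FIM) when added."""
    return max(candidates,
               key=lambda t: fim_determinant(k, d, existing_times + [t]))

# Hypothetical existing design (steady-state measurements only) and
# hypothetical candidate times for one further measurement.
existing = [20.0, 30.0, 40.0]
candidates = [0.5, 1.0, 2.0, 4.0, 8.0, 16.0]
chosen = best_extra_time(10.0, 0.5, existing, candidates)
```

For these assumed values the selected candidate is `t = 2.0`, which equals `1/d`, the degradation timescale; a further late measurement adds almost nothing. Because the criterion is evaluated at the current parameter estimates, the design should be revisited as the estimates are updated, which matches the iterative cycle of prediction, data collection and refinement described above.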

Again, such methods are currently not extendable to the scale and complexity of WCMs, but could—and should—be used to rigorously test, improve and validate smaller subsections of the complete model. In fact, to refine some of the parameter estimates in the *M. genitalium* WCM, reduced versions of the model were constructed in order to make it feasible to apply numerical optimization techniques [31,68]. This WCM was validated by comparing model predictions to several independent experimental datasets, and subsequently used to predict the response of the bacterium to various perturbations (in the form of single-gene mutations) [31,33]. Comparing model predictions to experimental data from various mutant strains identified several discrepancies, which could then be explored in more detail to identify aspects of the model—in this case parameter values—that required updating (and were later shown to be consistent with new experimental measurements).

Despite these successes, the improvements and refinements made to the model so far are fairly ad hoc. For the development of large-scale, hybrid models to become more established and reliable, we need more systematic and automated ways to test and refine these models, particularly when attempting to extend WCMs to more complex organisms and cells [68,106,107]. At present, we cannot determine how much we should trust different aspects of such models, which parts of the model require improvement, or how uncertainties and errors in the model structure and parameters (which are, to some extent at least, inevitable) may be influencing any conclusions drawn from the model.

## 6. Conclusion and outlook

The first comprehensive WCM is an impressive demonstration of how to successfully integrate many large and diverse submodels, and heterogeneous experimental data, into a single cohesive modelling framework. It demonstrates the feasibility, and potential utility, of developing far more complex and intricate models than are currently widely used in systems, cell and molecular biology. However, we need to be aware of the limitations and uncertainties associated with such models, particularly given that many of our established techniques for developing, validating and refining mathematical models simply cannot be applied at these scales.

Extensive analyses of smaller scale models have repeatedly demonstrated that uncertainties in both model structure and parameters are prevalent. Often, numerous models will be able to fit the observed data, even when dealing with very small systems, yet the predictions and conclusions we would draw from these models can differ substantially. Methods for constructing, parametrizing, refining and quantifying uncertainties in models are steadily becoming more scalable. They still fall far short of being applicable at the scale of WCMs, but they can be used to rigorously analyse and test the constituent parts that are included in such models. This will not overcome our lack of knowledge about uncertainties within the complete model, but it can at least contribute to improving the quality of the component submodels and to assessing the validity of the assumptions underlying them.

We should not overlook the roles that models at different levels of abstraction and complexity can play in advancing our understanding of biological systems. Although WCMs attempt to represent all cellular components and processes, they still rely on simplified representations of the true processes, and are necessarily biased towards our current understanding. Of course, in some cases, we will need to consider the cellular context and the larger system in which a biological process is embedded in order to explain our observations. However, smaller models that are amenable to the diverse and powerful array of available modelling techniques are much better suited to provide us with detailed mechanistic insight. Using these tools, we can rigorously explore and compare potential hypotheses, quantify the uncertainties associated with our model inputs and outputs, and improve our understanding of complex biochemical processes and regulatory mechanisms occurring within a cell. Without a thorough assessment of our (un)certainty regarding their fundamental dynamical determinants, WCMs would risk representing little more than sophisticated databases that offer few computational advantages compared with models that are slightly less complex but more amenable to existing statistical and computational tools.

## Data accessibility

This article has no additional data.

## Authors' contributions

A.C.B. and M.P.H.S. contributed equally to conception and writing of this review.

## Competing interests

We declare we have no competing interests.

## Funding

A.C.B. is a BBSRC Future Leader Fellow.

## Acknowledgments

We thank the members of the Theoretical Systems Biology Group at Imperial College London for helpful discussions.

- Received March 30, 2017.
- Accepted June 22, 2017.

- © 2017 The Author(s)

Published by the Royal Society. All rights reserved.