## Abstract

The growth of microorganisms involves the conversion of nutrients in the environment into biomass, mostly proteins and other macromolecules. This conversion is accomplished by networks of biochemical reactions cutting across cellular functions, such as metabolism, gene expression, transport and signalling. Mathematical modelling is a powerful tool for gaining an understanding of the functioning of this large and complex system and the role played by individual constituents and mechanisms. This requires models of microbial growth that provide an integrated view of the reaction networks and bridge the scale from individual reactions to the growth of a population. In this review, we derive a general framework for the kinetic modelling of microbial growth from basic hypotheses about the underlying reaction systems. Moreover, we show that several families of approximate models presented in the literature, notably flux balance models and coarse-grained whole-cell models, can be derived with the help of additional simplifying hypotheses. This perspective clearly brings out how apparently quite different modelling approaches are related on a deeper level, and suggests directions for further research.

## 1. Introduction

Bacterial growth curves have exerted much fascination on microbiologists, as eloquently summarized by Frederick Neidhardt in his short commentary ‘Bacterial growth: constant obsession with d*N*/d*t*’ published almost 20 years ago [1]. When supplied with a defined mixture of salts, sugar, vitamins and trace elements, a population of bacterial cells contained in liquid medium is capable of growing and replicating at a constant rate in a highly reproducible manner. This observed regularity raises fundamental questions about the organization of the cellular processes converting nutrients into biomass.

Work in microbial physiology has resulted in quantitative measurements of a variety of variables related to the cellular processes underlying growth. These measurements have usually been carried out during steady-state exponential or balanced growth, that is a state in which all cellular components as well as the total volume of the population have the same constant doubling time, implying that the concentrations of the cellular components remain constant [2]. The measurements have enabled the formulation of empirical regularities, also called growth laws [3], relating the macromolecular composition of the cell to the growth rate [4,5]. A classical example is the linear relation between the growth rate and the fraction of ribosomal versus total protein, a proxy for the ribosome concentration, over a large range of growth rates [6–9]. The reported regularities between the growth rate and the macromolecular composition of the cell are empirical correlations and should not be mistaken as representing a causal determination of cellular composition by the growth rate [6,10]. In fact, it has been shown that, for certain combinations of media, the same growth rate of *E. coli* may correspond to different ribosome concentrations [6].

To unravel causal relations, it is necessary to go beyond correlations and consider the biochemical processes underlying microbial growth. These processes notably include the enzyme-catalysed transformation of substrates into precursor metabolites, the conversion of these precursors into macromolecules by the gene expression machinery, the replication of the cell when its macromolecular content has attained a critical mass and the regulatory mechanisms on different levels controlling these processes [11–14]. Moreover, for identifying causality, a dynamic perspective on microbial growth focusing on transitions between different states of balanced growth, and the time ordering of events during the transitions, is more informative than considering a population at steady state [10]. Whereas most measurements have been obtained under conditions of balanced growth, in which experiments are easier to control and reproduce, data on transitions from one state of balanced growth to another are also available in the literature (reviewed in [4]). One classical example is the measurement of the temporal order in which RNA, protein and DNA attain their new steady-state concentrations after a nutrient upshift [5,15]. Recent experimental technologies, allowing gene expression and metabolism to be monitored in real time, have opened new perspectives for studying the dynamics of bacterial growth on the molecular level [16,17].

The large and complex networks of biochemical reactions enabling microbial growth have been mapped in great detail over the past decades and, for some model organisms, much of this information is available in structured and curated databases [18,19]. While a huge amount of knowledge has thus accumulated, a clear understanding of the precise role played by individual constituents and mechanisms in the functioning of the system as a whole has remained elusive. For example, it is well known that in the enterobacterium *E. coli* the concentration of the second messenger cAMP increases when glycolytic fluxes decrease, leading to the activation of the pleiotropic transcription factor Crp. However, the precise role of this mechanism in the sequential utilization of different carbon sources by *E. coli* remains controversial [20,21].

Mathematical models have great potential for dissecting the functioning of biochemical reaction networks underlying microbial growth [22–24]. To be useful, they need to satisfy two criteria. First, they should not be restricted to subsystems of the cell, but provide an integrated view of the reaction networks, including transport of nutrients from the environment, metabolism and gene expression. In particular, they should account for the strong coupling between these functions: enzymes are necessary for the functioning of metabolism, while the metabolites thus produced are precursors for enzyme synthesis. In the words of Henrik Kacser, one of the pioneers of metabolic control analysis, ‘to understand the whole, you must look at the whole’ [25]. Second, models of microbial growth should be multilevel in the sense of expressing the growth of a population in terms of the functioning of the biochemical reaction networks inside the cells. Growth amounts to the accumulation of biomass, that is proteins, RNA, DNA, lipids and other cellular components produced in well-defined proportions from nutrients flowing into the cells. The two criteria amount to the requirement that models should capture the autocatalytic nature of microbial growth, the production of daughter cells from growth and division of mother cells.

Precursors of such integrated, multilevel models are the simple autocatalytic models of Hinshelwood, capable of displaying steady-state exponential growth and a variety of responses to perturbations reminiscent of the adaptive behaviour of bacteria [26]. Another early example is the coarse-grained model of a growing and dividing *E. coli* cell [27], which has evolved over the years into a model of a hypothetical bacterial cell with the minimal number of genes necessary for growing and dividing in an optimal environment [28]. In addition, we mention so-called cybernetic models describing growth of microbial cells on multiple substrates [29–31], and the E-CELL computer environment for whole-cell simulation [32]. In recent years, integrated, multilevel models of the cell have received renewed attention with the landmark achievement of a model describing all individual cellular constituents and reactions of the life cycle of the human pathogen *Mycoplasma genitalium* [33] and other genome-scale models of bacteria [34]. In addition, several coarse-grained models describing the relation between the macromolecular composition of microorganisms and their growth rate have been published [24,35–39].

At first sight, the above-mentioned models of microbial growth are quite diverse, in the sense that they have a different scope and granularity, make different simplifications, use different approaches to obtain predictions from the model structure and originate in different fields (microbiology, theoretical biology, biophysics and biotechnology). The aim of this review is, first, to show how a general framework for the kinetic modelling of microbial growth, including an analytical expression for the growth rate, can be mathematically derived from a few basic hypotheses. Second, we show how additional simplifying assumptions lead to approximate kinetic models that do not require the biochemical reaction networks to be specified in full. The resulting models exemplify two widespread modelling approaches, flux balance analysis (FBA) and coarse-grained whole-cell modelling. The discussion of the different hypotheses and assumptions, including those related to the measurement units employed, which are often not explicit and/or buried in the (older) literature, reveals how the models are related on a deeper level. This will be instrumental for identifying their respective strengths and weaknesses as well as for indicating new directions in the study of the biochemical reaction networks underlying microbial growth.

## 2. Growth of microbial populations

An obvious view on microbial growth starts by considering the individual cells in a growing population (figure 1*a*). We denote by *n*(*t*) the number of cells at time *t* (h). Individual cells in a temporal snapshot of the population have different sizes, as they are in different stages between birth and division. Moreover, cell sizes at birth and division are different [40–42]. As a consequence, the size of the cells in a population at time *t* is best described by a statistical distribution. This distribution may change over time and with the experimental conditions. For instance, in conditions supporting a higher growth rate, the average size of the cell in the population is larger [6,43]. Several models of the cell size distribution and its dependence on the experimental conditions have been proposed, based on different hypotheses about the criterion determining when a cell divides (reviewed in [42,44]). When the size distribution is known at every time *t*, the number of cells in a growing population can be directly used to estimate the volume of the population.

In what follows, however, we will adopt another point of view and ignore the individual cells making up a population. Instead, we directly quantify the growing population in terms of its expanding volume Vol (l) (figure 1*b*), that is, the sum of the volumes of the cells in the population. This aggregate description is appropriate when one is interested in concentrations of molecular constituents on the population level rather than in individual cells, as in the kinetic models developed below. Moreover, it corresponds to most data available in the experimental literature, obtained by pooling the contents of all cells in a (sample of the) population.

We model the growth of a population of microorganisms by means of a deterministic ordinary differential equation (ODE):
$$\frac{\mathrm{d}\,\mathrm{Vol}}{\mathrm{d}t} = \mu \cdot \mathrm{Vol}, \tag{2.1}$$

that is, the growth rate *μ* (h^{−1}) of the population is defined as the relative increase of the volume of the population. Both Vol and *μ* are functions of time *t* (h). For a constant steady-state growth rate *μ* = *μ**, we obtain the following explicit solution of equation (2.1): Vol(*t*) = Vol(0) · e^{μ* · t}, where Vol(0) represents the initial population volume. The doubling time of a population with a growth rate *μ** is given by *t*_{1/2} = ln2/*μ**. This is a direct consequence of the solution of equation (2.1), which stipulates that Vol(*t*_{1/2}) = 2Vol(0) = Vol(0) · e^{μ* · t_{1/2}}, and therefore ln2 = *μ** · *t*_{1/2}.
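The relation between growth rate and doubling time can be checked numerically. The following is a minimal sketch, assuming a constant growth rate; the function names and the value of `mu_star` are illustrative choices, not taken from the text.

```python
import math


def population_volume(vol0, mu, t):
    """Explicit solution Vol(t) = Vol(0) * exp(mu * t) for constant mu (h^-1)."""
    return vol0 * math.exp(mu * t)


def doubling_time(mu):
    """Doubling time t_1/2 = ln(2) / mu (h) for constant growth rate mu (h^-1)."""
    return math.log(2) / mu


mu_star = 0.7  # h^-1, illustrative value (assumption)
t_half = doubling_time(mu_star)

# After one doubling time the population volume should have doubled.
ratio = population_volume(1.0, mu_star, t_half) / population_volume(1.0, mu_star, 0.0)
print(f"doubling time: {t_half:.3f} h, volume ratio after t_half: {ratio:.3f}")
```

The check holds for any positive `mu_star`, since the ratio depends only on the product of growth rate and doubling time.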

The growth rate as defined by equation (2.1) is sometimes also called specific growth rate, in order to indicate that it concerns the increase in population volume per unit of population volume ($(1/\mathrm{Vol}) \cdot \mathrm{d}\mathrm{Vol}/\mathrm{d}t$), instead of the absolute increase in population volume ($\mathrm{d}\mathrm{Vol}/\mathrm{d}t$). In what follows, we will drop the qualifier ‘specific’. The growth rate definition of equation (2.1) should be distinguished from another definition of the growth rate as 1/*t*_{1/2}, that is, the number of doublings of the population volume per time unit. While the two definitions result in a quantity with the same unit, they do not mean the same thing and differ by a factor of ln2 [4]. Below, we use the growth rate definition of equation (2.1).

Models that do not distinguish individual cells but lump them into an aggregate volume have been called non-segregated as opposed to segregated models that do make this distinction [45–47]. If the population is composed of cells with the same growth rate, not much is lost by ignoring individual cells and using the population-level description of equation (2.1) (see the electronic supplementary material). There are situations, however, in which this assumption is not appropriate and in which essential features of the growth kinetics are shaped by the heterogeneity of the population [48–51]. For example, it was recently proposed that the lag observed in diauxic growth of *E. coli* on a glycolytic and gluconeogenic carbon source (e.g. glucose and acetate) is due to the responsive diversification of the population into two subpopulations upon the depletion of the (preferred) glycolytic carbon source and that only one of these subpopulations continues growth on the gluconeogenic carbon source [49]. Non-segregated models are obviously not suitable for describing such phenomena and models describing the dynamics of the distribution of individual cells in a population or of subpopulations need to be used instead.

## 3. Volume and macromolecular content of cells

The model of equation (2.1) is unstructured in the sense that it does not take into account the biochemical processes enabling cells to grow. By contrast, so-called structured models [45–47] explicitly describe molecular constituents of the cell and the biochemical reactions in which they are involved. Let *C*_{i} (g) be the (dry) mass of molecular constituent *i* contained in volume Vol (figure 1*c*). A common assumption supported by experimental data ([52] and references therein) is that the volume of the population is proportional to the biomass, that is, the total mass of the molecular constituents of the cells:
$$\mathrm{Vol} = \delta \cdot \sum_i C_i = \delta \cdot B, \tag{3.1}$$

with *B* (g) the biomass. Another way to frame the assumption is to say that the biomass density is constant. In other words,

$$\frac{B}{\mathrm{Vol}} = \frac{\sum_i C_i}{\mathrm{Vol}} = \frac{1}{\delta}, \tag{3.2}$$

where 1/*δ* (g l^{−1}) denotes the constant biomass density. For bacterial cells, the cytoplasmic biomass density has a value of about 300 g l^{−1} [53,54], meaning that 70% of the cell content is water. Macromolecules make up most of the biomass. For *E. coli*, Bremer & Dennis [6] conclude that the sum of protein, RNA and DNA accounts for between 65% and 73% of the total cellular dry mass, depending on the growth rate, whereas Basan *et al.* [55] report a stable proportion of approximately 90%. In all of these cases, protein constitutes the largest mass fraction.

Consistent with the decision above to consider the population as a non-segregated volume, we define the concentration *c*_{i} (g l^{−1}) of each molecular constituent *i* in a population as

$$c_i = \frac{C_i}{\mathrm{Vol}}. \tag{3.3}$$

If the cells all have the same concentration of constituent *i*, that is, if molecules are evenly distributed between the cells, then *c*_{i} also applies to the individual cells (see the electronic supplementary material). While this is a suitable approximation in many cases, there are also situations where variability of enzyme and metabolite concentrations occurs and may lead to a heterogeneous population of cells with different growth phenotypes [50,56].

An immediate consequence of the above definition is the following relation:

$$\sum_i c_i = \frac{\sum_i C_i}{\mathrm{Vol}} = \frac{1}{\delta}. \tag{3.4}$$

In words, the assumption of the proportionality of volume and biomass implies that the total concentration of molecular constituents in a growing cell population is constant. While this corresponds to measurements for balanced growth, not many data are available for growth transitions (but see [57]).

The dynamics of each molecular constituent *i* are modelled by means of an ODE, obtained from equations (2.1) and (3.3):
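$$\frac{\mathrm{d}c_i}{\mathrm{d}t} = \frac{1}{\mathrm{Vol}} \cdot \frac{\mathrm{d}C_i}{\mathrm{d}t} - \mu \cdot c_i. \tag{3.5}$$

Note that a dilution term due to the growth of the population appears in the equation describing the dynamics of *c*_{i}. As a consequence, if the mass of a specific molecular constituent *i* remains constant ($\mathrm{d}C_i/\mathrm{d}t = 0$), but the population continues to grow (*μ* > 0), its concentration decreases ($\mathrm{d}c_i/\mathrm{d}t < 0$), as intuitively expected.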
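The origin of the dilution term can be made explicit with a short derivation. The sketch below applies the quotient rule to the definition of the concentration as mass per population volume, and then substitutes the definition of the growth rate from equation (2.1); it uses only symbols already defined above.

```latex
\begin{align*}
\frac{\mathrm{d}c_i}{\mathrm{d}t}
  &= \frac{\mathrm{d}}{\mathrm{d}t}\left(\frac{C_i}{\mathrm{Vol}}\right)
   = \frac{1}{\mathrm{Vol}}\cdot\frac{\mathrm{d}C_i}{\mathrm{d}t}
     - \frac{C_i}{\mathrm{Vol}^2}\cdot\frac{\mathrm{d}\,\mathrm{Vol}}{\mathrm{d}t} \\
  &= \frac{1}{\mathrm{Vol}}\cdot\frac{\mathrm{d}C_i}{\mathrm{d}t}
     - \frac{C_i}{\mathrm{Vol}}\cdot
       \underbrace{\frac{1}{\mathrm{Vol}}\cdot\frac{\mathrm{d}\,\mathrm{Vol}}{\mathrm{d}t}}_{=\,\mu}
   = \frac{1}{\mathrm{Vol}}\cdot\frac{\mathrm{d}C_i}{\mathrm{d}t} - \mu\cdot c_i .
\end{align*}
```

The second term is present for every constituent, whatever its production rate, which is why growth alone lowers the concentration of a constituent that is no longer synthesized.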

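The growth rate itself is directly connected to the concentrations of the molecular constituents, because

$$\mu = \frac{1}{\mathrm{Vol}} \cdot \frac{\mathrm{d}\,\mathrm{Vol}}{\mathrm{d}t} = \frac{\delta}{\mathrm{Vol}} \cdot \sum_i \frac{\mathrm{d}C_i}{\mathrm{d}t}. \tag{3.6}$$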

Therefore, while it makes sense for a specific constituent *i* to dilute out when it is not produced, no growth dilution occurs if the mass of all molecular constituents remains constant ($\mathrm{d}C_i/\mathrm{d}t = 0$ for all *i*). In the latter case, it follows from equation (3.6) that the growth rate is 0 by definition.

It is increasingly realized that growth dilution may have important physiological consequences [52,58,59] and therefore cannot be neglected in mathematical models of cellular processes. In particular, the interaction of a synthetic circuit with the growth physiology of the cell, and the changes in the growth rate this entails, may have an unexpected nonlinear feedback on the dilution of transcription factors and thus on the functioning of the circuit. This was illustrated by a synthetic circuit in *E. coli* in which the alternative T7 RNA polymerase regulates itself and a fluorescent protein. Expression of the fluorescent protein causes a metabolic burden, impairing growth and thus growth dilution of T7 RNA polymerase. The resulting positive feedback was shown to lead to two different phenotypes: growth and growth arrest [59].

An important special case of microbial growth occurs when the growth rate and the concentrations of the individual molecular constituents are constant over time, that is, *μ* = *μ** and *c*_{i} = *c**_{i}, for all *i*. From *c*_{i} = *C*_{i}/Vol = *c**_{i} it follows that a doubling of the volume Vol of the population is accompanied by a doubling of the mass *C*_{i} of each molecular constituent, which explains why this situation of steady-state exponential growth is also referred to as balanced growth [2,60].

## 4. Biochemical reactions underlying microbial growth

The molecular constituents of the cell are continually produced and consumed by biochemical reactions. Many of these reactions are enzyme-catalysed, such as the metabolic reactions involved in the conversion of nutrients from the environment into building blocks for macromolecules (amino acids, nucleotides) and energy carriers (ATP, NADH). The building blocks and energy are consumed in large part by the transcription and translation reactions producing macromolecules. The metabolic reactions together form the metabolic network of the cell [14,61].

The term $(1/\mathrm{Vol}) \cdot \mathrm{d}C_i/\mathrm{d}t$ in equation (3.5) represents the net effect of the biochemical reactions on the concentration of molecular constituent *i*, separate from growth dilution. Usually, for intracellular reactions, the quantities of molecular constituents are expressed in molar rather than mass units. Hence, we introduce *X*_{i} = *C*_{i}/*α*_{i}, with *X*_{i} (mol) the molar quantity of constituent *i* and *α*_{i} (g mol^{−1}) the molar mass of *i*. The reason for this change in units is that kinetic models of biochemical reactions are based on the physical encounters of molecules in the cell [62,63], which is best expressed in terms of molar quantities. With this unit conversion, and *x*_{i} = *X*_{i}/Vol, equation (3.5) becomes

$$\frac{\mathrm{d}x_i}{\mathrm{d}t} = \frac{1}{\mathrm{Vol}} \cdot \frac{\mathrm{d}X_i}{\mathrm{d}t} - \mu \cdot x_i. \tag{4.1}$$

The term $(1/\mathrm{Vol}) \cdot \mathrm{d}X_i/\mathrm{d}t$ can be further developed by explicitly accounting for the reactions producing and consuming the *i*th molecular constituent. Consider the *j*th reaction, in which constituent *i* participates with stoichiometry *N*_{ij}, that is, reaction *j* produces a net change of *N*_{ij} molecules of constituent *i*. If the reaction produces constituent *i*, then *N*_{ij} > 0, whereas if it consumes constituent *i*, then *N*_{ij} < 0 (if constituent *i* is not altered in the reaction, then *N*_{ij} = 0). We define *N*_{i} as the (row) vector of stoichiometry coefficients of constituent *i* for all reactions in the system. Moreover, we define the (column) vector of reaction rates *v*, such that *v*_{j} is the rate of the *j*th reaction (mol l^{−1} h^{−1}).

With the help of the above concepts, the effect of the biochemical reactions on the concentrations of molecular constituents can be rewritten as
$$\frac{\mathrm{d}x_i}{\mathrm{d}t} = N_i \cdot v - \mu \cdot x_i, \tag{4.2}$$

or in more compact form, denoting the (column) vector of the concentrations of all molecular constituents by *x* and the stoichiometry matrix with rows *N*_{i} by *N*:

$$\frac{\mathrm{d}x}{\mathrm{d}t} = N \cdot v - \mu \cdot x. \tag{4.3}$$

This is the classical formulation of stoichiometry models of biochemical reactions, extended with a dilution term [62,64]. Equation (4.3) does not explicitly take into account that the reaction rates *v* depend on the concentrations of the molecular constituents participating in the reactions. That is, it would be more appropriate to write *v* in functional form *v*(*x*). The model of equation (4.3) describes the biochemical reaction system on the population level. If all cells have the same reaction rates, then the model applies also to the individual cells (see the electronic supplementary material). It should be noted though that reaction rates may differ between cells, even when the concentrations *x* of cellular constituents are identical, due to the intrinsic stochasticity of biochemical reactions [63].

As a consequence of the conversion of *C*_{i} to *X*_{i} and the introduction of reaction stoichiometries, the growth rate becomes,
$$\mu = \delta \cdot \sum_i \alpha_i \cdot N_i \cdot v. \tag{4.4}$$

The growth rate thus equals the sum of all reaction rates in mass units ($\sum_i \alpha_i \cdot N_i \cdot v$ (g l^{−1} h^{−1})), that is the net rate of accumulation of intracellular molecular constituents within a unit volume per unit time, relative to the total amount of molecular constituents within a unit volume (1/*δ* (g l^{−1})). The latter quantity can equivalently be written as $\sum_i \alpha_i \cdot x_i$, following equation (3.4) and *x*_{i} = *c*_{i}/*α*_{i}.

Combining all of the above, we obtain the following model for a growing microbial population:
$$\frac{\mathrm{d}x}{\mathrm{d}t} = N \cdot v(x) - \mu \cdot x \tag{4.5}$$

and

$$\mu = \delta \cdot \sum_i \alpha_i \cdot N_i \cdot v(x). \tag{4.6}$$

We emphasize that the explicit expression for *μ* in equation (4.6) is not an ad hoc definition, but mechanically follows from the basic modelling assumptions underlying the stoichiometry model of equation (4.5), notably the assumption of constant biomass density. Figure 2*a* schematically projects the reaction network on a growing microbial population.
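To illustrate how the stoichiometry model and the growth rate expression fit together, the sketch below simulates d*x*/d*t* = *N* · *v*(*x*) − *μ* · *x* with *μ* = *δ* · Σ *α*_{i} · *N*_{i} · *v*(*x*) for a hypothetical two-constituent network: a metabolite M taken up at a rate proportional to the protein concentration, and a protein P synthesized from M. The network, the rate laws, all parameter values and the forward Euler integration are illustrative assumptions, not taken from the text.

```python
# Toy instance of dx/dt = N v(x) - mu x with mu = delta * sum_i alpha_i N_i v(x).
# All numbers below are illustrative assumptions.
DELTA = 1.0 / 300.0        # l g^-1, inverse biomass density (1/DELTA = 300 g l^-1)
ALPHA = [100.0, 30000.0]   # g mol^-1, molar masses of metabolite M and protein P
N_PER_P = 300.0            # molecules of M consumed per P; ALPHA[1] = N_PER_P * ALPHA[0]
N = [
    [1.0, -N_PER_P],       # row for M: +1 in uptake, -N_PER_P in synthesis
    [0.0, 1.0],            # row for P: +1 in synthesis
]


def rates(x):
    """Reaction rates v(x) in mol l^-1 h^-1: [uptake, protein synthesis]."""
    m, p = x
    v_uptake = 333.0 * p                     # uptake catalysed by protein
    v_synthesis = 2.0 * p * m / (0.1 + m)    # saturating in the metabolite
    return [v_uptake, v_synthesis]


def net_production(x):
    """N v(x): net production rate of each constituent (mol l^-1 h^-1)."""
    v = rates(x)
    return [sum(n_ij * v_j for n_ij, v_j in zip(row, v)) for row in N]


def growth_rate(x):
    """mu = delta * sum_i alpha_i * N_i * v(x), in h^-1."""
    return DELTA * sum(a_i * r_i for a_i, r_i in zip(ALPHA, net_production(x)))


def derivative(x):
    """Right-hand side N v(x) - mu x."""
    mu = growth_rate(x)
    return [r_i - mu * x_i for r_i, x_i in zip(net_production(x), x)]


def simulate(x0, t_end=20.0, dt=1e-3):
    """Forward Euler integration (a deliberately simple scheme)."""
    x = list(x0)
    for _ in range(int(round(t_end / dt))):
        x = [x_i + dt * dx_i for x_i, dx_i in zip(x, derivative(x))]
    return x


x0 = [0.3, 0.009]  # mol l^-1; 30 g/l of M plus 270 g/l of P equals 1/DELTA
x_end = simulate(x0)
total_mass = sum(a_i * x_i for a_i, x_i in zip(ALPHA, x_end))
print(f"mu = {growth_rate(x_end):.3f} h^-1, total mass = {total_mass:.3f} g/l")
```

Two features of the framework are visible in this sketch. The total mass concentration stays at 1/*δ* throughout the simulation, and the concentrations settle to constant values while the growth rate remains positive, corresponding to balanced growth.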

Textbooks on the modelling of biochemical reaction systems detail the different rate laws that specify how the reaction rates *v*_{j} depend on the concentrations *x* [62,64]. A common choice, relying on first principles, is to assume mass–action kinetics for the reactions, based on the random encounter of molecules in a well-mixed volume [62,63]. In many situations, however, it is more convenient to lump individual reactions into aggregate reactions that are described by approximate rate laws such as (reversible and irreversible) Henri–Michaelis–Menten kinetics, Monod–Wyman–Changeux kinetics, Hill kinetics, etc. [62,64]. The Henri–Michaelis–Menten rate law for an irreversible, enzyme-catalysed reaction with substrate concentration *x* and enzyme concentration *e* reads: $v = v_{\max} \cdot x/(K_m + x)$, with $v_{\max} = k_{cat} \cdot e$, where *k*_{cat} (min^{−1}) is the so-called catalytic constant of the enzyme, quantifying the maximum number of substrate molecules converted per enzyme per minute, and $K_m$ is the half-saturation constant. This expression, and many other approximate kinetic rate laws, can be derived from mass–action kinetics when making appropriate assumptions on the time scale of the rate of the elementary reaction steps. In the case of the Henri–Michaelis–Menten rate law, this concerns the association/dissociation of enzyme and substrate and the formation of the product [65,66].
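The reduction from mass–action kinetics to the Henri–Michaelis–Menten rate law can be sketched as follows, under the standard quasi-steady-state assumption that the enzyme–substrate complex equilibrates quickly relative to changes in the substrate concentration. The rate constants $k_1$ and $k_{-1}$, the complex concentration $[ES]$ and the symbol $K_m$ are notation assumed here for illustration.

```latex
% Elementary steps (mass-action):  E + S <-> ES  (k_1, k_{-1}),   ES -> E + P  (k_cat)
\begin{align*}
\frac{\mathrm{d}[ES]}{\mathrm{d}t}
  &= k_1 \cdot \bigl(e - [ES]\bigr) \cdot x - \bigl(k_{-1} + k_{cat}\bigr) \cdot [ES] \approx 0
  && \text{(quasi-steady state; $e$ is total enzyme)} \\
[ES] &= \frac{e \cdot x}{K_m + x},
  \qquad K_m = \frac{k_{-1} + k_{cat}}{k_1}, \\
v &= k_{cat} \cdot [ES] = \frac{k_{cat} \cdot e \cdot x}{K_m + x}
   = v_{\max} \cdot \frac{x}{K_m + x},
  \qquad v_{\max} = k_{cat} \cdot e .
\end{align*}
```

At *x* = $K_m$ the rate equals half of $v_{\max}$, which is why $K_m$ is interpreted as a half-saturation constant.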

## 5. Growth in a changing environment

Some of the reactions changing the molecular constituents of the cell correspond to exchanges with the environment, that is the uptake of substrates and the excretion of products. The environment is not explicitly modelled by equations (4.5) and (4.6) and the entries in *v* corresponding to the rates of these exchange reactions are therefore treated as external inputs. For many purposes, however, it is more appropriate to extend the model and include a (simple) representation of the environment. In what follows, we equate the environment with a bioreactor filled by a liquid medium of fixed volume containing the growing population of microorganisms as well as external substrates and products. The substrate and product concentrations in the medium are denoted by the vector *y*. Usually, external concentrations are expressed in terms of units g l^{−1}, that is mass in a fixed volume of medium.

The dynamics of the substrate and product concentrations in the medium can be described by the following differential equation:
$$\frac{\mathrm{d}y}{\mathrm{d}t} = \frac{\mathrm{Vol}}{\mathrm{Vol}_{\mathrm{medium}}} \cdot \alpha_y \cdot E \cdot v(x, y), \tag{5.1}$$

where *E* is the stoichiometry matrix for the exchange reactions, *α*_{y} is the diagonal matrix of molar mass coefficients of the external metabolites (g mol^{−1}) and Vol_{medium} is the (constant) volume of the medium (l). Usually, Vol≪Vol_{medium}. The multiplication of *α*_{y} · *E* · *v*(*x*, *y*) by Vol expresses the fact that the total rate of consumption of substrates and accumulation of products depends on the volume of the growing microbial population. The division of the resulting product by Vol_{medium} means that we are interested in the concentration of these substrates and products in the medium. Equation (5.1) can be rewritten in a more classical form by explicitly using the biomass variable *B* (g), introduced in the previous section, and the concentration of biomass in the medium *b* (g l^{−1}), defined as *b* = *B*/Vol_{medium}. It follows with equation (3.2) that

$$\frac{\mathrm{Vol}}{\mathrm{Vol}_{\mathrm{medium}}} = \frac{\delta \cdot B}{\mathrm{Vol}_{\mathrm{medium}}} = \delta \cdot b \tag{5.2}$$

and, consequently,

$$\frac{\mathrm{d}y}{\mathrm{d}t} = \delta \cdot b \cdot \alpha_y \cdot E \cdot v(x, y). \tag{5.3}$$

The above considerations lead to the following extended model, taking into account the dynamics of exchanges with the environment (figure 2*b*):
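$$\frac{\mathrm{d}x}{\mathrm{d}t} = N \cdot v(x, y) - \mu \cdot x, \tag{5.4}$$

$$\frac{\mathrm{d}y}{\mathrm{d}t} = \delta \cdot b \cdot \alpha_y \cdot E \cdot v(x, y), \tag{5.5}$$

$$\mu = \delta \cdot \sum_i \alpha_i \cdot N_i \cdot v(x, y) \tag{5.6}$$

and

$$\frac{\mathrm{d}b}{\mathrm{d}t} = \mu \cdot b, \tag{5.7}$$

where we have used the equalities

$$\frac{\mathrm{d}b}{\mathrm{d}t} = \frac{1}{\mathrm{Vol}_{\mathrm{medium}}} \cdot \frac{\mathrm{d}B}{\mathrm{d}t} = \frac{1}{\delta \cdot \mathrm{Vol}_{\mathrm{medium}}} \cdot \frac{\mathrm{d}\,\mathrm{Vol}}{\mathrm{d}t} = \frac{\mu \cdot \mathrm{Vol}}{\delta \cdot \mathrm{Vol}_{\mathrm{medium}}} = \mu \cdot b \tag{5.8}$$

to obtain the biomass differential equation. For some purposes, it is useful to split the reaction rate vector *v*(*x*, *y*) into rates of exchange reactions *v*_{ex}(*x*, *y*) and rates of internal reactions *v*_{int}(*x*), where obviously the latter do not depend on the concentration of external substrates.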

Interestingly, the above model can be used to derive an explicit relation between growth rate and substrate availability. A key insight for the derivation is that due to coupling of the molar mass coefficients and the stoichiometry coefficients, the expressions for the internal reaction rates in the right-hand side of equation (5.6) cancel out. Consider an arbitrary internal reaction, irreversibly converting one molecule of reactant A into *n*_{ab} molecules of reactant B (with molar masses *α*_{a} and *α*_{b}, respectively) at a rate *v*_{ab}. Note that the reaction rate *v*_{ab} occurs twice in the sum of equation (5.6): −*α*_{a}*v*_{ab} (for reactant A) and *α*_{b}*n*_{ab}*v*_{ab} (for reactant B). However, due to mass conservation, we must have *α*_{b}*n*_{ab} = *α*_{a}, so that the two terms in the sum cancel out. Extending this argument to every internal reaction gives
$$\mu = \delta \cdot \sum_k \alpha_{y,k} \cdot (-E_k) \cdot v(x, y), \tag{5.9}$$

where *E*_{k} denotes the *k*th row of *E*, corresponding to external metabolite *k*, and *α*_{y,k} the *k*th diagonal element of *α*_{y}. In words, the only remaining terms are the rates of the exchange reactions, because they occur only once in the sum of equation (5.6). The minus sign in −*E* is explained by the fact that, for uptake reactions, the sum of equation (5.6) includes the increase of intracellular biomass components rather than the decrease of extracellular metabolites (the opposite for excretion reactions). Note that it follows from equations (5.3), (5.8) and (5.9) that $\mathrm{d}b/\mathrm{d}t = -\sum_k \mathrm{d}y_k/\mathrm{d}t$, expressing mass conservation.

Furthermore, assume that the exchanges of the cells with the environment can be reduced to the uptake of a single substrate S, used for the production of biomass. The concentration of the substrate in the medium is denoted by *s*, its molar mass *α*_{s} and its uptake rate *v*_{s}. Note that, in this case, *y* = *s*, *α*_{y} = *α*_{s} and *E* = − 1, so that we obtain *μ* = *δ* · *α*_{s} · *v*_{s}(*x*, *s*). That is, the growth rate is directly proportional to the substrate uptake rate, a relation sidestepping the biochemical reactions taking place inside the cells. If we further choose a saturating function for the uptake kinetics, $v_s = v_{\max} \cdot s/(K_s + s)$, we obtain the so-called Monod equation [67]

$$\mu = \mu_{\max} \cdot \frac{s}{K_s + s}, \tag{5.10}$$

with $\mu_{\max} = \delta \cdot \alpha_s \cdot v_{\max}$. The Monod equation, which has the same mathematical form as the Henri–Michaelis–Menten rate law, is a well-known phenomenological relation that has been shown to fit quite well data of the steady-state growth rate of bacteria as a function of a single growth-limiting substrate [3,67]. More complex uptake patterns may occur when several substrates are available [68–71]. While in many bacteria the availability of a preferred carbon source represses the utilization of other, secondary carbon sources, a phenomenon known as carbon catabolite repression (CCR) [20], low growth rates or mixtures of secondary carbon sources without the preferred carbon source may disable CCR and lead to the co-utilization of different carbon sources.
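For the single-substrate case, the extended model reduces to two equations, d*s*/d*t* = −*μ*(*s*) · *b* and d*b*/d*t* = *μ*(*s*) · *b*, with *μ*(*s*) given by the Monod equation. The sketch below simulates this reduced system; the parameter values, function names and the forward Euler scheme are illustrative assumptions. Because all substrate mass is converted into biomass in this simplified setting, the sum *s* + *b* should remain constant.

```python
def monod(s, mu_max, k_s):
    """Monod equation: mu = mu_max * s / (k_s + s), in h^-1."""
    return mu_max * s / (k_s + s)


def simulate_batch(s0, b0, mu_max=1.0, k_s=0.05, t_end=12.0, dt=1e-3):
    """Forward Euler simulation of a batch culture with one substrate.

    s and b are substrate and biomass concentrations in the medium (g l^-1).
    Parameter values are illustrative assumptions.
    """
    s, b = s0, b0
    for _ in range(int(round(t_end / dt))):
        growth = monod(s, mu_max, k_s) * b        # db/dt = mu * b
        s, b = s - dt * growth, b + dt * growth   # ds/dt = -mu * b (E = -1)
    return s, b


s_start, b_start = 2.0, 0.01
s_end, b_end = simulate_batch(s_start, b_start)
print(f"final substrate: {s_end:.3e} g/l, final biomass: {b_end:.4f} g/l")
```

In this run the culture grows almost exponentially while substrate is abundant and stops when the substrate is exhausted, so that the final biomass concentration approaches the initial sum of substrate and biomass.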

In equations (5.4)–(5.7) it is implicitly assumed that the only changes in the concentrations of substrates and products in the environment occur through exchanges with the growing microbial population, making it an instance of a batch culture. The model can be easily adapted to other environments, such as continuous culture or fed-batch culture [72,73]. In a continuous culture, a fixed amount of medium per time unit, including microbial cells, is replaced by fresh medium, whereas in a fed-batch culture, nutrients are added over time without removing spent medium (and Vol_{medium} is no longer constant). While these different bioreactor regimes have been mostly used in the context of biotechnological applications, it is interesting to remark that complex natural environments, such as the digestive tracts of vertebrates and insects, can profitably be modelled as coupled series of bioreactors [74,75].

Equations (5.4)–(5.7) form a self-consistent kinetic model of a growing microbial population, taking up nutrients from the environment, converting these into biomass, and excreting by-products. In theory, the model is capable of accommodating all internal reactions and reactions exchanging substrates and products with the environment, from enzymatic reactions to signalling pathways and transcription and translation. Some of the examples of whole-cell models mentioned in the introduction can be seen, to some extent, as instances of this general scheme [28,32]. In practice, such models are not easy to build though. They quickly become very complex to handle, with hundreds of reactions and molecular constituents whose concentrations evolve on very different time scales. Moreover, many of the parameter values will be unknown or known only within an order of magnitude, creating difficult model identification problems [76–78].

## 6. Connecting metabolism and growth: flux balance analysis

The practical difficulties encountered when dealing with large kinetic models of microbial cells have motivated approximate models that are based on a number of simplifying assumptions. One well-known example is the family of so-called FBA approaches [79–81]. Below we summarize how flux balance models can be obtained from the general modelling framework of equations (5.4)–(5.7), by progressively introducing additional modelling assumptions.

A first simplifying assumption consists in limiting the scope of the models to metabolism alone, disregarding proteins and other macromolecules. It may seem somewhat paradoxical to exclude the major constituents of biomass from a model of microbial growth, but equation (3.6) can be replaced with a new definition of the growth rate, based on the rate of consumption of biomass precursor metabolites. To this end, similar to what was proposed in a recent review of FBA [79], we distinguish between free metabolites and the same metabolites incorporated into proteins and other macromolecules. The former, with concentration vector *x*_{M}, are included in the model, whereas the latter, with mass vector *C*′_{M}, are not, although they will be used in the derivation of the model (figure 3). The biomass *B* (g) is assumed to consist only of the mass of metabolites incorporated into proteins and other macromolecules, that is, $B = \sum_l C'_{M,l}$, where *l* runs over the incorporated metabolites. For reasons of consistency, we also restrict *δ*, the inverse biomass density, to these incorporated metabolites. In agreement with the above, we define a new vector *v*_{M}, consisting of the rates of the exchange reactions and the reactions that produce metabolites in *x*_{M}, as well as the corresponding stoichiometry matrix *N*_{M}.

The coefficients *β*_{l} = *C*′_{M, l}/*B* represent the mass fractions of the incorporated precursor metabolites in the biomass. By definition, *β*_{l} ≥ 0 and ∑_{l} *β*_{l} = 1, and we further suppose, as a second simplifying assumption, that these mass fractions are constant. The biomass composition has been empirically determined for several microorganisms, usually for a specific growth condition [82–84]. The incorporation of the precursor metabolites into the biomass, in the proportions *β*_{l} in which they compose the latter, can be seen as a macroreaction. To unambiguously define this macroreaction, we introduce the reaction rate vector *v*′_{M}, which describes the rate of incorporation into proteins and other macromolecules of the (free) metabolites. More precisely, *v*′_{M, l} (mol l^{−1} h^{−1}) represents the rate of incorporation of the metabolite having concentration *x*_{M, l}. Many of the rates *v*′_{M, l} will be 0, because the corresponding metabolites are not included in the biomass (*β*_{l} = 0). In principle, the degradation of macromolecules back to precursor metabolites would lead to additional reaction rates, but, given that proteins, the main component of biomass, are usually stable on the time scale of interest [85,86], the reverse reactions are ignored here.

From the above, and from applying the general growth rate expression of equation (4.6) to the biomass constituents *C*′_{M, l}, it follows that
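$$\mu = \delta \cdot \sum_{l} \alpha_{l} \cdot v'_{M,l} = \delta \cdot v_{B}, \qquad (6.1)$$

where *v*_{B} (g l^{−1} h^{−1}) is defined as the rate of the biomass macroreaction, that is, the total rate of incorporation of precursor metabolite mass into biomass per unit volume of the cell population, and *α*_{l} (g mol^{−1}) is the molar mass coefficient of metabolite *l*. Moreover, the dynamics of the mass of each incorporated metabolite *l* in the growing microbial population is given by

$$\frac{\mathrm{d}C'_{M,l}}{\mathrm{d}t} = \alpha_{l} \cdot v'_{M,l} \cdot \mathrm{Vol}. \qquad (6.2)$$

We also obtain from the definition of the biomass composition that

$$\frac{\mathrm{d}C'_{M,l}}{\mathrm{d}t} = \beta_{l} \cdot \frac{\mathrm{d}B}{\mathrm{d}t} = \beta_{l} \cdot \mu \cdot B = \beta_{l} \cdot v_{B} \cdot \mathrm{Vol}, \qquad (6.3)$$

so that combining equations (6.2) and (6.3) yields an expression for the rates of the individual incorporation reactions:

$$v'_{M,l} = \frac{\beta_{l}}{\alpha_{l}} \cdot v_{B}. \qquad (6.4)$$

In words, the rate of incorporation of each individual metabolite is proportional to the rate of the biomass reaction, modulated by the factor *β*_{l}/*α*_{l}.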
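The proportionality just described can be checked numerically. The following is a minimal sketch, assuming three hypothetical precursor metabolites with made-up mass fractions and molar masses; the function name `incorporation_rates` and all numerical values are illustrative assumptions, not taken from the literature cited here. It computes the individual incorporation rates from the biomass reaction rate and the growth rate as *δ* · *v*_{B}.

```python
import numpy as np


def incorporation_rates(beta, alpha, v_B):
    """Return v'_{M,l} = (beta_l / alpha_l) * v_B for each precursor l.

    beta  : mass fractions of the precursors in biomass (sum to 1)
    alpha : molar masses (g mol^-1)
    v_B   : rate of the biomass reaction (g l^-1 h^-1)
    """
    beta = np.asarray(beta, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    if np.any(beta < 0) or not np.isclose(beta.sum(), 1.0):
        raise ValueError("mass fractions must be non-negative and sum to 1")
    return beta / alpha * v_B


# Hypothetical numbers, chosen only for illustration.
beta = np.array([0.6, 0.3, 0.1])         # mass fractions (dimensionless)
alpha = np.array([110.0, 330.0, 180.0])  # molar masses (g mol^-1)
delta = 1.0 / 300.0                      # inverse biomass density (l g^-1)
v_B = 180.0                              # biomass reaction rate (g l^-1 h^-1)

v_inc = incorporation_rates(beta, alpha, v_B)  # mol l^-1 h^-1
mu = delta * v_B                               # growth rate (h^-1)
```

Summing the mass incorporated by the individual reactions, `np.dot(alpha, v_inc)`, recovers `v_B`, which is a useful sanity check on any biomass composition vector.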

The assumption of a constant biomass composition, leading to equation (6.4), means that the ratio of the time-varying variables *C*′_{M, l} and *B* is constant. Hence it follows from equations (3.2) and (3.3) that the concentrations of the pools of incorporated precursor metabolites *c*′_{M, l} are also constant for all *l* (i.e. *c*′_{M, l} = *C*′_{M, l}/Vol = *β*_{l}/*δ*). This can be interpreted as assuming that any changes in a slowly varying environment lead to a rapid adjustment of the rates in the metabolic network, and consistent with this, a rapid adjustment of concentrations of the free metabolites, so as to obtain invariant steady-state concentrations of the incorporated precursor metabolites. In other words, the metabolic system is at quasi-steady state with respect to the environment [62,87]. Indeed, measured *in vivo* response times of many metabolite pools in *E. coli* are on the order of seconds to minutes [16,88], whereas the concentrations of external substrates in equation (5.5) vary on a time scale set by the growth rate when they remain well above the half-saturation constant *K*_{s} defining the uptake kinetics. As an aside, we note that constant concentrations of incorporated precursor metabolites do not exclude that the concentrations of individual enzymes, not modelled here, may vary over time [89].

When further assuming, third, that growth dilution of metabolite concentrations *x*_{M} can be ignored, as its effect is negligible with respect to the turnover of metabolite pools by enzyme-catalysed reactions, we obtain the following modification of the stoichiometry model of equation (5.4), now restricted to the metabolic network and the consumption of biomass precursor metabolites by the biomass reaction:
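$$\frac{\mathrm{d}x_{M}}{\mathrm{d}t} = N_{M} \cdot v_{M}(x_{M}^{*}) + N_{B} \cdot v_{B}(x_{M}^{*}) = 0, \qquad (6.5)$$

where *N*_{B} = ( … , − *β*_{l}/*α*_{l}, … )^{T}. The quasi-steady-state value of the metabolite concentrations is indicated by an asterisk (*).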

A fourth key simplification underlying FBA, in line with the quasi-steady state of metabolism, is to ignore the kinetics of the reactions and consider only fluxes, that is, reaction rates at steady state. As a consequence, the explicit dependence of fluxes on concentrations disappears from the model and the fluxes become the new variables of the system:

$$\begin{pmatrix} N_{M} & N_{B} \end{pmatrix} \cdot \begin{pmatrix} v_{M} \\ v_{B} \end{pmatrix} = 0, \qquad (6.6)$$

where we have dropped the steady-state symbol (*) from the fluxes.

Equation (6.6) is a linear system that is usually degenerate, in the sense that the number of rows in the matrix (*N*_{M} *N*_{B}) is much smaller than the number of columns. As a consequence, the system has not a unique solution but infinitely many, given by the kernel of the stoichiometry matrix, *ker*(*N*_{M} *N*_{B}) [90]. Hence, an infinite number of flux distributions satisfy the stoichiometry constraints. The space of solutions can be reduced by taking into account additional inequality constraints on the fluxes, obtained (directly or indirectly) from measurements:
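$$v^{\min} \le \begin{pmatrix} v_{M} \\ v_{B} \end{pmatrix} \le v^{\max}, \qquad (6.7)$$

where *v*^{min} and *v*^{max} are lower and upper bounds on the fluxes, respectively.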
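The degeneracy of the steady-state system can be made concrete on a small example. The sketch below assumes a hypothetical toy network (two internal metabolites A and B, one exchange reaction, two parallel conversions of A into B, and a biomass reaction whose stoichiometric coefficient is set to −1 for simplicity, in place of −*β*_{l}/*α*_{l}). It uses `scipy.linalg.null_space` to compute a basis of the kernel of the stoichiometry matrix.

```python
import numpy as np
from scipy.linalg import null_space

# Toy network (hypothetical). Rows: metabolites A, B.
# Columns: v_ex, v_1, v_2, v_B.
# The exchange reaction is written in the export direction, so that
# uptake of A corresponds to a negative flux v_ex.
N = np.array([
    [-1.0, -1.0, -1.0,  0.0],   # A: exported, consumed by v_1 and v_2
    [ 0.0,  1.0,  1.0, -1.0],   # B: produced by v_1 and v_2, used for biomass
])

K = null_space(N)      # columns form an orthonormal basis of ker(N)
n_free = K.shape[1]    # dimension of the space of steady-state fluxes
```

With two rows and four columns of full row rank, two degrees of freedom remain: every linear combination of the columns of `K` is a flux distribution satisfying the stoichiometric constraints, which is why additional bounds or an optimization criterion are needed to single out a solution.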

One specific case of interest is the measurement of the uptake and excretion fluxes *v*_{M, ex}. If these measurements are sufficiently precise, then a subset of solutions may be obtained in which the possible values for intracellular, non-measured fluxes remain within tight bounds. This approach, called (stoichiometric) metabolic flux analysis (MFA) [91], underlies, for example, the analysis of the influence of a post-transcriptional regulator, CsrA, on the flux distribution in central carbon metabolism in *E. coli* [92]. From measurements of the uptake and excretion fluxes of wild-type and mutant strains growing on glucose, estimates of glycolytic fluxes were obtained that, combined with measurements of metabolite pools and gene expression, allowed one to pinpoint the effect of CsrA on the activity of PfkA, a central glycolytic enzyme. If measurements are reduced to exact values, that is, if the lower and upper bounds on the measured fluxes coincide, then the addition of the corresponding equality constraints may under certain conditions lead to a unique solution of equation (6.6) [93].

While flux measurements can thus be used to reduce the solution space, in many cases this is not enough to obtain sufficiently informative predictions of intracellular fluxes. One way to proceed is to select within the remaining set of solutions those that satisfy some optimization criterion, an approach called FBA [80,94]. The most frequently chosen criterion is the maximization of the growth rate. The choice of this criterion is based on the argument that a higher growth rate provides a selective advantage to microorganisms, because it allows competitors for shared resources to be outgrown. In our case, following equation (6.1), the growth rate is proportional to the rate of the biomass reaction, so that growth-rate maximization results in a linear optimization problem:
$$\max_{v_{M},\, v_{B}} \; v_{B} \quad \text{subject to equations (6.6) and (6.7).} \qquad (6.8)$$

FBA has been used in many applications [95], such as predicting growth rates of *E. coli* on different carbon sources [96] and in different mutants before and after adaptive evolution [97].
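As an illustration of growth-rate maximization as a linear optimization problem, the sketch below solves a flux balance problem for a hypothetical toy network with `scipy.optimize.linprog`. The network, the flux bounds and the unit stoichiometric coefficient of the biomass reaction are assumptions made for the example only; genome-scale applications rely on dedicated constraint-based modelling tools and curated reconstructions.

```python
import numpy as np
from scipy.optimize import linprog

# Toy network (hypothetical). Rows: metabolites A, B.
# Columns: v_ex, v_1, v_2, v_B. Uptake of A is a negative flux v_ex.
N = np.array([
    [-1.0, -1.0, -1.0,  0.0],
    [ 0.0,  1.0,  1.0, -1.0],
])

# Assumed lower and upper bounds on each flux.
bounds = [
    (-10.0, 0.0),   # v_ex: uptake limited to 10 units
    (0.0, 6.0),     # v_1: irreversible, capacity 6
    (0.0, 3.0),     # v_2: irreversible, capacity 3
    (0.0, None),    # v_B: biomass reaction, unbounded above
]

# linprog minimizes c @ v, so maximizing v_B means minimizing -v_B.
c = np.array([0.0, 0.0, 0.0, -1.0])
res = linprog(c, A_eq=N, b_eq=np.zeros(N.shape[0]),
              bounds=bounds, method="highs")

v_opt = res.x          # optimal flux distribution
v_B_opt = v_opt[3]     # maximal rate of the biomass reaction
```

In this example the internal capacities, not the uptake bound, limit the biomass flux, and the optimum happens to be unique. In larger networks the optimal value of the objective is unique but the flux distribution attaining it generally is not.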

Various extensions of classical FBA as summarized by equation (6.8) have been proposed in the literature. For our purpose, a relevant extension is dynamic FBA. In this case, the solution of the FBA problem is embedded in a model of the dynamically changing environment, such that the concentration of external metabolites *y* provides constraints on the fluxes:
$$v^{\min}(y) \le \begin{pmatrix} v_{M} \\ v_{B} \end{pmatrix} \le v^{\max}(y), \qquad (6.9)$$

where the lower and upper flux bounds *v*^{min}(*y*) and *v*^{max}(*y*) now depend on *y*. In particular, nutrient uptake fluxes depend on the concentration of external metabolites. This dependence may, for example, follow a Henri–Michaelis–Menten rate law, as proposed in the previous section. Following the convention that uptake fluxes are negative, an uptake flux in *v*_{M}, involving external metabolite *k*, will typically have an upper bound 0 and a lower bound −*k*_{y,k} · *y*_{k}/(*K*_{y,k} + *y*_{k}), where *k*_{y,k} (mol l^{−1} h^{−1}) is the maximum uptake rate of external metabolite *k*, and *K*_{y,k} (mol l^{−1}) is its half-saturation constant. In dynamic FBA, in particular the so-called static optimization variant [98], at each time point *t* with a specific value of *y* = *y*(*t*), the following linear optimization problem is solved:
$$\max_{v_{M},\, v_{B}} \; v_{B} \quad \text{subject to equations (6.6) and (6.9).} \qquad (6.10)$$

The resulting values of the flux distribution *v*_{M, opt}(*y*), and the flux of the biomass reaction *v*_{B, opt}(*y*) leading to the maximal growth rate *δ* · *v*_{B, opt}(*y*), enter the model of the dynamically changing environment
(6.11)

and

(6.12)

Notice that, in general, the flux distribution *v*_{M, opt}(*y*) is not unique. To make the problem well-posed, additional criteria for selecting optimal solutions need to be specified. To this end, approaches to sample the set of possible flux distributions in a computationally efficient and biologically meaningful manner have been developed [99,100]. Other approaches explore the set of possible solutions by tying its geometry to the structure of the underlying reaction network [101,102].
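The static optimization variant of dynamic FBA can be sketched in a few lines. The example below is a toy illustration under explicit assumptions that are not taken from the text: a single internal precursor that makes up all of the biomass, a medium of constant volume `vol_y`, an environment model in which the uptake flux is scaled by the ratio of population volume to medium volume, a population volume growing at rate *δ* · *v*_{B, opt}(*y*), and forward Euler integration. All parameter values are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical parameter values.
alpha = 180.0         # molar mass of the precursor (g mol^-1)
delta = 1.0 / 300.0   # inverse biomass density (l g^-1)
k_y = 1.0             # maximum uptake rate (mol l^-1 h^-1)
K_y = 1e-4            # half-saturation constant (mol l^-1)
vol_y = 1.0           # volume of the medium (l), assumed constant

# One internal metabolite; columns: exchange flux v_ex, biomass flux v_B.
# Steady state: -v_ex - v_B / alpha = 0 (beta = 1 for the single precursor).
N = np.array([[-1.0, -1.0 / alpha]])


def fba_step(y):
    """Maximize v_B for external concentration y; return (v_ex, v_B)."""
    y = max(y, 0.0)
    lower = -k_y * y / (K_y + y)   # uptake is negative, bounded by the rate law
    res = linprog([0.0, -1.0], A_eq=N, b_eq=[0.0],
                  bounds=[(lower, 0.0), (0.0, None)], method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x


def simulate(y0=2e-3, vol0=1e-4, dt=0.05, t_end=8.0):
    """Forward Euler integration of substrate y and population volume vol."""
    y, vol = y0, vol0
    for _ in range(int(round(t_end / dt))):
        v_ex, v_B = fba_step(y)
        y_next = y + dt * v_ex * vol / vol_y    # assumed environment model
        vol = vol + dt * delta * v_B * vol      # growth at rate delta * v_B
        y = y_next
    return y, vol


y_end, vol_end = simulate()
```

Because each step uses fluxes that satisfy the steady-state constraint, the increase in population volume equals *δ* · *α* times the amount of substrate consumed, which provides a simple mass-balance check on the simulation.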

The main limitation of FBA and dynamic FBA is that these approaches require strong assumptions to be made. To compensate for the absence of kinetic information, cells are hypothesized to optimize a specific objective function, here the growth rate. In many cases the use of growth-rate maximization is debatable [103,104] and it is not straightforward to specify in advance which alternative objective criterion is appropriate. The focus on metabolism excludes proteins and other macromolecules from the model. The absence of these major biomass constituents requires the definition of a new biomass reaction, which comes with additional assumptions on the dynamics of metabolite concentrations. Moreover, FBA models occlude the fundamental autocatalytic nature of the cell, in the sense that the products of metabolism are utilized for synthesizing proteins that in turn control metabolic reactions as well as transcription and translation processes [105]. While a number of extensions of FBA have been proposed in the literature [34,106–112], these do not entirely make up for the above-mentioned limitations.

## 7. Connecting gene expression, metabolism and growth: coarse-grained whole-cell models

Another way to sidestep the full complexity of the metabolic and gene regulatory networks controlling microbial growth is to preserve the modelling scheme of equations (5.4)–(5.7), but to simplify the equations in a different way. The kinetics of the reactions, and notably the regulatory interactions shaping the kinetics, are no longer ignored, as in the previous section. However, instead of accounting for individual molecular constituents of the cell, these are lumped into a few classes of constituents with their corresponding macroreactions. These approximations result in a model with the same scope, but that provides a more coarse-grained picture of the cell.

An example of this approach is the class of so-called self-replicator models. These models provide a high-level description of the functions involved in the growth of a population, notably the conversion of external substrates into metabolic precursors (metabolism) and the synthesis of macromolecules, in particular proteins, from these precursors (gene expression). The self-replicatory nature of the system originates in the catalytic role of the proteins in both metabolism (enzymes) and gene expression (RNA polymerase, ribosome). The principle of self-replicator models of microorganisms can be found in the work of Hinshelwood [26], Gánti [113] and Koch [114], to cite some early examples. More recently, Molenaar *et al.* [37] used self-replicator models as an analytical tool for explaining the phenomenon of overflow metabolism in various bacteria. They proposed that this wasteful excretion of carbon sources during fast growth arises from a trade-off between what the authors call metabolic efficiency (high production of precursors per unit substrate) and catalytic efficiency (high production of precursors per unit enzyme).

An example of a self-replicator system is shown in figure 4. In this case, following the scheme of equations (5.4)–(5.7), *y* = *s* represents the concentration of an external substrate, and *x* = (*p*, *r*, *m*)^{T} the concentrations of precursor metabolites P, ribosomes and other components of the gene expression machinery R, and enzymes M, respectively. The entries of the reaction rate vector *v* = (*v*_{p}, *v*_{r}, *v*_{m})^{T} denote the substrate uptake rate, ribosome production rate and enzyme production rate, respectively. With these substitutions, the general model of equations (5.4)–(5.7) can be rewritten as
(7.1)

(7.2)

(7.3)

and

(7.4)

where *n*_{p}, *n*_{r}, *n*_{m} are stoichiometry constants, and *α*_{s} and *α*_{p} (g mol^{−1}) are the molar mass coefficients of substrate and precursor molecules, respectively. We also introduce *α*_{m} and *α*_{r}, the molar mass coefficients of the components of the metabolic and gene expression machinery, respectively. The expression for the growth rate is obtained from mass conservation, which implies (as explained in §5) that *α*_{p} · *n*_{r} = *α*_{r} and *α*_{p} · *n*_{m} = *α*_{m}.

Note that, as in the previous section, protein degradation is ignored in the model, motivated by the observation that the half-lives of proteins are usually sufficiently long to be ignored on the time scale of interest. Moreover, the only macromolecules we consider are proteins, thus excluding RNA and DNA. This is motivated by the fact that the mass fraction of RNA and DNA is limited, maximally approximately 20% in *E. coli* [6], but it should be remarked that the gene expression machinery includes ribosomal RNA in addition to ribosomal proteins.

Equation (7.3), the expression for the growth rate, can be further analysed by making some additional assumptions beyond the fundamental hypothesis of constant biomass density [36]. Neglecting the contribution of the metabolic precursors to the biomass, we obtain from equation (3.2) that
$$\mathrm{Vol} = \delta \cdot (R + M), \qquad (7.5)$$

where *R* + *M* is the total amount of protein (in units g). As *R* = *α*_{r} · *r* · Vol and *M* = *α*_{m} · *m* · Vol, it follows from equation (7.5) that *α*_{r} · *r* + *α*_{m} · *m* = 1/*δ* and therefore *α*_{r} · d*r*/d*t* + *α*_{m} · d*m*/d*t* = 0. The equations describing the dynamics of *r* and *m* are therefore not independent, and one of them may be dropped from the system of equation (7.1). Moreover, substituting the expressions for d*r*/d*t* and d*m*/d*t* from equation (7.1) into this equality, and using the equalities between the stoichiometry constants and the molar mass coefficients due to mass conservation, allows us to obtain an insightful approximate expression for the growth rate:

$$\begin{aligned} \mu &= \delta \cdot (\alpha_{r} \cdot v_{r} + \alpha_{m} \cdot v_{m}) && (7.6) \\ &= \delta \cdot \alpha_{p} \cdot (n_{r} \cdot v_{r} + n_{m} \cdot v_{m}). && (7.7) \end{aligned}$$

That is, the growth rate equals the total mass of protein synthesized per unit time and unit volume, or equivalently the total mass of precursors consumed for protein synthesis per unit time and unit volume (*α*_{p} · (*n*_{r} · *v*_{r} + *n*_{m} · *v*_{m}) (g l^{−1} h^{−1})), normalized by the total mass of protein per unit volume (1/*δ* (g l^{−1})).

In what follows, we will write *v*_{ps} = *n*_{r} · *v*_{r} + *n*_{m} · *v*_{m} for the total protein synthesis rate (mol l^{−1} h^{−1}). Furthermore, we introduce the following kinetic expressions for *v*_{ps} and *v*_{p}:
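$$v_{ps} = k_{r} \cdot r \cdot \frac{p}{K_{r} + p} \qquad (7.8)$$

and

$$v_{p} = k_{m} \cdot m \cdot \frac{s}{K_{m} + s}, \qquad (7.9)$$

where *k*_{r}, *k*_{m} are catalytic constants (min^{−1}) and *K*_{r}, *K*_{m} half-saturation constants (mol l^{−1}). Note that *m*, while not explicitly included in the model, is given by the conservation equation *α*_{r} · *r* + *α*_{m} · *m* = 1/*δ*.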

Giordano *et al.* [36] set *n*_{r} · *v*_{r} = *λ* · *v*_{ps} and *n*_{m} · *v*_{m} = (1 − *λ*) · *v*_{ps}, for 0 ≤ *λ* ≤ 1, and by means of the above expressions for *v*_{ps} and *v*_{p}, the value of *λ* resulting in the maximum growth rate during steady-state exponential growth was determined. The empirical regularities relating the growth rate to the ribosomal protein mass fraction [24] could thus be reproduced. The analysis can be generalized to the situation where the system is not in steady state, but makes a transition from one state of balanced growth to another following a nutrient upshift. In this case, *λ* is not constant, but time-varying. Using concepts from optimal control theory [115], it can be shown that the *λ* leading to optimal biomass accumulation has a bang-bang profile, alternating periods of exclusive synthesis of R with periods of exclusive synthesis of M, until the new steady state is reached. A regulatory strategy defining *λ* in terms of *p* and *r* was proposed that approximates this optimal solution. Interestingly, this strategy has structural similarities with the action of the ppGpp system in *E. coli*, known to play an important role in growth control [116]. Several other coarse-grained models based on assumptions similar to the ones developed above can be found in the literature, all describing aggregated autocatalytic processes converting nutrients into proteins [24,35,37–39,117,118]. Some of the models are analysed from an optimization perspective, whereas others detail regulatory mechanisms controlling the growth rate in response to changes in the environment.
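The search for the allocation parameter *λ* that maximizes the steady-state growth rate can be illustrated numerically. The sketch below is not the analysis of [36] but a simplified stand-in resting on several assumptions: Michaelis–Menten forms for *v*_{ps} (catalysed by R, with substrate *p*) and *v*_{p} (catalysed by M, with substrate *s*), a precursor balance d*p*/d*t* = *v*_{p} − *v*_{ps} − *μ* · *p* with *v*_{p} expressed in precursor units, a growth rate *μ* = *δ* · *α*_{p} · *v*_{ps}, and a constant external substrate concentration. All parameter values are hypothetical.

```python
import numpy as np
from scipy.optimize import brentq

# Hypothetical parameter values, chosen only for illustration.
delta = 1.0 / 300.0              # inverse biomass density (l g^-1)
alpha_p = 100.0                  # molar mass of precursors (g mol^-1)
n_r = n_m = 1000.0               # precursors per unit of R and of M
alpha_r = alpha_p * n_r          # mass conservation
alpha_m = alpha_p * n_m
k_r, K_r = 4000.0, 1e-3          # (h^-1), (mol l^-1)
k_m, K_m = 4000.0, 1e-4          # (h^-1), (mol l^-1)
s = 1e-3                         # external substrate (mol l^-1), held constant


def steady_state_growth_rate(lam):
    """Growth rate (h^-1) at balanced growth for a fixed allocation lam."""
    r = lam / (delta * alpha_r)           # from dr/dt = 0
    m = (1.0 - lam) / (delta * alpha_m)   # alpha_r*r + alpha_m*m = 1/delta
    v_p = k_m * m * s / (K_m + s)
    if r == 0.0 or v_p == 0.0:
        return 0.0                        # no ribosomes or no enzymes

    def v_ps(p):
        return k_r * r * p / (K_r + p)

    def balance(p):                       # dp/dt, zero at steady state
        return v_p - v_ps(p) * (1.0 + delta * alpha_p * p)

    hi = 1e-6
    while balance(hi) > 0.0:              # expand bracket until sign change
        hi *= 2.0
    p_star = brentq(balance, 0.0, hi)
    return delta * alpha_p * v_ps(p_star)


lams = np.linspace(0.01, 0.99, 99)
mus = np.array([steady_state_growth_rate(lam) for lam in lams])
lam_opt, mu_opt = lams[mus.argmax()], mus.max()
```

Under these assumptions the ribosomal mass fraction at steady state, *δ* · *α*_{r} · *r*, equals *λ*, so scanning *λ* amounts to scanning the proteome composition. The growth rate vanishes at both extremes (no ribosomes or no enzymes) and peaks at an intermediate allocation, which is the trade-off exploited in the optimization analysis.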

In the example above, coarse-graining of the microbial cell was carried out *a priori*, based on our understanding of the major cellular functions involved in microbial growth. An alternative to this top-down approach would be to start from an extensive characterization of the individual molecular constituents and the biochemical reactions in which they are involved and to group these together into functional modules. This bottom-up approach relies on appropriate criteria for defining modules, based on the structure or the dynamics of the network. A discussion of the wide variety of criteria proposed is beyond the scope of this review (see [119] instead). Once a modular structure of the network has been determined, however, the dynamics of each module can be described by formulating a macroreaction and defining a kinetic rate law for the macroreaction. Such an approach has been used, for example, for modelling the accumulation of lipids and carbohydrates in unicellular microalgae [120]. The modules in this study were defined by a time scale decomposition, grouping together molecular constituents that are at quasi-steady state on a given time scale (see also [121]).

The use of an abstract representation of cellular components and processes is a strength of self-replicators and other coarse-grained models, but also their limitation. It notably makes it more difficult to quantitatively account for data on the molecular level, for example perturbations of specific reactions or the addition of specific components to the growth medium. By contrast, the representation of individual biochemical reactions is a strength of the FBA models discussed in the previous section. However, these models lack the dynamic feedback from gene expression and growth to metabolism that distinguishes self-replicator models. Can one imagine hybrid FBA–self-replicator models that combine the strengths of both? Given that the model simplifications underlying the two approaches are quite different, this may not be easy to achieve, although some interesting variants of FBA, including additional flux constraints derived from the catalytic activity and molecular weight of proteins, should be mentioned here [34,106,107,109,110]. An alternative strategy would be to embed a detailed kinetic model of some module of interest within a coarse-grained model of the entire cell. The latter strategy of localized fine-graining in a global coarse-grained model may strike an adequate compromise between the simultaneous needs of molecular detail, model tractability and agreement with the experimental data.

## 8. Concluding remarks

The growth of microorganisms arises from the conversion of nutrients in the environment into biomass, mostly proteins and other macromolecules, by intracellular networks of biochemical reactions. The aim of this paper has been to review the literature in the context of a general modelling framework derived from basic assumptions about microbial growth and biochemical reaction networks. In particular, we have considered the cells in a population as a non-segregated aggregate, characterized by their combined volume rather than by a distribution of individual cells. Concentrations of molecular constituents were correspondingly defined over the entire population volume and, at all times, the total mass of molecular constituents was assumed proportional to the population volume (constant biomass density). The dynamics of this system was described by a deterministic ODE model. Figure 5 summarizes some of the fundamental modelling choices underlying the modelling framework developed here [45–47].

The modelling framework has allowed the discussion of a broad variety of models integrating growth of microbial populations with the dynamics of the underlying reaction networks. The contribution of this paper does not so much lie in the derivation of the modelling framework, because most of the assumptions made and arguments advanced can be found in the (older) literature. Rather, the interest lies in bringing these insights together and making explicit modelling assumptions that are often forgotten or whose consequences may not always be recognized, including the careful consideration of the units of the different quantities. For example, this has brought to the fore that the first-order growth dilution term appearing in many models originates from the proportionality of the biomass and aggregate population volume. Moreover, the definition of biomass as the mass sum of the molecular constituents in the cell population was seen to lead to an explicit, analytic expression for the growth rate (instead of a heuristic definition added *a posteriori*). Finally, the fact that the total concentration of molecular constituents is constant contributes a constraint that can be usefully exploited for model calibration [37,38]. In general, making explicit the assumptions that underlie a model is critical for its use as a ‘logical machine’ converting assumptions about biological processes into testable predictions [122].

The focus on non-segregated, deterministic models entails a bias in that it ignores such important phenomena as transport, cell division and population heterogeneity. The existence of a lipid membrane containing proteins that allow the uptake and secretion of metabolites is one of the defining characteristics of microbial cells. A specific class of self-replicator models, sometimes referred to as protocells, addresses this issue by coupling biochemical processes inside the cell to the growth of the cell membrane, in some cases explicitly accounting for the three-dimensional cell shape and cell division [37,123,124]. The engineering of actual protocells is an interesting branch of ongoing work at the frontier of biological chemistry and biophysics [125,126], with applications in biotechnology [127]. Biomass synthesis and cell division are precisely coordinated during microbial growth [128], but the underlying mechanisms involved are still not well understood. Some variants of the above-mentioned protocell models, describing biomass accumulation and cell division in yeast on the global level, have integrated a simplified representation of the network controlling the cell cycle to provide a mechanistic basis for the synchronization of growth and division [124,129].

Population heterogeneity plays a key role in such diverse phenomena as resistance to antibiotics and biofilm formation. Heterogeneity often arises from the stochasticity of biochemical reactions, amplified by the small numbers of the cellular constituents involved in the reactions, especially in gene expression [63,130]. Stochastic models are necessary to explore bistability, the mathematical property that lies at the heart of the above-mentioned forms of population heterogeneity, but that cannot be analysed with the deterministic models discussed here. While full-scale stochastic models of the biochemical networks underlying cellular growth and division are rare, some models do introduce stochastic variables for mRNA and protein constituents [33,129]. For instance, one of the interesting aspects of the whole-cell model of *M. genitalium* [33] is that it combines a variety of different modelling formalisms for different cellular functions, including deterministic (FBA) models of metabolism, deterministic (ODE) models for cell division, and stochastic models for transcription, translation, and degradation of mRNA and proteins.

To a first approximation, current modelling efforts push in two directions. The first strategy attempts to construct whole-cell models that are as complete as possible, including a maximum of knowledge of cellular components and their interactions on the molecular level. The resulting models provide a detailed executable map of the cell with a variety of uses, for example the *in silico* screening of the effects of drug candidates, the design of genetically-modified organisms or the identification of gaps in our knowledge [131]. Owing to their size and complexity, however, the models are difficult to build, maintain and revise, requiring a sustained community effort for all but the simplest cells. Moreover, the level of detail included in the models may not make them most suitable for apprehending global principles of growth control shared between different microorganisms.

A second strategy consists in increasing the coarseness of the models while preserving their scope, notably by coupling growth to intracellular biochemical processes. The resulting models are much more tractable from a mathematical and computational point of view, and they are particularly suited for exploring the consequences of hypotheses on the global architecture of growth control. On the other hand, by stripping away molecular details and focusing on a few explanatory principles, such coarse-grained models run the risk of losing key features of microbial cells. In particular, the complexity of regulatory mechanisms may lead to unexpected cross-talk between cellular functions not accounted for in abstract models but possibly critical for their predictive success. Moreover, in addition to contributing to the beauty of living systems [132], the molecular details of regulatory mechanisms may also be important for matching the model with quantitative data and for understanding evolutionary trajectories of microorganisms. As an illustration of the latter point, a recent study attributed the increased growth of an *E. coli* strain in minimal media observed in adaptive laboratory evolution experiments to specific point mutations in the *β* subunit of RNA polymerase [133].

In our view, one of the most promising directions for further work lies in finding original combinations of the above-mentioned strategies. In particular, local fine-graining of functions of interest in a coarse-grained model of the cellular machinery responsible for growth and division may yield models that are at the same time robust over a range of growth conditions and that can be related to specific regulatory mechanisms on the molecular level. From the point of view of experimental validation, such models would have the advantage that predictions of the behaviour of modules developed in molecular detail can be directly tested against experimental data, as they will correspond to measurable concentrations of molecular constituents. At the same time, the embedding of detailed modules in a global model of cellular physiology will widen its applicability to experimental scenarios in which growth or other major aspects of the physiological state are perturbed. The approach also exemplifies the well-known adage that models are not universal but developed for a specific question. Indeed, combining local fine-graining with a coarse-grained view of cellular physiology does not yield a single model, but rather a family of models each developing in detail a specific function or mechanism, depending on the question at hand.

## Data accessibility

Electronic supplementary material is available from the journal web site.

## Authors' contributions

All authors contributed to the development of the ideas summarized in the manuscript. H.d.J. drafted the manuscript and S.C., N.G., E.C., D.R., J.G. and J.-L.G. helped draft the manuscript. All the authors gave their final approval for publication.

## Competing interests

We declare we have no competing interests.

## Funding

This work was funded by the Programme Investissements d'Avenir, Bio-informatique, RESET (ANR-11-BINF-0005) and the Inria Project Lab AlgaeInSilico. We thank the research program Labex SIGNALIFE (ANR-11-LABX-0028-01) and Conseil Régional PACA for partial funding of the PhD thesis of S.C.

## Footnotes

Electronic supplementary material is available online at https://dx.doi.org/10.6084/m9.figshare.c.3929395.

- Received July 12, 2017.
- Accepted October 31, 2017.

- © 2017 The Author(s)

Published by the Royal Society. All rights reserved.