## Abstract

Model-guided design has become a standard approach to engineering biomolecular circuits in synthetic biology. However, the stochastic nature of biomolecular reactions is often overlooked in the design process. As a result, cell–cell heterogeneity causes unexpected deviation of biocircuit behaviours from model predictions and requires additional iterations of design–build–test cycles. To enhance the design process of stochastic biocircuits, this paper presents a computational framework to systematically specify the level of intrinsic noise using well-defined metrics of statistics and design highly heterogeneous biocircuits based on the specifications. Specifically, we use descriptive statistics of population distributions as an intuitive specification language of stochastic biocircuits and develop an optimization-based computational tool that explores parameter configurations satisfying design requirements. Sensitivity analysis methods are also performed to ensure the robustness of a biocircuit design against extrinsic perturbations. These design tools are formulated with convex optimization programs to enable rigorous and efficient quantification of the statistics. We demonstrate these features by designing a stochastic negative feedback biocircuit that satisfies multiple statistical constraints and perform an in-depth study of noise propagation and regulation in negative feedback pathways.

## 1. Introduction

The last two decades of intense efforts in synthetic biology have greatly expanded our ability to build synthetic biomolecular circuits by adopting the concepts and techniques from engineering disciplines. Model-guided design is one of such examples that have been routinely used to create safe and robust control systems in traditional engineering [1] and have been adopted in the design process of biocircuits [2]. To date, many biocircuit modules were engineered with the help of model-based simulations, including logic gates [3–5], oscillators [6–9] and genetic memory [10,11] to name a few. A current challenge of biocircuit engineering is to integrate these circuit modules and build systems for complex operations in real-world environments, which requires more stringent reliability of each circuit module. As is the case with any engineering system, a first key step to the robust design of such complex systems is to set appropriate and well-defined performance norms that can specify all the necessary features of systems' behaviour. Mathematical and computational tools then facilitate design space exploration to find parameter configurations that achieve pre-specified performance requirements. Compared with this ideal, the current design process of biocircuits is still immature in that models are used mostly for simulations to gain only qualitative insights rather than for quantitatively guaranteeing the performance of biocircuits by fully benefitting from advanced theory and algorithms. This motivates us to develop computational frameworks that systematically certify and optimize the performance levels of biocircuits.

One of the important features that should be carefully considered in the design process of biocircuits is cell-to-cell variation due to the stochastic chemical reactions in cells. Since the abundance of messenger RNA (mRNA), proteins and other reacting molecules, which define the state of biocircuits, is often low in biological cells, the events of molecular collision firing chemical reactions tend to be random and discrete in time. This leads to the temporal fluctuation of molecular abundance, which is referred to as intrinsic noise, and causes the variability of biocircuit states between cells [12–14]. To guarantee robustness against intrinsic noise, some biomolecular reactions in nature are known to comprise feedback regulation that attenuates the level of noise [15], and others actively leverage intrinsic fluctuations to achieve probabilistic differentiation and decision-making [16]. In the synthetic biology community, these stochastic regulation mechanisms were actively studied to create systems that enable collective decision-making using stochastic responses from multiple cells (e.g. [17–21]). As yet, however, computational tools are lacking for the rational design of biocircuits with user-specified noise characteristics. This situation suggests the necessity to develop a design-oriented computational tool for stochastic biocircuits based on a stochastic model of biomolecular reactions.

The dynamics of the intrinsic fluctuations are often modelled by a Markov process, where the Markov state is a semi-infinite lattice of integers representing the copy numbers of molecules in a cell, and a state transition occurs upon a chemical reaction. Unfortunately, the solution of the governing equation, chemical master equation (CME) [22], is not mathematically tractable in many cases. As a result, the majority of studies in current synthetic biology resort to time-consuming Monte Carlo-based simulations of single-cell trajectories to approximately evaluate the statistics of the cell population [23]. In recent years, other approaches were developed to complement the Monte Carlo approach by approximately computing the distribution [24] and its raw moments [25–27]. Among these, the moment expansion of the CME [25–27] is particularly useful for the design of stochastic biocircuits as it derives the equations of moment dynamics, which enable intuitive specifications of underlying distributions using descriptive statistics such as the mean and the variance. However, the approximate nature of the existing approaches often makes it difficult to rigorously quantify the statistics of the distributions. This is critical especially for the robust design of stochastic biocircuits with stringent specifications.

In this paper, we present a design-oriented computational framework for rigorous quantification of steady-state statistics of stochastic biocircuits using a moment approach (figure 1). To enable rational design of biocircuits based on intuitive specifications of noise statistics, we use the moment expansion of the CME and formulate optimization programs that directly compute descriptive statistics of molecular abundance based on the moment computation method [28–30]. In many cases, deciding a moment of some order requires information of higher order moments, leading to an infinite hierarchy of equations that is hard to solve without approximation. Here, we overcome this limitation by first truncating high order moments from the equations and then searching for all possible values that the truncated moments can take using mathematical optimization. With this approach, it becomes possible to compute the rigorous upper and lower bounds of statistical values using a finite set of equations. In particular, this problem is formulated as a class of convex optimization programs, for which efficient computational algorithms are studied in engineering science [31].

Our optimization-based synthesis approach is capable of characterizing feasible design parameter space that satisfies multiple and possibly incompatible statistical design requirements, allowing for rational engineering of biocircuits based on rigorous and intuitive specifications. These features are demonstrated using examples of a self-negative feedback biocircuit, where a repressor protein regulates its own expression. We show that, surprisingly, descriptive statistical values such as the mean and the variance of distributions are determined almost exactly by a truncated moment equation with the first sixth- or the eighth-order moments despite the infinite hierarchy of the original moment equation. Informed by these analysis results, we further perform an in-depth study of the negative feedback biocircuits to eluciate the mechanism of noise reduction and amplification by the feedback pathway. Finally, parameter sensitivity of the statistical values is analysed to ensure robustness of biocircuits against uncertain model parameters.

## 2. Results

### 2.1. Mathematical model of stochastic biocircuits

We start with a general model of stochastic biomolecular reactions and introduce an ordinary differential equation (ODE) model that describes the evolution of stochastic moments of biocircuits. Suppose a biocircuit consists of *n* species of molecules and *r* types of chemical reactions, and the copy numbers of the *n* molecules, or the state of biocircuits, fluctuate randomly in time due to the stochastic chemical reactions. To model the stochastic dynamics, we denote the copy number of the *j*th molecule by *x*_{j} (*j* = 1, 2, …, *n*) and define the probability that there are * x* = [

*x*

_{1},

*x*

_{2}, …,

*x*

_{n}]

^{T}molecules in a cell at time

*t*by

*P*

_{x}(

*t*). As an illustration example, we consider a simple transcription–translation process in figure 2

*a*. In this example, mRNA and protein copy numbers are the state variables of the biocircuit (

*n*= 2), and there are four reactions (

*r*= 4), transcription, translation and degradation of mRNA and protein. The heterogeneity of a cell population is then captured by the joint distribution of mRNA and protein copy numbers

*P*

_{[x1}

_{,x2}

_{]T}(

*t*), where

*x*

_{1}and

*x*

_{2}denote the copy numbers of mRNA and protein, respectively.

In general, the dynamics of the probability distribution *P*_{x}(*t*) are modelled by a set of ODEs called CME [22].
2.1where *w*_{i}( · ) is a propensity function (reaction rate) associated with the *i*th chemical reaction (*i* = 1, 2, …, *r*), and **s**_{i} is a *n*-dimensional row vector representing the stoichiometry of the *i*th reaction. We assume that the reactions are elementary and thus *w*_{i}( · ) is a polynomial of *x*_{j} (*j* = 1, 2, …, *n*) [32]. Specific forms of *w*_{i}( · ) and **s**_{i} for the transcription–translation process in figure 2*a* are summarized in electronic supplementary material, S.2. Unfortunately, the CME is hard to solve analytically in terms of *P*_{x}(*t*) as the entries of the vector * x* in (2.1) take all combinations of non-negative numbers, and the CME is composed of infinitely many equations. However, it is possible to obtain approximate distributions of

*P*

_{x}(

*t*) by simulating many numbers of single-cell trajectories using a Monte Carlo approach [23] as illustrated in figure 2

*b*. To quantitatively capture the important features of population distributions, we often use descriptive statistics such as the mean and the covariance , where the notation is defined by with the sum taken over all non-negative combinations of

*x*

_{j}'s. Other examples of useful descriptive statistics are the coefficient of variation (CV) and Fano factor , both of which quantify the dispersion of distributions. Correlation is also a useful measure when we are interested in the relation between two molecules.

The design-oriented computational framework presented in this paper complements the Monte Carlo approach by allowing for rigorous evaluation of descriptive statistics without approximation. For this purpose, we first derive an ODE model that describes the dynamics of raw moments, or a moment equation for short, based on the CME (2.1). To elucidate the following mathematical development, we first consider a specific model of the transcription–translation process in figure 2*a*. Let * m*(

*t*) denote a vector of raw moments , where we define raw moments of a distribution

*P*

_{[x1}

_{,x2}

_{]T}(

*t*) by 2.2In what follows, we derive an ODE model for

*(*

**m***t*) to describe the dynamics of the raw moments. The basic idea for the derivation is to multiply

*x*

_{j}′s to both sides of the CME (2.1) and take the sum of

*x*

_{j}′s for all non-negative numbers. For example, we obtain the dynamics of the variance by multiplying

*x*

^{2}

_{1}to both sides and taking the sum as 2.3where the left-hand side is by definition, and the last equality is obtained by substituting the propensity functions

*w*

_{i}(

*) and*

**x***P*

_{x}(

*t*) = 0 for

*x*

_{j}< 0 (

*j*= 1, 2). Using this approach, the moment dynamics are modelled by the ODE 2.4where the first entry of represents the sum of the zero-th order moments of

*x*

_{1}and

*x*

_{2}and guarantees the sum of the probability

*P*

_{x}(

*t*) to be one. Note that no approximation is used in the derivation so far, and equation (2.4) is a linear ODE. Thus, the temporal dynamics of the raw moments

*can be analytically obtained by solving (2.4).*

**m**When the reactions reach steady state, the left-hand side of (2.4) goes to zero, leading to a set of linear equations. Thus, the first- and second-order steady-state raw moments of the protein copy number *x*_{2} are obtained as
2.5and
2.6by solving the linear equations. Using these solutions, we can further compute the variance of the protein copy number as
2.7Substituting parameter values, we can confirm that the analytic solution indeed agrees with the simulated statistics (figure 2*c*). These analytic solutions provide useful insights for the design of stochastic biocircuits and allow for rigorous characterization of the parameter space that satisfies given design requirements. For example, (2.5) and (2.7) suggest that the mean of the protein copy number depends only upon the product of transcription and translation rates normalized by degradation rates, (*k*_{1}*k*_{3})/(*k*_{2}*k*_{4}), and that the variance of the protein copy number can be independently modulated by tuning *k*_{3}/(*k*_{2} + *k*_{4}) while maintaining the ratio (*k*_{1}*k*_{3}*D*_{T1})/(*k*_{2}*k*_{4}) to be constant.

### 2.2. Computing descriptive statistics using semi-algebraic optimization

Unfortunately, moment equations are not analytically solvable when a biocircuit of interest is slightly more complicated. This is because a moment of some order is dependent on higher order moments in many cases, and the moment equation becomes an infinite dimensional equation. More formally, we define raw moments of a distribution *P*_{x}(*t*) by
2.8with and refer to the sum as the order of the moment. Using the same approach to deriving (2.3), a general form of the moment equation is obtained as
2.9where *A* and *B* are constant matrices, * m* is a vector of raw moments up to the

*μ*th order and

*is a vector of the*

**u***μ*+ 1th or higher order moments

*(see electronic supplementary material, S.3 for details). Note that (2.4) is a special case of (2.9) with*

**u***B*= 0. Equation (2.9) implies that the

*μ*th order moments

*, which are the moments of our interest, depend upon the higher order moments*

**m***u*. Thus, it is not possible to uniquely determine the solution of the steady-state moment equation

*A*+

**m***B*= 0 as there are more variables than equations. In fact, analytic steady-state moments are available only in the special case of

**u***B*= 0, in which case

*is obtained by solving*

**m***A*= 0 as shown in (2.5) and (2.6). In general,

**m***B*= 0 holds if and only if all reactions are the zero-th or the first order, that is, the reaction rates

*w*

_{i}(

*) are affine in*

**x***(see electronic supplementary material, S.3).*

**x**To see an example, we consider a negative feedback biocircuit in figure 3*a*, where the expression of the repressor protein is self-regulated by the negative feedback. This biocircuit, despite a slight extension of figure 2*a*, contains a bimolecular reaction, namely the binding of the repressor to the promoter whose propensity function is given by *w*(* x*) =

*k*

_{5}

*x*

_{2}

*x*

_{3}. As a result, the matrix

*B*in (2.9) is no longer zero, and there are infinitely many solutions for the steady-state moment equation unless we know additional information that links

*and higher order moments*

**m***. A typical approach to determining the solution is to approximately express the higher order moments*

**u***using*

**u***m*by assuming some closure relation such that

*: =*

**u***φ*(

*) with some deterministic function*

**m***φ*( · ) [27]. Consequently, the moment equation becomes a closed equation , by which we can obtain an approximate solution of

*. A drawback of this approach is, however, that finding a good closure relation based on the CME is not an obvious task, and the approximation error is usually unknown*

**m***a priori*, which is critical for the design problem of stochastic biocircuits.

To overcome these issues, we here consider exploring all possible combinations of * u* and find a possible range of values that the raw moments

*can take without assuming the closure relation. A key observation is that the variables*

**m***and*

**m***must constitute moments of some probability distribution. In fact, there are algebraic conditions that the variables*

**u***and*

**m***must satisfy to be moments of some probability distribution defined on the positive orthant {[*

**u***x*

_{1},

*x*

_{2}, …,

*x*

_{n}] |

*x*

_{j}> 0,

*j*= 1, 2, …,

*n*} (electronic supplementary material, proposition A.1) [33,34]. As illustrated later, these conditions enable a systematic search of the raw moments over the constrained domain using mathematical optimization. Incorporating these conditions, we have the following proposition.

### Proposition.

*Consider stochastic chemical reactions modelled by the CME* (*2.1*). *The steady-state moments of the probability distribution satisfy*
2.10*where H*^{(γ1}^{)}(* m*,

*)*

**u***and*

*H*

_{k}

^{(γ2}

^{)}(

*,*

**m***)*

**u***represent moment matrices defined in electronic supplementary material,*(

*A.17*)

*and*(

*A.18*).

*The symbol X*⪰

*O represents that a matrix X is positive semi-definite*.

This proposition implies that the raw moments * m* and

*lie in the semi-algebraic set (2.10), which is characterized by the finite number of moments. Thus, we can search for all possible combinations of*

**u***and*

**m***without considering the infinite hierarchy of moment equations and find the upper and/or lower bounds of statistical values such as the variance, CV and Fano factor of molecular copy numbers. In what follows, we show that the problem of finding the upper and the lower bounds of these statistics can be formulated as a mathematical optimization problem, which we can solve efficiently using existing algorithms of mathematical programming.*

**u**We consider the negative feedback biocircuit in figure 3*a* and define *x*_{1}, *x*_{2} and *x*_{3} as the copy numbers of mRNA, repressor protein and free DNA, respectively. This biocircuit was previously analysed as a benchmark example of a feedback biocircuit using a moment closure approach [35]. Here, our goal is to compute the steady-state mean and the CV values of the repressor copy number *x*_{2} with mathematical rigour instead of using the approximate closure relation. To this end, we use the semi-algebraic constraint (2.10) and formulate a maximization problem of the mean and the CV as
2.11where *h*(* m*) is defined by for the mean, and
2.12for the CV, respectively. The solution of this problem then gives an upper bound of the mean and the CV, respectively. It should be noted that the lower bounds are obtained by solving the same form of optimization but by maximizing −

*h*(

*). For the biocircuit in figure 3*

**m***a*, the constraint (2.10) is specifically written as 2.13 2.14and 2.15with and when the moment equation is truncated by the first order, that is,

*μ*= 1. An advantage of using the optimization approach is that we can leverage efficient algorithms for mathematical optimization, whose techniques were extensively studied in engineering science. More specifically, the computation of these summary statistics can be formulated as semi-definite programming (SDP), which is a subclass of convex optimization program that maximizes a linear objective function over a semi-algebraic set specified by positive semi-definite matrices [31,36]. For example, the objective function for the mean maximization problem is

*h*(

*) =*

**m***m*

_{[0,1,0]}, which is linear, and the linear equality constraint in (2.10) can be rewritten as semi-definite constraints

*D*⪰

*O*and −

*D*⪰

*O*using the diagonal matrix

*D*with the

*i*th diagonal entry being the

*i*th entry of the vector

*A*+

**m***B*. Interior point method [37], which is implemented in many optimization software packages, then allows for efficient search of the globally optimal solution.

**u**Using the SDP approach, we computed the lower and the upper bounds of the mean repressor copy number for different values of *μ*, which is a user-specified parameter that determines the largest order of moments in the vector * m* in (2.10) (figure 3

*b*). As we increase

*μ*, the gap between the upper and lower bounds decreases, in general, allowing for better estimation of statistical values at the expense of computational time (see electronic supplementary material, S.4). Interestingly, for the biocircuit in figure 3

*a*, the estimated mean copy number of the repressor converged to 14.7 by only using the

*μ*= sixth order moments (figure 3

*b*), implying that the information of the mean is mostly encoded in the first sixth order moments. The optimization result was also consistent with the mean value obtained by Monte Carlo simulations [23] (figure 3

*c*).

The upper bound of the CV is an important performance norm to quantify the dispersion of the population distribution. As the objective function *h*(* m*) in (2.12) is not linear, the optimization for the CV is not directly solvable by SDP. However, the problem can be equivalently transformed to a minimization problem of

*γ*satisfying 0 <

*γ*≤ 1 and . The second inequality constraint can be equivalently represented by a semi-definite matrix using the Schur complement [31], leading to the form of SDP (see electronic supplementary material, S.5 for the details). We computed the upper bounds of the CV for different values of

*μ*as illustrated in figure 3

*d*, where the lower bounds were also computed for a reference (see electronic supplementary material, S.5). We observe that, similar to the mean value, the lower and the upper bounds of the CV converge to each other as we increase

*μ*.

Similar mathematical techniques apply to other descriptive statistics and allow us to rigorously compute variance, Fano factor and confidence ellipsoids of the molecular copy numbers using SDP (table 1). Specific forms of these optimizations and their mathematical proofs are summarized in electronic supplementary material, S.5. As an illustrative example, we computed the largest confidence ellipsoid of mRNA and protein copy numbers of the transcription–translation circuit in figure 2*a*. The two-dimensional confidence ellipsoid allows for visualizing the correlation between the two molecules (electronic supplementary material, figure S.1). In the design process, the confidence ellipsoids would be useful to investigate how tightly a target molecule is regulated by an upstream molecule. Other optimizations will be demonstrated in the following sections along with the design examples of a negative feedback biocircuit.

### 2.3. Synthesizing biocircuits with statistical design specifications

The process of biocircuit engineering requires many iterations of design–build–test cycles to achieve prescribed performance requirements. As the specifications are possibly incompatible or conflicting, computational design tools are important to efficiently explore and find the feasible design space of biocircuits. Our optimization approach allows for rigorous characterization of biocircuit parameter space satisfying multiple design requirements described by statistical constraints (table 1).

As a demonstration example of our optimization-based biocircuit synthesis method, we consider a biocircuit in figure 4*a*, where a reporter protein is added to the downstream of the negative feedback circuit in figure 3*a*. As design specifications, we require the biocircuit to satisfy the following three performance criteria at steady state

(i) the mean copy number of the repressor molecule is at least 20, ,

(ii) the CV of the repressor protein is less than 0.30, and

(iii) the CV of the reporter protein is less than 0.90, ,

where *x*_{2} and *x*_{6} denote the copy number of the repressor and the reporter proteins, respectively.

For illustration purpose, we consider the case where we can tune two parameters, the translation rate *k*_{3} and the degradation rate *k*_{4} of the repressor protein. These parameters can be tuned, for example, by engineering the ribosome binding site [38] and degradation tags of the protein, respectively. To find the parameter values that satisfy the specifications (i)–(iii), we solved the SDP for the mean and the CV values for different sets of fixed parameters in the parameter space. For each set of parameters, the optimization programs delivered the bounds of the mean repressor abundance (figure 4*c*), and the upper bound of the CV values of both repressor and reporters (figure 4*d*,*e*). Compiling these figures, we obtained the parameter map showing the feasible design space satisfying all of the three statistical design requirements (figure 4*b*).

Figure 4*b* implies that the three design features (i)–(iii) are in a trade-off relationship in that moving a parameter to one direction satisfies one constraint but violates another. Thus, one needs to carefully choose the parameters in the intersection of the feasible domains for the three constraints. The design–build–test cycles of biocircuits can be accelerated with this parameter map as it allows for narrowing the potential combinations of genetic parts to be tested and implemented.

To verify the result of the parameter space exploration, we simulated the stochastic biomolecular reactions of the negative feedback biocircuit using the stochastic simulation algorithm [23], where the parameters were taken from the feasible design space as illustrated by the red dot in figure 4*b*. As expected, the mean copy number of the repressor protein was 26.2 copy, the CV of the repressor and the reporter proteins was 0.273 and 0.725, respectively, which meets all the design specifications (electronic supplementary material, figure S.2).

### 2.4. Noise attenuation requires balanced expression and repression

We further investigated the statistical values of the negative feedback biocircuit in detail to better understand the underlying mechanisms that limit the feasible design space as shown in figure 4*b*. Figure 4*c* shows that increasing the translation rate of the repressor protein results in the increase of the repressor copy number at steady state despite the negative feedback. This implies that the translation of mRNA has more influence on the total copy number of the repressor protein than the negative feedback. Thus, a design strategy for meeting the specification (i), which is to maintain the copy number of the repressor at least 20 molecules, is to increase the translation rate *k*_{3}. On the other hand, figure 4*d* illustrates that the CV of the repressor copy number does not decrease monotonically with *k*_{3}, suggesting that the two design features (i) and (ii) are in a trade-off relationship.

It is interesting to observe that there is an optimal strength of translation *k*_{3} that minimizes the CV of the repressor copy number (figure 4*d*). As the intrinsic noise of biocircuits comes from the low copy nature of molecules, it is counterintuitive that both of the mean and the CV of the repressor increase at the same time as shown in figure 4*c*,*d*. We observed that this trend is generic for a wide range of the repressor–promoter binding rate *k*_{5}, which directly controls the strength of the negative feedback (figure 5*a*). We suspect that this is due to a trade-off relation between the strength of repression and transcription. More specifically, increasing the translation rate results in the attenuation of noise due to the high copy numbers of the repressor protein, but at the same time, it also increases the variance of the mRNA copy number due to the strong repression as illustrated in figure 5*b*. Then, the highly stochastic mRNA transcription indirectly contributes to increasing the dispersion of the copy number of the repressor protein. Figure 5*a* suggests that the former is dominant when the translation rate *k*_{3} is small, but the latter becomes dominant as the increase of *k*_{3}. These analysis results suggest that balancing the repression and the expression is the key to attenuate the noise in negative feedback biocircuits.

### 2.5. Comparing noise statistics of reporter protein and target molecule

A typical approach to probing molecular concentrations, or internal states, of biocircuits without disrupting cells is to express a fluorescent reporter protein at the downstream of a target molecule. The internal states are then indirectly quantified based on the fluorescence measurements. Ideally, a reporter protein would reflect the statistics of the target molecule including fluctuations. Thus, understanding the relation of noise statistics between the reporter protein and the target molecule is important for rigorously quantifying the internal states of biocircuits.

For the biocircuit in figure 4*a*, the copy number of the reporter protein is expected to negatively correlate with the repressor protein (*x*_{2}) and positively correlate with the reporter DNA (*x*_{4}) that is not bound by the repressor. To compare the mean expression levels of the free DNA and the reporter protein, we solved the steady-state moment equation for and obtained
2.16This implies that the mean copy number of the reporter protein does reflect that of the free DNA in a linear fashion, which is practically useful for quantification. We further investigated the relation between the mean values of the reporter and the repressor and plotted their relation in electronic supplementary material, figure S.3 (electronic supplementary material, S.6). Interestingly, the transfer curve is slightly off from the well-known Michaelis–Menten kinetics obtained from a deterministic ODE model. This implies that the noise in the upstream repressor biocircuit affects the mean value of the downstream reporter and that the simple Michaelis–Menten kinetics is erroneous especially when the biocircuit is highly stochastic.

Next, we investigated how the fluctuation of the free DNA abundance (*x*_{4}) is related to the variance of the reporter expression level by solving the SDP for the variance. As illustrated in figure 5*c*, the variance of the reporter is amplified through the transcription and the translation process. Ideally, the variance of the reporter should correlate with that of the free DNA in a linear fashion to enable facile quantification of the noise level. However, figure 5*c* shows that it is actually a nonlinear function of the free DNA variance. Thus, it is necessary to use a calibration curve such as figure 5*c* to quantitatively probe the higher order statistics of the free DNA. These analyses suggest that the expression level of a reporter protein should be carefully tuned especially when one wants to quantify not only the mean but also the variance of a target molecule.

### 2.6. Sensitivity analysis of descriptive statistics

The ability of model-based biocircuit design is currently limited by the uncertainty of parameter values in mathematical models. The source of the uncertainty partly lies in misidentified parameters due to insufficient and noisy measurements, but more inherently, it lies in extrinsic perturbations to host cell environments such as growth conditions. As a result, the process of biocircuit engineering often requires ad hoc tuning of circuit parameters to deal with the deviation of circuit performance from model predictions. Sensitivity analysis allows for quantifying the impact of model uncertainties on the behaviour of biocircuits and finding sensitive design parameters that need special attention in the build process.

We performed sensitivity analysis of the negative feedback biocircuit as shown in figure 4*a* around a nominal parameter value shown as the red dot in figure 4*b*. Specifically, let ** k*** = [

*k**

_{1},

*k**

_{2}, …,

*k**

_{r}]

^{T}denote a vector of the nominal parameters. We consider a parametric perturbation

*** ± Δ**

*k*

*k*_{i}, where Δ

*k*_{i}: = [0, 0, …, 0, Δ

*k*

_{i}, 0, …, 0]

^{T}is a (small) perturbation to the nominal parameter

*k**

_{i}. The goal of the sensitivity analysis is to find the range of the statistics under the perturbation |Δ

*k*

_{i}|≤

*δ*, where

*δ*is a given constant.

Using the SDP in electronic supplementary material, S.5, we computed the lower and the upper bounds of the mean and the CV of the repressor protein copy number when *k*_{1}, *k*_{2}, *k*_{3} and *k*_{4} are perturbed within 5%, 10% and 20% of the nominal value, and then plotted the worst-case statistics (the largest and the smallest values of the statistics) in figure 6. More specifically, the worst-case statistics were obtained by iteratively solving the SDP for the set of parameters within the range of the perturbation. Figure 6 shows that the mean and the CV values do not violate the design specifications (i) and (ii) even with the 20% perturbations, implying that the negative feedback biocircuit designed in figure 4 is quite robust. Looking further into figure 6*a*, we observe that the deviation of the mean copy number is almost equal between the perturbations, and there are no highly sensitive parameters that significantly affect the mean expression level of the repressor. On the other hand, figure 6*b* shows a clear trend that the transcription rate and the degradation rate of the protein, *k*_{1} and *k*_{4}, have larger impact on the CV of the repressor than the other two parameters. These results suggest that the promoter strength and the degradation machinery of the protein such as degradation tags should be more carefully tuned than the other tuning knobs to achieve a high signal-to-noise ratio.

Another important but often overlooked design parameter of biocircuits is the plasmid copy number, which is controlled by the replication origin of a circuit plasmid. Although the plasmid copy number is assumed constant as illustrated in figure 4*b*, variance of the plasmid copy number in real biological cells affects the behaviour of biocircuits. As exact control of the plasmid abundance is not possible, the variability of the plasmid copy number should also be considered to guarantee the robustness of biocircuits. We applied the same approach as above to compute the worst-case statistics of the negative feedback biocircuit against 5%, 10% and 20% deviation of the copy number of the repressor plasmid from the nominal value *D*_{T1}. As illustrated in figure 6*b*, the plasmid copy number is one of the most sensitive parameters that affects the CV of the repressor. Fortunately, figure 6 suggests that the range of the CV still remains within the performance specification (ii) even under the 20% perturbations. This concludes that the designed negative feedback biocircuit can operate robustly under the extrinsic uncertainty of the plasmid copy number.

## 3. Discussion

A promising approach towards robust engineering of complex biocircuits is to guarantee the performance of individual circuit components at high precision. However, the highly stochastic nature of biomolecular reactions often becomes an obstacle for rigorous assessment of biocircuit behaviours. To overcome this problem and enable model-guided robust design of stochastic biomolecular reactions, this paper has presented a computational framework that can mathematically certify the possible range of statistical values that the copy numbers of molecules can take.

Rigorous analysis of stochastic chemical reactions has been a long-standing issue in systems and synthetic biology. The source of the difficulty comes from the fact that the CME and its associated moment equations form an infinite hierarchy of equations. To circumvent this issue, the equations are often approximated based on some assumptions of underlying distributions. For the CME, linear noise approximation [25], *τ*-leaping and Langevin equations [39–42] are popular approaches to approximately simulate the stochastic fluctuations. For moment equations, moment closure is a widely accepted method to truncate the infinite hierarchy of equations using a fixed closure relation [27,43]. Although computationally efficient, these existing approaches often suffer from approximation errors that are not computable beforehand, which is critical for the design purpose of biocircuits.

In contrast with these approaches, the presented method allows for testing all possible values of statistics and finding their bounds without fixing the closure relation. The key to the theoretical development is the semi-algebraic conditions (2.10), also known as Stieltjes moment conditions [33,34], which enable computationally efficient search of moment values using convex optimization. Surprisingly, figure 3*b*,*d* suggests that the infinite hierarchy of moment equations can be solved almost exactly by considering the truncated moment equations with the sixth- or the eighth-order moments. In fact, all computational results presented in this paper are based on the equations with at most eighth- order moments. This implies that the underlying distributions of the molecular abundance can be mostly explained by these moments despite the infinite dimensional equations, which was also qualitatively observed in [44].

From a design viewpoint, the proposed method enables the worst-case analysis of biocircuit performance as it strictly guarantees the range of statistical values. It is thus amenable for robust synthesis of stochastic biocircuits that require high reliability. To demonstrate these features, we have explored the design space of the negative feedback biocircuit in figure 4*a* and obtained the parameter map of feasible design space with which the biocircuit satisfies design requirements. The optimization-based analysis elucidated that there is an optimal translation rate of the repressor protein that best attenuates intrinsic noise and that it is caused by the trade-off relation between repression and expression. It is worth noting that a similar trade-off relation was previously predicted for a metabolic pathway [45] and the repressilator [46] based on approximated model-based analyses. A similar U-shaped trend in figure 5*a* was also observed by experiments [47]. These examples suggest that even simple biocircuits can exhibit complex noise characteristics, which emphasizes the importance of advanced mathematical and computational frameworks for analysing stochastic biocircuits. In fact, increasing efforts have been made to enable rigorous quantification of stochastic biocircuits in recent years. In [29,30], a similar optimization-based approach was explored independently from this work, and the optimization programs were formulated to compute the raw moments instead of the descriptive statistics as demonstrated in this paper. The stationary finite state projection was also proposed to enable rigorous quantification of population distributions at steady state [48].

An important design feature that has not been addressed in this paper is multi-modality of distributions. In synthetic biology, multi-modal population distribution is often associated with multi-stability of the governing dynamics of biocircuits and is used to build switch-like systems [17]. Although an optimization-based approach was recently developed to design multimodal biocircuits by directly minimizing the deviation of the distribution from a desired shape [49], the lack of well-defined statistical metric for multi-modal distribution currently hinders direct application of our statistics-based biocircuit synthesis approach to such systems. Another important design feature that has not been addressed in this work is the temporal evolution of distributions. Oscillator is one of the important biocircult elements that periodically regulates the temporal behaviours of circuit states [8,13,50,51]. However, incorporating the transient and dynamic design features in our optimization framework is not straightforward at this point, though approximate analysis was previously demonstrated with the moment expansion approach [35]. Hence, our future efforts will be devoted to enabling rational biocircuit design with more complex statistical design specifications including multi-modality and transient behaviours.

## 4. Methods

### 4.1. Stochastic simulations

The stochastic simulation algorithm [23] was used to simulate time trajectories of molecular copy counts; 10 000 cells were simulated to draw a snapshot of the population distribution at 1440 min in figures 2*b*, 3*c* and electronic supplementary material, figure S.2 using MATLAB 2016b. The following parameter values were used for the simulations: *k*_{1} = 0.2 min^{−1}, *k*_{2} = ln(2)/5 min^{−1}, *k*_{3} = 0.5 min^{−1}, *k*_{4} = ln(2)/20 min^{−1}, *k*_{5} = 5 copy^{−1} min^{−1}, *k*_{6} = 1 min^{−1}, *k*_{7} = 0.2 min^{−1}, *k*_{8} = ln(2)/5 min^{−1}, *k*_{9} = 0.5 min^{−1}, *k*_{10} = ln(2)/20 min^{−1}. *D*_{T1} = 20 copy was used for the simulation in figure 2*b*, and *D*_{T1} = *D*_{T2} = 50 copy was used for figure 3*c*. The red dot in figure 4*b* corresponds to *k*_{3} = 0.4 and *k*_{4} = ln(2)/80. The initial copy numbers of molecules were assumed zero for all simulations.

### 4.2. Optimization programs for computing descriptive statistics

The semi-definite programs were solved with MATLAB 2016b and Sedumi 1.32 solver [52], where the following options of the solver were used: pars.eps = 0, pars.alg = 2, pars.theta = 0.01, pars.beta = 0.9, pars.stepdif = 1, pars.free = 1, pars.cg.maxiter = 500, pars.cg.refine = 10, pars.cg.stagtol = 5 × 10^{−20}, pars.cg.restol = 5 × 10^{−10}, pars.chol.canceltol = 10^{−20}, pars.chol.maxuden = 4000. The variables * m* and

*were normalized as appropriate by constants to avoid numerical instability.*

**u**For figure 4*c*,*d*, the order of the maximum moment of the moment equation was set *μ* = 8, and the upper bounds of the mean and the CV obtained by solving the SDP in electronic supplementary material, S.5 were plotted. For figure 4*e*, *μ* = 6 was used and the upper bound of the CV was plotted. To draw figure 5*a*,*b*, *μ* = 8 was used. The data points were obtained by iteratively solving the optimization problems for different values of *k*_{3}. Specifically, the translation rate *k*_{3} was varied from 0.025 to 0.9 (figure 5*a*) and from 0.1 to 1.0 (figure 5*b*). The data points in figure 5*c* were obtained by letting *k*_{3} = 0.01, 0.02, 0.03, 0.04, 0.05, 0.07, 0.10, 0.15, 0.25 and 0.50. To compute the variance of the free DNA and the reporter protein, we used *μ* = 8 and *μ* = 6, respectively. In figure 5*a*–*c*, the average of the upper and the lower bound was used to plot the values of mean, variance and CV. For the sensitivity analysis in figure 6, we used *μ* = 8 for all of the parameter perturbations. The nominal parameter values are shown in the ‘stochastic simulations’ section.

## Data accessibility

The program codes used in this study are available at GitHub (https://github.com/hori-group/stochastic_design).

## Author's contributions

Y.H. conceived of and designed the study. Y.S. implemented the computational codes. Y.S. and Y.H. performed the mathematical derivation and the analysis of the computational results. The manuscript was drafted by Y.H. and was edited by Y.S. and Y.H. Both authors gave final approval for publication.

## Competing interests

The authors have no competing interests.

## Funding

This work was supported, in part, by JSPS KAKENHI grant number JP16H07175 and Keio Gijuku Academic Development Funds.

## Footnotes

Electronic supplementary material is available online at https://dx.doi.org/10.6084/m9.figshare.c.3956566.

- Received September 28, 2017.
- Accepted December 8, 2017.

- © 2018 The Author(s)

Published by the Royal Society. All rights reserved.