## Abstract

Noise is often indispensable to key cellular activities, such as gene expression, necessitating the use of stochastic models to capture its dynamics. The chemical master equation (CME) is a commonly used stochastic model of Kolmogorov forward equations that describe how the probability distribution of a chemically reacting system varies with time. Finding analytic solutions to the CME can have benefits, such as expediting simulations of multiscale biochemical reaction networks and aiding the design of distributional responses. However, analytic solutions are rarely known. A recent method of computing analytic stationary solutions relies on gluing simple state spaces together recursively at one or two states. We explore the capabilities of this method and introduce algorithms to derive analytic stationary solutions to the CME. We first formally characterize state spaces that can be constructed by performing single-state gluing of paths, cycles or both sequentially. We then study stochastic biochemical reaction networks that consist of reversible, elementary reactions with two-dimensional state spaces. We also discuss extending the method to infinite state spaces and designing the stationary behaviour of stochastic biochemical reaction networks. Finally, we illustrate the aforementioned ideas using examples that include two interconnected transcriptional components and biochemical reactions with two-dimensional state spaces.

## 1. Introduction

Stochastic fluctuations in the levels of cellular components are essential to biological processes such as gene expression coordination and cellular probabilistic differentiation [1,2]. There are two commonly used approaches to modelling the time evolution of a spatially homogeneous mixture of molecular species that interact through a set of known chemical reactions [3]. The deterministic formulation specifies the time-rates-of-change of the molecular concentrations of component species with a set of coupled differential equations and assumes continuous variations in the molecular concentrations [4]. By contrast, the stochastic formulation may be carried out in a variety of ways. In particular, the chemical master equation (CME) is an exact model that describes the time evolution of the probability distribution over the set of all possible microstates of the system with a set of coupled linear differential equations [5,6]. Owing to the analytic and numerical intractability of the CME, approximations such as the chemical Langevin equation and the Fokker–Planck equation have been developed [7]. All three of the above stochastic models give rise to the aforementioned deterministic formulation, known as the reaction rate equations, in the thermodynamic limit when volume tends to infinity while the concentrations remain constant [8]. Although the deterministic approach is adequate in many cases, its assumption that a chemical reaction system evolves deterministically and continuously can be invalid at low molecular counts due to experimental evidence of stochastic effects and integer-valued molecular counts [5]. In such situations, stochastic modelling becomes necessary [9–11].

The CME describes a continuous-time Markov jump process: each state represents the molecular counts of all component species, and transitions between states correspond to changes in molecular counts via chemical reactions [12]. Analytic solutions to the CME are important in biological engineering for several reasons. For instance, simulations of biochemical reaction networks that are multiscale in time can be expedited by incorporating analytic solutions of the fast time-scale dynamics [13]. Analytic stationary solutions also enable accurate analysis and design of the effect of parameter values on the stationary behaviour of a reaction system [13]. In general, analytic forms of transient and stationary solutions to the CME for arbitrary reaction network topologies are still unknown, mainly because the number of states increases exponentially with the number of component species [14]. However, there are known results for some simple biochemical reaction networks with particular initial conditions [14], such as mass-conserving linear reaction systems with multinomial initial distributions [15,16], and linear reaction systems with Poisson initial distributions [17]. In the absence of analytic solutions, Monte Carlo algorithms, such as Gillespie's stochastic simulation algorithm (SSA) [3,18], are used to approximate the solution to the CME. Nevertheless, the computational cost of the SSA becomes enormous when there are numerous reactions present in the system, and the method does not guarantee error bounds on the approximate solution [19].

Mélykúti and co-workers [13,20] proposed a technique for determining the analytic stationary distributions of the CME for stochastic biochemical reaction networks whose state spaces can be constructed by gluing two finite, irreducible state spaces at one or two states sequentially. The stationary distribution on the combined state space is a linear combination of the equilibria of the single Markov jump processes [13]. An analogous method for gluing state spaces at two states was developed in [20]. Mélykúti and co-workers' gluing technique forms a basis for the construction of recursive algorithms that provides a fast way to compute analytical expressions for stationary distributions on large state spaces. To illustrate this recursion graphically, we can imagine gluing three triangles together at their vertices to form a triangular grid, and by forming increasingly larger triangular grids using existing ones, we can rapidly construct very large grids.

The gluing technique has a number of potential advantages over existing methods. The simplest approach that one might take to compute an analytic stationary distribution of a continuous-time Markov jump process is to solve for a left null vector of the transition rate matrix [12]. However, the dimension of the matrix is almost always infinite, often making the calculation exceedingly difficult. There are numerical methods such as the finite-state projection (FSP) algorithm [19] that can be used to truncate infinite state spaces. However, it is unclear how to employ the FSP algorithm to obtain analytic stationary solutions. The gluing technique can provide analytic solutions for finite-state spaces, and may be the basis for recursive algorithms that derive analytic solutions for infinite state spaces. The gluing technique also provides a link between stationary distributions to the CME and the graphical structure of the state space. Furthermore, predicting the dynamic behaviour of a biological network from that of its constituent modules is a central yet generally unsolved problem in systems and synthetic biology [21–24]. The development of the gluing technique can potentially contribute to computing analytic solutions using its recursive property [20].

While this work is built on the technique of Mélykúti and co-workers [13,20], we highlight three major contributions of this study. Firstly, we demonstrate how to implement the technique recursively to construct solutions to state spaces that are much larger than the ones considered in [13,20]. Secondly, we establish a link between graph theoretical characterizations of state spaces and the gluing strategy employed. Lastly, we apply the algorithms developed in this work to an area that is distinct from the motivations in [13,20]. Their goal was to carry out time-scale separation, while we use analytic solutions for designing equilibrium behaviour of biochemical systems, which is a relatively new application of interest in synthetic biology.

In this work, we explore the capabilities of Mélykúti and co-workers' gluing technique and introduce recursive algorithms that use the technique to compute the stationary solution to the CME. We first introduce basic notions, including the CME model, the relationship between Markov jump processes and graph theory, and the gluing technique. We then use graph theory to formally characterize the set of state spaces that can be constructed by carrying out single-state gluing of paths, cycles or both sequentially. We subsequently demonstrate the single and two-vertex gluing techniques with simple chemical reaction systems. In addition, we discuss extending the method to infinite state spaces and designing the stationary behaviour of stochastic biochemical reaction networks. Finally, we illustrate the aforementioned ideas using examples that include docking and undocking transitions of hairpin ribozyme [25], a protein folding pathway [26], a phosphotransfer system [9] and two interconnected transcriptional components [27].

## 2. Background

To introduce Mélykúti and co-workers' [13,20] gluing technique, we first present the CME and then explain its connection with graph theory.

### 2.1. The chemical master equation

The CME describes the time evolution of a chemically reacting system as a continuous-time Markovian random walk in the space of molecular counts of biochemical species [6,18]. Consider a system with *n* biochemical species in a well-stirred solution of fixed volume and temperature. We denote the molecular count of each species by . Suppose that the molecular counts of these biochemical species are specified by the initial state at the time *t*_{0} ≥ 0 and they change to other states *X* = (*X*_{1}, …, *X*_{n}) via *m* possible chemical reactions . For each reaction , let *a*_{j}(*X*) be the associated *propensity function* and the *state-change vector* [17], which we will define as follows. At state *X*, the probability that a single occurrence of takes place in an infinitesimal interval d*t* is given by *a*_{j}(*X*)d*t*, and *ν*_{j} is the vector of induced changes in *X*. Therefore, the probability that a particular reaction occurs first (i.e. the random walk jumps to the state *X* + *ν*_{j} next) is equal to . We assume *mass-action* kinetics in this work. The reaction equations of the zeroth-, first- and second-order reactions have left-hand side of the forms ∅︀, , (*l*≠*i*) and . These reactions have propensity functions *κ*, *κ**X*_{i}, *κ**X*_{i}*X*_{l} and , respectively, with an appropriate rate constant *κ* > 0.

Let denote the probability of the Markov chain being at state *X* at time *t* > *t*_{0}, conditional on the initial state *X*^{0}. The CME is then a set of coupled linear differential equations, with the equation for state *X* being
2.1The first summation term on the right-hand side of equation (2.1) corresponds to reactions via which the random walk can jump to state *X* in one step, and the second summation term considers reactions from which the random walk can leave *X* in one step [14].

### 2.2. The connection between continuous-time Markov jump processes and graph theory

In this subsection, we explain the connection between graph theory and the Markov jump process that underlies the CME. We only consider reversible chemical reactions. The state space of the Markov jump process described by the CME naturally gives an undirected graph without self-loops (i.e. edges that connect a vertex to itself): vertices represent states, and an edge exists between a pair of vertices if and only if there exists a reversible reaction that allows transition between the two corresponding states.

For instance, gene expression regulation can be described as the reversible binding of transcription factors with copies of a gene. Let , and denote the transcription factor, the free gene and the gene-transcription factor complex, with molecular counts *T*, *G* and *G*^{*}, respectively. Let *κ*_{b} and *κ*_{d} be the binding and unbinding rate constants. The reaction equation is then given by
2.2

If we assume the reaction system to be closed (i.e. mass-conserving), then mass conservation leads to the relation *T* + *G* + 2*G*^{*}=*N*, where is a constant that is determined by the initial condition. For instance, if the initial condition is (*T*, *G*, *G*^{*}) = (1, 1, 2), then the state space with the corresponding transition rates is

and the corresponding graph is a path of length 3.

In the rest of this work, we refer to the graphic structure that is given by a state space, the states thereof and the corresponding transition rates simply as the state space for brevity. In §2.3, we discuss deriving the stationary distribution of a continuous-time Markov jump process by gluing two state spaces together at one or two states.

### 2.3. The stationary distribution of a continuous-time Markov jump process glued together from two state spaces at one or two states

Having introduced the CME and its connection with graph theory, we are ready to present the technique proposed by Mélykúti and co-workers [13,20]. Mélykúti *et al.* developed a technique in [13] for solving the stationary distribution of a continuous-time Markov jump process by gluing the state spaces of two finite, irreducible, continuous-time Markov jump processes at exactly one vertex. Mélykúti & Pfaffelhuber [20] considered gluing state spaces at two vertices simultaneously. In both papers, vertices are glued together if and only if they correspond to the same state. Without introducing any new jumps or losing any existing jumps, a new Markov process is naturally identified on the combined state space. No existing literature has studied gluing at more than two vertices.

We first consider the one-vertex gluing technique in [13]. Suppose that *A* and *B* are two continuous-time Markov jump processes with finite, irreducible state spaces. Suppose that process *A* has a known stationary distribution *ξ*^{A} on states indexed by {1,2, … , *r*}, and process *B* has a known stationary distribution *ξ*^{B} on states indexed by {1,2, … , *s*}. The order in which the states are indexed does not affect the result. Without loss of generality, we assume gluing the two state spaces at state *r* of process *A* and state 1 of process *B*. We relabel the states by keeping the indices of process *A* the same and increasing the indices of process *B* by *r* − 1. The new Markov process has a unique stationary distribution [13] given by
2.3where *ψ* = (*ξ*^{A}_{r} + *ξ*^{B}_{1} − *ξ*^{A}_{r}*ξ*^{B}_{1})^{−1} is a normalizing constant. Figure 1 illustrates the one-vertex gluing technique using the example in §2.2.

When the two processes to be glued together have exactly two pairs of identical states, we employ the two-vertex gluing introduced in [20]. With the same setting as before, we now glue state *r* − 1 of process *A* to state 1 of process *B*, and state *r* of process *A* to state 2 of process *B*. We relabel the states by keeping the indices of process *A* the same and increasing the indices of process *B* by *r* − 2. The glued vertices are not necessarily consecutive in index, but we assume the index introduced for simplicity of notation. In general, deriving the stationary distribution of the new Markov process obtained by two-vertex gluing entails lengthy calculations. However, Mélykúti & Pfaffelhuber [20] prove that if the proportionality condition specified by the equation
2.4is satisfied, then the stationary distribution on the combined state space is
2.5where *ψ* = [*ξ*^{A}_{r−1} + *ξ*^{B}_{1} − *ξ*^{A}_{r−1}(*ξ*^{B}_{1} + *ξ*^{B}_{2})]^{−1} is a normalizing constant. The condition in equation (2.4) is necessary for the stationary distribution on the combined state space to preserve the proportions of probability within each Markov jump process. It is interesting that this condition is also sufficient.

## 3. Characterizing graphs that can be obtained by gluing paths, cycles or both, at one vertex sequentially

In this section, we introduce definitions in graph theory based on [28], and then use them to characterize the set of state spaces that can be obtained by gluing paths, cycles or both, at one vertex sequentially. This set of state spaces provides a natural starting point for studying the gluing technique [13,20] for two main reasons. First, analytic solutions are already known for the stationary distributions on path-like and circular state spaces [12,29,30]. Second, one-vertex gluing involves only simple arithmetic and is computationally efficient. We find that (i) graphs obtained by gluing paths at one vertex sequentially are trees, (ii) graphs obtained by gluing cycles at one vertex sequentially are ‘trees of cycles’, and (iii) graphs obtained by gluing paths and cycles at one vertex sequentially are ‘trees of trees and cycles’. We give formal propositions describing these results in this section and proofs in the electronic supplementary material.

Let *X*^{(k)} = {*A*⊆*X*: |*A*| = *k*} be the set of *k*-element subsets of *X*. A *graph* *G* is defined by an ordered pair (*V*, *E*) where *V*(*G*)≠∅︀ is a finite set, called the *vertex set*, and *E*(*G*) ⊆ *V*^{(2)} is an *edge set*. We consider undirected graphs without self-loops in this work, as explained in §2.2. In other words, if {*u*, *v*} ∈ *E*(*G*), then *u*≠*v* and we can denote {*u*,*v*} by *uv* or *vu* equivalently. If *uv* ∈ *E*(*G*), then vertices *u* and *v* are *adjacent* and are both *endvertices* of edge *uv*. The *degree* of a vertex *v* ∈ *V*(*G*), denoted by *d*_{G}(*v*), is the number of distinct vertices that are adjacent to *v* in graph *G*. A vertex with zero degree is said to be *isolated*. A *path of length n*, called an *n-path*, is a graph with the vertex set {*v*_{i} : *i* = 1, 2, … , *n* + 1} and the edge set {*v*_{i}*v*_{i+1} : *i* = 1, 2, … , *n*}. By convention, a 0-path is a vertex. A *cycle of length n*, called an *n-cycle*, is a graph with the vertex set {*v*_{i} : *i* = 1, 2, … , *n*} and the edge set {*v*_{i}*v*_{i+1} : *i* = 1, 2, … , *n* − 1}∪{*v*_{n}*v*_{1}}. A graph is *connected* if any two distinct vertices are joined by a path. A connected and acyclic graph is called a *tree*. Furthermore, a graph *H* is a *subgraph* of *G*, written as *H* ⊆ *G*, if *V*(*H*) ⊆ *V*(*G*) and *E*(*H*) ⊆ *E*(*G*). Suppose that *G* is a connected graph. For our purposes, we define the graph *G* − *H* as the graph obtained by first deleting edges of *H* from *G* and then removing isolated vertices of the remaining graph (e.g. figure 2). Similarly, if *G* has three or more vertices and *e* ∈ *E*(*G*), then the graph *G* − *e* is obtained by first deleting the edge *e* from *G* and then removing any isolated endvertex of *e* in the remaining graph.

### Proposition 3.1.

*A graph can be obtained by gluing paths together at one vertex sequentially if and only if the graph is a tree.*

Figure 3 gives an example of a tree and one way to construct the graph by gluing paths at one vertex sequentially.

### Proposition 3.2.

*A graph can be obtained by gluing cycles together at one vertex sequentially if and only if the graph satisfies all of the following conditions:*

(i)

*the graph is connected,*(ii)

*every vertex has an even degree, and*(iii)

*any two distinct cycles have at most one common vertex.*

A graph that satisfies all the conditions in proposition 3.2 can be thought of colloquially as a ‘tree of cycles’. Figure 4 gives an example of a tree of cycles and one way to construct the graph by gluing cycles at one vertex sequentially. For a formal definition, see the electronic supplementary material.

### Proposition 3.3

*A graph can be obtained by gluing paths and cycles together at one vertex sequentially if and only if the graph satisfies both of the following conditions:*

(i)

*the graph is connected, and*(ii)

*any two distinct cycles share at most one common vertex.*

Intuitively, a graph that satisfies both the conditions in proposition 3.3 is a ‘tree of trees and cycles’ (e.g. figure 5).

The one-vertex gluing technique [13] provides an easy method for computing the stationary distributions of biochemical reaction systems whose state spaces are characterized by propositions 3.1–3.3. We note that any state space can be constructed by gluing simple graphs together at one or two vertices consecutively, such as by adding one edge at a time. However, this strategy entails a large number of steps to build large graphs and thus is computationally inefficient. Moreover, deriving the stationary distribution of the new Markov process obtained by two-vertex gluing is complicated, in general, except when the proportionality condition in equation (2.4) holds. In §4, we explore the development of recursive algorithms based on the one- and two-vertex gluing techniques [13,20] through a set of biochemical reaction systems.

## 4. Computing analytic stationary solutions to the chemical master equation using recursive algorithms

In this section, we propose recursive algorithms that apply the gluing technique to find stationary solutions to the CME of a set of biochemical reaction systems. In §4.1, we define the set of biochemical reactions with finite state spaces for which our algorithms are intended and demonstrate with examples. In §4.2, we derive the analytic stationary solution of two interconnected transcriptional components, for which the state space is infinite, and discuss using the solution to design the stationary distribution.

### 4.1. Closed systems with reversible, elementary reactions

In this subsection, we will focus on a set of biochemical reactions that have finite state spaces, which can be constructed by applying the gluing technique finitely many times. We study a biochemical reaction system with an infinite state space in §4.2.

We define the term *elementary reaction* in definition 4.1 for the purposes of our study.

### Definition 4.1.

Consider three distinct biochemical species and . For *i* = 1, 2 and 3, let *α*_{i}, *β*_{i}∈{0, 1, 2}. A reaction of the form
4.1is an *elementary reaction* if , and .

The reactions specified by definition 4.1 can be categorized into five types: , , , and . These reactions can be seen as building blocks for more complex reactions. For instance, is an approximation to the reactions , where is an intermediate.

We study reversible, elementary reactions in closed systems throughout this subsection. The graphical structures of the state spaces of these biochemical reaction systems are determined by the initial state and the number of reversible reactions present. The gluing technique requires that both of the two constituent Markov chains are irreducible [13,20]. Assuming that all reactions are reversible makes all edges in the state space bidirectional, simplifying the process of dividing the state space into parts that each correspond to an irreducible Markov chain.

#### 4.1.1. One reversible reaction

There are three types of closed systems with exactly one reversible, elementary reaction, which we list in table 1. The state spaces of these reactions are paths whose lengths are determined by the initial state. Examples of such reaction systems occur when studying the change in conformational states of RNA secondary structure [25]. In particular, the docking and undocking transitions of the hairpin ribozyme can be modelled by the first-order reversible reaction in table 1 [25].

We first present the known results in [29] about the stationary distributions on finite, path-like state spaces. We then propose a recursive algorithm that uses the one-vertex gluing technique. Let and consider a continuous-time Markov jump process on a path-like state space with vertices indexed by {0, 1, … , *N*}. Suppose that *p*_{i} is the transition rate from *i* to *i* + 1 for *i* ∈ {0, 1, …, *N* − 1}, and *q*_{i} from *i* to *i* − 1 for *i*∈{1, 2, …, *N*}. The stationary distribution (*ξ*_{0}, *ξ*_{1}, … , *ξ*_{N}) is given by the relation and the normalization equation .

Here, for *x* ≥ 0, let ⌊*x*⌋ denote the integer part of *x*. Alternatively, the stationary distribution can be obtained by gluing paths of lengths 1, 2, 4, … , 2^{⌊log2}^{N⌋−1} at one endvertex sequentially and then adding a path of length *N* − 2^{⌊log2} ^{N⌋}. The stationary distribution on a path of length *N* − 2^{⌊log2} ^{N⌋} can be obtained using a similar strategy. Consider the example in §2.2 with initial state (*T*, *G*, *G*^{*}) = (7, 7, 0). Figure 6 illustrates graphically our recursive algorithm to obtain the stationary distribution on the state space, which is a path of length 7.

#### 4.1.2. Two reversible reactions

There are eight types of closed systems with exactly two reversible, elementary reactions such that the reactants and products are coupled, which we list in table 2. The state spaces of these reactions are diagonally truncated grids, the sizes of which depend on the initial state. Biological examples of such reaction mechanisms include protein folding pathways [26] and a phosphotransfer system [9]. We propose a recursive algorithm that applies the gluing technique to compute analytic stationary distributions on such state spaces. Specifically, we study the first-order reversible reactions in table 2. This system can represent a protein folding pathway with transitions between an unfolded denatured state, a folding intermediate state and a native state [26]. We then use a similar analysis framework to derive the stationary distribution of a phosphotransfer system that can be modelled using the second to last reaction system in table 2.

We first consider a closed system with two coupled monomolecular reversible reactions
4.2The total number of molecules, denoted by *N*, is a constant that is determined by the initial state. It is sufficient to denote each state by a 2-vector (*X*_{1}, *X*_{2}) as *X*_{3} can be calculated using the conservation law *X*_{3} = *N* − *X*_{1} − *X*_{2}. Figure 7 shows the state space when *N* = 3 and the corresponding graphical representation. All closed systems with exactly two reversible, elementary reactions have state spaces of such a diagonally truncated grid structure.

When *N* = 1, the state space is simply a 2-path. For integer *N* ≥ 2, we propose an algorithm that takes *N* and *κ*_{j}, where *j*∈{1, 2, 3, 4}, as inputs and returns the stationary distribution of the corresponding biochemical reaction system. The algorithm constructs the state space recursively by sequentially gluing together small graph components. Figure 8 demonstrates the case when *N* = 3. Specifically, we construct the state space by first gluing together three L-shaped components sequentially at one vertex and then checking the proportionality condition, as given in equation (2.4), for the remaining missing edges. We label states in the state space from left to right and from bottom to top so that the index increases naturally as the graph grows. The method is applicable to any integer *N* ≥ 2.

In general, we can construct the state space by first gluing together *N* L-shaped components sequentially at one vertex and then checking the proportionality condition, as given in equation (2.4), for all missing edges. For each L-component, we call the state with the lowest index state 1, the state with the second lowest index state 2 and so on. For *j*∈{1, 2, 3, … , *N*}, the stationary distribution *ξ*^{ j} on an L-shaped component with state 1 being (*X*_{1}, *X*_{2}) = (0, *j*) is given by
4.3where *ψ*_{j} is a normalizing constant. After gluing together *N* of these L-shaped components sequentially at one vertex, the graph contains all of the states from the state space. In order to complete the construction, we need to add edges by gluing at the two endvertices of each missing edge simultaneously. The proportionality condition, as given in equation (2.4), holds at all missing edges between states (*i*, *j* − *i*) and (*i*, *j* − *i* − 1) for *j*∈{2, 3, 4, … , *N*} and *i*∈{1, 2, 3, … , *j* − 1}. Therefore, the stationary distribution of the original state space is the same as that of the state space constructed with L-shaped components which is given by the equation
4.4with the normalizing constant *ψ* = (1 + *κ*_{2}/*κ*_{1} + *κ*_{3}/*κ*_{4})^{−N} for *j*∈{1, 2, 3, … , *N*} and *i*∈{0, 1, 2, … , *j*}. Moreover, the state with the highest index, which corresponds to (*X*_{1}, *X*_{2}) = (0, 0), has probability of (*κ*_{3}/*κ*_{4})^{N} in the stationary distribution. Equation (4.4) can be rearranged into the form of the probability mass function of a multinomial distribution with *N* trials and probabilities *ψ*^{1/N}, *ψ*^{1/N}(*κ*_{2}/*κ*_{1}), *ψ*^{1/N}(*κ*_{3}/*κ*_{4}), which agrees with Jahnke & Huisinga [14], Gadgil *et al.* [16] and Krieger & Gans [31]. It follows that the marginal distributions of *X*_{1}, *X*_{2} and *X*_{3} are all binomially distributed [16]. We can use the solution derived in equation (4.4) to derive an analogous expression for the stationary distribution of a phosphotransfer system.

Phosphotransfer systems are a common motif in cellular signal transduction [9]. Let ^{*} be a phosphate donor that can transfer its phosphate group to a protein in its inactive form, denoted by . The standard phosphotransfer reactions can be modelled by the biochemical equations
4.5where is the complex of bound to and the phosphate group. Let *A*, *B*, *A*^{*}, *B*^{*} and *C* denote the numbers of proteins , , the corresponding proteins with a phosphate group, and complexes , respectively. We assume that *A*^{*} = *B* and *A* = *B*^{*} initially. We also suppose that proteins and are conserved in the system, namely *N* = *A*^{*} + *C* + *A* and *N* = *B* + *C* + *B*^{*}, where *N* is a constant that is determined by the initial condition. Under our assumptions, it follows that *A*^{*} = *B* and *A* = *B*^{*} throughout the reaction process. With slight changes of our analysis of the biochemical reaction system modelled by equation (4.2), we can obtain the stationary distribution of the phosphotransfer system. In particular, a 2-vector (*X*_{1}, *X*_{2}) now denotes a state in which *A*^{*} = *B* = *X*_{1}, *C* = *X*_{2}, and *A* = *B*^{*} = *N* − *X*_{1} − *X*_{2}. The state space of the phosphotransfer system has a diagonally truncated grid structure, similar to that of the biochemical reaction system modelled by equation (4.2). However, the propensity functions of the reactions and are quadratic functions instead of the linear form for equation (4.2). Thus, for *j*∈{1, 2, 3, … , *N*}, the stationary distribution on an L-shaped component with state 1 being (*X*_{1}, *X*_{2}) = (0, *j*) is given by (cf. equation (4.3))
4.6where is a normalizing constant. The proportionality condition holds at all missing edges. Therefore, the stationary distribution of the original state space is the same as that of the state space constructed with L-shaped components which is given by the equation (cf. equation (4.4))
4.7where is a normalizing constant. Moreover, the state with the highest index, which corresponds to (*A*^{*}, *B*, *C*, *B*^{*}, *A*) = (0, 0, 0, *N*, *N*), has probability of (*κ*_{3}/*κ*_{4})^{N}/*N*! in the stationary distribution.

#### 4.1.3. Three reversible reactions

A *planar graph* is a graph that can be drawn on the plane in such a way that edges intersect only at their endvertices [28]. There are three possible closed systems with exactly three reversible, elementary reactions such that the reactants and products are coupled and the state spaces are planar graphs. We list these reaction systems in table 3. In such a system, each elementary reaction gives rise to the same biochemical change as the overall effect of the other two reactions. If this condition does not hold or a biochemical reaction system contains more than three reversible, elementary reactions, then its state space is not a planar graph, and our recursive algorithm does not apply.

Consider a closed system with three coupled monomolecular reversible reactions
4.8Figure 9 shows the state space when *N* = 3 and the corresponding graphical representation. All closed systems with exactly three reversible, elementary reactions as given in table 3 have state spaces of such a diagonally truncated grid structure.

We observe that the recursive algorithm introduced in §4.1.2 constructs part of the state space of the three-reaction system as given in equation (4.8). The proportionality condition in equation (2.4) holds for all of the diagonal edges if and only if the reaction rate constants satisfy 4.9Equation (4.9) is a special case of the Wegscheider conditions [32], which are necessary for a closed system with the reactions given by equation (4.8) to be in the steady state [33]. Therefore, the three-reaction system has the same stationary distribution as that of the two-reaction system as given in equation (4.4).

### 4.2. Computing the stationary distribution of two interconnected transcriptional component system

In this subsection, we study the stationary behaviour of two interconnected transcriptional components [27] by developing a recursive algorithm that is analogous to the algorithms introduced in §4.1. Specifically, we first model the cascade of two connected transcriptional components using the CME. We then apply the gluing technique on a finite subset of the infinite state space, providing an error bound on the approximate solution. We subsequently obtain the analytic stationary solution of the CME using asymptotic analysis, arriving at the same results as in [27]. Finally, we discuss using the analytic solution to design stationary distributions by searching over the parameter space of the biochemical reactions.

#### 4.2.1. Modelling two interconnected transcriptional components using the chemical master equation

We first present the chemical reactions that are introduced in [27] to model two interconnected transcriptional components and then provide the corresponding CME model.

We illustrate a transcriptional component connected to a downstream component in figure 10. The upstream component comprises the constitutive expression of a transcription factor , which subsequently binds reversibly to a promoter at the downstream component. Let denote the complex formed by and . The reactions are then given by the chemical equations
4.10where *κ* > 0, *δ* > 0, *κ*_{on} > 0 and *κ*_{off} > 0 are the corresponding reaction rates. Let *P*, *C* and *Z* be the numbers of , and , respectively. Since the total amount of DNA is conserved, it always holds that *P* + *C* = *N* for some constant *N* that is determined by the initial condition.

If the initial condition is , then the conservation constant is given by *N* = *c*_{0} + *p*_{0}. It is sufficient to denote each state by a 2-vector (*C*, *Z*) as *P* can be calculated using *P* = *N* − *C*. Since there can be arbitrarily many transcription factors, the reaction system described by equation (4.10) has an infinite state space, as illustrated in figure 11.

Let be the probability that the stochastic process is in state at time *t* > 0, conditional on the initial state (*C*, *Z*) = (*c*_{0}, *z*_{0}). Each state (*c*, *z*) contributes exactly one linear ordinary differential equation to the CME, which is given by
4.11

The CME of two interconnected transcriptional components comprises infinitely many differential equations that have the form of equation (4.11). Therefore, we need to truncate the infinite state space to a finite subset that can then be recursively constructed by applying the gluing technique finitely many times.

#### 4.2.2. Truncating infinite state spaces with guaranteed accuracy

Applying the initialization step of Munsky and Khammash's FSP algorithm [19], we approximate the stationary distribution on the infinite state space. Specifically, depending on the desired accuracy of the approximate solution, we truncate the infinite state space by imposing a maximum value, , on the number of transcription factors considered. For instance, figure 12 presents the truncated state space of figure 11 when *M* = 2. We provide a graphical illustration of truncating the infinite state space in figure 11 to a finite subset as shown in figure 12 in the electronic supplementary material.

We now derive *M* as a function of the desired accuracy. Let *Ω* be the infinite state space, and suppose that *ε* ∈ (0, 1) is the level of acceptable error. We choose *M* such that the total probability that the finite state approximation fails to capture is at most *ε*. Let *Ω*_{f} ⊂ *Ω* denote the corresponding truncated finite state space. In other words, the total probability accounted for by *Ω*^{c}_{f} ≡ *Ω*\*Ω*_{f} is at most *ε*. In order to calculate the total probability that is accounted for by the states in *Ω*^{c}_{f} at equilibrium, we regard the states in *Ω*_{f} as one state, called *ω*_{f}, and states in *Ω*^{c}_{f} as state *ω*_{c}. The transition rate from *ω*_{f} to *ω*_{c} is the sum of the transition rates from *Ω*_{f} to *Ω*^{c}_{f} and vice versa. Therefore, the transition rate from *ω*_{f} to *ω*_{c} is , and the reverse transition rate is . At equilibrium, the probabilities of *ω*_{f} and *ω*_{c} are *ξ*_{f} = (*M* + 1)(2*δ* + *κ*_{on}*N*)/(2*κ* + *κ*_{off}*N* + (*M* + 1)(2*δ* + *κ*_{on}*N*)) and *ξ*_{c} = (2*κ* + *κ*_{off}*N*)/(2*κ* + *κ*_{off}*N* + (*M* + 1)(2*δ* + *κ*_{on}*N*)), respectively. We choose such that *ξ*_{c} ≤ *ε*. For , let ⌈*x*⌉ denote the least integer that is no smaller than *x*. Hence, we choose
4.12in order to achieve the desired accuracy.

#### 4.2.3. Analytic stationary solutions on infinite state spaces

We propose a recursive algorithm that, together with asymptotic analysis, yields the analytic stationary solution to the CME of two interconnected transcriptional components. The algorithm takes *N*, *M*, *κ*, *δ*, *κ*_{on} and *κ*_{off} as inputs and returns the stationary distribution on the corresponding truncated finite state space. The algorithm constructs the truncated state space recursively by gluing small graph components together sequentially. In the limit of *M* → ∞, the stationary distribution of the truncated state space converges to that of the original infinite state space.

We demonstrate the recursive algorithm for the case when *N* = 3 and *M* = 2 in figure 13, but the method is applicable to the general case of and . Specifically, we construct the truncated state space by first gluing together three T-shaped components and a path sequentially at one vertex and then checking the proportionality condition, as given in equation (2.4), for the remaining missing edges. Similar to §4.1.2, we label states in the truncated state space from left to right and from bottom to top so that the index increases naturally as the graph grows.

In general, a T-shaped component consists of *M* + 2 states. There are three steps for constructing a truncated state space: (i) glue together *N* of T-shaped components sequentially at one vertex, (ii) add a path of length *M* to the existing state space at its vertex with the highest index, and (iii) check the proportionality condition for all missing edges.

For *c*∈{0, 1, 2, … , *N* − 1}, the stationary distribution *ξ*^{c} on a T-shaped component for which state 1 (i.e. the vertex with the lowest index) represents (*C*, *Z*) = (*c*, 0) is given by
4.13where *ψ*_{c} is a normalizing constant. The stationary distribution on the *M*-path is given by
4.14for *i*∈{1, 2, 3, … , *M* + 1}, where *ψ*^{N} is a normalizing constant.

After step (ii), the graph contains all the states from the truncated state space. In order to complete the construction, we need to add *N*(*M* − 1) edges by gluing at the two endvertices of each edge simultaneously. The proportionality condition holds at all missing edges between states (*c*, *z*) and (*c* + 1, *z* − 1) for *c* ∈ {0, 1, 2, … , N−1} and *z* ∈ {2, 3, 4, … , *M*}. Therefore, the stationary distribution on the truncated state space is already obtained in step (ii). For instance, when *N* = 3 and *M* = 2, the stationary distributions of figure 13*d* and figure 13*e* are the same.

We may truncate the infinite state space to a finite subset by bounding the maximum value of *Z* because for *c* ∈ {0, 1, 2, … , *N*}. Moreover, the dependence of on *M* is only through the normalizing constant. Therefore, by normalizing over and *c*∈{0, 1, 2, … , *N*}, we obtain the accurate stationary distribution of the original infinite state space, which is given by the equation
4.15

Equation (4.15) can be written as a product of a function of *c* and a function of *z*, which implies independence between the stationary behaviour of the upstream and downstream transcriptional components. In addition, the stationary distribution of *C* is binomially distributed with *N* trials and success probability [(*δ**κ*_{off})/(*κ**κ*_{on}) + 1]^{−1}. The stationary distribution of *Z* follows a Poisson distribution with mean *κ*/*δ*. Our results confirm theorem 5.1 in [27]. However, the proof of [27] relies on the deficiency zero theorem [34] and the theorem of product-form stationary distributions (theorem 4.1 of [35]).

#### 4.2.4. Designing the stationary behaviour of stochastic biochemical reaction networks using analytic solutions

Mathematical methods for designing distributional properties of stochastic biochemical systems have been developed in conjunction with the field of synthetic biology as a means to engineer predictable biochemical responses [36,37]. The stationary distribution design in this subsection is simpler than the general methods introduced in [36,37] because of the analytic solutions that we obtained using the gluing technique [13,20].

Computing stationary distributions in analytic form enables us to design the equilibrium behaviour of biochemical reaction networks simply by tuning their reaction rate parameters. For the two-component transcriptional system, this corresponds to altering the dilution rate or the degradation rate, making more RNA polymerases available to increase the transcription rate, or even choosing a transcription factor with the desired strength of binding and unbinding to DNA.

In this subsection, we focus on designing features of the stationary distribution of the two-component transcriptional system by adjusting the parameters of the biochemical reactions given in equation (4.10). We formulate and find solutions for two potential design problems in propositions 4.2, 4.3 and corollary 4.4. We consider two sets of design features; namely, the mean and the variance of the marginal probability distributions, and the location of the peak of the joint stationary distribution. For the latter, we prove that the stationary distribution has a unique global maximum if and only if the conditions in proposition 4.3 are satisfied. Considering only stationary distributions with a unique global maximum simplifies the subsequent design of the peak's location. We present the proofs of propositions 4.2 and 4.3 in the electronic supplementary material.

### Proposition 4.2.

*Consider the system of two interconnected transcriptional components that are modelled by reactions as given in equation* (4.10), *where* *κ* > 0, *δ* > 0, *κ*_{on} > 0 *and* *κ*_{off} > 0 *are the corresponding reaction rate constants. Let* *P*, *Z* *and* *C* *be the numbers of promoters* , *transcription factors* *and* − *complexes* , *respectively. Let* *α* = *κ**κ*_{on}/(*δ**κ*_{off}), *β* = *κ*/*δ* *and* *γ* = (*N**α* − 1)/(*α* + 1), *where* *N* *is a constant given by* *N* = *P* + *C* *because of DNA conservation. In (i)–(iii), we set up and solve three design problems using the marginal stationary distributions of* *Z* *and* *C*.

(i)

*Since the marginal stationary distribution of**Z*is Poisson distributed, its mean and variance are equal. The design problem of fixing the mean of*Z*at an objective value*μ*_{z}>*0*is feasible, and the solution is*β*=*μ*_{z}, with N and the reaction rate constants being arbitrary otherwise.(ii)

*The design problem of setting the mean of C at an objective value**μ*_{c}∈ (*0*, N) is feasible, and the solution is*α*=*μ*_{c}/(N −*μ*_{c}), with N and the reaction rate constants being arbitrary otherwise.(iii)

*The design problem of choosing the variance of C to be an objective value**σ*^{2}_{c}> 0*is feasible if and only if**σ*^{2}_{c}≤*N*/4,*and the solutions are*,*with**N**and the reaction rate constants being arbitrary otherwise*.

We now consider designing the location of the peak of the joint stationary distribution as given in equation (4.15). In proposition 4.3, we find necessary and sufficient conditions for the existence of a unique global maximum and provide its location. We include the proof of proposition 4.3 in the electronic supplementary material. Corollary 4.4 shows that the uniqueness of the peak simplifies the design of its location.

### Proposition 4.3.

*Consider the system of two interconnected transcriptional components that are modelled by the reactions in equation* (4.10). *With the same notation as in proposition* 4.2, *the stationary distribution in equation* (4.15) *has a unique global maximum if and only if* *N* > 1, *β* > 1, 0 < *γ* < *N* − 1, *and* . *In this case, the maximum is at* (*c*^{*}, *z*^{*}) = (⌊*γ*⌋ + 1,⌊*β*⌋).

### Corollary 4.4.

*Under the constraints* *N* > 1, *β* > 1, 0 < *γ* < *N* − 1 *and* , *designing the location of the unique global maximum of the two-component transcriptional system modelled by the reactions in equation* (4.10) *is equivalent to finding* *N*, *β* *and* *γ* *such that* (⌊*γ*⌋ + 1,⌊*β*⌋) *is the objective location*.

When the global maximum of the two-component transcriptional system exists and is unique, there are infinitely many parameter values that can lead to the objective location of the peak, and no closed-form solutions exist because of the floor-function form.

Propositions 4.2, 4.3 and corollary 4.4 illustrate that analytic solutions can greatly facilitate the design of stationary distributions. Design problems can be formulated as feasibility problems over decision variables, which are *α*, *β*, *γ* and *N* in the case of the two-component transcriptional system. Specifically, solving the four design problems in propositions 4.2, 4.3 and corollary 4.4 is equivalent to tuning the ratio of the production and decay rates of the transcription factor, the ratio of the binding and unbinding rates of the transcription factor with promoters, and the total amount of DNA. Solving these types of design problems constitutes another area of application of analytic stationary solutions to biochemical reaction networks.

## 5. Summary and discussion

In this work, we have explored the capabilities of a method that was proposed by Mélykúti and co-workers [13,20] to derive analytical expressions for the stationary solutions to the CME. The method relies on gluing simple state spaces together recursively at one or two states. We have introduced graph theoretical characterizations of state spaces that can be constructed by performing single-state gluing of paths, cycles or both sequentially. Moreover, we have characterized a set of stochastic biochemical reaction networks for which the state spaces can be constructed using the gluing technique finitely many times. For these reaction networks, we have developed recursive algorithms that apply the gluing technique to obtain stationary distributions in analytic form. Combining recursion and asymptotic analysis, we have extended the method to derive the stationary distribution for an infinite state space and illustrated with the example of two interconnected transcriptional components. In addition, we have discussed using analytic stationary distributions to design the desired possible distributional behaviour of stochastic biochemical systems.

While stationary distributions can be obtained symbolically by computing left null vectors of transition rate matrices, the method discussed in this work provides an alternative that is of interest for a number of reasons. Firstly, it is interesting from a theoretical perspective, as it provides a link between the graphical structure of the state space associated with the CME and the corresponding stationary distribution. To our knowledge, this link to graph theory is a novel direction for studying the gluing technique, and might yield useful insights in future. Secondly, the possibility of creating recursive algorithms that employ the repeating structure of the state space may provide better computational or analytical tractability than methods that rely on matrix algebra to compute left null vectors of transition rate matrices. Thirdly, as we have demonstrated with the example of two interconnected transcriptional components, the recursive nature of the algorithms that can be developed using this method may enable the derivation of analytic solutions for infinite state spaces.

Analytic stationary distributions of biochemical reaction systems modelled by the CME are valuable for analysing and designing their distributional properties. Using these analytic solutions, we can search over the parameter space of the model for stationary distributions with features such as the distributional shape, modality and moments, similar to Baetica *et al.* [36]. Stationary distribution design is part of designing distributional responses of stochastic biochemical systems [37]. These methods are useful in, for instance, synthetic biology, where a principal goal is to design genetic circuits that meet user-specified design criteria. They are also important for designing population-level distributions of heterogeneous cell responses when these distributions capture unique information that is not encoded in individual cells [38].

Another application of analytic stationary distributions is to reduce the CME model using the quasi-steady-state approximation [39,40]. This involves separating a system into fast and slow subsystems and subsequently removing the fast dynamics from the model by substituting in stationary distributions of the fast subsystems at each time step of the slow subsystem. While traditionally developed for deterministic modelling, stochastic model reduction has drawn significant attention in recent years [41], and the methods developed in this study hope to extend the set of tools for the computation of stationary distributions.

For future work, it is valuable to broaden the set of stochastic biochemical reaction networks for which analytic forms of stationary distributions can be obtained using recursive algorithms. One potential breakthrough would be to generalize Mélykúti and co-workers' technique in order to allow gluing state spaces at more than two states simultaneously. Since many biochemical reactions are irreversible, it is also important to study the gluing technique on directed state spaces that are weakly connected graphs. For biochemical reaction networks with known analytic stationary distributions, a natural next step is to achieve complex distributional design by optimizing over possibly large parameter spaces.

## Data accessibility

The data used in the paper is available in the electronic supplementary material.

## Authors' contributions

The project was formulated by A.-A.B., V.S. and R.M.M. X.F.M., A.-A.B. and V.S. performed the mathematical analyses. X.F.M. constructed the algorithms. A.-A.B. illustrated the design of stationary distributions. The paper was written by X.F.M., A.-A.B. and V.S. All the authors edited the paper.

## Competing interests

We declare we have no competing interests.

## Funding

This research is funded by the Air Force Office of Scientific Research through grant no. FA9550-14-1-0060 to the project Theory-Based Engineering of Biomolecular Circuits in Living Cells.

## Acknowledgements

The authors gratefully thank Anandh Swaminathan, Justin Bois, Enoch Yeung, Albert R. Chern, Scott C. Livingston and Charlie Erwall for insights and comments. We thank Reza Ghaemi and Domitilla Del Vecchio for granting us permission to use the figure of two interconnected transcriptional components.

## Footnotes

Electronic supplementary material is available online at https://dx.doi.org/10.6084/m9.figshare.c.3780113.

- Received March 2, 2017.
- Accepted May 2, 2017.

- © 2017 The Author(s)

Published by the Royal Society. All rights reserved.