Abstract
Combinatorial complexity is a major obstacle to ordinary differential equation (ODE) modelling of biochemical networks. For example, a protein with 10 sites that can each be unphosphorylated, phosphorylated or bound to adaptor protein requires 3^{10} ODEs. This problem is often dealt with by making ad hoc assumptions which have unclear validity and disallow modelling of sitespecific dynamics. Such sitespecific dynamics, however, are important in many biological systems. We show here that for a common biological situation where adaptors bind modified sites, binding is slow relative to modification/demodification, and binding to one modified site hinders binding to other sites, for a protein with n modification sites and m adaptor proteins the number of ODEs needed to simulate the sitespecific dynamics of biologically relevant, lumped bound adaptor states is independent of the number of modification sites and equal to m + 1, giving a significant reduction in system size. These considerations can be relaxed considerably while retaining reasonably accurate descriptions of the true system dynamics. We apply the theory to model, using only 11 ODEs, the dynamics of ligandinduced phosphorylation of nine tyrosines on epidermal growth factor receptor (EGFR) and primary recruitment of six signalling proteins (Grb2, PI3K, PLCγ1, SHP2, RasA1 and Shc1). The model quantitatively accounts for experimentally determined sitespecific phosphorylation and dephosphorylation rates, differential affinities of binding proteins for the phosphorylated sites and binding protein expression levels. Analysis suggests that local concentration of sitespecific phosphatases such as SHP2 in membrane subdomains by a factor of approximately 10^{7} is critical for effective sitespecific regulation. We further show how our framework can be extended with minimal effort to consider binding cooperativity between Grb2 and cCbl, which is important for receptor trafficking. Our theory has potentially broad application to reduce combinatorial complexity and allow practical simulation of a variety ODE models relevant to systems biology and pharmacology applications to allow exploration of key aspects of complexity that control signal flux.
1. Introduction
Biochemical networks are bewilderingly complex. One look at a comprehensive view of metabolic pathways or signalling networks [1,2] leaves one with a sense that predicting their behaviour requires more than intuition and an illustration. Yet, advances in metabolic engineering and systems pharmacology depend on our ability to make such predictions. Thus, many see progression of these and similar fields as depending on our ability to build mathematical, computational representations of relevant biochemical networks, which can subsequently be used to predict their behaviour.
Because biochemical networks, on a basic level, are simply a collection of coupled chemical reactions, one can model their behaviour with differential equationbased chemical kinetics approaches [3,4]. One fundamental but common roadblock to theoretically precise implementation of such approaches is ‘combinatorial complexity’ [5–7]. Combinatorial complexity arises when network entities have several sites that can each be in several states. For example, a protein that contains 10 phosphorylation sites, each of which can be unphosphorylated, phosphorylated or noncompetitively bound to a downstream adaptor, leads to 3^{10} (approx. 59 000) unique chemical species. A typical approach to avoid this combinatorial complexity is arbitrarily lumping sites together [3,8,9], but the validity and consequences of such approaches are unclear. Importantly, such reduction approaches result in the inability to track sitespecific phenomena, such as differential phosphorylation and dephosphorylation rates on different sites of the same protein. Such sitespecific phenomena can have important biological impact. For example, the phosphatase SHP2 dephosphorylates only particular phosphotyrosines on receptor tyrosine kinases that are important for controlling Ras deactivation [10]. This is noteworthy as constitutive Ras activation drives progression of numerous cancer types [11]. On the epidermal growth factor receptor (EGFR), phosphorylation of Y1068 is correlated with longer survival in nonsmall cell lung cancer patients, whereas phosphorylation of Y1173 is correlated with shorter survival [12]. There are multiple proteins, such as Grb2, PI3K, SHP2, PLCγ1, RasA1 (p120RasGAP) and Shc1, which bind to these and the other approximately 10 tyrosine phosphorylation sites on EGFR in both unique and overlapping manners with varying affinities [13]. The relative recruitment levels of these various proteins determine the strength of downstream signalling to pathways such as Akt, ERK or PKC, which subsequently dictate biological responses such as proliferation, migration, differentiation or apoptosis [14]. However, predicting the recruitment levels of these binding proteins without considering sitespecific effects, and therefore combinatorial complexity, is challenging: each EGFR tyrosine is phosphorylated at different rates [15], and the host of binding proteins has distinct affinities for the various EGFR phosphotyrosines [13] as well as different expression levels [16].
Incorporating tens of thousands or even millions of differential equations into a model to account for all these sitespecific phenomena in a theoretically precise manner is problematic for several reasons. One is writing the code necessary to simulate so many differential equations. However, rulebased modelling specification languages (e.g. BioNetGen [17,18] and Kappa [19]) allow one to write a small number of reaction rules that are then used to automatically build a set of differential equations, providing the tools necessary to tackle this codewriting problem. Theory that detects independence between sites and states also allows one to reduce combinatorial complexity in a rigorous manner [19–23], but interactions are common and block reduction through such means. Another problem, which rulebased approaches cannot solve, is practical numerical integration of that many differential equations. This is particularly the case considering that the model must be integrated not a single time, but many times over to be useful for routine tasks such as parameter estimation and sensitivity analysis. One potential solution is incorporating kinetic Monte Carlo approaches into a rulebased framework for socalled networkfree simulation [24,25]. Such approaches do not enumerate all potential states a priori, but rather let initial conditions of the network stochastically propagate into realized states. Yet, such approaches are also computationally intensive for relatively small networks, require specialized software and code (whereas differential equation solvers are widely available and easy to use) and may be difficult to programmatically couple with other models (which are likely to be differential equationbased). Recently, powerful theoretical approaches have been developed that allow one to calculate steadystate behaviour of biochemical networks without having to consider combinatorial complexity [26–29]. Yet, this theory has limitations that preclude its wide application to practical simulation of several systems: (i) enzymes cannot act as substrates, but it is common for signalling network proteins to be both; (ii) there is limited ability to model feedback, but various forms of feedback are central to shaping signalling network control of cell fate [14,30], and (iii) in many situations, signalling dynamics, rather than steadystate behaviour, control biological responses [30,31]. Therefore, a theory that can reduce combinatorial complexity yet retain both the use of differential equations and the ability to describe these important network features may have widespread usefulness.
Here, we provide analytical equations that allow one to reduce combinatorial complexity for a biological scenario that is common to a variety of systems: a protein such as a receptor has multiple modification sites that bind to downstream adaptors, the adaptors sterically hinder each other from simultaneous binding and the modification/demodification kinetics are fast relative to downstream adaptor recruitment. We show that the size of a reduced system which tracks the biologically relevant levels of total recruited adaptor protein and models sitespecific phenomena does not depend on the number of modification sites, allowing for significant reduction in combinatorial complexity. Predictions of this theory are shown to match well to nonreduced system simulations over a wide range of kinetic parameters, and finally we demonstrate how the theory can be applied to model sitespecific EGFinduced signalling through EGFR.
2. Results and discussion
2.1. General modelled system and assumptions
The theory developed in this manuscript considers any protein or stable protein complex having multiple sites that can each be modified and demodified, and then the modified form of this site binds to a downstream protein. Modifications may include, for example, phosphorylation, or any other posttranslational modification. The theory allows for bound proteins to alter the modification and demodification rates in a sitespecific manner. This overall scenario is common in signalling networks.
Specifically, we assume the following conditions hold

(1) Adaptor binding is slow relative to modification/demodification cycles, such that these cycles reach quasiequilibrium.

(2) The modification sites are located in close physical proximity to one another relative to the size of the adaptors, such that adaptor binding to one site hinders binding events on other sites.

(3) Binding of adaptor j occurs with rate constants k_{onj} and k_{offj} to one or more modified sites. The assumption simplifies the initial derivation and can be relaxed to allow different on rate constants as indicated later in the manuscript.

(4) Modification and demodification of site i occur as effective firstorder processes with rate constants k_{fi} and k_{ri} when adaptor is not bound.
2.2. A simple twosite scenario
We first consider a simple scenario, where a protein has two sites, site 1 and site 2, which bind adaptors A and B, respectively (figure 1a). The states of the protein's sites are described by an ordered pair, where the first position corresponds to the state of site 1 and the second position corresponds to the state of site 2, 0 denotes unmodified, 1 denotes modified and 2 denotes bound to adaptor. Completely unmodified protein (0, 0) can be modified on any site in any order, modified site 1 can bind to adaptor A, and modified site 2 can bind to adaptor B. Because we assume that the sites are in close proximity and bound adaptor sterically hinders access to other sites, we do not consider that enzymes can access any site when adaptor is bound (e.g. (2, 1) cannot go to (2, 0) and vice versa). However, whether these processes are allowed or not has no bearing on the results of the theory derived in this manuscript.
Many systems function by modulating modified site levels to recruit proteins which subsequently turn on signalling cascades to enact biological function. Thus, the total lumped level of recruited protein is often of biological significance. In ErbB receptor signalling, for example, the recruited proteins SOS, PI3K and PLCγ turn on the pathways for ERK, Akt and PKC, respectively, subsequently affecting cellular processes. We seek to derive expressions for the total, lumped amount of protein recruitment, while retaining the ability to capture, mechanistically, the effects of sitespecific phosphorylation and/or dephosphorylation, differential binding affinities and varying protein expression levels on such recruitment. This lumping of recruitment is depicted in figure 1a by dashed rectangles, with [2_{A}] representing the total amount of bound adaptor A, and likewise with [2_{B}]. Thus, overall, we seek a reduced representation for the dynamics of [2_{A}] and [2_{B}]. These differential equations are given as2.1and2.2where [B_{A}] and [B_{B}] represent the bulk concentrations of adaptors A and B, respectively. The quantity [(1, 0) + (1, 1)] represents the total concentration of modified site 1, and [(0, 1) + (1, 1)] the same for site 2. From equations (2.1) and (2.2), it is clear that if we can represent [(1, 0) + (1, 1)] and [(0, 1) + (1, 1)] in terms of a common variable, then we may be able to close the system and negate the need to track the dynamics of each protein ‘microstate’.
Consider now assumption 1 above, which states that modification cycles are in quasiequilibrium. For the first site, this gives 2.3Likewise, we can also obtain2.4Following this approach and substituting into equations (2.1) and (2.2), we obtain2.5and2.6Now, only five variables appear in these equations: the two lumped states ([2_{A}] and [2_{B}]), the two free adaptor concentrations and the amount of completely unmodified protein (0, 0). It is trivial to write the differential equations for the free adaptor concentrations based on species balances2.7
To close the system and obtain an exact reduced description for the lumped states, we must now represent the differential equation for (0, 0) in terms of only modelled quantities, [2_{A}] and [2_{B}]. Based on overall protein species balance, we have2.8
Substituting in quasiequilibrium relationships (equations (2.3) and (2.4)), we obtainwhere2.10Thus, given the above assumptions, the reduced system can be described exactly by a set of three differential equations, none of which depends explicitly on the modification site states.
2.3. Twosites, one adaptor
Now, consider the case when a single adaptor A binds to both site 1 and site 2 (figure 1b). In this case, we have2.11Note here the coefficient of two prior to the (1, 1) state, indicating that adaptor A can bind to either modification site, doubling the concentrationdriving force. Similar to above, by substituting in the quasiequilibrium relationships, we obtain2.12and2.13Here, β_{1} and β_{2} are defined as above in equation (2.10). Thus, in this case, this system can be reduced even further, needing only one state for the single lumped adaptor and one for the total amount of unmodified protein.
2.4. Two sites, two adaptors, overlapping binding
Now, we consider a slightly more complex scenario where the two adaptors A and B may both bind to sites 1 and 2, competing with each other (figure 1c). Because different adaptors may be bound to the same site, we clarify this ambiguity by denoting adaptor identities in the schematic with a subscript (i.e. 2_{A} and 2_{B}). The lumped bound adaptor states and differential equations are given by2.14and2.15As above, the modified, unbound protein states can be recast through quasiequilibrium relationships (equations (2.3) and (2.4)) which give2.16and2.17Again, the β factors are as defined in equation (2.10). Following equations (2.8)–(2.10) for the dynamics of the (0, 0) state, we obtain2.18Thus, just as above, the system can be reduced in a similar manner, and the dynamics of lumped adaptor states can be exactly described with only three equations.
Because next we will derive the results for a system of general size, it is instructive at this point to rehash the general procedure we have used in these different scenarios to derive an exact reduction of the system dynamics in terms of lumped adaptor states:

(1) define lumped, bound states for each adaptor and specify their differential equations;

(2) use quasiequilibrium relationships to represent the concentrationdriving force for adaptor binding to modified proteins in terms of the concentration of completely unmodified protein; and

(3) use overall species balance and quasiequilibrium relationships to derive the differential equation for completely unmodified protein.
Next, we follow this procedure to derive equations for the general situation.
2.5. A protein with n sites and m adaptors
Based on the above analysis of various twosite, twoadaptor systems, we now consider the general case when there are n modification sites and m adaptors. First, we present some definitions:

(1) let [2_{j}] denote the summation over all states that have adaptor j bound, where ;

(2) let [0] be the state where all n sites are unmodified;

(3) let β_{i} denote the summation over all states that have site i modified (), but are not bound to adaptor on any other site. As in equation (2.10), the βs are cast in terms of quasiequilibrium relations with [0] factored out;

(4) let d_{ij} be an indicator variable, taking the value of 1 if adaptor j binds to site i, and 0 otherwise; and

(5) let 1/α denote the summation over all unbound protein states. As in equation (2.10), it is written in terms of quasiequilibrium relations with [0] factored out.
Given these definitions, the differential equation for the lumped state for adaptor j [2_{j}] is2.19
The term quantifies the total concentration of modified proteins that adaptor j is able to bind, and assumption 2 above allows us to neglect any bound states from this term. The summation adds up concentrations of only those modified sites where adaptor j is able to bind (‘filtered’ by the indicator variable). We also note that is possible to relax assumption 3 for on rate constants by incorporating them inside the summation, into the indicator variables. Before writing the expression for the β factors, recall from above (equations (2.3) and (2.4)) that based on quasiequilibrium relationships, in general, the concentration of a protein with modified sites given by indices in the set X, C_{X}, is given by2.20
Now, for ease of notation, let . As an example, for a fivesite system and X = [1,3,5], we would have C_{x} = [0]K_{1}K_{3}K_{5}. Given equation (2.20), the β factors (see definition 4) are given by the following2.21
Equation (2.21) iterates over all potential combinations of modified protein states where site i is modified. The constant factor K_{i} in front dictates that site i must be modified. The first summation accounts for all doubly modified states, the second summation for all triply modified states, and so on. There are a total of n − 1 nested summations; the index z on the last summation has a value of n in the first complete loop through all the summations and therefore executes only once, accounting for the state where every site is modified.
The differential equation for the [0] state is, as above, given by overall species balance2.22where the α factor (see Definition 5) is defined by2.23Equation (2.23) gives all potential combinations of modified sites. Here, there are n nested summations. The first summation corresponds to states having only a single site modified, and second summation corresponds to states having any two sites modified and so on. Similar to above, the index z corresponding to the last summation has a value of n in the first loop through, and it therefore executes only once and corresponds to the state where all sites are modified.
From this analysis, it can be seen that when the aforementioned assumptions apply, the number of species needed to describe the lumped dynamics of bound adaptor states exactly is m + 1 (one for each lumped adaptor state and one for the totally unmodified protein), and does not depend on the number of modification sites n. Thus, by applying these assumptions, one can side step the combinatorial explosion of potential states arising from multiple modification sites, while still retaining the ability to model effects of sitespecific phenomena accurately.
2.6. Comparison of theory with simulations
The abovementioned equations dictate the dynamics of a reduced system assuming the specified conditions hold exactly. However, in practice, these conditions do not hold exactly; they only hold to an approximation. Under what conditions do theoretical predictions match exact simulations? Here, we perform simulation studies to answer these questions with a BioNetGen model of a 10site receptor with four binding proteins that fully considers combinatorial complexity (see Material and methods and figure 2a). This model contains 6148 ODEs.
First, we investigated how the main assumption of timescale separation between modification/demodification rates and binding on rates affected the agreement between BioNetGen simulations and our theory (figure 2b—first three rows). Here, binder levels and affinities are taken as equal for clarity of presentation (they are altered directly below). When there was a 100fold or greater difference between modification rate constants and the binding onrate, the agreement between our theory (black lines) and BioNetGenbased ODE simulations (blue x's) was excellent. When the difference was 10fold, discrepancies between theory and simulation were observed, although they disappeared as the system reached steadystate. Thus, our theory may also be applicable in such scenarios so long as discrepancies occurring on the order of 1 min are not a concern. Many meaningful biological changes occur over much longer timescales (hours), so such discrepancies may in fact be quite tolerable, and worth the tradeoff for model reduction capacity provided by our theory.
We further challenged our theory to account for a number of biologically plausible scenarios (figure 2c): changing binder concentrations (first row); changing on and off rate constants for binders (second row); changing sitespecific modification rate constants (third and fourth row) and changing the binding stoichiometry (fifth row—d matrix). In each case, predictions from our theory match well to simulations from the full combinatorial complexity model. Thus, over a wide range of conditions, our theory was able to reduce the dimension of the system from 6148 to five independent ODEs, over a 10^{3}fold reduction. Further reduction is possible, if a larger combinatorially complex model is considered.
2.7. Applying the theory to model ligandinduced epidermal growth factor receptor activation
Here, we show how our theory can be applied to a practical scenario where EGF binds to the EGFR, causing signalling through asymmetric dimers (figure 3a) [32]. Before proceeding, however, we must evaluate whether the four assumptions underlying our theory may reasonably apply to the EGFR system. This analysis of assumptions is presented in the electronic supplementary material. Although there are certainly some potential deviations from the assumptions in a strict sense, in many respects, we conclude that they may reasonably apply to the EGFR system. One significant deviation is the cooperative recruitment of the E3 ubiquitin ligase cCbl to the EGFR, depending on the presence of Grb2 [33]. The theory as developed above does not allow for such a possibility, but we derive a special case here that allows us to introduce this feature without considering the full array combinatorial complexity. The derivation is also shown in the electronic supplementary material, and our simulation analysis of this scenario is presented below.
2.7.1. Choosing tyrosine phosphorylation sites, binders and rate constant parameters
There are multiple potential tyrosine phosphorylation sites on EGFR and many different proteins which can bind to their phosphorylated form. We limited ourselves to EGFR tyrosines that are that are Cterminal to the kinase domain or reasonably close to it, eliminating tyrosines Nterminal of Y920, yielding nine sites (table 1). To reduce the number of potential binding proteins, we relied on two sets of experimental data. First, we required that binding proteins have less than a 3 μM K_{d} to any EGFR tyrosine phosphorylation site [13]. Second, we required that binding proteins have detectable concentrations as measured in NIH3T3 cells [16]. This resulted in six binding proteins—Grb2, PI3K (PIK3R1—we assume the SH2 domain containing regulatory subunit controls levels of the complete PI3K complex), PLCγ1, SHP2 (PTPN11), RasA1 (p120 RasGTPase activating protein) and Shc1, each with known expression levels and unique dissociation constants for the considered phosphotyrosines (table 1). Taking k_{off} = 0.1 s^{−1} (see the electronic supplementary material) gives the k_{on}s. Sitespecific kinase catalytic constants were previously measured [34], and we assumed k_{f} = 0.17 s^{−1} if no sitespecific data were available (table 1). Firstorder phosphatase rate constants were taken as 0.1 s^{−1} [35].
2.7.2. Simulating epidermal growth factorinduced primary binding responses
With the model specified based on a variety of experimental data as detailed above, we could now perform simulations of EGFinduced binding of these six proteins to the variety of EGFR phosphotyrosines. We compared the predictions of our reduced model theory, which consists of only 11 independent ODEs, to simulations using NFsim accounting for combinatorial complexity [24] (our computers ran out of memory when trying to generate the network with BioNetGen and perform ODE simulations). Our reduced model predictions for the dynamics of primary binder recruitment to EGF match well to NFsim results (figure 3b—first two rows). There is some discrepancy between our theory and NFsim results for the completely dephosphorylated receptor state at short times. We suspect this discrepancy is mainly due to the time required for newly formed receptor dimers to reach a phosphorylation equilibrium, which our theory treats as instantaneous. Grb2 and Shc1 are predicted to be the predominant primary binding proteins, with the others having recruitment levels approximately an order of magnitude lower. Interestingly, the mechanism by which Grb2 and Shc1 are the predominant binders are predicted to be different based on table 1. Shc1 has moderate abundance but high affinity for many sites, whereas Grb2 has high abundance which makes up for its low affinity to a single site.
2.7.3. Analysis of SHP2
Previous models have elucidated mechanisms of sitespecific signalling by SHP2 in a variety of systems [36–38]. It has been shown that SHP2 has activity specifically towards Y992 on EGFR [39,40]. We simulated this situation by allowing SHP2bound states to exhibit phosphatase activity towards pY992 (figure 3b, third row), which predicted that recruitment of RasA1, a RasGAP that turns off the ERK pathway, would be inhibited. This is functionally similar to simulated effects of a Y992F mutation (figure 3b—fourth row). These simulation results are consistent with previous experimental studies on both scenarios, in that SHP2 is needed for activation of the ERK pathway [39,41,42], and that expression of the Y992F EGFR mutant leads to sustained EGFinduced ERK activity [43].
We found that the catalytic efficiency (k_{cat}/K_{m}) for SHP2 had to be on the order of 100 s^{−1} nM^{−1} to have an appreciable effect on sitespecific phosphorylation levels (results in figure 3b are for 1000 s^{−1} nM^{−1}). This is a remarkable seven orders of magnitude higher than reported values of 10^{−5} s^{−1} nM^{−1} [44]. Perhaps, in vivo phosphatase activation processes play a role, but because the reported k_{cat}s for these phosphatases are already quite high (approx. 10 s^{−1}), such mechanisms are unlikely to explain the entire effect. Another possibility is local concentration of the receptorphosphatase complex with its phosphorylated substrates. If one simply considers concentration of the phosphatase at the plasma membrane and 100 nm of cytoplasm associated with it, one obtains a 60fold increase in effective concentration (assuming a 20 μm cell diameter and 2000 μm^{3} cell volume). Because this factor would apply to both phosphatase and its substrate, these factors multiply to give a 3600fold increase (60 × 60) in reaction rate. Even further concentration within membrane microdomains such as clathrincoated pits is known to occur for many RTK systems [45]. It is estimated that clathrincoated pits account for approximately 2% of the plasma membrane area [46]. Taking this further concentration factor into account for phosphatase and substrate (50 × 50) yields the approximately 10^{7}fold increase in effective reaction rate needed for sitespecific dephosphorylation to have appreciable effects on phosphorylated receptor levels. Thus, these modelling results suggest that local concentration of phosphatase with phosphorylated receptors in plasma membrane subdomains is a key feature of sitespecific dephosphorylation.
2.7.4. Refining the model for analysis of Grb2–cCbl cooperativity
A recent study has shown that cCbl is cooperatively recruited to EGFR as EGF dose is increased, and that this cooperativity is dependent on Grb2 and the presence of the recruitment sites for both Grb2 (Y1068) and cCbl (Y1045) [33]. It was suggested that this effect may be due to cooperativity on the level of the phosphorylation sites themselves, implying that a Grb2–cCbl complex may bind cooperativity to EGFR, because Y1045 and Y1068 are more likely to be phosphorylated together. We extended our theory to accommodate such cooperative behaviour by adding Y1045, cCbl, and the Grb2–cCbl complex to the model (see the electronic supplementary material), which resulted in only a slight increase of model complexity (21 states). We assumed that cCbl alone could bind to phoshphorylated Y1045 with similar affinity as Grb2 for its site, and the Grb2–cCbl complex could bind to receptors having both Y1045 and Y1068 phosphorylated, with 100fold greater affinity than either protein alone. To accomplish this, we created a new site that takes into account the simultaneous phosphorylation states of Y1045 and Y1068 while still using our model reduction assumptions (see the electronic supplementary material). Using this extended model, we performed simulations to explore whether these assumptions could reproduce the cooperative behaviour in cCbl binding to EGFR observed [33]. Surprisingly, we found that there was no predicted difference in recruitment of Grb2 or cCbl as a function of EGF dose (figure 3c), despite the higher affinity of the Grb2–cCbl complex and the reliance on two sites to be simultaneously phosphorylated. Thus, more complex mechanisms, such as ordered cooperative binding of Grb2 and cCbl to pY1045/pY1068, or increased phosphorylation of one of these sites as a result of phosphorylation of the other, may be needed to explain the observed cCbl dose responses.
3. Conclusion
Combinatorial complexity can cripple mechanistic modelling of signal transduction. Here, we show that given assumptions which may be biologically plausible, the dynamics of biologically relevant lumped adaptor states do not depend on the number of modification sites, eliminating combinatorial complexity when such assumptions are satisfied, while still describing the effects of sitespecific phenomena. It may apply to, for example, RTK and histone modification systems. Yet, it is important to note that regulatory complexity, which is defined as the large number of parameters required to model such systems, remains a problem that is not addressed by the current theory but rather by other recently published approaches [47]. We demonstrate numerically that our theory for describing a reduced system of lumped adaptor states provides predictions which match well to the true system dynamics. This correspondence is maintained over a wide range of parameter values, the edge of which comes remarkably close to violating the theory's main assumptions, but without significant deviation between reduced and full model simulation results. Importantly, existing data suggest that our theory is applicable to the EGFR system, and we demonstrate how to apply it, even when the noncooperative binding assumption must be relaxed for Grb2–cCbl interactions. Analysis of our EGFR model suggests that sitespecific tyrosine dephosphorylation may only be possible by concentrating phosphatases with their substrates by a surprising 10^{7}fold, which may be possible with membrane microdomain signalling. Wide application of this theory may allow for practical simulation of large biochemical networks that otherwise would not be feasible to explore key aspects of receptor signalling complexity in a succinct manner.
4. Material and methods
4.1. Simulation of the full 10site model
BioNetGen (v. 2.2.2) was used to generate and simulate the ODEs for the full 10 site model described in figure 2, and MATLAB (R2013a, The Mathworks, Natick, MA) with ode15s was used to apply our theory. Unless otherwise specified as in figure 2 legend, initial concentrations of receptor and free binding proteins were 10 nM, modification rate constants were 1 s^{−1}, and binder on and off rates were 0.001 s^{−1} nM^{−1} and 0.1 s^{−1}, respectively. The BNGL file associated with this model is given in the electronic supplementary mateial, text 1, and MATLAB code is provided in the electronic supplementary material.
4.2. Simulation of the ligandinduced signalling model
BioNetGen (v. 2.2.2) was used to specify the reaction rules for the ligandinduced EGFR model and NFsim (v. 1.11) was used for simulation. Ligand binding on and off rates was both 0.01 (s^{−1} nM^{−1} and s^{−1}) and dimerization on and off rates were 0.01 s^{−1} nM^{−1} and 0.001 s^{−1}. The BNGL file associated with this model is given in the electronic supplementary material, text 2, and MATLAB code is given in the electronic supplementary material. We assumed that newly formed receptor dimers would be completely unphosphorylated, but according to assumption 1, be rapidly converted to the equilibrium between phosphorylated and unphosphorylated receptor states. Moreover, in equation (2.10), the firstorder kinase and phosphatase rate constants were timeinvariant; however, here they may change over time. Given these considerations, we have4.1where D_{T} is the total concentration of active receptor dimers (figure 3a), which changes as a result of ligand binding and receptor dimerization processes. Because we had to track dynamics of the parameter α, we used ode45 in MATLAB (R2013a) to integrate the ODEs. This allowed us to keep track of αvalues between time steps numerically in a straightforward manner. Such tracking could potentially be accomplished with ode15s but would require more complex coding, because it is a variabletime step, iterative solver.
Funding statement
Funding from the Icahn School of Medicine at Mount Sinai and the NIH grants P50 GM071558 (Systems Biology Center New York), R01GM104184 and U54HG008098 (LINCS Center).
Acknowledgements
We acknowledge Avner Schlessinger and Joseph Schlessinger for helpful discussions.
 Received November 3, 2014.
 Accepted November 27, 2014.
 © 2014 The Author(s) Published by the Royal Society. All rights reserved.