Abstract
A common feature of biological networks is the geometrical property of self-similarity. Molecular regulatory networks through to circulatory systems, nervous systems, social systems and ecological trophic networks show self-similar connectivity at multiple scales. We analyse the relationship between topology and signalling in contrasting classes of such topologies. We find that networks differ in their ability to contain or propagate signals between arbitrary nodes in a network depending on whether they possess branching or loop-like features. Networks also differ in how they respond to noise, such that one allows for greater integration at high noise, and this performance is reversed at low noise. Surprisingly, small-world topologies, with diameters logarithmic in system size, have slower dynamical time scales, and may be less integrated (more modular) than networks with longer path lengths. All of these phenomena are essentially mesoscopic, vanishing in the infinite limit but producing strong effects at sizes and time scales relevant to biology.
1. Introduction
Biological networks exhibit a wide range of structural features at multiple spatial scales [1–3]. These include local circuitry reflecting the logic of regulation among small numbers of elements [4], and motifs of statistically overrepresented patterns within larger networks of interactions [5], through to macroscopic properties of complete networks, including the description of the degree distributions and the large-scale geometrical features of networks [2]. Among the most interesting geometrical properties of biological networks is the property of self-similarity or scale invariance [6,7], in which characteristic topological features are present at all scales, from the local organization of individual nodes through to aggregations at the largest network scales.
For genetic and proteomic regulatory networks, as well as social networks and a variety of distribution networks, including respiratory and circulatory networks, the mechanisms generating self-similar structures have been well explored [8–17]. A growing body of empirical work investigates self-similar network structures, including motif overabundances at different coarse-grained scales. The topology of networks under coarse-graining, or agglomeration, of nodes has formed a central focus in both empirical [6] and theoretical [18–24] works. However, the functional implications of these topological properties remain poorly understood.
Functional explanations of self-similarity tend to fall into one of three broad classes. Robustness explanations consider the connectivity properties under perturbation, and contrast, for example, scale-free and exponential degree distributions [25–27]. Adaptive optimization theories argue that self-similarity provides an efficient means of provisioning densely distributed resource sinks with a minimum of cable cost [28,29]. Hence, networks such as the circulatory system can efficiently provide energy-rich compounds to the cells of the body, and neural networks can efficiently integrate information from a large variety of sensory inputs [30,31].
Finally, neutral theories suggest that self-similarity is not in itself an optimized property of biological networks, but a consequence of highly conserved developmental processes with local rules of assembly that generate characteristic macroscopic properties [32–35]. Mathematical studies have shown how motif abundances can be the consequences of constraints on large-scale topological properties [36]; conversely, large-scale topological features might arise from constraints on a single local property [37]. In either case, the connection between the large- and small-scale properties of a network may have emerged first without functional meaning.
In this contribution, we investigate the functional implications of self-similar assembly, as the nodes of a system adjust their internal states in response to their neighbours and in the presence of environmental noise. We find a tension between the small-world properties of a network and the rapidity of the transition to an ordered phase. For a fixed number of vertices and links, self-similar networks with small-world properties tend to show more gradual transitions, both dynamically and as a function of noise. The nested-hub structure of such networks provides a bottleneck restricting the possible paths to distant parts of the network. In contrast, hierarchical assemblies, characterized by nesting and a more open structure, have a sharper transition to the ordered state.
Our most surprising results are that while there is some advantage to small-diameter, small-world networks in the high-noise regime, a completely different architecture—that of nested networks, which eliminates bottlenecks at the expense of longer average paths—provides greater integration in the low-noise regime. Further, small-world networks produced by branching show dramatically longer dynamical time scales than their nested counterparts. Both features of the nested architecture are driven by the presence of multiple paths between points, as we establish both by simulation and by analytic calculation of the graphical structures that underlie the problem.
A central theme of our investigation is the differences between these constructions in the mesoscopic regime: N ≫ 1, but finite. As we shall see, various properties that vanish in the infinite-size limit lead to pronounced differences in behaviour at the finite scales relevant to biology. Our use of both analytic and numerical techniques allows us to investigate two distinct regimes relevant to this mesoscopic phase: analytic results describe the finite-size, infinite-time equilibrium, while numerical simulations show the finite-size, finite-time properties, relevant in the case of strong non-equilibrium effects.
2. Constructing self-similar networks
We first introduce a deterministic, algorithmic approach for constructing hierarchical, self-similar networks. Our methods use the notion of a construction template, or base motif, that provides the seed for self-similar construction. Alternative stochastic approaches include defining hierarchical assemblies in terms of correlations in an otherwise random network [38], through biases introduced into an ensemble [39], or through high-dimensional generalizations of deterministic constructions [40,41]. The pseudofractal [42] and the ‘flower’ graphs of [19] are alternative deterministic constructions. An advantage of the deterministic assemblies is that exact calculations of critical behaviour become possible.
The self-similar networks we describe take two forms, depending on their assembly mechanisms (figure 1). The assembly mechanism is stated formally in appendix A; it relies on the specification of (i) a motif pattern M (in figure 1, for example, the triangle) and (ii) a method, f, of replacing nodes in the pattern by new, ‘smaller-scale’ copies of the original motif. This method can then be iterated, deterministically, to produce networks of increasing size and complexity.
Visually, our constructions possess fractal-like properties, with self-similarity upon coarse-graining. Our formal definition of the construction of these networks amounts, in the reverse direction, to a specification of a renormalization group transformation [43].
The two simplest choices of node replacement lead to two different kinds of network: a branching topology, characterized by the absence of large-scale loops, and a nested topology, where the loop structure of the base motif is replicated on all scales. We consider the scaling of average network diameter, 〈d〉, the geodesic distance averaged over all distinct pairs of nodes.
2.1. Branching assembly
As a network grows, a particular unit may preserve the same ‘local-structural’ relationship at each level of the hierarchy. For example, the central node of a star may be the central node of the network at all levels of iteration. These networks are characteristic of circulatory and vascular networks, where each node, regardless of its position in a hierarchy, tends to perform the same function [44,45].
In the formalism of appendix A, such a mapping is provided when f(i, j) is equal to i. Iterations increase inequality in the network, producing degree distributions characterized by a motif scale, with an exponential tail of nodes with a ‘runaway’ influence on the rest of the system. Biological networks with this property include the neural network of Caenorhabditis elegans [46] and the small-world networks of Watts & Strogatz [47]. Exponential tails to the degree distribution are also found in the original Erdős–Rényi random graph.
An illustration of a branching iteration on the triangle is shown in figure 1a. As the order increases, loops, loops with free loops and so forth are produced; the highest vertex degree increases exponentially in the number of vertices—these are nodes on the largest ‘superloop’. All loops, or, in general, subgraphs, may be detached by a single cut.
2.2. Nested assembly
In contrast to branching iterations, a nested iteration is one in which the unit—vertex or subgraph—takes on the characteristics of its neighbours. A node is no longer restricted to a single local structural geometry, but participates in structures at multiple spatial scales. This is common in communication and computational networks, characterized by extensive feedback loops and connectivity to topologically distinct regions.
In the formalism of appendix A, this second mode of network assembly is provided by f(i, j) equal to j. While branching structures look tree-like,^{1} nested networks are characterized by the replication of subgraph loop structures on increasingly larger scales.
Nested networks are shown in figure 1b; a more complicated example is that of figure 2, where two iterations of a motif overabundant in Escherichia coli [48] are shown. As shown in figure 3, nested graphs have larger diameters; they lack the ‘small-world’ property of logarithmic scaling of diameter with size found through replication.
2.3. Topological properties
The two cases we have considered, pure branching or nesting of a motif pattern M, can be considered extremes of how a network might assemble. Branching networks tend to increase inequalities in the degree distributions of vertices while keeping loops at an approximately constant density, whereas nested networks create many more loops while reproducing (almost) the degree distribution of the lower levels.
Another difference between the assembly rules is the scaling of the average diameter. As can be seen in figure 3, branching, with its tree-like hierarchy of central hubs, produces small-world graphs where the network diameter scales only as the logarithm of network size [47].
This can be understood by considering successive construction steps. The increase in the number of vertices at each iteration leads to an exponential scaling of system size with iteration number:

N_{i} = n^{i},  (2.1)

where n is the number of vertices in the base motif and N_{i} is the total number of vertices at iteration i. The tree structure of branching networks, however, means that the distance between nodes on the perimeter (i.e. those nodes with the greatest separations on the graph) increases only by a constant, proportional to n. The diameter, in other words, increases linearly at each iteration, and so the network as a whole has only a logarithmic scaling of diameter with system size.
Diameter increases much more rapidly for the nested structures. Crossing such a structure requires crossing the nested subgraphs, and so the separation between distant points increases proportionally to N_{i} as well as to n. This leads to a power-law scaling, with diameter increasing as a power of the number of vertices; the exponent of this scaling is set by the average diameter of the base motif. All of these relationships are shown in figure 3.
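The contrasting diameter scalings can be reproduced with a short computational sketch. The construction below is a simplified reading of the rules above, not the formal definition of appendix A: each step makes one copy of the current graph per motif node and wires the copies together along the motif's edges, attaching either at a fixed hub node (branching) or at recursively defined 'corner' nodes (nesting). The triangle motif and these attachment details are illustrative assumptions.

```python
from collections import deque

MOTIF = [(0, 1), (1, 2), (0, 2)]   # base motif M: a triangle
N = 3                              # number of motif vertices, n

def corner(j, size):
    # 'corner j' of a graph of n^k nodes: the node whose base-n
    # address reads jj...j (an assumed recursive labelling)
    return j * (size - 1) // (N - 1)

def grow(edges, size, nested):
    """One construction step: one copy of the current graph per motif
    node, wired together along the motif's edges."""
    out = [(c * size + a, c * size + b) for c in range(N) for a, b in edges]
    for i, j in MOTIF:
        if nested:   # copies attach at corners: loops replicate on all scales
            out.append((i * size + corner(j, size), j * size + corner(i, size)))
        else:        # branching: every copy attaches via its hub, node 0
            out.append((i * size, j * size))
    return out, N * size

def diameter(edges, size):
    """Longest geodesic distance, by breadth-first search from every node."""
    adj = [[] for _ in range(size)]
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    worst = 0
    for s in range(size):
        dist = [-1] * size
        dist[s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if dist[v] < 0:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        worst = max(worst, max(dist))
    return worst

branching, nesting, size = MOTIF, MOTIF, N
for _ in range(2):                 # two further iterations: 27 nodes each
    branching, _ = grow(branching, size, nested=False)
    nesting, size = grow(nesting, size, nested=True)

print(diameter(branching, size), diameter(nesting, size))
```

With the same number of vertices and edges, the branching diameter grows by a constant at each iteration while the nested diameter is multiplied, the logarithmic versus power-law behaviour of figure 3 in miniature.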
When two formation patterns are mixed so that a graph may switch from branched to nested, the result is networks as in figure 4. As a means of self-organization, mixed iterations lead to a kind of self-dissimilarity, where coarse-graining reveals different organizational principles at different levels. Evidence for self-dissimilarity under coarse-graining has been found in both biological and engineered systems [49].
3. Signalling, modularity and noise
Whereas some network features and motif structures could arise through simple genetic or developmental stochastic processes, non-essential conservation rules, or the local constraints of physics and chemistry, we show that two extremes of network structure can still have important functional implications for the ways in which different parts of a network become correlated, or exchange information.
A crucial concept for this work is that of noise, which accounts for the influence of random events and unobserved degrees of freedom in a system. Particular examples of noise might include the small-number fluctuations in reactants that affect metabolic processes, the coupling of observed neurons to part of the larger, unobserved network, or the use of mixed strategies in a game-theoretic system. In the absence of strong theories for the noise properties of a particular case, we use a maximum entropy model, as described below.
In particular, for our dynamics, we take nodes to have two states (‘on’ or ‘off’), approximating the discrete switching events observed in a number of systems from the cellular [50] to the social [51]. We follow recent work showing the dominance of pairwise interactions in system behaviour [52–54], and consider networks with pairwise constraints described by a maximum entropy model. This amounts to requiring that the full state of the system—the switch-state of all N vertices—be given by the Boltzmann distribution. This is then the Ising model on an arbitrary graph.
We can then write the Boltzmann distribution of spins, P({σ}), as given by the set of pairwise constraints J_{ij} and external fields h_{i},

P({σ}) = exp[β(Σ_{(i,j)} J_{ij} σ_{i}σ_{j} + Σ_{i} h_{i}σ_{i})]/Z,  (3.1)

where the first sum runs over the edges (i, j) of the graph.
The J_{ij} are simply the edges of the different networks we consider. The system is in the maximum entropy state with only one observable—average total energy, or number of satisfied pairs—fixed [55]. Usually, the h_{i} are taken to be zero; when they are non-zero, they amount to external constraints acting on single nodes—such as one might expect in a network partially devoted to sensing external conditions.
The most important parameter for this study is the overall factor of β. Large values of β correspond to the low-noise regime; conversely, as β goes to zero, the coupling between nodes is swamped by random fluctuations. We refer to β as the inverse noise, and focus on how changing β leads to changes in how the network correlates and processes information in both equilibrium and non-equilibrium situations.
Determining the correlational and informationtheoretic properties of the networks involves finding the joint probabilities of the states of the network, P({σ}). In general, we are interested in quantities such as P(σ_{i}, σ_{j}), the joint probability of two nodes i and j being in the same, or opposite, switching states.
There are many different approaches to finding, or approximating, P({σ}); they are valid in different regimes. For the construction rules we consider, for motif structures with maximum degree of two (i.e. chains), exact solutions of the Ising model are possible via a renormalization group transformation, and for structures with maximum degree of three, an exact solution for the partition function in zero field is generally possible [56]. For arbitrary motifs, however, the partition function for the ith iteration can no longer be written as the partition function for the (i − 1)th with a suitable change of coupling, J → J′.
In this paper, we adapt the ‘direct configurational’ method (DCM; e.g. [57]), which allows exact computation in small, finite networks. The self-similar properties of our networks allow these computations to be extended to graphs with many hundreds of nodes.
The computation of P({σ}) can be done via the normalizing term, or ‘partition function’, Z, in the denominator of equation (3.1). Derivatives of Z with respect to h then give moments of P({σ}). These computations not only provide an exact solution, but also decompose graphically into sums of ‘paths of influence’ closely related to the Feynman diagrams of condensed matter and particle physics. Appendix B describes these calculations, which provide a rigorous basis for the qualitative discussion of how multiple paths lead to critical phenomena [58].
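As a concrete (if brute-force) stand-in for these calculations, the partition function and a pair correlation can be computed by direct summation over all 2^N spin states. The sketch below is not the DCM itself—it does not exploit the self-similar decomposition that makes hundreds of nodes tractable—and it assumes J_{ij} = 1 on every edge and zero field:

```python
from itertools import product
from math import exp

def pair_correlation(edges, n, beta, i, j):
    """Exact Z and <sigma_i sigma_j> by summing over all 2^n spin
    states (assumes J = 1 on every edge, h = 0)."""
    Z = corr = 0.0
    for spins in product((-1, 1), repeat=n):
        # Boltzmann weight of this configuration
        w = exp(beta * sum(spins[a] * spins[b] for a, b in edges))
        Z += w
        corr += w * spins[i] * spins[j]
    return Z, corr / Z

triangle = [(0, 1), (1, 2), (0, 2)]
for beta in (0.0, 0.5, 1.0):
    Z, c = pair_correlation(triangle, 3, beta, 0, 1)
    print(beta, round(c, 3))
```

For the triangle this reproduces the closed form 〈σ_{0}σ_{1}〉 = (e^{4β} − 1)/(e^{4β} + 3): the correlation vanishes at β = 0 and rises monotonically as the noise falls, the behaviour described in §4.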
4. Phase transitions and the mesoscopic regime
Despite the simplicity of equation (3.1), the model has a rich set of behaviours, including (depending on the graph structure) a critical point, β_{c}. The characterization of critical phenomena has been a central theme of the study of complex networks [59]. With exact expressions for the correlations in hand (see appendix B), we can study the nature of the order–disorder transition on the different hierarchies presented here.
Despite the length of the expansions—ratios of two power series in tanh β to 𝒪(N)—the general behaviour of the correlation functions for different networks is similar, with a monotonic rise from the disordered to the ordered state. The leading-order behaviour in the high-noise limit as β → 0 is (tanh β)^{r_{min}}, where r_{min} is the shortest distance between the two vertices under consideration. The failure of this approximation is due to the increasing number of paths of influence available, which can allow longer paths to dominate if they increase in number quickly enough to offset the exponential suppression in signal.
For branching networks, the tree-like structure suggests that a phase transition in the bulk is prevented at non-zero noise by the nucleation of boundary spins, as happens in the Cayley tree (as distinguished from the Bethe lattice) [60]. Because nested graphs can also be detached by a constant number of cuts at any iteration—even when N → ∞—the critical point in the thermodynamic limit is also expected to be zero [61], similar to the kind of ferromagnetic frustration found in random graphs [62].
It is thus useful to define a critical point for a finite system without reference to a thermodynamic limit, but through the behaviour of various correlations that, although never mathematically singular, do show the existence of a transition between two distinct behaviours.
For the particular example of the Cayley tree, Mélin et al. [63] introduced the notion of a crossover noise, β_{g}. Decreasing noise, which pushed β above β_{g}, was associated with the emergence of non-Gaussianity, a slowdown of dynamics, and glassy behaviours such as ageing; the crossover point β_{g} approaches its infinite-size limit (1/β_{g} goes to zero) very slowly (as log(log N)), so that the thermodynamic limit is not representative even for very large systems. The slow approach to thermodynamic limits is often found in finitely ramified structures such as the Cayley tree [64].
We investigate these systems below using analytic tools (§4.1) and numerical simulation (§4.2), on networks of size N ∼ 300. Because of the extremely slow scaling discussed earlier, there is a significant range of system sizes over which equilibrium properties do not vary appreciably. This region, which we refer to as the mesoscopic regime, is qualitatively different from the infinite-size limit. The networks we study here are in this regime—but so are much larger networks; system sizes nine orders of magnitude larger, for example, are expected to have properties that differ by only a factor of two.
4.1. Stationary aspects
We measure two quantities related to the stationary, equilibrium properties of the two networks, focusing on their critical phenomena. First, we consider the heat capacity, C (see Tkacik et al. [65] for an example of its use in biological systems). At constant external field, we have

C = T(∂S/∂T)_{V} = (∂ log 𝒩/∂ log T)_{V},  (4.1)

where S is the entropy, V the volume (here, the number of nodes) and 𝒩 the weighted number of states accessible to the system. The heat capacity measures the (logarithmic) number of states accessible per (log) unit noise. It has a maximum, more or less sharply peaked, as a function of β. As one heats the system through this point, the number of accessible states increases dramatically, and the intensity and variety of the cooperative phenomena in the transition can be quantified by the height of the peak. Nested networks are characterized by a greater concentration of states, as can be seen in figure 5a; they can be said to have sharper transitions to the ordered state.
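For a system small enough to enumerate, the heat capacity can equivalently be obtained from energy fluctuations, C = β²(〈E²〉 − 〈E〉²); the following sketch uses that fluctuation form, with a single triangle motif and unit couplings as illustrative assumptions:

```python
from itertools import product
from math import exp

def heat_capacity(edges, n, beta):
    """C = beta^2 (<E^2> - <E>^2) by exact enumeration (J = 1, h = 0)."""
    Z = e1 = e2 = 0.0
    for spins in product((-1, 1), repeat=n):
        energy = -sum(spins[a] * spins[b] for a, b in edges)
        w = exp(-beta * energy)        # Boltzmann weight
        Z += w
        e1 += w * energy
        e2 += w * energy * energy
    e1, e2 = e1 / Z, e2 / Z
    return beta ** 2 * (e2 - e1 ** 2)

triangle = [(0, 1), (1, 2), (0, 2)]
capacities = [heat_capacity(triangle, 3, b / 10) for b in range(1, 31)]
peak = capacities.index(max(capacities))
print((peak + 1) / 10, max(capacities))   # location and height of the peak
```

Even this three-node system shows the characteristic interior maximum as a function of β; on the full networks, the height of this peak is what distinguishes nested from branching assemblies in figure 5a.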
The approach to a phase transition is often defined in terms of a transition between an exponential and a power-law decline in the correlation function as a function of distance. If we measure the correlation between pairs of nodes separated by a distance Δr, we can define a correlation length, D,

〈σ_{i}σ_{j}〉 ∼ exp(−Δr/D),  (4.2)

where, on these inhomogeneous networks, we take Δr to be the geodesic distance between points. Figure 5b shows the transition that occurs as one passes into the low-noise regime: nested networks, with multiple paths between distant points, allow distant parts of the network to correlate at β ∼ 1.
As noted in §2, nested networks have larger diameter. In the case of the q = 4 iteration, the nested networks have a maximum diameter of 32, compared with nine for the more tightly structured branching networks. This means that at high noise (small β), nested networks allow for greater modularity—distant parts of the network are less correlated. The transition that occurs at β ∼ 1, where D for nested networks becomes much larger, reverses this property; nested hierarchies at low noise have stronger longrange correlations. We return to the question of modularity in §4.3, where we address it through simulation.
The correlation length D exceeds the average diameter at β ≈ 0.8 (branching) and β ≈ 0.9 (nested). By analogy with infinite-limit, homogeneous systems, one can consider this noise level as the point where the effective mass of long-range fluctuations becomes zero. In contrast with the standard infinite-limit phase transition, this critical point occurs near, but not precisely at, the maximum of the network heat capacity.
4.2. Dynamical aspects
Heat capacity and correlation length are both static measures of modularity and signalling. We also expect dynamical signatures of the crossover in finite networks. In this section, we show that although branching networks are much smaller in diameter (figure 3) than nested networks, they have much longer time scales (figure 6). Nodes are actually less coupled compared with the higher-diameter nested networks, where multiple paths between nodes exist. In general, there are many dynamics compatible with the stationary distributions of equation (3.1). We take the standard Glauber dynamics [66,67], with each update step being associated with a different randomly chosen node.^{2}
Given a sufficiently long time series for any pair of nodes, we can then measure the time scales of their dynamics. We focus here on the decay of the overlap,

C(t, t_{w}) = (1/N) Σ_{i} 〈σ_{i}(t_{w}) σ_{i}(t_{w} + t)〉,  (4.3)

where a single step, Δt equal to one, is an update of a randomly chosen spin. The function C(t, t_{w}) decays from unity (at t equal to zero) down to (at noises below the glassy phase) a noise floor given by the Poisson statistics of uncorrelated spins. It can be used to measure a number of different properties, including that of ageing below the spin glass transition [63]. Here, we measure τ_{w}, the time it takes C(t, t_{w}) to cross a particular threshold. In figure 6, the threshold is taken to be 0.5, so that τ_{w} is the average time for a node to flip with 25 per cent probability. Because of the long-tailed distribution of relaxation times, we follow Billoire & Marinari [70] in estimating τ_{w} by the median, instead of the mean.
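A minimal sketch of this dynamics and measurement follows. The heat-bath (Glauber) update rule is standard; the small ring graph, unit couplings and the choice t_{w} = 0 are illustrative assumptions rather than the networks and protocols behind figure 6:

```python
import random
from math import exp

def glauber_overlap(edges, n, beta, sweeps, seed=0):
    """Heat-bath Glauber dynamics; returns the overlap C(t, t_w = 0)
    against the initial state, one value per sweep of n updates."""
    rng = random.Random(seed)
    adj = [[] for _ in range(n)]
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    spins = [rng.choice((-1, 1)) for _ in range(n)]
    ref = spins[:]                       # configuration at t_w = 0
    overlaps = []
    for _ in range(sweeps):
        for _ in range(n):
            i = rng.randrange(n)
            field = sum(spins[j] for j in adj[i])   # local field (J = 1)
            p_up = 1.0 / (1.0 + exp(-2.0 * beta * field))
            spins[i] = 1 if rng.random() < p_up else -1
        overlaps.append(sum(r * s for r, s in zip(ref, spins)) / n)
    return overlaps

ring = [(i, (i + 1) % 24) for i in range(24)]
print(glauber_overlap(ring, 24, beta=0.3, sweeps=20)[-1])
```

τ_{w} would then be read off as the first sweep at which the overlap crosses the chosen threshold, with the median taken over independent seeds, as above.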
Relaxation times for spin-glass systems are themselves time-dependent—the longer one lets the system run, the longer the correlation time becomes. This is referred to in the physics literature as ‘ageing’ [71]—correlational properties depend on the age (time since initialization by random initial conditions) of the system. We also see evidence for time-dependent correlation functions past the critical point, similar to that found by Mélin et al. [63], but focus here on the contrasting behaviour of the relaxation time at constant age. We are here in the strongly out-of-equilibrium regime (long time scales on a newly initialized network).
Figure 6a shows how τ_{w} scales with β. The strongest differences between the two networks emerge in the low-noise (high-β) regime. In particular, branching networks, with their hub-and-spoke topologies, show time scales more than two orders of magnitude longer than their nested counterparts.
The differences are due to bottlenecks to communication that exist between distant parts of the network in the branching case. Because all paths between distant nodes must pass through a single point, the speed of communication is limited by the time scale for that single point to change state.
This is similar to the frustration effects seen in the Cayley tree by Mélin et al. [63]. In contrast, in the nested case, there are multiple paths between distant points, and this means that long-time-scale frustration effects are limited—made rarer, although not completely eliminated.
That small-world networks, if they rely on hub-and-spoke topologies, are actually slower is connected to the emergence of long-lived metastable states. The analogues of domain walls for inhomogeneous networks—separated parts of the system that fall into opposite states of local consensus—emerge at low noise. These walls propagate through the system until they meet bottlenecks—places where disparate parts of the network connect via a single node—and are effectively pinned for long periods. Multiple paths, in contrast, increase the number of points of contact between different neighbourhoods.
The dispersion in behaviour (figure 6b) for both networks is large. This shows the effect of non-equilibrium dynamics and an (effective, since finite-time) breaking of ergodicity. In some cases, the initial conditions, after burn-in, may lead to a particularly ordered configuration from which the system departs with only vanishing probability. In other cases, metastable states are not as long-lived, and relaxation can happen quickly.
That one can achieve dispersion in behavioural time scales of nearly five orders of magnitude from a system with only approximately 10^{2} nodes is remarkable. The dispersion, which itself sees an exponential rise at β ∼ 1, is another indication of the presence of a finite-size critical phase, present only in the mesoscopic regime.
4.3. Fluctuation localization
Although branching networks are smaller—nodes are, on average, closer to each other—we have shown by simulation that the time scales of dynamical change are much longer (figure 6). Meanwhile, from our analytic determination of the heat capacity (figure 5), we can determine how many new configurations become accessible as the noise declines.
In this section, we examine features relevant to information-processing, which is a property of both the stationary properties of the network (how many configurations are accessible) and the dynamical ones (how quickly one configuration turns into another).
In particular, we ask about the entropy of the system over finite time, and how and where that information is stored: locally (in single nodes), on the motif scale, or nonlocally, across widely separated motifs. Such questions are essential to biological function: distinct substructures must not only process information by means of local motif patterns, but also communicate the results of that processing to more distant nodes. Anomalous concentrations of a metabolic product, say, may be detected by influences on one part of the system, but may need to trigger a transcriptional cascade in a different module.
For systems where bits are largely independent, the multi-information is close to zero, indicating that very little information is exchanged between subgroups. When nodes come to process information in complex ways, however, the multi-information becomes larger, indicating that apparent randomness at the local scale becomes pattern exchange on larger scales. Formally, the multi-information at any particular scale is the decrease in entropy seen when the distributions taken by the smaller scales are combined into a joint probability distribution.
We measure the multi-information [72], a generalization of mutual information used to describe cooperative information-processing [73,74]. For the case of three subsystems, whose internal states are represented by the vectors x_{1}, x_{2} and x_{3}, we have for the multi-information

I(x_{1}; x_{2}; x_{3}) = S(x_{1}) + S(x_{2}) + S(x_{3}) − S(x_{1}, x_{2}, x_{3}),  (4.4)

where S denotes the entropy.
We consider the multi-information between the motif and global scales, with the three most widely separated, but equidistant, triangle motifs chosen as the x_{i} sets. In words, the first term of equation (4.4) is the total entropy of the subsystems considered in isolation from each other; if there were no long-range synchronization, this would equal the second term, and the multi-information would be zero. Conversely, because the maximum amount of information in the subsystems is nine bits, the maximum amount of multi-information is six bits (all three distant motifs perfectly correlated).
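The limiting cases quoted here can be checked directly. In the sketch below, each of the three subsystems is a three-bit string standing in for a triangle motif; the two joint distributions (fully independent and perfectly synchronized) are illustrative assumptions:

```python
from itertools import product
from math import log2

def entropy(dist):
    """Shannon entropy, in bits, of an {outcome: probability} mapping."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

def multi_information(joint):
    """I = S(x1) + S(x2) + S(x3) - S(x1, x2, x3); keys are 3-tuples."""
    marginals = [{}, {}, {}]
    for state, p in joint.items():
        for k, x in enumerate(state):
            marginals[k][x] = marginals[k].get(x, 0.0) + p
    return sum(entropy(m) for m in marginals) - entropy(joint)

motifs = [''.join(bits) for bits in product('01', repeat=3)]  # 3-bit states

# three independent, uniform motifs: no long-range synchronization
indep = {(a, b, c): (1 / 8) ** 3 for a in motifs for b in motifs for c in motifs}

# three perfectly synchronized motifs
sync = {(s, s, s): 1 / 8 for s in motifs}

print(multi_information(indep), multi_information(sync))
```

Independent motifs give zero; perfect synchronization gives 9 − 3 = 6 bits, the maximum quoted above. The half-bit window of figure 7 sits between these two extremes.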
Figure 7 shows these results, computed directly from simulation. We estimate the multi-information using the NSB estimator [75,76]—we find it leads to good estimates on simulated datasets with the dramatically lower entropies one expects past the mesoscopic critical points.^{3}
Both the branching and nested structures show a distinct window in which long-range synchronization is strongest and roughly half a bit can be communicated between distant parts of the system. At first, as noise decreases, distant nodes become more correlated (as in figure 5), and the multi-information rises; however, at low noise (large β), fluctuations on all scales are frozen in. In both cases, this window appears around the same noise level as the peak of the heat capacity; this provides additional support to the description of a mesoscopic phase transition, because more conventional thermodynamic systems are known to have maximum multi-information at the critical point [77].
5. Discussion
In contrast with the regular lattices of field theory, complex networks are characterized by both small-scale pattern and large-scale structural diversity. On small scales, repeating network motifs [78] indicate strong local inhomogeneity. On large scales, networks may be characterized by modularity or by large-scale motifs visible under coarse-graining or aggregation of vertices [79,80]. The study of such transformations on complex networks has uncovered evidence for self-similarity [6], and small-scale and large-scale network structures, for example, are found to be correlated in cellular networks [81].
In this contribution, we compared two alternative topologies—networks of branching motifs (with one pattern replicated many times) and networks of nested motifs (where patterns play the role of templates). Branching networks have a familiar tree-like structure and possess the small-world property; their benefits include efficient signal propagation at high noise. Nested networks retain self-similarity but without small-world scaling, and confer benefits such as redundant paths between distant nodes at the cost of longer path lengths.
A central theme has been the difference at the onset of a mesoscopic version of a phase transition. Phase transitions in general occur in networks when the exponential fading of a correlation along a particular path is balanced by the exponential increase in the number of paths between the two points [58]. In complex networks, this implies that structural inhomogeneity on a range of different scales will be relevant for the critical behaviour analogous to that found in more regular systems.
Our investigation has uncovered a number of counterintuitive properties of small-world systems. Smaller-diameter networks adjust more slowly, have shorter correlation lengths and cannot achieve the levels of non-local integration seen in nested systems. Our analytic exposition of the problem shows explicitly how the onset of correlations is driven by the existence of multiple paths between points; our simulations show how the existence of such paths allows for the more rapid dissipation of inhomogeneity. Multiple paths are thus central for both information-processing and the time scales of coordination.
In some cases, the characteristic features of the small-world topology listed in table 1 are desirable. They can lead to greater modularity, on longer time scales, than more ‘open’ topologies with longer path lengths. At low noise, their fluctuations are more localized, meaning that fluctuations in distant structures are increasingly independent, and disjoint memories do not merge and fade as fast. Depending on the nature of computation, these may be desirable properties—as they are, for example, in the case of the liquid state model [82].
The existence of such paths also bears on the question of network robustness—particularly under targeted attack [83]. When all correlational information between two nodes must travel along a single path, the failure of any intermediate node is catastrophic. Conversely, robustness to node deletion will, in general, increase as the number of distinct paths between points increases, even if the number of edges remains constant.
We suggest that our work is particularly relevant to the study of information-processing in the brain [84,85]. On the one hand, the maximum entropy model of §3 has formed the basis of a powerful set of models for the description of observed neural correlations [86], and the information-theoretic quantities we have investigated are directly related to the Tononi ϕ measure [87] and the C_{N} measure of Tononi et al. [88]. These latter measures consider various bipartitions; the fractal structure of our networks naturally suggests extensions of these measures to the tower of higher-order correlations as described in Schneidman et al. [72].
On the other hand, the multiscale structure of the brain—from scales of 50 μm to centimetres—is well-established [89, and references therein]. The topological and dynamical properties of certain random and deterministic self-similar wirings, relevant to neuroscience, have been under recent investigation [89,90]. Our work has direct bearing on explicit models of cortical network architecture [91], and in particular suggests that small-world path lengths may not be the only way in which a network might optimize information-processing.
Self-similar network properties have proved relevant to the study of a vast range of other natural systems, from gene-regulatory [10,11] and metabolic networks [12], all the way up to food webs [17] and human social networks [13–16]. In the case of social networks, for example, branching networks with complete-graph motifs are small-world examples of the robust social quilts studied by Jackson et al. [92], while ‘span of control’ theories [93] address the consequences of hierarchy for information-processing and dynamics [94]. Hierarchical structure may also be associated with the emergence of long time scales associated with strategic information-processing in animal systems [95].
In parallel, the maximum entropy models we consider here have proved useful not only in studies of neural functioning, but also in studies of the immune system [96], and animal behaviour [97,98]. In many cases, such systems are found at criticality [99], making it important to understand the mesoscopic regime.
The analysis of this paper suggests that statistics related to the existence of multiple paths in a network may be an important way to determine how relevant structural features have been organized to achieve the contrasting properties found in table 1. It may not be necessary to compute all the relevant Feynman diagrams in a graph to answer central questions about the nature of the critical point and ordered phase. When calibrated against the exactly solved models of this paper, statistics related to the scaling of the number of paths between vertices as a function of distance may be sufficient to study both the nature of the critical point and the existence of non-equilibrium effects. We leave this question for future investigation.
Most theoretical studies have focused on comparing the functional implications of self-similar and non-self-similar networks. We have found none that considers the functional implications of alternative self-similar networks. If it proves to be true that constraints of development account for, and impose, widespread network self-similarity, then variations on a fractal theme will become the principal means by which development might tinker with functionally important properties.
Acknowledgements
S.D. thanks Van Savage, Tanmoy Bhattacharya, Simeon Hellerman, George Bezerra and the Institute for the Physics and Mathematics of the Universe, University of Tokyo, Japan, and acknowledges the support of an Omidyar Fellowship. S.D. and D.C.K. acknowledge support from the National Science Foundation (grant no. 1137929), ‘The Small Number Limit of Biological Information Processing’. D.C.K. acknowledges a John Templeton Foundation award on the Origins and Evolution of Regulation in Biological Systems.
Appendix A. Formal definitions of branching and nested networks
Beginning with a motif, M, with N(M) vertices, we build up, by iteration, a larger structure, S(q, M), where q is the number of iterations and S(q = 0, M) is M. The motif directs the assembly of increasingly larger structures, in a recursive fashion, providing the graph with both small- and large-scale inhomogeneity.
At the qth iteration, replace the vertices in S(q−1, M) by separate copies of M, and rewire the system while maintaining the local motif structure. One might take the vertices of a triangle, for example, and replace each of them by a copy of the same threenode structure. The different ways to accomplish this model how a network may develop the internal structure of its subsystems; going in the other direction, a particular choice defines a coarsegraining operation that might form an element of a renormalization group.
More formally, for each vertex i in M, replace the vertex by a copy of M, M_{i}. For each vertex j in M, connect the free edges—those remaining from the previous iteration that were attached to vertex i—to the internal vertices of M_{i}, by some mapping f(i, j) (generally not symmetric).
At the qth iteration, take M and replace each vertex i in M with copies of S(q−1, M). The rewiring now takes an edge from the jth subunit to the ith subunit, and attaches it to the f(i, j) vertex in S(q−1, M). The f(i, j) vertex for S(2, M) is defined as the f(i, j)th vertex in the f(i, j)th subunit, and so forth for higher values of q.
Graph S(q, M) has N(M)^{q+1} vertices and n(M)(N(M)^{q+1} − 1)/(N(M) − 1), or approximately n(M)N(M)^{q+1}/(N(M) − 1), bonds. The average degree of S is always close to that of the local graph M, so that sparse networks remain sparse; however, the higher moments of the degree distribution may grow dramatically depending on the choice of f.
Going from S(q, M) to S(q−1, M) is a form of renormalization [43]. Once M is chosen, the remaining choice is that of the assembly rule, f(i, j); we consider the two simplest cases f(i, j) equal to i (branching assembly), and to j (nested assembly). These operations are easier to see graphically; for the example of a triangle motif being replicated at multiple scales by the two methods, see figure 1. For a more explicit example of how the f(i, j) rule works, see figure 8.
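To make the two assembly rules concrete, the construction of this appendix can be sketched in a few lines of Python. This is our own illustrative code, not from the original work: vertices are addressed by tuples (a_1, …, a_{q+1}), with the last entry giving the top-level subunit, and the rule argument selects f(i, j) = i (branching) or f(i, j) = j (nested).

```python
def assemble(motif_edges, n_verts, q, rule):
    """Build the edge set of S(q, M) for a motif M with n_verts vertices.

    Vertices are tuples (a_1, ..., a_{q+1}); the last entry is the
    top-level subunit.  rule='branching' uses f(i, j) = i (every edge
    into subunit i attaches at i's single root vertex); rule='nested'
    uses f(i, j) = j (each neighbour enters at a different vertex).
    """
    edges = {((i,), (j,)) for (i, j) in motif_edges}   # S(0, M) = M
    for level in range(1, q + 1):
        new_edges = set()
        for c in range(n_verts):                       # internal copies
            new_edges |= {(u + (c,), v + (c,)) for (u, v) in edges}
        for (i, j) in motif_edges:                     # top-level rewiring
            fij = i if rule == 'branching' else j      # attach point in subunit i
            fji = j if rule == 'branching' else i      # attach point in subunit j
            new_edges.add(((fij,) * level + (i,), (fji,) * level + (j,)))
        edges = new_edges
    return edges

triangle = [(0, 1), (1, 2), (0, 2)]
branching = assemble(triangle, 3, 2, 'branching')
nested = assemble(triangle, 3, 2, 'nested')
```

For the triangle motif at q = 2, both rules yield N(M)^{q+1} = 27 vertices and n(M)(N(M)^{q+1} − 1)/(N(M) − 1) = 39 bonds; the two graphs differ only in where the inter-subunit edges attach.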
Appendix B. Ising solutions in the direct configurational method
In §4.1, we examined stationary properties of the system. Because of the divergence of time scales discussed in §4.2, it is difficult to determine reliable measurements of these properties from simulation. Here, we discuss the direct configurational method (DCM), which allows one to write down expressions for these properties analytically. The expressions are long, but tractable by analytic methods that use computer algebra. They enable us to separate finite-size, finite-time effects (accessible by simulation) from finite-size, infinite-time effects associated with equilibrium.
A number of different expansions for the correlations can be written in the high-noise (i.e. β ≪ β_{c}) limit. The most common, known as the linked-cluster expansion [100], has formed the centre of studies of the Ising model on regular lattices [101–103].
Because attention has been paid largely to lattices of great symmetry and infinite extent, the exact solutions available for lattices of finite size—expressible as a power series with a finite number of terms in (tanh βJ)^{n}—are less often used. This ‘direct configurational method’ [57, ch. 2] allows one to write an expression for the partition function of a graph directly, by enumerating all of the subgraphs (including disconnected subgraphs) of the original lattice with all vertices even. Similar expressions, with some vertices ‘rooted’ in various ways, allow one to determine correlation functions through partial derivatives of Z.
For a system of any appreciable size, enumerating the disconnected graphs is a nearly impossible computational task. Finding the free energy, F, equal to ln Z, turns such a sum of disconnected graphs into a far shorter sum involving only connected graphs, with multiple bonds between vertices allowed, weighted in a new fashion [104, ch. 20]. These are the usual Feynman diagrams, and allow one to handle an arbitrarily large lattice to finite order in β. When the lattice has translation symmetries, bond and vertex renormalization [100,105–107] becomes possible, enabling computations to very high order (currently around 20th order [108]).
In the case of a biological network, however, many of these techniques become impractical; the standard renormalization procedures are frustrated by the strong inhomogeneity in the network, and the unrenormalized graphs are far more numerous and still require computation of the symmetry factors. When a network is characterized by repeating motifs within a larger lattice, however, the enumeration of subgraphs becomes feasible.
In the DCM, to compute the partition function, Z, on a graph G, we take all subgraphs g of G with all vertices even; this set is written E(G) and includes disconnected subgraphs. We can then write

Z = 2^{N(G)} (cosh βJ)^{n(G)} Σ_{g∈E(G)} v^{n(g)}, (B 1)

where n(g) is the number of edges in graph (or subgraph) g, N(g) is the number of vertices, and v is tanh βJ. We take E(G) to include the ‘null graph’ with no edges. Finding the derivatives of Z with respect to a set of external fields amounts to allowing some vertices to be odd. We write, for example,

Z_{ab} = 2^{N(G)} (cosh βJ)^{n(G)} Σ_{g∈E(G,a,b)} v^{n(g)}, (B 2)

where E(G, a, b) are the subgraphs with all vertices even, when the effective number of edges coming in to vertices a and b are both incremented by one (note that E(G, a, a) is the same as E(G)). Then,

⟨σ_{a}σ_{b}⟩ = Z_{ab}/Z, (B 3)

and yet higher-order correlations can be computed analogously; for example,

⟨σ_{a}σ_{b}σ_{c}σ_{d}⟩ = Z_{abcd}/Z. (B 4)
Direct enumeration of all possible disconnected subgraphs rapidly becomes prohibitive, because computation time is exponential in the number of edges. For the motifs, however, with small n(M) (e.g. less than 10), the computation can be done on a modern desktop machine. Our general method will be to compose the partition function for S(q, M) from the partition function for S(q − 1, M).
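For a motif-sized graph, the enumeration behind (B 1)–(B 3) can be checked directly against the brute-force spin sum. The sketch below is our own illustrative code (names and structure are not from the original work): it enumerates edge subsets, keeps those with the prescribed set of odd-degree vertices, and forms ⟨σ_{a}σ_{b}⟩ as a ratio of v-polynomials evaluated at v = tanh βJ; the prefactors of 2 and cosh βJ cancel in the ratio.

```python
import itertools
import math

def dcm_correlation(edges, n_verts, a, b, beta_J):
    """<s_a s_b> via (B 1)-(B 3): ratio of sums of v^{n(g)} over edge
    subsets g whose odd-degree vertex set is {} (for Z) or {a, b}."""
    v = math.tanh(beta_J)
    Z = P = 0.0
    for r in range(len(edges) + 1):
        for g in itertools.combinations(edges, r):
            deg = [0] * n_verts
            for (i, j) in g:
                deg[i] += 1
                deg[j] += 1
            odd = {k for k in range(n_verts) if deg[k] % 2}
            if not odd:
                Z += v ** len(g)        # g in E(G)
            elif odd == {a, b}:
                P += v ** len(g)        # g in E(G, a, b)
    return P / Z

def exact_correlation(edges, n_verts, a, b, beta_J):
    """Brute-force spin sum for comparison."""
    num = den = 0.0
    for s in itertools.product([-1, 1], repeat=n_verts):
        w = math.exp(beta_J * sum(s[i] * s[j] for (i, j) in edges))
        den += w
        num += s[a] * s[b] * w
    return num / den

triangle = [(0, 1), (1, 2), (0, 2)]
c_dcm = dcm_correlation(triangle, 3, 0, 1, 0.5)
c_exact = exact_correlation(triangle, 3, 0, 1, 0.5)
```

For the triangle, E(G) contains only the null graph and the full triangle, and E(G, a, b) the single edge and the two-edge path, so the ratio reduces to (v + v^{2})/(1 + v^{3}).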
B.1. Branching networks in the Ising model
Determining the partition function for the branching assembly rule is reasonably straightforward. It is aided by the tree-like hierarchy that arises as the graph is built up; all disconnected, even graphs at any stage can be decomposed into the union of the set of disconnected graphs on the N(M) subgraphs S(q−1, M) and the disconnected graphs on the additional motif M that now forms the ‘highest level’ of the network:

Z_{q} = Z_{q−1}^{N(M)} (cosh βJ)^{n(M)} Σ_{g∈E(M)} v^{n(g)}. (B 5)
The free energy per vertex, ln Z_{q}/N^{q}, is a slowly decreasing function of q.
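The decomposition behind (B 5) can be checked by enumeration on the smallest branching example: three triangle copies joined at their roots by a top-level triangle. Because a graph cannot have exactly one odd-degree vertex, every even subgraph must be even on each copy and on the top-level motif separately, so the reduced sum (dropping the prefactors of 2 and cosh βJ) should factor into one term 1 + v^{3} per triangle. A minimal sketch, with our own variable names:

```python
import itertools
import math

def reduced_Z(edges, beta_J):
    """Reduced partition sum: sum of v^{n(g)} over even subgraphs g."""
    v = math.tanh(beta_J)
    verts = sorted({u for e in edges for u in e})
    idx = {u: k for k, u in enumerate(verts)}
    total = 0.0
    for r in range(len(edges) + 1):
        for g in itertools.combinations(edges, r):
            deg = [0] * len(verts)
            for (u, w) in g:
                deg[idx[u]] += 1
                deg[idx[w]] += 1
            if all(d % 2 == 0 for d in deg):
                total += v ** len(g)
    return total

tri = [(0, 1), (1, 2), (0, 2)]
# three triangle copies; vertex (x, c) is vertex x of copy c
edges = [((x, c), (y, c)) for c in range(3) for (x, y) in tri]
# top-level triangle joins the copies at their roots (c, c)
edges += [((i, i), (j, j)) for (i, j) in tri]
```

With 12 edges, the 2^{12} subsets are enumerated directly, and the total equals (1 + v^{3})^{4}: one factor per copy and one for the top-level motif.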
Computing the correlation function of such a system is again aided by the tree-like hierarchy. At stage q, copies of the S(q−1, M) graph are placed at the N(M) locations. A vertex A on one of those copies can then be referenced by a string of q numbers {a_1, a_2, …, a_q}, where a_q is the vertex number of M into which the S(q−1, M) graph containing A is placed.
Consider, to begin with, the correlation function between vertex A, {a_1, a_2}, and B, {b_1, b_2}, in S(2, M). When the roots are found on different subgraphs (i.e. a_2 ≠ b_2),

⟨σ_{A}σ_{B}⟩ = P_{1,a_1a_2} P_{1,a_2b_2} P_{1,b_2b_1} Z_{1}^{N(M)−2} / Z_{2}, (B 6)

where each quantity is taken as the reduced sum over v^{n(g)} (the prefactors in (B 1) are common to numerator and denominator and cancel in the ratio), and P_{1,ab} is the sum of all graphs on M even in all vertices except at vertices a and b, which are odd (‘subgraphs of M rooted at a and b’):

P_{1,ab} = Σ_{g∈E(M,a,b)} v^{n(g)}. (B 7)
In words, the path from A to B requires leaving the subgraph containing A at a_{2}, crossing M, and entering the subgraph containing B at b_{2}. The additional factors of Z_{1}, the partition function on M, come from the other subgraphs that, if they are entered, must be left from the same vertex. The generalization to n roots is straightforward.
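This factorization can be checked numerically on the smallest branching example—three triangle copies joined at their roots by a top-level triangle (our own labels: vertex (x, c) is vertex x of copy c). With g = (v + v^{2})/(1 + v^{3}) the two-point correlation of a bare triangle, the correlation between vertices lying in different copies should reduce to g^{3}: one rooted factor to reach the root of A's copy, one to cross the top-level motif, and one to descend to B.

```python
import itertools
import math

def spin_correlation(edges, verts, a, b, beta_J):
    """Exact <s_a s_b> by direct summation over all 2^|V| spin states."""
    idx = {u: k for k, u in enumerate(verts)}
    num = den = 0.0
    for s in itertools.product([-1, 1], repeat=len(verts)):
        energy = sum(s[idx[u]] * s[idx[w]] for (u, w) in edges)
        weight = math.exp(beta_J * energy)
        den += weight
        num += s[idx[a]] * s[idx[b]] * weight
    return num / den

tri = [(0, 1), (1, 2), (0, 2)]
verts = [(x, c) for c in range(3) for x in range(3)]
edges = [((x, c), (y, c)) for c in range(3) for (x, y) in tri]
edges += [((i, i), (j, j)) for (i, j) in tri]   # joins at copy roots

bJ = 0.6
v = math.tanh(bJ)
g = (v + v * v) / (1 + v ** 3)    # two-point function of a bare triangle
corr = spin_correlation(edges, verts, (1, 0), (2, 1), bJ)
```

Here vertex (1, 0) sits in copy 0 (root (0, 0)) and vertex (2, 1) in copy 1 (root (1, 1)); the exact nine-spin sum matches the three-factor product, while the untouched third copy contributes only the cancelling factor of its partition sum.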
The general form for P can be written as

P_{q,{a},{b}} = P_{q−1,{a_1,…,a_{q−1}},{a_q,…,a_q}} P_{1,a_qb_q} P_{q−1,{b_q,…,b_q},{b_1,…,b_{q−1}}} Z_{q−1}^{N(M)−2}, (B 8)

or, in words, that one must get to the most connected node on one's subgraph, and from there travel over the highest-level M to the most connected node of the destination subgraph.
We consider two vertices {a} and {b} to be separated by a copy distance d, where d is the number of subgraphs one must traverse to reach B from A (formally, if a_d ≠ b_d but either d is the generation of the graph or a_{d+1} = b_{d+1}). The correlation function has the form of an exponential cutoff:

(1/|𝒫(d)|) Σ_{({a},{b})∈𝒫(d)} ⟨σ_{{a}}σ_{{b}}⟩ ≈ ḡ^{2d−1}, (B 9)

where 𝒫(d) is the set of all vertex pairs separated by copy distance d, and ḡ is the average correlation between different pairs in M.
Under the branching construction, local interactions are increasingly less aware of the larger structures in which they are embedded; as copy distance increases, correlations die exponentially. Furthermore, the correlation between two vertices depends only on their relative positions in the hierarchy; the pair is insensitive to the extent of the rest of the graph.
These effects are due to the way in which the subgraphs are wired together; all interactions between different subgraphs pass through single-vertex bottlenecks that restrict the number of paths. In the next section, we shall see how the nested construction opens these bottlenecks—at the cost of larger graph diameters—and alters the critical behaviour.
B.2. Nested networks and the general form
The branching computations were reasonably simple because of the absence of redundant paths, or loops, above the motif scale. (Formally, the difference—the set of unshared edges—between two paths decomposes into a union of even subgraphs on the motif M.) The absence of larger redundant paths has many implications in addition to how it affects the correlation functions; for example, connections between distant nodes may be cut by removal of a single vertex.
The nested rule partition function appears harder to compute because of the existence of loops and redundant paths on all scales. However, a general algorithm for the computation of an arbitrary P_{q,{a},{b}} may be specified. One decomposes the problem into two parts. One first considers how to traverse the ‘coarsegrained’ graph, at the highest level; and then considers how to travel ‘within’ each coarsegrained vertex to complete the path. The difference between nested and branching then amounts simply to which particular node address on subgraph A allows one to jump to subgraph B.
More formally, P_{q,{a},{b}} is computed as a sum over subgraphs of the motif M in the following way.

— At level q, one has a set of roots, {a} and {b}. Each of these roots corresponds to a root in one of the S(q−1, M) copies. For example, the copy number a_{q} has a root {a_1, a_2, …, a_{q−1}}.

— Consider in turn each subgraph m in motif M (where m can be disconnected or connected, odd or even).

— Each edge of that subgraph gives two additional roots, one associated with each end of the edge. For example, an edge between nodes a_{q} and b_{q} leads to two new roots, one for the copy a_{q} and one for the copy b_{q}.

— If the graph has been constructed by branching, the additional root for the a_{q} copy is {a_{q}, a_{q}, …, a_{q}} (a list q−1 entries long).

— If the graph has been constructed by nested iteration, the additional root for the a_{q} copy is {b_{q}, b_{q}, …, b_{q}} (a list q−1 entries long).

— Thus, for each S(q−1, M) copy, we have a set of roots, r_{i}.

— For each subgraph m, multiply v^{n(m)} by the product of the N(M) factors P_{q−1,r_{i}}; summing these terms over all m gives P_{q,{a},{b}}.
Note that ensuring the final path is even is deferred to the bottom level, when P_{1} is computed.
Equation (B 1) is the basis of the DCM; some examples of this expression for small graphs can be found in Sykes & Hunter [109]. While enumeration of graphs much larger than 30 bonds is impossible, using the methods described in the text, it is possible to build up much larger graphs with branching and nested properties of interest. With these equations, and equations (B 5) and (B 8), an arbitrary hierarchy may be constructed, because there is no restriction on the form of P_{q−1}.
We have checked the central formulae explicitly through subgraph enumeration on figure 2; the results for three iterations we have checked through seventh order in β, and thus in v, by an unrenormalized linkedcluster expansion, using Feynman diagrams in the standard fashion [100,110].
Footnotes
↵1 Formally, they have logarithmic scaling of diameter, and no loops on scales above the motif size.
↵2 We expect other local update rules, such as Metropolis [68], to have similar dynamical properties, with differences appearing only on introduction of nonlocal rules such as those of Swendsen & Wang [69].
↵3 When used to estimate the multi-information by subtraction, we find that the estimator is not unbiased; this effect is overwhelmed, however, for multi-information measurements larger than 10^{−2} bits, by the intrinsic dispersion of simulation runs.
 Received December 1, 2011.
 Accepted February 7, 2012.
 This journal is © 2012 The Royal Society