## Abstract

Inferring properties of the interaction matrix that characterizes how nodes in a networked system directly interact with each other is a well-known network reconstruction problem. Despite a decade of extensive studies, network reconstruction remains an outstanding challenge. The fundamental limitations governing which properties of the interaction matrix (e.g. adjacency pattern, sign pattern or degree sequence) can be inferred from given temporal data of individual nodes remain unknown. Here, we rigorously derive the necessary conditions to reconstruct any property of the interaction matrix. Counterintuitively, we find that reconstructing any property of the interaction matrix is generically as difficult as reconstructing the interaction matrix itself, requiring equally informative temporal data. Revealing these fundamental limitations sheds light on the design of better network reconstruction algorithms that offer practical improvements over existing methods.

## 1. Introduction

Networks are central to the functionality of complex systems in physics, engineering, biology and medicine [1–3], often serving as a conduit to the system dynamics and fundamentally affecting their dynamic behaviour [4–10]. Network reconstruction (NR)—recovering properties of the underlying interconnection network of the system from temporal data of its nodes—thus provides a first step to understand, diagnose and control the dynamics of diverse networked systems [11–17]. Let be the activity of node *i*—representing the concentration of a certain biomolecule, or the abundance of a certain species, etc.—in a networked system of *n* nodes. Assume that the system dynamics is dominated by pairwise interaction and can be described by a set of ordinary differential equations (ODEs)
1.1where the *coupling functions* specify how two nodes interact, and represents some known signals. The *interaction matrix* captures direct interactions between nodes, defining the interconnection network of the system by associating *a*_{ij} to the directed edge *j*→1. By appropriately choosing the coupling functions, equation (1.1) models a broad class of networked systems [18].

Given some function of the interaction matrix—which we call a *property*— NR aims to recover from given temporal *data* , , and given knowledge of the coupling functions. Reconstructing the identity property (i.e. *A* itself) is equivalent to the classical *parameter* (PI) problem [19,20]. But we can also reconstruct other properties of *A*, such as its *sign pattern* , *connectivity pattern* , *adjacency pattern* ( is the Kronecker delta) or *in-degree sequence* . Note the adjacency pattern is obtained from the connectivity pattern by ignoring self-loops. Many characteristics of networked systems such as sign stability [21], robustness [22], controllability [8,9,23], observability [10] and epidemic thresholds [24] can be determined from the above properties without knowing *A*. In general, cannot be reconstructed from correlation or association analysis, but requires more sophisticated NR algorithms [11–17,25,26]; see also electronic supplementary material 1 for a summary of existing NR algorithms. However, after a decade of extensive studies, some core problems in NR remain open [27–33]. In particular, we still lack an identifiability analysis characterizing the conditions on the temporal data and knowledge of the coupling functions that are necessary to reconstruct a desired property [34,35].

Our main contribution here is to derive such necessary conditions in the ideal case (i.e. all state variables are measured without any measurement noise), rendering fundamental limitations of NR in the sense that those conditions are necessary regardless of which NR algorithm is used. By checking these conditions, we can decide if an NR algorithm failing to recover is due to some design flaws, or due to limitations intrinsic to the available temporal data and/or our knowledge about the coupling functions. The relevance of our analysis is that we are able to establish, for the first time, if NR problems can be solved with less informative data than that required for solving the classical PI problem, which is one of the major motivations for using NR instead of PI algorithms. Indeed, our intuition suggests that this should be possible because we are recovering less information (e.g. *K* instead of *A*). But, is this true?

A property can be reconstructed if and only if (iff) any two interaction matrices with different properties produce different node trajectories , a notion of *identifiability* or *distinguishability* [19]. Note that in case all state variables are measured (i.e. if the so-called ‘observation function’ satisfies ), an ODE model with parameters *A* is called *structurally identifiable* if any two distinct parameters produce different right-hand sides of the ODEs [36]. Structural identifiability is a property of the considered model only, independent of the measured temporal data. Hence, the lack of structural identifiability can only be solved by modifying the model itself (e.g. by decreasing the number of unknown parameters). This is not the focus of our work, as we are concerned with the fundamental limitations governing which properties of the interaction matrix can be inferred from given temporal data. To make sure the system (1.1) is structurally identifiable, here we introduce a mild assumption that for any *j*. Note that structural identifiability does not necessarily lead to identifiability, because indistinguishability may arise for particular temporal data.

Define the *interconnection vector* and *regressor vector* of node *i*. Then, equation (1.1) can be rewritten as
1.2with the state vector. Distinguishing is then equivalent to the distinguishability of for .

Often, the true coupling functions are unknown to us and we only know a *family* of regressors to which the true regressor *f*_{i} belongs. Such uncertainty inevitably produces indistinguishability of the interconnection vector, because the parameters *a*_{ij} in (1.1) are not uniquely determined. We model the members of as *deformations* of the true regressor obtained by applying some transformation . Thus, is characterized by the *group* of admissible transformations for node *i*. We consider that is the group of continuous functions that preserves pairwise interactions in (1.1). Characterizing the uncertainty of the coupling functions in this way provides us with a geometric interpretation of the fundamental limitations of NR. Let be the subgroup of admissible linear transformations (associated with non-singular matrices with non-zero entries only in its diagonal and *i*-th column; figure 1*a*; electronic supplementary material 2). Let denote the transpose of , i.e. if and only if . Hereafter, we use the following observation: because , a *necessary* condition to reconstruct a property when is that it can be reconstructed when . Consequently, in order to characterize the fundamental limitations of network reconstruction, we focus on linear transformations only.

Two candidate interconnection vectors are indistinguishable if
1.3for some matrix . Multiplying this equation by from the left and integrating over we obtain
1.4where is a constant *n*×*n* matrix. Condition (1.3) obviously implies (1.4), but the converse is not so obvious (proposition 1 of electronic supplementary material 3). Hereafter, we write *M*_{i} instead of , unless the specific time interval is important for the discussion. From (1.4), the set of all pairs of indistinguishable interconnection vectors for node *i* is
1.5

The above analysis reveals two classes of fundamental limitations of NR (figure 1). *First*, unknown coupling functions cause two vectors to be indistinguishable if they can be transformed to each other via some . This set of indistinguishable vectors is the partition of by the *orbits* of the group (figure 2*a*; electronic supplementary material 2). An orbit is considered to be low dimensional if its dimension is less than *n* (e.g. the purple, blue, green and brown orbits in figure 2*a*). The orbits in figure 2*a* show we can only distinguish the adjacency pattern (i.e. if depends on *x*_{j} for , see proposition 2a and example 1 in electronic supplementary material 5). Other properties such as edge weights, connectivity pattern or degree sequence are indistinguishable and cannot be reconstructed (proposition 2b in electronic supplementary material 5). This is a consequence of the assumed prior knowledge of the coupling functions, because only the adjacency pattern is invariant to the transformations in , but other properties of *A* are not. *Second*, even if the coupling functions are exactly known, indistinguishability will appear if the temporal data are not informative enough in the sense that different interaction matrices produce the same node trajectories (figure 2*b*); see, for example, [37,38] and references therein. Equation (1.4) shows that indistinguishability due to uninformative data appears when and contains a subspace different from **0**. In other words, when the endpoints of *v*_{1} and *v*_{2} can be joined by a hyperplane parallel to . Such hyperplanes are called *fibres* of the quotient space .

Combining these two sources of indistinguishability, *v*_{1} is indistinguishable from *v*_{2} if it is possible to transform *v*_{2} using an element of in a way that the hyperplane passing through *v*_{1} and is a fibre (figure 2*c*). As orbits of intersected by a fibre of become indistinguishable, we can ‘glue’ them together to form a partition of into sets of indistinguishable interconnection vectors (figure 2*c*). Consequently, can be reconstructed only if all two sets in the collection belong to different orbits . If is contained in low-dimensional orbits, then we can reconstruct the adjacency pattern of the interaction matrix (right panel of figure 2*c*). Otherwise and all interconnection vectors are indistinguishable (left panel of figure 2*c*). Furthermore, once indistinguishability appears, there will always be indistinguishable interconnection vectors that are arbitrarily far from each other. Consequently, the reconstruction error of the edge weights and other properties such as sign pattern and adjacency pattern can always be maximal.

The matrix *M*_{i} in (1.4) is unknown when the true regressor *f*_{i} is unknown. Indeed, choosing any , , gives
Thus, transforms into (and vice versa), and we can only know the orbit corresponding to this subspace. For example, the condition for some implies that
1.6because (i.e. for any *G*_{i}). We can then tell if *M*_{i} is non-singular using any . Condition (1.6) is known as *persistent excitation* (PE) in the literature and it is necessary and sufficient to solve the PI problem [19,20]. In general, we can prove that (lemma 1, electronic supplementary material 6), so we can build the partition of indistinguishable vectors using any .

The above analysis only depends on the group property of the transformations, hence it can be straightforwardly extended to any other linear group , allowing us to shrink or enlarge according to our uncertainty about the coupling functions. Thus, in order to reconstruct some property of the interaction matrix, it is necessary that (i) our uncertainty about the coupling functions is small enough (i.e. any two sets in belong to different orbits of ); and (ii) the measured temporal data are informative enough (i.e. hyperplanes parallel to do not glue orbits together). As an obvious example, in order to reconstruct the edge weights, it is necessary to know the coupling functions exactly (i.e. ) because only then do any two vectors belong to different orbits.

It is possible to reduce to when we know the appropriate coupling functions to use (e.g. the generalized Lotka–Volterra equations for ecological systems, or linear functions if the system remains close to its steady state [18]). In this case indistinguishability emerges from uninformative data only. Consequently, a property can be reconstructed iff all two sets in the collection can be *separated* by a fibre (electronic supplementary material, figure S1). A fibre is a hyperplane and thus partitions in two regions; we say it separates *P*_{y1} from *P*_{y2} if *P*_{y1} belongs to one region and *P*_{y2} belongs to the other region or the fibre (figure 2*b*). By specifying the coupling functions we can reconstruct more information such as the interaction matrix itself, if the PE condition (1.6) holds. Without PE it is still possible to distinguish, for example, the adjacency pattern of the interconnection vector when is exactly ‘horizontally’ oriented. In fact, from the right panel of figure 2*c*, we can separate the sets of vectors with different adjacency patterns (orange and red regions) using the same red region as the separating fibre. However, such a situation is *non-generic* (i.e. pathological) in the sense that an infinitesimal change in the fibre's orientation will eliminate the distinguishability. Note also that other properties such as sign pattern, connectivity pattern or degree sequence are indistinguishable. Indeed, we can prove that there are only two generic cases (electronic supplementary material 7): (i) and indistinguishable vectors emerge only due to uncertain coupling functions (i.e. ); and (ii) is non-trivial and is contained in the *n*-dimensional orbit, so all interconnection vectors become indistinguishable (i.e. ). This implies that, even if the coupling functions are well known, the PE condition (1.6) is generically necessary to reconstruct any property.

Next, we show that prior information of *A* is the only way to relax the PE condition. Consider, for the sake of clarity, that the coupling functions are exactly known. Prior information shrinks the domain of the property from to , i.e. . Two typical cases are: (i) *a*_{ij} takes a finite number of values (e.g. binary interactions [25]) so is a discrete set as each is a point (electronic supplementary material, figure S4a); and (ii) *a*_{ij} are bounded as for some known constants . In this case , where *P*_{0} is an *ε* neighbourhood of zero and each of the 3^{n}−1 remaining sets lies in a different orthant of (associated with distinct sign patterns; see electronic supplementary material, figure S4b). In case (i), *A* itself can be reconstructed without PE if we can separate each point composing with a fibre. If , this is generically possible because an infinitesimal deformation will change any ‘pathological’ orientation that contains two points. In case (ii), the sign or connectivity pattern can be reconstructed without PE if there is a gap between the sets *P*_{y} such that a fibre can separate them (electronic supplementary material, figure S4b), but *A* itself cannot be reconstructed because it is impossible to separate two points inside one *P*_{y}. In electronic supplementary material 10, we illustrate our results in a basic NR problem using steady-state data, and in electronic supplementary material 4 we show how to use our analysis to build an NR method that correctly reconstructs the sign pattern of the interaction matrix. Note also that, if , providing the value of for *d*_{i} entries of *a*_{i} as prior information guarantees a correct reconstruction of . Sparsity of the interaction matrix can be equally effective prior information (electronic supplementary material 9).

Our analysis lays a firm theoretical basis to some fundamental limitations of network reconstruction that have been observed before via computer simulations—such as the role of the informativeness of the temporal data [34] and the fact that different networks may produce exactly the same temporal response [35]. A particular instance of our analysis is when different networks, such as those obtained from chemical reactions, produce the same dynamical model (i.e. they are structurally unidentifiable [37,39,40]) and hence they are indistinguishable for all temporal data. Indeed, despite the fact that the identifiability of the identity property has been studied in different disciplines including statistics and system identification [19,20,25,37,38], a similar analysis for all the other properties we can reconstruct using NR was lacking. The impact of uncertain coupling functions on the properties that can be reconstructed was unknown too. Our analysis can be straightforwardly extended to higher-order interactions (e.g. ) and some nonlinear parametrizations, e.g. (see electronic supplementary material 12). Note that, by contrast to the case of structural identifiability [40], nonlinearities in the coupling functions do not always enhance the distinguishability of a property for given temporal data (electronic supplementary material 14). We can also analyse the effect of noise and more general uncertainty of the coupling functions at the cost of less constructive results [41]. In such a case, the partition of indistinguishable interconnection vectors is given by a general equivalence relation; this makes it harder to characterize the fundamental limitations of NR, as the role of the uncertainty about the coupling functions and the informativeness of the temporal data cannot be disentangled anymore. The derived fundamental limitations for NR also apply in the case of noisy measurements, in the sense that the distinguishability of a property with noiseless data is a necessary condition for its distinguishability under noise. A more detailed analysis in the spirit of *practical identifiability* [42] would characterize the parameters *a*_{ij} and their properties that are more sensitive to noise. We note that the necessity of measuring all time-varying nodes in the network leads to another fundamental limitation in NR [43]; see also electronic supplementary material 11. Our results indicate that, as in the case of PI [39], a better knowledge of the system's coupling functions and prior information of the interaction matrix are extremely useful for NR (even when some algorithms explicitly ignore all knowledge of the system [31,32]). Furthermore, the generic necessity of PE means it can also be a guideline to informative enough experiments for general NR problems, letting us take advantage of the existing theory for optimal experiment design [44–46]. This calls for the design of better algorithms that incorporate such information and provide a guarantee of correct NR under PE (electronic supplementary material 4).

## Authors' contributions

M.T.A. conceived the project. Y.-Y.L. and M.T.A. designed the project. M.T.A. did the analysis with the help of G.L. and J.A.M. All authors analysed the results. M.T.A. and Y.-Y.L. wrote the manuscript.

## Funding

This work was supported by CONACyT, México (grant no. 207609), the John Templeton Foundation: Mathematical and Physical Sciences grant PFI-777 and European Commission grant no. 641191 (CIMPLEX).

## Acknowledgements

We thank Jean-Jacques Slotine and Travis Gibson for reading the preliminary version of this paper. We also thank the reviewers for insightful comments that helped us improve the paper.

## Footnotes

Electronic supplementary material is available online at https://dx.doi.org/10.6084/m9.figshare.c.3672004.

- Received November 30, 2016.
- Accepted January 3, 2017.

- © 2017 The Author(s)

Published by the Royal Society. All rights reserved.