## Abstract

Recent works revealed that the energy required to control a complex network depends on the number of driving signals and the energy distribution follows an algebraic scaling law. If one implements control using a small number of drivers, e.g. as determined by the structural controllability theory, there is a high probability that the energy will diverge. We develop a physical theory to explain the scaling behaviour through identification of the fundamental structural elements, the longest control chains (LCCs), that dominate the control energy. Based on the LCCs, we articulate a strategy to drastically reduce the control energy (e.g. in a large number of real-world networks). Owing to their *structural* nature, the LCCs may shed light on energy issues associated with control of nonlinear dynamical networks.

## 1. Introduction

The past 15 years have witnessed tremendous advances in our understanding of complex networked structures in various natural, social and technological systems, as well as the dynamical processes taking place on them [1–6]. Progress has also been made in the area of controlling complex networks, where the ultimate goal is to control nonlinear dynamical processes on complex networks. The interplay between nonlinear dynamics and complex networks makes formulating a general control framework too difficult to be addressed at the present. A reasonable compromise is to study linear controllability [7–23] while retaining the complex network topology in hope to gain insights into the fundamental control issues that can be useful for controlling nonlinear dynamical networks. Some representative results are the following. A key issue was to determine the minimum number of driver nodes required to steer the system from an arbitrarily initial state to an arbitrarily final state in finite time. In this regard, a pioneering approach [10] was to adopt the classic structural controllability theory of Lin [24] to directed complex networks whose structural controllability can be accessed via the maximum matching algorithm [25–27]. The effects of the density of in/out degree nodes were incorporated into the structural controllability framework [17], which was also applied to protein interaction networks [19]. Based on the classic Popov–Belevitch–Hautus (PBH) rank condition [28], an exact controllability framework was developed [16,20].

An issue of physical importance is the energy required to control a complex network. The energy bounds were first obtained for specific classes of networks [13]. For example, if the network adjacency matrix is positive definite, the lower bound of the energy approaches a constant but if the matrix is semi-positive definite, the lower bound scales algebraically with the control time. In these cases, the upper bound of the energy can still diverge. Quite recently, it was found [22,23] that under certain conditions (e.g. setting the number of controllers to one or the entire network size), for scale-free networks the actual control energy follows a power-law (algebraic) distribution with respect to uncertainties in the selection of the target state. In the extreme case where the structural controllability theory stipulates, mathematically, that a single controller be sufficient to control the entire network, there is a high probability for the energy to diverge. While the results provide insights into the feasibility of controlling complex networks, the physical underpinning for the algebraic energy distribution and energy divergence needs to be understood, which is the main purpose of our study.

In this paper, we uncover the general mechanism responsible for the control energy, without any restrictions on the number of controllers, topology and the target state. Our main result is the following. We find that, for any given complex network, the fundamental entities responsible for the control energy possess a chain structure, which we call the longest control chains (LCCs). (LCCs are conceptually different from control signal paths (CSPs), or stems [10,18]—see §2.3 for explanations.) Identification of LCCs enables us to obtain a physical understanding of the energy distribution, providing an explanation for our numerically discovered phenomenon that energy divergence is prevalent in real-world networks. The understanding also allows us to articulate effective strategies to drastically reduce the control energy (e.g. by many orders of magnitude). Because LCCs are a structural concept, we expect it to be useful for addressing the energy issue associated with control of nonlinear networks.

## 2. Results

### 2.1. Control energy distribution

We use the standard setting of network controllability [7,10,16]:
**x**=[*x*_{1}(*t*),…,*x*_{N}(*t*)]^{T} is the state variable of the whole system, vector **u**=[*u*_{1}(*t*),…,*u*_{M}(*t*)]^{T} is the control input or the set of control signals, *A* is the *N*×*N* adjacency matrix of the network, and *B* is the *N*×*N*_{D} control matrix specifying the set of ‘driver’ nodes [10]. The goal of the structural and exact controllability theories is to determine the minimum number of driver nodes, *N*_{D}, for networks of various topologies. With the input driving signal **u**_{t}, the control energy is defined as
**u**_{t}. The optimal control signal is

We use the Erdos and Rényi (ER) type of directed random networks [30,31] and the Barabási–Albert (BA) type of directed scale-free networks [2] with a parameter *P*_{b}. Specifically, for a pair of nodes *i* and *j* with a link, the probability that it points from the smaller degree to the larger degree nodes is *P*_{b}, and 1−*P*_{b} is the probability that the link points in the opposite direction (if both nodes have the same degree, the link direction is chosen randomly). We implement the maximum matching algorithm [10] to obtain the control matrix *B* and calculate the minimum energy for any given control time *t*_{f}. For an ensemble of randomly realized network configurations with identical structural properties, the control energy *E* can be regarded as a random variable. We find that, for a vast majority of the networks in the ensemble, the required control energy is enormous and tends to diverge. For the cases where the energy can be reasonably computed, it follows an algebraic distribution with fat tails, as shown in figure 1 with the scaling exponent of about 1.5, regardless of the network type and size.

### 2.2. Structural determinants of control energy and energy distribution

We develop a physical understanding of the large control energy required and also the algebraic scaling behaviour in the energy distribution. To gain insights, we first consider a simple model: a unidirectional, one-dimensional string network, for which an analytic estimate of the control energy can be obtained [23] as
*l* is the chain length (the number of nodes on the string) and λ_{Hl} is the smallest eigenvalue of the underlying *H*-matrix, denoted by *H*_{l}, which is related to the Gramian matrix by *H*≡*e*^{−Atf}*We*^{−ATtf}. Numerical verification of equation (2.4) is presented in figure 2*a*. Although equation (2.4) is obtained for a simple one-dimensional chain network, we find numerically and analytically that it also holds for random and scale-free network topologies.

A brief derivation of equation (2.4) is as follows. The control energy required by the system can be expressed as [13]
**x**_{0} is the initial state of the system. The matrix *H* is positive definite and symmetric with its inverse satisfying *H*^{−1}=*QΛQ*^{T}, where *Λ* and *Q* are the corresponding eigenvalue and eigenvector matrices, respectively. Thus, *H*^{−1} with λ_{H} denoting the smallest eigenvalue of *H*. Numerically, we find that **q**_{i} is the *i*th column of *Q*. If the initial state **x**_{0} is chosen to satisfy **x**_{0} does not change *E*(*t*_{f})’s order of magnitude. A detailed derivation can be found in [23].

We see from figure 2*a*, that the energy required for control tends to increase exponentially with the chain length, indicating that even for a simple one-dimensional chain network of limited length, such as *l*=7, the required control energy can be unbearably large. The exponential behaviour holds for complex networks as well, as shown in the inset of figure 2*b* for ER random and BA scale-free networks. In fact, figure 2*b* indicates a strong positive correlation between the average control energy for the network and *E*_{L}, the energy required to control the LCC (to be described below) embedded in the network.

### 2.3. Concepts of control signal paths and longest control chains

In a networked system, control signal and energy originated from the driver nodes travel through one-dimensional-string-like paths towards each of the non-driver nodes. As discussed, identifying maximum matching so that the network is deemed structurally controllable does not guarantee convergent control energy. When maximum matching is found, one can divide the whole network into *N*_{D} CSPs, namely, *N*_{D} *stems* [10,18], each being a unidirectional one-dimensional string led by a driver node that passes the control signal onto every node along the path, illustrated as the vertical paths in figure 3. CSPs thus provide a picture indicating how the signals from the *N*_{D} external control inputs reach every node in the network to ensure full control (in the sense of structural controllability).

How does the control energy flow through the network? To address this question, we distinguish two types of links: one along and another between the CSPs, as shown in figure 3. It may seem that the latter links are less important as the control signal propagates along the former set of links. However, owing to coupling, a node’s dynamical state will affect all its nearest neighbours’ states which, in turn, will affect the states of their neighbours, and so on. In principle, any driver node is connected with nodes both along and outside its CSP. Correspondingly, an arbitrary node in the network is influenced by every driver node, directly through the CSP to which it belongs, or indirectly through the CSPs that it does not sit on. Intuitively, the ability of a driver node to influence a node becomes weaker as the distance between them is increased. In order to control a distant node, exponentially increased energy from the driver is needed. The chain starting from a driver node and ending at a non-driver node along their shortest path is effectively a control chain. Intuitively, control energy flows through the control chains, while the control signal is propagated along the CSPs. We define the length of the longest control chain (LCC), *D*_{C}, as the *control diameter* of the network, as shown in figure 3.

To find the LCCs in a network, we first use the maximum matching algorithm to find all the driver nodes. We then identify the shortest paths from each of the driver nodes to each of the non-driver nodes. Finally, we pick out the longest such shortest paths as LCCs. The computational complexity of finding the LCCs are that associated with the maximum matching algorithm plus searching for the longest shortest paths, which is feasible for large networks. There can be multiple LCCs. The node at the end of an LCC is most difficult to be controlled in the sense that the largest amount of control energy is required. The number *m* of such end nodes dictates the degeneracy (multiplicity) of LCCs. An example is shown in figure 3, where we see that, although there are multiple LCCs, their ends converge to only three nodes, leading to *m*=3. Since the energy required to control a one-dimensional chain grows exponentially with its length in such a way that even one unit of increase in the length can amplify the energy by orders of magnitude (figure 2*a*), the energy associated with any chain shorter than the LCC can typically be several orders of magnitude smaller than that with the LCC. Thus, the total energy is dominated by the LCCs. Owing to the typically low value of *m*, a single LCC essentially dictates the energy required to control the whole chain system. This is true especially for networks with long LCCs. Intuitively, the probability to form long LCCs is small. Accordingly, a longer LCC tends to have smaller value of degeneracy *m*. As a result, the longest LCCs have almost no degeneracy (*m*=1) so that they effectively rule the control energy of the whole network.

In the structural controllability theory, a network is deemed more structurally controllable if *N*_{D} is smaller [10]. However, as the number of driver nodes is reduced, the length of the chain of nodes that each controller drives on average must increase, leading to an increase in the LLC length and accordingly an exponential growth in the control energy. In the ‘optimal’ case of structural controllability of *N*_{D}=1, the LLC length will be maximized, leading to unrealistically large control energy that prevents us from achieving actual control of the system.

Taken together, with respect to the previously defined concept of stem [10,18], we emphasize that a stem is a path that propagates control signal from the input in the absence of a feedback loop (or a circle), and each such path is determined by maximum matching, which allows a node to control at most one of its immediate neighbours. Thus, a stem is in fact a CSP, which is quite different from the concept of an LCC. Particularly, for a driver node and a non-driver node, a control chain is the shortest path between them. For a set of driver nodes and all non-driver nodes in the entire network, an LCC is simply the longest control chain. While control signals propagate through the CSPs, the energy may not flow along the same paths due to the interactions among the nearest neighbours via the physical connections. For example, the state change of a node can lead to state changes of all its nearest neighbours through the energy exchange between this node and all its neighbours. That is, control energy flows through the LCCs, not CSPs.

### 2.4. Control energy reduction scheme

Our finding of the LCC skeleton suggests a method to significantly reduce the control energy. A straightforward solution is to break the LCCs with *redundant* controllers beyond those obtained via maximum matching along the LCCs. Adding a redundant control input at the *i*th node of a unidirectional one-dimensional chain of length *l* breaks it into two shorter sub-chains of length *i*−1 and *l*−*i*+1. Roughly, the control energy is the sum of energies required to control the two shorter components, which is dominated by energy associated with the longer one owing to the exponential dependence of the energy on the chain length. For *i*≈*l*/2, the length of the longer part is minimized. In simulations, applying a single redundant control input presents an extremely efficient energy reduction strategy for one-dimensional chains: several orders of magnitude of reduction in the energy can be achieved. As a realistic physical example, we considered a bidirectional cascading parallel R-C circuit of seven units, which is effectively a one-dimensional chain of seven nodes with a self-loop at each node. For this system, the redundant control input can be realized by inducing external current input into a capacitor. A reduction in the joule energy of nearly 10 orders of magnitude is achieved [23].

For a complex network, the strategy of adding a redundant control at the middle of each LCC dramatically outperforms the strategy of randomly adding an identical number of control inputs. Treating the amount of the normalized energy reduction Δ*E*/*E* as a random variable across the ensemble of networks, we obtain a monotonously increasing distribution function *P*(Δ*E*/*E*), reflecting the effect of energy reduction. We find that high (low) Δ*E*/*E* values are more (less) probable under LCC-breaking strategy when compared with the random strategy. For networks with longer LCCs, the strategy works more effectively. For example, for networks with *D*_{C}=5, nearly 40% of the networks reach Δ*E*/*E*≈1, but the same reduction can be achieved only for 2% of the networks via the random strategy.

For a large number of real-world networks, the mathematical structural controllability theory stipulates that some of them are controllable with a few driving signals [10]. We find that most of these networks (15 out of 17) require unrealistically high energies due mainly to their long LCCs. Strikingly, even with unlimited energy supply, the number of driver nodes as determined by the maximum matching algorithm is generally insufficient to fully control the whole system, where there exists a number *M*^{★} of nodes that never converge to their target states. These observations lead to the speculation that, in order to fully control a realistic network, significantly more driver nodes are needed than those identified by maximum matching. For example, *N*^{★}_{D}=*N*_{D}+*M*^{★} driver nodes should be deployed to gain full control of the system, as shown in figure 4*a* for *n*_{D} and *b*. The quantity *E*^{★}—the control energy with *M*^{★} augmented driver nodes without any LCC-breaking control input. The energy reduction due to random control signal augmentation is again much less effective in most cases, giving further evidence that the control energy of real-world networks is generally determined by their LCCs.

To the best of our knowledge, the failure to converge to the target state with limitless energy supply has not been observed for any modelled networks: it only occurs in empirical (real-world) networks. It may be caused by some atypical structural properties. For example, for some networks, the uncontrollable nodes form self-loops but without any incoming links. For some other networks with an unusually high average degree, an ill-conditioned Gramian matrix can arise, prohibiting the system from converging to the target state. However, such uncontrollable cases (even with infinite supply of energy) are not expected to be generic.

## 3. Discussion

We fill a significant knowledge gap in the extremely active field of controlling complex networks by developing a physical understanding of the recently discovered phenomena of algebraic distribution and divergence of the required control energy. This is achieved through identification of LCCs, the fundamental structure embedded in the network that dominantly determines the energy. The understanding leads naturally to an optimization strategy to significantly reduce the control energy in real-world networks: increasing the number of controllers slightly by placing redundant control signals (beyond the number determined by the structural-controllability theory) along the LCCs. At present, the issue of control energy can be addressed only in the linear controllability framework, but we believe that the LCCs, due to their structural origin, can find counterparts in formulating theories of controllability and observability for nonlinear dynamical networks [9,21,32,33], a largely open problem deserving much attention and great research effort.

Our results that the LCCs essentially determine the network control energy are consistent with the finding in [34]. In particular, for a one-dimensional unidirectional chain with exactly one control input, its control energy has the standard definition that leads to the mathematical conclusion of energy divergence for long chains, while the finding that anisotropic networks can be readily controlled was not obtained under exactly the same setting. The dependence of a network’s control energy on its LCCs can be further demonstrated via our energy reduction schemes, effective for both modelled and real-world complex networks.

## 4. Methods

### 4.1. Longest control chain and control energy distribution

The construction illustrated in figure 3 provides a structural profile to estimate the control energy. In particular, a network can be viewed as consisting of a set of control chains, and the total energy *E* required has two components: *E*^{(1)}, the sum of energies associated with all control chains, and *E*^{(2)}, the interaction energies among the chains, defined as *E*−*E*^{(1)}. The control chains are approximately independent of each other so that *E*^{(2)} is negligible as compared with *E*^{(1)}. Each control chain is effectively a one-dimensional string. Owing to the exponential increase in the energy with the chain length, *E*^{(1)} can be regarded as the sum of control energies associated with the set of LCCs. The required energy to control the full network can thus be approximated as
*E*_{DC} denotes the energy required to control an LCC of length *D*_{C}, and *m* denotes its degeneracy (multiplicity), as shown in figure 3. Reasoning from an alternative standpoint, an arbitrary combination of *D*_{C} and *m*, which we call an LCC *skeleton*, effectively represents a network, and the entire network ensemble is equivalent to all the possible LCC skeletons according to a joint probability function *P*(*D*_{C},*m*). The energy distribution for the original network can be characterized by the distribution of the energy required to control the LCC skeleton in the ensemble. Through simulations, we find that the marginal distribution of *D*_{C}, *P*_{DC}(*D*_{C}), decays approximately exponentially:
*a* and *b* are positive constants. Using
*A* and *B* are positive constants, and consequently the distribution of *E*_{DC} as
*m* for networks with *D*_{C}>2 also exhibits an exponential decay:
*c* and *g* are positive constants. Note that *D*_{C} and *m* are uncorrelated, since a positive correlation would imply that the number of LCCs increases with their length, which is unphysical, and a negative correlation would lead to an exponential divergence of either *P*_{DC}(*D*_{C}) or *P*_{m}(*m*). We thus have
*Γ*(*b*/*B*) and *Γ*(*b*/*B*,*gE*) are the Gamma and incomplete Gamma functions, respectively. The distribution of *E* can then be expressed as
*e*^{−gE}/*E*≈0 due to the typically large value of *E*. Since numerically the difference between the two Gamma functions is approximately constant:
*P*_{E}(*E*) can be simplified as
*providing a theoretical explanation for the algebraic energy distribution*, where
*B*≈2 and *b*≈1. Accordingly, the theoretical estimate of the scaling exponent is 1+*b*/*B*≈1.5, which is consistent with the numerical value. The nearly identical exponent in the distribution of *E*_{DC} indicates that the role of the LCC degeneracy *m* is insignificant in the control energy and its distribution. It is the combination of the exponential decay in the distribution of the control diameter and the exponential increase in the energy required to control LCCs with their length which gives rise to the power-law energy distribution of the LCCs and ultimately leads to the algebraic distribution in the actual energy required to control the original network.

We remark that, since the probability *P*_{m}(*m*) has an exponential dependence on *m* and does not explicitly contain the energy *E*_{DC}, the summation in *m* from 1 to *E*/*E*_{DC} is a finite geometric sum, which differs from the integral in only a constant coefficient and does not affect the algebraic scaling exponent 1+*b*/*B*. In our analysis, the reason to treat *m* as a real valued number is that the relation *E*≈*mE*_{DC} holds only in the order-of-magnitude sense, implying that *E*/*E*_{DC} is typically not an integer. The limits

If we treat *m* (1≤*m*≤*E*/*E*_{DC}) as an integer and accordingly set the integral upper bound of *E*_{DC} as *E*, the cumulative energy distribution function will have a similar form to equation (4.9) with only a small difference in the constant coefficient. This would not have any significant effect on the probability density function *P*_{E}(*E*) in equation (4.12). Numerically, *E* is typically a large number in the order of at least 10^{10}. As a result, the numerical difference caused by the integral upper bound is negligible.

### 4.2. Double-chain interaction

Our analysis of the LCC-skeleton predicts power-law distribution of the required energy for practically controllable networks, which agrees qualitatively with numerics. However, interaction energy *E*^{(2)} among the coexisting chains is ignored. In a physical system, interactions among the basic components can play an important role in determining the system’s properties. To obtain a more accurate estimate of the behaviours of the control energy, we need to include the interactions among the chains. The necessity is further justified as there are discrepancies between the actual control energy and that from the LCC-skeleton, as exemplified in figure 2*b*. Thus, in order to reproduce the numerically obtained energy distributions, we must incorporate the interactions among the LCCs into the model. However, including the interactions makes analysis difficult, as there are typically a large number of interacting pairs of chains. To gain insight into the role played by the interactions, it is useful to focus on the relatively simple case of two interacting chains.

Our double-chain interaction model is constructed, as follows. Consider two identical unidirectional chains, denoted by *C*1 and *C*2, each of length *D*_{C}. Every node in *C*1 connects with every node in *C*2 with probability *p*, all links between the two chains are unidirectional. A link points to *C*2 from *C*1 with probability *p* and the directional bias *D*_{C} nodes and multiple randomized inter-chain links as determined by the parameters *p* and *a*, the distribution of the control energy displays a remarkable similarity to that for random networks, in that a power-law scaling behaviour emerges with the exponent about 1.5. The power-law distribution holds robustly with respect to variations in the parameters *p* and *b*, suggesting a universal pattern followed by pair interactions, regardless of the length of the chains. In particular, interactions between two chains, LCC or not, have a similar effect on the control-energy distribution. These results indicate that the double-chain interaction model captures the essential physical ingredients of the energy distribution in controlling complex networks.

## Authors' contributions

Y.C.L. devised the research project. Y.Z.C. and L.Z.W. performed numerical simulations. Y.Z.C., L.Z.W., W.X.W. and Y.C.L. analysed the results. Y.Z.C. and Y.C.L. wrote the paper. All authors gave final approval for publication.

## Competing interests

The authors declare no competing financial interests.

## Funding

This work was supported by ARO under grant no. W911NF-14-1-0504.

- Received January 28, 2016.
- Accepted March 17, 2016.

© 2016 The Authors. Published by the Royal Society under the terms of the Creative Commons Attribution License http://creativecommons.org/licenses/by/4.0/, which permits unrestricted use, provided the original author and source are credited.