## Abstract

The persistence of homological features in simplicial complex representations of big datasets in *R ^{n}* resulting from Vietoris–Rips or Čech filtrations is commonly used to probe the topological structure of such datasets. In this paper, the notion of homological persistence in simplicial complexes obtained from power filtrations of graphs is introduced. Specifically, the

*r*th complex,

*r*≥ 1, in such a power filtration is the clique complex of the

*r*th power

*G*

^{r}of a simple graph

*G*. Because the graph distance in

*G*is the relevant proximity parameter, unlike a Euclidean filtration of a dataset where regional scale differences can be an issue, persistence in power filtrations provides a scale-free insight into the topology of

*G*. It is shown that for a power filtration of

*G*, the girth of

*G*defines an

*r*range over which the homology of the complexes in the filtration are guaranteed to persist in all dimensions. The role of chordal graphs as trivial homology delimiters in power filtrations is also discussed and the related notions of ‘persistent triviality’, ‘transient noise’ and ‘persistent periodicity’ in power filtrations are introduced.

## 1. Introduction

Topological data analysis is concerned with determining the topological structure of data [1]. One such approach to analysing large sets of discrete data points in

Traditional persistence theory typically uses Vietoris–Rips (Rips for short) or Čech Euclidean filtrations of *power filtration* of a simple graph *G*. It uses the graph distance *G* as the associated proximity parameter and generates the filtration by increasing the value of *r*. The *r*th complex in the filtration is the clique complex of the *r*th power *G* (such a filtration is called an *r filtration* and each complex in the filtration is called an *r complex*: note that an *r* complex is a Rips complex whose simplices are subsets of the vertices of *G* that are a distance of at most *r* from each other in the discrete metric space defined by *G*). Although persistent homology in a Euclidean filtration yields information about a dataset's topology in *r* filtrations, persistent homology features are considered to reflect important topological properties. Homology features which do not persist are generally regarded as relatively unimportant ‘topological noise’.

Because of the increasing interest in topological data analysis, much recent attention has been devoted to the development of software packages which can perform persistent homology computations with relative efficiency, e.g. [12–14] (it is interesting to note that a quantum mechanical algorithm has recently been developed that will exponentially speed up these computations—but alas—the quantum computers required to execute the algorithm do not yet exist [15]). It is shown below that in certain cases the girth of *G* can be used to reduce or eliminate such computations for *r* filtrations of *G* by defining not only a range *r* index values for which all of the homology features in all dimensions of the associated *r* complexes remain unchanged, but also a lower bound for the persistence lifetimes of every such feature. Although quantifying the girth of *G* can also place demands on computational resources, its evaluation can be cost-effective when compared with the resources required to compute all of the homology groups for each *r* complex in the

It will also be shown that chordal graphs—which have homologically trivial clique complexes—serve several important functions in *r* filtrations. As the power filtration of a connected graph *G stabilizes* at some power *r _{s}*(

*G*) (or

*r*when the graph referenced is clear) as a complete graph (which is a chordal graph), the associated filtration of

_{s}*r*complexes stabilizes at the

*stabilization distance r*(

_{s}*G*) as a homologically trivial simplex. In addition, if—in a power filtration of a connected graph

*G*–

*persistent periodic homology*feature peculiar to

*r*filtrations that can also induce persistent periodicity in

*transient noise*, i.e. ‘topological noise’ having the smallest possible lifetime in an

*r*filtration. It can also be the case that all of the complexes associated with

*persistent trivial homology*when

*persistent triviality*.

In order to make this paper relatively self-contained, relevant definitions and terminology are summarized in the next section. The theory of persistent homology in *r* filtrations (i.e. *r persistence*) is introduced in §3. Required preliminary lemmas are stated in §4 and the main results are developed in §5. Illustrative examples are presented in §6 and closing remarks comprise the paper's final section.

## 2. Definitions and terminology

A simple *graph G* is a pair *vertices* and *edges* or the empty set. The *order* of *G* is the cardinality *size* of *G* is the cardinality *adjacent vertices* when *e* is said to *join u* and *v* while *u* and *e* and *v* and *e* are said to be *incident*. Two distinct edges in *G* are *adjacent edges* if they have a common vertex. The *degree* *u* and the sum *Randić index* *G*. A graph is *regular* if each of its vertices has the same degree. Graphs *G* and *F* are *isomorphic graphs* (denoted

A *walk* is an alternating sequence of vertices and edges beginning with *u* and ending with *v* such that every edge joins the vertices immediately preceding and following it. A *path* is a *length*. In this case, *u* is said to be connected to *v*. *G* is *connected* if its order is one or if every two vertices in *G* are connected. The *graph distance* *G* is *cycle*. The graph *cycle graph* on *n* vertices. The *length* of a cycle is the number of edges contained within it and the shortest cycle of *G* is the *girth* *G*. A *chord* of a cycle is an edge between non-consecutive vertices in the cycle.

The *r*th *power* *G* is the graph with vertex set *u* and *v* in *G* is at most *r*. A graph is a *chordal graph* if every cycle of length of at least four has a chord. A graph *complete graph* on *n* vertices when *F* is a *subgraph* of *G,* denoted *G* induced by *G* is a *chordal cycle* if the subgraph induced by *sunflower* is the graph *n* independent vertices *suspended* if there is a vertex *G*, then *G* is *free*. A *clique* in *G* is either a vertex or a complete subgraph of *G*.

An *abstract simplicial complex S* on a finite set *A* is a family of sets *σ*. The elements of *A* are the *vertices* *of S* and each *k*-simplex *k*-dimensional *face of S*. The *clique complex* *G* is the abstract simplicial complex whose faces are the cliques of *G*. Associated with each complex *S* is a chain complex of abelian groups *k*-cycles in *S* and *k*-boundaries in *S*. If *k*-simplices in *S*, then *k*th *homology group* of *S* is the quotient group *k*-cycles by factoring out boundary cycles. If *S* is comprised of *m* connected components, then *m* copies of *S* is of dimension *δ*, then it has *S* is *homologically trivial* when

If *S* and *T* are abstract simplicial complexes, then a *simplicial map* *S*, then *T*. The simplicial map *induces homomorphisms* *S* and *T*. Complexes *S* and *T* are *homotopy equivalent* (denoted *S* and *T* are homotopy equivalent, then

## 3. Persistent homology theory for power filtrations of graphs

When applied to a complex, homology detects the presence of connected components, holes and voids in the complex. Rather than use the homology of a single complex as a description of the topology of a graph, it can be preferable to describe a graph's topology by identifying topological features detected by the homology that persist in a filtration of the graph. As already mentioned, persistent homological features can indicate potentially important topological properties of a graph, whereas features which do not persist can generally be regarded as relatively unimportant ‘topological noise’.

Useful insights into the topology of a simple graph *G* can be obtained from an understanding of the homology of clique complexes derived from the *r*th powers *G* (e.g. if *G* is connected and *r* can provide a measure of how close *G* is to being a homologically trivial simplex). To this end, let *r* vary over an appropriate distance range *G* to produce the *r* filtration and induced homology sequence given by the diagrams
*G* is connected, then the filtration stabilizes with the homologically trivial simplex

A non-zero homology class *c* is ‘born’ at *c*] is not in the image of *c*] is 0 in *r persistence lifetime of c* is then *c*] that is ‘born’ at *survivor*. *Transient noise* is a class *c*] is not in the image of *r* filtration.

A visualization of a complete *r* persistence analysis of *G* is given by the *complete r persistence barcode for G* which is a graphical representation of the multiset of all lifetime intervals for finite *k*. The *binary r persistence barcode for G* is a binary string *r*th entry corresponds to *G* exhibits persistent triviality which indicates that the topology of *G* is effectively homologically featureless (and possibly uninteresting) for all powers of *G*. However, if 010 is a substring of *r* corresponding to the position of the 1 in *G*'s vertices that are manifested as short-lived (but possibly interesting) topological features in

## 4. Preliminary lemmas

Results needed to prove or discuss the main results of this paper are presented in this section for the reader's convenience. The following lemmas have been established elsewhere and are stated here without proof.

### Lemma 4.1 ([16]).

*If* *then* *when*

### Lemma 4.2 ([17,18]).

*If G is a connected graph of order at least 3 with Randić index* *and girth* *then* *where equality applies on the left* (*right*) *if and only if G is a regular graph with a triangle*

### Lemma 4.3 ([19]).

*If G is a connected chordal graph, then* *is homologically trivial*.

### Lemma 4.4 ([20]).

*If G is a chordal graph with no sunflower, then* *is chordal for all*

### Lemma 4.5 ([21]).

*If G is a chordal graph, then so is* *for any*

### Lemma 4.6 ([20]).

*If G chordal and* *is not chordal, then G has at least one sunflower* *which is not suspended in G*.

## 5. Main results

In what follows, it is assumed that *G* is a simple graph. However, before continuing—for completeness—the above observations concerning *r* filtration stabilization are generalized as the following theorem. Since the consequence of the theorem is obvious, the theorem is stated without proof.

### Theorem 5.1.

*If G comprises m connected components* *then*

### 5.1. Persistent homology in *r* filtrations

The last theorem suggests the following persistent homology theorem for *r* filtrations:

### Theorem 5.2.

*If G has* *connected components*, *then the m non-zero homology classes of*

*are* *survivors*.

*Proof.*

This is an obvious consequence of theorem 5.1 and the fact that since *r* is a graph distance in *G* and the number of connected components remains invariant in

Hereafter, for the sake of simplicity and without loss of generality, it will be assumed that *G* is a connected graph.

The next result shows that the girth of *G* defines a power index range over which non-zero homology classes of

### Theorem 5.3.

*For some counting number* *let* *. If* *then*

*Proof.*

Lemma 4.1 implies that

It is clear that theorem 5.3 only applies when

The importance of chordal graphs as delimiters for persistent trivial homology in *r* filtrations is expressed in the next theorem. Chordal graphs have been studied extensively over the last several decades and efficient algorithms have been developed which can recognize when a graph is chordal (e.g. [22]).

### Theorem 5.4.

*Suppose that* *is an* *-free chordal graph. Then* *for every*

*Proof.*

Since *c*] are ‘born’ and ‘die’ in *c* can be no greater than

### 5.2. Persistent periodicity and transient noise in *r* filtrations

While it is the case that all chordal graphs are not closed under powers [23] (a situation for which closure occurs (lemma 4.4) has been applied in theorem 5.4), it is nonetheless true that every odd power of a chordal graph *G* is also chordal (lemma 4.5). Thus, the presence of a chordal graph at *r* filtration—at a minimum—guarantees that at least a *periodic homological triviality persists* in the associated complexes in the filtration. In what follows, it will be assumed that all power indices do not exceed the stabilization distance

### Theorem 5.5.

*If* *is a chordal graph, then*

*Proof.*

As

Thus, every entry corresponding to

As indicated by lemma 4.6, it can be the case that *G* is (however, if *G* are necessarily chordal [20] and *G* exhibits persistent triviality). This situation can produce transient ‘topological noise’ that is a feature peculiar to *r* filtrations and is described by the following corollary. Because this result is an obvious consequence of the last theorem, it is stated without proof.

### Corollary 5.6.

*Assume that G is a chordal graph and* *is not a chordal graph. Then every non-zero homology class* *is transient noise*.

A special case of this—*persistent periodic transient noise*—arises when every *r _{s}*th and final entry in

*G*that may provide useful insights into the properties of

*G*(e.g. see the sunflower graph example in the next section).

It is interesting to note (lemmas 4.4 and 4.6) the significance of sunflower subgraphs *r* filtrations. Obviously, if *G* is chordal, then *G* is closed under powers, the clique complexes of all powers of *G* (through its stabilization distance) are homologically trivial, and *G* exhibits persistent triviality. Perhaps not so obvious is the case where a sunflower subgraph in a chordal graph *G* prevents *G*, then *G* cannot be chordal because the suspension itself induces chordless cycles in *G*. Consequently,

## 6. Examples

It is worthwhile to illustrate several of the theorems developed above using graphs of relatively small order. As a first example consider the cycle graph *r* persistence barcode is the string

This example also illustrates theorem 5.4. In particular, as

Now consider the sunflower graph *c* can be contracted through the four 2—simplices corresponding to the petals of the sunflower onto the central square defined by the vertices 2,4,5 and 7. In addition, it is easily determined by inspection that

Because there are no non-zero homology classes in *r*).

Both Rips filtrations and Čech filtrations are used in the analysis of large datasets in *Σ* of data points in *k*-simplices are those subsets of *Σ* whose closed ball neighbourhoods of radius *β* have a common point of intersection, whereas points in the simplices in the associated Rips complex are pairwise within a distance *ϵ*. A Čech (Rips) filtration of *Σ* is performed by varying the value of the ball radius *β* (the value of *ϵ*) and constructing a Čech (Rips) complex for each *β* (*ϵ*) value of interest.

In what follows, a Čech filtration of a dataset *Σ* of 30 points in *Σ*. For the purpose of comparison, an *r* filtration of an associated relative neighbourhood graph *Σ* is used to highlight several advantages that can be obtained by using power filtrations instead of Euclidean filtrations in the analysis of large datasets.

The graphs of four Čech complexes obtained from a Čech filtration of the 30 point dataset *Σ* are shown in figure 2 for ball radius values *β* increases, the number of connected components in the graphs of the associated complexes decreases until the graph of the complex *γ* at *γ*, the lack of persistence in zero dimensional homology and the *β* range over which the rank of the zero-dimensional homology groups decreases from 14 to 1 provides a good description of the Euclidean compactness of *Σ* in *γ* would be detected by

A relative neighbourhood graph [24] on a dataset *X* has *X* as its vertex set with an edge between *ρ* and centre *x*, and *X* is that it provides a single graph representation of *X* that serves as a ‘primal sketch’ of its topological features.

The relative neighbourhood graph *Σ* are shown in figure 3. Observe that: (i) *Σ* that eventually emerged in the Čech filtration of *Σ* at a sufficiently large *β* value; and (ii) the cyclic structure of *Σ* persists in *c* is the cycle in

As a final example, consider the relative neighbourhood graph *Γ* representation of another dataset *a*. Observe that the single graph *Γ* immediately discerns three cycles (at different scales), whereas they only gradually emerge from a Čech filtration of *Ψ* as *β* increases. The power filtration of *Γ* provides essentially the same information about *Ψ* as the Čech filtration but—unlike the Čech filtration—it has the advantage that (in this case) it provides this information with a single graphical representation of *Ψ* so that *Ψ* is accessed only once to compute *Γ*. It is easy to see from theorem 5.2 that *Γ*.

## 7. Closing remarks

This paper has introduced the notion of using homological persistence in simplicial complexes obtained from power filtrations of simple graphs as an approach to probing their topological structure. This method is especially useful when applied to graphs with girths greater than 10. In these cases, the homologies of complexes in the filtration remain isomorphic and persist over a range of power indices that increases with increasing girth. An interesting feature of power filtrations of graphs is the fact that the emergence of a pre-stabilization chordal graph in a filtration signals the presence of trivial or periodic homology which persists for the remainder of the filtration.

Using as examples datasets in

In closing, it is important to note that the results of this paper can be applied directly to such cases as physical network and social network analysis where the data are naturally represented as a simple graph. In addition, in manifold learning a single simple graph is constructed and used to produce a data embedding into a lower dimensional space. The results of this paper provide a mechanism for understanding the topology of this graph along with the potential for producing additional information relevant to the associated embedding [25].

## Data accessibility

The following R code was used to generate the example 30 point dataset *Σ* in §6:

set.seed(435243)

N <- 30

theta <- seq(0,2*pi,length=N+1)[-1]

z <- cbind(cos(theta)+rnorm(N,0,.1),sin(theta)+rnorm(N,0,.1))

The following R code was used to append dataset *Σ* with 10 additional points to produce the example 40 point dataset *Ψ* in §6:

set.seed(89820)

r <- .1

n <- 10

thetar <- seq(0,2*pi,length=n+1)[-1]

x <- cbind(r*cos(thetar)+rnorm(n,0,.1*r),r*(sin(thetar)+rnorm(n,0,.1*r))

y <- rbind(z,x)

## Authors' contributions

A.D.P. developed the theory and drafted the manuscript. D.J.M. reviewed the theory, developed examples illustrating the theory, generated the figures and helped draft the manuscript. Both authors gave final approval for publication.

## Competing interests

We have no competing interests.

## Funding

Both authors were supported to perform this work by a grant from the Naval Surface Warfare Center Dahlgren Division's In-house Laboratory Independent Research Program.

- Received March 30, 2016.
- Accepted September 13, 2016.

- © 2016 The Authors.

Published by the Royal Society under the terms of the Creative Commons Attribution License http://creativecommons.org/licenses/by/4.0/, which permits unrestricted use, provided the original author and source are credited.