# Spectral Clustering on Graphs

## An insight into the graph Laplacian

Clustering is a fundamental tool of unsupervised learning in data science and machine learning. Typically, the data we want to cluster is represented by multidimensional features, often realized as points in Euclidean space. These features enable prevalent clustering algorithms to categorize data points, grouping them based on their similarities within this multidimensional space.

However, the nature of graphs is quite different. In graphs, entities are not inherently characterized by feature representations. Instead, graphs consist of nodes interconnected by edges, and these connections are what define similarity. A densely connected subset of nodes is considered more similar internally than nodes with sparse or no connections between them. As a result, clustering in graphs hinges on connectivity, specifically the edges present in the graph. In the following sections, let us progressively build an intuitive understanding of graph clustering, employing graph connectivity to effectively partition graphs.

## Clusters on graphs

Graph clustering involves the division of graphs into multiple subgraphs, where each subgraph comprises nodes that exhibit similarity among themselves. Examining the graph depicted in Figure 1, it’s visually evident that there exist two subgraphs characterized by high connectivity within each and minimal connectivity between them. If we had to partition the graph into two, we’d pretty much agree on the split point. However, how do we establish this partition mathematically?

## Graph Laplacian

Let’s introduce some mathematical terminology on the way to defining a graph Laplacian. A graph, denoted G(V, E), comprises vertices or nodes V = {v¹, v², …} and edges E = {(v¹, v³), (v⁵, v⁸), …}. The edges are conveniently represented in the form of a matrix, referred to as the adjacency matrix A. This square matrix of size N × N (where N is the total number of nodes) encodes the edge connections, with each entry indicating the weight of the connection between the corresponding pair of nodes. For undirected graphs, such as the one under consideration here, the adjacency matrix is symmetric. For ease of illustration, let’s assume all edges have a weight of w = 1, while w = 0 between two nodes denotes the absence of an edge. For the graph depicted in Figure 1, the adjacency matrix is as follows:
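Since Figure 1 isn’t reproduced here, a minimal sketch can use a hypothetical stand-in: a six-node graph made of two triangles joined by a single bridge edge (the node numbering and edge list below are assumptions, not the actual Figure 1):

```python
import numpy as np

# Hypothetical stand-in for Figure 1 (an assumption): six nodes forming
# two triangles, {0, 1, 2} and {3, 4, 5}, joined by one bridge edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]

N = 6
A = np.zeros((N, N))
for i, j in edges:
    A[i, j] = A[j, i] = 1  # undirected graph with unit weights => symmetric A

print(A)
```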

The degree matrix D is defined as a diagonal matrix whose diagonal elements give the total number of edges connected to each node (i.e., the degree of the node). For the graph presented in Figure 1, the degree matrix is as follows:

Finally, the graph Laplacian is defined as L = D - A:
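Putting the three definitions together, a minimal NumPy sketch (using a hypothetical two-triangle graph as a stand-in for Figure 1):

```python
import numpy as np

# Hypothetical two-triangle graph used as a stand-in for Figure 1 (an
# assumption): triangles {0, 1, 2} and {3, 4, 5} plus the bridge edge (2, 3).
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]:
    A[i, j] = A[j, i] = 1

# Degree matrix: row sums of A placed on the diagonal.
D = np.diag(A.sum(axis=1))

# Graph Laplacian.
L = D - A

print(np.diag(D))  # node degrees
print(L)
```

Note that each row of L sums to zero by construction, a property that matters in the next section.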

## Interpretation of the Laplacian

The Laplacian operator we’ve defined looks markedly different from the one we know from mathematics, where it is a differential operator acting on a scalar field. It appears, for instance, in electrostatics, where the Laplacian of the electric potential measures the charge density, and in heat conduction, where the Laplacian of the temperature measures the heat source. In the following paragraphs, we’ll see how the graph Laplacian plays a similar role.

To gain a deeper insight into graph behavior, let’s draw an analogy from physics. If the nodes in a graph are considered to be electrical bodies with a certain potential and the edges are wires with a certain conductance (a unit conductance in our case), then the entire graph becomes an electrical network, and current flows through these ‘wires.’ A row of the Laplacian matrix represents the connectivity of a particular node: the diagonal element gives the total conductance for current flowing into the node, while the off-diagonal elements give (with a negative sign) the conductance of the edges carrying current out to the corresponding nodes in the graph.

When the potentials of all nodes are collected in a vector X, the net current flowing into every node can be expressed as LX. A notable observation: if all nodes lie at the same potential, the net current flowing into each node is zero, i.e., if X = [1, 1, …, 1], then LX = 0. This outcome is consistent with electrical theory: with no potential differences, no current flows.
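Both cases can be checked numerically; a small sketch on an assumed two-cluster toy graph (not the actual Figure 1):

```python
import numpy as np

# Assumed toy graph standing in for Figure 1: triangles {0, 1, 2} and
# {3, 4, 5} joined by the bridge edge (2, 3).
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]:
    A[i, j] = A[j, i] = 1
L = np.diag(A.sum(axis=1)) - A

# Equal potential everywhere: LX = 0, no net current flows into any node.
print(L @ np.ones(6))

# Unequal potentials (one cluster at 1, the other at 0) drive a current:
# here it is nonzero only at the two endpoints of the bridge edge.
x = np.array([1.0, 1.0, 1.0, 0.0, 0.0, 0.0])
print(L @ x)
```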

Multiplying the Laplacian by the node potentials yields a measure of the current sources: the net current flowing into each node.

## Alright, how does this help with clustering?

The insight for utilizing the Laplacian in clustering lies in one simple but key observation: in a subnetwork of highly connected nodes, the more similar the potentials, the smaller the net current flowing into them. This observation plays a pivotal role in mathematically defining the notion of similarity.

When the current flow in the network is minimal, the potentials within a group of highly connected nodes are similar.

This insight allows us to segregate clusters based on the potentials corresponding to minimal current flow. However, as in most mathematical scenarios, there is one major hurdle. As previously discussed, the minimum possible net current into the nodes is 0. In this scenario, all nodes have the same potential, and we end up with one giant cluster consisting of the entire graph, which is contrary to our objective. Hence, in addition to minimal flow, some dissimilarity in the potentials becomes imperative for effective clustering. This necessitates further analysis of the Laplacian in terms of its eigenvalues and eigenvectors. And in case you were wondering, this is where the word ‘spectrum’ enters the realm of graph theory.

## Analyzing the Laplacian

The spectrum of a matrix denotes its eigenvalue decomposition. If we consider the Laplacian as a transformation from node potentials to current flows, we can analyze the amount of flow through the eigenvalues of the Laplacian. The smallest eigenvalue corresponds to the least flow; conversely, larger eigenvalues indicate higher flows. The zero eigenvalue, corresponding to a net current flow of zero (the constant-potential case discussed above), holds no significance for clustering. Therefore, to enable effective clustering, we turn our attention to the second smallest eigenvalue and its corresponding eigenvector. This choice ensures that the potentials are dissimilar, thereby facilitating clustering based on these dissimilarities.
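Putting it all together, a minimal sketch of this procedure on an assumed two-cluster toy graph, splitting nodes by the sign of the second-smallest eigenvector (one common convention, assumed here):

```python
import numpy as np

# Hypothetical two-cluster graph (a stand-in for Figure 1): triangles
# {0, 1, 2} and {3, 4, 5} joined by the bridge edge (2, 3).
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]:
    A[i, j] = A[j, i] = 1
L = np.diag(A.sum(axis=1)) - A

# eigh returns the eigenvalues of a symmetric matrix in ascending order.
eigvals, eigvecs = np.linalg.eigh(L)

# The smallest eigenvalue is (numerically) zero with a constant eigenvector,
# which is useless for clustering. The signs of the second-smallest
# eigenvector split the nodes into two groups.
second = eigvecs[:, 1]
labels = (second > 0).astype(int)
print(labels)  # nodes sharing a label belong to the same cluster
```

On this toy graph the sign split recovers the two triangles.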