import numpy as np
import scipy as sp
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
import matplotlib as mp
import sklearn
import networkx as nx
from IPython.display import Image, HTML
import laUtilities as ut
%matplotlib inline
Now we turn to two important concepts in the analysis of networks:

* Important nodes, and
* Important groups of nodes.
The question of important nodes leads to the notion of centrality,
And the question of important groups of nodes leads to the notion of clustering.
In both cases, we will draw connections between graphs and linear algebra.
Centrality
A common question in the analysis of networks is to understand the relative “importance” of the nodes in the network.
For example:
in a social network, who are the most influential individuals?
in the Web, which pages are more informative?
in a road network, which intersections are most heavily used?
The key idea is that the structure of the network should give us some information about the relative importance of the nodes in the network.
To introduce concepts here, we’ll look at a very famous dataset in the history of network analysis: Zachary’s karate club.
The back story: from 1970 to 1972 the anthropologist Wayne Zachary studied the social relationships inside a university karate club.
While he was studying the club, a factional division led to a splitting of the club in two.
The club became split between those who rallied around the club president, and those who rallied around the karate instructor.
You can read the story of the Karate club [here](http://www1.ind.ku.dk/complexLearning/zachary1977.pdf). This dataset has become so famous that it has spawned [its own academic traditions](https://networkkarate.tumblr.com).
Here’s a view of the social network of the karate club.
In this graph, most nodes are close to most other nodes.
However, we can see that some nodes are slightly more central than others.
Code
# closeness centrality for each node of the karate club graph
cent = list(nx.closeness_centrality(nx.karate_club_graph()).values())
plt.figure(figsize = (6, 4))
plt.hist(cent, bins = np.linspace(0, 1, 30))
plt.xlabel('Closeness Centrality', size = 14)
plt.ylabel('Number of Nodes', size = 14)
plt.title('Distribution of Closeness Centrality', size = 16);
Betweenness Centrality
Another way to think about centrality is “is the node on many paths?”
If we picture the network as a conduit for information, then betweenness captures how important a node is to the communication process (or “how much” information passes through the node).
First, let’s consider the case in which there is only one shortest path between any pair of nodes.
Then, the betweenness centrality of node \(i\) is the number of shortest paths that pass through \(i\).
Mathematically:
\[ \text{betweenness}(i) = \sum_{i \neq j \neq k \in V} \left\{\begin{array}{ll}1&\text{if path from }j\text{ to }k\text{ goes through }i\\0&\text{otherwise}\end{array}\right. \]
We can convert this to a value between 0 and 1 by dividing by \({n \choose 2} = n(n-1)/2\).
Now, in a general graph, there may be multiple shortest paths between \(j\) and \(k\).
To handle this, we define:

* \(\sigma(i \mid j,k)\) is the number of shortest paths between \(j\) and \(k\) that pass through \(i\), and
* \(\sigma(j,k)\) is the total number of shortest paths between \(j\) and \(k\).
Then we define the dependency of \(i\) on the paths between \(j\) and \(k\) as

\[ \delta(i \mid j,k) = \frac{\sigma(i \mid j,k)}{\sigma(j,k)}, \]

and the betweenness of \(i\) is the sum of these dependencies over all pairs:

\[ \text{betweenness}(i) = \sum_{i \neq j \neq k \in V} \delta(i \mid j,k). \]
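As a quick illustration, here is a minimal sketch (using networkx's built-in `betweenness_centrality`, applied to the karate club graph; the rescaling networkx uses differs slightly from the one above in that it excludes the node itself from the pair count):

```python
import networkx as nx

G = nx.karate_club_graph()

# normalized betweenness centrality: fraction of shortest paths passing through each node
bc = nx.betweenness_centrality(G, normalized=True)

# the five most "between" nodes
top = sorted(bc, key=bc.get, reverse=True)[:5]
print([(v, round(bc[v], 3)) for v in top])
```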
Eigenvector Centrality

A third way to think about importance is status: a node is important if it is connected to other important nodes. If we let \(e_i\) denote the status of node \(i\), this says that each node’s status should be proportional to the total status of its neighbors:

\[ \lambda e_i = \sum_{j=1}^n A_{ij} e_j \quad \text{for all } i. \]

Notice that this is just the definition of a matrix-vector multiplication.
So we can write even more concisely:
\[ A\mathbf{e} = \lambda\mathbf{e} \]
… which means that \(\mathbf{e}\) must be an eigenvector of \(A\).
For this reason, status centrality is often called eigenvector centrality.
Which eigenvector of \(A\) should we choose?
Let’s assume that the graph \(G\) is connected.
Then there is a good argument for choosing \(\mathbf{e}\) to be the eigenvector corresponding to the largest eigenvalue of \(A\), in which case all of the entries in \(\mathbf{e}\) will be nonnegative.
(This fact comes from the Perron-Frobenius theorem, which tells us that if \(G\) is connected, then the largest eigenvalue of \(A\) is positive and the corresponding eigenvector is nonnegative.)
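To make the connection concrete, here is a small sketch (assuming the karate club graph) that computes the leading eigenvector of \(A\) directly and checks it against networkx's `eigenvector_centrality_numpy`, up to normalization:

```python
import numpy as np
import networkx as nx

G = nx.karate_club_graph()
A = nx.to_numpy_array(G)

# eigh returns eigenvalues in ascending order for a symmetric matrix
w, V = np.linalg.eigh(A)
e = np.abs(V[:, -1])          # Perron eigenvector: all entries have the same sign

ec = nx.eigenvector_centrality_numpy(G)
e_nx = np.array([ec[v] for v in G.nodes()])

# both are positive multiples of the same eigenvector
print(np.allclose(e / e.sum(), e_nx / e_nx.sum()))
```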
# eigenvector centrality for each node of the karate club graph
cent = list(nx.eigenvector_centrality(nx.karate_club_graph()).values())
plt.figure(figsize = (6, 4))
plt.hist(cent, bins = np.linspace(0, 1, 30))
plt.xlabel('Eigenvector Centrality', size = 14)
plt.ylabel('Number of Nodes', size = 14)
plt.title('Distribution of Eigenvector Centrality', size = 16);
As a more detailed example, we can ask whether this definition of “prestige” applies to NCAA Division 1A football teams, based on who they play against.
(Remember this data is from 2000!)
Code
# data from http://www-personal.umich.edu/~mejn/netdata/
football = nx.readwrite.gml.read_gml('data/football.gml')
ec = nx.eigenvector_centrality(football)
cent = [ec[i] for i in football.nodes()]
np.random.seed(1)
fig = plt.figure(figsize = (14, 14))
nx.draw_networkx(football, pos = nx.spring_layout(football), node_color = cent, cmap = plt.cm.plasma)
plt.title('Eigenvector Centrality on Football Network')
plt.axis('off');
So let’s compare the three versions of centrality we’ve looked at:
# consider discussing node ranking
# this is a fun paper: http://stat.wharton.upenn.edu/~steele/Courses/956/Ranking/RankingFootballSIAM93.pdf
# cite: https://epubs.siam.org/doi/10.1137/1035004
# The Perron–Frobenius Theorem and the Ranking of Football Teams
# and there is useful data here: https://www.sports-reference.com/cfb/years/2020-schedule.html
Clustering and Partitioning
We now turn to the question of finding important groups of nodes.
Why might we want to cluster graph nodes?
Assigning computations to processors in a parallel computer
Segmenting images (finding boundaries between objects)
Clustering words found together in documents, or documents with similar words
Divide and conquer algorithms
Circuit layout in VLSI
Community detection in social networks
Min \(s\)-\(t\) cut
We’ll start with a problem that is fundamental to many other problems in graph analysis.
Let’s say we have a graph \(G\) and two nodes in mind, \(s\) and \(t\).
We would like to isolate \(s\) from \(t\). What is the cheapest way to do it?
For an interesting historical perspective on the min-cut problem and its relation to the Cold War, see
["On the history of the transportation and maximum flow problems,"](https://link.springer.com/article/10.1007%2Fs101070100259) by Alexander Schrijver, in Mathematical Programming 91.3 (2002): 437-445.
The min \(s\)-\(t\) cut problem is as follows.
We are given a weighted graph \(G = (V,E)\).
An \(s\)-\(t\) cut \(C\) of \(G\) is a partition of \(V\) into \((U, V-U)\) such that \(s \in U\) and \(t \in V-U\).
The cost of a cut is the total weight of the edges that go between the two parts:

\[ \text{cost}(C) = \sum_{(u,v) \in E:\; u \in U,\, v \in V-U} w(u,v). \]
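As a small illustration (the node subset below is arbitrary, chosen just for the example), networkx's `cut_size` computes this cost for a given partition:

```python
import networkx as nx

G = nx.karate_club_graph()
U = {0, 1, 2, 3}       # one side of a (hypothetical) cut

# number of edges crossing from U to the rest of the graph;
# pass weight='weight' to sum edge weights instead of counting edges
print(nx.cut_size(G, U))
```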
As mentioned, when Wayne Zachary studied the club, a conflict arose between the instructor and the president (nodes 0 and 33).
Zachary predicted the way the club would split based on an \(s\)-\(t\) min cut.
In fact, he correctly predicted every single member’s eventual association except for node 8!
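The plotting cell below uses a set of edges `cut_edges` and node positions `pos` computed earlier in the notebook; here is a minimal sketch of how they could be obtained, assuming (as in networkx's `karate_club_graph`) that node 0 is the instructor and node 33 is the president:

```python
import networkx as nx

G = nx.karate_club_graph()

# edges whose removal disconnects the instructor (node 0) from the president (node 33)
cut_edges = nx.minimum_edge_cut(G, 0, 33)

# a layout for drawing; the seed is arbitrary
pos = nx.spring_layout(G, seed=9)
```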
Code
Gcopy = G.copy()
Gcopy.remove_edges_from(cut_edges)
cc = nx.connected_components(Gcopy)
node_set = {node: i for i, s in enumerate(cc) for node in s}
colors = ['dodgerblue', 'tomato']
node_colors = [colors[node_set[v]-1] for v in G.nodes()]
fig = plt.figure(figsize=(12,6))
nx.draw_networkx(G, node_color=node_colors, pos=pos, with_labels=True, node_size=1000, font_size=16)
plt.axis('off');
Minimum Cuts
Now, in partitioning a graph, we may not have any particular \(s\) and \(t\) in mind.
Rather, we may want to simply find the “cheapest” way to disconnect the graph.
Clearly, we can do this using \(s\)-\(t\) min cut, by simply trying all \(s\) and \(t\) pairs.
Let’s try this approach of finding the minimum \(s\)-\(t\) cut over all possibilities in the karate club graph:
Code
Gcopy = G.copy()
Gcopy.remove_edges_from(nx.minimum_edge_cut(G))
cc = nx.connected_components(Gcopy)
node_set = {node: i for i, s in enumerate(cc) for node in s}
#colors = ['tomato', 'dodgerblue']
node_colors = [colors[node_set[v]] for v in G]
fig = plt.figure(figsize=(12,6))
nx.draw_networkx(G, node_color=node_colors, pos=pos, with_labels=True, node_size=1000, font_size=16)
plt.axis('off');
This is in fact the minimum cut: node 11 only has one edge to the rest of the graph, so the min cut is 1.
As this example shows, minimum cut is not, in general, a good approach for clustering or partitioning.
To get a more useful partition, we need to define a new goal: finding a balanced cut.
Balanced Cuts
The idea for avoiding the problem above is to normalize the cut by the size of the smaller of the two components. The problem above would then be avoided, because the smaller of the two components is just a single node.

This leads to the isoperimetric number of the graph,

\[ \alpha(G) = \min_{U \subset V,\; |U| \leq |V|/2} \frac{\text{cut}(U, V-U)}{|U|}, \]

where \(\text{cut}(U, V-U)\) is the total weight of the edges between \(U\) and \(V-U\).

The idea is that the cut achieving \(\alpha(G)\) is a balanced cut: one that maximizes the number of disconnected nodes per edge removed.
How easily can we compute this?
Unfortunately, computing \(\alpha(G)\) exactly is NP-hard, so we cannot expect to do it in polynomial time.
However, we can make good approximations, which we’ll look at now.
To do so, we’ll return to using linear algebra for graphs, and introduce spectral graph theory.
Spectral Graph Theory
If you want to study this in more detail, some excellent references are
* [Allow Me to Introduce Spectral and Isoperimetric Graph Partitioning](https://people.eecs.berkeley.edu/~jrs/papers/partnotes.pdf) by Jonathan Shewchuk, which has outstanding visualizations and physical intuition.
* [Spectral and Algebraic Graph Theory](http://cs-www.cs.yale.edu/homes/spielman/sagt/sagt.pdf) by Daniel Spielman, which provides proofs and much more detail.
Spectral graph theory is the use of linear algebra to study the properties of graphs.
To introduce spectral graph theory, we define some terms.
For an undirected graph \(G\) having \(n\) nodes, we define the \(n\times n\) matrix \(D\) as the diagonal matrix of node degrees.
I.e., \(D = \text{diag}(d_1, d_2, d_3, \dots)\) where \(d_i\) is the degree of node \(i\).
Then assuming \(G\) has adjacency matrix \(A\), we define the Laplacian of \(G\) as:
\[ L = D - A \]
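For example, here is a small sketch that builds \(L = D - A\) directly for the karate club graph and checks it against networkx's built-in `laplacian_matrix`:

```python
import numpy as np
import networkx as nx

G = nx.karate_club_graph()
A = nx.to_numpy_array(G)        # adjacency matrix
D = np.diag(A.sum(axis=1))      # diagonal matrix of node degrees
L = D - A

print(np.allclose(L, nx.laplacian_matrix(G).todense()))   # True
```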
Here is the Laplacian matrix \(L\) for the karate club network, shown as a heatmap:
Code
L = nx.laplacian_matrix(nx.karate_club_graph()).todense()
plt.figure(figsize = (7, 7))
sns.heatmap(L, cmap = plt.cm.tab20)
plt.axis('equal')
plt.axis('off');
Now let us think about an \(n\)-component vector \(\mathbf{x} \in \mathbb{R}^n\) as an assignment of values to nodes in the graph \(G\).
For example, \(\mathbf{x}\) could encode node “importance” or “strength” or even a more concrete notion like “temperature” or “altitude.”
Then here is an amazing fact about the Laplacian of \(G\).
(For a proof you can see the notes.)
The quadratic form
\[ \mathbf{x}^TL\mathbf{x}\]
is exactly the same as
\[ \sum_{(i,j)\in E} (x_i - x_j)^2 \]
To see that
$$ \mathbf{x}^TL\mathbf{x} = \sum_{(i,j)\in E} (x_i - x_j)^2, $$
first consider $\mathbf{x}^TL\mathbf{x}$. Writing out the quadratic form explicitly, we have that
$$ \mathbf{x}^TL\mathbf{x} = \sum_{i, j} L_{ij}x_i x_j. $$
Now, taking into account the values in $L$, we see that in the sum we will have the term $d_i x_i^2$ for each $i$, and also 2 terms of $-x_ix_j$ whenever $(i,j)\in E$.
Turning to
$$\sum_{(i,j)\in E} (x_i - x_j)^2 = \sum_{(i,j)\in E} \left( x_i^2 - 2x_ix_j + x_j^2 \right), $$
we have the same set of terms in the sum.
That is, the quadratic form \(\mathbf{x}^T L \mathbf{x}\) is the sum of squared differences of \(\mathbf{x}\) over the edges in \(G\).
In other words, “When nodes have the values in \(\mathbf{x}\), how much do adjacent nodes vary?”
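Here is a quick numerical check of this identity on the karate club graph (the random vector \(\mathbf{x}\) is arbitrary):

```python
import numpy as np
import networkx as nx

G = nx.karate_club_graph()
L = np.asarray(nx.laplacian_matrix(G).todense())
x = np.random.default_rng(0).standard_normal(G.number_of_nodes())

quad_form = x @ L @ x
edge_sum = sum((x[i] - x[j]) ** 2 for i, j in G.edges())
print(np.allclose(quad_form, edge_sum))   # True
```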
Now, let’s think about vectors \(\mathbf{x}\) that minimize the differences over the edges in the graph.
We can think of these as “smooth” functions on the graph – neighboring nodes don’t differ too much.
To find such “smooth” vectors, we would solve this optimization:

\[ \min_{\Vert\mathbf{x}\Vert = 1} \mathbf{x}^T L \mathbf{x}. \]

The solution of this problem satisfies \(L\mathbf{x} = \lambda\mathbf{x}\); that is, \(\lambda\) is the smallest eigenvalue of \(L\) and \(\mathbf{x}\) is the corresponding eigenvector.
So … we are connecting functions on the graph \(G\) with eigenvectors of the matrix \(L\).
Quite remarkable!
Well, what do we know about \(L\)?
\(L\) is symmetric. Therefore the eigenvectors of \(L\) are orthogonal and its eigenvalues are real.
\(L\) is positive semidefinite. Therefore the eigenvalues of \(L\) are all positive or zero. (For a proof see the notes.)
We can order the eigenvalues from largest to smallest \(\lambda_n \geq \dots \geq \lambda_2 \geq \lambda_1 \geq 0.\)
How do we know that $L$ is positive semidefinite?
Consider $ \sum_{(i,j)\in E} (x_i - x_j)^2.$
This is always a nonnegative quantity.
So $\mathbf{x}^T L\mathbf{x} \geq 0$ for all $\mathbf{x}$, which is in fact the definition of positive-semidefiniteness.
Next, let’s assume that \(G\) is connected.
Then \(L\) has exactly one zero eigenvalue, \(\lambda_1 = 0\). The corresponding eigenvector is \(\mathbf{w}_1 = \mathbf{1} = [1, 1, 1, \dots]^T\).
This is easily seen:
\[L{\mathbf 1}={\mathbf 0}.\]
Recall that row \(i\) of \(L\) has \(d_i\) on the diagonal and an entry of \(-1\) in each of the \(d_i\) columns corresponding to node \(i\)’s neighbors, so every row of \(L\) sums to zero, and therefore \(L\mathbf{1} = \mathbf{0}\).
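Both facts are easy to verify numerically for the (connected) karate club graph:

```python
import numpy as np
import networkx as nx

G = nx.karate_club_graph()
L = np.asarray(nx.laplacian_matrix(G).todense())
w, V = np.linalg.eigh(L)        # eigenvalues in ascending order

print(np.allclose(L @ np.ones(L.shape[0]), 0))   # L @ 1 == 0
print(np.all(w > -1e-9))                         # all eigenvalues nonnegative (up to roundoff)
print(w[1] > 1e-9)                               # only one zero eigenvalue, since G is connected
```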
The second-smallest eigenvalue of \(L\), \(\lambda_2\), is called the Fiedler value.
We know that all of the other eigenvectors of \(L\) are orthogonal to \(\mathbf 1\), because \(L\) is symmetric.
Because of that, the second smallest eigenvalue can be characterized as

\[ \lambda_2 = \min_{\mathbf{x} \perp \mathbf{1},\; \Vert\mathbf{x}\Vert = 1} \mathbf{x}^T L \mathbf{x}. \]

What this means is that
if we think of \(x_i\) as a 1-D “coordinate” for node \(i\) in the graph,
then choosing \(\mathbf{x} = \mathbf{w}_2\) (the eigenvector corresponding to \(\lambda_2\)) puts each node in a position that minimizes the sum of the “squared stretching” of each edge.
Now, perhaps you recall that the energy in a stretched spring is proportional to the square of its stretched length.
So imagine that we use the entries in \(\mathbf{w}_2\) to position the nodes of the graph along a single dimension.
Then using the Fiedler vector \(\mathbf{w}_2\) for node coordinates is exactly the spring layout of nodes that we discussed in the last lecture – except that it is in one dimension only.
This is the basis for the spectral layout that we showed in the last lecture.
In spectral layout, we use \(\mathbf{w}_2\) for the first dimension, and \(\mathbf{w}_3\) for the second dimension.
\(\mathbf{w}_3\) is the eigenvector corresponding to \(\lambda_3\), the third smallest eigenvalue of \(L\).
So what is the difference between the spectral layout and the spring layout?
In one dimension, they are the same, but in multiple dimensions, spectral layout optimizes each dimension separately.
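Here is a small sketch of that idea: take the eigenvectors \(\mathbf{w}_2\) and \(\mathbf{w}_3\) as 2-D coordinates, which should agree with networkx's `spectral_layout` up to scaling and sign:

```python
import numpy as np
import networkx as nx

G = nx.karate_club_graph()
L = np.asarray(nx.laplacian_matrix(G).todense())
w, V = np.linalg.eigh(L)        # eigenvalues ascending; columns of V are eigenvectors

coords = V[:, 1:3]              # (w_2, w_3) as the 2-D position of each node

# networkx's spectral layout, for comparison (it rescales its coordinates)
pos = nx.spectral_layout(G)
```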
Spectral Partitioning
This leads to key ideas in node partitioning.
The basic idea is to partition nodes according to the Fiedler vector \(\mathbf{w}_2\).
This can be shown to have provably good performance for the balanced cut problem.
See [Spectral and Algebraic Graph Theory](http://cs-www.cs.yale.edu/homes/spielman/sagt/sagt.pdf) by Daniel Spielman, Chapter 20, where it is proved that for every $U \subset V$ with $|U| \leq |V|/2$, $\alpha(G) \geq \lambda_2 (1-s)$ where $s = |U|/|V|$. In particular, $\alpha(G) \geq \lambda_2/2.$
There are a number of options for how to split based on the Fiedler vector.
If \(\mathbf{w}_2\) is the Fiedler vector, then split the nodes according to a threshold value \(s\) (a short sketch of these options appears after the list):
bisection: \(s\) is the median value in \(\mathbf{w}_2\)
ratio cut: \(s\) is the value that minimizes the resulting cut ratio \(\alpha\)
sign: separate positive and negative values (\(s = 0\))
gap: separate according to the largest gap in the values of \(\mathbf{w}_2\)
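Here is a minimal sketch of how these thresholds could be computed from the Fiedler vector (the ratio-cut option, which sweeps over candidate values of \(s\), is only indicated in a comment):

```python
import numpy as np
import networkx as nx

G = nx.karate_club_graph()
w2 = np.asarray(nx.fiedler_vector(G))

s_sign = 0.0                                # sign: split at zero
s_bisection = np.median(w2)                 # bisection: split at the median value

vals = np.sort(w2)
gaps = np.diff(vals)
i = np.argmax(gaps)
s_gap = (vals[i] + vals[i + 1]) / 2         # gap: split at the largest gap in sorted values

# (ratio cut: sweep s over the values in w2 and keep the split with the best cut ratio)

U = [v for v in G.nodes() if w2[v] > s_sign]   # one side of the sign split
```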
Here is a spectral partitioning for the karate club graph:
Code
G = nx.karate_club_graph()
f = nx.fiedler_vector(G)
s = np.zeros(len(f), dtype='int')
s[f > 0] = 1
#fig = plt.figure(figsize=(12,6))
colors = ['tomato', 'dodgerblue']
np.random.seed(9)
pos = nx.spring_layout(G)
node_colors = [colors[s[v]] for v in G]
nx.draw_networkx(G, pos=pos, node_color=node_colors, with_labels=True, node_size=1000, font_size=16)
plt.axis('off');
Interestingly, this is almost the same as the \(s\)-\(t\) min cut based on the president and instructor!
Spectral Clustering
In many cases we would like to move beyond graph partitioning, to allow for clustering nodes into, say, \(k\) clusters.
The idea of spectral clustering takes the observations about the Fiedler vector and extends them to more than one dimension.
Let’s look again at the spectral layout of the football dataset.
Here we’ve labelled the nodes according to their conference, which we will think of as ground-truth labels.
Code
import matplotlib.patches as mpatches
import re

cmap = plt.cm.tab20
#
# data from http://www-personal.umich.edu/~mejn/netdata/
football = nx.readwrite.gml.read_gml('data/football.gml')
conf_name = {}
with open('data/football.txt', 'r') as fp:
    for line in fp:
        m = re.match(r'\s*(\d+)\s+=\s*([\w\s-]+)\s*\n', line)
        if m:
            conf_name[int(m.group(1))] = m.group(2)
conf = [d['value'] for i, d in football.nodes.data()]
#
plt.figure(figsize = (12, 12))
nx.draw_networkx(football, pos = nx.spectral_layout(football), node_color = conf, with_labels = False, cmap = cmap)
plt.title('Conference Membership in Football Network')
patches = [mpatches.Patch(color = cmap(i/11), label = conf_name[i]) for i in range(12)]
plt.legend(handles = patches)
plt.axis('off');
Now, the key idea is that using spectral layout, we have placed nodes into a Euclidean space.
So … we could use a standard clustering algorithm in that space.
Code
plt.figure(figsize = (12, 12))
nx.draw_networkx(football, pos = nx.spectral_layout(football), node_color = conf, with_labels = False, edgelist = [], cmap = cmap)
plt.title('Conference Membership in Football Network')
patches = [mpatches.Patch(color = cmap(i/11), label = conf_name[i]) for i in range(12)]
plt.tick_params(left=True, bottom=True, labelleft=True, labelbottom=True)
plt.legend(handles = patches);
Now the above plot shows that many clusters are well-separated in this space, but some still overlap.
To address this, we can use additional eigenvectors of the Laplacian, i.e., \(\mathbf{w}_4, \mathbf{w}_5, \dots\).
So the idea of spectral clustering is:

* use enough of the smallest eigenvectors of \(L\) to sufficiently “spread out” the nodes, and
* cluster the nodes in the Euclidean space created by this embedding.
More specifically: Given a graph \(G\):
Compute \(L\), the Laplacian of \(G\)
Compute the smallest \(d\) eigenvectors of \(L\), excluding the smallest eigenvector (the ones vector)
Let \(U \in \mathbb{R}^{n\times d}\) be the matrix containing the eigenvectors \(\mathbf{w}_2, \mathbf{w}_3, \dots, \mathbf{w}_{d+1}\) as columns
Let the position of each node \(i\) be the point in \(\mathbb{R}^d\) given by row \(i\) of \(U\)
Cluster the points into \(k\) clusters using \(k\)-means
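Putting these steps together, here is a compact sketch of the recipe (a simplified version; the worked examples below use a slightly different variant that also scales the eigenvectors by their eigenvalues):

```python
import numpy as np
import networkx as nx
from sklearn.cluster import KMeans

def spectral_clustering(G, d, k, seed=0):
    """Cluster the nodes of G into k groups using a d-dimensional spectral embedding."""
    L = np.asarray(nx.laplacian_matrix(G).todense())
    w, V = np.linalg.eigh(L)            # eigenvalues ascending, eigenvectors as columns
    U = V[:, 1:d + 1]                   # w_2, ..., w_{d+1}; skip the ones vector
    km = KMeans(n_clusters=k, n_init=10, random_state=seed)
    return km.fit_predict(U)            # one cluster label per node, in G.nodes() order

# for example: labels = spectral_clustering(football, d=2, k=11)
```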
Let’s explore the results of spectral clustering using \(d = 2\) dimensions.
Code
# Here is a complete example of spectral clustering
#
# The number of dimensions of spectral layout
k = 2
#
# Obtain the graph
football
#
# Compute the eigenvectors of its Laplacian
L = nx.laplacian_matrix(football).todense()
w, v = np.linalg.eig(L)
v = np.array(v)
#
# scale each eigenvector by its eigenvalue
X = v @ np.diag(w)
#
# consider the eigenvectors in increasing order of their eigenvalues
w_order = np.argsort(w)
X = X[:, w_order]
#
# run kmeans using k top eigenvectors as coordinates
from sklearn.cluster import KMeans
kmeans = KMeans(init='k-means++', n_clusters=11, n_init=10)
kmeans.fit_predict(X[:, 1:(k+1)])
centroids = kmeans.cluster_centers_
labels = kmeans.labels_
error = kmeans.inertia_
#
# visualize the result
plt.figure(figsize = (14, 7))
ax1 = plt.subplot(121)
nx.draw_networkx(football, ax = ax1, pos = nx.spectral_layout(football), node_color = labels, with_labels = False, edgelist = [], cmap = cmap, node_size = 100)
plt.tick_params(left=True, bottom=True, labelleft=True, labelbottom=True)
plt.title(f'Spectral Clustering, 11 Clusters, dimension = {k}')
ax2 = plt.subplot(122)
nx.draw_networkx(football, ax = ax2, pos = nx.spectral_layout(football), node_color = conf, with_labels = False, edgelist = [], cmap = cmap, node_size = 100)
plt.title('Conference Membership in Football Network')
patches = [mpatches.Patch(color = cmap(i/11), label = conf_name[i]) for i in range(12)]
plt.tick_params(left=True, bottom=True, labelleft=True, labelbottom=True)
plt.legend(handles = patches, loc='center left', bbox_to_anchor=(1, 0.5));
This is pretty good, but we can see that in some cases the clustering is not able to separate clusters that overlap in the visualization.
This makes sense: for the case \(d = 2\), we are running \(k\)-means on the points exactly as we see them in the visualization.
Let’s try \(d = 3\). Now there will be another dimension available to the clustering, which we can’t see in the visualization.
Code
# Here is a complete example of spectral clustering
#
# The number of dimensions of spectral layout
k = 3
#
# Compute the eigenvectors of its Laplacian
L = nx.laplacian_matrix(football).todense()
w, v = np.linalg.eig(L)
v = np.array(v)
#
# scale each eigenvector by its eigenvalue
X = v @ np.diag(w)
#
# consider the eigenvectors in increasing order of their eigenvalues
w_order = np.argsort(w)
X = X[:, w_order]
#
# run kmeans using k top eigenvectors as coordinates
from sklearn.cluster import KMeans
kmeans = KMeans(init='k-means++', n_clusters=11, n_init=10)
kmeans.fit_predict(X[:, 1:(k+1)])
centroids = kmeans.cluster_centers_
labels = kmeans.labels_
error = kmeans.inertia_
#
# visualize the result
plt.figure(figsize = (14, 7))
ax1 = plt.subplot(121)
nx.draw_networkx(football, ax = ax1, pos = nx.spectral_layout(football), node_color = labels, with_labels = False, edgelist = [], cmap = cmap, node_size = 100)
plt.tick_params(left=True, bottom=True, labelleft=True, labelbottom=True)
plt.title(f'Spectral Clustering, 11 Clusters, dimension = {k}')
ax2 = plt.subplot(122)
nx.draw_networkx(football, ax = ax2, pos = nx.spectral_layout(football), node_color = conf, with_labels = False, edgelist = [], cmap = cmap, node_size = 100)
plt.title('Conference Membership in Football Network')
patches = [mpatches.Patch(color = cmap(i/11), label = conf_name[i]) for i in range(12)]
plt.tick_params(left=True, bottom=True, labelleft=True, labelbottom=True)
plt.legend(handles = patches, loc='center left', bbox_to_anchor=(1, 0.5));
We can see visually that using 3 dimensions is giving us a better clustering than 2 dimensions.
What happens as we increase the dimension further?
To evaluate this question, we can use the Adjusted Rand Index (ARI), which measures the agreement between two clusterings (here, the computed clusters versus the known conference labels):
Code
import sklearn.metrics as metrics
#
# Compute the eigenvectors of its Laplacian
L = nx.laplacian_matrix(football).todense()
w, v = np.linalg.eig(L)
v = np.array(v)
#
# scale each eigenvector by its eigenvalue
X = v @ np.diag(w)
#
# consider the eigenvectors in increasing order of their eigenvalues
w_order = np.argsort(w)
X = X[:, w_order]
#
max_dimension = 15
ri = np.zeros(max_dimension - 1)
for k in range(1, max_dimension):
    # run kmeans using k top eigenvectors as coordinates
    kmeans = KMeans(init='k-means++', n_clusters = 11, n_init = 10)
    kmeans.fit_predict(X[:, 1:(k+1)])
    ri[k - 1] = metrics.adjusted_rand_score(kmeans.labels_, conf)
#
plt.figure(figsize = (8, 6))
plt.plot(range(1, max_dimension), ri, 'o-')
plt.xlabel('Number of Dimensions (Eigenvectors)', size = 14)
plt.title('Spectral Clustering of Football Network Compared to Known Labels', size = 16)
plt.ylabel('Adjusted Rand Index', size = 14);
Based on this plot, it looks like the football graph is best described as about six-dimensional.
When we embed it in six dimensions and cluster there we get an extremely high Adjusted Rand Index.
Code
print(f'Maximum ARI is {np.max(ri):0.3f}, using {1+ np.argmax(ri)} dimensions for spectral embedding.')
Maximum ARI is 0.868, using 6 dimensions for spectral embedding.