
Graph-Based Clustering
Graph clustering is used to partition a graph into meaningful subgroups, ensuring that nodes within the same cluster are highly connected, while nodes in different clusters have fewer connections.
The goal is to detect natural divisions or communities within the graph, revealing hidden patterns and relationships.
In this tutorial, we will explore the fundamental concepts, algorithms, and real-world applications of graph-based clustering.
Why Use Graph-Based Clustering?
Graph-based clustering is helpful when data is naturally connected. Some main benefits are −
- Understands Structural Relationships: Unlike traditional clustering, graph clustering considers both node attributes and edge connections.
- Flexibility: It can be applied to weighted, directed, and dynamic graphs.
- Handles Large Networks: Efficient algorithms exist for large-scale networks.
- Easy to Interpret: The clusters often correspond to meaningful real-world communities.
Types of Graph Clustering
Graph-based clustering methods can be divided into the following types −
- Community Detection: Groups nodes that are strongly connected to each other.
- Spectral Clustering: Uses the eigenvectors of the graph Laplacian matrix to identify clusters.
- Density-Based Clustering: Finds clusters based on node density in the graph.
- Hierarchical Clustering: Constructs a hierarchy of nested clusters.
Common Graph Clustering Algorithms
There are many commonly used algorithms for grouping nodes in a graph such as −
- Girvan-Newman Algorithm
- Spectral Clustering
- Louvain Algorithm
- Markov Clustering (MCL)
Girvan-Newman Algorithm
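The Girvan-Newman algorithm finds communities by repeatedly removing the edge with the highest betweenness centrality, that is, the edge that lies on the largest number of shortest paths between pairs of nodes. Such edges tend to act as bridges between communities, so removing them gradually splits the graph into smaller groups, i.e. clusters.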
Steps for the Girvan-Newman Algorithm:
- Compute the betweenness centrality for all edges.
- Remove the edge with the highest betweenness centrality.
- Recompute the betweenness and repeat until the graph is split into the desired number of clusters.
Example
The following example demonstrates how to implement the Girvan-Newman algorithm in Python using the NetworkX library. It loads a sample graph, the Karate Club graph, and applies the girvan_newman() function to detect communities −
import networkx as nx
from networkx.algorithms.community import girvan_newman

# Load a sample graph
G = nx.karate_club_graph()

comp = girvan_newman(G)
top_level_communities = next(comp)
print(top_level_communities)
The algorithm's output is the top-level communities, which are printed after the first iteration −
({0, 1, 3, 4, 5, 6, 7, 10, 11, 12, 13, 16, 17, 19, 21}, {2, 8, 9, 14, 15, 18, 20, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33})
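The three steps listed earlier can also be written out by hand. The following is a minimal sketch, not NetworkX's own implementation: the helper name girvan_newman_split and its n_communities parameter are made up for illustration, and it assumes an undirected graph whose edge weights are ignored. It recomputes edge betweenness after every removal and stops once the graph has fallen apart into the requested number of connected components −

```python
import networkx as nx


def girvan_newman_split(G, n_communities=2):
    """Hypothetical helper: remove high-betweenness edges until the
    graph has at least n_communities connected components."""
    H = G.copy()  # work on a copy so the caller's graph is untouched
    while (nx.number_connected_components(H) < n_communities
           and H.number_of_edges() > 0):
        # Step 1: betweenness of every remaining edge (unweighted)
        betweenness = nx.edge_betweenness_centrality(H)
        # Step 2: remove the edge that carries the most shortest paths
        edge = max(betweenness, key=betweenness.get)
        H.remove_edge(*edge)
        # Step 3: loop, recomputing betweenness on the modified graph
    return [set(component) for component in nx.connected_components(H)]


G = nx.karate_club_graph()
communities = girvan_newman_split(G, n_communities=2)
print(communities)
```

Asking for a larger n_communities simply continues the same removal process, which is why Girvan-Newman is often described as a divisive hierarchical method.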
Spectral Clustering
Spectral clustering is a technique that uses the eigenvectors of the graph Laplacian matrix to find clusters in a graph. It works by first building the Laplacian matrix L = D − A from the graph's adjacency matrix A and degree matrix D; this matrix captures the structure of the graph.
Then, the eigenvectors belonging to the k smallest eigenvalues of this matrix are used to project the nodes into a lower-dimensional space.
Finally, clustering techniques like k-means are applied to the transformed data to group the nodes into clusters. This method is effective for finding clusters in non-convex shapes or graphs that are difficult to separate using traditional clustering algorithms.
Steps for Spectral Clustering:
- Compute the graph Laplacian.
- Extract the k eigenvectors that belong to the smallest eigenvalues.
- Apply k-means clustering to the eigenvectors.
Example
This example demonstrates how to implement spectral clustering using Python. It computes the Laplacian matrix of the graph, performs spectral embedding to reduce the graph's dimensions, and then applies k-means clustering to group the nodes into two clusters −
import numpy as np
import networkx as nx
from sklearn.cluster import KMeans
from scipy.sparse.linalg import eigsh

# Define a graph (e.g., using the Karate Club graph)
G = nx.karate_club_graph()

# Remove isolated nodes
G.remove_nodes_from(list(nx.isolates(G)))

# Ensure the graph is connected
if not nx.is_connected(G):
    print("Warning: The graph is disconnected.")
    components = list(nx.connected_components(G))
    print(f"Found {len(components)} connected components.")
    # Select the largest connected component for further analysis
    largest_component = max(components, key=len)
    G = G.subgraph(largest_component)
else:
    print("The graph is connected.")

# Compute the Laplacian matrix and convert it to float for numerical stability
L = nx.laplacian_matrix(G).toarray().astype(np.float64)

# Check for NaN or infinite values in the Laplacian matrix
print("Laplacian Matrix:")
print(L)
if np.any(np.isnan(L)) or np.any(np.isinf(L)):
    print("The Laplacian matrix contains NaN or infinite values.")
else:
    print("The Laplacian matrix is clean.")

# Add a small epsilon to diagonal for numerical stability
epsilon = 1e-6
L += np.eye(L.shape[0]) * epsilon

# Perform spectral clustering using eigenvalue decomposition (Laplacian eigenmap)
try:
    # We will calculate the first 'k' eigenvectors for the embedding
    k = 2  # Number of clusters (2 in this case)
    eigenvalues, eigenvectors = eigsh(L, k=k, which='SM')
    # Normalize the eigenvectors row-wise to form the embedding
    embedding = eigenvectors / np.linalg.norm(eigenvectors, axis=1)[:, None]
except ValueError as e:
    print(f"Error in eigenvalue computation: {e}")
else:
    # Apply k-means clustering to the embedded data
    kmeans = KMeans(n_clusters=2).fit(embedding)
    labels = kmeans.labels_
    # Print the cluster labels for each node
    print("Cluster labels:")
    print(labels)
The output is the cluster labels for each node in the graph, which indicate how the nodes are grouped into clusters −
The graph is connected.
Laplacian Matrix:
[[42. -4. -5. ... -2. 0. 0.]
 [-4. 29. -6. ... 0. 0. 0.]
 [-5. -6. 33. ... 0. -2. 0.]
 ...
 [-2. 0. 0. ... 21. -4. -4.]
 [ 0. 0. -2. ... -4. 38. -5.]
 [ 0. 0. 0. ... -4. -5. 48.]]
The Laplacian matrix is clean.
Cluster labels:
[0 0 0 0 0 0 0 0 1 1 0 0 0 0 1 1 0 0 1 0 1 0 1 1 1 1 1 1 1 1 1 1 1 1]
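When only two clusters are needed, a shorter route is possible. The sketch below is an alternative to the script above, under the assumptions that the graph is connected and that edge weights are ignored (weight=None). Instead of running k-means, it splits the nodes by the sign of the Fiedler vector, which is the eigenvector belonging to the second-smallest Laplacian eigenvalue −

```python
import numpy as np
import networkx as nx

G = nx.karate_club_graph()
nodes = list(G.nodes())

# Unweighted Laplacian L = D - A as a dense array (fine for small graphs)
L = nx.laplacian_matrix(G, nodelist=nodes, weight=None).toarray().astype(float)

# eigh returns eigenvalues in ascending order for symmetric matrices
eigenvalues, eigenvectors = np.linalg.eigh(L)

# Column 0 is the constant vector (eigenvalue ~0 for a connected graph);
# column 1 is the Fiedler vector
fiedler = eigenvectors[:, 1]

# Nodes with a positive entry go to cluster 1, the rest to cluster 0
labels = (fiedler > 0).astype(int)
print(dict(zip(nodes, labels.tolist())))
```

Because the sign of an eigenvector is arbitrary, the two cluster labels may be swapped between runs or library versions; only the grouping itself is meaningful.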
Louvain Algorithm
The Louvain algorithm is a community detection method that focuses on optimizing a measure called modularity. Modularity is a value that quantifies the strength of divisions in a network, specifically how well nodes are grouped into communities.
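The Louvain algorithm works in two phases that are repeated. First, it moves individual nodes into neighboring communities whenever the move increases modularity. Then, it collapses each community into a single node and runs the same procedure on the smaller graph, so small communities are progressively merged into larger ones. This approach is highly efficient and can be applied to large graphs, making it popular for detecting communities in networks from areas such as social media, biology, and transportation.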
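For reference, the modularity of a partition of an undirected graph is commonly written as shown below. The symbols are notation chosen here, not taken from the text above: A_ij is the adjacency matrix entry (or edge weight) between nodes i and j, k_i is the degree of node i, m is the total number of edges (or total edge weight), and δ(c_i, c_j) equals 1 when nodes i and j are in the same community and 0 otherwise −

```latex
Q = \frac{1}{2m} \sum_{i,j} \left[ A_{ij} - \frac{k_i k_j}{2m} \right] \delta(c_i, c_j)
```

The term k_i k_j / 2m is the expected number of edges between i and j in a random graph with the same degrees, so Q is positive when communities contain more internal edges than chance would predict.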
Steps for Louvain Algorithm:
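- Assign each node to its own community.
- Move each node to the neighboring community that gives the largest modularity gain, if any.
- Merge each community into a single node to build a smaller graph.
- Repeat the previous two steps until no improvement in modularity is possible.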
Example
This code uses the Louvain algorithm from the python-louvain package (imported as community) to detect communities in a graph G. The best_partition() function returns a dictionary that maps each node to a community. The code prints this partition and then draws the graph with nodes colored by community −
import networkx as nx
import community as community_louvain  # provided by the python-louvain package
import matplotlib.pyplot as plt

# Create a graph (using Karate Club graph as an example)
G = nx.karate_club_graph()

# Apply Louvain algorithm to detect communities
partition = community_louvain.best_partition(G)

# Print the community each node belongs to
print(partition)

# Visualize the graph with the communities
pos = nx.spring_layout(G)
plt.figure(figsize=(8, 8))

# Draw the graph with node colors corresponding to their communities
nx.draw_networkx_nodes(G, pos, nodelist=list(partition.keys()), node_size=700,
                       cmap=plt.cm.jet, node_color=list(partition.values()))
nx.draw_networkx_edges(G, pos, alpha=0.5)
nx.draw_networkx_labels(G, pos, font_size=10)
plt.title("Louvain Community Detection")
plt.show()
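Following is the printed partition; the script also displays a plot of the graph colored by community. Louvain is randomized, so the community numbering can differ between runs −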
{0: 0, 1: 0, 2: 0, 3: 0, 4: 1, 5: 1, 6: 1, 7: 0, 8: 3, 9: 3, 10: 1, 11: 0, 12: 0, 13: 0, 14: 3, 15: 3, 16: 1, 17: 0, 18: 3, 19: 0, 20: 3, 21: 0, 22: 3, 23: 2, 24: 2, 25: 2, 26: 3, 27: 2, 28: 2, 29: 3, 30: 3, 31: 2, 32: 3, 33: 3}
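Recent NetworkX releases also ship their own Louvain implementation, which avoids the extra dependency. The sketch below assumes NetworkX 2.8 or later, where louvain_communities() is available; the seed value is arbitrary and only fixes the random node ordering so that repeated runs agree. The function returns a list of node sets rather than a dictionary, so the sketch converts it to the same node-to-community mapping as above −

```python
import networkx as nx
from networkx.algorithms.community import louvain_communities, modularity

G = nx.karate_club_graph()

# Each element of `communities` is a set of node ids
communities = louvain_communities(G, seed=42)

# Convert to a {node: community_index} dictionary, like best_partition()
partition = {node: idx
             for idx, members in enumerate(communities)
             for node in members}

# Modularity of the partition that Louvain settled on
score = modularity(G, communities)
print(f"{len(communities)} communities, modularity = {score:.3f}")
```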

Markov Clustering (MCL)
MCL (Markov Clustering) is an algorithm that simulates random walks on a graph to identify densely connected clusters. It works by iteratively applying two matrix operations, expansion and inflation, to the graph's transition matrix.
Initially, the adjacency matrix is normalized so that each column holds the probabilities of a random walker stepping from one node to its neighbors. The algorithm then alternates expansion (matrix multiplication, which spreads the walks further) and inflation (element-wise powering followed by renormalization) to emphasize strongly connected clusters while suppressing weaker connections.
The result is a partition of the graph into clusters where nodes within the same cluster are more strongly connected to each other compared to nodes in different clusters.
Steps for Markov Clustering:
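- Normalize: Turn the adjacency matrix (usually with self-loops added) into a column-stochastic transition matrix.
- Expand: Raise the matrix to a power (typically 2) to simulate longer random walks.
- Inflate: Raise each entry to a power greater than 1 and rescale each column to sum to 1, which strengthens intra-cluster connections.
- Repeat expansion and inflation until the matrix stops changing.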
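The steps above can be sketched with NumPy as follows. This is a minimal illustration rather than a reference implementation: the function name markov_clustering, its parameters, and their default values are assumptions made here, edge weights are ignored, and the dense matrices make it suitable only for small graphs. Clusters are read off the converged matrix by collecting, for every row that still holds non-zero flow, the columns (nodes) that flow into it −

```python
import numpy as np
import networkx as nx


def markov_clustering(G, expansion=2, inflation=2.0, max_iter=100, tol=1e-6):
    """Hypothetical helper: a bare-bones MCL on a small undirected graph."""
    nodes = list(G.nodes())
    # Adjacency matrix plus self-loops, so a walker may also stay in place
    A = nx.to_numpy_array(G, nodelist=nodes, weight=None) + np.eye(len(nodes))
    # Normalize columns: M[i, j] = probability of stepping from j to i
    M = A / A.sum(axis=0, keepdims=True)

    for _ in range(max_iter):
        previous = M
        M = np.linalg.matrix_power(M, expansion)   # expand: longer walks
        M = np.power(M, inflation)                 # inflate: boost strong flow
        M = M / M.sum(axis=0, keepdims=True)       # keep columns stochastic
        if np.allclose(M, previous, atol=tol):     # converged
            break

    # Rows with remaining flow are "attractors"; their non-zero columns
    # are the members of one cluster
    clusters = []
    for row in M:
        members = {nodes[j] for j in np.flatnonzero(row > tol)}
        if members and members not in clusters:
            clusters.append(members)
    return clusters


# Two 5-node cliques joined by a single edge
G = nx.barbell_graph(5, 0)
clusters = markov_clustering(G)
print(clusters)
```

A larger inflation value tends to produce more, smaller clusters, while a value close to 1 produces fewer, larger ones.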
Applications of Graph-Based Clustering
Graph clustering is commonly used in various domains, such as −
- Social Networks: Detecting user communities and recommending connections.
- Biological Networks: Identifying functional modules in protein interaction networks.
- Fraud Detection: Finding suspicious groups in financial transactions.
- Document Clustering: Organizing text data into topic-based clusters.
Evaluating Graph Clustering Performance
To check how well clustering algorithms perform, we use different evaluation metrics:
- Modularity: Shows how well the graph is divided into communities, with higher values indicating better division.
- Normalized Mutual Information (NMI): Compares the similarity between the predicted clusters and the true clusters, with higher values meaning better matching.
- Silhouette Score: Measures how similar a node is to its own cluster compared to other clusters, with higher scores indicating better clustering.
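As a rough illustration of the first two metrics, the sketch below scores the top-level Girvan-Newman split of the Karate Club graph. It assumes scikit-learn is installed and uses the club attribute that NetworkX stores on each node of this graph as the ground truth for NMI; the variable names are chosen here for illustration −

```python
import networkx as nx
from networkx.algorithms.community import girvan_newman, modularity
from sklearn.metrics import normalized_mutual_info_score

G = nx.karate_club_graph()

# Predicted clustering: the first split produced by Girvan-Newman
communities = next(girvan_newman(G))

# Modularity needs only the graph and the partition
q = modularity(G, communities)

# NMI needs ground-truth labels; here, the club each member joined
nodes = sorted(G.nodes())
true_labels = [G.nodes[n]["club"] for n in nodes]
predicted = {n: idx for idx, members in enumerate(communities) for n in members}
pred_labels = [predicted[n] for n in nodes]
nmi = normalized_mutual_info_score(true_labels, pred_labels)

print(f"Modularity: {q:.3f}")
print(f"NMI vs. club membership: {nmi:.3f}")
```

Modularity can be computed for any graph, whereas NMI is only available when a trusted reference labeling exists.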