Link Prediction - Predict edges in a network using Networkx


Link prediction is a key idea in the field of network analysis. It involves foreseeing the potential for a link to establish between network nodes. A potent tool for network analysis, including link prediction tasks, is the NetworkX module for Python. This thorough tutorial will walk you through using NetworkX to anticipate links, replete with succinct and understandable examples.

Introduction to Link Prediction

Nodes in a network or graph represent entities, while edges or linkages between these nodes reflect their relationships. Link prediction uses the network's current topology to forecast potential links between nodes. Link prediction has various uses, such as predicting social relationships in social networks and researcher collaboration in networks of co-authorship.

Getting Started with NetworkX

Make sure the NetworkX library is installed before moving on to link prediction. If not, use pip to install it −

pip install networkx

We also need to install the numpy and pandas libraries in order to do link prediction:

pip install numpy pandas

Basics of Network Creation in NetworkX

Let's begin by establishing a fundamental network in NetworkX −

import networkx as nx

# Create an empty graph
G = nx.Graph()

# Add nodes
G.add_node(1)
G.add_node(2)
G.add_node(3)

# Add edges
G.add_edge(1, 2)
G.add_edge(1, 3)

# Draw the graph
nx.draw(G, with_labels=True)

There are only three nodes and two edges in this straightforward network.

Link Prediction in NetworkX

For doing link prediction, NetworkX offers a number of functions. They are founded on various methods and theories.

Example 1: Common Neighbors

Using common neighbours is a simple link prediction method. It implies that if two nodes have a lot of shared neighbours, they are more likely to create a link.

# Create a graph
G = nx.complete_graph(5)
G.remove_edge(1, 3)

# Perform link prediction
preds = nx.common_neighbors(G, 1, 3)
print(len(list(preds)))  # Output: 3

Here, we add five nodes and eliminate one edge to construct a full graph (a network in which every pair of nodes is connected by a direct edge). On the basis of the quantity of shared neighbours, we then forecast this missing link.

Example 2: Jaccard Coefficient

The number of shared neighbours is divided by the total number of neighbours in order to calculate the Jaccard coefficient, which assesses the likelihood of an edge.

# Create a graph
G = nx.complete_graph(5)
G.remove_edge(1, 3)

# Perform link prediction
preds = nx.jaccard_coefficient(G, [(1, 3)])
for u, v, p in preds:
   print(f'({u}, {v}) -> {p}')  # Output: (1, 3) -> 0.6

In this illustration, the deleted edge's Jaccard coefficient is computed.

Example 3: Preferential Attachment

According to the theory of preferential attachment, nodes with a high degree (more connections) are more likely to connect in the future.

# Create a graph
G = nx.complete_graph(5)
G.remove_edge(1, 3)

# Perform link prediction
preds = nx.preferential_attachment(G, [(1, 3)])
for u, v, p in preds:
   print(f'({u}, {v}) -> {p}')  # Output: (1, 3) -> 12

The preferential attachment score for the deleted edge is calculated in this example.

Example 4: Adamic/Adar Index

Similar to common neighbours, the Adamic/Adar index places less emphasis on nodes with a high degree.

# Create a graph
G = nx.complete_graph(5)
G.remove_edge(1, 3)

# Perform link prediction
preds = nx.adamic_adar_index(G, [(1, 3)])
for u, v, p in preds:
   print(f'({u}, {v}) -> {p}')  # Output: (1, 3) -> 1.8204784532536746

In this instance, a decimal value known as the Adamic/Adar index—which measures the eliminated edge—is calculated.

Example 5: Resource Allocation Index

Another metric that creates a score based on the shared neighbours of two nodes is the resource allocation index.

# Create a graph
G = nx.complete_graph(5)
G.remove_edge(1, 3)

# Perform link prediction
preds = nx.resource_allocation_index(G, [(1, 3)])
for u, v, p in preds:
   print(f'({u}, {v}) -> {p}')  # Output: (1, 3) -> 0.6666666666666666

Here, the removed edge's resource allocation index is calculated.

Conclusion

Link prediction is an interesting subject with many of real-world uses. The NetworkX package in Python provides a number of link prediction techniques, each with advantages and disadvantages. As always, it's essential to comprehend the underlying ideas and techniques, try out many approaches, and select the one that works best for your unique use case.

We've taken a hands-on approach to comprehending link prediction in NetworkX in this guide. The fundamental ideas have been discussed, and a variety of link prediction techniques have been illustrated with simple instances. But this is only the very tip of the iceberg. The possibilities for network research and connection prediction open up as you delve deeper into NetworkX.

Updated on: 18-Jul-2023

264 Views

Kickstart Your Career

Get certified by completing the course

Get Started
Advertisements