Graph Theory - Compact Representation



Compact Representation

In graph theory, a compact representation stores graph data in as little space as possible while retaining all the information needed to reconstruct and query the graph. This is useful for large graphs, where adjacency matrices or plain adjacency lists would be too memory-intensive.

Some of the important characteristics of compact representation are as follows −

  • Efficient Storage: Saves memory by avoiding duplicate information and storing only what's needed. It uses methods like run-length encoding, adjacency list compression, or compact data structures to achieve this.
  • Scalability: Handles larger graphs well, allowing us to work with big networks without using too much memory.
  • Maintains Graph Properties: Keeps all the important properties and relationships in the graph intact, so we can still perform operations and run algorithms effectively.

Types of Compact Representations

There are different ways to represent graphs in a compact form, and each method is suited for certain types of graphs and uses. Below are some commonly used compact representations −

Adjacency Array

An adjacency array is a compact way to store the adjacency lists of a graph in contiguous arrays. The same layout is known as Compressed Sparse Row (CSR) when it is applied to the rows of the adjacency matrix and as Compressed Sparse Column (CSC) when it is applied to the columns. It uses up to three arrays: the vertex array, the edge array, and an optional value array for weighted graphs.

The adjacency array consists of the following components −

  • Vertex Array: Stores, for each vertex, the index in the edge array where that vertex's edges start.
  • Edge Array: Stores the adjacent vertices for each vertex.
  • Value Array (optional): Stores the weights of the edges, if applicable.

Adjacency Array Calculation

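Consider a graph with vertices A, B, and C, and edges A-B, A-C, B-C. In this walkthrough each edge is stored once, in the direction written (A-B is stored as an edge from A to B); for an undirected graph, each edge would normally also be stored in the reverse direction. Here's how the adjacency array representation is calculated −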

Step 1: Assign Indexes to Vertices:

  • A = 0
  • B = 1
  • C = 2

Step 2: List the Edges Using Vertex Indexes:

  • A-B: (0, 1)
  • A-C: (0, 2)
  • B-C: (1, 2)

Step 3: Create the Vertex Array:

This array keeps track of where each vertex's edges start in the edge array. For each vertex, count its outgoing edges; the starting index of a vertex is the total number of edges of all vertices before it.

  • Vertex A (0): A has 2 edges (A-B and A-C).
  • Vertex B (1): B has 1 edge (B-C).
  • Vertex C (2): C has no outgoing edges listed.

The Vertex Array is constructed as follows:

  • Start of A's edges: 0
  • Start of B's edges: 2 (since A has 2 edges)
  • Start of C's edges: 3 (since B's 1 edge occupies index 2)

Thus, the Vertex Array is: [0, 2, 3].

Step 4: Create the Edge Array:

This array lists the destination vertices of each vertex's edges, grouped by source vertex in index order.

  • For vertex A (0): the edges are to B (1) and C (2).
  • For vertex B (1): the edge is to C (2).

Thus, the Edge Array is: [1, 2, 2].

Therefore, the adjacency array representation for the given graph is −

Vertex Array: [0, 2, 3]
Edge Array:   [1, 2, 2]

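The Vertex Array [0, 2, 3] gives the starting indices in the Edge Array for the edges of vertices A, B, and C, respectively. A vertex's edges end where the next vertex's edges begin; for the last vertex they end at the end of the Edge Array, so C (start 3, array length 3) has no edges. Many CSR implementations make this explicit by appending one final entry equal to the total number of edges.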

The Edge Array [1, 2, 2] lists the destination vertices for each edge in the graph.
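
The four steps above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function names build_adjacency_array and neighbors are made up for this example, and edges are assumed to be directed (u, v) pairs of vertex indexes.

```python
def build_adjacency_array(num_vertices, edges):
    """Build (vertex_array, edge_array) from directed (u, v) index pairs."""
    # Count the outgoing edges of every vertex.
    degree = [0] * num_vertices
    for u, _ in edges:
        degree[u] += 1

    # Running totals give the start offset of each vertex's edges.
    vertex_array = [0] * num_vertices
    for v in range(1, num_vertices):
        vertex_array[v] = vertex_array[v - 1] + degree[v - 1]

    # Write each destination into the next free slot of its source vertex.
    edge_array = [0] * len(edges)
    next_slot = list(vertex_array)
    for u, v in edges:
        edge_array[next_slot[u]] = v
        next_slot[u] += 1
    return vertex_array, edge_array


def neighbors(vertex_array, edge_array, v):
    """Return the destinations of the edges leaving vertex v."""
    start = vertex_array[v]
    # The last vertex's range ends at the end of the edge array.
    end = vertex_array[v + 1] if v + 1 < len(vertex_array) else len(edge_array)
    return edge_array[start:end]


# A = 0, B = 1, C = 2 with edges A-B, A-C, B-C.
vertex_array, edge_array = build_adjacency_array(3, [(0, 1), (0, 2), (1, 2)])
print(vertex_array)                            # [0, 2, 3]
print(edge_array)                              # [1, 2, 2]
print(neighbors(vertex_array, edge_array, 0))  # [1, 2]
```

Looking up a vertex's neighbors only needs two reads from the vertex array and one slice of the edge array, which is why this layout is both compact and fast to traverse.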

Edge Array

An edge array, also known as an edge list, is a compact way to represent the edges of a graph. Each edge is represented as a pair of vertices, and for weighted graphs, an additional value can be included to represent the edge weight.

The edge array is a simple list of pairs (or tuples) representing the edges −

  • Edges: Each edge is represented as a pair of vertices (u, v).
  • Weights (optional): Each edge may include a weight value.

Consider a graph with vertices A, B, and C, and edges A-B, A-C, B-C. The edge array representation would be −

Edge Array: [(A, B), (A, C), (B, C)]
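
As a small sketch, the same edge array can be written in Python with the vertices replaced by indexes (A = 0, B = 1, C = 2). The weights in the second list and the helper name out_neighbors are invented for illustration only.

```python
# Edge array for the example graph, using A = 0, B = 1, C = 2.
edges = [(0, 1), (0, 2), (1, 2)]

# Weighted variant: each tuple is (u, v, weight); the weights are made up.
weighted_edges = [(0, 1, 4.0), (0, 2, 1.5), (1, 2, 2.0)]


def out_neighbors(edge_list, u):
    """Scan the whole list for edges that start at u."""
    return [edge[1] for edge in edge_list if edge[0] == u]


print(out_neighbors(edges, 0))           # [1, 2]
print(out_neighbors(weighted_edges, 1))  # [2]
```

The scan in out_neighbors shows the trade-off: an edge array is very simple and small, but finding the neighbors of one vertex requires looking at every edge.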

Compressed Graph Formats

Compressed graph formats use more advanced techniques to reduce the storage space needed to represent a graph. These methods are useful when working with very large graphs, such as web graphs or social networks, where traditional storage methods might be inefficient.

Two examples of compressed graph formats are k²-trees and the WebGraph framework.

k²-trees

A k²-tree is a compact data structure used to represent sparse binary matrices. It is effective for representing the adjacency matrices of large, sparse graphs.

A k²-tree works by recursively dividing the adjacency matrix of a graph into k² equal submatrices (a k x k grid of blocks). Each block is recorded as a single bit: 1 if it contains at least one edge, 0 if it is entirely empty. Only non-empty blocks are divided again, so each level of the tree covers progressively smaller submatrices and large empty regions cost just one bit.

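Here is an example of how a k²-tree with k = 2 can represent a simple 4 x 4 adjacency matrix. The matrix is listed quadrant by quadrant (top left, top right, bottom left, bottom right), with the two rows of each 2 x 2 quadrant written as two-bit strings −

[10, 00, 10, 01, 00, 00, 00, 01]

In the structure below, each non-empty quadrant is expanded into its four cells, while the empty bottom-left quadrant is stored as a single 0.

k²-Tree Structure:
Top Left:
  Top Left:
    1
  Top Right:
    0
  Bottom Left:
    0
  Bottom Right:
    0
Top Right:
  Top Left:
    1
  Top Right:
    0
  Bottom Left:
    0
  Bottom Right:
    1
Bottom Left:
  0
Bottom Right:
  Top Left:
    0
  Top Right:
    0
  Bottom Left:
    0
  Bottom Right:
    1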
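
The recursive subdivision can be sketched in Python as shown here. This is a simplified illustration under stated assumptions: the function name build_k2_tree is hypothetical, the matrix side is assumed to be a power of k, the 4 x 4 input is chosen for illustration, and the bits are returned as one plain list per level rather than in the packed bitmaps a real implementation would use.

```python
def build_k2_tree(matrix, k=2):
    """Return the tree bits of a square 0/1 matrix, one list per level.

    Each level holds one bit per child block of every non-empty block
    on the level above, visiting children row by row.
    """
    n = len(matrix)
    levels = []
    blocks = [(0, 0, n)]  # (top row, left column, side length) to split
    while blocks and blocks[0][2] > 1:
        bits, next_blocks = [], []
        for row, col, size in blocks:
            child = size // k
            for i in range(k):
                for j in range(k):
                    r, c = row + i * child, col + j * child
                    non_empty = any(
                        matrix[x][y]
                        for x in range(r, r + child)
                        for y in range(c, c + child)
                    )
                    bits.append(1 if non_empty else 0)
                    # Only non-empty blocks are subdivided further.
                    if non_empty:
                        next_blocks.append((r, c, child))
        levels.append(bits)
        blocks = next_blocks
    return levels


matrix = [
    [1, 0, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
    [0, 0, 0, 1],
]
for level in build_k2_tree(matrix):
    print(level)
# [1, 1, 0, 1]
# [1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1]
```

The first level says that the top-left, top-right, and bottom-right quadrants contain edges; the second level lists the four cells of each of those three quadrants. The empty bottom-left quadrant contributes a single 0 and nothing more.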

WebGraph Framework

The WebGraph framework is a set of algorithms and data structures specifically designed to compress web graphs. Web graphs, which capture the structure of links between web pages, can be enormous in size.

The WebGraph framework achieves high compression ratios by taking advantage of patterns and redundancies found in the structure of web graphs.

Two such patterns are locality (when pages are ordered by URL, most links point to pages with nearby IDs, so the gaps between consecutive IDs are small) and similarity (pages that are close in this ordering tend to link to many of the same pages, so one adjacency list can be encoded partly by reference to another). Exploiting them significantly reduces the storage space required, which makes it possible to store and process web graphs with millions or billions of pages and links.

Adjacency List Compression

Adjacency list compression techniques reduce the space required to store adjacency lists by using delta encoding, variable-length coding, and other compression methods.

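  • Delta encoding stores the differences (gaps) between consecutive vertex IDs in an adjacency list rather than the IDs themselves. This is effective when the IDs in each list are sorted, because the gaps are then small non-negative numbers.
  • Variable-length coding assigns shorter codes to more frequent values, reducing the overall space required for storage. After delta encoding, the frequent values are typically the small gaps.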
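
The two techniques can be combined, as in the following Python sketch. It is an illustration under assumptions rather than a specific library's format: the function names are made up, the adjacency list is invented sample data, and the variable-length code used here is a simple byte-oriented scheme (7 data bits per byte) that gives shorter codes to smaller numbers.

```python
def delta_encode(sorted_ids):
    """Replace each ID by its gap from the previous ID (first ID is kept)."""
    gaps, previous = [], 0
    for vertex_id in sorted_ids:
        gaps.append(vertex_id - previous)
        previous = vertex_id
    return gaps


def delta_decode(gaps):
    """Rebuild the original IDs by summing the gaps."""
    ids, total = [], 0
    for gap in gaps:
        total += gap
        ids.append(total)
    return ids


def varint_encode(values):
    """Pack non-negative ints, 7 bits per byte; high bit means 'more follows'."""
    out = bytearray()
    for value in values:
        while value >= 0x80:
            out.append((value & 0x7F) | 0x80)
            value >>= 7
        out.append(value)
    return bytes(out)


def varint_decode(data):
    """Inverse of varint_encode."""
    values, current, shift = [], 0, 0
    for byte in data:
        current |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            values.append(current)
            current, shift = 0, 0
    return values


adjacency = [1000, 1003, 1004, 1010, 1200]  # sorted neighbor IDs (sample data)
gaps = delta_encode(adjacency)
encoded = varint_encode(gaps)
print(gaps)          # [1000, 3, 1, 6, 190]
print(len(encoded))  # 7
print(delta_decode(varint_decode(encoded)) == adjacency)  # True
```

Here five neighbor IDs fit in 7 bytes, compared with 20 bytes if each ID were stored as a fixed 4-byte integer, because most gaps fit in a single byte.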

Applications of Compact Representations

Compact graph representations are used in various applications where space efficiency and fast access to graph data are important. Some of the key applications are as follows −

  • Web Graphs: The World Wide Web can be represented as a large graph, with web pages as vertices and hyperlinks as edges. Compact representations allow efficient storage and analysis of web graphs.
  • Social Networks: Social networks, with users as vertices and relationships as edges, benefit from compact representations to handle large amounts of data.
  • Biological Networks: Biological networks, such as protein-protein interaction networks, require efficient storage to facilitate large-scale analyses.
  • Geographical Information Systems (GIS): GIS applications involve large graphs representing geographical features and their relationships. Compact representations improve storage efficiency and query performance.