How can we measure the similarity or distance between two vertices in a graph?

Two common types of measure are geodesic distance and distance based on random walks.

Geodesic Distance − A simple measure of the distance between two vertices in a graph is the shortest path between them. Formally, the geodesic distance between two vertices is the length, counted in edges, of the shortest path between them. For two vertices that are not connected in the graph, the geodesic distance is defined as infinite.
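
As a minimal sketch of this definition, the geodesic distance in an unweighted graph can be computed with a breadth-first search. The adjacency-dictionary representation, the function name geodesic_distance, and the small example graph are assumptions made for illustration.

```python
from collections import deque
import math


def geodesic_distance(adj, source, target):
    """Return the number of edges on a shortest path from source to target.

    adj maps each vertex to a list of its neighbors (illustrative format).
    Returns math.inf when target cannot be reached from source.
    """
    if source == target:
        return 0
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj.get(u, ()):
            if w not in dist:
                dist[w] = dist[u] + 1
                if w == target:
                    return dist[w]
                queue.append(w)
    return math.inf


# Hypothetical graph: a path a - b - c plus an isolated vertex d.
example_graph = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b"],
    "d": [],
}

print(geodesic_distance(example_graph, "a", "c"))  # 2
print(geodesic_distance(example_graph, "a", "d"))  # inf
```

Breadth-first search visits vertices in order of increasing edge count, so the first time the target is reached its recorded distance is already the shortest one.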

Geodesic distance can be used to define several other useful measurements for graph analysis and clustering. Given a graph G = (V, E), where V is the set of vertices and E is the set of edges, we define the following −

  • For a vertex v ∈ V, the eccentricity of v, denoted eccen(v), is the largest geodesic distance between v and any other vertex u ∈ V − {v}. The eccentricity of v captures how far v is from its most remote vertex in the graph.

  • The radius of graph G is the minimum eccentricity of all vertices. That is, r = min_{v ∈ V} eccen(v). The radius captures the distance between the “most central point” and the “farthest border” of the graph.

  • The diameter of graph G is the maximum eccentricity of all vertices. That is, d = max_{v ∈ V} eccen(v). The diameter represents the largest distance between any pair of vertices.

  • A peripheral vertex is a vertex that achieves the diameter, that is, a vertex whose eccentricity equals d.
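
The definitions above can be sketched in a few lines of Python. This is an illustrative sketch, not a reference implementation: it assumes an unweighted, undirected graph stored as an adjacency dictionary, and the helper names (bfs_distances, eccentricities) and the five-vertex path graph are hypothetical.

```python
from collections import deque


def bfs_distances(adj, source):
    """Geodesic distance from source to every vertex reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def eccentricities(adj):
    """Map each vertex v to eccen(v).

    If some vertex is unreachable from v, eccen(v) is infinite,
    matching the convention that unconnected vertices are
    infinitely far apart.
    """
    ecc = {}
    for v in adj:
        dist = bfs_distances(adj, v)
        if len(dist) < len(adj):
            ecc[v] = float("inf")
        else:
            ecc[v] = max((d for u, d in dist.items() if u != v), default=0)
    return ecc


# Hypothetical path graph: a - b - c - d - e
path_graph = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": ["c", "e"],
    "e": ["d"],
}

ecc = eccentricities(path_graph)
radius = min(ecc.values())
diameter = max(ecc.values())
peripheral = sorted(v for v, e in ecc.items() if e == diameter)

print(ecc)         # {'a': 4, 'b': 3, 'c': 2, 'd': 3, 'e': 4}
print(radius)      # 2
print(diameter)    # 4
print(peripheral)  # ['a', 'e']
```

In this example the middle vertex c has the smallest eccentricity and so determines the radius, while the two endpoints a and e are the peripheral vertices.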

SimRank − Similarity Based on Random Walk and Structural Context − In some applications, geodesic distance may be inappropriate for measuring the similarity between vertices in a graph. SimRank is a similarity measure based on random walks and on the structural context of the graph. In mathematics, a random walk is a trajectory that consists of taking successive random steps.

Two intuitive notions of similarity are as follows −

  • Two users are considered similar to one another if they have similar neighbors in the social network. This heuristic is intuitive because two people who receive recommendations from a large number of common friends often make similar decisions. This type of similarity depends on the local structure (i.e., the neighborhoods) of the vertices, and it is known as structural context–based similarity.

  • Suppose AllElectronics sends promotional information to both Ada and Bob in the social network. Ada and Bob may randomly forward this information to their friends (or neighbors) in the network. The closeness between Ada and Bob can then be measured by the likelihood that other users simultaneously receive the promotional information that was originally sent to Ada and Bob. This type of similarity depends on the random walk reachability over the network, and is therefore referred to as similarity based on random walk.
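
The text above does not give a formula for SimRank. As a hedged illustration, the sketch below uses the commonly cited iterative formulation, in which a vertex is fully similar to itself and the similarity of two distinct vertices is a decay constant times the average similarity of their neighbor pairs. The decay value 0.8, the iteration count, the function name simrank, and the example network are all assumptions; only Ada and Bob come from the text, and the other users are hypothetical.

```python
def simrank(neighbors, decay=0.8, iterations=10):
    """Iteratively estimate SimRank scores for all vertex pairs.

    neighbors maps each vertex to the list of vertices linking to it
    (for an undirected graph, simply its neighbors).
    decay and iterations are illustrative defaults, not prescribed values.
    """
    nodes = list(neighbors)
    # Start with s(a, a) = 1 and s(a, b) = 0 for a != b.
    sim = {a: {b: 1.0 if a == b else 0.0 for b in nodes} for a in nodes}
    for _ in range(iterations):
        new_sim = {a: {} for a in nodes}
        for a in nodes:
            for b in nodes:
                if a == b:
                    new_sim[a][b] = 1.0
                    continue
                na, nb = neighbors[a], neighbors[b]
                if not na or not nb:
                    # A vertex with no neighbors is similar only to itself.
                    new_sim[a][b] = 0.0
                    continue
                total = sum(sim[x][y] for x in na for y in nb)
                new_sim[a][b] = decay * total / (len(na) * len(nb))
        sim = new_sim
    return sim


# Hypothetical social network: Ada and Bob share the friends Carol and Dave,
# while Eve and Frank form a separate pair.
friends = {
    "Ada": ["Carol", "Dave"],
    "Bob": ["Carol", "Dave"],
    "Carol": ["Ada", "Bob"],
    "Dave": ["Ada", "Bob"],
    "Eve": ["Frank"],
    "Frank": ["Eve"],
}

scores = simrank(friends)
print(round(scores["Ada"]["Bob"], 3))
print(scores["Ada"]["Eve"])
```

In this example Ada and Bob receive a high score because they have exactly the same neighbors, while Ada and Eve score zero because no pair of their neighbors is ever similar. The nested loops make this sketch quadratic in the number of vertices per iteration, so it is only suitable for small graphs.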