What is a Distance Function?


Distance is the way MBR (memory-based reasoning) measures similarity. For any true distance metric, the distance from point A to point B, denoted d(A,B), has four properties, which are as follows −

  • Well-defined − The distance between two points is always defined and is a nonnegative real number: d(A,B) ≥ 0.

  • Identity − The distance from a point to itself is always zero: d(A,A) = 0.

  • Commutativity − Direction makes no difference, so the distance from A to B is the same as the distance from B to A: d(A,B) = d(B,A). This property rules out one-way roads, for instance.

  • Triangle Inequality − Visiting an intermediate point C on the way from A to B never shortens the distance, so d(A,B) ≤ d(A,C) + d(C,B).
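To make the four properties concrete, here is a minimal sketch that tests them for a given distance function on a finite sample of points. The helper name `check_metric` and the two sample distance functions are hypothetical illustrations, not part of any library. A single False is a real counterexample, while all True only means no violation was found among the sampled points.

```python
import itertools
import math


def check_metric(dist, points, tol=1e-9):
    """Test the four metric properties of `dist` on a finite sample of points.

    A False entry is a genuine counterexample; all True only means no
    violation was found among these points, not a proof.
    """
    ok = {"well_defined": True, "identity": True,
          "commutativity": True, "triangle_inequality": True}
    for a in points:
        # Identity: d(A, A) = 0
        if abs(dist(a, a)) > tol:
            ok["identity"] = False
    for a, b in itertools.product(points, repeat=2):
        d_ab = dist(a, b)
        # Well-defined: a finite, nonnegative real number
        if not (isinstance(d_ab, (int, float)) and math.isfinite(d_ab) and d_ab >= 0):
            ok["well_defined"] = False
        # Commutativity: d(A, B) = d(B, A)
        if abs(d_ab - dist(b, a)) > tol:
            ok["commutativity"] = False
    for a, b, c in itertools.product(points, repeat=3):
        # Triangle inequality: d(A, B) <= d(A, C) + d(C, B)
        if dist(a, b) > dist(a, c) + dist(c, b) + tol:
            ok["triangle_inequality"] = False
    return ok


def euclidean(a, b):
    return math.dist(a, b)


def squared_euclidean(a, b):
    # Not a true metric: squaring can break the triangle inequality.
    return math.dist(a, b) ** 2


print(check_metric(euclidean, [(0, 0), (3, 4), (1, 1), (-2, 5)]))  # every property True
print(check_metric(squared_euclidean, [(0, 0), (1, 0), (2, 0)]))   # only triangle_inequality False
```

Squared Euclidean distance fails because d((0,0),(2,0)) = 4, while going through (1,0) costs only 1 + 1 = 2.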

For MBR, the points are records in a database. This definition of distance is the basis for measuring similarity, but MBR still works pretty well when some of these constraints are relaxed a bit.

For example, the distance function in the news story classification case study was not commutative: the distance from news story A to news story B was not always the same as the distance from B to A. Even so, the similarity measure was useful for classification.
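As an illustration of a non-commutative measure that can still drive classification, the sketch below uses a hypothetical "coverage" distance between sets of story codes: the fraction of A's codes that B lacks. This is an assumption made for demonstration only; it is not the measure used in the case study, and the names `coverage_distance` and `nearest_label` are invented here.

```python
def coverage_distance(a, b):
    """Hypothetical one-way distance: the fraction of A's codes missing from B."""
    a, b = set(a), set(b)
    if not a:
        return 0.0
    return len(a - b) / len(a)


def nearest_label(query, labeled, dist):
    """Return the label of the stored record with the smallest dist(query, record)."""
    _, label = min(labeled, key=lambda item: dist(query, item[0]))
    return label


story_a = {"finance", "merger"}
story_b = {"finance", "merger", "banking", "europe"}
print(coverage_distance(story_a, story_b))  # 0.0
print(coverage_distance(story_b, story_a))  # 0.5

training = [
    ({"finance", "merger", "banking"}, "business"),
    ({"football", "league", "transfer"}, "sport"),
]
print(nearest_label({"merger", "finance"}, training, coverage_distance))  # business
```

Because the measure is one-way, the argument order matters; this sketch always measures from the query to each stored record, which is a design choice rather than a rule.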

The fact that distance is well-defined implies that every record has a neighbor somewhere in the database, and MBR needs neighbors in order to work. The identity property makes distance conform to the intuitive idea that the record most similar to a given record is the original record itself.

Commutativity and the triangle inequality make the nearest neighbors local and well-behaved. Inserting a new record into the database will not bring an existing record any closer; similarity is a matter reserved for only two records at a time. Although the distance measure used to find nearest neighbors is well-defined, the set of nearest neighbors can have some peculiar properties.
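The pairwise nature of distance can be seen in a brute-force nearest-neighbor search. The sketch below assumes numeric records compared with Euclidean distance (`math.dist`); the helper `k_nearest` is hypothetical. It shows that a stored record comes first in its own neighbor list (distance 0, ties with duplicates aside) and that adding a record leaves every existing pairwise distance unchanged, even though it can change which records are the nearest neighbors.

```python
import math


def k_nearest(query, records, k, dist=math.dist):
    """Return the k stored records closest to `query`, nearest first."""
    return sorted(records, key=lambda r: dist(query, r))[:k]


db = [(1.0, 2.0), (4.0, 4.0), (0.0, 0.0), (5.0, 1.0)]
a = db[0]

# Identity: a stored record is at distance 0 from itself, so it ranks first.
print(k_nearest(a, db, 2))  # [(1.0, 2.0), (0.0, 0.0)]

# Distance depends only on the pair, so adding a record cannot change
# how far apart two existing records are.
before = math.dist(a, db[1])
db.append((2.0, 3.0))
assert math.dist(a, db[1]) == before

# The new record can still displace an old one from the neighbor list.
print(k_nearest(a, db, 2))  # [(1.0, 2.0), (2.0, 3.0)]
```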

The set of neighbors depends on how the record distance function combines the field distance functions. In fact, the second-nearest neighbor under the summation function can be the furthest neighbor under the Euclidean function. Compared with the summation or normalized metric, the Euclidean metric tends to favor neighbors where all the fields are relatively close.
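A small example shows how the combining function can reorder neighbors. The per-field distances below are made-up values, assumed to be already scaled to [0, 1]. The "normalized" variant is assumed here to mean the summation divided by the largest summation distance among the candidates; the text above does not define it precisely.

```python
import math


def summation(field_dists):
    """Record distance as the plain sum of the field distances."""
    return sum(field_dists)


def euclidean(field_dists):
    """Record distance as the square root of the summed squared field distances."""
    return math.sqrt(sum(d * d for d in field_dists))


def rank(candidates, combine):
    """Order candidate names from nearest to furthest under `combine`."""
    return sorted(candidates, key=lambda name: combine(candidates[name]))


# Hypothetical per-field distances from one query record.
candidates = {
    "X": [0.5, 0.5],  # moderately close on both fields
    "Y": [0.9, 0.0],  # identical on one field, far on the other
}

print(rank(candidates, summation))  # ['Y', 'X']  (0.9 < 1.0)
print(rank(candidates, euclidean))  # ['X', 'Y']  (about 0.707 < 0.9)

# Assumed "normalized" variant: divide by the largest summation distance.
max_sum = max(summation(f) for f in candidates.values())
print(rank(candidates, lambda f: summation(f) / max_sum))  # ['Y', 'X']
```

Squaring penalizes a single large field distance more than several moderate ones, which is why the Euclidean metric prefers X, the candidate that is fairly close on every field.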

The summation, Euclidean, and normalized functions can also incorporate weights so that each field contributes a different amount to the record distance function. MBR generally produces good results when all the weights are equal to 1. However, weights can sometimes be used to incorporate a priori knowledge, such as a particular field suspected of having a large effect on the classification.
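Weights can be sketched as multipliers on the field distances. In this minimal example the three fields (gender, age, salary) and all the numbers are hypothetical. The weighted Euclidean form multiplies each squared field distance by its weight, which is one common convention among several.

```python
import math


def weighted_summation(field_dists, weights):
    """Sum of field distances, each scaled by its weight."""
    if len(field_dists) != len(weights):
        raise ValueError("need one weight per field")
    return sum(w * d for w, d in zip(weights, field_dists))


def weighted_euclidean(field_dists, weights):
    """Euclidean combination with each squared field distance scaled by its weight."""
    if len(field_dists) != len(weights):
        raise ValueError("need one weight per field")
    return math.sqrt(sum(w * d * d for w, d in zip(weights, field_dists)))


# Hypothetical field distances (gender, age, salary) from one query record.
candidates = {"X": [0.0, 0.1, 0.5], "Y": [0.0, 0.6, 0.1]}

equal = [1.0, 1.0, 1.0]         # the usual default: every field counts the same
salary_heavy = [1.0, 1.0, 3.0]  # prior belief that salary matters more

for weights in (equal, salary_heavy):
    nearest = min(candidates, key=lambda n: weighted_summation(candidates[n], weights))
    print(weights, nearest)
# [1.0, 1.0, 1.0] X
# [1.0, 1.0, 3.0] Y
```

With equal weights X is nearer (0.6 versus 0.7); tripling the salary weight makes Y nearer (0.9 versus 1.6), which is how prior knowledge about one field can change the neighbors MBR uses.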

Updated on 15-Feb-2022 06:46:37