# What are the applications of Similarity Measures?

Similarity measures provide the framework on which many data mining decisions are based. Tasks such as classification and clustering usually presuppose some similarity measure, while fields that lack good ways to evaluate similarity often find that searching for information is cumbersome.

Some applications of similarity measures are as follows −

**Information Retrieval** − The goal of information retrieval (IR) systems is to satisfy users' information needs. Such a need is usually expressed as a short textual query entered in the text box of an online search engine. IR systems generally do not answer a query directly; instead, they present a ranked list of records that some similarity measure judges relevant to that query.
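To make the "ranked list" idea concrete, here is a minimal sketch of ranking a few documents against a query. It assumes, for illustration only, that text is reduced to a set of lowercase whitespace-separated terms and that the standard Dice coefficient 2|A ∩ B| / (|A| + |B|) is the similarity measure; the helper names `terms`, `dice`, and `rank` are hypothetical.

```python
# Minimal sketch (assumptions: crude whitespace tokenizer, standard Dice
# coefficient as the similarity measure; helper names are hypothetical).


def terms(text: str) -> set[str]:
    """Tokenize into a set of lowercase terms (deliberately crude)."""
    return set(text.lower().split())


def dice(a: set[str], b: set[str]) -> float:
    """Standard Dice coefficient on two term sets; 0.0 if both are empty."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


def rank(query: str, docs: dict[str, str]) -> list[tuple[str, float]]:
    """Return (doc_id, score) pairs sorted by descending similarity to the query."""
    q = terms(query)
    scored = [(doc_id, dice(q, terms(text))) for doc_id, text in docs.items()]
    # Sort by score descending; ties are broken by doc_id for a deterministic order.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    docs = {
        "d1": "similarity measures for information retrieval",
        "d2": "clustering with similarity measures",
        "d3": "cooking pasta at home",
    }
    for doc_id, score in rank("similarity measures", docs):
        print(f"{doc_id}: {score:.3f}")
```

Run as a script, this prints `d2: 0.667`, `d1: 0.571`, `d3: 0.000`: the shorter document that contains both query terms outranks the longer one, and the unrelated document scores zero. A real engine would use weighted terms and an index rather than scoring every document.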

Because a similarity measure effectively clusters and classifies information with respect to a query, users commonly discover new interpretations of their information need, which may or may not help them when reformulating the query.

When the query is itself a record from the collection, similarity measures can be used to cluster and classify the records within that collection. In short, similarity measures can impose a rudimentary structure on a previously unstructured set.
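The idea of using a record from the collection as the query can be sketched as a simple single-pass, leader-style grouping. This is an illustration under stated assumptions, not a method from the text: records are term sets, the standard Dice coefficient is the measure, and the function name `leader_groups` and the 0.5 threshold are hypothetical choices.

```python
# Minimal sketch (assumptions: records are term sets, standard Dice is the
# measure; the leader-style grouping, names, and threshold are hypothetical).


def dice(a: set[str], b: set[str]) -> float:
    """Standard Dice coefficient 2|A ∩ B| / (|A| + |B|); 0.0 if both are empty."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


def leader_groups(records: dict[str, set[str]], threshold: float = 0.5) -> list[list[str]]:
    """Group records by repeatedly using an unassigned record as the query.

    Each pass takes the first unassigned record (in id order) as the query;
    every other unassigned record whose similarity to it is at least
    `threshold` joins its group. Records left over form later groups.
    """
    unassigned = sorted(records)  # deterministic order by record id
    groups: list[list[str]] = []
    while unassigned:
        leader = unassigned.pop(0)
        group, remaining = [leader], []
        for rid in unassigned:
            if dice(records[leader], records[rid]) >= threshold:
                group.append(rid)
            else:
                remaining.append(rid)
        unassigned = remaining
        groups.append(group)
    return groups


if __name__ == "__main__":
    records = {
        "r1": {"data", "mining", "clustering"},
        "r2": {"data", "mining", "classification"},
        "r3": {"pasta", "recipe"},
        "r4": {"pasta", "sauce", "recipe"},
    }
    print(leader_groups(records))  # [['r1', 'r2'], ['r3', 'r4']]
```

The resulting groups depend on which records become queries first and on the threshold, so this yields only the "rudimentary structure" described above rather than a full clustering algorithm.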

## Motivation

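Similarity measures used in IR systems can distort one's perception of the whole data set. For example, if a user types a query into a search engine and does not find a satisfactory answer in the top ten returned web pages, they will usually reformulate the query once or twice. Because the user sees the collection only through the top of the ranked list, records that the measure scores poorly are effectively invisible, even if they would have answered the query.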

## Classic Similarity Measures

A similarity measure is defined as a mapping from a pair of tuples of size k to a scalar number. By convention, similarity measures map to the range [-1, 1] or [0, 1], where a score of 1 denotes maximum similarity. A similarity measure should also have the property that its value increases as the number of properties shared by the two items being compared increases.
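The conventions above can be checked mechanically for a candidate measure. The sketch below assumes items are binary k-tuples (1 means the item has that property), uses the Dice coefficient as the measure under test, and exhaustively checks three things on small k: scores lie in [0, 1], identical tuples score 1, and switching on a property shared by both tuples never lowers the score. The names `dice_binary` and `satisfies_conventions` are hypothetical.

```python
# Minimal sketch (assumptions: binary k-tuples as items, Dice as the measure
# under test; function names are hypothetical).
from itertools import product
from typing import Callable

Measure = Callable[[tuple[int, ...], tuple[int, ...]], float]


def dice_binary(x: tuple[int, ...], y: tuple[int, ...]) -> float:
    """Dice on binary k-tuples: 2 * shared 1s / (1s in x + 1s in y)."""
    shared = sum(1 for a, b in zip(x, y) if a and b)
    total = sum(x) + sum(y)
    return 2 * shared / total if total else 0.0


def satisfies_conventions(sim: Measure, k: int) -> bool:
    """Exhaustively check `sim` on all nonzero binary k-tuples."""
    tuples = [t for t in product((0, 1), repeat=k) if any(t)]
    for x in tuples:
        if sim(x, x) != 1.0:  # identical items must score the maximum
            return False
        for y in tuples:
            s = sim(x, y)
            if not 0.0 <= s <= 1.0:  # score must stay in [0, 1]
                return False
            for i in range(k):
                if x[i] == 0 and y[i] == 0:
                    # Give both items one more shared property.
                    x2 = x[:i] + (1,) + x[i + 1:]
                    y2 = y[:i] + (1,) + y[i + 1:]
                    if sim(x2, y2) < s:  # more shared properties must not lower it
                        return False
    return True


if __name__ == "__main__":
    print(satisfies_conventions(dice_binary, 3))
    # A distance (1 - similarity) violates the "identical items score 1" rule.
    print(satisfies_conventions(lambda x, y: 1 - dice_binary(x, y), 3))
```

Dice passes the check; turning it into a distance with `1 - dice_binary` fails immediately, because identical items then score 0 instead of 1.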

## Dice

The Dice coefficient is a generalization of the harmonic mean of the precision and recall measures. In theory, a system with a high harmonic mean should be closer to an ideal retrieval system, because it can achieve high precision at high levels of recall. The harmonic mean of precision P and recall R is given by

$$E=\frac{2}{\frac{1}{P}+\frac{1}{R}}$$

while the Dice coefficient is given by

$$sim(q,d_{j})=D(A,B)=\frac{|A\cap B|}{\alpha|A|+(1-\alpha)|B|}\cong \frac{\sum_{k=1}^{n}w_{kq}w_{kj}}{\alpha \sum_{k=1}^{n}w_{kq}^{2}+(1-\alpha)\sum_{k=1}^{n}w_{kj}^{2}}$$

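with α ∈ [0, 1]. It can be shown that the Dice coefficient is a weighted harmonic mean of precision and recall (taking A as the retrieved set and B as the relevant set, so that P = |A ∩ B|/|A| and R = |A ∩ B|/|B|); setting α = ½ recovers the harmonic mean E above, 2|A ∩ B|/(|A| + |B|).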
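The relationship between E, the weighted Dice coefficient, and the weighted harmonic mean of precision and recall can be checked numerically. This sketch assumes A plays the role of the retrieved set and B the relevant set, and it treats E (and the weighted mean) as 0 when P or R is 0, which is an assumed convention rather than part of the text. The vector form is evaluated with binary indicator weights, where it should agree with the set form.

```python
# Sketch: Dice in set and vector form versus the (weighted) harmonic mean.
# Assumptions: A = retrieved set, B = relevant set; zero-handling convention is ours.


def harmonic_mean(p: float, r: float) -> float:
    """E = 2 / (1/P + 1/R); taken as 0.0 when P or R is 0 (assumed convention)."""
    if p == 0 or r == 0:
        return 0.0
    return 2 / (1 / p + 1 / r)


def weighted_harmonic_mean(p: float, r: float, alpha: float) -> float:
    """1 / (alpha/P + (1 - alpha)/R), with the same zero convention."""
    if p == 0 or r == 0:
        return 0.0
    return 1 / (alpha / p + (1 - alpha) / r)


def dice_sets(a: set, b: set, alpha: float = 0.5) -> float:
    """D(A, B) = |A ∩ B| / (alpha*|A| + (1 - alpha)*|B|), alpha in [0, 1]."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    denom = alpha * len(a) + (1 - alpha) * len(b)
    return len(a & b) / denom if denom else 0.0


def dice_vectors(wq: list[float], wj: list[float], alpha: float = 0.5) -> float:
    """Vector form: sum_k wq*wj / (alpha*sum_k wq^2 + (1 - alpha)*sum_k wj^2)."""
    if len(wq) != len(wj):
        raise ValueError("weight vectors must have the same length")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    num = sum(x * y for x, y in zip(wq, wj))
    denom = alpha * sum(x * x for x in wq) + (1 - alpha) * sum(y * y for y in wj)
    return num / denom if denom else 0.0


if __name__ == "__main__":
    retrieved = {"d1", "d2", "d3", "d4"}  # A
    relevant = {"d1", "d2", "d5"}         # B
    hits = len(retrieved & relevant)
    p, r = hits / len(retrieved), hits / len(relevant)  # P = 0.5, R = 2/3

    # alpha = 1/2: Dice and E both equal 2|A ∩ B| / (|A| + |B|) = 4/7.
    print(f"{dice_sets(retrieved, relevant):.6f} {harmonic_mean(p, r):.6f}")

    # Any alpha: Dice equals the weighted harmonic mean of P and R.
    print(f"{dice_sets(retrieved, relevant, 0.25):.6f} "
          f"{weighted_harmonic_mean(p, r, 0.25):.6f}")

    # Binary indicator weights over the universe d1..d5: vector form matches set form.
    wq = [1, 1, 1, 1, 0]
    wj = [1, 1, 0, 0, 1]
    print(f"{dice_vectors(wq, wj):.6f} {dice_sets(retrieved, relevant):.6f}")
```

Each printed line shows two equal values (up to floating-point rounding), which is the weighted-harmonic-mean identity in action; with real-valued term weights the vector form is only an analogue of the set form, hence the ≅ in the formula.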

## Overlap

The Overlap coefficient measures the degree to which two sets overlap. It is computed as

$$sim(q,d_{j})=O(A,B)=\frac{|A\cap B|}{\min(|A|,|B|)}\cong \frac{\sum_{k=1}^{n}w_{kq}w_{kj}}{\min\left(\sum_{k=1}^{n}w_{kq}^{2},\sum_{k=1}^{n}w_{kj}^{2}\right)}$$

A related variant is obtained by using the max operator in place of the min; since max(|A|, |B|) ≥ min(|A|, |B|), this variant never scores higher than the Overlap coefficient.
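A small sketch contrasts the Overlap coefficient, its max-based variant, and the vector form. It assumes sets of terms and binary indicator weights for the vector form; the function names are hypothetical, and returning 0.0 for empty inputs is an assumed convention.

```python
# Minimal sketch (assumptions: term sets, binary indicator weights for the
# vector form, 0.0 for empty inputs; function names are hypothetical).


def overlap_sets(a: set, b: set) -> float:
    """Overlap coefficient |A ∩ B| / min(|A|, |B|); 0.0 if either set is empty."""
    smaller = min(len(a), len(b))
    return len(a & b) / smaller if smaller else 0.0


def max_variant_sets(a: set, b: set) -> float:
    """Variant with max in place of min: |A ∩ B| / max(|A|, |B|)."""
    larger = max(len(a), len(b))
    return len(a & b) / larger if larger else 0.0


def overlap_vectors(wq: list[float], wj: list[float]) -> float:
    """Vector form: sum_k wq*wj / min(sum_k wq^2, sum_k wj^2)."""
    if len(wq) != len(wj):
        raise ValueError("weight vectors must have the same length")
    num = sum(x * y for x, y in zip(wq, wj))
    denom = min(sum(x * x for x in wq), sum(y * y for y in wj))
    return num / denom if denom else 0.0


if __name__ == "__main__":
    a = {"data", "mining"}
    b = {"data", "mining", "text", "web"}
    print(overlap_sets(a, b))      # 1.0: A is contained in B
    print(max_variant_sets(a, b))  # 0.5
    print(overlap_vectors([1, 1, 0, 0], [1, 1, 1, 1]))  # 1.0
```

The example shows the main behavioral difference: the Overlap coefficient reaches 1 whenever one set is contained in the other, regardless of how much larger the other set is, while the max-based variant penalizes the size difference.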
