What are Outliers?

An outlier is a data object that diverge essentially from the rest of the objects, as if it were produced by a several mechanism. For ease of presentation, it can define data objects that are not outliers as “normal” or expected information. Usually, it can define outliers as “abnormal” data.

Outliers are data components that cannot be combined in a given class or cluster. These are the data objects which have several behaviour from the usual behaviour of different data objects. The analysis of this kind of data can be important to mine the knowledge.

Outliers are different from noisy information. Noise is a random bug or variance in a computed variable. In general, noise is not fascinating in data analysis, such as outlier detection.

For instance, in credit card fraud detection, a users purchase behavior can be modeled as a random variable. A user can make some “noise transactions” that can view like “random errors” or “variance,” including by buying a larger lunch one day, or receiving one more cup of coffee than usual.

Such transactions should not be considered as outliers; therefore, the credit card company can incur large costs from verifying that some transactions. The company can also lose users by bothering them with several false alarms. As several data analysis and data mining services, noise must be eliminated before outlier detection.

Some real-world databases contains outliers or missing, anonymous, or erroneous data. Some clustering algorithms are intense on such data and can start to clusters of poor quality.

Outliers are fascinating because they are suspected of not being created by the same structure as the rest of the data. Hence, in outlier detection, it is essential to justify why the outliers identified are produced by several mechanisms.

This is achieved by creating various assumptions on the rest of the information and displaying that the outliers detected violate those assumptions essentially. Outlier detection is also associated to novelty detection in including data sets. For instance, by observing a social media website where new content is approaching, novelty detection can identify new subjects and trends in a timely manner.

Novel topics can originally appear as outliers. Outlier detection and novelty detection share some similarity in modeling and detection approaches. But a critical difference among the two is that in novelty detection, once new subjects are confirmed, they are generally integrated into the model of general behavior so that follow-up instances are not considered as outliers anymore.