What are the causes of anomalies?

In anomaly detection, the objective is to find objects that are different from most other objects. Anomalous objects are often called outliers because, on a scatter plot of the data, they lie far away from the other data points. Anomaly detection is also known as deviation detection, because anomalous objects have attribute values that deviate significantly from the expected or typical attribute values, or as exception mining, because anomalies are exceptional in some sense.

In the natural world, human society, or the domain of data sets, most events and objects are, by definition, commonplace or ordinary. However, we are keenly aware of the possibility of objects that are unusual or extraordinary. Examples include exceptionally dry or rainy seasons, famous athletes, or an attribute value that is much smaller or larger than all others.

Common causes of anomalies include the following −

Data from Different Classes − An object may be different from other objects, that is, anomalous, because it is of a different type or class. For example, someone committing credit card fraud belongs to a different class of credit card users than people who use credit cards legitimately.

Examples such as fraud, intrusion, outbreaks of disease, and abnormal test results are anomalies that represent a different class of objects. Such anomalies are often of considerable interest and are the focus of anomaly detection in the field of data mining.

Natural Variation − Many data sets can be modeled by statistical distributions, such as a normal (Gaussian) distribution, where the probability of a data object decreases rapidly as the distance of the object from the center of the distribution increases.

In other words, most objects are near a center (an average object), and the likelihood that an object differs significantly from this average object is small. For instance, an exceptionally tall person is not anomalous in the sense of being from a separate class of objects, but only in the sense of having an extreme value for a characteristic (height) possessed by all objects. Anomalies that represent extreme or unlikely variations are often interesting.
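
To make the natural-variation case concrete, here is a minimal sketch, assuming heights follow a normal distribution with a hypothetical mean of 170 cm and standard deviation of 10 cm (illustrative values, not taken from any data set). It uses Python's standard `statistics.NormalDist` to show how quickly the two-sided tail probability shrinks as a value moves away from the center. The names `z_score` and `tail_probability` are made up for this example.

```python
from statistics import NormalDist

# Hypothetical parameters, chosen for illustration only.
MEAN_CM = 170.0
STD_CM = 10.0

heights = NormalDist(mu=MEAN_CM, sigma=STD_CM)


def z_score(value: float) -> float:
    """Number of standard deviations between value and the mean."""
    return (value - MEAN_CM) / STD_CM


def tail_probability(value: float) -> float:
    """Probability of a value at least this far from the mean, in either direction."""
    distance = abs(value - MEAN_CM)
    return 2.0 * (1.0 - heights.cdf(MEAN_CM + distance))


for h in (175.0, 190.0, 210.0):
    print(f"height={h:.0f} cm  z={z_score(h):+.1f}  "
          f"P(at least this extreme)={tail_probability(h):.2e}")
```

Under these assumed parameters, a height of 210 cm still belongs to the same class of objects as every other height; it is simply a value the model considers very unlikely.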

Data Measurement and Collection Errors − Errors in the data collection or measurement process are another source of anomalies. For instance, a measurement may be recorded incorrectly because of human error, a problem with the measuring device, or the presence of noise.

The goal is to eliminate such anomalies, because they provide no interesting information and only reduce the quality of the data and of the subsequent data analysis. Indeed, the removal of this type of anomaly is the focus of data preprocessing, specifically data cleaning.
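
One simple form of data cleaning can be sketched as a plausibility check. The example below is only an illustration under stated assumptions: the limits `MIN_HEIGHT_CM` and `MAX_HEIGHT_CM`, the helper `split_by_plausibility`, and the readings are all hypothetical, and real cleaning rules depend on the domain and the measuring device.

```python
import math

# Hypothetical plausibility limits for a human height in centimeters.
MIN_HEIGHT_CM = 30.0
MAX_HEIGHT_CM = 280.0


def split_by_plausibility(values):
    """Return (kept, rejected): values inside the plausible range vs. likely errors."""
    kept, rejected = [], []
    for v in values:
        is_number = isinstance(v, (int, float)) and not math.isnan(v)
        if is_number and MIN_HEIGHT_CM <= v <= MAX_HEIGHT_CM:
            kept.append(v)
        else:
            rejected.append(v)
    return kept, rejected


# Made-up readings: 1720.0 looks like a misplaced decimal point,
# -5.0 like a device fault, and NaN like a missing measurement.
readings = [172.0, 1720.0, 165.5, -5.0, float("nan"), 180.2]
kept, rejected = split_by_plausibility(readings)
print("kept:", kept)
print("rejected:", rejected)
```

A range check like this only catches values that are impossible in the domain. An unusually tall but plausible value such as 210 cm would be kept, since it reflects natural variation rather than a measurement error.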