Data Mining: Data Attributes and Quality

Data Mining

Data mining is the process of extracting useful information from large datasets so that it can be analysed for the benefit of the organisation. It helps identify patterns and relationships in the data that can be used to predict and solve business problems.

Data attributes

An attribute is a characteristic or property of an object. An object is described by a set of attributes and is also referred to as a record or an entity; each attribute captures one piece of the data that describes it.

For example, in a Student database, Name, id, Roll_no and Marks are the attributes.
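As a rough illustration, a single Student record can be modelled in Python as an object whose fields are the attributes named above. The class name `Student` and the sample values are hypothetical, chosen only for this sketch.

```python
from dataclasses import dataclass, fields

# Hypothetical model of one record (object) in a Student database.
# Each field is one attribute describing the student.
@dataclass
class Student:
    name: str
    id: int
    roll_no: str
    marks: float

# One object, described by its set of attribute values.
s = Student(name="Asha", id=101, roll_no="CS-17", marks=86.5)

# The attribute names shared by every Student object.
attribute_names = [f.name for f in fields(Student)]
print(attribute_names)  # ['name', 'id', 'roll_no', 'marks']
```

Every Student object has the same attributes; only the values differ from one record to the next.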

Types of Attributes

Nominal Attribute

The values of a nominal attribute only provide enough information to distinguish one object from another; they have no meaningful order. Examples include name, roll number and address.
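Because nominal values carry no order, the only meaningful operations are equality tests and counting. The sketch below uses made-up city values as a hypothetical nominal attribute and finds the distinct values and the mode (the most frequent value).

```python
from collections import Counter

# Hypothetical nominal attribute: the city part of each student's address.
cities = ["Delhi", "Mumbai", "Delhi", "Pune", "Delhi", "Mumbai"]

# Equality is meaningful: we can tell objects apart or group identical values.
distinct_values = sorted(set(cities))

# Counting is meaningful, so the mode (most frequent value) is well defined.
mode_value, mode_count = Counter(cities).most_common(1)[0]

# Ordering is NOT meaningful: sorting alphabetically is only a display choice,
# and "Delhi" < "Pune" says nothing about the cities themselves.
print(distinct_values)          # ['Delhi', 'Mumbai', 'Pune']
print(mode_value, mode_count)   # Delhi 3
```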

Ordinal Attribute

An ordinal attribute is one whose possible values have a meaningful order, although the differences between successive values are not necessarily known. Examples include salary_range, education_level and ranking.
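For an ordinal attribute the order of the categories has to be stated explicitly, because it cannot be inferred from the labels themselves. A minimal sketch, assuming a hypothetical education_level scale, maps each level to its rank so that comparisons and sorting work. It uses the median rather than the mean, since the gaps between ranks are not known to be equal.

```python
# Hypothetical ordinal scale for education_level, from lowest to highest.
EDUCATION_ORDER = ["high_school", "bachelor", "master", "phd"]
RANK = {level: i for i, level in enumerate(EDUCATION_ORDER)}

levels = ["master", "high_school", "phd", "bachelor", "master"]

# Sorting by rank gives a meaningful order of objects.
sorted_levels = sorted(levels, key=RANK.__getitem__)

# Comparisons are meaningful ("master" is above "bachelor")...
is_higher = RANK["master"] > RANK["bachelor"]

# ...but differences between ranks are not, so use the median, not the mean.
median_level = sorted_levels[len(sorted_levels) // 2]

print(sorted_levels)  # ['high_school', 'bachelor', 'master', 'master', 'phd']
print(median_level)   # master
```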

Binary Attribute

A binary attribute has only two possible values, 0 and 1: 0 represents the absence of a feature and 1 represents its presence.
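A minimal sketch of encoding a binary attribute, assuming a hypothetical yes/no field recording whether a student holds a scholarship. Each answer is mapped to 1 (feature present) or 0 (feature absent), after which counting the objects that have the feature is a simple sum.

```python
# Hypothetical raw answers for a binary attribute "has_scholarship".
answers = ["yes", "no", "no", "yes", "no"]

# 1 = the feature is present, 0 = the feature is absent.
has_scholarship = [1 if a == "yes" else 0 for a in answers]

# With 0/1 coding, the number of objects having the feature is just the sum.
count_present = sum(has_scholarship)

print(has_scholarship)  # [1, 0, 0, 1, 0]
print(count_present)    # 2
```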

Numeric attribute

A numeric attribute is quantitative in nature, i.e. it is a measurable quantity represented as an integer or real value.

It is of two types −

Interval Scaled attribute −

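Interval-scaled attributes are measured on a scale of equal-size units, so differences between values are meaningful. However, there is no true zero point, so ratios between values are not. Examples are temperatures in Celsius or Fahrenheit.

Ratio Scaled attribute −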

Ratio-scaled attributes have a true zero point, so both differences and ratios between values are meaningful. Examples are age, weight and salary.
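The difference between the two numeric types shows up when the unit of measurement changes. In the sketch below (the values are hypothetical), the ratio of two temperatures changes when converting from Celsius to Fahrenheit, while the ratio of two weights stays the same when converting from kilograms to pounds.

```python
def celsius_to_fahrenheit(c: float) -> float:
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return c * 9 / 5 + 32

def kg_to_lb(kg: float) -> float:
    """Convert a weight from kilograms to pounds (1 kg is about 2.20462 lb)."""
    return kg * 2.20462

# Interval-scaled: temperature. 0 degrees C does not mean "no temperature".
t_low, t_high = 10.0, 20.0
ratio_c = t_high / t_low                                                # 2.0
ratio_f = celsius_to_fahrenheit(t_high) / celsius_to_fahrenheit(t_low)  # 68 / 50
# The ratio depends on the unit, so "twice as hot" is not meaningful.
# Differences stay consistent: 10 degrees C apart is always 18 degrees F apart.
diff_f = celsius_to_fahrenheit(t_high) - celsius_to_fahrenheit(t_low)   # 18.0

# Ratio-scaled: weight. 0 kg really does mean "no weight".
w_light, w_heavy = 40.0, 80.0
ratio_kg = w_heavy / w_light                                            # 2.0
ratio_lb = kg_to_lb(w_heavy) / kg_to_lb(w_light)                        # 2.0
# The ratio survives the change of unit, so "twice as heavy" is meaningful.

print(f"Temperature ratio: {ratio_c} in C vs {ratio_f} in F")
print(f"Weight ratio: {ratio_kg} in kg vs {ratio_lb} in lb")
```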

Data quality

Data quality refers to how well data fits the purpose of providing the specific information an organization requires, and to the techniques applied to make it fit. Data that meets these needs is considered high-quality data and is well suited to decision making in the organization. Six main factors determine the quality of data −

Accuracy

The data must reflect the real-world situation it describes. Data can be inaccurate for many reasons, e.g. human or computer errors.
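One common way to catch some inaccurate values is to check each record against domain rules. The sketch below assumes, hypothetically, that Marks must lie between 0 and 100 and that every roll number must be non-empty; records that break a rule are flagged for review. Rules like these only detect impossible values, not values that are plausible but wrong.

```python
# Hypothetical records; the validation rules below are illustrative assumptions.
records = [
    {"id": 1, "roll_no": "CS-01", "marks": 78},
    {"id": 2, "roll_no": "CS-02", "marks": 105},   # impossible mark
    {"id": 3, "roll_no": "", "marks": 64},         # missing roll number
]

def accuracy_issues(record: dict) -> list:
    """Return the list of rule violations for one record (empty if none)."""
    issues = []
    if not 0 <= record["marks"] <= 100:
        issues.append("marks out of range 0-100")
    if not record["roll_no"]:
        issues.append("empty roll_no")
    return issues

# Collect the violations per record id, keeping only records with problems.
flagged = {}
for r in records:
    problems = accuracy_issues(r)
    if problems:
        flagged[r["id"]] = problems

print(flagged)  # {2: ['marks out of range 0-100'], 3: ['empty roll_no']}
```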

Completeness

Completeness means that all the required data is available. Data may be incomplete when values for the attributes of interest are missing or were never recorded.
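Completeness can be measured per attribute as the share of records that actually have a value. A minimal sketch, assuming missing values are stored as `None` in a list of hypothetical records:

```python
# Hypothetical student records; None marks a value that was never recorded.
records = [
    {"name": "Asha",  "roll_no": "CS-01", "marks": 78},
    {"name": "Ravi",  "roll_no": None,    "marks": 64},
    {"name": "Meena", "roll_no": "CS-03", "marks": None},
    {"name": "Kiran", "roll_no": "CS-04", "marks": None},
]

def completeness(records: list, attribute: str) -> float:
    """Fraction of records with a non-missing value for the given attribute."""
    present = sum(1 for r in records if r.get(attribute) is not None)
    return present / len(records)

scores = {attr: completeness(records, attr) for attr in ("name", "roll_no", "marks")}
print(scores)  # {'name': 1.0, 'roll_no': 0.75, 'marks': 0.5}
```

A low score points to the attributes whose missing values most need attention before analysis.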

Consistency

Consistency refers to the uniformity of data used across systems and networks. The same data stored in different locations should not conflict. Incorrect data can also lead to inconsistency.
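A simple consistency check compares copies of the same data held in two places. The sketch below assumes two hypothetical stores (for instance an admissions system and an exams system) keyed by student id, and reports every attribute on which they disagree.

```python
# Hypothetical copies of the same student data held in two systems.
admissions = {
    101: {"name": "Asha", "roll_no": "CS-01"},
    102: {"name": "Ravi", "roll_no": "CS-02"},
}
exams = {
    101: {"name": "Asha", "roll_no": "CS-01"},
    102: {"name": "Ravi", "roll_no": "CS-20"},   # conflicting roll number
}

def find_conflicts(a: dict, b: dict) -> list:
    """Return (id, attribute, value_in_a, value_in_b) for each disagreement."""
    conflicts = []
    for key in sorted(a.keys() & b.keys()):              # ids in both stores
        for attr in sorted(a[key].keys() & b[key].keys()):
            if a[key][attr] != b[key][attr]:
                conflicts.append((key, attr, a[key][attr], b[key][attr]))
    return conflicts

conflicts = find_conflicts(admissions, exams)
print(conflicts)  # [(102, 'roll_no', 'CS-02', 'CS-20')]
```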

Timeliness

Timeliness means the data is available when it is needed. Ideally, data is updated in real time so that current values are always accessible. Data quality suffers when records are not updated, or when corrections and adjustments made by users are delayed.
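Timeliness is often checked by comparing each record's last-update time with how fresh the data needs to be. A minimal sketch, assuming a hypothetical rule that data older than 24 hours is stale, and using a fixed reference time so the result is reproducible:

```python
from datetime import datetime, timedelta

# Hypothetical freshness requirement for this data set.
MAX_AGE = timedelta(hours=24)

# Fixed "current" time so the example is reproducible.
now = datetime(2023, 8, 22, 12, 0)

# Hypothetical last-update timestamps per record.
last_updated = {
    "CS-01": datetime(2023, 8, 22, 9, 30),   # 2.5 hours old
    "CS-02": datetime(2023, 8, 20, 18, 0),   # 42 hours old
}

# A record is stale if it was last updated longer ago than MAX_AGE.
stale = [key for key, ts in last_updated.items() if now - ts > MAX_AGE]
print(stale)  # ['CS-02']
```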

Believability

Believability refers to how much users trust the data. Data is believable when users accept it as accurate and correct enough to base future analysis on.

Interpretability

Interpretability refers to how easily users can understand the data. Data exists to support tasks such as analysis, and those tasks can only be performed successfully if users can interpret the data they are given.

Conclusion

This article covered data attributes and data quality in data mining.

Data attributes are properties of objects, and their types are nominal, ordinal, binary and numeric. Nominal attributes distinguish objects from one another, ordinal attributes give objects a meaningful order, binary attributes take the values 0 and 1 to indicate the absence or presence of a feature, and numeric attributes are quantitative. Data quality refers to the quality of the data an organisation uses for decision making. The factors that determine it are accuracy, completeness, consistency, timeliness, believability and interpretability.

Amrendra Patel

Updated on: 22-Aug-2023
