Data Mining: Data Attributes and Quality
Data mining is the process of extracting, from a huge dataset, data that can be analysed for the benefit of the organisation. It helps identify patterns and relationships in the data, which can then be used to anticipate business problems.
An attribute is a characteristic or property of an object. An object, also called a record or entity, is described by a set of attributes, and each attribute captures one piece of data about that entity.
For example, in a student database, Name, id, Roll_no and Marks are the attributes.
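As a minimal sketch, the student record above could be modelled as one object whose fields are its attributes. The `Student` class and the sample values are illustrative assumptions, not taken from any real database.

```python
from dataclasses import dataclass, fields


@dataclass
class Student:
    # Each field is one attribute of the Student entity.
    name: str
    id: int
    roll_no: int
    marks: float


# One object (record) described by its set of attributes; the values are made up.
record = Student(name="Asha", id=101, roll_no=7, marks=88.5)

# The attribute set that describes every Student object.
attribute_names = [f.name for f in fields(Student)]
print(attribute_names)
```

Every `Student` object shares the same attribute set; only the attribute values differ from record to record.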
Types of Attributes
Nominal Attribute
A nominal attribute provides only enough information to tell one object from another. Its values are names or labels with no meaningful order. Examples include name, roll number and address.
Ordinal Attribute
An ordinal attribute is one whose possible values provide enough information to place objects in a meaningful order. Examples include salary_range, education_level and ranking.
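The difference between labels that can only be compared for equality and labels that carry an order can be sketched as follows. The level names in `EDUCATION_ORDER` and the helper `sort_by_education` are hypothetical, chosen only for illustration.

```python
# Hypothetical ordered levels for an ordinal attribute, lowest to highest.
EDUCATION_ORDER = ["primary", "secondary", "graduate", "postgraduate"]
RANK = {level: position for position, level in enumerate(EDUCATION_ORDER)}


def sort_by_education(levels):
    """Sort ordinal values by their rank, not alphabetically."""
    return sorted(levels, key=RANK.__getitem__)


sample = ["graduate", "primary", "postgraduate", "secondary"]
ordered = sort_by_education(sample)

# Values that are only labels support equality checks and nothing more.
same_address = "Pune" == "Pune"
```

Sorting the same values alphabetically would give a different result, which is why an ordinal attribute needs its order stated explicitly.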
Binary Attribute
A binary attribute takes only the values 0 and 1. 0 represents the absence of a feature and 1 represents its presence.
Numeric Attribute
A numeric attribute is quantitative in nature, i.e. the quantity can be measured and is represented as an integer or real value.
It is of two types −
Interval Scaled attribute −
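An interval-scaled attribute is measured on a scale of equal-sized units. This lets us compare and quantify the difference between two values, but because the scale has no true zero point, ratios between values are not meaningful. Examples include temperature in Celsius or Fahrenheit.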
Ratio Scaled attribute −
For a ratio-scaled attribute, both differences and ratios are meaningful, because the scale has a true zero point. Examples include age, weight and salary.
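A short worked example may make the interval versus ratio distinction concrete. The sketch below assumes two made-up temperatures and two made-up weights. The helper names are illustrative, and the pound conversion factor is approximate.

```python
def celsius_to_fahrenheit(c):
    return c * 9 / 5 + 32


def kg_to_pounds(kg):
    return kg * 2.20462  # approximate conversion factor


# Interval scale: the ratio of two temperatures depends on the unit chosen.
ratio_c = 20 / 10
ratio_f = celsius_to_fahrenheit(20) / celsius_to_fahrenheit(10)

# Differences stay comparable: equal Celsius gaps map to equal Fahrenheit gaps.
gap_low = celsius_to_fahrenheit(20) - celsius_to_fahrenheit(10)
gap_high = celsius_to_fahrenheit(30) - celsius_to_fahrenheit(20)

# Ratio scale: the ratio of two weights is the same in any unit.
ratio_kg = 80 / 40
ratio_lb = kg_to_pounds(80) / kg_to_pounds(40)
```

Here 20 degrees is twice 10 degrees only numerically in Celsius. In Fahrenheit the same two temperatures give 68 / 50, so "twice as hot" is not a meaningful statement, whereas 80 kg is twice 40 kg in any unit.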
Data Quality
Data quality refers to the techniques applied to make the data fit to provide the specific information required by the organization. Data that meets those needs is considered high-quality data and supports sound decision making in the organization. Six main factors determine the quality of data −
Accuracy
The data must reflect the real-world scenario. Data may be inaccurate for many reasons, e.g. human or computer errors.
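One simple way to catch some inaccurate values is a range check. The sketch below assumes marks are valid only between 0 and 100. That range, the `invalid_marks` helper and the sample records are assumptions made for illustration.

```python
def invalid_marks(records, low=0, high=100):
    """Return the records whose marks fall outside the assumed valid range."""
    return [r for r in records if not (low <= r["marks"] <= high)]


students = [
    {"name": "Asha", "marks": 88},
    {"name": "Ravi", "marks": 880},  # likely a typing error
    {"name": "Meera", "marks": -5},  # impossible value
]
suspect = invalid_marks(students)
```

A range check only flags values that cannot be true. A wrong value inside the valid range still has to be verified against its source.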
Completeness
Completeness means that all the required data is available and can be delivered. Data may be incomplete when values for the attributes of interest are missing.
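Completeness can be estimated as the share of records that have a value for each attribute of interest. This is a minimal sketch. The `completeness` function and the sample records are hypothetical, and a missing value is assumed to be either an absent key or `None`.

```python
def completeness(records, attributes):
    """Return the fraction of non-missing values for each attribute."""
    total = len(records)
    result = {}
    for attr in attributes:
        present = sum(1 for r in records if r.get(attr) is not None)
        result[attr] = present / total if total else 0.0
    return result


students = [
    {"name": "Asha", "roll_no": 1, "marks": 88},
    {"name": "Ravi", "roll_no": 2, "marks": None},
    {"name": "Meera", "roll_no": None, "marks": 75},
    {"name": "Kiran", "roll_no": 4},  # marks never recorded
]
scores = completeness(students, ["name", "roll_no", "marks"])
```

An attribute with a low score is a candidate for follow-up before it is used in analysis.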
Consistency
It refers to the uniformity of data used across networks. The same data stored in different locations should not conflict. Incorrect data can also result in inconsistency.
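A conflict between copies of the same record can be detected by comparing them attribute by attribute. The sketch below assumes two hypothetical stores keyed by student id. The `find_conflicts` helper and the data are illustrative only.

```python
def find_conflicts(source_a, source_b):
    """Return {key: {attribute: (value_a, value_b)}} for records that disagree."""
    conflicts = {}
    for key in source_a.keys() & source_b.keys():
        a, b = source_a[key], source_b[key]
        # Compare only the attributes present in both copies.
        diff = {
            attr: (a[attr], b[attr])
            for attr in a.keys() & b.keys()
            if a[attr] != b[attr]
        }
        if diff:
            conflicts[key] = diff
    return conflicts


campus_db = {
    101: {"name": "Asha", "marks": 88},
    102: {"name": "Ravi", "marks": 70},
}
central_db = {
    101: {"name": "Asha", "marks": 88},
    102: {"name": "Ravi", "marks": 72},
    103: {"name": "Meera", "marks": 75},
}
conflicts = find_conflicts(campus_db, central_db)
```

Records that exist in only one store are not reported here. Depending on the use case, that may be treated as a completeness problem instead.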
Timeliness
The data must be available at the time of need and updated promptly so that it stays current. Quality suffers when the user does not update the data or delays corrections and adjustments.
Believability
It refers to how much the user trusts the data. Believable data is accepted as accurate and correct for use in future analysis.
Interpretability
It refers to how easily the user can understand the data. Data exists to support tasks such as analysis, and the user can carry out those tasks successfully only if the data is interpretable.
This article covered the attributes and quality of data in data mining.
A data attribute is a property of an object, and attributes are classified as Nominal, Ordinal, Binary or Numeric. Nominal attributes tell objects apart, Ordinal attributes give objects a meaningful order, Binary attributes take the values 0 and 1 for the absence and presence of a feature respectively, and Numeric attributes are quantitative in nature. Data quality refers to the quality of the data an organisation uses for decision making. The factors used are Accuracy, Completeness, Consistency, Timeliness, Believability and Interpretability.