
# What is Entropy-Based Discretization?

Entropy-based discretization is a supervised, top-down splitting technique. It uses class distribution information in its calculation and determination of split-points (data values for partitioning an attribute range). To discretize a numeric attribute, A, the method selects the value of A that has the minimum entropy as a split-point, and recursively partitions the resulting intervals to arrive at a hierarchical discretization.

Such a discretization forms a concept hierarchy for A. Let D consist of data tuples described by a set of attributes and a class-label attribute. The class-label attribute provides the class information for each tuple. The basic method for the entropy-based discretization of an attribute A within the set is as follows:

Each value of A can be considered as a potential interval boundary or split-point (denoted split_point) to partition the range of A. That is, a split-point for A can partition the tuples in D into two subsets satisfying the conditions A ≤ split_point and A > split_point, respectively, thereby creating a binary discretization.
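The binary split described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names `candidate_split_points` and `binary_partition`, the `(value, label)` tuple layout, and the sample data are all assumptions made for the example. The largest value of A is left out of the candidates because splitting there would leave the second subset empty.

```python
def candidate_split_points(tuples):
    """Distinct values of A, sorted, excluding the largest one.

    Assumption: each tuple is (value_of_A, class_label).
    """
    values = sorted({value for value, _ in tuples})
    return values[:-1]


def binary_partition(tuples, split_point):
    """Split D into D1 (A <= split_point) and D2 (A > split_point)."""
    d1 = [t for t in tuples if t[0] <= split_point]
    d2 = [t for t in tuples if t[0] > split_point]
    return d1, d2


# Hypothetical data: attribute value paired with a class label.
D = [(21, "C1"), (25, "C1"), (30, "C2"), (38, "C2"), (25, "C2")]

for sp in candidate_split_points(D):
    d1, d2 = binary_partition(D, sp)
    print(f"split_point={sp}: |D1|={len(d1)}, |D2|={len(d2)}")
```

Every candidate yields two non-empty subsets; which candidate is chosen depends on the class labels in each subset.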

Entropy-based discretization uses information about the class labels of tuples. To explain the intuition behind entropy-based discretization, we need to take a brief look at classification. Suppose we want to classify the tuples in D by partitioning on attribute A and some split-point.

Ideally, this partitioning would result in an exact classification of the tuples. For example, if we had two classes, we would hope that all the tuples of, say, class C1 fall into one partition, and all the tuples of class C2 fall into the other partition. But this is unlikely. For instance, the first partition may contain many tuples of C1, but also some of C2. How much more information would still be needed for a perfect classification after this partitioning? This amount is known as the expected information requirement for classifying a tuple in D based on partitioning by A. It is given by

$$\mathrm{Info_A(D)\:=\:\frac{\mid\:D_1\:\mid}{\mid\:D\:\mid}Entropy(D_1)\:+\:\frac{\mid\:D_2\:\mid}{\mid\:D\:\mid}Entropy(D_2)}$$

where D1 and D2 correspond to the tuples in D satisfying the conditions A ≤ split_point and A > split_point, respectively; |D| is the number of tuples in D, and so on. The entropy function for a given set is calculated based on the class distribution of the tuples in the set.

For instance, given m classes, C1, C2, ..., Cm, the entropy of D1 is

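$$\mathrm{Entropy(D_1)}\:=\:-\displaystyle\sum\limits_{i=1}^m P_i{\log_{2}(P_i)}$$

where P_i is the probability of class Ci in D1, determined by dividing the number of tuples of class Ci in D1 by |D1|, the total number of tuples in D1.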
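As a rough sketch of how the two formulas fit together, the Python below computes the entropy of a set of class labels and the expected information requirement for one split-point. The names `entropy` and `info_a`, the `(value, label)` tuple layout, and the sample data are assumptions made for illustration. The entropy is written as the sum of P_i · log2(1/P_i), which is algebraically the same as −Σ P_i · log2(P_i).

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Entropy of a list of class labels, in bits.

    Uses p * log2(1/p), which equals -p * log2(p).
    """
    n = len(labels)
    if n == 0:
        return 0.0
    return sum((c / n) * log2(n / c) for c in Counter(labels).values())


def info_a(tuples, split_point):
    """Weighted entropy of D1 (A <= split_point) and D2 (A > split_point)."""
    d1 = [label for value, label in tuples if value <= split_point]
    d2 = [label for value, label in tuples if value > split_point]
    n = len(tuples)
    return len(d1) / n * entropy(d1) + len(d2) / n * entropy(d2)


# Hypothetical data: attribute value paired with a class label.
D = [(1, "C1"), (2, "C1"), (3, "C2"), (4, "C2")]

print(info_a(D, 2))  # both subsets are pure
print(info_a(D, 1))  # D2 still mixes C1 and C2
```

With this data, splitting at 2 separates the classes completely and gives a requirement of 0, while splitting at 1 leaves a mixed subset and a requirement of about 0.69 bits, so 2 is the better split-point.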

The process of determining a split-point is applied recursively to each partition obtained, until some stopping criterion is met, such as when the minimum information requirement on all candidate split-points is less than a small threshold, ε, or when the number of intervals is greater than a threshold, max_interval.
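The recursive procedure can be sketched as follows. This is one possible reading under stated assumptions, not a definitive implementation: the helper names (`entropy`, `best_split`, `discretize`), the default thresholds, and the sample data are hypothetical. In particular, the ε test here is applied to the entropy of the partition being considered (a nearly pure partition is not split further), and the interval budget is shared depth-first across the recursion, so the leftmost partitions consume it first.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Entropy of a list of class labels, in bits."""
    n = len(labels)
    if n == 0:
        return 0.0
    return sum((c / n) * log2(n / c) for c in Counter(labels).values())


def best_split(tuples):
    """Return (split_point, info) with minimum Info_A(D), or None."""
    values = sorted({v for v, _ in tuples})
    best = None
    for sp in values[:-1]:  # the largest value would leave D2 empty
        d1 = [c for v, c in tuples if v <= sp]
        d2 = [c for v, c in tuples if v > sp]
        info = (len(d1) * entropy(d1) + len(d2) * entropy(d2)) / len(tuples)
        if best is None or info < best[1]:
            best = (sp, info)
    return best


def discretize(tuples, epsilon=0.01, max_interval=4):
    """Return the sorted split-points chosen by top-down splitting."""
    splits = []

    def recurse(part):
        # Assumed stopping rule 1: the interval budget is used up.
        if len(splits) + 1 >= max_interval:
            return
        # Assumed stopping rule 2: the partition is already (nearly) pure.
        if entropy([c for _, c in part]) < epsilon:
            return
        found = best_split(part)
        if found is None:  # only one distinct value left
            return
        sp, _ = found
        splits.append(sp)
        recurse([t for t in part if t[0] <= sp])
        recurse([t for t in part if t[0] > sp])

    recurse(tuples)
    return sorted(splits)


# Hypothetical data: attribute value paired with a class label.
D = [(1, "C1"), (2, "C1"), (3, "C2"), (4, "C2"), (5, "C1"), (6, "C1")]
print(discretize(D))  # [2, 4]
```

The two split-points produce the intervals A ≤ 2, 2 < A ≤ 4, and A > 4, each containing a single class. Calling `discretize(D, max_interval=2)` stops after the first split and returns only one split-point.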
