Entropy¶
1. Definition of Entropy¶
- Definition: Entropy is a measure of the uncertainty or impurity in a dataset. In the context of decision trees, it quantifies the amount of disorder or randomness in the data and helps determine how well a particular feature splits the dataset. Entropy comes from information theory and is a key concept in the ID3 and C4.5 decision tree algorithms.
2. Entropy Formula¶
[ H(S) = -\sum_{c \in C} p(c) \log_2 p(c) ]
- Where S – the current dataset for which entropy is being calculated
- C – the set of classes in S. Example: C = {yes, no}
- p(c) – the proportion of elements of S that belong to class c
Example:¶
For a binary classification problem (two classes), the entropy can be calculated as:
[ H(S) = -p_1 \log_2 p_1 - p_2 \log_2 p_2 ]
If the dataset is perfectly split into two equal classes, the entropy will be 1, indicating maximum uncertainty. If all data points belong to one class, the entropy will be 0, indicating no uncertainty.
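To make the formula concrete, here is a minimal Python sketch of the calculation, assuming the dataset is represented simply as a list of class labels; the function name `entropy` is chosen for this illustration and is not tied to any particular library.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy H(S), in bits, of a list of class labels."""
    total = len(labels)
    h = 0.0
    for count in Counter(labels).values():
        p = count / total        # p(c): proportion of class c in S
        h -= p * math.log2(p)    # add the term -p(c) * log2 p(c)
    return h

# Example: 2 "yes" and 1 "no" -> about 0.918 bits of uncertainty
print(entropy(["yes", "yes", "no"]))
```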
3. Entropy in Decision Trees¶
- Role in Decision Trees:
  - Entropy is used to determine the best feature to split the data on at each node of the tree. A split that results in lower entropy (more homogeneous subsets) is preferred because it indicates that the data has become more organized and less uncertain.
  - The feature with the highest information gain, which is derived from entropy, is chosen to split the dataset.
- Information Gain:
  - Information gain is calculated as the reduction in entropy after a dataset is split on a particular feature (a code sketch follows this list).
  - Formula: [ \text{Information Gain}(S, A) = H(S) - \sum_{v \in \text{Values}(A)} \frac{|S_v|}{|S|} H(S_v) ] where ( A ) is the feature being split on and ( S_v ) is the subset of ( S ) where feature ( A ) takes value ( v ).
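As a sketch of how this formula translates to code, the hypothetical helper `information_gain` below splits a dataset, represented as parallel lists of feature values and labels, on one categorical feature and subtracts the weighted entropies of the subsets. The data representation and the toy example are assumptions made for this illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """H(S) in bits, as in the sketch above."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """Information Gain(S, A) = H(S) - sum over v of |S_v|/|S| * H(S_v)."""
    total = len(labels)
    gain = entropy(labels)                                   # start from H(S)
    for v in set(feature_values):                            # each value v of feature A
        subset = [lbl for fv, lbl in zip(feature_values, labels) if fv == v]
        gain -= len(subset) / total * entropy(subset)        # subtract weighted H(S_v)
    return gain

# Toy example (hypothetical data): an "outlook" feature against yes/no labels.
outlook = ["sunny", "sunny", "overcast", "rain", "rain"]
play    = ["no",    "no",    "yes",      "yes",  "no"]
print(information_gain(outlook, play))   # about 0.571 bits
```

In a full tree-building loop, this quantity would be computed for every candidate feature at a node, and the feature with the largest gain would be chosen for the split.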
4. Examples of Entropy¶
- Maximum Entropy:
  - Occurs when all classes are equally likely. For a binary classification with equal probabilities ( p_1 = p_2 = 0.5 ), the entropy is 1: [ H(S) = -0.5 \log_2 0.5 - 0.5 \log_2 0.5 = 1 ]
- Minimum Entropy:
  - Occurs when the dataset is pure, i.e., all examples belong to one class. The entropy is 0, since ( -1 \log_2 1 = 0 ). Both extremes are checked numerically below.
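These two extremes can be verified directly with the binary entropy formula; the snippet below is only a numeric sanity check of the values quoted above.

```python
import math

# Maximum entropy: two equally likely classes (p1 = p2 = 0.5)
p1 = p2 = 0.5
print(-p1 * math.log2(p1) - p2 * math.log2(p2))   # 1.0 bit

# Minimum entropy: a pure dataset, a single class with probability 1
p = 1.0
print(-(p * math.log2(p)))                        # -0.0, i.e. zero bits of uncertainty
```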
5. Intuition Behind Entropy¶
- High Entropy:
  - Indicates high uncertainty in the data, meaning the data points are evenly distributed among the different classes.
- Low Entropy:
  - Indicates low uncertainty, meaning the data points are distributed more homogeneously, with most of them belonging to a single class.
- Use in Splitting:
  - Decision trees aim to reduce entropy with each split, resulting in branches that are as homogeneous as possible.
6. Application of Entropy¶
- Decision Trees:
  - Entropy is a fundamental concept in constructing decision trees. It helps determine the best features for splitting the data, ensuring that the resulting branches have lower uncertainty.
- Information Theory:
  - Beyond decision trees, entropy is widely used in information theory to measure the efficiency of encoding data and to quantify the amount of information contained in a signal or message.